Name: nemo-mbridge-perf-moe-optimization-workflow
Author: sayalinvidia

System Documentation

What problem does it solve?

This workflow provides a structured approach to diagnosing and improving throughput for Mixture-of-Experts (MoE) training on Megatron Bridge. It targets three classes of bottleneck: memory, communication, and compute.

Core Features & Use Cases

Phase-driven optimization using the Three Walls framework to guide memory, communication, and compute improvements.
Parallel Folding decouples attention and MoE parallelism, enabling scalable multi-GPU configurations.
Dispatcher selection, FP8 mapping guidance, and CUDA graph considerations to accelerate MoE workloads across hardware (e.g., Hopper and Blackwell).
Use cases include diagnosing throughput regressions after commits and performing end-to-end MoE throughput tuning sweeps.
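
To make the Parallel Folding idea concrete, here is a minimal sketch of the sizing arithmetic it implies. It assumes the layout described for Megatron-Core MoE, where attention layers use a TP x CP x DP x PP mesh and MoE layers use an ETP x EP x EDP x PP mesh over the same GPUs, sharing only the pipeline dimension. The class and function names are hypothetical and are not part of Megatron Bridge; the sketch only checks whether a candidate configuration divides the world size evenly.

```python
from dataclasses import dataclass


def _exact_div(world_size: int, model_parallel: int, label: str) -> int:
    """Return world_size / model_parallel, or raise if it is not a whole number."""
    if model_parallel <= 0 or world_size % model_parallel != 0:
        raise ValueError(
            f"{label} mesh: world size {world_size} is not divisible by "
            f"model-parallel size {model_parallel}"
        )
    return world_size // model_parallel


@dataclass(frozen=True)
class FoldedLayout:
    """Hypothetical container for one Parallel Folding configuration (illustrative only)."""

    world_size: int  # total number of GPUs
    pp: int          # pipeline parallel size, shared by both meshes
    tp: int          # tensor parallel size for attention layers
    cp: int          # context parallel size for attention layers
    etp: int         # expert tensor parallel size for MoE layers
    ep: int          # expert parallel size for MoE layers

    def attention_dp(self) -> int:
        # Assumed attention mesh: TP x CP x DP x PP == world size
        return _exact_div(self.world_size, self.pp * self.tp * self.cp, "attention")

    def expert_dp(self) -> int:
        # Assumed MoE mesh: ETP x EP x EDP x PP == world size
        return _exact_div(self.world_size, self.pp * self.etp * self.ep, "MoE")


layout = FoldedLayout(world_size=64, pp=2, tp=4, cp=2, etp=1, ep=8)
print("attention DP:", layout.attention_dp())
print("expert DP:", layout.expert_dp())
```

Because the two meshes are sized independently, the attention side can use a wide TP while the MoE side uses TP of 1 and a wide EP, as in the example configuration. Whether a given combination is actually supported depends on the framework version and should be checked against its documentation.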

Quick Start

Start the phase-driven MoE optimization by guiding the agent through the fit, scale, profile, and retune steps, using the Parallel Folding meshes and the recommended dispatcher and FP8 mappings.

Please help me install this Skill:
Name: nemo-mbridge-perf-moe-optimization-workflow
Download link: https://github.com/sayalinvidia/sayali-skills-test/archive/main.zip#nemo-mbridge-perf-moe-optimization-workflow
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
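
For readers who prefer to do the install by hand, the download-and-extract step could look roughly like the sketch below. It relies on assumptions that the listing does not confirm: that the archive contains a directory named after the skill somewhere in its tree, and that copying the files under that directory into .claude/skills/<skill name>/ is all the install requires. The helper names are hypothetical.

```python
import io
import urllib.request
import zipfile
from pathlib import Path, PurePosixPath


def download_archive(url: str) -> bytes:
    """Fetch the zip archive as bytes (pass the URL without its '#...' fragment)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def extract_skill(zip_bytes: bytes, skill_name: str, skills_dir: Path) -> Path:
    """Copy every file found under a directory named `skill_name` into skills_dir/skill_name.

    Assumption: the skill lives in a directory with exactly that name inside the archive.
    """
    target = skills_dir / skill_name
    extracted = 0
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or skill_name not in parts:
                continue
            # Keep only the path below the skill directory.
            relative = parts[parts.index(skill_name) + 1:]
            if not relative or ".." in relative:
                continue
            destination = target.joinpath(*relative)
            destination.parent.mkdir(parents=True, exist_ok=True)
            destination.write_bytes(archive.read(info))
            extracted += 1
    if extracted == 0:
        raise FileNotFoundError(f"no files found under '{skill_name}' in the archive")
    return target
```

A typical call would pass the bytes returned by download_archive for the download link above (without the fragment), the skill name, and Path(".claude/skills") as the destination.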
