Name: megatron-fsdp
Author: NVIDIA

System Documentation

What problem does it solve?

Megatron-FSDP enables Fully Sharded Data Parallel (FSDP) training in Megatron-Bridge, reducing per-GPU memory usage and improving training scalability for very large Megatron models.

Core Features & Use Cases

Enables Megatron's FSDP through two configuration flags in Bridge: dist.use_megatron_fsdp and ddp.use_megatron_fsdp.
Provides code anchors, validation steps, guidance on pitfalls, and a runtime choice between Megatron FSDP and Torch FSDP2 for robust experimentation.
Suits large-scale pretraining and fine-tuning workflows that need distributed memory management and better performance.
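
The runtime choice between the two FSDP paths can be pictured as a single switch that keeps the flags mutually exclusive. The following is only a sketch under stated assumptions: the config object is a plain stand-in, FsdpBackend and select_backend are hypothetical names, and use_torch_fsdp2 is an assumed flag name that does not appear on this page and should be checked against the Megatron-Bridge configuration in use.

```python
from enum import Enum
from types import SimpleNamespace


class FsdpBackend(Enum):
    """Hypothetical enum naming the two FSDP paths mentioned above."""
    MEGATRON = "megatron"
    TORCH_FSDP2 = "torch_fsdp2"


def select_backend(cfg, backend):
    """Set the FSDP flags so that exactly one path is active."""
    use_megatron = backend is FsdpBackend.MEGATRON
    # These two flag names come from this page.
    cfg.dist.use_megatron_fsdp = use_megatron
    cfg.ddp.use_megatron_fsdp = use_megatron
    # ASSUMPTION: the Torch FSDP2 flag name below is illustrative only.
    cfg.dist.use_torch_fsdp2 = not use_megatron
    return cfg


# Stand-in config; a real run would use the Bridge config object instead.
cfg = SimpleNamespace(dist=SimpleNamespace(), ddp=SimpleNamespace())
select_backend(cfg, FsdpBackend.MEGATRON)
```

Routing the choice through one function avoids a state where both paths are requested at once.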

Quick Start

Set cfg.dist.use_megatron_fsdp and cfg.ddp.use_megatron_fsdp to True, then run the Megatron-Bridge performance harness to verify the setup.
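
As a minimal sketch of this step, the snippet below sets both flags on a stand-in config and checks that they agree before a run. Only the two flag names come from this page; the SimpleNamespace config and the helpers enable_megatron_fsdp and fsdp_flags_consistent are hypothetical placeholders, not Megatron-Bridge APIs.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a Megatron-Bridge config object.
cfg = SimpleNamespace(
    dist=SimpleNamespace(use_megatron_fsdp=False),
    ddp=SimpleNamespace(use_megatron_fsdp=False),
)


def enable_megatron_fsdp(cfg):
    """Turn on both flags named in the Quick Start."""
    cfg.dist.use_megatron_fsdp = True
    cfg.ddp.use_megatron_fsdp = True
    return cfg


def fsdp_flags_consistent(cfg):
    """Return True when the dist and ddp flags hold the same value."""
    return cfg.dist.use_megatron_fsdp == cfg.ddp.use_megatron_fsdp


enable_megatron_fsdp(cfg)
if not fsdp_flags_consistent(cfg):
    raise ValueError("dist.use_megatron_fsdp and ddp.use_megatron_fsdp differ")
```

A check like this catches the case where only one of the two flags was toggled, before the performance harness is launched.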

Installation

Name: megatron-fsdp
Download link: https://github.com/NVIDIA/skills/archive/main.zip#megatron-fsdp
Download the .zip file, extract it, and install the skill in the .claude/skills/ directory.
