pytorch-fsdp2

Official

Scale PyTorch training with FSDP2.

Author: Orchestra-Research
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill helps you integrate PyTorch FSDP2 (the second-generation Fully Sharded Data Parallel API) correctly into training scripts. It targets models that exceed single-GPU memory and aims to improve distributed training performance.

Core Features & Use Cases

  • Sharded Data Parallelism: Shards model parameters, gradients, and optimizer states across multiple GPUs and nodes.
  • Memory Optimization: Reduces peak memory usage per GPU, allowing for training of larger models.
  • DTensor Integration: Leverages PyTorch's DTensor for more flexible and inspectable sharding.
  • Distributed Checkpointing: Integrates with PyTorch's Distributed Checkpoint (DCP) for robust saving and loading of distributed states.
  • Use Case: Training a multi-billion parameter language model that cannot fit into the memory of a single GPU.
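To make the memory claim above concrete, here is a rough back-of-the-envelope estimate of per-GPU memory for sharded training state. It assumes a common rule of thumb of about 16 bytes per parameter for mixed-precision Adam training (2 for low-precision weights, 2 for gradients, 12 for the fp32 master weights and two Adam moments). The function name and the 16-byte default are illustrative assumptions, not part of the skill; the real FSDP2 footprint depends on the precision policy and also includes activations and temporarily all-gathered parameters, which this sketch ignores.

```python
def per_gpu_state_gib(num_params, num_gpus, bytes_per_param=16):
    """Estimate GiB of parameter, gradient, and optimizer state per GPU.

    Assumes the state is sharded evenly across `num_gpus` and that each
    parameter costs `bytes_per_param` bytes in total (a rule of thumb,
    not a measured value). Activations are not included.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    total_bytes = num_params * bytes_per_param
    return total_bytes / num_gpus / 2**30


# Hypothetical 7-billion-parameter model, unsharded vs. sharded over 8 GPUs.
unsharded = per_gpu_state_gib(7_000_000_000, num_gpus=1)
sharded = per_gpu_state_gib(7_000_000_000, num_gpus=8)
print(f"1 GPU:  {unsharded:.1f} GiB of training state")
print(f"8 GPUs: {sharded:.1f} GiB of training state per GPU")
```

Under these assumptions the unsharded state alone is larger than the memory of a typical single accelerator, while the 8-way sharded share is not, which is the situation the use case above describes.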

Quick Start

Integrate PyTorch FSDP2 into your existing training script by following the step-by-step procedure outlined in the skill's documentation.
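For orientation, a minimal sketch of what an FSDP2 integration can look like is shown below. It is not taken from the skill's documentation. It assumes PyTorch 2.6 or newer (where `fully_shard` is exported from `torch.distributed.fsdp`), NCCL-capable GPUs, and a launch through `torchrun`. The helper name `shard_model_with_fsdp2`, the toy `nn.Sequential` model, and the hyperparameters are hypothetical placeholders.

```python
import os


def shard_model_with_fsdp2(model, block_cls):
    """Apply FSDP2 to each `block_cls` submodule, then to the root module."""
    # Deferred import so this file can be imported without PyTorch installed.
    # Assumes PyTorch 2.6+, where FSDP2's fully_shard lives at this path.
    from torch.distributed.fsdp import fully_shard

    # Collect the blocks first, then shard each one as its own group.
    blocks = [m for m in model.modules() if isinstance(m, block_cls)]
    for block in blocks:
        fully_shard(block)
    # The root call groups whatever parameters the blocks did not claim.
    fully_shard(model)
    return model


def main():
    import torch
    import torch.distributed as dist
    import torch.nn as nn

    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # Toy stand-in for a real model; each Linear plays the role of a block.
    model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(4)]).cuda()
    shard_model_with_fsdp2(model, nn.Linear)

    # Create the optimizer after sharding so it sees the sharded parameters.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    # One illustrative training step on random data.
    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    dist.destroy_process_group()


# torchrun sets LOCAL_RANK; outside a torchrun launch this file only
# defines the functions above and does not start training.
if "LOCAL_RANK" in os.environ:
    main()
```

Saved as a hypothetical `train.py`, this would be launched with something like `torchrun --nproc_per_node=8 train.py`. Sharding each block before the root module is the usual pattern because it lets parameters be gathered and freed one block at a time instead of all at once.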

Dependency Matrix

Required Modules

None required

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: pytorch-fsdp2
Download link: https://github.com/Orchestra-Research/AI-Research-SKILLs/archive/main.zip#pytorch-fsdp2

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.