Distributed Memory Scaling (FSDP / DeepSpeed ZeRO)
CommunityConfig-driven, portable distributed training.
Authorsovr610
Version1.0.0
Installs0
System Documentation
What problem does it solve?
The skill provides a unified, config-driven approach to running distributed training across multiple backends (DDP, FSDP, and DeepSpeed ZeRO) with portable weights and cross-strategy checkpointing.
Core Features & Use Cases
- StrategyRouter enables same training scripts to switch between DDP, FSDP, and DeepSpeed ZeRO purely via configuration.
- Lightweight wrappers and utilities handle portable full-state dict export, sharded checkpointing, and deterministic DeepSpeed JSON config generation.
- Cross-strategy portability and cross-world-size resume workflows empower scalable experimentation and production training pipelines.
- A scaling benchmark scaffold measures single-GPU vs multi-GPU performance, with a metrics schema compatible with performance regression gates.
Quick Start
Configure StrategyRouter with your chosen strategy, wrap the model, create the optimizer, and run training using a single, config-driven code path.
Dependency Matrix
Required Modules
None requiredComponents
scriptsreferencesassets
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: Distributed Memory Scaling (FSDP / DeepSpeed ZeRO) Download link: https://github.com/sovr610/refffiy/archive/main.zip#distributed-memory-scaling-fsdp-deepspeed-zero Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
Agent Skills Search Helper
Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.