Name: Distributed Memory Scaling (FSDP / DeepSpeed ZeRO)
Availability: InStock
Author: sovr610

System Documentation

What problem does it solve?

The skill provides a unified, config-driven approach to running distributed training across multiple backends (DDP, FSDP, and DeepSpeed ZeRO) with portable weights and cross-strategy checkpointing.

Core Features & Use Cases

StrategyRouter enables same training scripts to switch between DDP, FSDP, and DeepSpeed ZeRO purely via configuration.
Lightweight wrappers and utilities handle portable full-state dict export, sharded checkpointing, and deterministic DeepSpeed JSON config generation.
Cross-strategy portability and cross-world-size resume workflows empower scalable experimentation and production training pipelines.
A scaling benchmark scaffold measures single-GPU vs multi-GPU performance, with a metrics schema compatible with performance regression gates.

Quick Start

Configure StrategyRouter with your chosen strategy, wrap the model, create the optimizer, and run training using a single, config-driven code path.

Please help me install this Skill: Name: Distributed Memory Scaling (FSDP / DeepSpeed ZeRO) Download link: https://github.com/sovr610/refffiy/archive/main.zip#distributed-memory-scaling-fsdp-deepspeed-zero Please download this .zip file, extract it, and install it in the .claude/skills/ directory.

Distributed Memory Scaling (FSDP / DeepSpeed ZeRO)

System Documentation

What problem does it solve?

Core Features & Use Cases

Quick Start

Dependency Matrix

Required Modules

Components

💻 Claude Code Installation

Agent Skills Search Helper