biren-megatron-lm-br

Community

Distributed training framework for large AI models.

Author: dongg622
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill streamlines large-scale distributed training of AI models such as Llama2-70B by automating environment setup and the complex configuration steps involved.

Core Features & Use Cases

  • Model Pretraining and Fine-tuning: Supports training large models on one or more GPUs using tensor-parallel (TP) and pipeline-parallel (PP) strategies.
  • Distributed Strategy Customization: Lets users choose and configure tensor (TP), pipeline (PP), and data (DP) parallelism to tune performance.
  • Use Case: An AI researcher wants to pretrain a 70-billion-parameter model across 8 GPUs, using a tensor-parallel size of 4 (TP4) and a pipeline-parallel size of 2 (PP2).
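
The arithmetic behind the use case can be sketched as follows. In Megatron-style training the total number of GPUs factors as TP × PP × DP, so fixing TP and PP on a given GPU count determines the data-parallel size. The function names below are hypothetical helpers written for illustration, not part of this Skill or of Megatron-LM, and the per-GPU parameter figure is only a rough even-split estimate that ignores embeddings, optimizer state, and activations.

```python
def data_parallel_size(world_size: int, tp: int, pp: int) -> int:
    """Return the DP size implied by world_size = TP * PP * DP."""
    model_parallel = tp * pp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by TP*PP={model_parallel}"
        )
    return world_size // model_parallel


def approx_params_per_gpu(total_params: float, tp: int, pp: int) -> float:
    """Rough estimate: model weights split evenly over the TP*PP model-parallel group."""
    return total_params / (tp * pp)


# The use case above: 8 GPUs with TP4 and PP2 leaves a single data-parallel replica.
dp = data_parallel_size(world_size=8, tp=4, pp=2)
shard = approx_params_per_gpu(70e9, tp=4, pp=2)
print(f"DP={dp}, ~{shard / 1e9:.2f}B parameters per GPU")
```

With 8 GPUs, TP4 and PP2 consume the whole device count, so scaling out with data parallelism would require a multiple of 8 GPUs (for example, 16 GPUs gives DP = 2).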

Quick Start

Set up the environment with the SDK, then run the provided training scripts to start distributed training on your cluster.

Dependency Matrix

Required Modules

None required

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: biren-megatron-lm-br
Download link: https://github.com/dongg622/china-ai-chip-skill/archive/main.zip#biren-megatron-lm-br

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
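
For a manual install, the download-and-extract step could look roughly like the sketch below. It rests on several assumptions: that the skill lives in a directory named biren-megatron-lm-br somewhere inside the archive (the repository layout is not stated here), that the URL fragment after `#` is only a label and can be dropped, and that `.claude/skills/` is relative to the current working directory. `download_archive` and `extract_skill` are hypothetical helper names.

```python
import io
import urllib.request
import zipfile
from pathlib import Path, PurePosixPath

SKILL_NAME = "biren-megatron-lm-br"
# Download link from the listing, with the "#biren-megatron-lm-br" fragment removed.
ARCHIVE_URL = "https://github.com/dongg622/china-ai-chip-skill/archive/main.zip"


def download_archive(url: str) -> bytes:
    """Fetch the .zip archive and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def extract_skill(archive_bytes: bytes, skill_name: str, dest_root: Path) -> list:
    """Copy every file found under a directory named skill_name into dest_root/skill_name."""
    written = []
    dest_base = dest_root / skill_name
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or skill_name not in parts:
                continue
            # Keep only the path below the skill directory (assumed layout).
            rel = parts[parts.index(skill_name) + 1:]
            if not rel or ".." in rel:
                continue
            target = dest_base.joinpath(*rel)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(info))
            written.append(target)
    return written


# Example usage (performs a network request, so it is left commented out):
# files = extract_skill(download_archive(ARCHIVE_URL), SKILL_NAME, Path(".claude/skills"))
# print(f"Installed {len(files)} files")
```

Matching on the directory name rather than a fixed path keeps the sketch independent of where the skill sits inside the repository; if the archive contained more than one directory with that name, their contents would be merged, so check the result after extraction.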