biren-megatron-lm-br
Community
Distributed training framework for large AI models.
Author: dongg622
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill streamlines the setup and execution of large-scale distributed training for AI models like Llama2-70B, automating complex configuration steps and environment setup.
Core Features & Use Cases
- Model Pretraining and Fine-tuning: Supports training large models on a single GPU or across multiple GPUs using tensor-parallel (TP) and pipeline-parallel (PP) strategies.
- Distributed Strategy Customization: Lets users choose and configure the TP, PP, and data-parallel (DP) degrees to tune performance.
- Use Case: An AI researcher wants to pretrain a 70-billion-parameter model across 8 GPUs using TP=4 and PP=2. Since 4 × 2 = 8, the model-parallel group occupies all 8 GPUs, leaving a single data-parallel replica.
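The arithmetic behind the use case above can be sketched in a few lines. This is a minimal illustration, assuming the usual Megatron-style convention that the total GPU count equals TP × PP × DP; the function name `data_parallel_size` is hypothetical and not part of this Skill's scripts.

```python
def data_parallel_size(world_size: int, tp: int, pp: int) -> int:
    """Return the data-parallel degree implied by a GPU count and TP/PP sizes.

    Assumes world_size = tp * pp * dp (hypothetical helper, for illustration).
    """
    if world_size < 1 or tp < 1 or pp < 1:
        raise ValueError("world_size, tp and pp must all be positive")
    model_parallel = tp * pp  # GPUs needed to hold one model replica
    if world_size % model_parallel != 0:
        raise ValueError(
            f"{world_size} GPUs cannot be split into groups of {model_parallel}"
        )
    return world_size // model_parallel


# The use case from the list above: 8 GPUs, TP=4, PP=2.
print(data_parallel_size(8, tp=4, pp=2))  # prints 1
```

With the same TP and PP settings, 16 GPUs would give two data-parallel replicas, while 6 GPUs would be rejected because 6 is not a multiple of 8.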
Quick Start
Set up your environment with the SDK, then run the provided training scripts to start distributed training on your cluster.
Dependency Matrix
Required Modules
None required
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: biren-megatron-lm-br
Download link: https://github.com/dongg622/china-ai-chip-skill/archive/main.zip#biren-megatron-lm-br
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
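For a manual install, the extraction step described above might look like the following sketch. It rests on two assumptions: that the archive contains a directory named `biren-megatron-lm-br` (suggested by the URL fragment, not confirmed here), and that only that directory should be copied into `.claude/skills/`. The function name `extract_skill` is hypothetical.

```python
import io
import zipfile
from pathlib import Path, PurePosixPath


def extract_skill(archive_bytes: bytes, skill_name: str, dest_root: Path) -> list[Path]:
    """Copy files under any <skill_name>/ directory in the zip into dest_root/<skill_name>/.

    Hypothetical helper: the archive layout is an assumption, not a documented fact.
    """
    written = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            parts = PurePosixPath(info.filename).parts
            if skill_name not in parts:
                continue  # file belongs to some other part of the repository
            rel = parts[parts.index(skill_name):]  # drop the leading repo folder
            if ".." in rel:
                continue  # refuse entries that try to escape dest_root
            target = dest_root.joinpath(*rel)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(info))
            written.append(target)
    return written
```

To use it with the real archive, the bytes could be fetched with `urllib.request.urlopen(url).read()` and `dest_root` set to `Path(".claude/skills")`. If the function returns an empty list, the assumed directory name does not match the archive's actual layout.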