mcapex
Community
Accelerate large model training with fused GPU kernels.
Software Engineering #model training #GPU acceleration #LLM #DeepSpeed #large models #Megatron #fused kernels
Author: dongg622
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill streamlines the training of large language models by providing optimized fused GPU kernels that accelerate normalization, rotary position embedding, and softmax operations.
Core Features & Use Cases
- Training Acceleration: Provides fused kernels such as FusedLayerNorm, FusedRMSNorm, and FusedDense to speed up LLM training workflows.
- Framework Compatibility: Supports integration with Megatron-LM and DeepSpeed, enabling seamless deployment in large-scale training setups.
- Use Case: Use this Skill to improve training throughput for large transformer models by replacing native PyTorch layers with their fused-kernel equivalents.
Quick Start
Use the mcApex skill to speed up your LLM training by replacing standard normalization and attention components with fused GPU kernels.
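The replacement step can be sketched as follows. This is a minimal illustration, not the Skill's own script: it assumes mcApex exposes the same import path as NVIDIA Apex (`apex.normalization.FusedLayerNorm`), and `swap_layernorm` is a hypothetical helper name introduced here. If the import is unavailable, the model is left unchanged.

```python
import torch
from torch import nn

try:
    # Assumption: mcApex mirrors the NVIDIA Apex import path.
    from apex.normalization import FusedLayerNorm
except ImportError:
    FusedLayerNorm = None  # no fused kernel available; keep native layers


def swap_layernorm(module: nn.Module, fused_cls=FusedLayerNorm) -> int:
    """Hypothetical helper: recursively replace nn.LayerNorm children
    with fused_cls and return how many layers were swapped."""
    if fused_cls is None:
        return 0
    swapped = 0
    for name, child in list(module.named_children()):
        if isinstance(child, nn.LayerNorm) and not isinstance(child, fused_cls):
            fused = fused_cls(
                child.normalized_shape,
                eps=child.eps,
                elementwise_affine=child.elementwise_affine,
            )
            if child.elementwise_affine:
                # Carry over the learned scale and shift, then match placement.
                fused.load_state_dict(child.state_dict())
                fused.to(device=child.weight.device, dtype=child.weight.dtype)
            setattr(module, name, fused)
            swapped += 1
        else:
            # Descend into containers and other submodules.
            swapped += swap_layernorm(child, fused_cls)
    return swapped
```

Calling `swap_layernorm(model)` before wrapping the model with Megatron-LM or DeepSpeed keeps the existing weights, so a checkpoint trained with native layers should load as before. The sketch assumes each LayerNorm uses the default weight-and-bias layout.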
Dependency Matrix
Required Modules
mxmaca
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: mcapex Download link: https://github.com/dongg622/china-ai-chip-skill/archive/main.zip#mcapex Please download this .zip file, extract it, and install it in the .claude/skills/ directory.