Name: mcapex
Author: dongg622

System Documentation

What problem does it solve?

This Skill streamlines the training of large language models by providing optimized fused GPU kernels that accelerate normalization, rotary position embedding, and softmax operations.

Core Features & Use Cases

Training Acceleration: Provides fused kernels such as FusedLayerNorm, FusedRMSNorm, and FusedDense to speed up LLM training workflows.
Framework Compatibility: Supports integration with Megatron-LM and DeepSpeed, enabling seamless deployment in large-scale training setups.
Use Case: Use this Skill to improve training throughput for large transformer models by replacing native PyTorch layers with their fused-kernel equivalents.
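
The layer replacement described above can be sketched roughly as follows. This is a minimal illustration, not the Skill's own code: it assumes mcApex keeps the import path and class name of upstream NVIDIA Apex (`apex.normalization.FusedLayerNorm`), and the helper name `swap_layer_norm` is hypothetical. If the fused package cannot be imported, the model is left untouched.

```python
import torch
from torch import nn

try:
    # Assumption: mcApex keeps upstream Apex's import path and class name.
    from apex.normalization import FusedLayerNorm
except ImportError:
    FusedLayerNorm = None  # fused kernels unavailable; keep native layers


def swap_layer_norm(module: nn.Module) -> int:
    """Replace nn.LayerNorm submodules in place; return how many were swapped."""
    if FusedLayerNorm is None:
        return 0
    swapped = 0
    # Copy the child list first so replacing entries does not disturb iteration.
    for name, child in list(module.named_children()):
        if isinstance(child, nn.LayerNorm):
            fused = FusedLayerNorm(
                child.normalized_shape,
                eps=child.eps,
                elementwise_affine=child.elementwise_affine,
            )
            if child.elementwise_affine:
                # Match device and dtype, then carry over the learned parameters.
                fused = fused.to(device=child.weight.device, dtype=child.weight.dtype)
                with torch.no_grad():
                    fused.weight.copy_(child.weight)
                    if child.bias is not None:
                        fused.bias.copy_(child.bias)
            setattr(module, name, fused)
            swapped += 1
        else:
            # Recurse into containers and composite layers.
            swapped += swap_layer_norm(child)
    return swapped
```

Calling `swap_layer_norm(model)` before moving the model into a training loop would swap every `nn.LayerNorm` it finds. The same pattern could be adapted for FusedRMSNorm, provided the constructor arguments are checked against the installed version.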

Quick Start

Use the mcApex Skill to speed up LLM training by replacing standard normalization and attention components with fused GPU kernels.

Please help me install this Skill:
Name: mcapex
Download link: https://github.com/dongg622/china-ai-chip-skill/archive/main.zip#mcapex
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
