xformers
Community
Accelerate Transformer research and deployment efficiently.
Author: jstzwj
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill provides optimized building blocks for Transformer models, reducing development time and improving performance across training and inference tasks.
Core Features & Use Cases
- Memory-Efficient Attention: Enables fast, exact attention computations suitable for large-scale models.
- Structured Sparse Operations: Implements 2:4 sparsity, supporting faster training and inference with reduced memory footprint.
- Research and Deployment: Supplies custom CUDA and Triton kernels, plus model-parallel layers, for cutting-edge Transformer research, including heterogeneous batching and inference acceleration.
- Example Scenario: Use this Skill to replace standard attention with a memory-efficient version in a language model, reducing GPU memory usage and speeding up training.
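To make the 2:4 sparsity pattern concrete: in every contiguous group of four values, at most two are nonzero. The sketch below enforces that pattern on a plain Python list. The helper name `prune_2_4` is hypothetical, and keeping the two largest-magnitude values per group is an assumed selection rule chosen for illustration; it is not taken from the xformers kernels.

```python
def prune_2_4(row):
    """Return a copy of `row` where each group of four keeps only two values.

    Assumption: the two largest-magnitude entries in each group survive;
    the other two are set to zero.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


weights = [0.1, -0.9, 0.4, 0.05, 1.0, 2.0, -3.0, 0.5]
sparse = prune_2_4(weights)
```

Because exactly half of each group is zero at known positions, hardware and kernels that understand the pattern can store only the surviving values plus small position metadata, which is where the memory and speed savings come from.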
Quick Start
Use the xformers Skill to replace standard attention in your Transformer code with the memory-efficient attention function.
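As a minimal sketch of that swap, assuming PyTorch tensors laid out as [batch, seq_len, heads, head_dim]: the helper names `standard_attention` and `attention` are illustrative and not part of the Skill, while `xformers.ops.memory_efficient_attention` is the xformers function being substituted. Falling back to the reference path for non-CUDA tensors is an assumption made so the sketch stays usable on machines without a GPU.

```python
import importlib.util

# True only when the libraries are importable in this environment.
HAS_TORCH = importlib.util.find_spec("torch") is not None
HAS_XFORMERS = HAS_TORCH and importlib.util.find_spec("xformers") is not None


def standard_attention(q, k, v):
    """Reference attention; q, k, v are [batch, seq_len, heads, head_dim]."""
    import torch

    scale = q.shape[-1] ** -0.5
    # Move heads ahead of seq_len so the matmuls run per head.
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    weights = torch.softmax((q @ k.transpose(-2, -1)) * scale, dim=-1)
    # Restore the [batch, seq_len, heads, head_dim] layout.
    return (weights @ v).transpose(1, 2)


def attention(q, k, v):
    """Use the xformers kernel when available, else the reference path."""
    if HAS_XFORMERS and q.is_cuda:
        import xformers.ops as xops

        # Same layout in and out; the default scale is head_dim ** -0.5.
        return xops.memory_efficient_attention(q, k, v)
    return standard_attention(q, k, v)
```

In an existing model, calling `attention(q, k, v)` where the softmax-based attention used to be is typically the only change needed. The reference path materializes the full seq_len by seq_len weight matrix, which is the memory cost the xformers kernel is designed to avoid.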
Dependency Matrix
Required Modules
torch, triton, scipy
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: xformers
Download link: https://github.com/jstzwj/ai-infra-plugins/archive/main.zip#xformers
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.