xformers

Community

Accelerate Transformer research and deployment efficiently.

Author: jstzwj
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill provides optimized building blocks for Transformer models, reducing development time and improving performance across training and inference tasks.

Core Features & Use Cases

  • Memory-Efficient Attention: Enables fast, exact attention computations suitable for large-scale models.
  • Structured Sparse Operations: Implements 2:4 sparsity (at most two nonzero values in every group of four), supporting faster training and inference with a reduced memory footprint.
  • Research and Deployment: Supplies custom CUDA and Triton kernels, plus model-parallel layers, for cutting-edge Transformer research, including heterogeneous batching and inference acceleration.
  • Example Scenario: Use this Skill to replace standard attention with a memory-efficient version in a language model, reducing GPU memory usage and speeding up training.
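
To make the 2:4 pattern from the list above concrete, here is a minimal pure-Python sketch. It is only an illustration of the sparsity layout, not the Skill's kernels: the helper name `prune_2_of_4` is hypothetical, and keeping the two largest-magnitude values per group is an assumed (though common) selection rule that the source does not specify.

```python
def prune_2_of_4(row):
    """Zero out all but two values in every group of four consecutive values."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Assumed rule: keep the two entries with the largest magnitude.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.0]
print(prune_2_of_4(weights))
# prints [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Because every group of four holds at most two nonzero values, a sparse format only needs to store those two values plus their positions, which is where the reduced memory footprint comes from.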

Quick Start

Use the xformers skill to replace standard attention in your Transformer code with the memory-efficient attention function (xformers.ops.memory_efficient_attention).
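
The swap described above might look like the following sketch. The wrapper name `attention` and its fallback behaviour are assumptions made for illustration, not part of the Skill: it calls `xformers.ops.memory_efficient_attention` when xformers is installed and the tensors live on a CUDA device, and otherwise falls back to PyTorch's `scaled_dot_product_attention` (PyTorch 2.0 or later). Tensors are assumed to be shaped `[batch, seq_len, heads, head_dim]`.

```python
import importlib.util


def attention(q, k, v, causal=False):
    """Exact attention for tensors shaped [batch, seq_len, heads, head_dim]."""
    # Imported lazily so this module still loads where the GPU stack is absent.
    import torch.nn.functional as F

    if q.is_cuda and importlib.util.find_spec("xformers") is not None:
        import xformers.ops as xops

        # LowerTriangularMask applies causal masking without building a dense mask.
        bias = xops.LowerTriangularMask() if causal else None
        return xops.memory_efficient_attention(q, k, v, attn_bias=bias)

    # Fallback: PyTorch expects [batch, heads, seq_len, head_dim], so swap
    # the sequence and head axes on the way in and on the way out.
    out = F.scaled_dot_product_attention(
        q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), is_causal=causal
    )
    return out.transpose(1, 2)
```

Inside a model, a call such as `attention(q, k, v, causal=True)` would stand in for the hand-written softmax(QK^T)V step. Both paths compute exact attention, so outputs should agree up to floating-point tolerance.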

Dependency Matrix

Required Modules

torch, triton, scipy

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: xformers
Download link: https://github.com/jstzwj/ai-infra-plugins/archive/main.zip#xformers

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
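
For a manual install, the extract-and-copy step could be scripted roughly as below. This is a sketch under stated assumptions: the helper `install_skill` is hypothetical, the archive is assumed to be already downloaded, and the Skill is assumed to sit in a folder named after it somewhere inside the zip (the archive's real layout is not documented here).

```python
import zipfile
from pathlib import Path, PurePosixPath


def install_skill(archive_path, skill_name, skills_dir):
    """Copy files found under a folder named `skill_name` in the zip into skills_dir."""
    target_root = Path(skills_dir) / skill_name
    copied = 0
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or skill_name not in parts:
                continue
            # Keep only the path below the skill folder, wherever it sits.
            relative = parts[parts.index(skill_name) + 1:]
            if not relative or ".." in relative:
                continue
            destination = target_root.joinpath(*relative)
            destination.parent.mkdir(parents=True, exist_ok=True)
            destination.write_bytes(archive.read(info))
            copied += 1
    if copied == 0:
        raise FileNotFoundError(f"no '{skill_name}' folder found in {archive_path}")
    return target_root
```

With the archive saved locally as `main.zip`, a call like `install_skill("main.zip", "xformers", ".claude/skills")` would place the Skill's files under `.claude/skills/xformers/`.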