triton-operator-performance-optim

Official

Tune Triton operators for Ascend NPU performance.

AuthorAscend
Version1.0.0
Installs0

System Documentation

What problem does it solve?

Optimizes Triton operator performance on Ascend NPU by addressing UB overflow and optimizing tiling strategies.

Core Features & Use Cases

  • Phase-driven optimization covering algorithm review, bottleneck diagnosis, hardware-specific tuning, and verification for general input shapes.
  • tiling-aware techniques to maximize Cube utilization and minimize UB footprint across kernel blocks.
  • Use Case: For Ascend NPU deployments running Triton operators, apply structured optimization to achieve stable accuracy with improved throughput.

Quick Start

Profile the Triton kernel on Ascend and apply tiling-guided optimizations to maximize Cube utilization.

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: triton-operator-performance-optim
Download link: https://github.com/Ascend/agent-skills/archive/main.zip#triton-operator-performance-optim

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
View Source Repository

Agent Skills Search Helper

Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.