triton-ascend-fused-operator-optimization
Community
Optimizing fused operators on Ascend NPUs.
Author: xchang1121
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Analyzes fused operators on Ascend NPUs and recommends optimizations, aiming for higher throughput and lower latency through structured fusion strategies and performance analysis.
Core Features & Use Cases
- Two-stage fusion planning: coordinates multi-pass fusion and data-layout refinements for Ascend-compatible kernels.
- Data access pattern restructuring: reorganizes loads so that memory accesses are contiguous, reducing wasted memory bandwidth.
- Use Case: accelerate elementwise + normalization, softmax + top-k, and matmul + activation fusions on Ascend using Triton-based strategies.
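To make the elementwise + normalization case concrete, here is a minimal host-side sketch in plain Python. It is not Triton or Ascend code and is not taken from the skill itself. The function names, the `block` tile size, and the choice of a bias add followed by layer normalization without scale or shift are all assumptions made for illustration. The unfused version writes the intermediate row to memory before normalizing it. The fused version recomputes the add inside each tile, so no intermediate row is ever stored.

```python
import math


def unfused_add_layernorm(x, bias, eps=1e-5):
    """Unfused reference: materialize y = x + bias, then normalize y."""
    y = [xi + bi for xi, bi in zip(x, bias)]  # intermediate row written out
    mean = sum(y) / len(y)
    var = sum((v - mean) ** 2 for v in y) / len(y)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std for v in y]


def fused_add_layernorm(x, bias, block=4, eps=1e-5):
    """Fused sketch: walk the row in tiles and never store x + bias.

    `block` is a hypothetical tile size standing in for a kernel's block
    dimension; it does not model any real Ascend buffer size.
    """
    n = len(x)
    total = 0.0
    total_sq = 0.0
    # Pass 1: accumulate sum and sum of squares, one tile at a time.
    for start in range(0, n, block):
        for i in range(start, min(start + block, n)):
            v = x[i] + bias[i]
            total += v
            total_sq += v * v
    mean = total / n
    var = total_sq / n - mean * mean  # E[y^2] - E[y]^2
    inv_std = 1.0 / math.sqrt(var + eps)
    # Pass 2: recompute the add per tile and normalize in the same step.
    out = []
    for start in range(0, n, block):
        for i in range(start, min(start + block, n)):
            out.append((x[i] + bias[i] - mean) * inv_std)
    return out


if __name__ == "__main__":
    row = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    bias = [0.5] * len(row)
    ref = unfused_add_layernorm(row, bias)
    got = fused_add_layernorm(row, bias)
    print(max(abs(a - b) for a, b in zip(ref, got)))
```

A reference pair like this can also serve as a numerical check when evaluating a real fused kernel: run the candidate on the same inputs and compare against the unfused result within a tolerance. The single-pass variance formula used in the fused version can lose precision when the mean is large relative to the spread, so a production kernel may prefer a more stable accumulation scheme.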
Quick Start
Analyze a fused-operator workload on Ascend NPUs and propose a Triton-based optimization plan including analysis, fusion strategies, and evaluation steps.
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: triton-ascend-fused-operator-optimization
Download link: https://github.com/xchang1121/AutoResearch-CC-hook/archive/main.zip#triton-ascend-fused-operator-optimization
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.