triton-ascend-fused-operator-optimization


Optimizing fused operators on Ascend NPUs.

Author: xchang1121
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Analyzes fused-operator workloads on Ascend NPUs and prescribes optimization methods, targeting higher throughput and lower latency through structured fusion strategies and performance analysis.

Core Features & Use Cases

  • Two-stage fusion planning: coordinates multi-pass fusion and data-layout refinements for Ascend-compatible kernels.
  • Data access pattern restructuring: reorganizes loads to maximize contiguous memory access and minimize redundant memory traffic.
  • Use Case: accelerate elementwise + normalization fusion, softmax + top-k fusion, and matmul + activation fusion on Ascend with Triton-based strategies.
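
To make the softmax + top-k use case concrete, here is a minimal pure-Python sketch of what fusion buys. It is not Triton code and not the skill's own implementation; the function names `softmax_then_topk` and `fused_softmax_topk` are hypothetical, and the one-pass scheme (an online softmax with a running maximum, combined with a size-k heap) is one common way to fuse the two operators, assumed here for illustration. The fused version never materializes the full probability vector, which is the kind of memory-traffic saving a fused kernel aims for.

```python
import heapq
import math


def softmax_then_topk(row, k):
    """Unfused reference: materialize the full softmax, then select top-k."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(row)), key=lambda i: (-probs[i], i))[:k]
    return [(i, probs[i]) for i in top]


def fused_softmax_topk(row, k):
    """Fused sketch (assumes k >= 1 and finite inputs).

    A single pass over the row keeps a running max, a rescaled running sum
    (online softmax), and a size-k min-heap of the largest logits.
    """
    running_max = -math.inf
    running_sum = 0.0
    heap = []  # entries are (logit, -index); the smallest kept logit is heap[0]
    for i, x in enumerate(row):
        new_max = max(running_max, x)
        # Rescale the old sum to the new max before adding the new term.
        running_sum = running_sum * math.exp(running_max - new_max) + math.exp(x - new_max)
        running_max = new_max
        if len(heap) < k:
            heapq.heappush(heap, (x, -i))
        elif (x, -i) > heap[0]:
            heapq.heapreplace(heap, (x, -i))
    # Largest logit first; ties resolved toward the smaller index.
    top = sorted(heap, key=lambda t: (-t[0], -t[1]))
    # Only the k surviving logits are normalized into probabilities.
    return [(-neg_i, math.exp(x - running_max) / running_sum) for x, neg_i in top]
```

Calling `fused_softmax_topk(row, k)` returns a list of `(index, probability)` pairs ordered from most to least probable, and should agree with `softmax_then_topk(row, k)` up to floating-point rounding. In an actual Triton kernel the same idea would be expressed over blocks of a row rather than single elements.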

Quick Start

Analyze a fused-operator workload on Ascend NPUs and propose a Triton-based optimization plan including analysis, fusion strategies, and evaluation steps.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: triton-ascend-fused-operator-optimization
Download link: https://github.com/xchang1121/AutoResearch-CC-hook/archive/main.zip#triton-ascend-fused-operator-optimization

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
