triton-ascend-example-softmax
Community
High-performance softmax reduction on Triton Ascend.
Author: xchang1121
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Provides a complete softmax reduction example for Triton Ascend. It shows how to build a three-phase reduction (max → sum(exp) → normalize) and how to combine it with block tiling and precision strategies to reach high performance on Ascend hardware.
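For reference, the three phases correspond to the standard numerically stable softmax over one row. The symbols below (x for the input row, m for the row maximum, s for the sum of exponentials, y for the output) are notation chosen here for illustration and do not come from the skill itself:

```latex
\begin{aligned}
m   &= \max_{j} x_j                 && \text{(phase 1: max)} \\
s   &= \sum_{j} e^{x_j - m}         && \text{(phase 2: sum(exp))} \\
y_i &= \frac{e^{x_i - m}}{s}        && \text{(phase 3: normalize)}
\end{aligned}
```

Because every exponent x_j - m is at most zero, no exponential can overflow, which is why the max is computed first.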
Core Features & Use Cases
- Three-phase reduction (max, sum(exp), normalize) that computes softmax efficiently on Triton Ascend.
- Block tiling and scalar accumulator techniques to improve numerical stability and throughput.
- Use Case: design and benchmark reduce operators for neural networks on Ascend devices, with a ready-to-study kernel and reference PyTorch model.
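The three-phase pattern with block tiling and scalar accumulators can be sketched in plain Python. This is only an illustration of the control flow, not the skill's Triton Ascend kernel: the function name `softmax_row_tiled` and the `block_size` parameter are hypothetical, and a real kernel would process each block with vector operations on the device rather than Python loops.

```python
import math


def softmax_row_tiled(row, block_size=4):
    """Softmax over one row, visiting it in fixed-size blocks in three passes.

    Hypothetical helper for illustration; not taken from the skill's kernel.
    """
    n = len(row)

    # Phase 1: running maximum, kept in a single scalar accumulator.
    row_max = -math.inf
    for start in range(0, n, block_size):
        block = row[start:start + block_size]
        row_max = max(row_max, max(block))

    # Phase 2: sum of exp(x - max), again accumulated into one scalar.
    denom = 0.0
    for start in range(0, n, block_size):
        block = row[start:start + block_size]
        denom += sum(math.exp(x - row_max) for x in block)

    # Phase 3: normalize each block with the two scalars from phases 1 and 2.
    out = []
    for start in range(0, n, block_size):
        block = row[start:start + block_size]
        out.extend(math.exp(x - row_max) / denom for x in block)
    return out


if __name__ == "__main__":
    probs = softmax_row_tiled([1.0, 2.0, 3.0, 4.0, 5.0], block_size=2)
    print(probs, sum(probs))
```

Subtracting the row maximum before exponentiating keeps large inputs safe: `softmax_row_tiled([1000.0, 1000.0])` returns `[0.5, 0.5]`, whereas calling `math.exp(1000.0)` directly raises an overflow error.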
Quick Start
Run the softmax example to study the Triton Ascend reduction kernel and integrate its patterns into your own reduce operators.
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: triton-ascend-example-softmax
Download link: https://github.com/xchang1121/AutoResearch-CC-hook/archive/main.zip#triton-ascend-example-softmax
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.