triton-ascend-example-softmax

Community

High-performance softmax reduction on Triton Ascend.

Author: xchang1121
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Provides a complete example of a softmax reduction implemented in Triton Ascend. It shows how to build a three-phase reduction (max → sum(exp) → normalize) with block tiling and precision strategies to achieve high performance on Ascend hardware.

Core Features & Use Cases

  • Three-phase reduction: max, sum(exp), then normalize, to compute softmax efficiently on Triton Ascend.
  • Block tiling and scalar accumulator techniques to improve numerical stability and throughput.
  • Use Case: design and benchmark reduce operators for neural networks on Ascend devices, with a ready-to-study kernel and reference PyTorch model.
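
The three-phase pattern listed above can be sketched in plain Python. This is only an illustration of the arithmetic, not the skill's Triton Ascend kernel: the function name `tiled_softmax` and the `block_size` parameter are hypothetical, and the loops over tiles stand in for what a real kernel would do with block loads.

```python
import math


def tiled_softmax(row, block_size=4):
    """Softmax over one row, computed in three passes over fixed-size tiles.

    Hypothetical reference sketch: each pass walks the row tile by tile and
    carries its partial result in a single scalar accumulator.
    """
    n = len(row)

    # Phase 1: running maximum across tiles (scalar accumulator).
    row_max = -math.inf
    for start in range(0, n, block_size):
        row_max = max(row_max, max(row[start:start + block_size]))

    # Phase 2: sum of exp(x - max) across tiles (scalar accumulator).
    denom = 0.0
    for start in range(0, n, block_size):
        denom += sum(math.exp(x - row_max) for x in row[start:start + block_size])

    # Phase 3: normalize each tile by the accumulated sum.
    out = []
    for start in range(0, n, block_size):
        out.extend(math.exp(x - row_max) / denom for x in row[start:start + block_size])
    return out
```

Subtracting the row maximum before exponentiating keeps every exponent at or below zero, so large inputs such as 1000.0 do not overflow. The tile size does not change the result up to floating-point rounding; it only changes how the row is traversed.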

Quick Start

Run the softmax example to study the Triton Ascend reduction kernel and integrate its patterns into your own reduce operators.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: triton-ascend-example-softmax
Download link: https://github.com/xchang1121/AutoResearch-CC-hook/archive/main.zip#triton-ascend-example-softmax

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.