Name: triton-ascend-case-reduction-amin-large
Author: xchang1121

System Documentation

What problem does it solve?

Provides an optimized approach for extremely large 1D amin (minimum-value) reductions on Ascend AI cores. It uses two-stage tiling and compute reorganization to keep Unified Buffer (UB) usage within capacity and to reduce the total number of reduction steps, achieving near-peak performance on Ascend AI cores.
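The two-stage structure can be illustrated with a host-side NumPy model. This is a sketch under stated assumptions, not the skill's Triton kernel. `two_stage_amin` is a hypothetical helper. `grid` stands in for the number of launched programs (AI cores), and `block` stands in for the number of elements that one UB-sized tile holds. In stage 1, each program sweeps its own contiguous chunk tile by tile and keeps a running minimum. In stage 2, the `grid` partial minima are reduced to a single value.

```python
import numpy as np


def two_stage_amin(x: np.ndarray, grid: int = 32, block: int = 8192) -> float:
    """Host-side model of a two-stage 1D amin reduction (hypothetical helper).

    Assumes a floating-point `x`, because running minima start at +inf.
    """
    n = x.size
    chunk = -(-n // grid)  # ceil(n / grid): elements owned by each program
    partials = np.full(grid, np.inf, dtype=np.float64)

    # Stage 1: every program reduces its own chunk, one UB-sized tile at a time.
    for pid in range(grid):
        start = pid * chunk
        stop = min(start + chunk, n)
        acc = np.inf  # stays +inf if this program received no data
        for off in range(start, stop, block):
            tile = x[off:min(off + block, stop)]
            acc = min(acc, float(tile.min()))
        partials[pid] = acc

    # Stage 2: reduce the `grid` partial minima to the final result.
    return float(partials.min())


# Usage: compare against NumPy's own reduction on a 4194304-element input.
x = np.random.default_rng(0).standard_normal(4194304).astype(np.float32)
print(two_stage_amin(x) == float(x.min()))
```

Only one tile per program needs to be resident at any moment, which is why peak UB usage in this model depends on `block` rather than on the total input length.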

Core Features & Use Cases

Two-stage tiling to prevent UB overflow and improve cache efficiency.
Compute reorganization to minimize reduction steps and maximize throughput for 1D data with millions of elements.
Autotune configurations to approach optimal grid settings (e.g., grid=32) and minimize latency on Ascend AI cores.
Use Case: Efficiently compute amin over 4-million-element (4194304-element) vectors in downstream AI workloads.
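A common way to cut reduction steps is to defer the cross-lane reduction. This is offered as an assumption about what "compute reorganization" refers to, not as a description of the skill's code. Each program keeps a `block`-wide vector of running minima and updates it with cheap elementwise `minimum` operations, one tile after another. It then performs a single horizontal reduction at the end. The hypothetical `amin_deferred_reduce` below models this in NumPy and also counts the horizontal reductions it performs.

```python
import numpy as np


def amin_deferred_reduce(x: np.ndarray, grid: int = 32, block: int = 8192):
    """NumPy model of amin with a deferred cross-lane reduction (hypothetical).

    Assumes a floating-point `x`, because the accumulator is initialised to +inf.
    Returns (minimum, number_of_horizontal_reductions).
    """
    n = x.size
    chunk = -(-n // grid)  # ceil(n / grid): elements owned by each program
    partials = []
    horizontal_reductions = 0
    for pid in range(grid):
        start, stop = pid * chunk, min((pid + 1) * chunk, n)
        if start >= stop:
            continue  # this program has no data
        # One block-wide accumulator: lane i holds the min of every element
        # that has landed in lane i across all tiles seen so far.
        acc = np.full(block, np.inf, dtype=x.dtype)
        for off in range(start, stop, block):
            tile = x[off:min(off + block, stop)]
            # Elementwise (vertical) min: no reduction across lanes here.
            np.minimum(acc[:tile.size], tile, out=acc[:tile.size])
        partials.append(acc.min())  # the single horizontal reduction per program
        horizontal_reductions += 1
    return min(partials), horizontal_reductions


x = np.random.default_rng(0).standard_normal(4194304).astype(np.float32)
value, steps = amin_deferred_reduce(x)
print(value == x.min(), steps)
```

With `grid=32` and `block=8192`, each program sweeps 16 tiles. A 4194304-element input therefore needs only 32 horizontal reductions in total, one per program. Reducing every tile to a scalar would need 512.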

Quick Start

Run the autotuned Triton kernel for amin reduction on a 4194304-element 1D array using the provided configs.
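Before autotuning, the candidate space can be pruned to tiles that fit in the Unified Buffer. The sketch below is hypothetical throughout. The UB capacity (`UB_BYTES`), the three-buffer budget (a double-buffered input tile plus one accumulator), and the candidate `grid` and `block` values are illustrative assumptions, not the skill's provided configs. Check the UB size of your Ascend part before relying on any of these numbers.

```python
import itertools

N = 4194304            # input length used in the Quick Start
DTYPE_BYTES = 4        # assumption: float32 input
UB_BYTES = 192 * 1024  # assumption: per-core UB capacity; verify for your hardware


def ub_bytes_needed(block: int, buffers: int = 3) -> int:
    """Hypothetical UB footprint: 2 input tiles (double buffering) + 1 accumulator."""
    return block * DTYPE_BYTES * buffers


def candidate_configs(grids=(8, 16, 32, 48), blocks=(2048, 4096, 8192, 16384, 32768)):
    """Yield (grid, block, tiles_per_program) for configs whose tiles fit in UB."""
    for grid, block in itertools.product(grids, blocks):
        if ub_bytes_needed(block) > UB_BYTES:
            continue  # the tile budget would overflow the Unified Buffer
        chunk = -(-N // grid)       # elements per program, rounded up
        tiles = -(-chunk // block)  # tiles each program must sweep
        yield grid, block, tiles


for cfg in candidate_configs():
    print(cfg)
```

Each surviving pair could then become one autotuning candidate. In Triton, that typically means a `triton.Config` entry for `@triton.autotune`, although the source does not specify how the skill passes `grid` to its kernel.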

Please help me install this Skill: Name: triton-ascend-case-reduction-amin-large Download link: https://github.com/xchang1121/AutoResearch-CC-hook/archive/main.zip#triton-ascend-case-reduction-amin-large Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
