triton-ascend-case-reduction-mean-large

Community

Large-scale mean reduction with Triton on Ascend.

Author: xchang1121
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

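This skill speeds up large-scale mean reduction in Triton on the Ascend backend through row-level secondary tiling. It reduces the number of thread blocks (programs) the kernel launches, and it further splits the rows inside the kernel so that each on-chip tile does not overflow the UB (Unified Buffer, the on-chip buffer of the Ascend AI Core). Together, these changes improve reduction performance.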
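The launch arithmetic behind this idea can be sketched as follows. This is a hypothetical illustration, not the skill's code: `tiling_plan`, `BLOCK_SIZE_N` (a chunk size along the reduce axis), `elem_bytes`, and `ub_budget_bytes` are assumptions, and it assumes each of the `grid` programs walks over row blocks in a grid-stride loop. No real UB capacity is assumed; pass your device's figure as `ub_budget_bytes`.

```python
def cdiv(a: int, b: int) -> int:
    # Ceiling division (the role triton.cdiv plays in host code).
    return (a + b - 1) // b


def tiling_plan(M, N, block_size_m, sub_block_size_m,
                grid=40, block_size_n=1024, elem_bytes=4, ub_budget_bytes=None):
    """Hypothetical tiling arithmetic for a row-mean over an (M, N) input.

    Assumes a grid-stride loop over row blocks and a column chunk size
    block_size_n along the reduce axis; neither is confirmed by the source.
    """
    row_blocks = cdiv(M, block_size_m)                # blocks of BLOCK_SIZE_M rows
    blocks_per_program = cdiv(row_blocks, grid)       # grid-stride trips per program
    sub_iters = cdiv(block_size_m, sub_block_size_m)  # secondary tiling inside a block
    n_iters = cdiv(N, block_size_n)                   # chunks along the long reduce axis
    # One on-chip input tile holds SUB_BLOCK_SIZE_M x BLOCK_SIZE_N elements.
    tile_bytes = sub_block_size_m * block_size_n * elem_bytes
    return {
        "naive_programs": M,      # one program per row, for comparison
        "programs": grid,
        "row_blocks": row_blocks,
        "blocks_per_program": blocks_per_program,
        "sub_iters": sub_iters,
        "n_iters": n_iters,
        "tile_bytes": tile_bytes,
        # Hypothetical budget check; None means "not checked".
        "fits_ub": ub_budget_bytes is None or tile_bytes <= ub_budget_bytes,
    }


plan = tiling_plan(4096, 32768, block_size_m=64, sub_block_size_m=8)
print(plan)
```

Note that `tile_bytes` depends only on `SUB_BLOCK_SIZE_M` and the column chunk, not on `BLOCK_SIZE_M`. In this sketch, that decoupling is what lets one program cover many rows without growing its on-chip footprint.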

Core Features & Use Cases

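  • Row-level secondary tiling: each thread block computes across multiple rows, which reduces the number of thread blocks. Inside the kernel, the rows are split a second time to improve cache hit rate and bandwidth utilization.
  • Adaptive grid and autotuning: with grid=40, different BLOCK_SIZE_M / SUB_BLOCK_SIZE_M combinations are explored to find the best-performing configuration.
  • Use Case: 2D reductions where the non-reduce axis is moderate in size and the reduce axis is large can see significant performance gains.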
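To make the two tiling levels concrete, here is a NumPy emulation of a row-wise mean that follows the scheme described in the list above. It is a sketch under stated assumptions, not the skill's Triton kernel: the function name `mean_rows_two_level`, the grid-stride loop over row blocks, the column chunk `block_size_n`, and the float32 accumulator are all assumptions. Each outer iteration stands in for one of the `grid` programs.

```python
import numpy as np


def cdiv(a, b):
    return (a + b - 1) // b


def mean_rows_two_level(x, block_size_m, sub_block_size_m, grid=40, block_size_n=1024):
    """Hypothetical emulation of a row mean with row-level secondary tiling."""
    M, N = x.shape
    out = np.empty(M, dtype=np.float32)
    row_blocks = cdiv(M, block_size_m)
    for pid in range(grid):  # emulate each of the `grid` programs sequentially
        # Assumed grid-stride loop: program pid handles blocks pid, pid+grid, ...
        for blk in range(pid, row_blocks, grid):
            row_start = blk * block_size_m
            row_end = min(row_start + block_size_m, M)  # like a mask on the last block
            # Secondary tiling: SUB_BLOCK_SIZE_M rows at a time inside the block.
            for r0 in range(row_start, row_end, sub_block_size_m):
                r1 = min(r0 + sub_block_size_m, row_end)
                acc = np.zeros(r1 - r0, dtype=np.float32)  # fp32 accumulator
                # Walk the long reduce axis in chunks so each tile stays small.
                for c0 in range(0, N, block_size_n):
                    acc += x[r0:r1, c0:c0 + block_size_n].astype(np.float32).sum(axis=1)
                out[r0:r1] = acc / N
    return out


x = np.random.default_rng(0).standard_normal((1000, 3000)).astype(np.float32)
print(np.allclose(mean_rows_two_level(x, 64, 16), x.mean(axis=1), atol=1e-5))
```

Sweeping `(block_size_m, sub_block_size_m)` pairs through this function only checks that each candidate is correct. Choosing the fastest pair, as the autotuning bullet describes, requires timing the real kernel on the device.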

Quick Start

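To start baseline benchmarking, replace the reduction kernel you want to optimize with the reference implementation provided in this skill.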
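A baseline run usually checks correctness before timing. The harness below is a minimal, hypothetical sketch: `baseline_check` is not part of the skill, and it compares any row-mean callable against a NumPy reference. Its host-side `time.perf_counter` timer only illustrates the structure. Device kernels need a timer that synchronizes with the device before reading the clock.

```python
import time

import numpy as np


def baseline_check(candidate, x, repeats=5, rtol=1e-5, atol=1e-6):
    """Hypothetical harness: verify a row-mean callable, then time it on the host."""
    ref = x.astype(np.float32).mean(axis=1)  # reference result
    out = np.asarray(candidate(x))
    # Check the shape first so allclose never broadcasts mismatched outputs.
    ok = out.shape == ref.shape and bool(np.allclose(out, ref, rtol=rtol, atol=atol))
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        candidate(x)
        times.append(time.perf_counter() - t0)
    median = sorted(times)[len(times) // 2]  # median wall time in seconds
    return ok, median


x = np.random.default_rng(1).standard_normal((256, 4096)).astype(np.float32)
print(baseline_check(lambda a: a.mean(axis=1), x))
```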

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: triton-ascend-case-reduction-mean-large
Download link: https://github.com/xchang1121/AutoResearch-CC-hook/archive/main.zip#triton-ascend-case-reduction-mean-large

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
