triton-ascend-case-elemwise-concat

Community

Fuse slice+concat on Ascend to save memory.

Author: xchang1121
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

The Slice+Concat fusion case optimizes multi-input fusion kernels by loading only the required input slices and assembling the outputs through precise indexing, eliminating unnecessary intermediate storage and memory traffic.

Core Features & Use Cases

  • Precise slicing: loads only the needed portion of each input (e.g., slice widths such as 128, 32, and 48) to form the final output, without materializing the full tensors.
  • In-kernel concatenation: uses precomputed output offsets to write each input slice contiguously into the output, avoiding a separate cat-style operation and its extra buffers.
  • Use Case: accelerates fusion kernels in which several inputs must be sliced and concatenated within a single kernel, reducing memory bandwidth use and latency.
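The offset arithmetic behind the two features above can be sketched in plain Python. This is only an illustration, not the skill's actual Triton kernel: the function names are hypothetical, the widths 128, 32, and 48 are taken from the example above, and each slice is assumed to start at column 0 of its input row.

```python
from itertools import accumulate

# Slice widths from the example in the text (assumed to be element counts).
SLICE_WIDTHS = [128, 32, 48]


def output_offsets(widths):
    """Exclusive prefix sum: the output column where each slice begins."""
    return [0] + list(accumulate(widths))[:-1]


def fused_slice_concat_row(inputs, widths):
    """Copy the first `w` elements of each input row straight into one output row.

    Hypothetical helper: no per-input slice buffer is created; each element
    is read once from its source and written once at its final position.
    """
    offsets = output_offsets(widths)
    out = [0.0] * sum(widths)
    for row, width, offset in zip(inputs, widths, offsets):
        for i in range(width):        # read only the needed slice
            out[offset + i] = row[i]  # write at the precomputed offset
    return out


# Three input rows of 256 elements each; only 128 + 32 + 48 = 208 are read.
rows = [[float(k * 1000 + i) for i in range(256)] for k in range(3)]
result = fused_slice_concat_row(rows, SLICE_WIDTHS)
```

In a real kernel the inner loop would presumably become a masked block load and store, but the offsets would be computed the same way.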

Quick Start

Run the provided fused kernel with the specified input slices to validate the elementwise concat optimization.
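One way to approach such a validation, sketched here in plain Python under stated assumptions, is to compare a fused per-element mapping against a naive slice-then-concatenate reference. Both functions are hypothetical stand-ins rather than the provided kernel, and the input length of 256 is chosen only for illustration.

```python
def reference_slice_concat(inputs, widths):
    """Naive baseline: materialize each slice, then concatenate them."""
    out = []
    for row, width in zip(inputs, widths):
        out.extend(row[:width])
    return out


def fused_slice_concat(inputs, widths):
    """Fused emulation: map each output column back to its source element."""
    total = sum(widths)
    out = [None] * total
    for col in range(total):
        start = 0
        for row, width in zip(inputs, widths):
            if col < start + width:          # this input owns the column
                out[col] = row[col - start]  # index into the source slice
                break
            start += width
    return out


def validate(widths=(128, 32, 48), input_len=256):
    """Return True when the fused result matches the reference exactly."""
    inputs = [[float(k * 1000 + i) for i in range(input_len)]
              for k in range(len(widths))]
    return fused_slice_concat(inputs, widths) == reference_slice_concat(inputs, widths)


ok = validate()
```

Because the operation only moves data, an exact equality check is reasonable here; no floating-point tolerance should be needed.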

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: triton-ascend-case-elemwise-concat
Download link: https://github.com/xchang1121/AutoResearch-CC-hook/archive/main.zip#triton-ascend-case-elemwise-concat

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
