triton-ascend-case-elemwise-concat
Community
Fuse slice+concat on Ascend to save memory.
Author: xchang1121
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
The Slice+Concat fusion case optimizes multi-input fusion kernels by loading only the required input slices and assembling the outputs through precise indexing, eliminating unnecessary intermediate storage and memory traffic.
Core Features & Use Cases
- Precise slicing: loads only the needed portions of each input (e.g., slices like 128, 32, 48) to form the final output without materializing full tensors.
- In-kernel concatenation: writes each input slice at a calculated output offset so the slices land contiguously in the output, avoiding a separate cat-style operation and extra buffers.
- Use Case: accelerates fusion kernels where several inputs must be sliced and concatenated within a single kernel, reducing bandwidth and latency.
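The indexing described above can be sketched in plain Python. This is a host-side reference under stated assumptions, not the skill's Triton kernel: it assumes the numbers 128, 32, and 48 are slice widths taken from the start of each input row, and the names `output_offsets`, `fused_slice_concat`, and `reference_slice_concat` are hypothetical helpers introduced only for illustration.

```python
from itertools import accumulate


def output_offsets(widths):
    """Exclusive prefix sum: the output column where each slice starts."""
    return [0] + list(accumulate(widths))[:-1]


def fused_slice_concat(inputs, widths):
    """Copy the first widths[i] elements of each row of inputs[i] straight
    into the output at a precomputed offset, with no intermediate slices."""
    offsets = output_offsets(widths)
    total = sum(widths)
    n_rows = len(inputs[0])
    out = [[0.0] * total for _ in range(n_rows)]
    for row in range(n_rows):
        for src, width, off in zip(inputs, widths, offsets):
            for col in range(width):
                # Read only the needed element; write it at its final position.
                out[row][off + col] = src[row][col]
    return out


def reference_slice_concat(inputs, widths):
    """Naive baseline: materialize every slice, then concatenate them."""
    n_rows = len(inputs[0])
    return [
        sum((src[row][:w] for src, w in zip(inputs, widths)), [])
        for row in range(n_rows)
    ]


# Assumed slice widths (from the example numbers above) and made-up input shapes.
widths = [128, 32, 48]
full_widths = [256, 64, 64]  # hypothetical full row lengths of the inputs
inputs = [
    [[float(i * 1000 + r * 100 + c) for c in range(fw)] for r in range(2)]
    for i, fw in enumerate(full_widths)
]
out = fused_slice_concat(inputs, widths)
```

In a real kernel the two inner loops would typically become masked block loads and stores, but the offset arithmetic stays the same: each input's destination is the running sum of the widths before it.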
Quick Start
Run the provided fused kernel with the specified input slices to validate the elementwise concat optimization.
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: triton-ascend-case-elemwise-concat Download link: https://github.com/xchang1121/AutoResearch-CC-hook/archive/main.zip#triton-ascend-case-elemwise-concat Please download this .zip file, extract it, and install it in the .claude/skills/ directory.