gluon-lds-opt
Community
Resolve LDS bank conflicts in Gluon GEMM kernels.
System Documentation
What problem does it solve?
This skill fixes LDS (Local Data Share) bank conflicts in a Gluon GEMM kernel that loads tiles into shared memory via async_copy or buffer_load.
Symptoms:
- a high SQ_LDS_BANK_CONFLICT hardware counter
- long s_waitcnt lgkmcnt(0) stalls before MFMA instructions in ATT traces
- ds_read instructions on the critical path in the amdgcn ISA
Two strategies:
- Swizzling: change the SwizzledSharedLayout parameters from the trivial (1,1,1) to the bank-conflict-free (8,1,8).
- Padding: use PaddedSharedLayout, with DistributedLinearLayout for the global loads.
Bank conflicts can reduce LDS throughput by 8x to 32x and dominate kernel runtime. The skill applies to both CDNA3 (gfx942) and CDNA4 (gfx950). Use /lds-bank-conflict to measure conflicts before and after the change. The skill triggers on any mention of LDS bank conflicts, ds_read stalls, lgkmcnt stalls, or SwizzledSharedLayout in a Gluon kernel.
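To see why swizzling helps, here is a minimal toy model in plain Python. It is not Gluon code and not a hardware simulator. It assumes 32 banks of one 4-byte word each, a tile whose rows are exactly 32 words wide, and 32 lanes that each read the same column of a different row; the names NUM_BANKS, ROW_WORDS, bank, and conflict_degree are made up for this sketch. Real ds_read scheduling on CDNA3/CDNA4 is more involved, so treat the numbers as illustrative only.

```python
from collections import Counter

NUM_BANKS = 32   # assumption: 32 banks, one 4-byte word per bank
ROW_WORDS = 32   # assumption: each tile row is exactly 32 words wide


def bank(row, col, swizzle):
    """Return the bank that element (row, col) lands in."""
    if swizzle:
        # XOR the column with the row index so that consecutive rows
        # place the same logical column in different banks.
        col = col ^ (row % ROW_WORDS)
    return (row * ROW_WORDS + col) % NUM_BANKS


def conflict_degree(swizzle, col=0, lanes=32):
    """Largest number of lanes that hit one bank when lane i reads (i, col)."""
    hits = Counter(bank(lane, col, swizzle) for lane in range(lanes))
    return max(hits.values())


print(conflict_degree(swizzle=False))  # 32: every lane hits the same bank
print(conflict_degree(swizzle=True))   # 1: every lane hits a distinct bank
```

In this model the unswizzled column read serializes 32 ways, while the XOR swizzle spreads the same read across all banks without using any extra LDS space.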
Core Features & Use Cases
- Swizzle-based mitigation: convert trivial layouts to bank-conflict-free SwizzledSharedLayout parameters tuned for CDNA3/4.
- Padding-based mitigation: switch to DistributedLinearLayout + PaddedSharedLayout to explicitly control data placement.
- Profiling guidance: measure SQ_LDS_BANK_CONFLICT with /kernel-perf-analysis and inspect ATT traces for ds_read stalls.
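The padding-based mitigation can be illustrated with the same kind of toy model. This is a sketch under stated assumptions (32 banks of one 4-byte word, a 32-word-wide tile, 32 lanes reading one column), not the PaddedSharedLayout API; column_read_conflicts and lds_words are hypothetical helper names.

```python
from collections import Counter

NUM_BANKS = 32   # assumption: 32 banks, one 4-byte word per bank
TILE_COLS = 32   # assumption: logical tile width in words


def column_read_conflicts(pad_words, col=0, lanes=32):
    """Worst-case lanes per bank when lane i reads element (i, col)."""
    stride = TILE_COLS + pad_words  # padded row stride in words
    hits = Counter((lane * stride + col) % NUM_BANKS for lane in range(lanes))
    return max(hits.values())


def lds_words(rows, pad_words):
    """LDS footprint of the padded tile, in words."""
    return rows * (TILE_COLS + pad_words)


print(column_read_conflicts(pad_words=0))  # 32: all lanes share one bank
print(column_read_conflicts(pad_words=1))  # 1: stride 33 rotates the bank per row
print(lds_words(32, 0), lds_words(32, 1))  # 1024 1056
```

Padding removes the conflict here at the cost of one extra word per row, which is the trade-off against swizzling: explicit control over placement, but a larger LDS footprint.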
Quick Start
Apply bank-conflict-free SwizzledSharedLayout settings (e.g., SwizzledSharedLayout(8, 2, 8, order=[1, 0])) to your Gluon GEMM tiles and profile the LDS performance.
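As a rough guide to what the three numeric parameters mean, the sketch below models them as (vec, per_phase, max_phase) and applies the XOR mapping used by Triton's swizzled shared encoding as I understand it. The function name swizzled_col and this reading of the parameters are assumptions; check them against the Gluon version you are running. It is plain Python and does not call Gluon.

```python
def swizzled_col(row, col, vec, per_phase, max_phase):
    """Map a logical column to its swizzled column within the same row.

    Assumed mapping: columns are grouped into vectors of `vec` elements,
    and the vector index is XORed with a phase derived from the row.
    """
    phase = (row // per_phase) % max_phase
    return ((col // vec) ^ phase) * vec + (col % vec)


# Trivial parameters: the phase is always 0, so nothing moves.
print(swizzled_col(5, 3, vec=1, per_phase=1, max_phase=1))  # 3

# per_phase=2: rows 2 and 3 share phase 1, so column 0 moves to column 8.
print(swizzled_col(2, 0, vec=8, per_phase=2, max_phase=8))  # 8

# per_phase=1: row 2 has phase 2, so column 0 moves to column 16.
print(swizzled_col(2, 0, vec=8, per_phase=1, max_phase=8))  # 16
```

Under this mapping each row of a 64-element-wide tile is only permuted, never overwritten, so the swizzle changes bank assignment without changing the tile's footprint.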
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: let Claude install the skill automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: gluon-lds-opt Download link: https://github.com/leonling-ll/claude-skills/archive/main.zip#gluon-lds-opt Please download this .zip file, extract it, and install it in the .claude/skills/ directory.