Write a Triton Dequant Kernel (int4 / int8 → fp16 / bf16)
Official
Dequantize int4/int8 weights in Triton safely.
Author: tensormux
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This skill provides a structured guide to implementing a Triton dequantization kernel that unpacks quantized int4/int8 weights and converts them to fp16 or bf16 tensors for downstream operations. A standalone dequant kernel serves as a debugging baseline and makes it possible to hot-swap quantized weights without relying on fused dequant-GEMM kernels.
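As a reference point, the arithmetic a dequant kernel has to reproduce can be written in plain Python. The sketch below assumes GPTQ-style sequential packing (eight unsigned 4-bit codes per 32-bit word, lowest nibble first), signed int8 stored one weight per byte, and the affine formula `w = (q - zero) * scale` with one scale and zero point per group. The function names are hypothetical and not part of the skill.

```python
import struct


def unpack_int4(words):
    """Unpack 32-bit words into unsigned 4-bit codes, lowest nibble first.

    Assumes sequential (GPTQ-style) packing: element i of a word occupies
    bits [4*i, 4*i + 4). Other layouts (e.g. AWQ) may reorder the nibbles.
    """
    codes = []
    for word in words:
        for i in range(8):
            codes.append((word >> (4 * i)) & 0xF)
    return codes


def unpack_int8(data: bytes):
    """Interpret raw bytes as signed int8 codes (one weight per byte)."""
    return list(struct.unpack(f"{len(data)}b", data))


def dequant_groups(codes, scales, zeros, group_size):
    """Affine dequant w = (q - zero) * scale, with one (scale, zero) per group."""
    out = []
    for idx, q in enumerate(codes):
        g = idx // group_size  # index of the group this element belongs to
        out.append((q - zeros[g]) * scales[g])
    return out


if __name__ == "__main__":
    codes = unpack_int4([0x76543210])  # nibbles 0..7, lowest first
    print(dequant_groups(codes, scales=[0.5, 2.0], zeros=[8, 8], group_size=4))
    # Symmetric int8 is the same formula with every zero point set to 0.
    print(dequant_groups(unpack_int8(b"\xff\x7f"), [0.5], [0], 128))
```

In a Triton kernel the same shift-and-mask is typically applied to a whole loaded tile of packed words at once with integer `>>` and `&`. Because only the low four bits survive the mask, sign extension from shifting a signed int32 does not change the extracted code.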
Core Features & Use Cases
- Bit unpacking for multiple packing schemes (AWQ, GPTQ) and codebook lookup for NF4, with per-group scales (and zero points where the scheme uses them).
- Safe arithmetic using fp32 intermediates to prevent overflow, with round-trip checks against reference Python dequantization implementations.
- Use case: debug a quantized model by exporting dequantized weights for inspection, or compare them against a fused dequant-GEMM path.
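The codebook and fp32-intermediate features can be checked on the CPU before any kernel exists. The sketch below uses only the standard library: `struct` round-trips emulate fp32, fp16, and bf16 rounding, and a codebook dequant computes `codebook[q] * absmax[group]` in fp32 before a single cast to the output dtype. The 16-entry `TOY_CODEBOOK` is a hypothetical stand-in, not the real NF4 table, and the per-group absmax scaling is an assumption for illustration.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest IEEE fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_fp16(x: float) -> float:
    """Round to IEEE fp16; struct raises OverflowError outside the fp16 range."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x: float) -> float:
    """Round an fp32 value to bf16 (round-to-nearest-even); NaN is not handled."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add a rounding bias over the 16 dropped bits; ties go to an even result.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dequant_codebook(codes, codebook, absmax, group_size, out_cast=to_fp16):
    """Codebook dequant w = codebook[q] * absmax[group], computed in fp32,
    then rounded once to the output dtype (fp16 by default)."""
    out = []
    for idx, q in enumerate(codes):
        g = idx // group_size
        # The fp64 product of two fp32 values is exact, so one fp32 rounding
        # reproduces an fp32 multiply.
        w32 = to_fp32(to_fp32(codebook[q]) * to_fp32(absmax[g]))
        out.append(out_cast(w32))
    return out


# Hypothetical 16-entry codebook standing in for a real table such as NF4.
TOY_CODEBOOK = [(i - 8) / 8 for i in range(16)]

if __name__ == "__main__":
    print(dequant_codebook([0, 8, 15], TOY_CODEBOOK, [2.0], group_size=64))
    print(dequant_codebook([15], TOY_CODEBOOK, [2.0], 64, out_cast=to_bf16))
```

Keeping the multiply in fp32 and rounding once mirrors a kernel that loads scales as fp32 and casts only the final tile to fp16 or bf16. The `OverflowError` from `to_fp16` also makes an out-of-range result fail loudly in the reference instead of silently becoming infinity.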
Quick Start
Provide an input weight tile and run the launcher to verify correct dequantization against a reference.
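The skill's own launcher and reference are not reproduced here. As a CPU-only stand-in for the same kind of check, the sketch below quantizes a small weight tile per group, packs it into 32-bit words, dequantizes it, and confirms the reconstruction error stays within half a quantization step. The min/max asymmetric quantizer, the lowest-nibble-first packing, and all function names are assumptions for illustration.

```python
def quantize_int4(values, group_size):
    """Asymmetric min/max int4 quantization with one (scale, zero) per group."""
    codes, scales, zeros = [], [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        # Widen the range to include 0 so the zero point stays inside [0, 15].
        lo, hi = min(min(group), 0.0), max(max(group), 0.0)
        scale = (hi - lo) / 15 if hi > lo else 1.0
        zero = round(-lo / scale)
        scales.append(scale)
        zeros.append(zero)
        for v in group:
            q = round(v / scale) + zero
            codes.append(min(max(q, 0), 15))  # clamp to the unsigned 4-bit range
    return codes, scales, zeros


def pack_int4(codes):
    """Pack 4-bit codes into 32-bit words, lowest nibble first (assumed order)."""
    if len(codes) % 8:
        raise ValueError("number of codes must be a multiple of 8")
    words = []
    for i in range(0, len(codes), 8):
        word = 0
        for j, q in enumerate(codes[i:i + 8]):
            word |= (q & 0xF) << (4 * j)
        words.append(word)
    return words


def dequant_int4(words, scales, zeros, group_size):
    """Unpack each word and apply w = (q - zero) * scale for its group."""
    out = []
    for w_idx, word in enumerate(words):
        for j in range(8):
            q = (word >> (4 * j)) & 0xF
            g = (w_idx * 8 + j) // group_size
            out.append((q - zeros[g]) * scales[g])
    return out


if __name__ == "__main__":
    weights = [0.1 * ((i * 7) % 13 - 6) for i in range(16)]
    codes, scales, zeros = quantize_int4(weights, group_size=8)
    recon = dequant_int4(pack_int4(codes), scales, zeros, group_size=8)
    worst = max(abs(w - r) for w, r in zip(weights, recon))
    print(f"max abs error: {worst:.4f}, half step: {max(scales) / 2:.4f}")
```

A Triton kernel's output for the same packed words, scales, and zero points could then be compared element-wise against `dequant_int4` with a tolerance that also accounts for the final fp16 or bf16 rounding.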
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: Write a Triton Dequant Kernel (int4 / int8 → fp16 / bf16) Download link: https://github.com/tensormux/kernel-skills/archive/main.zip#write-a-triton-dequant-kernel-int4-int8-fp16-bf16 Please download this .zip file, extract it, and install it in the .claude/skills/ directory.