Write a Triton Dequant Kernel (int4 / int8 → fp16 / bf16)

Official

Dequantize int4/int8 weights in Triton safely.

Author: tensormux
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This skill is a structured guide to implementing a Triton dequantization kernel that unpacks quantized int4/int8 weights and converts them into fp16 or bf16 tensors for downstream operations. It provides a debugging baseline and allows quantized weights to be hot-swapped without relying on fused GEMMs.

Core Features & Use Cases

  • Bit-unpacking and codebook support for multiple packing schemes (AWQ, GPTQ, NF4) with per-group scales and zeros.
  • Safe arithmetic and validation using fp32 intermediates to prevent overflow and enable round-trip checks against reference Python dequant implementations.
  • Use Case: debug a quantized model by exporting dequantized weights for inspection, or by comparing them against a fused dequant-GEMM path.
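
As an illustration of the unpacking and per-group arithmetic described above, here is a minimal host-side reference sketch in plain Python. The packing layout (even elements in the low nibble, odd elements in the high nibble) and the function names `to_fp16` and `dequant_int4_reference` are assumptions made for this example; real AWQ, GPTQ, and NF4 checkpoints use their own layouts, which should be checked against the format in use.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest fp16 value using struct's 'e' format."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def dequant_int4_reference(packed, scales, zeros, group_size):
    """Unpack two 4-bit codes per byte, then apply per-group scale and zero point.

    Assumed (hypothetical) layout: element 2*i sits in the low nibble of
    byte i, element 2*i + 1 in the high nibble.
    """
    out = []
    for i, byte in enumerate(packed):
        for j, q in enumerate((byte & 0x0F, (byte >> 4) & 0x0F)):
            g = (2 * i + j) // group_size  # group index of this element
            # Python floats are float64; they stand in for the wide
            # intermediate before the final narrowing to fp16.
            w = (q - zeros[g]) * scales[g]
            out.append(to_fp16(w))
    return out

packed = bytes([0x21, 0x43, 0xF8, 0x07])  # codes: 1, 2, 3, 4, 8, 15, 7, 0
scales = [0.5, 0.25]                      # one scale per group of 4
zeros = [8, 8]                            # one zero point per group of 4
print(dequant_int4_reference(packed, scales, zeros, group_size=4))
# → [-3.5, -3.0, -2.5, -2.0, 0.0, 1.75, -0.25, -2.0]
```

A Triton kernel would perform the same mask, shift, subtract, and multiply steps on whole tiles; a slow reference like this one is mainly useful as the ground truth for comparison.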

Quick Start

Pass a packed weight tile, together with its scales and zeros, to the launcher, then verify the dequantized output against a reference implementation.
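
The verification step might look like the following sketch. It assumes symmetric per-group int8 quantization purely for illustration, and `kernel_output` is a stand-in for whatever the Triton launcher returns once copied back to the host; none of these names come from the skill itself.

```python
def quantize_int8(weights, group_size):
    """Symmetric per-group int8 quantization (an assumed scheme for illustration)."""
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        amax = max(abs(w) for w in group)
        scale = amax / 127.0 if amax > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-127, min(127, round(w / scale))) for w in group)
    return codes, scales

def dequant_int8_reference(codes, scales, group_size):
    """Reference dequantization: code times the scale of its group."""
    return [q * scales[i // group_size] for i, q in enumerate(codes)]

def max_abs_error(candidate, reference):
    """Largest elementwise absolute difference between two equal-length lists."""
    return max(abs(a - b) for a, b in zip(candidate, reference))

weights = [0.10, -0.52, 0.33, 0.07, 1.20, -0.90, 0.00, 0.45]
group_size = 4
codes, scales = quantize_int8(weights, group_size)
reference = dequant_int8_reference(codes, scales, group_size)

# Hypothetical placeholder: in practice this would be the kernel's output.
kernel_output = list(reference)

# Check 1: the kernel matches the reference.
assert max_abs_error(kernel_output, reference) == 0.0

# Check 2: the round trip stays within half a quantization step per group.
for i, (w, r) in enumerate(zip(weights, reference)):
    assert abs(w - r) <= scales[i // group_size] / 2 + 1e-12
```

When the kernel narrows to fp16 or bf16, an exact-match check is usually too strict; comparing against a reference rounded to the same dtype, or using a small tolerance, is a more realistic choice.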

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: Write a Triton Dequant Kernel (int4 / int8 → fp16 / bf16)
Download link: https://github.com/tensormux/kernel-skills/archive/main.zip#write-a-triton-dequant-kernel-int4-int8-fp16-bf16

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
