Name: Write a Triton Dequant Kernel (int4 / int8 → fp16 / bf16)
Author: tensormux

System Documentation

What problem does it solve?

This skill is a structured guide to implementing a Triton dequantization kernel that unpacks quantized int4/int8 weights and converts them to fp16 or bf16 tensors for downstream operations. A standalone dequant kernel gives you a debugging baseline and lets you hot-swap quantized weights without relying on a fused dequant-GEMM.

Core Features & Use Cases

Bit-unpacking and codebook lookup for multiple packing schemes (AWQ, GPTQ, NF4), with per-group scales and zero points.
Safe arithmetic: intermediates are computed in fp32 to prevent overflow before the final cast to fp16/bf16.
Validation: round-trip checks against a reference Python dequant implementation.
Use Case: debug a quantized model by exporting dequantized weights for inspection, or compare the output against a fused dequant-GEMM path.
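
As a point of comparison for the kernel, the unpack-and-scale step can be sketched as a CPU reference in plain Python. This is a minimal sketch and not the skill's own code: the function names are hypothetical, the low-nibble-first packing order is an assumption (real AWQ and GPTQ layouts order their nibbles differently), and Python floats stand in for the fp32 intermediates.

```python
def unpack_int4(packed):
    """Split each byte into two unsigned 4-bit codes (assumed order: low nibble first)."""
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)         # low nibble -> even-indexed element
        codes.append((byte >> 4) & 0x0F)  # high nibble -> odd-indexed element
    return codes


def dequant_int4(packed, scales, zeros, group_size):
    """Compute w = (q - zero) * scale, with one scale/zero per group of `group_size` codes."""
    out = []
    for i, q in enumerate(unpack_int4(packed)):
        g = i // group_size               # index of the group this element belongs to
        out.append((q - zeros[g]) * scales[g])
    return out


# Two bytes hold four 4-bit codes: 1, 2, 3, 4 (hypothetical example data).
packed = bytes([0x21, 0x43])
weights = dequant_int4(packed, scales=[0.5, 2.0], zeros=[1, 3], group_size=2)
```

A Triton kernel would do the same work per tile with shifts, masks, and a broadcast of the per-group scale and zero, then cast the result to fp16 or bf16.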

Quick Start

Provide an input weight tile and run the launcher to verify correct dequantization against a reference.
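
The launcher's interface is not shown in this listing, so the following only illustrates the kind of round-trip check a verification step might perform. It is a sketch under stated assumptions: asymmetric min/max quantization with a floating-point zero point, one group per tile, and hypothetical helper names. For that scheme the reconstruction error of each element should not exceed half the scale.

```python
def quantize_group(values, n_bits=4):
    """Asymmetric min/max quantization of one group (assumed scheme).

    Returns (codes, scale, zero) such that w ~= (q - zero) * scale.
    """
    qmax = (1 << n_bits) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax
    if scale == 0.0:
        scale = 1.0                       # constant group: any non-zero scale works
    zero = -lo / scale                    # floating-point zero point
    codes = [min(qmax, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, zero


def dequant_group(codes, scale, zero):
    """Reference dequantization: w = (q - zero) * scale."""
    return [(q - zero) * scale for q in codes]


# Hypothetical weight tile used only for this check.
tile = [-1.0, -0.2, 0.3, 0.5]
codes, scale, zero = quantize_group(tile)
restored = dequant_group(codes, scale, zero)
max_err = max(abs(a - b) for a, b in zip(tile, restored))
round_trip_ok = max_err <= scale / 2 + 1e-9   # small slack for float rounding
```

When checking a GPU kernel against such a reference, the tolerance would also need to account for the fp16 or bf16 rounding of the output.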

Please help me install this Skill:
Name: Write a Triton Dequant Kernel (int4 / int8 → fp16 / bf16)
Download link: https://github.com/tensormux/kernel-skills/archive/main.zip#write-a-triton-dequant-kernel-int4-int8-fp16-bf16
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
