Name: vllm-quantization
Availability: InStock
Author: air-gapped

System Documentation

What problem does it solve?

This skill enables efficient quantization workflows for vLLM deployments across Hopper, Blackwell, and ROCm fleets, reducing memory and compute with validated formats like FP8, NVFP4, and MXFP4, plus integration with llm-compressor and NVIDIA ModelOpt.
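To make the memory savings concrete, here is a rough sketch of weight-storage cost per format. The bits-per-parameter figures assume NVFP4 stores 4-bit E2M1 values with one FP8 (E4M3) scale per 16-element block, and MXFP4 stores 4-bit E2M1 values with one 8-bit E8M0 scale per 32-element block. The sketch ignores per-tensor scales, per-channel FP8 scales, and layers typically left unquantized (embeddings, lm_head). `weight_gb` and the 70B parameter count are illustrative only.

```python
# Approximate storage cost per weight, in bits (assumed block layouts, see lead-in).
# Per-tensor/per-channel scales and unquantized layers are ignored in this sketch.
BITS_PER_PARAM = {
    "BF16": 16.0,
    "FP8": 8.0,
    "NVFP4": 4.0 + 8.0 / 16,  # 4-bit values + one 8-bit scale per 16 weights = 4.5 bits
    "MXFP4": 4.0 + 8.0 / 32,  # 4-bit values + one 8-bit scale per 32 weights = 4.25 bits
}


def weight_gb(num_params: float, fmt: str) -> float:
    """Approximate weight memory in GB (1e9 bytes) for the given format."""
    return num_params * BITS_PER_PARAM[fmt] / 8 / 1e9


if __name__ == "__main__":
    n = 70e9  # hypothetical 70B-parameter model
    for fmt in BITS_PER_PARAM:
        print(f"{fmt:>6}: {weight_gb(n, fmt):6.1f} GB")
```

Under these assumptions a 70B model drops from about 140 GB of weights in BF16 to about 70 GB in FP8 and under 40 GB in either 4-bit format, before accounting for KV cache and activations.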

Core Features & Use Cases

Supports multiple production quantization paths (FP8, NVFP4, MXFP4, and online quantization) and provides guidance for model export pipelines.
Provides end-to-end coverage from PTQ to deployment, including KV cache options and MoE support across vendors.
Use cases include preparing 70B-class models for datacenter inference, upgrading existing Qwen3/Qi models, and enabling offline model sharing with vendor formats.
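KV cache precision matters as much as weight precision at long context. The sketch below estimates KV cache size per token as 2 (K and V) x layers x KV heads x head dimension x bytes per element. The model shape (80 layers, 8 KV heads, head dimension 128) is an assumed Llama-3-70B-style configuration; check the actual model's config.json. In vLLM, an FP8 KV cache is selected with the `kv_cache_dtype="fp8"` engine argument; scale storage is ignored here.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_elem: int) -> int:
    """Bytes of KV cache per token: one K and one V vector per layer per KV head."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


if __name__ == "__main__":
    # Assumed Llama-3-70B-style shape (hypothetical; read it from config.json).
    layers, kv_heads, head_dim = 80, 8, 128
    context = 32768  # tokens held in cache for one sequence
    for name, nbytes in (("BF16", 2), ("FP8", 1)):
        per_tok = kv_cache_bytes_per_token(layers, kv_heads, head_dim, nbytes)
        print(f"{name}: {per_tok / 1024:.0f} KiB/token, "
              f"{per_tok * context / 2**30:.1f} GiB per {context}-token sequence")
```

With this shape, halving the element size halves the cache: about 10 GiB versus 5 GiB for a single 32k-token sequence.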

Quick Start

Quantize a 70B-class model for Hopper GPUs using the FP8_DYNAMIC scheme in llm-compressor, then load the resulting checkpoint in vLLM.
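FP8_DYNAMIC, as commonly used with llm-compressor (for example via `QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])`), stores weights in FP8 with static scales and computes activation scales per token at inference time, so no calibration data is needed. The pure-Python sketch below illustrates only the dynamic per-token scaling step: each token's scale is its absolute maximum divided by 448, the largest finite FP8 E4M3 value. Rounding onto the actual E4M3 grid is deliberately omitted, and the function names are hypothetical.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def per_token_scales(activations):
    """One scale per row (token), computed from that row's absolute maximum."""
    scales = []
    for row in activations:
        amax = max(abs(x) for x in row)
        # Guard against an all-zero row so later division is safe.
        scales.append(amax / FP8_E4M3_MAX if amax > 0 else 1.0)
    return scales


def scale_and_clamp(activations, scales):
    """Divide each row by its scale and clamp into the FP8 range.
    A real kernel would also round each value to the nearest E4M3 number."""
    return [
        [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x / s)) for x in row]
        for row, s in zip(activations, scales)
    ]


if __name__ == "__main__":
    acts = [[0.5, -2.0, 1.0], [10.0, 3.0, -4.0]]  # two tokens, hidden size 3
    scales = per_token_scales(acts)
    print(scales)
    print(scale_and_clamp(acts, scales))
```

Because the scale is recomputed for every token, an outlier in one token does not shrink the usable range of the others, which is the main reason per-token dynamic scaling is preferred over a single static activation scale when no calibration set is available.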

Please help me install this Skill: Name: vllm-quantization Download link: https://github.com/air-gapped/skills/archive/main.zip#vllm-quantization Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
