awq-quantization
Official
Compress LLMs for faster, cheaper inference.
Software Engineering
#memory optimization #quantization #4-bit #inference optimization #llm compression #awq
Author: Orchestra-Research
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill addresses the challenge of deploying large language models (LLMs) on hardware with limited memory and compute by applying AWQ (Activation-aware Weight Quantization), a 4-bit weight quantization technique.
Core Features & Use Cases
- 4-bit Quantization: Stores weights in 4 bits instead of 16, cutting weight memory to roughly a quarter of FP16 with minimal accuracy loss.
- Faster Inference: Achieves up to 3x faster inference than FP16 models.
- Use Case: Deploying a 70B-parameter LLM on a single GPU for real-time chat or content generation, where memory and speed are the critical constraints.
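The memory saving behind the single-GPU use case follows from simple arithmetic. The sketch below counts weights only and uses decimal gigabytes; the helper name `weight_memory_gb` is hypothetical. Real AWQ checkpoints are somewhat larger because they also store per-group scales and zero points, and inference additionally needs room for activations and the KV cache.

```python
def weight_memory_gb(n_params: int, bits_per_weight: int) -> float:
    """Approximate weight storage in decimal gigabytes (weights only)."""
    return n_params * bits_per_weight / 8 / 1e9


# Compare FP16 (16 bits per weight) with 4-bit storage for two model sizes.
for name, n_params in [("7B", 7_000_000_000), ("70B", 70_000_000_000)]:
    fp16 = weight_memory_gb(n_params, 16)
    int4 = weight_memory_gb(n_params, 4)
    print(f"{name}: FP16 ~{fp16:.1f} GB, 4-bit ~{int4:.1f} GB")
```

Under these assumptions a 70B model drops from about 140 GB in FP16 to about 35 GB in 4-bit, which is what brings it within reach of a single large-memory GPU.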
Quick Start
Use the awq-quantization skill to quantize the 'mistralai/Mistral-7B-Instruct-v0.2' model to 4-bit precision.
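Behind that prompt, the quantization step might look roughly like the following. This is a sketch built on the AutoAWQ library listed under Required Modules, not necessarily what the Skill's own scripts do. The output directory name, the config values (group size 128, GEMM kernels), and the helper names `quantize_model` and `load_quantized` are assumptions chosen for illustration. Running it needs a CUDA GPU and downloads the model weights plus a calibration dataset.

```python
# Assumed quantization settings; adjust to the target hardware and model.
QUANT_CONFIG = {
    "w_bit": 4,           # 4-bit weights
    "q_group_size": 128,  # number of weights sharing one scale/zero point
    "zero_point": True,   # asymmetric quantization
    "version": "GEMM",    # kernel variant
}


def quantize_model(model_path, quant_path, quant_config=None):
    """Quantize a Hugging Face model with AutoAWQ and save it (hypothetical helper)."""
    # Imported lazily so this file can be loaded without the GPU stack installed.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    config = dict(quant_config or QUANT_CONFIG)
    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    # Calibrates on sample text and rewrites the linear layers in 4-bit form.
    model.quantize(tokenizer, quant_config=config)

    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)


def load_quantized(quant_path):
    """Load a previously quantized model for inference (hypothetical helper)."""
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
    tokenizer = AutoTokenizer.from_pretrained(quant_path)
    return model, tokenizer


# Example call (commented out because it needs a GPU and a large download):
# quantize_model("mistralai/Mistral-7B-Instruct-v0.2", "mistral-7b-instruct-v0.2-awq")
```

The imports sit inside the functions so the configuration can be inspected or edited on a machine without `autoawq` installed; only the actual quantization call requires the full stack.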
Dependency Matrix
Required Modules
autoawq, transformers, torch
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: awq-quantization
Download link: https://github.com/Orchestra-Research/AI-Research-SKILLs/archive/main.zip#awq-quantization
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.