gptq
Community
4-bit quantization for large language models on consumer GPUs.
Author: ovachiever
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill explains GPTQ, a post-training 4-bit quantization method that makes it possible to deploy very large models (70B+ parameters) on consumer GPUs with reduced memory use and competitive accuracy. It also covers integration with PEFT for QLoRA-style fine-tuning of the quantized model.
Core Features & Use Cases
- Memory reduction: 4-bit quantization cuts weight memory by roughly 4× relative to FP16, with minimal perplexity degradation.
- Faster inference: Quantized models can run significantly faster than FP16 in many setups, since less weight data is read from GPU memory per token.
- Framework integration: Works with Transformers, AutoGPTQ, and PEFT for efficient fine-tuning.
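The ~4× figure follows from storing each weight in 4 bits instead of 16. The sketch below is a back-of-the-envelope estimate only: the 70B parameter count is illustrative, the group size of 128 with one FP16 scale and one 4-bit zero point per group is an assumed configuration, and activations and the KV cache are ignored.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


n_params = 70e9  # illustrative 70B-parameter model (assumption)

fp16_gib = weight_memory_gib(n_params, 16)
int4_gib = weight_memory_gib(n_params, 4)

# Assumed overhead: one FP16 scale plus one 4-bit zero point
# shared by every group of 128 weights.
group_size = 128
overhead_bits = (16 + 4) / group_size
int4_grouped_gib = weight_memory_gib(n_params, 4 + overhead_bits)

print(f"FP16 weights:            {fp16_gib:.1f} GiB")
print(f"4-bit weights (ideal):   {int4_gib:.1f} GiB")
print(f"4-bit weights (grouped): {int4_grouped_gib:.1f} GiB")
print(f"Ideal savings factor:    {fp16_gib / int4_gib:.2f}x")
```

With per-group metadata included, the effective savings land slightly under 4×, which is why the figure is best read as approximate.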
Quick Start
Load a quantized model with AutoGPTQ, perform a forward pass, and compare its performance against an FP16 baseline.
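A minimal sketch of those three steps might look like the following. It assumes the `auto-gptq`, `transformers`, and `torch` packages are installed and a CUDA GPU is available. The `quick_start` and `mean_latency` helpers are hypothetical names introduced here for illustration, and the model IDs are left as parameters because the listing does not name a checkpoint. It also assumes both checkpoints share a tokenizer and that the FP16 baseline is small enough to fit on the same GPU.

```python
import time
from typing import Callable


def mean_latency(fn: Callable[[], object], repeats: int = 5) -> float:
    """Average wall-clock seconds per call of fn, after one warm-up call."""
    fn()  # warm-up call, excluded from the timing
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def quick_start(gptq_id: str, fp16_id: str, device: str = "cuda:0") -> dict:
    """Time one forward pass for a GPTQ checkpoint and an FP16 baseline."""
    # Heavy imports stay local so mean_latency works without a GPU stack.
    import torch
    from auto_gptq import AutoGPTQForCausalLM
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: both checkpoints use the same tokenizer.
    tokenizer = AutoTokenizer.from_pretrained(gptq_id)
    inputs = tokenizer("GPTQ quantizes weights to 4 bits.", return_tensors="pt").to(device)

    def forward_with(model):
        def run():
            with torch.no_grad():
                model(**inputs)
            torch.cuda.synchronize()  # wait for GPU work before the timer stops
        return run

    results = {}

    # Step 1 and 2: load the quantized model and run a forward pass.
    quantized = AutoGPTQForCausalLM.from_quantized(
        gptq_id, device=device, use_safetensors=True
    )
    results["gptq_s"] = mean_latency(forward_with(quantized))
    del quantized
    torch.cuda.empty_cache()

    # Step 3: repeat with an FP16 baseline for comparison.
    baseline = AutoModelForCausalLM.from_pretrained(
        fp16_id, torch_dtype=torch.float16
    ).to(device)
    results["fp16_s"] = mean_latency(forward_with(baseline))
    return results
```

Calling `quick_start("<gptq-checkpoint>", "<fp16-checkpoint>")` with two real checkpoint IDs returns the mean per-call latency for each model. A single short prompt gives only a rough signal, so longer prompts and generation benchmarks are worth adding before drawing conclusions.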
Dependency Matrix
Required Modules
None required
Components
references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: gptq
Download link: https://github.com/ovachiever/droid-tings/archive/main.zip#gptq
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.