gptq

Community

4-bit quantization for large language models on consumer GPUs.

Author: ovachiever
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill explains GPTQ, a post-training 4-bit quantization method that lets very large models (70B+) run on consumer GPUs with reduced memory and competitive accuracy. It also covers integration with PEFT for QLoRA-style fine-tuning.
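To make "4-bit quantization" concrete, here is a minimal sketch of group-wise round-to-nearest quantization in plain Python. This is deliberately the simple baseline, not GPTQ itself: GPTQ additionally adjusts the not-yet-quantized weights to compensate for each rounding error using second-order information. The function names and the example weights are illustrative assumptions.

```python
def quantize_group(weights, bits=4):
    """Round-to-nearest asymmetric quantization of one weight group.

    Baseline only; GPTQ adds error compensation on top of this step.
    """
    qmax = 2 ** bits - 1                      # 15 for 4-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax or 1.0           # avoid zero scale for constant groups
    zero = round(-lo / scale)                 # integer code that represents 0.0
    codes = [min(qmax, max(0, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def dequantize_group(codes, scale, zero):
    """Map integer codes back to approximate floating-point weights."""
    return [(c - zero) * scale for c in codes]


# Hypothetical weight group, chosen only for illustration.
group = [-0.5, 0.1, 0.3, 1.0]
codes, scale, zero = quantize_group(group)
restored = dequantize_group(codes, scale, zero)
max_error = max(abs(a - b) for a, b in zip(group, restored))
print(codes, scale, zero, max_error)
```

Each weight is stored as a 4-bit code, with one scale and one zero point shared by the whole group, which is where the memory saving comes from.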

Core Features & Use Cases

  • Memory reduction: 4-bit quantization cuts weight memory by roughly 4× relative to FP16, with minimal perplexity degradation.
  • Faster inference: Quantized models can run significantly faster than FP16 in many setups; the actual speedup depends on the kernel, batch size, and hardware.
  • Framework integration: Works with Transformers, AutoGPTQ, and PEFT for efficient fine-tuning.
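The memory figure above can be checked with back-of-the-envelope arithmetic. The sketch below estimates weight memory only (no activations or KV cache) and assumes a group size of 128 with a 16-bit scale and a 4-bit zero point per group. Those settings are common for GPTQ checkpoints but are assumptions here, as are the function names.

```python
def weight_memory_gib(n_params, bits_per_weight):
    """Approximate memory for model weights only, in GiB."""
    return n_params * bits_per_weight / 8 / 2 ** 30


def effective_bits(bits=4, group_size=128, scale_bits=16, zero_bits=4):
    """Bits per weight including per-group scale and zero-point overhead.

    The default group size and metadata widths are assumed, not taken
    from any specific checkpoint.
    """
    return bits + (scale_bits + zero_bits) / group_size


n_params = 70e9  # a 70B-parameter model
fp16_gib = weight_memory_gib(n_params, 16)
gptq_gib = weight_memory_gib(n_params, effective_bits())
ratio = fp16_gib / gptq_gib
print(f"FP16: {fp16_gib:.1f} GiB  4-bit: {gptq_gib:.1f} GiB  ratio: {ratio:.2f}x")
```

Under these assumptions a 70B model drops from about 130 GiB to about 34 GiB of weights, slightly under the nominal 4× because of the per-group metadata.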

Quick Start

Load a quantized model with AutoGPTQ, perform a forward pass, and compare performance against FP16.
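A sketch of that workflow follows, assuming `auto-gptq`, `transformers`, and `torch` are installed and a CUDA GPU is available. The helper names (`mean_latency`, `compare_gptq_vs_fp16`) and both model IDs are hypothetical placeholders; it also assumes the quantized and FP16 checkpoints share a tokenizer and that the FP16 model is small enough to fit on the same GPU.

```python
import time


def mean_latency(fn, n_runs=5, warmup=1, sync=None):
    """Average wall-clock seconds per call of fn, after warmup calls."""
    for _ in range(warmup):
        fn()
    if sync:
        sync()
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    if sync:
        sync()  # wait for queued GPU work before stopping the clock
    return (time.perf_counter() - start) / n_runs


def compare_gptq_vs_fp16(gptq_id, fp16_id, prompt="Hello", device="cuda:0"):
    """Time one forward pass for a GPTQ checkpoint and an FP16 baseline."""
    # Heavy imports stay local so this file loads without a GPU stack.
    import torch
    from auto_gptq import AutoGPTQForCausalLM
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(gptq_id)
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    def forward(model):
        with torch.no_grad():
            return model(**inputs).logits

    results = {}

    # Load the already-quantized weights and time a forward pass.
    model = AutoGPTQForCausalLM.from_quantized(gptq_id, device=device)
    results["gptq"] = mean_latency(lambda: forward(model), sync=torch.cuda.synchronize)
    del model
    torch.cuda.empty_cache()

    # Repeat with the FP16 baseline.
    model = AutoModelForCausalLM.from_pretrained(fp16_id, torch_dtype=torch.float16).to(device)
    results["fp16"] = mean_latency(lambda: forward(model), sync=torch.cuda.synchronize)
    return results


# Example call (placeholder IDs, replace with real checkpoints):
# print(compare_gptq_vs_fp16("<quantized-model-id>", "<fp16-model-id>"))
```

Synchronizing before and after the timed loop matters because CUDA kernels launch asynchronously; without it the measurement would mostly capture launch overhead.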

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: gptq
Download link: https://github.com/ovachiever/droid-tings/archive/main.zip#gptq

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.