llama-cpp-inference
CommunityRun GGUF models locally with llama.cpp
Software Engineering#inference#gguf#llama.cpp#gpu-offload#continuous-batching#llama-server#llama-cpp-python
Authorjayll1303
Version1.0.0
Installs0
System Documentation
What problem does it solve?
This Skill removes the complexity of running GGUF-format large language models locally by providing clear patterns for launching servers, using CLIs, embedding Python bindings, configuring GPU offload, and tuning multi-user batching so models run reliably and efficiently on desktop and server hardware.
Core Features & Use Cases
- OpenAI-compatible serving: Launch llama-server with health checks, embeddings, chat/completion endpoints and slot monitoring for multi-user deployments.
- CLI and programmatic inference: Use llama-cli for interactive/batch runs and llama-cpp-python bindings to embed GGUF models inside Python applications.
- GPU and backend configuration: Build and run with CUDA, Metal, or Vulkan backends, set n_gpu_layers, flash-attn, and tune context and parallel slots for throughput or low-memory setups.
- Performance tuning & troubleshooting: Guidance for context sizing, continuous batching, speculative decoding, OOM recovery, CUDA detection, and model format validation.
- Use Case: Deploy a local OpenAI-compatible GGUF server for development, perform low-latency interactive chat on a laptop with Metal/CUDA, or run multi-user inference with continuous batching in a small team.
Quick Start
Launch a local GGUF model server using llama-server with the model path, appropriate context size, and -ngl set to control GPU offload.
Dependency Matrix
Required Modules
None requiredComponents
references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: llama-cpp-inference Download link: https://github.com/jayll1303/AIEKit/archive/main.zip#llama-cpp-inference Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
Agent Skills Search Helper
Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.