text-embeddings-inference
Community
Serve fast embeddings and rerankers locally.
Author: jayll1303
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Deploy and operate a production-capable inference service for text embeddings, re-ranking, and sequence classification so teams can power semantic search and RAG pipelines without relying on external APIs.
Core Features & Use Cases
- HuggingFace TEI Deployment: Guidance for launching TEI Docker images matched to GPU/CPU architectures and CUDA compute capability.
- Embedding & Re-ranking APIs: Use OpenAI-compatible embedding endpoints, batch requests, and re-ranker endpoints for retrieval quality improvements.
- Performance, Monitoring & Air-gapped Support: Tune batching and concurrency for throughput, export Prometheus metrics and OpenTelemetry traces, and run fully offline from mounted model directories.
- Use Case: Run a local TEI server to produce vectors for a RAG pipeline, tune batch settings for high throughput, and integrate with a vector store for semantic search.
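As a rough illustration of the deployment and tuning features listed above, the following Python sketch picks a TEI image tag from a CUDA compute capability and assembles a `docker run` argument list with batching flags. The helper names (`tei_image`, `docker_run_args`), the tag-prefix table, the version string, and the default values are assumptions made for this example; check them against the TEI README for the release you deploy.

```python
import os
from typing import List, Optional

TEI_IMAGE = "ghcr.io/huggingface/text-embeddings-inference"

# Assumed tag prefixes per CUDA compute capability (None means CPU only).
# Verify against the TEI README before relying on this table.
TAG_PREFIX = {
    None: "cpu-",
    "7.5": "turing-",
    "8.0": "",
    "8.6": "86-",
    "8.9": "89-",
    "9.0": "hopper-",
}


def tei_image(compute_capability: Optional[str], version: str) -> str:
    """Return the image reference for a compute capability and TEI version."""
    if compute_capability not in TAG_PREFIX:
        raise ValueError(f"no known TEI image for compute capability {compute_capability!r}")
    return f"{TEI_IMAGE}:{TAG_PREFIX[compute_capability]}{version}"


def docker_run_args(
    model_id: str,
    compute_capability: Optional[str],
    version: str,
    host_port: int = 8080,
    data_dir: str = "data",
    max_batch_tokens: int = 16384,
    max_concurrent_requests: int = 512,
) -> List[str]:
    """Build a `docker run` argument list; hypothetical helper, nothing is executed."""
    args = [
        "docker", "run", "--rm",
        "-p", f"{host_port}:80",                      # TEI listens on port 80 in the container
        "-v", f"{os.path.abspath(data_dir)}:/data",   # model cache or pre-downloaded models
    ]
    if compute_capability is not None:
        args += ["--gpus", "all"]
    args += [
        tei_image(compute_capability, version),
        "--model-id", model_id,
        "--max-batch-tokens", str(max_batch_tokens),
        "--max-concurrent-requests", str(max_concurrent_requests),
    ]
    return args
```

Passing the returned list to `subprocess.run` would start the container. For an air-gapped host, one option is to set `model_id` to a path under the mounted `/data` directory instead of a Hub model name, assuming the model files were copied there beforehand.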
Quick Start
Deploy and launch a local TEI Docker instance for embeddings and test it with a sample input.
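Once a container is running, the "test it with a sample input" step might look like the minimal client below. It posts to TEI's `/embed` route using only the standard library. The base URL `http://localhost:8080` and the helper names `build_embed_request` and `embed` are assumptions for this sketch, not part of the skill itself.

```python
import json
import urllib.request
from typing import List


def build_embed_request(base_url: str, texts: List[str]) -> urllib.request.Request:
    """Build a POST request for the /embed route with a batch of input texts."""
    body = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(base_url: str, texts: List[str], timeout: float = 30.0) -> List[List[float]]:
    """Send the texts to a running TEI server and return one vector per text."""
    request = build_embed_request(base_url, texts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call, assuming a server is reachable on the assumed host port:
#   vectors = embed("http://localhost:8080", ["What is deep learning?"])
#   print(len(vectors), "vector(s) of dimension", len(vectors[0]))
```

Keeping request construction separate from the network call makes the payload easy to inspect or unit-test without a live server.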
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: text-embeddings-inference Download link: https://github.com/jayll1303/AIEKit/archive/main.zip#text-embeddings-inference Please download this .zip file, extract it, and install it in the .claude/skills/ directory.