text-embeddings-inference

Community

Serve fast embeddings and rerankers locally.

Author: jayll1303
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This skill helps you deploy and operate a production-ready inference service for text embeddings, re-ranking, and sequence classification, so teams can power semantic search and RAG pipelines without relying on external APIs.

Core Features & Use Cases

  • Hugging Face TEI Deployment: Guidance for launching the TEI Docker image that matches your hardware (CPU, or GPU by architecture and CUDA compute capability).
  • Embedding & Re-ranking APIs: Use the OpenAI-compatible embedding endpoint, send batched requests, and call the re-ranker endpoint to improve retrieval quality.
  • Performance, Monitoring & Air-gapped Support: Tune batching and concurrency for throughput, export Prometheus metrics and OpenTelemetry traces, and run fully offline from mounted model directories.
  • Use Case: Run a local TEI server to produce vectors for a RAG pipeline, tune batch settings for high throughput, and integrate with a vector store for semantic search.
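
The embedding, batching, and re-ranking calls listed above can be sketched in Python as follows. This is a minimal sketch, not the skill's own code. It assumes a TEI server is already listening at http://localhost:8080 and that the server exposes TEI's /embed, /v1/embeddings, and /rerank routes. The helper names (chunk, embed_payload, post_json, and so on) are hypothetical and exist only for illustration.

```python
import json
import urllib.request

# Assumption: address of a locally running TEI server.
TEI_URL = "http://localhost:8080"


def chunk(texts, size):
    """Split texts into client-side batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [texts[i:i + size] for i in range(0, len(texts), size)]


def embed_payload(texts):
    """Request body for the native /embed route."""
    return {"inputs": texts}


def openai_embed_payload(texts, model):
    """Request body for the OpenAI-compatible /v1/embeddings route."""
    return {"input": texts, "model": model}


def rerank_payload(query, texts):
    """Request body for the /rerank route (needs a re-ranker model)."""
    return {"query": query, "texts": texts}


def post_json(path, payload, base_url=TEI_URL, timeout=30):
    """POST a JSON payload to the server and decode the JSON response."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def order_by_score(rerank_results, texts):
    """Return texts sorted best-first from a list of {index, score} items."""
    ranked = sorted(rerank_results, key=lambda r: r["score"], reverse=True)
    return [texts[r["index"]] for r in ranked]
```

With a server running, a call such as post_json("/embed", embed_payload(batch)) for each batch from chunk(docs, 32) should return one vector per input. The batch size of 32 is only a placeholder; the server enforces its own client batch limit, so check that setting before picking a value.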

Quick Start

Deploy and launch a local TEI Docker instance for embeddings and test it with a sample input.
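
As a rough illustration of that launch step, the sketch below assembles a docker run command in Python. Several values are assumptions not stated in this listing: the image tag, the model ID, the ./data cache directory, and host port 8080 mapped to container port 80. The function name tei_docker_command is hypothetical. Check the TEI documentation for the image tag that matches your hardware before running the command.

```python
import os
import shlex


def tei_docker_command(model_id, image, host_port=8080, data_dir="./data",
                       use_gpu=False, extra_args=()):
    """Build a `docker run` argument list for a local TEI container."""
    cmd = ["docker", "run", "--rm"]
    if use_gpu:
        # GPU images need the container to see the host GPUs.
        cmd += ["--gpus", "all"]
    cmd += [
        "-p", f"{host_port}:80",                     # assumed container port
        "-v", f"{os.path.abspath(data_dir)}:/data",  # model cache on the host
        image,
        "--model-id", model_id,
    ]
    cmd += list(extra_args)  # e.g. batching or concurrency flags
    return cmd


# Assumed example values; replace them with the image and model you need.
command = tei_docker_command(
    model_id="BAAI/bge-small-en-v1.5",
    image="ghcr.io/huggingface/text-embeddings-inference:cpu-1.5",
)
print(shlex.join(command))
```

Once the container reports that it is ready, posting a JSON body such as {"inputs": "What is deep learning?"} to http://localhost:8080/embed is a simple way to confirm that it returns a vector.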

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: text-embeddings-inference
Download link: https://github.com/jayll1303/AIEKit/archive/main.zip#text-embeddings-inference

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
