RAG Ingestion Pipeline Skill
Community
Incremental production-grade RAG ingestion
Author: ayeshakhalid192007-dev
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill turns large documentation sets (Markdown, HTML, PDF) into a production-ready vector index for retrieval-augmented generation (RAG). It supports incremental updates, so only changed content is re-processed and a full re-index is avoided.
Core Features & Use Cases
- Incremental Change Detection: Use content hashing to detect new, modified, or deleted files and only re-process changed content.
- Semantic Chunking: Split on semantic boundaries (headers), target 400–512 tokens per chunk, and apply 10–20% overlap to preserve context.
- Batched Embeddings & Vector Uploads: Efficiently embed chunks in batches and upsert them into Qdrant with indexed payloads for filtered retrieval.
- Production Patterns: Includes crawler, frontmatter parser, deterministic chunk IDs, Qdrant-native state tracking, and background ingestion APIs with webhook triggers.
- Use Case: Ideal for course books or technical documentation where modules update frequently and filtered, contextual retrieval is required.
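As a rough illustration of the change-detection and deterministic-ID features listed above, here is a minimal Python sketch. It is not the Skill's actual code: the names `content_hash`, `detect_changes`, `ChangeSet`, and `chunk_id` are hypothetical, SHA-256 and UUIDv5 are assumed choices, and the previous state is passed in as a plain dictionary rather than read from Qdrant.

```python
import hashlib
import uuid
from typing import Dict, List, NamedTuple


class ChangeSet(NamedTuple):
    """Paths grouped by what happened since the last ingestion run."""
    added: List[str]
    modified: List[str]
    deleted: List[str]


def content_hash(text: str) -> str:
    """Fingerprint file content; SHA-256 is an assumption, any stable hash works."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_changes(previous: Dict[str, str], current_files: Dict[str, str]) -> ChangeSet:
    """Compare stored hashes (path -> hash) with current content (path -> text)."""
    current = {path: content_hash(text) for path, text in current_files.items()}
    added = sorted(p for p in current if p not in previous)
    modified = sorted(p for p in current if p in previous and current[p] != previous[p])
    deleted = sorted(p for p in previous if p not in current)
    return ChangeSet(added, modified, deleted)


def chunk_id(path: str, chunk_index: int) -> str:
    """Derive a stable UUID from path and position (hypothetical ID scheme),
    so re-ingesting the same chunk overwrites it instead of duplicating it."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{path}#{chunk_index}"))
```

Only the paths in `added` and `modified` would need re-chunking and re-embedding, while chunks belonging to `deleted` paths would be removed from the index.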
Quick Start
Trigger an incremental ingestion job for the repo docs path to detect changes, chunk semantically, embed in batches, and upsert into a Qdrant collection.
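The chunking and batching steps of such a job might look roughly like the Python sketch below. It makes several assumptions: tokens are approximated by whitespace-separated words rather than a real tokenizer, headers are detected with a simple regular expression that would also split on `#` lines inside code fences, and the function names (`split_by_headers`, `window`, `chunk_document`, `batched`) are hypothetical. The embedding call and the Qdrant upsert are left out.

```python
import re
from typing import Iterator, List


def split_by_headers(markdown: str) -> List[str]:
    """Split Markdown into sections, each starting at a header line."""
    sections: List[str] = []
    current: List[str] = []
    for line in markdown.splitlines():
        # A new header closes the section collected so far.
        if re.match(r"#{1,6}\s", line) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    return [s for s in sections if s.strip()]


def window(words: List[str], max_tokens: int, overlap: int) -> List[str]:
    """Slide a window over words, repeating `overlap` words between chunks."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be >= 0 and smaller than max_tokens")
    step = max_tokens - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the last window already reached the end of the section
    return chunks


def chunk_document(markdown: str, max_tokens: int = 512,
                   overlap_ratio: float = 0.15) -> List[str]:
    """Chunk per header section; words stand in for tokens (an assumption)."""
    overlap = int(max_tokens * overlap_ratio)
    chunks: List[str] = []
    for section in split_by_headers(markdown):
        chunks.extend(window(section.split(), max_tokens, overlap))
    return chunks


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield fixed-size batches, e.g. one batch per embedding request."""
    for i in range(0, len(items), size):
        yield items[i:i + size]
```

Each batch from `batched(chunk_document(text), size)` would then go to the embedding model, and the resulting vectors would be upserted together with their payloads.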
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: Let Claude install automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill: Name: RAG Ingestion Pipeline Skill Download link: https://github.com/ayeshakhalid192007-dev/humanoid-ai-studio/archive/main.zip#rag-ingestion-pipeline-skill Please download this .zip file, extract it, and install it in the .claude/skills/ directory.