RAG Ingestion Pipeline Skill

Community

Incremental production-grade RAG ingestion

Author: ayeshakhalid192007-dev
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill handles the work of turning large documentation sets (Markdown, HTML, PDF) into a production-ready vector index for retrieval-augmented generation (RAG). It supports incremental updates, so only changed content is re-indexed instead of rebuilding the whole index on every run.
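Incremental updates depend on comparing content hashes between runs. The following is a minimal Python sketch, assuming the previous run's hashes are available as a path-to-hash dictionary. `content_hash` and `detect_changes` are hypothetical names, not functions from the Skill's source. The sketch also does not model the Qdrant-native state tracking that the Skill describes.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a SHA-256 hex digest of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_changes(previous: dict[str, str], current_docs: dict[str, str]) -> dict[str, list[str]]:
    """Compare stored hashes (path -> hash) with current documents (path -> text).

    Returns sorted lists of new, modified, and deleted paths. Only these paths need
    re-chunking and re-embedding, or removal from the vector index.
    """
    current = {path: content_hash(text) for path, text in current_docs.items()}
    new = sorted(p for p in current if p not in previous)
    modified = sorted(p for p in current if p in previous and previous[p] != current[p])
    deleted = sorted(p for p in previous if p not in current)
    return {"new": new, "modified": modified, "deleted": deleted}
```

For example, given stored hashes for `intro.md`, `setup.md`, and `old.md`, and a current set containing an unchanged `intro.md`, an edited `setup.md`, and a new `api.md`, `detect_changes` reports `api.md` as new, `setup.md` as modified, and `old.md` as deleted. `intro.md` is skipped entirely.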

Core Features & Use Cases

  • Incremental Change Detection: Uses content hashing to detect new, modified, or deleted files and re-processes only the content that changed.
  • Semantic Chunking: Splits on semantic boundaries (headers), targets chunk sizes of 400–512 tokens, and applies 10–20% overlap between adjacent chunks to preserve context.
  • Batched Embeddings & Vector Uploads: Embeds chunks in batches and upserts them into Qdrant with indexed payload fields for filtered retrieval.
  • Production Patterns: Includes a crawler, a frontmatter parser, deterministic chunk IDs, Qdrant-native state tracking, and background ingestion APIs that can be triggered by webhooks.
  • Use Case: Well suited to course books or technical documentation where modules change frequently and filtered, contextual retrieval is required.
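Header-aware chunking with overlap can be sketched as follows. This is an illustration built on several assumptions, not the Skill's implementation. Whitespace-separated words stand in for model tokens; a real pipeline would count tokens with the embedding model's tokenizer. Only the 512-token upper bound is enforced, and merging undersized sections toward the 400-token floor is omitted. The default overlap ratio of 0.15 is one point in the 10–20% range listed above. `split_by_headers`, `window_tokens`, and `chunk_markdown` are hypothetical names.

```python
import re

# A Markdown ATX heading: 1-6 '#' characters at line start, whitespace, then text.
# Limitation: '#' comment lines inside fenced code blocks would also match.
HEADER_RE = re.compile(r"^#{1,6}\s+\S", re.MULTILINE)


def split_by_headers(markdown: str) -> list[str]:
    """Split Markdown into sections, each starting at a heading (plus any preamble)."""
    starts = [m.start() for m in HEADER_RE.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep text that precedes the first heading
    bounds = starts + [len(markdown)]
    sections = [markdown[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [s for s in sections if s]


def window_tokens(tokens: list[str], max_tokens: int, overlap: int) -> list[list[str]]:
    """Slide a window of max_tokens over tokens, repeating `overlap` tokens between windows."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    if len(tokens) <= max_tokens:
        return [tokens]
    step = max_tokens - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the section
    return windows


def chunk_markdown(markdown: str, max_tokens: int = 512, overlap_ratio: float = 0.15) -> list[str]:
    """Header-aware chunking: windows never cross a heading boundary."""
    overlap = int(max_tokens * overlap_ratio)
    chunks = []
    for section in split_by_headers(markdown):
        tokens = section.split()  # whitespace "tokens": a stand-in for a real tokenizer
        for window in window_tokens(tokens, max_tokens, overlap):
            chunks.append(" ".join(window))  # note: original line breaks are not preserved
    return chunks
```

With the defaults, a section of 1,002 words is split into three windows of 512, 512, and 130 words. Consecutive windows share 76 words at each boundary. A short section that follows becomes its own chunk, because windows are built per section and never straddle a heading.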

Quick Start

Trigger an incremental ingestion job on the repository's docs path. The job detects changed files, chunks them semantically, embeds the chunks in batches, and upserts them into a Qdrant collection.
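The quick-start flow embeds changed chunks in batches and then upserts them. It can be sketched with deterministic point IDs, so that re-running ingestion overwrites existing points instead of duplicating them. This sketch is an assumption, not the Skill's code. Each ID is a UUIDv5 derived from the file path and chunk position, since Qdrant accepts unsigned integers or UUIDs as point IDs. `embed` and `upsert` are passed in as callables so the example runs without a server. With the `qdrant-client` package, `upsert` could wrap `client.upsert(collection_name=..., points=[PointStruct(id=..., vector=..., payload=...)])`. The names `ingest`, `chunk_id`, and `batched` are hypothetical, as are the payload fields `source` and `chunk_index`.

```python
import uuid
from typing import Callable, Iterator


def chunk_id(source_path: str, chunk_index: int) -> str:
    """Deterministic UUID: the same file and chunk position give the same point ID every run."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{source_path}#{chunk_index}"))


def batched(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def ingest(
    changed: dict[str, list[str]],  # path -> chunk texts for new or modified files
    embed: Callable[[list[str]], list[list[float]]],
    upsert: Callable[[list[dict]], None],
    batch_size: int = 64,
) -> int:
    """Embed and upsert chunks batch by batch; return the number of points written."""
    records = [
        {"id": chunk_id(path, i), "text": text, "source": path, "chunk_index": i}
        for path, chunks in changed.items()
        for i, text in enumerate(chunks)
    ]
    written = 0
    for batch in batched(records, batch_size):
        vectors = embed([r["text"] for r in batch])  # one embedding call per batch
        points = [
            {
                "id": r["id"],
                "vector": v,
                # Payload fields like these can be indexed in Qdrant for filtered retrieval.
                "payload": {"source": r["source"], "chunk_index": r["chunk_index"], "text": r["text"]},
            }
            for r, v in zip(batch, vectors)
        ]
        upsert(points)  # one upsert call per batch
        written += len(points)
    return written
```

Position-based IDs have one caveat. If a modified file now produces fewer chunks, its old higher-index points stay in the collection. A fuller pipeline would therefore first delete that file's existing points, for example by filtering on its `source` payload value, and would likewise remove points belonging to deleted files.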

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: RAG Ingestion Pipeline Skill
Download link: https://github.com/ayeshakhalid192007-dev/humanoid-ai-studio/archive/main.zip#rag-ingestion-pipeline-skill

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
