document-pipeline
Community
Ingest documents and build targeted knowledge.
Author: rd162
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Ingests and transforms a mix of documents (PDF, Word, slides), diagrams, and videos into structured fragments and a knowledge base, enabling efficient downstream analysis and knowledge extraction.
Core Features & Use Cases
- Ingestion: convert raw documents, diagrams, and videos into fragments with source-tier metadata and versioned frontmatter.
- Survey: generate targeted knowledge bases and structured sections (§1–§12) from fragments with source traceability and gap/ambiguity catalogs.
- Workflow orchestration: support incremental changes, resume after interruptions, and provide outputs suitable for LLMs.
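As a rough illustration of the ingestion and incremental-change ideas above, the sketch below models a fragment with frontmatter and a content hash. The `Fragment` class, its field names, the tier label, and `needs_reingest` are all hypothetical and chosen for illustration only; the skill's actual fragment schema and resume mechanism are not documented here.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass
class Fragment:
    """Hypothetical fragment record; field names are assumptions, not the skill's schema."""
    source_path: str
    source_tier: str  # assumed tier label, e.g. "primary"
    version: int
    body: str

    @property
    def content_hash(self) -> str:
        # Hash of the fragment body, used here to detect changed content.
        return hashlib.sha256(self.body.encode("utf-8")).hexdigest()

    def to_markdown(self) -> str:
        # Render frontmatter (metadata block) followed by the fragment body.
        lines = [
            "---",
            f"source: {self.source_path}",
            f"source_tier: {self.source_tier}",
            f"version: {self.version}",
            f"sha256: {self.content_hash}",
            "---",
            "",
            self.body,
        ]
        return "\n".join(lines)


def needs_reingest(fragment: Fragment, seen_hashes: Dict[str, str]) -> bool:
    """Return True when the source is new or its content changed since the last run."""
    return seen_hashes.get(fragment.source_path) != fragment.content_hash
```

Under these assumptions, a run could persist `seen_hashes` between invocations and skip any source whose hash is unchanged, which is one simple way to support both incremental updates and resuming after an interruption.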
Quick Start
Run the pipeline to ingest documents into fragments and generate a targeted survey from those fragments.
Dependency Matrix
Required Modules
- markitdown
- docling
- pyvips
- Pillow
- python-pptx
- openpyxl
- python-docx
- docx2pdf
- PyMuPDF
- py7zr
- openai-whisper
- scenedetect[opencv]
- opencv-python-headless
- tqdm
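Before running the pipeline, it can help to confirm that the required modules listed above are installed. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `missing_distributions` is hypothetical and is not part of the skill's scripts.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the Required Modules list.
# "scenedetect[opencv]" is an extras specifier; the distribution itself is "scenedetect".
REQUIRED = [
    "markitdown", "docling", "pyvips", "Pillow", "python-pptx",
    "openpyxl", "python-docx", "docx2pdf", "PyMuPDF", "py7zr",
    "openai-whisper", "scenedetect", "opencv-python-headless", "tqdm",
]


def missing_distributions(names):
    """Return the subset of distribution names that are not installed."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if not installed
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_distributions(REQUIRED)
    if missing:
        print("Missing distributions:", ", ".join(missing))
    else:
        print("All required distributions are installed.")
```

Anything reported as missing can then be installed with `pip install` using the names from the list.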
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: document-pipeline
Download link: https://github.com/rd162/skills/archive/main.zip#document-pipeline
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.