doc2kb
Community
Convert mixed docs into an LLM-ready KB
Category: Education & Research
Tags: knowledge base, document ingestion, local-first, RAG preparation, PDF conversion, DOCX/PPTX parsing, LLM indexing
Author: zevtos
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
doc2kb turns a folder of heterogeneous documents into an LLM-optimized knowledge base that a separate Claude or Codex session can ingest, without losing content to summarization.
Core Features & Use Cases
- Structured, per-source extraction (no summarization): Keeps document content verbatim and stores each source as its own Markdown file with YAML frontmatter for indexing and citation.
- Mixed-format corpus support: Converts PDFs, DOCX, PPTX, HTML, Markdown, TXT, and Jupyter notebooks into a unified KB layout (with manifests and navigation).
- PDF robustness for math-heavy/scanned cases: Detects problematic PDFs (ligature issues, mangled visual math, dropped figures) and can optionally use a VLM-grade MinerU tier for recovery.
- Local-first ingestion workflow: Produces manifest.json, INDEX.md, llms.txt, and AGENTS.md so an AI agent can pick out and read only the relevant documents by filename and heading.
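One symptom of the ligature problems mentioned above is extracted text that still contains Unicode presentation-form ligatures (U+FB00 to U+FB06, such as "ﬁ") instead of plain letters. The following is a minimal sketch of such a check, not doc2kb's actual detector. The function names and the 0.1% threshold are hypothetical. The sketch expands only the ligature characters, because running NFKC over the whole text would also flatten math such as superscripts. It does not catch the other common failure, where the ligature glyphs are dropped entirely.

```python
import unicodedata

# Unicode "Alphabetic Presentation Forms" Latin ligatures: ff, fi, fl, ffi, ffl, long-s t, st.
LIGATURES = {chr(cp) for cp in range(0xFB00, 0xFB07)}


def ligature_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are raw presentation-form ligatures."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c in LIGATURES for c in chars) / len(chars)


def looks_ligature_damaged(text: str, threshold: float = 0.001) -> bool:
    # Hypothetical threshold: flag text where more than 0.1% of characters are ligatures.
    return ligature_ratio(text) > threshold


def expand_ligatures(text: str) -> str:
    # Apply NFKC only to ligature characters (U+FB01 -> "fi", U+FB03 -> "ffi", ...),
    # leaving superscripts and other math symbols untouched.
    return "".join(
        unicodedata.normalize("NFKC", c) if c in LIGATURES else c for c in text
    )


if __name__ == "__main__":
    sample = "The e\ufb03cient \ufb01lter"
    print(looks_ligature_damaged(sample), expand_ligatures(sample))
```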
Quick Start
Ask the agent to run doc2kb on your corpus folder. It ingests and indexes the mixed documents and outputs a knowledge base that is ready to load into another session.
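To make the output layout more concrete, here is a minimal sketch that stores one source verbatim as a Markdown file with YAML frontmatter, then records it in manifest.json and a small llms.txt. The docs/ subfolder, the frontmatter keys (source, format, sha256), the manifest schema, and the helper names are assumptions for illustration, not doc2kb's actual format. INDEX.md and AGENTS.md are omitted.

```python
import hashlib
import json
import re
from pathlib import Path


def slugify(name: str) -> str:
    # Lowercase and collapse runs of non-alphanumerics into "-" for a stable filename.
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-") or "doc"


def write_source(kb_dir: Path, source_name: str, fmt: str, body: str) -> dict:
    """Store one source verbatim (no summarization) and return its manifest entry.

    Assumed layout and keys; not doc2kb's real schema.
    """
    docs = kb_dir / "docs"
    docs.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    path = docs / f"{slugify(source_name)}.md"
    # json.dumps produces a double-quoted string, which is also a valid YAML scalar.
    front = "\n".join([
        "---",
        f"source: {json.dumps(source_name)}",
        f"format: {json.dumps(fmt)}",
        f"sha256: {json.dumps(digest)}",
        "---",
        "",
    ])
    path.write_text(front + body, encoding="utf-8")
    return {
        "source": source_name,
        "format": fmt,
        "path": path.relative_to(kb_dir).as_posix(),
        "sha256": digest,
    }


def write_index(kb_dir: Path, title: str, entries: list[dict]) -> None:
    """Write manifest.json plus a minimal llms.txt that links every document."""
    manifest = json.dumps({"documents": entries}, indent=2)
    (kb_dir / "manifest.json").write_text(manifest, encoding="utf-8")
    lines = [f"# {title}", ""]
    lines += [f"- [{e['source']}]({e['path']}): {e['format']}" for e in entries]
    (kb_dir / "llms.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Hashing the verbatim body gives each entry a stable fingerprint, so a later run could skip unchanged sources. Slug-based filenames can collide for names such as "a b" and "a-b", which a real tool would need to handle.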
Dependency Matrix
Required Modules
pymupdf4llm, pdfplumber, pypdf, pikepdf, python-magic, python-docx, mammoth, python-pptx, openpyxl, trafilatura, markdownify, charset-normalizer, Pillow, tiktoken, lxml
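Several of these packages are imported under a different name than the one used with pip (python-docx is imported as docx, Pillow as PIL). Here is a minimal sketch for checking which modules are importable before running doc2kb. The name mapping and the helper name are our own. A module being found does not guarantee it loads: python-magic, for instance, also needs the system libmagic library.

```python
import importlib.util

# pip distribution name (as listed above) -> top-level import name.
REQUIRED = {
    "pymupdf4llm": "pymupdf4llm",
    "pdfplumber": "pdfplumber",
    "pypdf": "pypdf",
    "pikepdf": "pikepdf",
    "python-magic": "magic",
    "python-docx": "docx",
    "mammoth": "mammoth",
    "python-pptx": "pptx",
    "openpyxl": "openpyxl",
    "trafilatura": "trafilatura",
    "markdownify": "markdownify",
    "charset-normalizer": "charset_normalizer",
    "Pillow": "PIL",
    "tiktoken": "tiktoken",
    "lxml": "lxml",
}


def missing_modules(required: dict[str, str] = REQUIRED) -> list[str]:
    """Return the distribution names whose import module cannot be located."""
    return [dist for dist, mod in required.items() if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("All listed dependencies are importable.")
```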
Components
scripts, references, assets
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: doc2kb Download link: https://github.com/zevtos/agentpipe/archive/main.zip#doc2kb Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
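If you prefer to install manually, the steps in that prompt can be sketched in Python. This assumes the skill is the directory named doc2kb that contains a SKILL.md file somewhere inside the repository archive. The exact path within agentpipe is not stated on this page, so the code searches for it. The #doc2kb part of the link is a URL fragment that is never sent to GitHub, so the download uses the plain archive URL. install_skill is a hypothetical helper, not part of doc2kb.

```python
import io
import shutil
import zipfile
from pathlib import Path, PurePosixPath


def install_skill(zip_bytes: bytes, skill: str, skills_dir: Path) -> Path:
    """Extract the folder named `skill` from a repo archive into skills_dir/skill."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        names = zf.namelist()
        # Assumption: the skill lives in a directory named exactly `skill` holding SKILL.md.
        prefix = None
        for name in names:
            p = PurePosixPath(name)
            if p.name == "SKILL.md" and p.parent.name == skill:
                prefix = p.parent.as_posix() + "/"
                break
        if prefix is None:
            raise FileNotFoundError(f"no {skill}/SKILL.md found in archive")

        target = skills_dir / skill
        if target.exists():
            shutil.rmtree(target)  # replace any previous installation of this skill
        for name in names:
            if not name.startswith(prefix) or name.endswith("/"):
                continue  # skip files outside the skill and directory entries
            rel = name[len(prefix):]
            # Refuse absolute paths and ".." components (zip-slip protection).
            if rel.startswith("/") or ".." in PurePosixPath(rel).parts:
                continue
            dest = target / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.write_bytes(zf.read(name))
    return target


if __name__ == "__main__":
    from urllib.request import urlopen

    url = "https://github.com/zevtos/agentpipe/archive/main.zip"
    with urlopen(url) as resp:
        data = resp.read()
    print(install_skill(data, "doc2kb", Path(".claude/skills")))
```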