doc2kb

Community

Convert mixed docs into an LLM-ready KB

Author: zevtos
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

doc2kb turns a folder of heterogeneous documents into an LLM-optimized knowledge base that can be ingested in a separate Claude/Codex session, without losing content via summarization.

Core Features & Use Cases

  • Structured, per-source extraction (no summarization): Keeps document content verbatim and stores each source as its own Markdown file with YAML frontmatter for indexing and citation.
  • Mixed-format corpus support: Converts PDFs, DOCX, PPTX, HTML, Markdown, TXT, and Jupyter notebooks into a unified KB layout (with manifests and navigation).
  • PDF robustness for math-heavy/scanned cases: Detects problematic PDFs (ligature issues, mangled visual math, dropped figures) and can optionally use a VLM-grade MinerU tier for recovery.
  • Local-first ingestion workflow: Produces manifest.json, INDEX.md, llms.txt, and AGENTS.md so an AI can selectively read only relevant documents via filenames/headings.
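The per-source layout described above (verbatim body, YAML frontmatter on top) can be sketched roughly as follows. This is a minimal illustration, not doc2kb's actual code: the function names `render_source` and `write_source` and the frontmatter keys `title`, `source`, and `sha256` are hypothetical, chosen only to show the shape of such a file.

```python
import hashlib
import json
from pathlib import Path


def render_source(source_path: str, title: str, body: str) -> str:
    """Return Markdown text: a YAML frontmatter block, then the body unchanged."""
    # Hypothetical metadata keys; the real doc2kb frontmatter may differ.
    meta = {
        "title": title,
        "source": source_path,
        "sha256": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }
    # json.dumps produces double-quoted strings, which are also valid YAML scalars.
    lines = [f"{key}: {json.dumps(value)}" for key, value in meta.items()]
    return "---\n" + "\n".join(lines) + "\n---\n\n" + body


def write_source(kb_dir: Path, slug: str, text: str) -> Path:
    """Write one rendered source as its own Markdown file inside the KB folder."""
    kb_dir.mkdir(parents=True, exist_ok=True)
    out = kb_dir / f"{slug}.md"
    out.write_text(text, encoding="utf-8")
    return out
```

The body is appended as-is rather than summarized, so a reader can cite the original wording, and the hash gives a cheap way to notice when a source file has changed between runs.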

Quick Start

Ask the agent to run doc2kb on your corpus path. It ingests and indexes the mixed-document folder and outputs a ready-to-ingest knowledge base.

Dependency Matrix

Required Modules

pymupdf4llm, pdfplumber, pypdf, pikepdf, python-magic, python-docx, mammoth, python-pptx, openpyxl, trafilatura, markdownify, charset-normalizer, Pillow, tiktoken, lxml
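Before running the skill, it can help to confirm that these modules are present in the active environment. The sketch below uses only the standard library; the helper name `missing_distributions` is hypothetical, and the list assumes the names above are the distribution names as published on PyPI.

```python
from importlib import metadata

# Distribution names taken from the Required Modules list above.
REQUIRED = [
    "pymupdf4llm", "pdfplumber", "pypdf", "pikepdf", "python-magic",
    "python-docx", "mammoth", "python-pptx", "openpyxl", "trafilatura",
    "markdownify", "charset-normalizer", "Pillow", "tiktoken", "lxml",
]


def missing_distributions(names):
    """Return the subset of names with no installed distribution metadata."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_distributions(REQUIRED)
    if absent:
        print("Missing:", ", ".join(absent))
    else:
        print("All required modules are installed.")
```

The check looks at installed package metadata rather than importing each module, so it avoids side effects from heavy imports and works even where the import name differs from the distribution name (for example, Pillow is imported as `PIL`).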

Components

scripts, references, assets

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: doc2kb
Download link: https://github.com/zevtos/agentpipe/archive/main.zip#doc2kb

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
