auto-ingest
Community
Continuously enrich arXiv PDFs in the background.
Education & Research
#mcp server #arxiv #pdf enrichment #retrieval augmentation #methods extraction #sqlite checkpointing #vlm processing
Author: thistleknot
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Auto-ingest removes the manual bottleneck of getting complete, extracted methods from top arXiv papers so retrieval sessions can produce higher-quality answers without waiting on one-off PDF processing.
Core Features & Use Cases
- Background PDF enrichment pipeline: Runs Phase 3–5 enrichment (VLM description, description reinsertion, and methods extraction) for the arxiv_rag corpus using a long-running daemon.
- Queue + MCP-based inspection: Provides an MCP server to check enrichment status, queue papers, list errors, and fetch extracted _methods.md files without direct filesystem access.
- Completion-aware extraction: Detects completion via papers/post_processed/<stem>_methods.md, supporting both non-blocking workflows and on-demand eager extraction paths.
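The completion check described above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual code: the function names (methods_path, is_enriched, split_by_completion) are hypothetical, and treating an empty file as "not finished" is an assumption. Only the papers/post_processed/<stem>_methods.md path pattern comes from the description.

```python
from pathlib import Path

# Path pattern taken from the feature description; everything else is illustrative.
POST_PROCESSED = Path("papers/post_processed")


def methods_path(stem: str, root: Path = POST_PROCESSED) -> Path:
    """Return the expected location of the extracted methods file for a paper stem."""
    return root / f"{stem}_methods.md"


def is_enriched(stem: str, root: Path = POST_PROCESSED) -> bool:
    """Treat a paper as complete when its methods file exists and is non-empty.

    The non-empty check is an assumption; the skill may only test for existence.
    """
    path = methods_path(stem, root)
    return path.is_file() and path.stat().st_size > 0


def split_by_completion(stems, root: Path = POST_PROCESSED):
    """Partition stems into (done, pending), preserving input order."""
    done, pending = [], []
    for stem in stems:
        (done if is_enriched(stem, root) else pending).append(stem)
    return done, pending
```

In a non-blocking workflow, a caller could answer from the done list right away and leave the pending stems for the daemon; an eager path would instead trigger extraction for the pending stems before answering.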
Quick Start
Start the background daemon with ingest_daemon.py, then call the MCP tool to queue your top paper IDs for automatic methods extraction.
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: auto-ingest Download link: https://github.com/thistleknot/skills/archive/main.zip#auto-ingest Please download this .zip file, extract it, and install it in the .claude/skills/ directory.