auto-ingest

Community

Continuously enrich arXiv PDFs in the background.

Author: thistleknot
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Auto-ingest removes the manual bottleneck of extracting complete methods sections from top arXiv papers, so retrieval sessions can produce higher-quality answers without waiting on one-off PDF processing.

Core Features & Use Cases

  • Background PDF enrichment pipeline: Runs enrichment Phases 3–5 (VLM description, description reinsertion, and methods extraction) for the arxiv_rag corpus in a long-running daemon.
  • Queue + MCP-based inspection: Provides an MCP server to check enrichment status, queue papers, list errors, and fetch extracted _methods.md files without direct filesystem access.
  • Completion-aware extraction: Detects completion via papers/post_processed/<stem>_methods.md, supporting both non-blocking workflows and on-demand eager extraction paths.
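
The completion check described in the last bullet can be sketched as follows. This is a minimal illustration and not the skill's actual code: only the path pattern papers/post_processed/<stem>_methods.md comes from the description above. The function names, and the choice to treat an empty file as incomplete, are assumptions made for the example.

```python
from pathlib import Path

# Directory taken from the path pattern in the feature description.
POST_PROCESSED = Path("papers/post_processed")


def methods_path(stem: str, root: Path = POST_PROCESSED) -> Path:
    """Return where the extracted methods file for `stem` is expected."""
    return root / f"{stem}_methods.md"


def is_complete(stem: str, root: Path = POST_PROCESSED) -> bool:
    """Treat a paper as complete once its methods file exists.

    Assumption: an empty file counts as incomplete (e.g. a write in progress).
    """
    path = methods_path(stem, root)
    return path.is_file() and path.stat().st_size > 0


def split_by_completion(stems, root: Path = POST_PROCESSED):
    """Partition stems into (done, pending) without blocking on extraction."""
    done, pending = [], []
    for stem in stems:
        (done if is_complete(stem, root) else pending).append(stem)
    return done, pending
```

A non-blocking workflow could call split_by_completion, answer from the papers in done, and queue the pending ones; an eager path could instead wait until is_complete returns True for a specific paper.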

Quick Start

Start the background daemon with ingest_daemon.py, then call the MCP tool to queue your top paper IDs for automatic methods extraction.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: auto-ingest
Download link: https://github.com/thistleknot/skills/archive/main.zip#auto-ingest

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.