corpus-harvester
Community
Grow a local corpus from diverse sources.
Author: rodgemd1-lgtm
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
It builds and expands a large local corpus from binary documents, repositories, crawls, locators, or generated extraction outputs, so that discovery, indexing, and reuse can happen in one central place.
Core Features & Use Cases
- Local-first harvesting to minimize remote fetches and improve resilience
- Provenance tagging and namespace separation for source trust and traceability
- Deterministic inventories, extraction plans, and manifest summaries for reproducible work
- Locator file generation and robust ingestion paths to streamline data intake
- Comprehensive failure logs and resumable manifests to recover from interruptions
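To make the ideas of deterministic inventories, provenance tagging, and resumable manifests concrete, here is a minimal sketch in Python. It is not the skill's actual implementation: the function names (build_inventory, write_manifest, pending), the record fields, and the JSON manifest layout are all assumptions chosen for illustration.

```python
import hashlib
import json
from pathlib import Path


def build_inventory(root: Path, namespace: str) -> list[dict]:
    """List every file under root as a record tagged with a provenance namespace.

    Hypothetical helper: records are sorted by relative path so that two runs
    over the same unchanged tree produce identical output.
    """
    records = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        data = path.read_bytes()
        records.append({
            "namespace": namespace,  # assumed provenance tag, e.g. "local"
            "path": path.relative_to(root).as_posix(),
            "bytes": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        })
    return sorted(records, key=lambda r: r["path"])


def write_manifest(records: list[dict], manifest_path: Path) -> None:
    """Write completed records as JSON with stable key order (assumed layout)."""
    text = json.dumps(records, indent=2, sort_keys=True) + "\n"
    manifest_path.write_text(text, encoding="utf-8")


def pending(records: list[dict], manifest_path: Path) -> list[dict]:
    """Return the records not yet listed in the manifest, so a run can resume."""
    if not manifest_path.exists():
        return list(records)
    done = {
        (r["path"], r["sha256"])
        for r in json.loads(manifest_path.read_text(encoding="utf-8"))
    }
    return [r for r in records if (r["path"], r["sha256"]) not in done]
```

In this sketch an interrupted run would call pending() with the inventory and the last manifest written, then process only what is returned. Keying on both path and content hash means a file that changed since the last run is picked up again.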
Quick Start
Perform a local harvest of your document collection to seed the corpus.
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: corpus-harvester Download link: https://github.com/rodgemd1-lgtm/Startup-Intelligence-OS/archive/main.zip#corpus-harvester Please download this .zip file, extract it, and install it in the .claude/skills/ directory.