corpus-harvester

Community

Grow a local corpus from diverse sources.

Author: rodgemd1-lgtm
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

It builds and expands a large local corpus from binary documents, repositories, web crawls, locator files, or generated extraction outputs, so the collected material can be discovered, indexed, and reused from one central place.

Core Features & Use Cases

  • Local-first harvesting to minimize remote fetches and improve resilience
  • Provenance tagging and namespace separation for source trust and traceability
  • Deterministic inventories, extraction plans, and manifest summaries for reproducible work
  • Locator file generation and robust ingestion paths to streamline data intake
  • Comprehensive failure logs and resumable manifests to recover from interruptions
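The inventory and provenance ideas above can be illustrated with a short Python sketch. The page does not document the skill's actual data model, so everything here is an assumption: the InventoryEntry record, the build_inventory and summarize functions, and the namespace labels are hypothetical. The sketch shows how sorting paths and hashing file contents yields an inventory that comes out identical on every run, with each entry tagged by the namespace it came from.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class InventoryEntry:
    # Hypothetical record layout; not the skill's documented schema.
    namespace: str  # provenance namespace, e.g. "local" or "crawl"
    source: str     # path relative to the harvested root
    sha256: str     # content hash, stable across runs
    size: int       # size in bytes


def build_inventory(root: Path, namespace: str) -> list[InventoryEntry]:
    """Hash every file under root and return entries in a stable order."""
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        # Sorting on the relative POSIX path keeps the order platform-independent.
        key=lambda p: p.relative_to(root).as_posix(),
    )
    entries = []
    for path in files:
        data = path.read_bytes()  # bytes, so binary documents are handled too
        entries.append(InventoryEntry(
            namespace=namespace,
            source=path.relative_to(root).as_posix(),
            sha256=hashlib.sha256(data).hexdigest(),
            size=len(data),
        ))
    return entries


def summarize(entries: list[InventoryEntry]) -> dict:
    """Manifest summary: totals per namespace plus one digest of the inventory."""
    digest = hashlib.sha256()
    counts: dict[str, int] = {}
    for entry in entries:
        # sort_keys makes the serialized form, and so the digest, deterministic.
        digest.update(json.dumps(asdict(entry), sort_keys=True).encode("utf-8"))
        counts[entry.namespace] = counts.get(entry.namespace, 0) + 1
    return {
        "total": len(entries),
        "by_namespace": counts,
        "inventory_sha256": digest.hexdigest(),
    }
```

Because the order and the hashes depend only on relative paths and file contents, two runs over an unchanged folder produce the same inventory_sha256, which makes it cheap to check whether anything changed between harvests.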

Quick Start

Perform a local harvest of your document collection to seed the corpus.
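As a rough illustration of a resumable local harvest, the Python sketch below walks a folder and appends one JSON-lines record per file to a manifest, logging read errors to a separate failure file. The harvest function, the JSON-lines layout, and the field names are assumptions made for this example, not the skill's documented interface.

```python
import hashlib
import json
from pathlib import Path


def harvest(root: Path, manifest_path: Path, failures_path: Path) -> list[str]:
    """Record every file under root in a JSON-lines manifest (hypothetical layout).

    Files already listed in the manifest are skipped, so a rerun resumes after
    an interruption. Unreadable files go to a failure log instead of aborting.
    """
    # Load the sources recorded by earlier runs.
    done: set[str] = set()
    if manifest_path.exists():
        with manifest_path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    done.add(json.loads(line)["source"])

    # A stable, platform-independent order keeps runs reproducible.
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )

    harvested: list[str] = []
    with manifest_path.open("a", encoding="utf-8") as manifest, \
            failures_path.open("a", encoding="utf-8") as failures:
        for path in files:
            rel = path.relative_to(root).as_posix()
            if rel in done:
                continue  # already harvested on a previous run
            try:
                data = path.read_bytes()  # bytes, so binary documents work too
            except OSError as exc:
                failures.write(json.dumps({"source": rel, "error": str(exc)}) + "\n")
                continue
            record = {
                "source": rel,
                "sha256": hashlib.sha256(data).hexdigest(),
                "size": len(data),
            }
            manifest.write(json.dumps(record) + "\n")
            manifest.flush()  # persist each record as soon as it is written
            harvested.append(rel)
    return harvested
```

Rerunning it on the same folder skips files already in the manifest, so an interrupted harvest picks up where it stopped; files that failed earlier are retried because they never reached the manifest. The manifest and failure log should live outside the harvested folder so they are not picked up as documents.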

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: corpus-harvester
Download link: https://github.com/rodgemd1-lgtm/Startup-Intelligence-OS/archive/main.zip#corpus-harvester

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
