split-batch
Community
Split one batch PDF into separate documents.
Personal & Entrepreneur
Tags: human-in-the-loop, pdf splitting, ocr post-processing, document boundaries, paperless workflow, batch scanning, content-derived filenames
Author: xxthunder
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill turns a single multi-document PDF (such as a batch of scanned paperwork) into individual PDFs by detecting document boundaries from OCR-extracted text, so you can organize, name, and process each document separately.
Core Features & Use Cases
- Detect document boundaries from searchable-page content (page-number resets, letterhead/sender changes, and address-block transitions), or from blank separator sheets in opt-in mode.
- Consent-gated refinement: when the rule-based confidence is medium or ambiguous, it can ask an LLM to reconcile boundaries against the extracted OCR text, but only if you permit it.
- Human-in-the-loop split map: proposes a split, lets you merge/split/edit boundaries, then emits one PDF per document.
- Post-processing only: it does not scan or OCR; it requires a searchable (OCR’d) PDF.
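To make the rule-based detection more concrete, here is a minimal sketch of two of the signals listed above: page-number resets and, in opt-in mode, blank separator sheets. It is not the Skill's actual code. The function name `propose_boundaries`, the `PAGE_NO` footer pattern, and the `blank_separators` flag are all hypothetical, and letterhead and address-block checks are left out. The input is a list of per-page strings, such as the text that pypdf's `page.extract_text()` returns for a searchable PDF.

```python
import re

# Hypothetical footer pattern: matches text such as "Page 1 of 3" or "Page 1".
PAGE_NO = re.compile(r"\bpage\s+(\d+)(?:\s+of\s+\d+)?\b", re.IGNORECASE)


def propose_boundaries(page_texts, blank_separators=False):
    """Return 0-based indices of pages that likely start a new document."""
    starts = []
    prev_no = None
    after_separator = True  # the first page considered always starts a document
    for i, text in enumerate(page_texts):
        stripped = text.strip()
        if blank_separators and not stripped:
            # Opt-in mode: a page with no text is treated as a separator sheet.
            after_separator = True
            prev_no = None
            continue
        match = PAGE_NO.search(stripped)
        page_no = int(match.group(1)) if match else None
        # A page numbered 1 after an earlier numbered page signals a reset.
        reset = page_no == 1 and prev_no is not None
        if after_separator or reset:
            starts.append(i)
        after_separator = False
        if page_no is not None:
            prev_no = page_no
    return starts


texts = [
    "Acme Corp\nPage 1 of 2",
    "continued\nPage 2 of 2",
    "Bank statement\nPage 1 of 1",
    "",
    "Letter without numbering",
]
print(propose_boundaries(texts))
print(propose_boundaries(texts, blank_separators=True))
```

In this sketch the unnumbered letter is only detected as a separate document when blank separators are enabled, which is the kind of ambiguous case that the human-in-the-loop split map is meant to catch.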
Quick Start
Ask the agent to split your batch PDF. It proposes boundaries from the OCR text in the input, and after you confirm them it outputs one PDF per detected document.
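Once a split is confirmed, emitting the files is a matter of copying page ranges. The following is a rough sketch under stated assumptions, not the Skill's own script: `ranges_from_starts`, `write_split`, and the `document-NN.pdf` naming are hypothetical. The only pypdf calls used are `PdfReader`, `PdfWriter.add_page`, and `PdfWriter.write`.

```python
from pathlib import Path


def ranges_from_starts(starts, page_count):
    """Turn sorted start indices into half-open (start, end) page ranges."""
    if not starts or starts[0] != 0:
        starts = [0, *starts]  # pages before the first boundary form a document
    ends = [*starts[1:], page_count]
    return list(zip(starts, ends))


def write_split(input_pdf, ranges, out_dir):
    """Write one PDF per (start, end) range and return the output paths."""
    # Imported here so the range helper above can be used without pypdf.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(input_pdf)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, (start, end) in enumerate(ranges, start=1):
        writer = PdfWriter()
        for i in range(start, end):
            writer.add_page(reader.pages[i])
        # Hypothetical naming scheme; content-derived filenames are not sketched here.
        path = out_dir / f"document-{n:02d}.pdf"
        with open(path, "wb") as fh:
            writer.write(fh)
        paths.append(path)
    return paths


print(ranges_from_starts([0, 2, 4], 5))
```

A call such as `write_split("batch.pdf", ranges_from_starts([0, 2, 4], 5), "out")` would then write three files. Note that this sketch keeps any blank separator sheet as the last page of the preceding document; dropping it would require shortening the range.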
Dependency Matrix
Required Modules: pypdf
Components: scripts
💻 Claude Code Installation
Recommended: Let Claude install automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: split-batch
Download link: https://github.com/xxthunder/xxthunder-agentic-skills/archive/main.zip#split-batch
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.