converter-pdf
Community
Turn judicial PDFs into searchable text
Author: georgemarmelstein
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill converts judicial PDF documents into plain text (TXT), using OCR for scanned files and removing the noise typical of PJe (Processo Judicial Eletrônico, the Brazilian electronic court system) so that the extracted text is usable for downstream analysis.
Core Features & Use Cases
- OCR-based conversion (default): Uses OCR (Tesseract) for scanned judicial PDFs to produce page-preserving, cleaned TXT output.
- Digital extraction mode: Supports a faster path for native digital PDFs using pdfplumber, with fallback to OCR when needed.
- PJe-specific text cleaning & metrics: Removes recurring headers/footers and PJe pollution patterns, and reports quality stats such as pages, character counts, and reduction percentage.
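The cleaning and metrics feature can be illustrated with a minimal sketch. It is not the Skill's actual code: the function names, the 60% repetition threshold, and the sample page text are all assumptions made for illustration. The idea is that a line appearing on most pages is probably a header or footer, so it is dropped, and the character counts before and after give the reduction percentage.

```python
from collections import Counter
from typing import Dict, List, Set


def find_repeated_lines(pages: List[str], min_fraction: float = 0.6) -> Set[str]:
    """Return lines present on at least min_fraction of pages (hypothetical heuristic)."""
    counts: Counter = Counter()
    for page in pages:
        # Use a set so a line is counted at most once per page.
        counts.update({line.strip() for line in page.splitlines() if line.strip()})
    # Require at least two occurrences so single-page documents lose nothing.
    threshold = max(2, int(len(pages) * min_fraction))
    return {line for line, n in counts.items() if n >= threshold}


def clean_pages(pages: List[str]) -> List[str]:
    """Drop repeated header/footer lines while keeping one entry per page."""
    repeated = find_repeated_lines(pages)
    cleaned = []
    for page in pages:
        kept = [line for line in page.splitlines() if line.strip() not in repeated]
        cleaned.append("\n".join(kept).strip())
    return cleaned


def quality_stats(raw: List[str], cleaned: List[str]) -> Dict[str, float]:
    """Report page count, character counts, and percentage of text removed."""
    before = sum(len(p) for p in raw)
    after = sum(len(p) for p in cleaned)
    reduction = 100.0 * (before - after) / before if before else 0.0
    return {
        "pages": len(raw),
        "chars_before": before,
        "chars_after": after,
        "reduction_pct": round(reduction, 1),
    }
```

A frequency-based heuristic like this needs no hard-coded patterns, but it can also remove legitimately repeated body lines; a real cleaner would likely combine it with explicit PJe-specific patterns.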
Quick Start
Ask the Skill to convert a judicial PDF into TXT by running the existing script in OCR mode and saving the results to your chosen output directory.
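The bundled script itself is not reproduced on this page, so the following is only a rough sketch of what a conversion of this kind might involve, built on the listed modules. Every function name, the `mode` parameter, the 50-character threshold, the `por` Tesseract language, and the page marker format are assumptions, not the Skill's real interface. OCR is the default path; the digital path tries the PDF's text layer first and falls back to OCR when that layer is nearly empty.

```python
from pathlib import Path
from typing import Callable, List


def extract_digital(pdf_path: str) -> List[str]:
    """Read each page's text layer with pdfplumber ('' where a page has none)."""
    import pdfplumber  # imported lazily so the sketch loads without it installed

    with pdfplumber.open(pdf_path) as pdf:
        return [page.extract_text() or "" for page in pdf.pages]


def extract_ocr(pdf_path: str, lang: str = "por", dpi: int = 300) -> List[str]:
    """Rasterize pages with pdf2image and OCR them with Tesseract.

    Assumes Poppler, Tesseract, and the chosen language data are installed.
    """
    from pdf2image import convert_from_path
    import pytesseract

    images = convert_from_path(pdf_path, dpi=dpi)
    return [pytesseract.image_to_string(image, lang=lang) for image in images]


def needs_ocr(pages: List[str], min_chars_per_page: int = 50) -> bool:
    """Treat a document as scanned when its text layer is nearly empty."""
    if not pages:
        return True
    average = sum(len(p.strip()) for p in pages) / len(pages)
    return average < min_chars_per_page


def convert(
    pdf_path: str,
    mode: str = "ocr",
    digital: Callable[[str], List[str]] = extract_digital,
    ocr: Callable[[str], List[str]] = extract_ocr,
) -> List[str]:
    """Return one text string per page; 'digital' mode falls back to OCR."""
    if mode == "digital":
        pages = digital(pdf_path)
        if not needs_ocr(pages):
            return pages
    return ocr(pdf_path)


def save_txt(pages: List[str], out_dir: str, stem: str) -> Path:
    """Write page-preserving TXT output using a hypothetical page marker."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / f"{stem}.txt"
    body = "\n\n".join(
        f"=== Page {number} ===\n{text.strip()}"
        for number, text in enumerate(pages, start=1)
    )
    target.write_text(body, encoding="utf-8")
    return target
```

Under these assumptions, `save_txt(convert("case.pdf"), "output", "case")` would OCR a hypothetical `case.pdf` and write `output/case.txt`. Passing the extractors as parameters keeps the mode-selection logic testable without Tesseract or Poppler installed.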
Dependency Matrix
Required Modules
- pdfplumber
- pdf2image
- pytesseract
- PyPDF2
Components
- scripts
- references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: converter-pdf
Download link: https://github.com/georgemarmelstein/sistema-marmelstein/archive/main.zip#converter-pdf
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.