Name: converter-pdf
Author: georgemarmelstein

System Documentation

What problem does it solve?

This Skill converts judicial PDF documents into plain text (TXT), applying OCR to scanned files and removing the noise typical of PJe (Processo Judicial Eletrônico, the Brazilian electronic court system) so that the extracted text is usable for downstream analysis.

Core Features & Use Cases

OCR-based conversion (default): Runs Tesseract OCR on scanned judicial PDFs and produces cleaned TXT output that preserves page boundaries.
Digital extraction mode: Offers a faster path for native digital PDFs using pdfplumber, falling back to OCR when needed.
PJe-specific text cleaning & metrics: Removes recurring headers/footers and PJe-specific noise patterns, and reports quality stats such as page count, character counts, and reduction percentage.
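
The cleaning and metrics feature can be illustrated with a minimal Python sketch. It is not the Skill's actual implementation: the function names `clean_pages` and `quality_stats`, the `min_share` threshold, and the keys of the stats dictionary are all hypothetical, and the real script may detect PJe noise differently (for example, with dedicated patterns). The sketch treats any line that recurs on most pages as a header or footer and then reports how much text was removed.

```python
from collections import Counter


def clean_pages(pages, min_share=0.6):
    """Drop lines that recur on at least `min_share` of the pages.

    Hypothetical heuristic: a line repeated across most pages is
    assumed to be a header or footer rather than document content.
    """
    counts = Counter()
    for page in pages:
        # Count each distinct non-empty line once per page.
        counts.update({ln.strip() for ln in page.splitlines() if ln.strip()})

    # Require at least two occurrences so a one-page PDF loses nothing.
    threshold = max(2, int(len(pages) * min_share))
    recurring = {line for line, n in counts.items() if n >= threshold}

    cleaned = []
    for page in pages:
        kept = [ln for ln in page.splitlines() if ln.strip() not in recurring]
        cleaned.append("\n".join(kept).strip())
    return cleaned


def quality_stats(raw_pages, cleaned_pages):
    """Report page count, character counts, and reduction percentage."""
    raw = sum(len(p) for p in raw_pages)
    clean = sum(len(p) for p in cleaned_pages)
    reduction = 0.0 if raw == 0 else round(100 * (raw - clean) / raw, 1)
    return {
        "pages": len(raw_pages),
        "chars_raw": raw,
        "chars_clean": clean,
        "reduction_pct": reduction,
    }
```

Lines that change from page to page, such as page counters, would not be caught by this exact-match heuristic and would need a pattern-based rule.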

Quick Start

Ask the Skill to convert a judicial PDF into TXT by running the existing script in OCR mode and saving the results to your chosen output directory.
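
The script's real name and command-line flags are not shown here, so the following Python sketch does not reproduce them. It only illustrates, under stated assumptions, the two ideas in this workflow: preferring digitally extracted text and falling back to OCR per page, then saving page-preserving TXT to an output directory. The names `pick_page_text` and `save_pages`, the `min_chars` cutoff, and the `=== Page N ===` marker format are all hypothetical. The OCR step is passed in as a callable so the sketch runs without Tesseract installed.

```python
from pathlib import Path


def pick_page_text(digital_text, ocr_page, min_chars=20):
    """Return the digital text if it looks usable, otherwise run OCR.

    `digital_text` is whatever a digital extractor returned for the page
    (possibly None or empty for scanned pages). `ocr_page` is a
    zero-argument callable that performs OCR on the same page; it is
    only called when the digital text is too short.
    """
    text = (digital_text or "").strip()
    if len(text) >= min_chars:
        return text
    return ocr_page().strip()


def save_pages(pages, output_dir, stem):
    """Write all pages to `<output_dir>/<stem>.txt` with page markers."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / f"{stem}.txt"
    body = "\n\n".join(
        f"=== Page {number} ===\n{text}"
        for number, text in enumerate(pages, start=1)
    )
    target.write_text(body + "\n", encoding="utf-8")
    return target
```

Passing OCR as a callable keeps the slow step lazy: pages with a usable text layer never trigger it.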

Please help me install this Skill:
Name: converter-pdf
Download link: https://github.com/georgemarmelstein/sistema-marmelstein/archive/main.zip#converter-pdf
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
