Name: pi-ceo-docparser
Author: CleanExpo

System Documentation

What problem does it solve?

It removes the manual burden of converting PDFs, DOCX files, or plain text into consistent, page-cited, structured content that downstream workflows can reliably consume.

Core Features & Use Cases

Deterministic extraction into ParsedDoc: Produces a ParsedDoc containing full text, page-separated content, tables (DOCX), and metadata such as title, without any LLM calls.
Fail-soft dependency handling: Tries PyMuPDF first for PDFs and falls back to pypdf if needed, reads DOCX via python-docx, and always supports TXT using only the standard library. Failures populate doc.error instead of raising.
Page-number citation preservation: Keeps page indices so later research or marketing pipelines can cite sources like “p. 3” precisely.
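
The fail-soft behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the Skill's actual source: the ParsedDoc field names beyond text, pages, and error, the parse_document and _pdf_pages helpers, and the choice to treat a DOCX or TXT file as a single page are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ParsedDoc:
    """Hypothetical container; the real ParsedDoc may differ."""
    path: str
    text: str = ""
    pages: list[str] = field(default_factory=list)
    tables: list[list[list[str]]] = field(default_factory=list)
    title: str | None = None
    error: str | None = None


def _pdf_pages(path: Path) -> list[str]:
    """Return one string per PDF page: PyMuPDF first, pypdf as fallback."""
    try:
        import fitz  # PyMuPDF

        with fitz.open(str(path)) as pdf:
            return [page.get_text() for page in pdf]
    except Exception:
        # PyMuPDF is missing or failed on this file, so try pypdf instead.
        from pypdf import PdfReader

        reader = PdfReader(str(path))
        return [(page.extract_text() or "") for page in reader.pages]


def parse_document(path: str) -> ParsedDoc:
    """Parse a PDF, DOCX, or TXT file; record failures in doc.error."""
    p = Path(path)
    doc = ParsedDoc(path=str(p))
    try:
        suffix = p.suffix.lower()
        if suffix == ".pdf":
            doc.pages = _pdf_pages(p)
        elif suffix == ".docx":
            from docx import Document  # python-docx

            d = Document(str(p))
            # Assumption: DOCX has no fixed pages, so keep it as one page.
            doc.pages = ["\n".join(par.text for par in d.paragraphs)]
            doc.tables = [
                [[cell.text for cell in row.cells] for row in table.rows]
                for table in d.tables
            ]
            doc.title = d.core_properties.title or None
        elif suffix == ".txt":
            doc.pages = [p.read_text(encoding="utf-8", errors="replace")]
        else:
            raise ValueError(f"unsupported file type: {suffix or '(none)'}")
        doc.text = "\n\n".join(doc.pages)
    except Exception as exc:
        # Fail soft: the caller inspects doc.error instead of catching.
        doc.error = f"{type(exc).__name__}: {exc}"
    return doc
```

Because every failure path ends in doc.error, a caller can check that one field before using doc.text, rather than wrapping each call in its own try/except.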

Quick Start

Ask your pipeline to parse a document at a local path, such as /path/to/customer-interview.pdf, into a ParsedDoc. Check doc.error first, then read doc.text and doc.pages for downstream analysis.
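
As a rough sketch of what downstream analysis over doc.pages might look like, the helper below turns page positions into "p. N" citations. It assumes doc.pages is a list of page strings in reading order and that citations are 1-based; find_citations is a hypothetical name used only for this example, and the sample pages are made up.

```python
def find_citations(pages: list[str], phrase: str) -> list[str]:
    """Return 'p. N' labels for every page whose text contains phrase."""
    needle = phrase.casefold()
    hits = []
    # enumerate from 1 so list position 0 is cited as "p. 1"
    for number, page_text in enumerate(pages, start=1):
        if needle in page_text.casefold():
            hits.append(f"p. {number}")
    return hits


# Stand-in for doc.pages from a parsed customer interview.
sample_pages = [
    "Introduction and background.",
    "Pricing was the main objection.",
    "Pricing came up again, along with onboarding.",
]
print(find_citations(sample_pages, "pricing"))  # ['p. 2', 'p. 3']
```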

Please help me install this Skill:
Name: pi-ceo-docparser
Download link: https://github.com/CleanExpo/Pi-Dev-Ops/archive/main.zip#pi-ceo-docparser
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
