salvage-pdf-to-word
Community
Rebuild messy PDFs into faithful DOCX.
Software Engineering #pdf #docx #form extraction #layout reconstruction #config-driven parsing #vector operators #OCR-alternative
Author: Wkayaobama
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill fixes the common failure mode where converting unstructured, untagged, or form-like PDFs to Word produces garbled reading order and flattened layouts.
Core Features & Use Cases
- Parametric PDF salvage pipeline: Converts a structurally unreliable PDF into a faithful DOCX by rebuilding the layout from vector text and rectangle geometry, rather than trusting PDF tags.
- Config-driven pattern authoring per document family: Lets you define a per-corpus config.json (labels, regexes, thresholds, styles) so the same pipeline works across different document types by swapping the pattern vocabulary.
- Auditable intermediate representations: Writes output.ir.json to make classification and block formation inspectable while you iterate.
- Fast visual + preview feedback loop: Produces ground-truth PNG slices of the rendered page and generates a browser preview from the DOCX for sanity checking.
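The geometry-first, config-driven idea behind these features can be illustrated with a minimal sketch. It is not the Skill's actual code: the config keys (`line_tolerance`, `patterns`), the span format, and the function names are all hypothetical, and the y coordinate is assumed to be measured from the top of the page. Text spans are grouped into lines by position, each line is classified with regexes from a per-corpus config, and the result is dumped as an inspectable intermediate representation.

```python
import json
import re

# Hypothetical per-corpus config; the real config.json schema is not shown here.
CONFIG = {
    "line_tolerance": 3.0,  # max vertical distance (points) for spans on one line
    "patterns": [
        {"kind": "heading", "regex": r"^(Section|Part)\s+\d+"},
        {"kind": "field", "regex": r"^[A-Z][A-Za-z ]{1,30}:"},
    ],
}


def group_into_lines(spans, tolerance):
    """Cluster spans into lines by y position, then order each line left to right."""
    lines = []
    for span in sorted(spans, key=lambda s: (s["y"], s["x"])):
        # Compare against the first (topmost) span of the current line.
        if lines and abs(span["y"] - lines[-1][0]["y"]) <= tolerance:
            lines[-1].append(span)
        else:
            lines.append([span])
    return [sorted(line, key=lambda s: s["x"]) for line in lines]


def classify(text, patterns):
    """Return the kind of the first matching pattern, or 'paragraph' as a fallback."""
    for pattern in patterns:
        if re.search(pattern["regex"], text):
            return pattern["kind"]
    return "paragraph"


def build_ir(spans, config):
    """Build a small block-level IR from raw spans using geometry plus config patterns."""
    blocks = []
    for line in group_into_lines(spans, config["line_tolerance"]):
        text = " ".join(s["text"] for s in line)
        blocks.append({
            "kind": classify(text, config["patterns"]),
            "text": text,
            "y": line[0]["y"],
        })
    return {"blocks": blocks}


# Spans arrive in arbitrary order, as they often do in untagged PDFs.
spans = [
    {"text": "Jane Doe", "x": 120.0, "y": 101.0},
    {"text": "Section 1", "x": 40.0, "y": 50.0},
    {"text": "Name:", "x": 40.0, "y": 100.0},
]

ir = build_ir(spans, CONFIG)
print(json.dumps(ir, indent=2))
```

Swapping the `patterns` list is what would adapt a sketch like this to a different document family; the grouping logic stays the same, which mirrors the per-corpus config approach described above.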
Quick Start
Use the salvage-pdf-to-word skill to convert your messy, untagged PDF into a structured DOCX while iterating on a per-corpus config until the preview matches the visual slices.
Dependency Matrix
Required Modules
docx, mammoth, pdfjs-dist, sharp, pypdfium2, Pillow
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: salvage-pdf-to-word
Download link: https://github.com/Wkayaobama/wkayaobama-skills/archive/main.zip#salvage-pdf-to-word
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.