salvage-pdf-to-word

Community

Rebuild messy PDFs into faithful DOCX.

Author: Wkayaobama
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill fixes the common failure mode where converting unstructured, untagged, or form-like PDFs to Word produces garbled reading order and flattened layouts.

Core Features & Use Cases

  • Parametric PDF salvage pipeline: Converts a structurally unreliable PDF into a faithful DOCX by rebuilding the layout from vector text and rectangle geometry, rather than trusting PDF tags.
  • Config-driven pattern authoring per document family: Lets you define a per-corpus config.json (labels, regexes, thresholds, styles) so the same pipeline works across different document types by swapping the pattern vocabulary.
  • Auditable intermediate representations: Writes output.ir.json to make classification and block formation inspectable while you iterate.
  • Fast visual + preview feedback loop: Produces ground-truth PNG slices of the rendered page and generates a browser preview from the DOCX for sanity checking.
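
To make the geometry-first idea above concrete, here is a minimal Python sketch, not the Skill's actual code. It assumes text spans have already been extracted with page coordinates, groups them into lines by vertical position, labels each line with regexes from a config, and serializes the result as an inspectable intermediate representation. The `Span` type, the config keys (`line_tolerance`, `patterns`), and the block fields are all hypothetical; the real `config.json` and `output.ir.json` schemas are not described on this page.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class Span:
    text: str
    x: float  # left edge in points
    y: float  # top edge in points (assumes y grows downward)


# Hypothetical config: the real config.json schema may differ.
CONFIG = {
    "line_tolerance": 3.0,
    "patterns": {
        "heading": r"^\d+\.\s+[A-Z]",
        "field": r"^[A-Za-z][\w ]*:",
    },
}


def group_into_lines(spans, tolerance):
    """Cluster spans with nearby y values, then order each line left to right."""
    lines = []
    for span in sorted(spans, key=lambda s: (s.y, s.x)):
        # Compare against the first span that opened the current line.
        if lines and abs(span.y - lines[-1][0].y) <= tolerance:
            lines[-1].append(span)
        else:
            lines.append([span])
    return [sorted(line, key=lambda s: s.x) for line in lines]


def classify(text, patterns):
    """Return the first config label whose regex matches, else 'paragraph'."""
    for label, pattern in patterns.items():
        if re.search(pattern, text):
            return label
    return "paragraph"


def build_ir(spans, config):
    """Turn raw spans into a list of labeled blocks (the illustrative IR)."""
    blocks = []
    for line in group_into_lines(spans, config["line_tolerance"]):
        text = " ".join(s.text for s in line)
        kind = classify(text, config["patterns"])
        blocks.append({"kind": kind, "text": text, "y": line[0].y})
    return blocks


# Spans in scrambled order, as an untagged PDF might emit them.
SAMPLE_SPANS = [
    Span("Jane Doe", 120.0, 99.0),
    Span("Free text remarks", 40.0, 140.0),
    Span("Name:", 40.0, 100.0),
    Span("1. Details", 40.0, 60.0),
]

IR = build_ir(SAMPLE_SPANS, CONFIG)
IR_JSON = json.dumps(IR, indent=2)  # what an IR file could contain
print(IR_JSON)
```

Swapping the `patterns` dictionary is what would adapt such a pipeline to another document family, and reading the serialized blocks shows why a given line was classified the way it was.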

Quick Start

Use the salvage-pdf-to-word skill to convert your messy, untagged PDF into a structured DOCX while iterating on a per-corpus config until the preview matches the visual slices.

Dependency Matrix

Required Modules

docx, mammoth, pdfjs-dist, sharp, pypdfium2, Pillow

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: salvage-pdf-to-word
Download link: https://github.com/Wkayaobama/wkayaobama-skills/archive/main.zip#salvage-pdf-to-word

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
