parse-doc

Community

Turn office docs into usable Markdown

Author: ejoongseok
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill removes the manual effort of converting scattered office documents into consistent, searchable Markdown, especially when files contain embedded text, tables, or scanned content.

Core Features & Use Cases

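  • Document-to-Markdown conversion by file type: Converts HWP, HWPX, PDF, XLSX, DOCX, and PPTX files, plus legacy XLS, PPT, ODT, ODP, and ODS files, into Markdown saved under the parsed output directory (.local.claude/docs/parsed/ in the Quick Start example).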
  • OCR for scanned PDFs (and image-based documents): Automatically runs OCR when text extraction appears empty, producing page-wise Markdown.
  • Image extraction and multimodal interpretation support: Extracts images from PDFs and PPTX and leaves Markdown references so you can interpret them with Claude’s multimodal reading.
  • Batch and pattern-based parsing: Parses a single file, all files, or files matching a pattern such as *.pdf, and skips files that already have parsed output.
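
The OCR fallback described in the list above can be sketched as follows. This is a minimal illustration, not the Skill's actual code: the names page_needs_ocr and pages_to_markdown, the 20-character threshold, and the "## Page N" heading format are all assumptions, and the OCR step is passed in as a callable so the sketch runs without any OCR library installed.

```python
from typing import Callable, List


def page_needs_ocr(text: str, min_chars: int = 20) -> bool:
    """Treat a page as scanned when extraction yields almost no visible text.

    The threshold is an assumed heuristic, not a value taken from the Skill.
    """
    visible = "".join(text.split())  # drop all whitespace before counting
    return len(visible) < min_chars


def pages_to_markdown(
    page_texts: List[str],
    ocr_page: Callable[[int], str],
    min_chars: int = 20,
) -> str:
    """Build page-wise Markdown, calling ocr_page(index) only for empty-looking pages."""
    sections = []
    for index, text in enumerate(page_texts):
        if page_needs_ocr(text, min_chars):
            # Hypothetical hook: a real pipeline would rasterize the page and run OCR here.
            text = ocr_page(index)
        sections.append(f"## Page {index + 1}\n\n{text.strip()}\n")
    return "\n".join(sections)
```

In a real pipeline, ocr_page would presumably wrap a rasterizer and an OCR engine such as the modules named in the dependency list. Checking each page separately, rather than the whole file, lets a document that mixes text pages and scanned pages keep its extracted text where it exists.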

Quick Start

Place your file in .local.claude/docs/original/ as meeting-notes.pdf, then run /parse-doc meeting-notes.pdf to generate .local.claude/docs/parsed/meeting-notes.md.
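
The path mapping in the Quick Start, together with pattern matching and skipping of already-parsed files, might look roughly like the sketch below. The directory names come from the example above. The function names output_path and select_targets are hypothetical, and the rule that the output keeps the source file's stem with an .md extension is inferred from the meeting-notes example rather than confirmed by the source.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import List, Tuple

# Directories taken from the Quick Start example.
ORIGINAL_DIR = Path(".local.claude/docs/original")
PARSED_DIR = Path(".local.claude/docs/parsed")


def output_path(source: Path, parsed_dir: Path = PARSED_DIR) -> Path:
    """Map a source document to its Markdown output (assumed rule: same stem, .md)."""
    return parsed_dir / (source.stem + ".md")


def select_targets(
    pattern: str,
    original_dir: Path = ORIGINAL_DIR,
    parsed_dir: Path = PARSED_DIR,
) -> List[Tuple[Path, Path]]:
    """Return (source, destination) pairs that match pattern and are not yet parsed."""
    targets: List[Tuple[Path, Path]] = []
    if not original_dir.is_dir():
        return targets
    for source in sorted(original_dir.iterdir()):
        if not source.is_file() or not fnmatch(source.name, pattern):
            continue
        destination = output_path(source, parsed_dir)
        if destination.exists():
            continue  # skip files that already have parsed output
        targets.append((source, destination))
    return targets
```

Under these assumptions, select_targets("meeting-notes.pdf") covers the single-file case, select_targets("*") covers all files, and select_targets("*.pdf") covers pattern matching. One caveat of the stem-based mapping is that report.pdf and report.docx would share report.md; how the Skill handles that case is not stated in the source.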

Dependency Matrix

Required Modules

pdf2image, pytesseract, PyMuPDF, pptx, xlrd
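
Before running the Skill, it can help to confirm that the required modules are importable. The sketch below is one way to do that with the standard library only. The helper name missing_modules is hypothetical, and the mapping from listed names to import names is an assumption: PyMuPDF is commonly imported as fitz, and the pptx entry is assumed to mean the python-pptx package.

```python
from importlib.util import find_spec
from typing import Dict, List

# Listed name -> import name (assumed mapping; adjust if the Skill's scripts differ).
IMPORT_NAMES: Dict[str, str] = {
    "pdf2image": "pdf2image",
    "pytesseract": "pytesseract",
    "PyMuPDF": "fitz",
    "pptx": "pptx",
    "xlrd": "xlrd",
}


def missing_modules(import_names: Dict[str, str] = IMPORT_NAMES) -> List[str]:
    """Return the listed names whose import module cannot be found."""
    return [
        label
        for label, module in import_names.items()
        if find_spec(module) is None  # None means the top-level module is not installed
    ]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

This check covers Python packages only. pytesseract and pdf2image also rely on external programs (the Tesseract OCR engine and Poppler, respectively), which need to be installed separately.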

Components

scripts

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: parse-doc
Download link: https://github.com/ejoongseok/claude-settings/archive/main.zip#parse-doc

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
