docx-reader
Official
Extract Word content quickly and accurately.
Author: Stratio
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
It ingests Word documents and extracts their prose, tables, images, metadata, comments, and tracked changes, which speeds up content analysis, indexing, and documentation governance.
Core Features & Use Cases
- Two-mode extraction: quick mode for fast, one-shot outputs with a deterministic fallback to a thorough deep mode when needed.
- Rich content extraction: text, tables, images, and core metadata, plus tracked changes and comments when present.
- Legacy support: converts older binary .doc files to modern .docx for reliable parsing.
- Markdown output: produces Markdown-ready results suitable for feeding LLMs and downstream pipelines.
- Use case: ingest policy documents or contracts into governance workflows with structured outputs.
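The two-mode behavior in the first bullet can be pictured with a short sketch. This is not the skill's actual code: the function name `extract_with_fallback`, the `quick` and `deep` callables, and the `min_chars` threshold are all hypothetical, and the rule that triggers the fallback (an exception or too little text) is an assumption made for illustration.

```python
from typing import Callable, Tuple

# Hypothetical type: an extractor takes a file path and returns Markdown text.
Extractor = Callable[[str], str]


def extract_with_fallback(
    path: str,
    quick: Extractor,
    deep: Extractor,
    min_chars: int = 1,
) -> Tuple[str, str]:
    """Run the quick extractor; fall back to the deep one if it fails
    or returns fewer than `min_chars` non-whitespace-trimmed characters.

    Returns a (mode, text) pair so callers can see which mode produced the text.
    The fallback rule is an assumption, not the skill's documented behavior.
    """
    try:
        text = quick(path)
    except Exception:
        # Treat any quick-mode failure as "no usable output".
        text = ""
    if len(text.strip()) >= min_chars:
        return "quick", text
    return "deep", deep(path)
```

Because the decision depends only on the quick result, the same input always takes the same path, which is one plausible reading of "deterministic fallback".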
Quick Start
Run the quick_extract.py script on a DOCX document to obtain a Markdown-formatted summary.
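The command-line options of quick_extract.py are not documented on this page, so they are not shown here. As a rough illustration of what turning a DOCX file into Markdown involves, the sketch below reads `word/document.xml` from the DOCX zip container using only the Python standard library. It is an assumption-laden stand-in, not the skill's implementation (which lists python-docx and lxml as dependencies); the function names are hypothetical, and headings, images, comments, and styling are ignored.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _para_text(p):
    """Join the text of all w:t runs inside a paragraph element."""
    return "".join(t.text or "" for t in p.iter(W + "t"))


def _table_to_markdown(tbl):
    """Render a w:tbl element as a Markdown table; the first row becomes the header."""
    rows = []
    for tr in tbl.findall(W + "tr"):
        cells = []
        for tc in tr.findall(W + "tc"):
            text = " ".join(_para_text(p) for p in tc.iter(W + "p")).strip()
            cells.append(text.replace("|", "\\|"))  # escape pipes inside cells
        rows.append(cells)
    width = max((len(r) for r in rows), default=0)
    if width == 0:
        return ""
    # Pad short rows so every row has the same number of columns.
    rows = [r + [""] * (width - len(r)) for r in rows]
    lines = ["| " + " | ".join(rows[0]) + " |"]
    lines.append("| " + " | ".join(["---"] * width) + " |")
    lines += ["| " + " | ".join(r) + " |" for r in rows[1:]]
    return "\n".join(lines)


def docx_to_markdown(source):
    """Return paragraphs and tables of a .docx (path or file-like) as Markdown."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    body = root.find(W + "body")
    if body is None:
        return ""
    blocks = []
    for child in body:  # keep document order of top-level paragraphs and tables
        if child.tag == W + "p":
            text = _para_text(child).strip()
            if text:
                blocks.append(text)
        elif child.tag == W + "tbl":
            md = _table_to_markdown(child)
            if md:
                blocks.append(md)
    return "\n\n".join(blocks)
```

Calling `docx_to_markdown("contract.docx")` (a hypothetical file name) returns one string with blank lines between blocks. Text marked as deleted by tracked changes is stored in `w:delText` elements, so this sketch skips it; inserted text is included.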
Dependency Matrix
Required Modules
- python-docx
- lxml
Components
scripts
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: docx-reader
Download link: https://github.com/Stratio/genai-agents/archive/main.zip#docx-reader
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.