docx-reader

Official

Extract Word content quickly and accurately.

Author: Stratio
Version: 1.0.0

System Documentation

What problem does it solve?

Ingests Word documents and extracts their prose, tables, images, metadata, comments, and tracked changes, enabling faster content analysis, indexing, and documentation governance.

Core Features & Use Cases

  • Two-mode extraction: a quick mode for fast, one-shot output, with a deterministic fallback to a more thorough deep mode when needed.
  • Rich content extraction: text, tables, images, core metadata and, when present, tracked changes and comments.
  • Legacy support: converts legacy binary .doc files to .docx for reliable parsing.
  • Markdown output: produces Markdown suitable for feeding to LLMs and downstream pipelines.
  • Use case: ingest policy documents or contracts into governance workflows with structured outputs.
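As a rough illustration of the text, table, and Markdown features listed above, the sketch below reads `word/document.xml` directly from the .docx ZIP container using only the standard library, and emits headings, paragraphs, and pipe tables in document order. It is not the skill's implementation: all function names are hypothetical, and heading detection assumes English built-in style IDs such as `Heading1`. python-docx, one of the listed dependencies, exposes the same XML through a higher-level object model.

```python
# Hypothetical sketch, not the skill's code: .docx body -> Markdown using only the stdlib.
import zipfile
import xml.etree.ElementTree as ET

# Namespace of WordprocessingML elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _text(elem):
    """Concatenate every w:t text node under elem (runs, hyperlinks, insertions)."""
    return "".join(t.text or "" for t in elem.iter(W + "t"))


def _heading_level(p):
    """Return 1-6 for paragraphs styled 'HeadingN' (English style IDs assumed), else 0."""
    style = p.find(f"{W}pPr/{W}pStyle")
    if style is None:
        return 0
    name = style.get(W + "val", "")
    suffix = name[len("Heading"):]
    if name.startswith("Heading") and suffix.isdigit():
        return min(int(suffix), 6)
    return 0


def _table_to_markdown(tbl):
    """Render a w:tbl as a pipe table; the first row is treated as the header."""
    rows = []
    for tr in tbl.findall(W + "tr"):
        cells = []
        for tc in tr.findall(W + "tc"):
            paras = (_text(p).strip() for p in tc.findall(W + "p"))
            cells.append(" ".join(t for t in paras if t).replace("|", "\\|"))
        rows.append(cells)
    if not rows:
        return ""
    width = max(len(r) for r in rows)
    rows = [r + [""] * (width - len(r)) for r in rows]  # pad ragged rows
    lines = ["| " + " | ".join(rows[0]) + " |", "|" + "---|" * width]
    lines += ["| " + " | ".join(r) + " |" for r in rows[1:]]
    return "\n".join(lines)


def docx_to_markdown(source):
    """Convert body paragraphs and tables of a .docx (path or file object) to Markdown."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    blocks = []
    for child in root.find(W + "body"):  # direct children keep document order
        if child.tag == W + "p":
            text = _text(child).strip()
            if text:
                level = _heading_level(child)
                blocks.append("#" * level + " " + text if level else text)
        elif child.tag == W + "tbl":
            table = _table_to_markdown(child)
            if table:
                blocks.append(table)
    return "\n\n".join(blocks)
```

Images are left out of this sketch: in a .docx they are stored as separate files under `word/media/` and referenced through relationship IDs, so extracting them is a separate step.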

Quick Start

Run the quick_extract.py script on a DOCX document to obtain a Markdown-formatted summary.
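This page does not document how quick_extract.py decides when to fall back to deep mode. The following is a hypothetical sketch of one deterministic rule: run a cheap pass over top-level body paragraphs and, only if it returns fewer than `min_chars` characters, run a deeper pass that also reads tables, headers, footers, and notes. The pass names, the threshold, and the set of parts scanned are all assumptions, not the skill's actual behavior.

```python
# Hypothetical sketch, not the skill's code: quick pass with a deterministic deep fallback.
import re
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Assumed set of parts the deep pass reads: main body, headers, footers, notes.
_DEEP_PART = re.compile(r"word/(document|header\d+|footer\d+|footnotes|endnotes)\.xml")


def _para_text(p):
    """Concatenate the w:t text nodes of a paragraph."""
    return "".join(t.text or "" for t in p.iter(W + "t")).strip()


def quick_pass(zf):
    """Top-level body paragraphs only: cheap, but misses tables, text boxes, headers."""
    body = ET.fromstring(zf.read("word/document.xml")).find(W + "body")
    texts = (_para_text(p) for p in body.findall(W + "p"))
    return "\n\n".join(t for t in texts if t)


def deep_pass(zf):
    """Every leaf paragraph in the body (including tables), headers, footers and notes."""
    names = [n for n in zf.namelist() if _DEEP_PART.fullmatch(n)]
    out = []
    for name in sorted(names, key=lambda n: (n != "word/document.xml", n)):
        root = ET.fromstring(zf.read(name))
        for p in root.iter(W + "p"):
            # Skip paragraphs that wrap other paragraphs (e.g. text boxes) to avoid duplicates.
            if p.find(".//" + W + "p") is None:
                text = _para_text(p)
                if text:
                    out.append(text)
    return "\n\n".join(out)


def extract(source, min_chars=1):
    """Return (mode, text): quick pass first, deep pass only if quick yields < min_chars."""
    with zipfile.ZipFile(source) as zf:
        text = quick_pass(zf)
        if len(text) >= min_chars:
            return "quick", text
        return "deep", deep_pass(zf)
```

Because the fallback decision depends only on the quick pass's output, the same file always takes the same path, which is one way to make the fallback deterministic.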

Dependency Matrix

Required Modules

python-docx, lxml
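Tracked changes are stored as raw WordprocessingML markup (`w:ins`, `w:del`) that, to our knowledge, python-docx does not expose through its public API; that may be why lxml is listed alongside it, though the page does not say so. A minimal standard-library sketch that surfaces insertions, deletions, and comments from `word/comments.xml` might look like this. `review_info` is a hypothetical name, and lxml's `etree` supports the same `iter`/`get` calls if you prefer it.

```python
# Hypothetical sketch, not the skill's code: surface tracked changes and comments.
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _joined(elem, tag):
    """Concatenate the text of every descendant w:<tag> element."""
    return "".join(t.text or "" for t in elem.iter(W + tag))


def review_info(source):
    """Return (changes, comments) from a .docx given as a path or file object.

    changes:  list of (kind, author, text) with kind "ins" or "del", in document order
    comments: list of (author, text) from word/comments.xml, if that part exists
    """
    with zipfile.ZipFile(source) as zf:
        body = ET.fromstring(zf.read("word/document.xml"))
        has_comments = "word/comments.xml" in zf.namelist()
        notes = ET.fromstring(zf.read("word/comments.xml")) if has_comments else None

    changes = []
    for el in body.iter():
        if el.tag == W + "ins":
            text = _joined(el, "t")  # inserted text uses ordinary w:t
        elif el.tag == W + "del":
            text = _joined(el, "delText")  # deleted text is stored as w:delText
        else:
            continue
        if text:  # skip empty markers, e.g. an inserted paragraph mark in w:rPr
            changes.append((el.tag[len(W):], el.get(W + "author"), text))

    comments = []
    if notes is not None:
        for c in notes.iter(W + "comment"):
            paras = (_joined(p, "t") for p in c.findall(W + "p"))
            comments.append((c.get(W + "author"), " ".join(t for t in paras if t)))
    return changes, comments
```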

Components

scripts

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: docx-reader
Download link: https://github.com/Stratio/genai-agents/archive/main.zip#docx-reader

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
