media-ocr-ai
Community
Multimodel OCR for text, layout, handwriting.
Author: damionrashford
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
OCR can be slow and error-prone when handling diverse documents, languages, and formats. This skill provides a unified, offline OCR workflow that lets you choose among multiple open-source backends (PaddleOCR, EasyOCR, Tesseract) and the handwriting-focused TrOCR to extract text and structure from images and PDFs with consistent output.
Core Features & Use Cases
- Multimodel backends: select PaddleOCR for structured layouts, EasyOCR for quick reads, Tesseract for broad language coverage, and TrOCR for handwriting.
- Structured layout and table extraction: identify headers, paragraphs, and tables to produce usable JSON or CSV outputs.
- Handwriting transcription: transform handwritten notes into editable text with line-level accuracy.
- Multilingual support: handle documents containing multiple languages and scripts in one workflow.
- Output flexibility: produce plain text, JSON blocks, TSV, or CSV for downstream pipelines.
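The backend choices and output formats listed above can be illustrated with a minimal sketch. It is not the skill's actual code: the task labels, the `OcrLine` record, and the `pick_backend` and `render` helpers are hypothetical names introduced here, and the sketch assumes recognition has already produced a list of text lines.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass

# Mapping taken from the feature list; the task keys are hypothetical labels.
BACKEND_FOR_TASK = {
    "layout": "paddleocr",          # structured layouts
    "quick": "easyocr",             # quick reads
    "many_languages": "tesseract",  # broad language coverage
    "handwriting": "trocr",         # handwritten text
}


def pick_backend(task: str) -> str:
    """Return the backend name suggested for a task label."""
    if task not in BACKEND_FOR_TASK:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(BACKEND_FOR_TASK)}")
    return BACKEND_FOR_TASK[task]


@dataclass
class OcrLine:
    """One recognized line of text (assumed record shape, not the skill's schema)."""
    page: int
    text: str
    confidence: float


def render(lines: list[OcrLine], fmt: str = "text") -> str:
    """Serialize recognized lines as plain text, JSON, CSV, or TSV."""
    if fmt == "text":
        return "\n".join(line.text for line in lines)
    if fmt == "json":
        return json.dumps([asdict(line) for line in lines], ensure_ascii=False)
    if fmt in ("csv", "tsv"):
        buf = io.StringIO()
        delimiter = "," if fmt == "csv" else "\t"
        writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
        writer.writerow(["page", "text", "confidence"])
        for line in lines:
            writer.writerow([line.page, line.text, line.confidence])
        return buf.getvalue()
    raise ValueError(f"unsupported format {fmt!r}")
```

Keeping one record shape for every backend is what makes the output consistent: each backend adapter would only need to produce `OcrLine` values, and the serializers stay unchanged.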
Quick Start
Install the required backends with the skill's install command, then run its extract or layout command on your documents.
Dependency Matrix
Required Modules
- paddlepaddle>=2.6
- paddleocr>=2.7
- easyocr>=1.7
- pytesseract>=0.3.10
- transformers>=4.40
- torch>=2.2
- opencv-python>=4.9
- numpy>=1.24
- pillow>=10.0
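Before running the skill, it can help to check which of the required modules are already installed at a sufficient version. The sketch below uses only the standard library's `importlib.metadata`; the helper names are hypothetical, and the version comparison is a simplification that looks only at leading numeric components (it ignores pre-release tags), so treat it as a rough check rather than a full requirement resolver.

```python
from importlib import metadata

# Minimum versions copied from the dependency list above.
REQUIREMENTS = [
    "paddlepaddle>=2.6", "paddleocr>=2.7", "easyocr>=1.7",
    "pytesseract>=0.3.10", "transformers>=4.40", "torch>=2.2",
    "opencv-python>=4.9", "numpy>=1.24", "pillow>=10.0",
]


def parse_requirement(spec: str) -> tuple[str, str]:
    """Split 'name>=minimum' into (name, minimum)."""
    name, _, minimum = spec.partition(">=")
    return name.strip(), minimum.strip()


def version_tuple(version: str) -> tuple[int, ...]:
    """Keep the leading digits of each dotted part, e.g. '2.2.0+cu121' -> (2, 2, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically after padding both to the same length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(specs: list[str] = REQUIREMENTS) -> dict[str, str]:
    """Report 'ok', 'missing', or a 'too old' message for each requirement."""
    report = {}
    for spec in specs:
        name, minimum = parse_requirement(spec)
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report
```

Calling `check_requirements()` returns a dictionary keyed by package name, which can be printed before deciding which backends to install.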
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: media-ocr-ai
Download link: https://github.com/damionrashford/media-os/archive/main.zip#media-ocr-ai
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.