historical-ocr
Community
Turn archival scans into searchable text.
Education & Research
#ocr #tesseract #kraken #digital-humanities #historical transcription #word-confidence #post-correction
Author: xjtulyc
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill helps you transcribe historical documents: it converts noisy scans of archaic print into readable text, reports a confidence score for each word, and offers optional post-correction for early-modern spelling.
Core Features & Use Cases
- Historical OCR transcription: Uses Tesseract 5 LSTM with OpenCV preprocessing for printed historical text and difficult typography.
- Handwriting and typeface support: Optionally uses Kraken for historical typefaces and line segmentation to improve results on specialized letterforms.
- Quality control + post-correction: Filters low-confidence words and applies symspellpy correction using a historical frequency dictionary to reduce word error rate.
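The OCR and quality-control features above can be sketched roughly as follows. This is a minimal illustration, not the Skill's actual code: the function names `ocr_word_data` and `split_by_confidence`, the Otsu binarization step, and the default threshold of 60 are all assumptions chosen for the example. It relies on `pytesseract.image_to_data`, which returns parallel `text` and `conf` lists, with `conf` set to -1 for rows that are not words.

```python
from typing import Dict, List, Tuple


def ocr_word_data(image_path: str, lang: str = "eng") -> Dict[str, list]:
    """Run Tesseract on one page image and return its word-level table."""
    # Imported lazily so the filtering helper below can be used on its own.
    import cv2
    import pytesseract
    from pytesseract import Output

    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)
    # Otsu binarization: an assumed, simple preprocessing step for noisy scans.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # "--oem 1" selects Tesseract's LSTM engine.
    return pytesseract.image_to_data(
        binary, lang=lang, config="--oem 1", output_type=Output.DICT
    )


def split_by_confidence(
    data: Dict[str, list], min_conf: float = 60.0
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Split recognized words into (kept, flagged) by confidence."""
    kept: List[Tuple[str, float]] = []
    flagged: List[Tuple[str, float]] = []
    for text, conf in zip(data["text"], data["conf"]):
        word = str(text).strip()
        score = float(conf)  # conf may arrive as int, float, or str
        if not word or score < 0:
            continue  # skip block/line rows and empty tokens
        (kept if score >= min_conf else flagged).append((word, score))
    return kept, flagged
```

Flagged words are natural candidates for the post-correction step, for example by looking each one up in a symspellpy dictionary built from a historical word-frequency list.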
Quick Start
Use the Skill to transcribe a batch of scanned page images: run OCR on your preprocessed scans with a chosen language and confidence threshold to produce corrected text files and a quality_report.csv.
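As a rough sketch of what the batch output step might look like, the example below writes one text file per page plus a quality_report.csv. The column names (`page`, `words`, `mean_conf`, `low_conf_words`), the helper names, and the input shape (a mapping from page filename to `(word, confidence)` pairs) are hypothetical; the Skill's real report layout may differ. The spelling-correction step is left out here, so low-confidence words are simply dropped from the text.

```python
import csv
from pathlib import Path
from statistics import mean
from typing import Dict, List, Tuple

Words = List[Tuple[str, float]]
FIELDS = ["page", "words", "mean_conf", "low_conf_words"]  # assumed columns


def summarize_page(page: str, words: Words, min_conf: float = 60.0) -> dict:
    """Build one quality-report row for a page."""
    confs = [conf for _, conf in words]
    return {
        "page": page,
        "words": len(words),
        "mean_conf": round(mean(confs), 2) if confs else 0.0,
        "low_conf_words": sum(1 for conf in confs if conf < min_conf),
    }


def write_batch(pages: Dict[str, Words], out_dir: str, min_conf: float = 60.0) -> List[dict]:
    """Write <page>.txt files and quality_report.csv into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    rows = []
    for page, words in pages.items():
        # Keep only words at or above the confidence threshold.
        kept = [word for word, conf in words if conf >= min_conf]
        (out / f"{Path(page).stem}.txt").write_text(" ".join(kept), encoding="utf-8")
        rows.append(summarize_page(page, words, min_conf))
    with open(out / "quality_report.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling `write_batch({"page_001.png": words}, "out")` would create `out/page_001.txt` and `out/quality_report.csv`.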
Dependency Matrix
Required Modules
pytesseract, Pillow, opencv-python, numpy, pandas, symspellpy, kraken
Components
assets
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: historical-ocr
Download link: https://github.com/xjtulyc/awesome-rosetta-skills/archive/main.zip#historical-ocr
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.