historical-ocr

Community

Turn archival scans into searchable text.

Authorxjtulyc
Version1.0.0
Installs0

System Documentation

What problem does it solve?

This Skill helps you transcribe historical documents by converting noisy, archaic scans into readable text with word-level confidence and optional correction for early-modern spelling.

Core Features & Use Cases

  • Historical OCR transcription: Uses Tesseract 5 LSTM with OpenCV preprocessing for printed historical text and difficult typography.
  • Handwriting/typeface support workflow: Optionally uses Kraken for historical fonts/line segmentation to improve results on specialized letterforms.
  • Quality control + post-correction: Filters low-confidence words and applies symspellpy correction using a historical frequency dictionary to reduce word error rate.

Quick Start

Use the skill to transcribe a batch of scanned page images into corrected text files and a quality_report.csv by running OCR with language and confidence filtering on your preprocessed scans.

Dependency Matrix

Required Modules

pytesseractPillowopencv-pythonnumpypandassymspellpykraken

Components

assets

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: historical-ocr
Download link: https://github.com/xjtulyc/awesome-rosetta-skills/archive/main.zip#historical-ocr

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
View Source Repository

Agent Skills Search Helper

Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.