hebrew-ocr-forms
OfficialConvert Hebrew forms to structured data.
Authorskills-il
Version1.0.0
Installs0
System Documentation
What problem does it solve?
Hebrew government forms scanned for OCR produce unstructured text with mixed Hebrew/English, making manual data extraction slow and error-prone. This skill provides automated detection and extraction of form fields from common Israeli government forms (e.g., Tabu, Tofes 106, Tofes 857) and outputs structured data for downstream workflows, including validation of Israeli ID numbers and RTL text handling.
Core Features & Use Cases
- Preprocess and OCR Hebrew forms using Tesseract with heb+eng languages, handling RTL layout.
- Extract form-type specific fields (e.g., gush, chelka, owner; tax year, employer number, salary) and validate IDs.
- Output structured JSON (and optional raw OCR text) suitable for data pipelines or dashboards.
- Use Case: Batch process scanned Tabu documents to build a property registry dataset; use Case: generate employer reports from Tofes 106 forms.
Quick Start
Preprocess a scanned Hebrew form and run the extractor to obtain structured JSON data.
Dependency Matrix
Required Modules
pytesseractPillowopencv-pythonnumpy
Components
scriptsreferences
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: hebrew-ocr-forms Download link: https://github.com/skills-il/localization/archive/main.zip#hebrew-ocr-forms Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
Agent Skills Search Helper
Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.