hebrew-ocr-forms

Name: hebrew-ocr-forms
Availability: InStock
Author: skills-il

Official

Convert Hebrew forms to structured data.

Data & Analytics #ocr #forms #government #data-extraction #rtl #israel #hebrew

Authorskills-il

Version1.0.0

Installs0

System Documentation

What problem does it solve?

Hebrew government forms scanned for OCR produce unstructured text with mixed Hebrew/English, making manual data extraction slow and error-prone. This skill provides automated detection and extraction of form fields from common Israeli government forms (e.g., Tabu, Tofes 106, Tofes 857) and outputs structured data for downstream workflows, including validation of Israeli ID numbers and RTL text handling.

Core Features & Use Cases

Preprocess and OCR Hebrew forms using Tesseract with heb+eng languages, handling RTL layout.
Extract form-type specific fields (e.g., gush, chelka, owner; tax year, employer number, salary) and validate IDs.
Output structured JSON (and optional raw OCR text) suitable for data pipelines or dashboards.
Use Case: Batch process scanned Tabu documents to build a property registry dataset; use Case: generate employer reports from Tofes 106 forms.

Quick Start

Preprocess a scanned Hebrew form and run the extractor to obtain structured JSON data.

hebrew-ocr-forms

System Documentation

What problem does it solve?

Core Features & Use Cases

Quick Start

Dependency Matrix

Required Modules

Components

💻 Claude Code Installation

Agent Skills Search Helper