glmocr-sdk
Official
Instant structured extraction from images and PDFs
Author: zai-org
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill turns scanned pages, screenshots, and PDFs into machine-readable outputs so agents and pipelines can extract text, tables, formulas, and layout regions without manual copying or rekeying.
Core Features & Use Cases
- OCR with layout awareness: Produces labeled regions (title, text, table, formula, figure, etc.) with normalized bounding boxes on a 0–1000 scale.
- Dual interfaces: Available as a one-line Python API or as a CLI suited to batch processing. The CLI writes to stdout first, so agents can pipe results into tools like jq.
- Rich serialization and visualization: Exports JSON and Markdown, saves cropped images and optional layout visualizations, and supports MaaS/cloud or self-hosted modes for different deployment needs.
- Use Case: Convert a multi-page research paper or a folder of invoice scans into structured JSON for downstream analytics and summarization.
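Because bounding boxes are reported on a normalized 0–1000 scale, downstream code usually has to map them back to pixel coordinates before cropping or drawing. The sketch below shows one way to do that. It is illustrative only: the `[x1, y1, x2, y2]` corner order and the helper name `to_pixels` are assumptions, not part of the documented glmocr output schema, so check the actual JSON before relying on it.

```python
from typing import Sequence, Tuple

NORM_SCALE = 1000  # boxes are normalized to a 0-1000 scale, per the feature list


def to_pixels(
    bbox: Sequence[float], width: int, height: int
) -> Tuple[int, int, int, int]:
    """Map a normalized box to pixel coordinates for an image of the given size.

    Assumes the box is ordered [x1, y1, x2, y2] (hypothetical; verify against
    the real output before use).
    """
    x1, y1, x2, y2 = bbox
    return (
        round(x1 * width / NORM_SCALE),
        round(y1 * height / NORM_SCALE),
        round(x2 * width / NORM_SCALE),
        round(y2 * height / NORM_SCALE),
    )


# Example: a title region on a 1240 x 1754 pixel page scan.
print(to_pixels([100, 50, 900, 120], width=1240, height=1754))
```

Normalizing to a fixed scale keeps the region coordinates independent of render resolution, so the same JSON can be applied to a thumbnail or a full-resolution scan by changing only `width` and `height`.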
Quick Start
Call the glmocr CLI or the Python API to parse 'document.pdf' and return JSON regions plus a Markdown version.
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: glmocr-sdk
Download link: https://github.com/zai-org/GLM-skills/archive/main.zip#glmocr-sdk
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.