glmocr-sdk

Official

Instant structured extraction from images and PDFs

Author: zai-org
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill turns scanned pages, screenshots, and PDFs into machine-readable outputs so agents and pipelines can extract text, tables, formulas, and layout regions without manual copying or rekeying.

Core Features & Use Cases

  • OCR with layout awareness: Produces labeled regions (title, text, table, formula, figure, etc.) with normalized bounding boxes on a 0–1000 scale.
  • Dual interfaces: Offers a one-line Python API and a CLI. The CLI supports batch processing, writes results to stdout first, and pipes cleanly into tools like jq.
  • Rich serialization and visualization: Exports JSON and Markdown, saves cropped images and optional layout visualizations, and supports MaaS/cloud or self-hosted modes for different deployment needs.
  • Use Case: Convert a multi-page research paper or a folder of invoice scans into structured JSON for downstream analytics and summarization.
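
The normalized 0–1000 bounding boxes mentioned above have to be scaled back to pixel coordinates before cropping or drawing on the original page. The following is a minimal sketch of that conversion, not the SDK's own code: the helper name `to_pixels` and the `(x1, y1, x2, y2)` corner ordering are assumptions made for illustration, so check the actual output format before relying on it.

```python
from typing import Tuple

# Upper bound of the normalized coordinate scale described in the feature list.
NORM_SCALE = 1000

Box = Tuple[int, int, int, int]


def to_pixels(bbox: Box, width: int, height: int) -> Box:
    """Scale a 0-1000 normalized box to pixel coordinates.

    Assumes (hypothetically) that the box is ordered as
    (x1, y1, x2, y2), i.e. top-left then bottom-right corner.
    """
    x1, y1, x2, y2 = bbox
    return (
        round(x1 * width / NORM_SCALE),
        round(y1 * height / NORM_SCALE),
        round(x2 * width / NORM_SCALE),
        round(y2 * height / NORM_SCALE),
    )


# Example: a region on a page rendered at 1200 x 1600 pixels.
print(to_pixels((100, 250, 900, 500), width=1200, height=1600))
# → (120, 400, 1080, 800)
```

Because the coordinates are resolution-independent, the same region can be mapped onto any rendering of the page by passing that rendering's width and height.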

Quick Start

Call the glmocr CLI or the Python API to parse 'document.pdf' and return JSON regions plus a Markdown version.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: glmocr-sdk
Download link: https://github.com/zai-org/GLM-skills/archive/main.zip#glmocr-sdk

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
