qianfanocr-document-intelligence
Official
Turn visual documents into structured insights.
Category: Software Engineering
Tags: #ocr #pdf #image-processing #layout-analysis #document-intelligence #chart-understanding
Author: baidubce
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Analyzes images, image URLs, PDFs, and PDF URLs so that agents can recognize text, extract content, and answer questions about visual inputs. The skill coordinates token setup, mode selection, and downstream tooling to produce structured results.
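The four input types named above (images, image URLs, PDFs, PDF URLs) can be told apart before any OCR call is made. The following is a minimal sketch of that routing step, not the skill's actual code: `InputKind`, `classify_input`, and the list of accepted image extensions are all hypothetical names and assumptions introduced for illustration.

```python
from enum import Enum
from pathlib import PurePosixPath
from urllib.parse import urlparse


class InputKind(Enum):
    """Hypothetical labels for the four input types the skill accepts."""
    IMAGE_FILE = "image_file"
    IMAGE_URL = "image_url"
    PDF_FILE = "pdf_file"
    PDF_URL = "pdf_url"


# Assumption: the listing does not say which image formats are supported.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".webp", ".tif", ".tiff"}


def classify_input(source: str) -> InputKind:
    """Classify a source string by URL scheme and file extension."""
    parsed = urlparse(source)
    is_url = parsed.scheme in ("http", "https")
    # For URLs, look only at the path so query strings do not hide the extension.
    path = parsed.path if is_url else source
    suffix = PurePosixPath(path).suffix.lower()
    if suffix == ".pdf":
        return InputKind.PDF_URL if is_url else InputKind.PDF_FILE
    if suffix in IMAGE_EXTENSIONS:
        return InputKind.IMAGE_URL if is_url else InputKind.IMAGE_FILE
    raise ValueError(f"Unsupported input: {source!r}")
```

Classifying by extension is the simplest possible approach; a real implementation might instead inspect the file header or the HTTP content type, since URLs do not always end in a file extension.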
Core Features & Use Cases
- Supports multiple input types (images and PDFs) and per-page outputs, including layout-aware parsing to preserve structure.
- Provides eight modes: document parsing, layout analysis, element recognition, document parsing with layout, general OCR, key information extraction, chart understanding, and document visual question answering (doc VQA). References and assets are loaded as needed.
- Use Case: automate extraction of key fields from documents (invoices, contracts) and generate structured data for downstream automation.
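The eight modes listed above lend themselves to an enumeration, so that an agent can validate a requested mode before dispatching work. This is a sketch under stated assumptions: the `Mode` enum, its string values, and the `parse_mode` helper are hypothetical and are not taken from the skill's scripts.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical identifiers for the modes named in the feature list."""
    DOCUMENT_PARSING = "document_parsing"
    LAYOUT_ANALYSIS = "layout_analysis"
    ELEMENT_RECOGNITION = "element_recognition"
    DOCUMENT_PARSING_WITH_LAYOUT = "document_parsing_with_layout"
    GENERAL_OCR = "general_ocr"
    KEY_INFORMATION_EXTRACTION = "key_information_extraction"
    CHART_UNDERSTANDING = "chart_understanding"
    DOC_VQA = "doc_vqa"


def parse_mode(label: str) -> Mode:
    """Normalize a human-readable label (case, spaces, hyphens) to a Mode.

    Raises ValueError if the label does not match any known mode.
    """
    key = "_".join(label.strip().lower().replace("-", " ").split())
    return Mode(key)
```

For the invoice use case, `parse_mode("Key Information Extraction")` would return `Mode.KEY_INFORMATION_EXTRACTION`, while an unknown label raises `ValueError` instead of silently falling back to a default mode.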
Quick Start
Provide an image or PDF and the skill will orchestrate OCR and document understanding to return a structured result.
Dependency Matrix
Required Modules
Pillow
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: qianfanocr-document-intelligence
Download link: https://github.com/baidubce/skills/archive/main.zip#qianfanocr-document-intelligence
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.