qianfanocr-document-intelligence

Official

Turn visual documents into structured insights.

Author: baidubce
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Analyzes images, image URLs, PDFs, and PDF URLs to recognize text, extract information, and answer questions about their content. It coordinates token setup, mode selection, and downstream tooling to produce structured results for agents.
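The four input kinds named above (image, image URL, PDF, PDF URL) can be told apart before any OCR call is made. The sketch below is only an illustration of that routing step, not the skill's actual code: the function name `classify_input`, the returned labels, and the list of image extensions are all assumptions made for this example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed set of image extensions; the skill's real list is not documented here.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".webp", ".tif", ".tiff"}


def classify_input(source: str) -> str:
    """Return one of 'image', 'image_url', 'pdf', 'pdf_url' (hypothetical labels)."""
    parsed = urlparse(source)
    is_url = parsed.scheme in ("http", "https")
    # For URLs, look only at the path so query strings do not hide the extension.
    path = parsed.path if is_url else source
    suffix = PurePosixPath(path).suffix.lower()
    if suffix == ".pdf":
        return "pdf_url" if is_url else "pdf"
    if suffix in IMAGE_EXTENSIONS:
        return "image_url" if is_url else "image"
    raise ValueError(f"Unsupported input: {source!r}")
```

Classifying by file extension is a deliberate simplification; a more robust version might inspect the file header or the HTTP content type instead.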

Core Features & Use Cases

  • Supports multiple input types (images and PDFs) and per-page outputs, including layout-aware parsing to preserve structure.
  • Provides eight modes: document parsing, layout analysis, element recognition, document parsing with layout, general OCR, key information extraction, chart understanding, and document visual question answering (doc VQA). References and assets are loaded as needed.
  • Use Case: automate extraction of key fields from documents (invoices, contracts) and generate structured data for downstream automation.
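To make the mode list concrete, here is a minimal sketch of how an agent might represent the eight modes and assemble a request for one of them. Everything below is hypothetical: the `Mode` enum, the `build_request` helper, the dictionary keys, and the rule that doc VQA needs a question and key information extraction needs a field list are assumptions for illustration, not the skill's documented interface.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    # Names mirror the modes listed above; the string values are illustrative.
    DOCUMENT_PARSING = "document_parsing"
    LAYOUT_ANALYSIS = "layout_analysis"
    ELEMENT_RECOGNITION = "element_recognition"
    DOCUMENT_PARSING_WITH_LAYOUT = "document_parsing_with_layout"
    GENERAL_OCR = "general_ocr"
    KEY_INFORMATION_EXTRACTION = "key_information_extraction"
    CHART_UNDERSTANDING = "chart_understanding"
    DOC_VQA = "doc_vqa"


def build_request(source: str, mode: Mode,
                  question: Optional[str] = None,
                  fields: Optional[list] = None) -> dict:
    """Assemble a hypothetical request payload for the chosen mode."""
    request = {"source": source, "mode": mode.value}
    if mode is Mode.DOC_VQA:
        # Assumption: question answering needs a question to answer.
        if not question:
            raise ValueError("doc VQA requires a question")
        request["question"] = question
    if mode is Mode.KEY_INFORMATION_EXTRACTION:
        # Assumption: extraction is driven by a list of field names.
        if not fields:
            raise ValueError("key information extraction requires field names")
        request["fields"] = list(fields)
    return request
```

For the invoice use case, a call might look like `build_request("invoice.pdf", Mode.KEY_INFORMATION_EXTRACTION, fields=["invoice_number", "total"])`, where the field names are again placeholders.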

Quick Start

Provide an image or PDF and the skill will orchestrate OCR and document understanding to return a structured result.

Dependency Matrix

Required Modules

Pillow

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: qianfanocr-document-intelligence
Download link: https://github.com/baidubce/skills/archive/main.zip#qianfanocr-document-intelligence

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
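For a manual install, the extraction step can be sketched as follows. This is an illustration under stated assumptions rather than an official installer: the helper name `extract_skill` is hypothetical, and because the internal layout of the archive is not described here, the function simply looks for a folder named after the skill at any depth and copies its files into the skills directory.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_skill(archive, skill_name: str, skills_dir) -> Path:
    """Copy the folder called `skill_name` from a zip archive into skills_dir.

    `archive` may be a file path or a file-like object (both accepted by ZipFile).
    """
    target_root = Path(skills_dir) / skill_name
    extracted = 0
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or skill_name not in parts:
                continue
            # Keep only the path components below the skill folder.
            relative = parts[parts.index(skill_name) + 1:]
            if not relative or ".." in relative:
                continue  # skip odd entries and anything trying to escape the target
            destination = target_root.joinpath(*relative)
            destination.parent.mkdir(parents=True, exist_ok=True)
            destination.write_bytes(zf.read(info))
            extracted += 1
    if extracted == 0:
        raise FileNotFoundError(f"No folder named {skill_name!r} in the archive")
    return target_root
```

After downloading the archive from the link above, a call such as `extract_skill("main.zip", "qianfanocr-document-intelligence", ".claude/skills")` would place the skill's files under `.claude/skills/qianfanocr-document-intelligence`, assuming the archive contains a folder with that name.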