gemini-document-processing
Unlock PDF insights with AI vision.
System Documentation
What problem does it solve?
Extracting structured data from complex PDF documents, summarizing them, or answering questions about them is often a manual and error-prone process, especially when the documents contain images, charts, or tables. This skill uses Google Gemini's native vision capabilities to automate document understanding, reducing manual review time and transcription errors.
Core Features & Use Cases
- Multimodal PDF Analysis: Understands text, images, diagrams, charts, and tables within PDFs up to 1,000 pages.
- Structured Data Extraction: Extract specific information (e.g., invoice details, resume fields) into JSON format with schema validation.
- Intelligent Summarization & Q&A: Generate concise summaries or get direct answers to questions based on document content.
- Use Case: Automatically process a batch of legal contracts to extract key clauses, effective dates, and party names, then summarize each contract's obligations, all without manual review.
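The structured-extraction feature above can be sketched with the `google-genai` SDK. This is a minimal sketch, not the skill's own implementation: the invoice schema, its field names, the `extract_invoice` and `missing_fields` helpers, and the model name are all assumptions chosen for illustration. It sends the PDF inline, asks for JSON constrained by a schema, and then checks the result for missing required fields.

```python
import json
import os
from pathlib import Path

# Hypothetical schema for illustration; the field names are assumptions.
INVOICE_SCHEMA = {
    "type": "OBJECT",
    "properties": {
        "invoice_number": {"type": "STRING"},
        "total": {"type": "NUMBER"},
    },
    "required": ["invoice_number", "total"],
}


def missing_fields(record: dict, schema: dict) -> list:
    """Return the required schema keys that are absent from the record."""
    return [key for key in schema.get("required", []) if key not in record]


def extract_invoice(pdf_path: str, model: str = "gemini-2.5-flash") -> dict:
    """Ask Gemini for schema-constrained JSON describing one PDF.

    The model name is an assumed default; substitute any PDF-capable model.
    """
    # Imported here so the helper above works without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    pdf_part = types.Part.from_bytes(
        data=Path(pdf_path).read_bytes(), mime_type="application/pdf"
    )
    response = client.models.generate_content(
        model=model,
        contents=[pdf_part, "Extract the invoice fields."],
        config=types.GenerateContentConfig(
            response_mime_type="application/json",
            response_schema=INVOICE_SCHEMA,
        ),
    )
    record = json.loads(response.text)
    absent = missing_fields(record, INVOICE_SCHEMA)
    if absent:
        raise ValueError(f"Response is missing required fields: {absent}")
    return record
```

Checking required fields after parsing is a cheap second guard on top of the schema passed to the model. Very large PDFs may not fit in an inline request, in which case uploading the file first is likely the better route.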
Quick Start
1. Get your Gemini API key: https://aistudio.google.com/apikey
2. Set API key as environment variable:
export GEMINI_API_KEY="your-api-key-here"
3. Install dependencies:
pip install google-genai python-dotenv
4. Use the provided script to summarize a PDF:
python .claude/skills/gemini-document-processing/scripts/process-document.py \
  --file your_document.pdf \
  --prompt "Provide a concise executive summary"
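The internals of the provided script are not shown on this page. For readers who want to see roughly what such a call involves, here is a minimal sketch of a comparable command-line tool using the two dependencies installed in step 3. The `--file` and `--prompt` flags mirror the command above; the `--model` flag, its default value, and the function names are assumptions and may not match the actual script.

```python
import argparse
import os
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the same --file/--prompt flags used in the Quick Start."""
    parser = argparse.ArgumentParser(description="Ask Gemini about a PDF")
    parser.add_argument("--file", required=True, help="Path to the PDF")
    parser.add_argument("--prompt", required=True, help="Instruction for the model")
    # Assumed flag and default; not taken from the original script.
    parser.add_argument("--model", default="gemini-2.5-flash")
    return parser.parse_args(argv)


def main(argv=None) -> None:
    """Send the PDF and the prompt to Gemini and print the text reply."""
    # Third-party imports are local so parse_args works without them.
    from dotenv import load_dotenv
    from google import genai
    from google.genai import types

    args = parse_args(argv)
    load_dotenv()  # reads GEMINI_API_KEY from a .env file if one exists
    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    pdf_part = types.Part.from_bytes(
        data=Path(args.file).read_bytes(), mime_type="application/pdf"
    )
    response = client.models.generate_content(
        model=args.model, contents=[pdf_part, args.prompt]
    )
    print(response.text)
```

Calling `main()` at the bottom of a file turns this into a runnable script. Because `load_dotenv()` does not override variables that are already set, a key exported in the shell as in step 2 takes precedence over one stored in `.env`.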
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: gemini-document-processing
Download link: https://github.com/mrgoonie/claudekit-skills/archive/main.zip#gemini-document-processing
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.