extract-from-pdfs
Turn PDFs into structured data for analysis.
System Documentation
What problem does it solve?
This Skill turns large collections of scientific PDFs into structured data ready for meta-analyses, systematic reviews, and database creation, automating extraction, validation, enrichment, and export.
Core Features & Use Cases
- Organize metadata from BibTeX, RIS, directories, or DOI lists
- Filter papers by abstract (Claude or local models) to focus on relevant literature
- Extract structured data from full PDFs using Claude's vision capabilities
- Repair and validate outputs, enrich with external databases, and export to Python, R, CSV, Excel, or SQLite
- Use case: conduct a rapid systematic review across hundreds of papers
Quick Start
Prepare your metadata and extraction schema, then run the six pipeline scripts in order, as outlined in the repository workflow: 01_organize_metadata.py, 02_filter_abstracts.py, 03_extract_from_pdfs.py, 04_repair_json.py, 05_validate_with_apis.py, and 06_export_database.py. Example files are provided in assets.
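The six-step sequence can be driven from a small wrapper. The sketch below is only an illustration: the script names come from the Quick Start text, but the `scripts` directory, the helper names `build_commands` and `run_pipeline`, and the idea that each script can be called without arguments are assumptions. Check each script's own help output for the arguments it actually expects.

```python
import subprocess
import sys
from pathlib import Path

# Script names as listed in the Quick Start; their location and their
# command-line arguments are NOT documented here, so both are assumptions.
PIPELINE_STEPS = [
    "01_organize_metadata.py",
    "02_filter_abstracts.py",
    "03_extract_from_pdfs.py",
    "04_repair_json.py",
    "05_validate_with_apis.py",
    "06_export_database.py",
]


def build_commands(scripts_dir, extra_args=None):
    """Return one command list per step, in pipeline order.

    extra_args is a hypothetical mapping {script_name: [arg, ...]} for
    whatever arguments your copy of each script requires.
    """
    extra_args = extra_args or {}
    return [
        [sys.executable, str(Path(scripts_dir) / step), *extra_args.get(step, [])]
        for step in PIPELINE_STEPS
    ]


def run_pipeline(scripts_dir, extra_args=None):
    """Run every step in sequence, stopping at the first failure."""
    for cmd in build_commands(scripts_dir, extra_args):
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError if a step exits non-zero,
        # so later steps never run on incomplete output.
        subprocess.run(cmd, check=True)


# Example call (assumes the scripts live in ./scripts):
# run_pipeline("scripts")
```

Stopping at the first failing step matters here because each stage consumes the previous stage's output.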
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: extract-from-pdfs
Download link: https://github.com/brunoasm/my_claude_skills/archive/main.zip#extract-from-pdfs
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
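For a manual install, the same steps (download, extract, copy into .claude/skills/) can be sketched in Python. This is a minimal sketch, not the official installer: the helper name `extract_skill` is hypothetical, and it assumes the skill lives in a folder named extract-from-pdfs somewhere inside the archive, which the listing does not confirm.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_skill(archive, skill_name, skills_dir):
    """Copy the files under a `<skill_name>/` folder of a zip archive
    into `<skills_dir>/<skill_name>/` and return the written paths.

    `archive` may be a path or a file-like object. The folder layout
    inside the archive is an assumption, not documented behaviour.
    """
    dest_root = Path(skills_dir) / skill_name
    written = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            parts = PurePosixPath(info.filename).parts
            # Keep only entries that sit inside a folder named skill_name.
            if skill_name not in parts[:-1]:
                continue
            rel = parts[parts.index(skill_name) + 1:]
            if ".." in rel:
                continue  # skip suspicious paths instead of writing them
            target = dest_root.joinpath(*rel)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(info))
            written.append(target)
    return written


# Example (downloads from the link above, then installs into the project):
# import urllib.request
# url = "https://github.com/brunoasm/my_claude_skills/archive/main.zip"
# urllib.request.urlretrieve(url, "main.zip")
# extract_skill("main.zip", "extract-from-pdfs", ".claude/skills")
```

If the function returns an empty list, the folder name assumption did not hold; inspect the archive contents and adjust `skill_name` accordingly.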