extract-from-pdfs

Community

Turn PDFs into structured data for analysis.

Author: brunoasm
Version: 1.0.0

System Documentation

What problem does it solve?

This Skill turns large collections of scientific PDFs into structured data ready for meta-analyses, systematic reviews, and database creation, automating extraction, validation, enrichment, and export.

Core Features & Use Cases

  • Organize metadata from BibTeX, RIS, directories, or DOI lists
  • Filter papers by abstract (Claude or local models) to focus on relevant literature
  • Extract structured data from full PDFs using Claude's vision capabilities
  • Repair and validate outputs, enrich with external databases, and export to Python, R, CSV, Excel, or SQLite
  • Use case: conduct a rapid systematic review across hundreds of papers
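The export step is the easiest part of the pipeline to picture. Below is a minimal sketch using only the standard library's csv and sqlite3 modules. It is not the skill's 06_export_database.py. The record fields ("doi", "species", "sample_size") and the table name "extractions" are hypothetical placeholders for whatever your extraction schema defines.

```python
import csv
import io
import sqlite3

# Hypothetical extracted records, one dict per paper. Field names are
# illustrative assumptions, not the skill's actual schema.
records = [
    {"doi": "10.1000/example.1", "species": "Apis mellifera", "sample_size": 42},
    {"doi": "10.1000/example.2", "species": "Bombus terrestris", "sample_size": 17},
]


def export_csv(rows, fieldnames):
    """Serialize rows to a CSV string with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def export_sqlite(rows, conn, table="extractions"):
    """Create an untyped table from the first row's keys and insert all rows.
    Table and column names are interpolated, so use trusted names only."""
    cols = list(rows[0].keys())
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(cols)})")
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})",
        [tuple(row[c] for c in cols) for row in rows],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
export_sqlite(records, conn)
csv_text = export_csv(records, ["doi", "species", "sample_size"])
```

Keeping one flat dict per paper makes the same records easy to hand to CSV, SQLite, or a pandas DataFrame without reshaping.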

Quick Start

Start by preparing your metadata and extraction schema, then run the six-step pipeline described in the repository's workflow. Example files are provided in the assets directory. Run the steps in this order:

  • 01_organize_metadata.py
  • 02_filter_abstracts.py
  • 03_extract_from_pdfs.py
  • 04_repair_json.py
  • 05_validate_with_apis.py
  • 06_export_database.py
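Because each step consumes the previous step's output, the sequence can be driven by a small runner that stops at the first failure. This is a hedged sketch: it assumes the scripts live in a scripts/ directory and run without arguments, which may not match the real scripts (they likely take input/output flags; check each script's help before running). It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys
from pathlib import Path

# The six pipeline steps, in the order given in the Quick Start.
STEPS = [
    "01_organize_metadata.py",
    "02_filter_abstracts.py",
    "03_extract_from_pdfs.py",
    "04_repair_json.py",
    "05_validate_with_apis.py",
    "06_export_database.py",
]


def build_commands(scripts_dir="scripts"):
    """One command per step. Assumption: no arguments are required."""
    return [[sys.executable, str(Path(scripts_dir) / name)] for name in STEPS]


def run_pipeline(scripts_dir="scripts", dry_run=True):
    """Run each step in order; check=True raises CalledProcessError on the
    first non-zero exit, so later steps never see incomplete output."""
    commands = build_commands(scripts_dir)
    for cmd in commands:
        print("->", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```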

Dependency Matrix

Required Modules

  • anthropic>=0.40.0
  • pybtex>=0.24.0
  • rispy>=0.6.0
  • json-repair>=0.25.0
  • jsonschema>=4.20.0
  • pandas>=2.0.0
  • openpyxl>=3.1.0
  • pyreadr>=0.5.0
  • requests>=2.31.0
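Before running the pipeline, it can help to confirm that each required module is installed at or above its minimum version. The sketch below uses importlib.metadata from the standard library. Its version comparison is deliberately simplified (numeric parts only, so pre-release suffixes are ignored); the packaging library's Version class would handle those cases exactly.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the dependency matrix.
REQUIREMENTS = [
    "anthropic>=0.40.0", "pybtex>=0.24.0", "rispy>=0.6.0",
    "json-repair>=0.25.0", "jsonschema>=4.20.0", "pandas>=2.0.0",
    "openpyxl>=3.1.0", "pyreadr>=0.5.0", "requests>=2.31.0",
]


def parse_requirement(spec):
    """Split 'name>=1.2.3' into ('name', (1, 2, 3))."""
    name, minimum = spec.split(">=")
    return name.strip(), tuple(int(p) for p in minimum.split("."))


def as_tuple(ver):
    """Leading numeric parts of a version string: '2.2.3rc1' -> (2, 2, 3)."""
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets(installed, minimum):
    """Compare after zero-padding, so (2, 0) counts as equal to (2, 0, 0)."""
    width = max(len(installed), len(minimum))

    def pad(t):
        return t + (0,) * (width - len(t))

    return pad(installed) >= pad(minimum)


def check(requirements=REQUIREMENTS):
    """Return {name: 'ok' | 'too old' | 'missing'}."""
    report = {}
    for spec in requirements:
        name, minimum = parse_requirement(spec)
        try:
            installed = as_tuple(version(name))
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets(installed, minimum) else "too old"
    return report


if __name__ == "__main__":
    for name, status in check().items():
        print(f"{name}: {status}")
```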

Components

scripts, references, assets

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: extract-from-pdfs
Download link: https://github.com/brunoasm/my_claude_skills/archive/main.zip#extract-from-pdfs

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
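If you prefer to install by hand, the steps Claude is asked to perform can be sketched in Python. This assumes two things not stated above: that the archive has a single top-level folder (GitHub archives are typically named like my_claude_skills-main/), and that the extract-from-pdfs folder sits directly inside it. The function names install_skill and download are hypothetical. The #extract-from-pdfs fragment is dropped because it is not part of the download request.

```python
import io
import shutil
import urllib.request
import zipfile
from pathlib import Path

URL = "https://github.com/brunoasm/my_claude_skills/archive/main.zip"


def install_skill(zip_bytes, skill="extract-from-pdfs", dest=".claude/skills"):
    """Copy <top-level>/<skill>/... from the archive into dest/<skill>/..."""
    target = Path(dest) / skill
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for member in zf.infolist():
            parts = Path(member.filename).parts
            # parts[0] is the archive's top-level folder, parts[1] the skill name.
            # Skip directories, other skills, and any path-traversal entries.
            if member.is_dir() or len(parts) < 3 or parts[1] != skill or ".." in parts:
                continue
            out = target.joinpath(*parts[2:])
            out.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(member) as src, open(out, "wb") as dst:
                shutil.copyfileobj(src, dst)
    return target


def download(url=URL):
    """Fetch the repository archive as bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()
```

Calling install_skill(download()) would then place the skill under .claude/skills/extract-from-pdfs relative to the current directory.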
