scientific-papers-to-dataset

Name: scientific-papers-to-dataset
Author: eamag

Extract datasets from academic papers

Education & Research #automation #pdf #data-extraction #openalex #citation-graph #dataset-creation #bfs-queue

Version: 1.0.0

System Documentation

What problem does it solve?

Many research questions require structured experimental data that is scattered across academic papers and available only as PDFs. Finding, downloading, filtering, and extracting these results by hand is slow and error-prone. This Skill automates discovery, PDF retrieval, relevance filtering, data extraction, and citation traversal so users can assemble reproducible datasets from the literature.

Core Features & Use Cases

Automated Paper Discovery: Query OpenAlex to find seed works and batch-fetch metadata and IDs.
Robust PDF Retrieval: Attempt PDF downloads from OpenAlex locations, bioRxiv, and Unpaywall with rate limiting and fallbacks.
Relevance Filtering & Extraction Pipeline: Use a queue-based BFS workflow with relevance checks and a thinking-model-driven extractor to produce per-paper JSON outputs.
Use Case: Create a dataset of compound toxicity measurements by searching OpenAlex, downloading accessible PDFs, extracting experimental values into structured JSON, and expanding via cited and citing works.
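The queue-based BFS workflow described above can be sketched as follows. This is a minimal illustration, not the Skill's actual implementation: the names `crawl`, `neighbors`, `is_relevant`, and `max_papers` are hypothetical, and the sketch assumes that papers failing the relevance check are not expanded further. In practice `neighbors` would return the cited and citing works of a paper, and `is_relevant` would wrap the relevance filter.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(
    seeds: Iterable[str],
    neighbors: Callable[[str], Iterable[str]],  # hypothetical: cited + citing work IDs
    is_relevant: Callable[[str], bool],         # hypothetical: relevance check
    max_papers: int = 100,                      # assumed cap on processed papers
) -> List[str]:
    """Breadth-first traversal of a citation graph, keeping relevant papers."""
    queue = deque(seeds)
    seen = set(queue)   # every ID that has ever been enqueued
    kept: List[str] = []
    processed = 0

    while queue and processed < max_papers:
        paper_id = queue.popleft()  # FIFO order gives breadth-first traversal
        processed += 1

        # Assumption: irrelevant papers are dropped and not expanded.
        if not is_relevant(paper_id):
            continue

        kept.append(paper_id)
        for next_id in neighbors(paper_id):
            if next_id not in seen:
                seen.add(next_id)
                queue.append(next_id)

    return kept
```

Tracking `seen` at enqueue time rather than at dequeue time keeps a paper from entering the queue twice when several relevant papers cite it.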

Quick Start

Create a new project by describing the dataset you want. Run the initial OpenAlex search to seed the queue, then process the queue to download PDFs, filter for relevance, and extract structured JSON data.
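As a rough sketch of the seeding and PDF-lookup steps, the helpers below build an OpenAlex works search URL and collect candidate PDF links from a work record. The function names `seed_search_url` and `pdf_candidates` are hypothetical and not part of the Skill. The sketch assumes the OpenAlex `search`, `per-page`, and `mailto` query parameters and the `locations[].pdf_url` and `doi` fields of a work record, plus the Unpaywall v2 lookup endpoint, which returns JSON metadata rather than the PDF itself. No network request is made here.

```python
from typing import List, Optional
from urllib.parse import quote, urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def seed_search_url(query: str, per_page: int = 25, mailto: Optional[str] = None) -> str:
    """Build the URL for an initial OpenAlex works search (hypothetical helper)."""
    params = {"search": query, "per-page": per_page}
    if mailto:
        params["mailto"] = mailto  # optional contact address for OpenAlex
    return f"{OPENALEX_WORKS}?{urlencode(params)}"


def pdf_candidates(work: dict, email: str) -> List[str]:
    """List URLs to try for one work, in fallback order (hypothetical helper)."""
    urls: List[str] = []

    # First choice: direct PDF links listed in the OpenAlex record.
    for location in work.get("locations") or []:
        pdf_url = location.get("pdf_url")
        if pdf_url and pdf_url not in urls:
            urls.append(pdf_url)

    # Fallback: an Unpaywall lookup URL (returns JSON that may point to a PDF).
    doi = work.get("doi")
    if doi:
        bare_doi = doi.replace("https://doi.org/", "", 1)
        urls.append(f"https://api.unpaywall.org/v2/{quote(bare_doi)}?email={quote(email)}")

    return urls
```

A caller would fetch the search URL, enqueue the returned work IDs, and try each candidate URL in order with a delay between requests to respect rate limits.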
