databricks-unstructured-pdf-generation

Community

Generate synthetic PDFs for RAG and testing.

Author: datasciencemonkey
Version: 1.0.0

System Documentation

What problem does it solve?

Generate realistic synthetic PDF documents for Retrieval-Augmented Generation (RAG) pipelines and other unstructured-data use cases, so those pipelines can be tested and demonstrated without relying on real documents.

Core Features & Use Cases

  • Generate PDFs with LLM-written content, accompanied by JSON metadata for RAG evaluation.
  • Support configurable catalogs, schemas, and target counts to produce diverse datasets.
  • Automatically stage PDFs for indexing or upload to Unity Catalog volumes.
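Unity Catalog volumes are addressed with paths of the form `/Volumes/<catalog>/<schema>/<volume>/`. The sketch below shows where staged files could land. It assumes a hypothetical volume name (`raw_docs`) and a hypothetical convention in which each PDF's JSON metadata file shares the PDF's file stem. The skill's actual volume name and file layout may differ.

```python
from pathlib import PurePosixPath


def volume_dir(catalog: str, schema: str, volume: str) -> PurePosixPath:
    """Return the Unity Catalog volume path /Volumes/<catalog>/<schema>/<volume>."""
    for part in (catalog, schema, volume):
        # Reject empty names and names that would add extra path segments.
        if not part or "/" in part:
            raise ValueError(f"invalid identifier: {part!r}")
    return PurePosixPath("/Volumes", catalog, schema, volume)


def staged_pairs(catalog: str, schema: str, volume: str,
                 doc_names: list[str]) -> list[tuple[PurePosixPath, PurePosixPath]]:
    """Pair each PDF with a JSON metadata file of the same stem.

    The shared-stem naming is an assumed convention for illustration only.
    """
    base = volume_dir(catalog, schema, volume)
    return [(base / f"{name}.pdf", base / f"{name}.json") for name in doc_names]


# "main", "rag_demo", "raw_docs" and the document names are placeholders.
pairs = staged_pairs("main", "rag_demo", "raw_docs", ["policy_001", "policy_002"])
for pdf_path, meta_path in pairs:
    print(pdf_path, meta_path)
```

`PurePosixPath` is used because volume paths are always POSIX-style, regardless of the operating system that runs the script.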

Quick Start

Call the generate_pdf_documents MCP tool with catalog, schema, description, and count to generate synthetic PDFs.
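An MCP client such as Claude Code normally issues this call for you. To make the four parameters explicit, the sketch below builds the equivalent MCP `tools/call` request, a JSON-RPC 2.0 message whose `params` carry the tool `name` and its `arguments`. The argument values are placeholders. Treating `count` as a positive integer is an assumption, and the tool may accept further arguments that are not listed here.

```python
import json


def build_generate_pdf_call(catalog: str, schema: str, description: str,
                            count: int, request_id: int = 1) -> dict:
    """Build an MCP `tools/call` JSON-RPC 2.0 request for generate_pdf_documents.

    Only the four arguments named in the Quick Start are included; the
    tool's full input schema is not documented on this page.
    """
    # Assumption: count is the number of PDFs and must be a positive integer.
    if not isinstance(count, int) or count < 1:
        raise ValueError("count must be a positive integer")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "generate_pdf_documents",
            "arguments": {
                "catalog": catalog,
                "schema": schema,
                "description": description,
                "count": count,
            },
        },
    }


request = build_generate_pdf_call(
    catalog="main",        # placeholder catalog
    schema="rag_demo",     # placeholder schema
    description="HR policy documents for a fictional mid-size company",
    count=10,
)
print(json.dumps(request, indent=2))
```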

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: databricks-unstructured-pdf-generation
Download link: https://github.com/datasciencemonkey/coding-agents-databricks-apps/archive/main.zip#databricks-unstructured-pdf-generation

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
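If you prefer to install by hand, the steps in that prompt (download, extract, copy into `.claude/skills/`) can be sketched in Python. The sketch assumes the skill is a folder named `databricks-unstructured-pdf-generation` that contains a `SKILL.md` file somewhere inside the repository archive. That folder's exact location is not stated on this page, so the code searches for it instead of hard-coding a path. The `#databricks-unstructured-pdf-generation` fragment is dropped from the download URL because HTTP clients do not send URL fragments to the server.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path

SKILL_NAME = "databricks-unstructured-pdf-generation"
ARCHIVE_URL = ("https://github.com/datasciencemonkey/"
               "coding-agents-databricks-apps/archive/main.zip")


def download_archive(dest: Path, url: str = ARCHIVE_URL) -> Path:
    """Download the repository archive to dest and return dest."""
    urllib.request.urlretrieve(url, dest)
    return dest


def install_skill_from_zip(zip_path: Path, skills_dir: Path,
                           skill_name: str = SKILL_NAME) -> Path:
    """Extract zip_path, locate <skill_name>/SKILL.md, and copy that folder
    into skills_dir/<skill_name>. Returns the installed folder."""
    extract_dir = zip_path.parent / (zip_path.stem + "_extracted")
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(extract_dir)
    # Search instead of hard-coding: the folder's position in the archive
    # is an unknown here.
    matches = [p.parent for p in extract_dir.rglob("SKILL.md")
               if p.parent.name == skill_name]
    if not matches:
        raise FileNotFoundError(f"no {skill_name}/SKILL.md found in {zip_path}")
    target = skills_dir / skill_name
    if target.exists():
        shutil.rmtree(target)  # replace any previously installed copy
    skills_dir.mkdir(parents=True, exist_ok=True)
    shutil.copytree(matches[0], target)
    return target
```

Usage, run from your project root: `install_skill_from_zip(download_archive(Path("skills-main.zip")), Path(".claude/skills"))`. Splitting the download from the install step keeps the install logic testable against a local archive.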
