databricks-synthetic-data-generation

Community

Generate realistic synthetic data for Databricks.

Author: datasciencemonkey
Version: 1.0.0

System Documentation

What problem does it solve?

It generates realistic synthetic data on Databricks for building test datasets and demonstrations.

Core Features & Use Cases

  • Generate synthetic data with Faker and Spark, including non-linear distributions and temporal patterns.
  • Preserve referential integrity across related tables (customers, orders, tickets) and optionally enforce business rules.
  • Save outputs to Databricks volumes for downstream Spark Declarative Pipelines (SDP) and dashboards.
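The capabilities above can be illustrated with a small, dependency-free sketch. It is a stand-in under stated assumptions, not the skill's code: Python's standard random module replaces Faker and Spark so it runs anywhere, and the column names, weekday weights, lognormal parameters, and the rule that a ticket opens on or after its order date are all hypothetical. Referential integrity comes from drawing every foreign key only from keys that already exist in the parent table.

```python
import random
from datetime import date, timedelta

# Assumption: stdlib `random` stands in for Faker/Spark so this sketch runs anywhere.
SEGMENTS = ["consumer", "smb", "enterprise"]
# Hypothetical weekday weights (Mon..Sun): busier Thu/Fri, quieter weekend.
WEEKDAY_WEIGHTS = [1.0, 1.0, 1.0, 1.1, 1.3, 0.6, 0.5]


def make_customers(n, rng):
    """Parent table: every customer gets a unique integer key."""
    return [
        {"customer_id": i, "segment": rng.choice(SEGMENTS)}
        for i in range(1, n + 1)
    ]


def make_orders(customers, n, rng, start=date(2024, 1, 1), days=365):
    """Child table: each order points at an existing customer_id."""
    calendar = [start + timedelta(days=d) for d in range(days)]
    weights = [WEEKDAY_WEIGHTS[day.weekday()] for day in calendar]
    # Temporal pattern: sample order dates in proportion to their weekday weight.
    order_dates = rng.choices(calendar, weights=weights, k=n)
    orders = []
    for order_id, order_date in enumerate(order_dates, start=1):
        customer = rng.choice(customers)  # FK drawn from parent keys only
        # Non-linear (right-skewed) amounts: lognormal, scaled up for enterprise.
        amount = rng.lognormvariate(3.5, 0.8)
        if customer["segment"] == "enterprise":
            amount *= 3
        orders.append({
            "order_id": order_id,
            "customer_id": customer["customer_id"],
            "order_date": order_date,
            "amount": max(round(amount, 2), 0.01),  # hypothetical rule: amount > 0
        })
    return orders


def make_tickets(orders, n, rng):
    """Grandchild table: tickets copy customer_id from the order they reference."""
    tickets = []
    for ticket_id in range(1, n + 1):
        order = rng.choice(orders)
        tickets.append({
            "ticket_id": ticket_id,
            "order_id": order["order_id"],
            "customer_id": order["customer_id"],  # stays consistent with the order
            # Hypothetical rule: a ticket opens 0-14 days after its order.
            "opened": order["order_date"] + timedelta(days=rng.randint(0, 14)),
        })
    return tickets


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for reproducible demos
    customers = make_customers(100, rng)
    orders = make_orders(customers, 2000, rng)
    tickets = make_tickets(orders, 150, rng)
    print(len(customers), len(orders), len(tickets))
```

On Databricks these lists would typically be turned into Spark DataFrames (for example with spark.createDataFrame(orders)) before being written out; that step is left out here so the sketch has no dependencies.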

Quick Start

Save the generator script as scripts/generate_data.py and run it on your Databricks cluster.
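For orientation, a possible shape for scripts/generate_data.py is sketched below. This is a hypothetical skeleton, not the script shipped with the skill: the --output-dir, --rows, and --seed flags, the single customers table, and the CSV output format are all assumptions. It defaults to a local temporary directory so a dry run works anywhere.

```python
"""Hypothetical skeleton for scripts/generate_data.py; not the skill's actual script."""
import argparse
import csv
import os
import random
import tempfile

SEGMENTS = ["consumer", "smb", "enterprise"]
# Local default so a dry run works anywhere; on Databricks pass a volume path instead.
DEFAULT_OUTPUT_DIR = os.path.join(tempfile.gettempdir(), "synthetic_data")


def parse_args(argv=None):
    """Hypothetical CLI flags; argv=None means read sys.argv."""
    parser = argparse.ArgumentParser(description="Write synthetic tables as CSV files.")
    parser.add_argument("--output-dir", default=DEFAULT_OUTPUT_DIR)
    parser.add_argument("--rows", type=int, default=1000)
    parser.add_argument("--seed", type=int, default=42)
    return parser.parse_args(argv)


def generate_customers(n, seed):
    """Minimal stand-in generator; a fuller script would build several related tables."""
    rng = random.Random(seed)  # seeded so reruns produce identical data
    return [{"customer_id": i, "segment": rng.choice(SEGMENTS)} for i in range(1, n + 1)]


def write_table(rows, out_dir, name):
    """Write one table as <out_dir>/<name>.csv and return the file path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{name}.csv")
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return path


def main(argv=None):
    args = parse_args(argv)
    customers = generate_customers(args.rows, args.seed)
    path = write_table(customers, args.output_dir, "customers")
    print(f"wrote {len(customers)} rows to {path}")
    return path


if __name__ == "__main__":
    main()
```

On Unity Catalog-enabled compute, volumes are reachable through ordinary file paths of the form /Volumes/<catalog>/<schema>/<volume>/, so a run might look like python scripts/generate_data.py --output-dir /Volumes/<catalog>/<schema>/<volume>/raw (the catalog, schema, and volume names are placeholders).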

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: databricks-synthetic-data-generation
Download link: https://github.com/datasciencemonkey/coding-agents-databricks-apps/archive/main.zip#databricks-synthetic-data-generation

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
