databricks-synthetic-data-generation
Community
Generate realistic synthetic data for Databricks.
Author: datasciencemonkey
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Generate realistic synthetic data on Databricks for test datasets and demonstrations.
Core Features & Use Cases
- Generate synthetic data with Faker and Spark, including non-linear distributions and temporal patterns.
- Preserve referential integrity across related tables (customers, orders, tickets) and optionally enforce business rules.
- Save outputs to Databricks volumes for downstream Spark Declarative Pipelines (SDP) and dashboards.
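The features above can be illustrated with a small, dependency-free sketch. It is not the skill's generator: as an assumption, it uses Python's standard random module in place of Faker and Spark, a log-normal draw for the non-linear (right-skewed) order amounts, a weekday/weekend weighting for the temporal pattern, and foreign keys sampled only from rows that already exist, so every order points at a real customer and every ticket at a real order. The function name, table columns, and the enterprise multiplier rule are all hypothetical.

```python
import random
from datetime import date, timedelta


def generate_tables(n_customers=100, n_orders=500, n_tickets=50, seed=42):
    """Return (customers, orders, tickets) as lists of dicts.

    Hypothetical stdlib-only stand-in for a Faker + Spark generator.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    start = date(2024, 1, 1)

    # Parent table first, so child rows can only reference existing keys.
    customers = [
        {
            "customer_id": i,
            "tier": rng.choices(["free", "pro", "enterprise"], weights=[70, 25, 5])[0],
        }
        for i in range(1, n_customers + 1)
    ]

    # Temporal pattern: weekdays are weighted 3x heavier than weekend days.
    days = [start + timedelta(days=d) for d in range(365)]
    day_weights = [3 if d.weekday() < 5 else 1 for d in days]

    orders = []
    for order_id in range(1, n_orders + 1):
        customer = rng.choice(customers)  # FK drawn from existing customers only
        # Non-linear distribution: log-normal amounts have a long right tail.
        amount = round(rng.lognormvariate(3.5, 0.8), 2)
        if customer["tier"] == "enterprise":
            amount = round(amount * 5, 2)  # hypothetical business rule
        orders.append({
            "order_id": order_id,
            "customer_id": customer["customer_id"],
            "order_date": rng.choices(days, weights=day_weights)[0],
            "amount": amount,
        })

    tickets = []
    for ticket_id in range(1, n_tickets + 1):
        order = rng.choice(orders)  # FK drawn from existing orders only
        # A ticket is opened on or after the date of the order it refers to.
        opened = order["order_date"] + timedelta(days=rng.randint(0, 14))
        tickets.append({
            "ticket_id": ticket_id,
            "order_id": order["order_id"],
            "customer_id": order["customer_id"],  # stays consistent with the order
            "opened_date": opened,
        })
    return customers, orders, tickets
```

For example, after customers, orders, tickets = generate_tables(), every order's customer_id appears in customers. In a Spark version the same idea would usually mean generating parent keys first and sampling child foreign keys from them, rather than drawing random IDs independently.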
Quick Start
Save the generator script as scripts/generate_data.py and run it on your Databricks cluster.
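A minimal sketch of what an entry point in scripts/generate_data.py could look like, assuming the script takes an output directory and writes one CSV file per table. The --output-dir, --rows, and --seed flags and the write_csv helper are hypothetical, not the skill's documented interface. On Databricks the output directory would typically be a Unity Catalog volume path of the form /Volumes/<catalog>/<schema>/<volume>/...; the placeholders are left for you to fill in. Because it only writes plain files with the standard library, the sketch also runs locally.

```python
import argparse
import csv
import random
from pathlib import Path


def write_csv(path, rows):
    """Write a non-empty list of dicts to CSV, using the first row's keys as the header."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Generate synthetic customers and orders (sketch).")
    parser.add_argument("--output-dir", required=True,
                        help="e.g. /Volumes/<catalog>/<schema>/<volume>/raw")
    parser.add_argument("--rows", type=int, default=1000, help="number of orders to generate")
    parser.add_argument("--seed", type=int, default=42, help="seed for reproducible output")
    args = parser.parse_args(argv)
    if args.rows < 1:
        parser.error("--rows must be at least 1")

    rng = random.Random(args.seed)
    out = Path(args.output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Roughly one customer per ten orders; always at least one customer.
    customers = [{"customer_id": i} for i in range(1, max(1, args.rows // 10) + 1)]
    orders = [
        {
            "order_id": i,
            "customer_id": rng.choice(customers)["customer_id"],  # FK to an existing customer
            "amount": round(rng.lognormvariate(3.5, 0.8), 2),  # right-skewed amounts
        }
        for i in range(1, args.rows + 1)
    ]

    # One file per table so downstream pipelines can ingest each table separately.
    write_csv(out / "customers.csv", customers)
    write_csv(out / "orders.csv", orders)
    return out


if __name__ == "__main__":
    main()
```

Run locally with, for example, python scripts/generate_data.py --output-dir ./out --rows 5000. Taking argv as a parameter keeps main testable without touching sys.argv.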
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: let Claude install the skill automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: databricks-synthetic-data-generation Download link: https://github.com/datasciencemonkey/coding-agents-databricks-apps/archive/main.zip#databricks-synthetic-data-generation Please download this .zip file, extract it, and install it in the .claude/skills/ directory.