messydata

Official

Generate realistic messy data for tests.

Author: sodadata
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

MessyData enables you to generate synthetic, realistic dirty data for testing data pipelines, data quality tooling, and ML workflows without writing custom data generators.

Core Features & Use Cases

  • Declarative YAML config: define datasets, distributions, and anomalies without writing procedural code.
  • Date-aware generation, with both a CLI and a Python API for end-to-end data generation workflows.
  • Use cases include validating pipelines, stress-testing anomaly detection, and simulating real-world data quality issues.
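
To make "realistic dirty data" concrete, here is a minimal standard-library sketch of the general idea: start from clean synthetic rows, then inject anomalies at configurable rates. This is not MessyData's API or config format. The function names (`make_clean_rows`, `inject_anomalies`), the column names, and the rate parameters are all hypothetical and chosen only for illustration.

```python
import random


def make_clean_rows(n, seed=0):
    """Build n clean synthetic order rows (hypothetical schema)."""
    rng = random.Random(seed)
    return [
        {
            "order_id": i,
            "amount": round(rng.uniform(5.0, 500.0), 2),
            "country": rng.choice(["BE", "US", "DE"]),
        }
        for i in range(1, n + 1)
    ]


def inject_anomalies(rows, missing_rate=0.1, outlier_rate=0.02,
                     duplicate_rate=0.05, seed=0):
    """Return a messy copy of rows; the input list is left unchanged.

    Each anomaly type fires independently per row with the given probability.
    """
    rng = random.Random(seed)
    messy = []
    for original in rows:
        row = dict(original)  # copy so the clean data is not mutated
        if rng.random() < missing_rate:
            row["country"] = None  # missing value
        if rng.random() < outlier_rate:
            row["amount"] = round(row["amount"] * 1000, 2)  # extreme outlier
        messy.append(row)
        if rng.random() < duplicate_rate:
            messy.append(dict(row))  # exact duplicate record
    return messy


clean = make_clean_rows(1000, seed=42)
messy = inject_anomalies(clean, seed=42)
print(len(clean), len(messy))
```

Writing and maintaining generators like this by hand for every dataset is the work a declarative config is meant to replace: the rates and anomaly types become settings rather than code.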

Quick Start

Create a MessyData YAML config, then run the validate command before generating.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: messydata
Download link: https://github.com/sodadata/messydata/archive/main.zip#messydata

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
