generate-synthetic-data

Community

Generate diverse LLM test data.

Author: marchatton
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill helps you build comprehensive, diverse test datasets for LLM pipelines, especially when real user data is scarce or when specific failure scenarios need to be tested.

Core Features & Use Cases

  • Dimension-based Tuple Generation: Defines axes of variation (dimensions) relevant to potential LLM failures.
  • Iterative Tuple Refinement: Incorporates user feedback so that generated tuples reflect realistic scenarios.
  • LLM-assisted Query Generation: Converts refined tuples into natural language queries for pipeline testing.
  • Use Case: Bootstrapping an evaluation dataset for a customer support chatbot by defining dimensions like 'user intent', 'customer sentiment', and 'product type', then generating varied queries to test the bot's responses.
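As a rough illustration of the LLM-assisted query generation step, the sketch below turns one refined tuple into a prompt that could be sent to an LLM. The function name `tuple_to_prompt`, the prompt wording, and the example values are assumptions made for illustration; they are not taken from the Skill itself, and the actual LLM call is left out.

```python
from typing import Dict


def tuple_to_prompt(scenario: Dict[str, str]) -> str:
    """Build an instruction asking an LLM to write one user query for a scenario.

    Hypothetical helper: the Skill's real prompt format is not documented here.
    """
    # Render each dimension/value pair as a bullet so the scenario is explicit.
    details = "\n".join(f"- {name}: {value}" for name, value in scenario.items())
    return (
        "Write one realistic message a user might send to a customer support chatbot.\n"
        "The message should match this scenario:\n"
        f"{details}\n"
        "Return only the message text."
    )


# Example tuple using the dimensions from the use case above (values are made up).
example = {
    "user intent": "request refund",
    "customer sentiment": "frustrated",
    "product type": "subscription",
}
prompt = tuple_to_prompt(example)
print(prompt)
```

Keeping the tuple as a plain dictionary means the same scenario can be stored alongside the generated query, which makes it easier to trace a failing query back to the dimension values that produced it.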

Quick Start

Define dimensions for your application and generate synthetic data tuples.
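A minimal sketch of what this step might look like in Python, assuming dimensions are represented as a dictionary of value lists and tuples are sampled from their Cartesian product. The dimension values, the `generate_tuples` helper, and the fixed seed are hypothetical; the Skill may generate tuples differently, for example by asking an LLM to propose them.

```python
import itertools
import random
from typing import Dict, List

# Dimensions from the customer support use case; the values are illustrative only.
dimensions: Dict[str, List[str]] = {
    "user intent": ["request refund", "track order", "cancel subscription"],
    "customer sentiment": ["neutral", "frustrated"],
    "product type": ["hardware", "subscription"],
}


def generate_tuples(dims: Dict[str, List[str]], k: int, seed: int = 0) -> List[Dict[str, str]]:
    """Sample up to k distinct combinations of dimension values."""
    names = list(dims)
    # Enumerate every combination of values across all dimensions.
    combos = [dict(zip(names, values)) for values in itertools.product(*dims.values())]
    # Sample without replacement so no combination is repeated.
    rng = random.Random(seed)
    return rng.sample(combos, min(k, len(combos)))


tuples = generate_tuples(dimensions, k=5)
for t in tuples:
    print(t)
```

Sampling from the full product is only practical for a handful of small dimensions. The sampled tuples are meant as a draft to review and prune, not as a final dataset.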

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: generate-synthetic-data
Download link: https://github.com/marchatton/agent-skills/archive/main.zip#generate-synthetic-data

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.