dataset-synthesizer

Name: dataset-synthesizer
Availability: InStock
Author: joleques

Community

Generate JSONL fine-tuning datasets from logs

Data & Analytics #logs #fine-tuning #langfuse #jsonl #dataset #vertex-ai #data-augmentation

Authorjoleques

Version1.0.0

Installs0

System Documentation

What problem does it solve?

This skill automates the creation of high-quality JSONL datasets for LLM fine-tuning by combining user interaction logs with product documentation, enforcing cleaning rules and data augmentation so models learn correct behaviors rather than noisy error traces.

Core Features & Use Cases

Log cleansing and filtering: removes internal agent errors, limits trivial greetings, and refines incomplete or vague responses into technical explanations using documentation.
Data augmentation and synthesis: expands sparse logs with synthetic but domain-consistent Q&A derived from documentation to reach the requested sample count.
Format and delivery: outputs strict Vertex AI (Gemini) JSONL lines with systemInstruction and contents fields and saves the dataset to the required ./agentAI/fine-tuning/[Title]/dataset/[Title].jsonl path.
Correction mode: supports targeted fixes of existing datasets guided by an audit report without redoing augmentation or recreating the entire dataset.

Quick Start

Generate a 300-line Gemini-format JSONL fine-tuning dataset titled MyProduct by merging logs from /path/to/langfuse.jsonl with product documentation in /path/to/docs and save it to ./agentAI/fine-tuning/MyProduct/dataset/MyProduct.jsonl.

dataset-synthesizer

System Documentation

What problem does it solve?

Core Features & Use Cases

Quick Start

Dependency Matrix

Required Modules

Components

💻 Claude Code Installation

Agent Skills Search Helper