eval-bootstrap

Official

Bootstrap evaluators from production traces.

Authordatadog-labs
Version1.0.0
Installs0

System Documentation

What problem does it solve?

Bootstrap evaluators from production traces to rapidly generate ready-to-use evaluation code for ML apps.

Core Features & Use Cases

  • Ground truth-driven evaluator generation from LLM traces to seed evaluation prompts.
  • Generates BaseEvaluator and/or LLMJudge-based code from traces for offline experiments.
  • Works with ml_app context and optional RCA reports or failure hypotheses to seed a failure taxonomy.

Quick Start

Run eval-bootstrap to generate evaluator code from production traces for your ml_app.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: eval-bootstrap
Download link: https://github.com/datadog-labs/agent-skills/archive/main.zip#eval-bootstrap

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
View Source Repository

Agent Skills Search Helper

Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.