adk-eval-guide

Community

Run reliable ADK agent evaluations

Author: SeanChenR
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This guide takes the guesswork out of failing ADK evaluations. It explains the evaluation metrics, evalset schemas, judge-model patterns, and common failure modes so that engineers can diagnose failures, fix them, and iterate on agent quality reliably.

Core Features & Use Cases

  • Metric guidance: Explains how to select and configure metrics, including tool_trajectory_avg_score, final_response_match_v2, rubric-based metrics, and hallucinations_v1.
  • Evalset & config patterns: Shows the evalset schema, session overrides, the multimodal and user-simulation schemas, and how to wire up custom metrics and judge-model options.
  • Debugging workflow: Provides an iterative eval-fix loop, a list of common gotchas (google_search behavior, app name mismatches, non-determinism), and practical fixes for CI and local testing.
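To make the first metric above concrete, here is a minimal sketch of how a trajectory-style score can be computed: each invocation scores 1.0 when the actual tool calls match the expected ones exactly and in order, and the final value is the average across invocations. This is an illustration, not ADK's implementation; the `ToolCall` shape, the function names, and the sample tool names are all hypothetical.

```python
from typing import Any

# Hypothetical shape for a recorded tool call: {"name": str, "args": dict}.
ToolCall = dict[str, Any]


def invocation_trajectory_score(expected: list[ToolCall], actual: list[ToolCall]) -> float:
    """Return 1.0 if the actual calls match the expected calls exactly and in order."""
    if len(expected) != len(actual):
        return 0.0
    for exp, act in zip(expected, actual):
        if exp["name"] != act["name"] or exp["args"] != act["args"]:
            return 0.0
    return 1.0


def trajectory_avg_score(pairs: list[tuple[list[ToolCall], list[ToolCall]]]) -> float:
    """Average the per-invocation scores over (expected, actual) pairs."""
    if not pairs:
        raise ValueError("need at least one invocation to score")
    return sum(invocation_trajectory_score(e, a) for e, a in pairs) / len(pairs)


# Two invocations: the first matches, the second differs in one argument.
pairs = [
    (
        [{"name": "get_weather", "args": {"city": "Paris"}}],
        [{"name": "get_weather", "args": {"city": "Paris"}}],
    ),
    (
        [{"name": "lookup_order", "args": {"order_id": 42}}],
        [{"name": "lookup_order", "args": {"order_id": 43}}],
    ),
]
score = trajectory_avg_score(pairs)
print(score)
```

Under this kind of exact matching, a single differing argument drops an invocation to 0.0, which is one reason a strict threshold on a trajectory metric can fail even when the final response looks right.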

Quick Start

Run a focused evaluation using your evalset, inspect per-invocation metric breakdowns, and apply the recommended fixes to prompts, evalsets, or config until scores stabilize.
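The inspect-and-fix step can be sketched as a small triage helper: list every invocation whose metric score falls below its threshold, then check whether the overall score has settled across repeated runs. The result layout (`invocation_id`, `scores`), the threshold values, and the 0.05 tolerance are assumptions for illustration only; they are not the format ADK writes.

```python
from statistics import pstdev


def failing_invocations(results, thresholds):
    """Return (invocation_id, metric, score) for every score below its threshold.

    `results` is a hypothetical list of
    {"invocation_id": str, "scores": {metric_name: float}} records.
    """
    failures = []
    for result in results:
        for metric, threshold in thresholds.items():
            score = result["scores"].get(metric)
            if score is not None and score < threshold:
                failures.append((result["invocation_id"], metric, score))
    return failures


def is_stable(run_scores, tolerance=0.05):
    """Treat scores as stable when their spread across runs is within tolerance."""
    return len(run_scores) >= 2 and pstdev(run_scores) <= tolerance


# Assumed thresholds and per-invocation scores for one eval run.
thresholds = {"tool_trajectory_avg_score": 1.0, "final_response_match_v2": 0.8}
results = [
    {"invocation_id": "inv-1",
     "scores": {"tool_trajectory_avg_score": 1.0, "final_response_match_v2": 0.9}},
    {"invocation_id": "inv-2",
     "scores": {"tool_trajectory_avg_score": 0.0, "final_response_match_v2": 0.6}},
]

failures = failing_invocations(results, thresholds)
for invocation_id, metric, score in failures:
    print(f"{invocation_id}: {metric} = {score} (threshold {thresholds[metric]})")
```

Calling `is_stable` on the overall score from several consecutive runs gives a rough signal for when to stop iterating; with a non-deterministic agent, a single passing run is weak evidence.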

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: adk-eval-guide
Download link: https://github.com/SeanChenR/maestro-agent/archive/main.zip#adk-eval-guide

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.