adk-eval-guide
Community
Run reliable ADK agent evaluations
Author: SeanChenR
Version: 1.0.0
System Documentation
What problem does it solve?
This guide reduces guesswork when ADK evaluation metrics fail. It explains the evaluation metrics, evalset schemas, judge-model patterns, and common failure modes so that engineers can diagnose a failure, fix it, and iterate on agent quality reliably.
Core Features & Use Cases
- Metric guidance: Explains how to select and configure metrics, including tool_trajectory_avg_score, final_response_match_v2, rubric-based metrics, and hallucinations_v1.
- Evalset & config patterns: Shows the evalset schema, session overrides, multimodal and user-simulation schemas, and how to wire in custom metrics and judge-model options.
- Debugging workflow: Provides an iterative eval-fix loop, common gotchas (google_search behavior, app name mismatches, non-determinism), and practical fixes for CI and local testing.
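To make the per-invocation breakdown concrete, here is a minimal sketch of what an exact-match trajectory score such as tool_trajectory_avg_score measures. It assumes exact-match semantics: an invocation scores 1.0 only when its tool calls match the expected ones in name, arguments, and order, and the metric is the average over all invocations. This is an illustration, not ADK's implementation; ToolCall, tool_trajectory_avg, and the get_weather tool are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolCall:
    """One tool invocation: the tool's name plus its arguments (hypothetical type)."""
    name: str
    args: dict[str, Any]


def trajectory_matches(actual: list[ToolCall], expected: list[ToolCall]) -> bool:
    """Exact match: same tools, in the same order, with identical arguments."""
    return actual == expected


def tool_trajectory_avg(
    actual_per_turn: list[list[ToolCall]],
    expected_per_turn: list[list[ToolCall]],
) -> float:
    """Score each invocation 1.0 (match) or 0.0 (mismatch), then average."""
    if not expected_per_turn or len(actual_per_turn) != len(expected_per_turn):
        raise ValueError("actual and expected must cover the same, non-empty invocations")
    scores = [
        1.0 if trajectory_matches(actual, expected) else 0.0
        for actual, expected in zip(actual_per_turn, expected_per_turn)
    ]
    return sum(scores) / len(scores)


# Turn 1 differs only in argument casing, so it scores 0.0; turn 2 has no tool calls on either side.
expected = [[ToolCall("get_weather", {"city": "Taipei"})], []]
actual = [[ToolCall("get_weather", {"city": "taipei"})], []]
print(tool_trajectory_avg(actual, expected))
```

The example prints 0.5, which shows why a trajectory metric can fail on a response that looks correct: under exact matching, a single argument that differs only in casing zeroes out that invocation.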
Quick Start
Run a focused evaluation using your evalset, inspect per-invocation metric breakdowns, and apply the recommended fixes to prompts, evalsets, or config until scores stabilize.
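Because agent runs are non-deterministic, a single passing run is weak evidence that a fix worked. The following minimal sketch shows one way to decide that "scores stabilize". It assumes you rerun the same evalset several times and record one aggregate score per run. The check then requires the mean to clear your threshold and the gap between the best and worst runs to stay small. is_stable and its max_spread default of 0.1 are hypothetical choices, not ADK settings.

```python
from statistics import mean


def is_stable(scores: list[float], threshold: float, max_spread: float = 0.1) -> bool:
    """Return True when repeated eval runs both pass and agree with each other.

    scores: one aggregate metric score per repeated run of the same evalset.
    threshold: minimum acceptable mean score (e.g. the threshold in your eval config).
    max_spread: largest allowed gap between the best and worst run (hypothetical default).
    """
    if not scores:
        raise ValueError("need at least one run")
    spread = max(scores) - min(scores)
    return mean(scores) >= threshold and spread <= max_spread


print(is_stable([0.9, 0.85, 0.95], threshold=0.8))
print(is_stable([1.0, 0.5, 1.0], threshold=0.8))
```

The first call passes. The second fails on spread even though its mean (about 0.83) clears the threshold, which is the kind of flaky result worth investigating before you trust it in CI.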
Dependency Matrix
Required Modules
None required
Components
references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: adk-eval-guide Download link: https://github.com/SeanChenR/maestro-agent/archive/main.zip#adk-eval-guide Please download this .zip file, extract it, and install it in the .claude/skills/ directory.