adk-eval-guide

Name: adk-eval-guide
Availability: InStock
Author: SeanChenR

Community

Run reliable ADK agent evaluations

Software Engineering #metrics #evaluation #adk #agent-testing #llm-judge #evalsets #user-simulation

Version: 1.0.0

Installs: 0

System Documentation

What problem does it solve?

This guide takes the guesswork out of failing ADK evaluations. It explains the evaluation metrics, evalset schemas, judge-model patterns, and common failure modes so engineers can diagnose failures, fix them, and iterate on agent quality reliably.

Core Features & Use Cases

Metric guidance: Explains metric selection and configuration including tool_trajectory_avg_score, final_response_match_v2, rubric-based metrics, and hallucinations_v1.
Evalset & config patterns: Shows evalset schema, session overrides, multimodal and user-simulation schemas, and how to wire custom metrics and judge model options.
Debugging workflow: Provides an iterative eval-fix loop, common gotchas (google_search behavior, app name mismatches, non-determinism), and practical fixes for CI and local testing.
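
To make the tool_trajectory_avg_score idea concrete, here is a minimal sketch of an exact-match trajectory score: each invocation scores 1.0 when the actual tool calls equal the expected ones (same names, same arguments, same order) and 0.0 otherwise, and the metric is the average across invocations. This is an illustration only, not ADK's implementation; `ToolCall`, `invocation_score`, and `trajectory_avg_score` are hypothetical names, and ADK's own matching rules may differ.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """Hypothetical stand-in for one tool call made by an agent."""
    name: str
    args: dict[str, Any] = field(default_factory=dict)


def invocation_score(expected: list[ToolCall], actual: list[ToolCall]) -> float:
    """Return 1.0 only if names, arguments, and order all match exactly."""
    return 1.0 if expected == actual else 0.0


def trajectory_avg_score(
    pairs: list[tuple[list[ToolCall], list[ToolCall]]],
) -> float:
    """Average the per-invocation exact-match scores.

    `pairs` holds (expected, actual) tool-call lists, one per invocation.
    """
    if not pairs:
        raise ValueError("need at least one invocation to score")
    scores = [invocation_score(expected, actual) for expected, actual in pairs]
    return sum(scores) / len(scores)


# Two invocations: the first matches, the second differs in one argument.
example_pairs = [
    ([ToolCall("get_weather", {"city": "Taipei"})],
     [ToolCall("get_weather", {"city": "Taipei"})]),
    ([ToolCall("get_weather", {"city": "Tokyo"})],
     [ToolCall("get_weather", {"city": "Kyoto"})]),
]
example_score = trajectory_avg_score(example_pairs)
```

Under this sketch a single differing argument zeroes out the whole invocation, which is why a per-invocation breakdown tends to be more useful than the average alone when diagnosing a failing trajectory metric.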

Quick Start

Run a focused evaluation using your evalset, inspect per-invocation metric breakdowns, and apply the recommended fixes to prompts, evalsets, or config until scores stabilize.
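
One way to decide when scores have "stabilized" is to repeat the same evaluation a few times and check both the mean and the spread. The sketch below assumes you have already collected one score per run for a single metric. The function name `scores_are_stable`, the `max_spread` tolerance, and the example numbers are all hypothetical and not part of ADK.

```python
from statistics import mean


def scores_are_stable(
    scores: list[float],
    threshold: float,
    max_spread: float = 0.05,
) -> bool:
    """Return True when repeated runs pass the threshold and agree closely.

    scores:     one metric score per repeated evaluation run
    threshold:  minimum acceptable mean score
    max_spread: largest allowed gap between the best and worst run
                (hypothetical tolerance; tune it to your own tests)
    """
    if len(scores) < 2:
        # A single run says nothing about run-to-run variation.
        return False
    spread = max(scores) - min(scores)
    return mean(scores) >= threshold and spread <= max_spread


# Illustrative numbers only: three consistent runs versus three flaky runs.
consistent = scores_are_stable([0.90, 0.92, 0.91], threshold=0.8)
flaky = scores_are_stable([1.0, 0.5, 1.0], threshold=0.8)
```

The flaky case has a passing mean but a wide spread, which points to non-determinism rather than a real fix and suggests more iteration on the prompt, evalset, or config.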
