Name: eval-framework
Author: Cheggin

System Documentation

What problem does it solve?

Evaluates AI agent reliability by focusing testing on changes that affect behavior, reducing wasted compute and surfacing flaky or brittle interactions early.

Core Features & Use Cases

Diff-driven eval execution: runs only evals impacted by a git diff to minimize CI time.
Pass@k and pass^k analytics: reports how often at least one of k attempts succeeds (pass@k) and how often all k attempts succeed (pass^k), with per-eval diagnostics.
CI integration: hooks and dashboards to detect regressions and track performance across releases.
Real-world use cases include validating agent safety, stability after feature changes, and regression testing of capabilities.
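
As a rough illustration of diff-driven eval execution, the sketch below picks evals by matching changed file paths against per-eval watch patterns. The function name, the eval names, and the pattern mapping are all hypothetical and are not taken from eval-framework itself; the list of changed files is assumed to come from something like `git diff --name-only`.

```python
from fnmatch import fnmatch


def select_impacted_evals(changed_files, eval_watch_patterns):
    """Return sorted names of evals whose watch patterns match any changed file.

    changed_files: iterable of repo-relative paths (e.g. from `git diff --name-only`).
    eval_watch_patterns: dict mapping eval name -> list of glob patterns.
    """
    impacted = []
    for eval_name, patterns in eval_watch_patterns.items():
        if any(
            fnmatch(path, pattern)
            for path in changed_files
            for pattern in patterns
        ):
            impacted.append(eval_name)
    return sorted(impacted)


# Hypothetical mapping; a real project would define its own evals and paths.
EVAL_WATCH_PATTERNS = {
    "tool_calling": ["agents/tools/*.py", "prompts/tools/*"],
    "safety_refusals": ["prompts/system/*", "agents/policy.py"],
    "long_context": ["agents/memory/*.py"],
}

if __name__ == "__main__":
    changed = ["agents/policy.py", "README.md"]
    print(select_impacted_evals(changed, EVAL_WATCH_PATTERNS))
```

With this mapping, a diff that touches only documentation selects no evals, so a CI job could skip the eval step entirely for that change.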

Quick Start

Run the evaluation suite against your agent changes to measure pass@k metrics.
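
For readers unfamiliar with the metrics, here is a minimal sketch of how pass@k and pass^k are commonly estimated from n recorded attempts of one eval, of which c succeeded. The source does not state which formulas eval-framework uses, so the combinatorial estimators and the function names below are assumptions for illustration only.

```python
from math import comb


def _check(n, c, k):
    # Validate inputs: n attempts, c successes among them, k draws.
    if not (0 < k <= n and 0 <= c <= n):
        raise ValueError("require 0 < k <= n and 0 <= c <= n")


def pass_at_k(n, c, k):
    """Estimated probability that at least one of k sampled attempts succeeds."""
    _check(n, c, k)
    # 1 minus the chance that all k draws come from the n - c failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n, c, k):
    """Estimated probability that all k sampled attempts succeed (pass^k)."""
    _check(n, c, k)
    # Chance that all k draws come from the c successes.
    return comb(c, k) / comb(n, k)


if __name__ == "__main__":
    # 10 attempts, 5 successes, k = 2
    print(round(pass_at_k(10, 5, 2), 4))   # 0.7778
    print(round(pass_hat_k(10, 5, 2), 4))  # 0.2222
```

A wide gap between the two numbers, as in this example, suggests an agent that can complete the task but does not do so reliably.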

To install this Skill, download https://github.com/Cheggin/request-for-startups/archive/main.zip#eval-framework, extract the .zip file, and install it in the .claude/skills/ directory.
