eval-execution
Community
Run repo-focused evaluations for AI agents.
Software Engineering
#e2e #evaluation #benchmark #local-server #workflow-testing #ai-agents-workflows #smoke-regression
Author: eugene-belkovich
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Evaluates AI agent workflows locally by running end-to-end tests, benchmarks, and smoke checks to validate changes before PRs.
Core Features & Use Cases
- Start a local evaluation server for ai-agents-workflows.
- Run local E2E tests against the workflow server.
- Execute benchmarks and smoke regression checks to verify behavior before PRs.
- Inspect health and logs using the provided utilities and commands.
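The listing does not describe how the smoke regression checks are implemented. As a rough illustration only, a check of this kind can compare metrics from the current run against a stored baseline. The metric names, the 5% tolerance, and the `compare_to_baseline` helper below are all hypothetical and are not part of the skill.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def compare_to_baseline(current, baseline, tolerance=0.05):
    """Flag every baseline metric whose value drifted by more than `tolerance`.

    Hypothetical helper: the tolerance is relative to the baseline value, and
    drift in either direction counts as a failure.
    """
    results = []
    for name, expected in baseline.items():
        actual = current.get(name)
        if actual is None:
            results.append(CheckResult(name, False, "missing from current run"))
            continue
        allowed = abs(expected) * tolerance
        drift = abs(actual - expected)
        results.append(
            CheckResult(name, drift <= allowed, f"expected {expected}, got {actual}")
        )
    return results


if __name__ == "__main__":
    # Illustrative numbers only; not measurements from any real run.
    baseline = {"success_rate": 0.90, "mean_latency_s": 2.0}
    current = {"success_rate": 0.88, "mean_latency_s": 2.6}
    for result in compare_to_baseline(current, baseline):
        status = "PASS" if result.passed else "FAIL"
        print(f"{status} {result.name}: {result.detail}")
```

Treating drift in both directions as a failure keeps the sketch simple; a real check would probably distinguish improvements from regressions per metric.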
Quick Start
Start the local evaluation server on an available port, run a sample E2E flow, and verify the results.
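The listing gives no concrete commands for these steps, so the following is only a minimal sketch of the "available port" and "verify" parts. It assumes the server exposes an HTTP health route; the `/health` path, the `StubHandler` stand-in server, and the helper names are hypothetical and would be replaced by the real ai-agents-workflows server and its actual endpoint.

```python
import http.server
import json
import socket
import threading
import time
import urllib.request


def find_free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        # Note: another process could grab this port before the server binds it.
        return sock.getsockname()[1]


class StubHandler(http.server.BaseHTTPRequestHandler):
    """Stand-in for the real evaluation server (hypothetical /health route)."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # suppress per-request logging in this sketch


def wait_until_healthy(url: str, timeout: float = 5.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=1) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # server not accepting connections yet; retry
        time.sleep(0.1)
    return False


def run_demo() -> bool:
    """Start the stub server on a free port and report whether it became healthy."""
    port = find_free_port()
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return wait_until_healthy(f"http://127.0.0.1:{port}/health")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print("healthy:", run_demo())
```

Polling with a deadline avoids running the E2E flow against a server that has not finished starting, which is a common source of flaky local results.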
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: eval-execution
Download link: https://github.com/eugene-belkovich/ai-setup/archive/main.zip#eval-execution
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.