eval-writer
Community
Generate eval suites for agent benchmarks
Software Engineering
Tags: vitest, benchmarking, agent testing, LangSmith, scoring logic, eval suite, dataset fixtures
Author: anukkrit149
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
It helps you turn an evaluation idea into a runnable, repeatable benchmark for agent behavior, complete with test cases and scoring.
Core Features & Use Cases
- Eval suite scaffolding for deepagentsjs: Creates independent workspace eval packages under evals/ and wires them to the eval harness.
- Test case design for multiple data sources: Supports inline cases, fixture-based JSON/JSONL, external datasets, and LangSmith dataset examples.
- Scoring and reporting: Implements trajectory matchers, output comparisons (exact/fuzzy), optional LLM-as-judge evaluators, and hooks up results to LangSmith via the vitest reporter.
- Optional sandbox-backed execution: Enables containerized execution when the benchmark requires running generated code or shell commands.
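
To make the fixture and scoring features above concrete, here is a minimal TypeScript sketch of the kind of logic a generated suite might contain. It is not the code eval-writer produces and it does not use the deepagentsjs or LangSmith APIs. Every name in it (`EvalCase`, `parseJsonl`, `exactMatch`, `fuzzyScore`, `trajectoryMatches`) is hypothetical, and the fuzzy comparison assumes a normalized Levenshtein similarity, which is only one of several reasonable choices.

```typescript
// Hypothetical shapes and helpers for illustration only; not the eval-writer,
// deepagentsjs, or LangSmith API.
interface EvalCase {
  id: string;
  input: string;
  expected: string;
}

// Parse JSONL fixture text: one JSON object per non-empty line.
function parseJsonl(text: string): EvalCase[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as EvalCase);
}

// Exact comparison: the output must equal the expected string character for character.
function exactMatch(actual: string, expected: string): boolean {
  return actual === expected;
}

// Collapse case and whitespace so trivial formatting differences do not affect the fuzzy score.
function normalize(s: string): string {
  return s.trim().toLowerCase().replace(/\s+/g, " ");
}

// Levenshtein edit distance, computed with a single rolling row.
function levenshtein(a: string, b: string): number {
  const row: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // value of the previous row at column j - 1
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// Fuzzy comparison (assumed metric): similarity in [0, 1], where 1 means
// the normalized strings are identical.
function fuzzyScore(actual: string, expected: string): number {
  const a = normalize(actual);
  const b = normalize(expected);
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Trajectory matcher (assumed semantics): the expected tool names must appear
// in the actual call sequence in the same order, with extra calls allowed between them.
function trajectoryMatches(actual: string[], expected: string[]): boolean {
  let next = 0;
  for (const tool of actual) {
    if (next < expected.length && tool === expected[next]) next++;
  }
  return next === expected.length;
}
```

In a real suite, helpers like these would typically be called from vitest test cases, with a threshold on `fuzzyScore` chosen per benchmark. Whether trajectory matching should be strict or allow extra tool calls is a design decision the benchmark author has to make.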
Quick Start
Ask the AI to create a new eval suite for your benchmark by specifying the capability to evaluate, the dataset source to use, and the scoring approach you want.
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install the skill automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: eval-writer
Download link: https://github.com/anukkrit149/anukkrit-skills/archive/main.zip#eval-writer
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.