Name: eval-writer
Author: anukkrit149

System Documentation

What problem does it solve?

It helps you turn an evaluation idea into a runnable, repeatable benchmark for agent behavior, complete with test cases and scoring.

Core Features & Use Cases

Eval suite scaffolding for deepagentsjs: Creates independent workspace eval packages under evals/ and wires them to the eval harness.
Test case design for multiple data sources: Supports inline cases, fixture-based JSON/JSONL, external datasets, and LangSmith dataset examples.
Scoring and reporting: Implements trajectory matchers, output comparisons (exact/fuzzy), optional LLM-as-judge evaluators, and hooks up results to LangSmith via the vitest reporter.
Optional sandbox-backed execution: Enables containerized execution when the benchmark requires running generated code or shell commands.
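As a rough illustration of the fixture and scoring ideas above, the following is a minimal, self-contained sketch. It is not the code eval-writer generates, and it does not use the deepagentsjs, vitest, or LangSmith APIs. The `EvalCase` shape and the helper names (`parseJsonl`, `exactMatch`, `fuzzyScore`, `trajectoryMatches`) are hypothetical, and the fuzzy comparison is assumed here to be a normalized Levenshtein distance, which may differ from what the skill actually uses.

```typescript
// Hypothetical case shape for illustration; not the eval-writer or deepagentsjs schema.
interface EvalCase {
  id: string;
  input: string;
  expected: string;
  expectedTools?: string[]; // tool names the agent is expected to call, in order
}

// Parse JSONL fixture text: one JSON object per non-empty line.
function parseJsonl(text: string): EvalCase[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as EvalCase);
}

// Exact comparison, ignoring leading and trailing whitespace only.
function exactMatch(actual: string, expected: string): boolean {
  return actual.trim() === expected.trim();
}

// Levenshtein edit distance, computed row by row.
function editDistance(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Fuzzy score in [0, 1]; 1 means identical after trimming and lowercasing.
function fuzzyScore(actual: string, expected: string): number {
  const a = actual.trim().toLowerCase();
  const b = expected.trim().toLowerCase();
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - editDistance(a, b) / longest;
}

// Trajectory check: the expected tool names must appear in order
// within the actual tool calls (extra calls in between are allowed).
function trajectoryMatches(actualTools: string[], expectedTools: string[]): boolean {
  let next = 0;
  for (const tool of actualTools) {
    if (next < expectedTools.length && tool === expectedTools[next]) next++;
  }
  return next === expectedTools.length;
}
```

In a real suite, helpers like these would typically be called from inside test cases, with a threshold on `fuzzyScore` chosen per benchmark. Allowing extra tool calls in `trajectoryMatches` is a design choice of this sketch; a stricter benchmark might require the sequences to be identical.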

Quick Start

Ask the AI to create a new eval suite for your benchmark by specifying the capability to evaluate, the dataset source to use, and the scoring approach you want.

Please help me install this Skill:
Name: eval-writer
Download link: https://github.com/anukkrit149/anukkrit-skills/archive/main.zip#eval-writer
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
