Name: production-evals-framework
Author: nathankoerschner

System Documentation

What problem does it solve?

This Skill provides a framework for building, implementing, and operating production-grade evaluation systems for LLM applications, helping teams keep those applications accurate, reliable, and cost-effective.

Core Features & Use Cases

End-to-End Evals Stack: Covers everything from defining systems under test to running experiments and establishing operating cadences.
Multi-Stage Evaluation: Implements golden sets, labeled scenarios, replay harnesses, rubric scoring, and experiment comparisons.
Use Case: You've deployed a new AI chatbot for customer support. Use this Skill to set up automated tests that continuously check its accuracy, relevance, and tone against predefined benchmarks and user feedback, so you can verify that it meets your quality standards before and after each update.
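This listing does not describe the Skill's own interfaces, so what follows is only a minimal, hypothetical Python sketch of how a golden set, rubric scoring, and a baseline-vs-candidate experiment comparison can fit together for the support-chatbot use case. `GoldenCase`, `evaluate`, `compare`, the two rubric criteria, and the toy bots are all illustrative assumptions, not part of production-evals-framework. Real rubrics would typically rely on labeled data or an LLM judge rather than the substring checks used here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    """Hypothetical golden-set item: a prompt plus facts a correct answer must mention."""
    prompt: str
    required_facts: tuple[str, ...]


def accuracy(case: GoldenCase, answer: str) -> float:
    """Fraction of required facts that appear in the answer (case-insensitive)."""
    text = answer.lower()
    hits = sum(fact.lower() in text for fact in case.required_facts)
    return hits / len(case.required_facts)


def tone(case: GoldenCase, answer: str) -> float:
    """Crude stand-in for a tone check: 0.0 if the answer uses a blunt phrase, else 1.0."""
    blunt = ("no idea", "not my problem")
    return 0.0 if any(p in answer.lower() for p in blunt) else 1.0


# Hypothetical rubric: each criterion scores one (case, answer) pair in [0, 1].
RUBRIC: dict[str, Callable[[GoldenCase, str], float]] = {"accuracy": accuracy, "tone": tone}


def evaluate(system: Callable[[str], str], golden: list[GoldenCase]) -> dict[str, float]:
    """Run every golden case through the system under test and average each criterion."""
    answers = [system(case.prompt) for case in golden]
    return {
        name: mean(score(case, ans) for case, ans in zip(golden, answers))
        for name, score in RUBRIC.items()
    }


def compare(baseline: dict[str, float], candidate: dict[str, float],
            tolerance: float = 0.0) -> list[str]:
    """Return the criteria on which the candidate scores below baseline minus tolerance."""
    return [k for k in baseline if candidate[k] < baseline[k] - tolerance]


# Toy golden set and two toy chatbot versions (both hypothetical).
golden = [
    GoldenCase("How do I reset my password?", ("settings", "reset link")),
    GoldenCase("What are your support hours?", ("9am", "5pm")),
]


def baseline_bot(prompt: str) -> str:
    if "password" in prompt:
        return "Open Settings and request a reset link."
    return "Support is available 9am to 5pm on weekdays."


def candidate_bot(prompt: str) -> str:
    if "password" in prompt:
        return "Open Settings and request a reset link."
    return "No idea, check the website."


base = evaluate(baseline_bot, golden)
cand = evaluate(candidate_bot, golden)
print(compare(base, cand))  # prints ['accuracy', 'tone']
```

Gating a release on `compare` returning an empty list is one simple way to turn the "before and after updates" check into an automated regression test; the `tolerance` parameter lets small, noisy score drops pass.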

Quick Start

Use the production-evals-framework skill to define a system under test for your RAG application.

Please help me install this Skill: Name: production-evals-framework Download link: https://github.com/nathankoerschner/dotfiles/archive/main.zip#production-evals-framework Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
