Name: braintrust-evals
Author: thensls

System Documentation

What problem does it solve?

Orchestrates systematic LLM evaluation with Braintrust experiments, datasets, and scoring, so that models, prompts, and configurations can be compared across NSLS projects.

Core Features & Use Cases

Create and manage evaluation datasets of input/expected-output pairs for model benchmarking.
Run experiments across model configurations, prompts, and scoring functions; track results and comparisons in the Braintrust UI.
Use structured outputs and dashboards to diagnose weaknesses, iterate prompts, and improve model performance.
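
As a rough illustration of the dataset and scoring ideas above, the sketch below models an evaluation dataset as input/expected pairs and records one structured result per case, so that failing cases can be inspected. It is plain Python and does not call the Braintrust SDK; `EvalCase`, `exact_match`, `run_eval`, and `toy_task` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    """One dataset row: a model input and the expected output (hypothetical shape)."""
    input: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the normalized output equals the expected answer, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(dataset, task, scorer):
    """Run the task on every case and return one structured result per case."""
    results = []
    for case in dataset:
        output = task(case.input)
        results.append({
            "input": case.input,
            "expected": case.expected,
            "output": output,
            "score": scorer(output, case.expected),
        })
    return results


def toy_task(question: str) -> str:
    """Toy stand-in for a model call; a real task would query an LLM here."""
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    return answers.get(question, "unknown")


dataset = [
    EvalCase("2 + 2", "4"),
    EvalCase("capital of France", "paris"),
    EvalCase("largest planet", "Jupiter"),
]

results = run_eval(dataset, toy_task, exact_match)
failures = [r for r in results if r["score"] < 1.0]  # cases worth diagnosing
mean_score = sum(r["score"] for r in results) / len(results)
```

Keeping the per-case records, rather than only the mean, is what makes it possible to see which inputs a prompt or model handles poorly.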

Quick Start

Run a Braintrust evaluation on a sample dataset and compare two model configurations.

Please help me install this Skill:
Name: braintrust-evals
Download link: https://github.com/thensls/nsls-builder-toolkit/archive/main.zip#braintrust-evals
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
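
The Quick Start task of comparing two model configurations on a sample dataset can be pictured with the minimal sketch below. It is a local stand-in written in plain Python, not the Skill's implementation or the Braintrust API; the dataset, the two configurations, and `fake_model` are all assumptions made up for illustration.

```python
from statistics import mean

# Hypothetical sample dataset: (input, expected) pairs.
SAMPLE = [
    ("hello", "HELLO"),
    ("Braintrust", "BRAINTRUST"),
    ("evals", "EVALS"),
    ("OK", "OK"),
]

# Two hypothetical model configurations, each a dict of settings.
CONFIG_A = {"name": "config-a", "uppercase": True}
CONFIG_B = {"name": "config-b", "uppercase": False}


def fake_model(text: str, config: dict) -> str:
    """Stand-in for a model call; its behavior depends only on the config."""
    return text.upper() if config["uppercase"] else text


def score_config(config: dict) -> float:
    """Mean exact-match score of one configuration over the sample dataset."""
    return mean(1.0 if fake_model(x, config) == y else 0.0 for x, y in SAMPLE)


# Score both configurations on the same data, then compare them.
scores = {c["name"]: score_config(c) for c in (CONFIG_A, CONFIG_B)}
best = max(scores, key=scores.get)
delta = scores["config-a"] - scores["config-b"]
```

Running both configurations against the same dataset with the same scorer is what makes the score difference attributable to the configuration rather than to the data.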
