byob

Official

Build and evaluate BYOB benchmarks for LLMs.

Author: NVIDIA
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

BYOB enables researchers and developers to build, customize, and evaluate large language model benchmarks using the BYOB decorator framework, providing reproducible evaluation workflows.

Core Features & Use Cases

  • Stepwise workflow guiding users through 5 steps to construct and assess bespoke benchmarks.
  • BYOB API integration with datasets, prompts, and scoring methods, enabling repeatable experiments and reporting.
  • LLM-as-Judge support and built-in scorers for objective and subjective evaluation.

Quick Start

The skill guides the user through 5 steps to build and evaluate a BYOB benchmark from a dataset.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: byob
Download link: https://github.com/NVIDIA/skills/archive/main.zip#byob

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
