llmeval-tracking

Community

Track LLM eval stability across CI runs.

Author: C-Ross
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

It tracks and diagnoses the stability of LLM evaluation tests across CI runs, surfacing flaky tests and trends so that engineers can debug failures faster and improve reliability.

Core Features & Use Cases

  • Result aggregation: collects per-run evaluation outcomes and stores them as JSONL for easy analysis.
  • Flake detection: identifies flaky tests and provides trend insights over time.
  • CI integration: works with daily CI workflows to fetch and report results from multiple runs.
  • Guided analysis: helps engineers pinpoint failing tests and compare against historical baselines.
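
The aggregation and flake-detection features above can be illustrated with a minimal sketch. It is not the skill's actual implementation: the JSONL field names (`run_id`, `test`, `passed`) and the rule that a test is "flaky" when it has both passes and failures across runs are assumptions made for illustration only.

```python
import json
from collections import defaultdict

# Hypothetical record shape: one JSON object per line with
# "run_id", "test", and "passed". The real skill's schema may differ.
SAMPLE_JSONL = "\n".join([
    '{"run_id": 1, "test": "eval_a", "passed": true}',
    '{"run_id": 2, "test": "eval_a", "passed": true}',
    '{"run_id": 3, "test": "eval_a", "passed": true}',
    '{"run_id": 1, "test": "eval_b", "passed": true}',
    '{"run_id": 2, "test": "eval_b", "passed": false}',
    '{"run_id": 3, "test": "eval_b", "passed": true}',
    '{"run_id": 1, "test": "eval_c", "passed": false}',
    '{"run_id": 2, "test": "eval_c", "passed": false}',
])


def load_results(jsonl_text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def summarize(records):
    """Group outcomes per test and label each test by its pass history."""
    outcomes = defaultdict(list)
    for rec in sorted(records, key=lambda r: r["run_id"]):
        outcomes[rec["test"]].append(bool(rec["passed"]))

    summary = {}
    for test, results in outcomes.items():
        passes = sum(results)
        if passes == len(results):
            status = "stable-pass"
        elif passes == 0:
            status = "stable-fail"
        else:
            # Assumed definition: mixed pass/fail across runs means flaky.
            status = "flaky"
        summary[test] = {
            "runs": len(results),
            "pass_rate": passes / len(results),
            "status": status,
        }
    return summary


if __name__ == "__main__":
    for test, info in sorted(summarize(load_results(SAMPLE_JSONL)).items()):
        print(test, info["status"], info["runs"])
```

A pass-rate threshold or a sliding window over recent runs would be a natural refinement if a single historical failure should not mark a test as flaky indefinitely.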

Quick Start

Run the llmeval-tracker CLI to fetch the latest results and generate a stability report.

Dependency Matrix

Required Modules

None required

Components

scripts

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: llmeval-tracking
Download link: https://github.com/C-Ross/LlamaOfFate/archive/main.zip#llmeval-tracking

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.