llmeval-tracking
Community
Track LLM eval stability across CI runs.
Author: C-Ross
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Track and diagnose the stability of LLM evaluation tests across CI runs, surfacing flaky tests and trends so that failures are faster to debug and the test suite becomes more reliable.
Core Features & Use Cases
- Result aggregation: collects per-run evaluation outcomes and stores them as JSONL for easy analysis.
- Flake detection: identifies flaky tests and provides trend insights over time.
- CI integration: works with daily CI workflows to fetch and report results from multiple runs.
- Guided analysis: helps engineers pinpoint failing tests and compare against historical baselines.
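The aggregation and flake-detection features above can be illustrated with a minimal sketch. It is not the skill's actual implementation: the record fields (`run_id`, `test`, `passed`), the test names, and the rule "flaky means both passes and failures were seen across runs" are all assumptions made for illustration, since the listing does not document the JSONL schema.

```python
import json
from collections import defaultdict

def load_results(lines):
    """Parse JSONL lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]

def summarize(records):
    """Group outcomes per test in run order and compute simple stability stats."""
    outcomes = defaultdict(list)
    for rec in sorted(records, key=lambda r: r["run_id"]):
        outcomes[rec["test"]].append(bool(rec["passed"]))

    summary = {}
    for test, results in outcomes.items():
        passes = sum(results)
        # A "flip" is a change of outcome between two consecutive runs.
        flips = sum(1 for a, b in zip(results, results[1:]) if a != b)
        summary[test] = {
            "runs": len(results),
            "pass_rate": passes / len(results),
            "flips": flips,
            # Assumed definition: flaky = mixed outcomes across runs.
            "flaky": 0 < passes < len(results),
        }
    return summary

# Hypothetical JSONL content: one record per test per CI run.
sample = """
{"run_id": 1, "test": "test_a", "passed": true}
{"run_id": 1, "test": "test_b", "passed": true}
{"run_id": 1, "test": "test_c", "passed": false}
{"run_id": 2, "test": "test_a", "passed": true}
{"run_id": 2, "test": "test_b", "passed": false}
{"run_id": 2, "test": "test_c", "passed": false}
{"run_id": 3, "test": "test_a", "passed": true}
{"run_id": 3, "test": "test_b", "passed": true}
{"run_id": 3, "test": "test_c", "passed": false}
"""

summary = summarize(load_results(sample.splitlines()))
flaky_tests = sorted(t for t, s in summary.items() if s["flaky"])
print(flaky_tests)  # ['test_b']
```

Under this definition a test that fails on every run (`test_c` here) is reported as consistently failing rather than flaky, which keeps genuine regressions separate from unstable tests.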
Quick Start
Run the llmeval-tracker CLI to fetch the latest results and generate a stability report.
Dependency Matrix
Required Modules
None required
Components
scripts
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: llmeval-tracking
Download link: https://github.com/C-Ross/LlamaOfFate/archive/main.zip#llmeval-tracking
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.