eval-mlflow

Official

MLflow-backed eval: log, sync, feedback.

Author: opendatahub-io
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Connects the evaluation harness to MLflow: it logs run results, syncs datasets, and passes feedback between the harness and MLflow traces, so experiments can be tracked end to end.

Core Features & Use Cases

  • MLflow experiment tracking: log parameters, metrics, and artifacts from evaluation runs to MLflow for consistent analytics.
  • Dataset synchronization: push evaluation cases to MLflow datasets and maintain cross-run traceability.
  • Feedback integration: attach judge and human feedback to traces and pull annotations back into the evaluation pipeline for optimization.
  • Use case: a data science team validates model performance across many runs and compares results in a single MLflow dashboard.
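The experiment-tracking feature above can be illustrated with a minimal sketch. It is not the skill's actual script: the function names (`summarize_results`, `log_eval_run`), the per-case result fields (`passed`, `score`), and the artifact file name are assumptions made for illustration. Only the MLflow calls themselves (`set_experiment`, `start_run`, `log_params`, `log_metrics`, `log_dict`) are standard MLflow tracking API.

```python
from typing import Any


def summarize_results(results: list[dict[str, Any]]) -> dict[str, float]:
    """Aggregate per-case results into run-level metrics.

    Assumes each case is a dict with a boolean "passed" and a numeric
    "score" (a hypothetical result shape, not the harness's real schema).
    """
    total = len(results)
    passed = sum(1 for r in results if r["passed"])
    mean_score = sum(r["score"] for r in results) / total if total else 0.0
    return {
        "cases_total": float(total),
        "pass_rate": passed / total if total else 0.0,
        "mean_score": mean_score,
    }


def log_eval_run(
    experiment: str,
    params: dict[str, Any],
    results: list[dict[str, Any]],
) -> str:
    """Log one evaluation run to MLflow and return its run ID."""
    # Imported lazily so summarize_results works without MLflow installed.
    import mlflow

    mlflow.set_experiment(experiment)
    with mlflow.start_run() as run:
        mlflow.log_params(params)                       # run configuration
        mlflow.log_metrics(summarize_results(results))  # aggregate metrics
        # Store the raw per-case results as a JSON artifact
        # (file name is an assumption).
        mlflow.log_dict({"cases": results}, "eval_results.json")
        return run.info.run_id
```

Calling `log_eval_run("my-eval", {"model": "example-model"}, results)` for each run would place the runs in one experiment, which is what makes the side-by-side comparison in the use case possible.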

Quick Start

Install the skill, configure MLflow tracking, then run the evaluation workflow to start logging results and syncing data.
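The listing does not say how tracking is configured, so the following is only a sketch of one plausible setup. The config keys (`tracking_uri`, `experiment`), the fallback URI `file:./mlruns`, the default experiment name, and both function names are assumptions. `MLFLOW_TRACKING_URI` is the standard MLflow environment variable, and `mlflow.set_tracking_uri` and `mlflow.set_experiment` are standard MLflow calls.

```python
import os
from typing import Mapping, Optional


def resolve_tracking_uri(
    config: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick a tracking URI: explicit config first, then the environment.

    The "tracking_uri" key and the local fallback are hypothetical choices.
    """
    env = os.environ if env is None else env
    return (
        config.get("tracking_uri")
        or env.get("MLFLOW_TRACKING_URI")
        or "file:./mlruns"
    )


def configure_tracking(config: Mapping[str, str]) -> str:
    """Point MLflow at the resolved server and select an experiment."""
    # Imported lazily so resolve_tracking_uri works without MLflow installed.
    import mlflow

    uri = resolve_tracking_uri(config)
    mlflow.set_tracking_uri(uri)
    # The "experiment" key and its default name are assumptions.
    mlflow.set_experiment(config.get("experiment", "eval-mlflow-demo"))
    return uri
```

Resolving the URI in a separate pure function keeps the precedence rule testable without a running MLflow server.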

Dependency Matrix

Required Modules

mlflow
pyyaml

Components

scripts

💻 Claude Code Installation

Recommended: let Claude install the skill automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: eval-mlflow
Download link: https://github.com/opendatahub-io/agent-eval-harness/archive/main.zip#eval-mlflow

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.