codex-promptfoo-agentic-eval

Community

Run and review the Promptfoo AFM agentic evaluation.

Author: scouzi1966
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This skill runs and interprets the Promptfoo-based AFM agentic evaluation suite, giving agentic workflows end-to-end functional QA and model-quality assessment.

Core Features & Use Cases

  • Run, expand, and interpret the Promptfoo agentic evaluation suite for AFM validation and quality measurement.
  • Distinguish failure types (afm_bug, model_quality, harness_bug) and report provenance with explicit tags (afm_internal, primary_source, public_benchmark_inspired, synthetic).
  • Manage multiple harness configurations (structured, structured-stress, toolcall, agentic, frameworks, opencode) and review results from prepared matrices and datasets.
  • Provide ready-to-use prompts, configs, and datasets, with references to test reports and failure classifications for analysis.
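The failure types and provenance tags listed above can be modeled as two closed vocabularies. The following is a minimal sketch, not the skill's own code: the `Failure` record, its `test_id` field, and the `summarize` helper are hypothetical names introduced for illustration, and attaching one provenance tag to each failure is an assumption about how the two labels combine. Only the tag values come from the list above.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class FailureType(Enum):
    """Failure types named in the feature list."""
    AFM_BUG = "afm_bug"
    MODEL_QUALITY = "model_quality"
    HARNESS_BUG = "harness_bug"


class Provenance(Enum):
    """Provenance tags named in the feature list."""
    AFM_INTERNAL = "afm_internal"
    PRIMARY_SOURCE = "primary_source"
    PUBLIC_BENCHMARK_INSPIRED = "public_benchmark_inspired"
    SYNTHETIC = "synthetic"


@dataclass(frozen=True)
class Failure:
    """Hypothetical record for one failed test case."""
    test_id: str  # hypothetical identifier, not taken from the skill
    failure_type: FailureType
    provenance: Provenance


def summarize(failures):
    """Count failures per (failure type, provenance) pair."""
    return Counter(
        (f.failure_type.value, f.provenance.value) for f in failures
    )
```

Using enums means a misspelled tag such as `FailureType("afm-bug")` raises a `ValueError` at load time instead of silently creating a new category in the report.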

Quick Start

Run the harness with the agentic profile to begin evaluating AFM.
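As a rough illustration of that step, the sketch below builds a Promptfoo command line for a chosen profile. It assumes each profile maps to a config file named after it, which this page does not confirm. The `configs` and `results` directories, the file naming, and the `build_eval_command` helper are all hypothetical. The skill may ship its own runner, so treat this as a sketch and not as its actual entry point. The only parts taken from Promptfoo itself are the `eval` subcommand and its `--config` and `--output` options.

```python
import shlex

# Harness configurations named in the feature list.
HARNESS_PROFILES = (
    "structured",
    "structured-stress",
    "toolcall",
    "agentic",
    "frameworks",
    "opencode",
)


def build_eval_command(profile, config_dir="configs", output_dir="results"):
    """Return the argv list for a Promptfoo eval run of one profile.

    The directory layout and file names are assumptions for illustration.
    """
    if profile not in HARNESS_PROFILES:
        raise ValueError(f"unknown harness profile: {profile!r}")
    config_path = f"{config_dir}/{profile}.yaml"  # hypothetical path
    output_path = f"{output_dir}/{profile}.json"  # hypothetical path
    return [
        "npx", "promptfoo", "eval",
        "--config", config_path,
        "--output", output_path,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(build_eval_command("agentic")))
```

Returning an argv list keeps the command safe to pass to `subprocess.run` without shell quoting concerns once the real paths are known.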

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: codex-promptfoo-agentic-eval
Download link: https://github.com/scouzi1966/maclocal-api/archive/main.zip#codex-promptfoo-agentic-eval

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.