codex-promptfoo-agentic-eval
Community
Run and review Promptfoo AFM agentic evaluation.
Author: scouzi1966
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Run and interpret the Promptfoo-based AFM agentic evaluation suite to enable end-to-end functional QA and model-quality assessment for agentic workflows.
Core Features & Use Cases
- Run, expand, and interpret the Promptfoo agentic evaluation suite for AFM validation and quality measurement.
- Distinguish failure types (afm_bug, model_quality, harness_bug) and report the provenance of each test with explicit tags (afm_internal, primary_source, public_benchmark_inspired, synthetic).
- Manage multiple harness configurations (structured, structured-stress, toolcall, agentic, frameworks, opencode) and review results from prepared matrices and datasets.
- Provide ready-to-use prompts, configs, and datasets, with references to test reports and failure classifications for analysis.
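The failure classes and provenance tags listed above lend themselves to a simple tally when reviewing a run. The following is a minimal sketch, not the skill's own tooling: the record shape (`case`, `passed`, `failure_type`, `provenance`) and the `tally_failures` helper are hypothetical, and only the label names come from the feature list.

```python
from collections import Counter

# Label names are taken from the feature list; everything else here is an assumption.
FAILURE_TYPES = {"afm_bug", "model_quality", "harness_bug"}
PROVENANCE_TAGS = {
    "afm_internal",
    "primary_source",
    "public_benchmark_inspired",
    "synthetic",
}


def tally_failures(records):
    """Count failed cases per (failure_type, provenance) pair.

    Raises ValueError if a failed record carries an unknown label, so that
    typos in a hand-edited report do not silently create new categories.
    """
    counts = Counter()
    for rec in records:
        if rec["passed"]:
            continue  # only failures are classified
        failure_type = rec["failure_type"]
        provenance = rec["provenance"]
        if failure_type not in FAILURE_TYPES:
            raise ValueError(f"unknown failure type: {failure_type!r}")
        if provenance not in PROVENANCE_TAGS:
            raise ValueError(f"unknown provenance tag: {provenance!r}")
        counts[(failure_type, provenance)] += 1
    return counts


# Hypothetical hand-written records, not real evaluation output.
sample = [
    {"case": "a", "passed": True, "failure_type": None, "provenance": "synthetic"},
    {"case": "b", "passed": False, "failure_type": "afm_bug", "provenance": "afm_internal"},
    {"case": "c", "passed": False, "failure_type": "model_quality", "provenance": "synthetic"},
    {"case": "d", "passed": False, "failure_type": "afm_bug", "provenance": "afm_internal"},
]

for (failure_type, provenance), n in sorted(tally_failures(sample).items()):
    print(f"{failure_type:14} {provenance:26} {n}")
```

Keeping the failure class and the provenance tag as separate fields lets a reviewer tell, for example, an AFM bug surfaced by an internal test from a model-quality miss on a synthetic case.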
Quick Start
Run the harness with the agentic profile to begin evaluating AFM.
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: codex-promptfoo-agentic-eval
Download link: https://github.com/scouzi1966/maclocal-api/archive/main.zip#codex-promptfoo-agentic-eval
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.