codex-session-benchmark-maintainer
Community
Compare Codex sessions with defensible proxy metrics.
Data & Analytics
#benchmarking, #proxy metrics, #failure classification, #token cost, #workflow evaluation, #codex sessions, #follow-up analysis
Author: Undertone0809
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
It helps you evaluate how efficiently and effectively a specific Codex session performed by benchmarking it against a recent local cohort and turning log evidence into actionable workflow guidance.
Core Features & Use Cases
- Target-vs-cohort proxy benchmarking: Compares a named Codex session against a deduped recent window to estimate efficiency, rework, follow-up, interruptions, and handoff quality without assuming ground-truth success.
- Evidence-first metric extraction: Computes or labels metrics as direct, derived, proxy, or missing using local JSONL/SQLite sources, including validation/commit/PR handoff indicators when present.
- Reusable failure-class outcomes: Classifies the session outcome (e.g., clean handoff, completed with rework, partial/blocked, diagnosis only, ambiguous) and maps underperformance to reusable workflow failure classes, then recommends next skill/workflow changes (often by handing off to skill-optimizer).
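The target-vs-cohort comparison and evidence labeling described above can be sketched roughly as follows. This is only an illustration, not the skill's actual implementation: the record fields (`id`, `tool_calls`), the function names, and the choice of median and "share of cohort below the target" as summary statistics are all assumptions made for the example.

```python
from statistics import median

# Evidence labels named in the feature list: direct, derived, proxy, missing.
EVIDENCE_KINDS = ("direct", "derived", "proxy", "missing")


def dedupe_sessions(sessions):
    """Keep the first record seen for each session id (hypothetical 'id' field)."""
    unique = {}
    for session in sessions:
        unique.setdefault(session["id"], session)
    return list(unique.values())


def compare_to_cohort(target, cohort, metric_name, evidence="proxy"):
    """Compare one metric of the target session against the rest of the cohort.

    Returns a dict with the target value, the cohort median, the share of
    cohort sessions whose value is strictly lower, and an evidence label.
    When the metric cannot be computed, the label is 'missing' rather than
    a guessed number.
    """
    if evidence not in EVIDENCE_KINDS:
        raise ValueError(f"unknown evidence label: {evidence}")

    # Exclude the target itself and any session lacking the metric.
    values = [
        s[metric_name]
        for s in cohort
        if s["id"] != target["id"] and s.get(metric_name) is not None
    ]
    target_value = target.get(metric_name)

    if target_value is None or not values:
        return {"metric": metric_name, "evidence": "missing",
                "target": None, "cohort_median": None, "share_below": None}

    share_below = sum(v < target_value for v in values) / len(values)
    return {"metric": metric_name, "evidence": evidence,
            "target": target_value, "cohort_median": median(values),
            "share_below": share_below}
```

Reporting `missing` explicitly, instead of substituting a default, keeps the comparison honest about which numbers rest on log evidence and which do not.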
Example use cases:
- Assess whether a target Codex session (given by id or prefix) was notably worse or better than the surrounding cohort of 30–100 sessions from the same Rudder development context.
- Diagnose whether a session’s low effectiveness came from wrong skill routing, weak source-of-truth checks, validation gaps, wrong worktree/runtime, tooling/permission blockers, or unclear user-intent extraction.
Quick Start
Compare the Codex session id 019e3f7c-d54b-71b3-ba08-2ee20ce6be27 against the most recent 100 local Rudder-related Codex sessions and report which proxy metrics and failure classes explain any performance gap.
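A request like this implies two preparatory steps: resolving the given id (or prefix) to exactly one session, and selecting the most recent N sessions as the cohort. A minimal sketch of both, assuming each session is a dict with a hypothetical `id` and an ISO-8601 `started_at` string; the real log schema may differ.

```python
def resolve_session(sessions, id_or_prefix):
    """Return the single session whose id starts with the given id or prefix.

    Raises LookupError when the prefix matches no session or more than one,
    so an ambiguous prefix is never silently resolved.
    """
    matches = {s["id"]: s for s in sessions if s["id"].startswith(id_or_prefix)}
    if len(matches) != 1:
        raise LookupError(
            f"{id_or_prefix!r} matched {len(matches)} sessions, expected exactly 1"
        )
    return next(iter(matches.values()))


def recent_window(sessions, size=100):
    """Return the `size` most recent sessions, newest first.

    Assumes 'started_at' is an ISO-8601 timestamp string in a single
    timezone, which sorts correctly as plain text.
    """
    ordered = sorted(sessions, key=lambda s: s["started_at"], reverse=True)
    return ordered[:size]
```

Failing on an ambiguous prefix, rather than picking the first match, avoids benchmarking the wrong session by accident.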
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: codex-session-benchmark-maintainer
Download link: https://github.com/Undertone0809/rudder/archive/main.zip#codex-session-benchmark-maintainer
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.