codex-session-benchmark-maintainer

Community

Compare Codex sessions with defensible proxy metrics.

Author: Undertone0809
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

It helps you evaluate how efficiently and effectively a specific Codex session performed by benchmarking it against a recent local cohort and turning log evidence into actionable workflow guidance.

Core Features & Use Cases

  • Target-vs-cohort proxy benchmarking: Compares a named Codex session against a deduped recent window to estimate efficiency, rework, follow-up, interruptions, and handoff quality without assuming ground-truth success.
  • Evidence-first metric extraction: Computes or labels metrics as direct, derived, proxy, or missing using local JSONL/SQLite sources, including validation/commit/PR handoff indicators when present.
  • Reusable failure-class outcomes: Classifies the session outcome (e.g., clean handoff, completed with rework, partial/blocked, diagnosis only, ambiguous) and maps underperformance to reusable workflow failure classes, then recommends next skill/workflow changes (often by handing off to skill-optimizer).
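The target-vs-cohort comparison and the evidence labels described above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the `Metric` class, the `compare` function, the `rework_edits` metric name, and the choice of percentile rank plus cohort median as the summary statistics are all assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Metric:
    """One metric reading for one session (hypothetical structure)."""
    name: str
    value: Optional[float]  # None when the logs carry no evidence
    evidence: str           # "direct", "derived", "proxy", or "missing"


def percentile_rank(target: float, cohort: list[float]) -> float:
    """Fraction of cohort values at or below the target (0.0 to 1.0)."""
    if not cohort:
        raise ValueError("cohort is empty")
    return sum(1 for v in cohort if v <= target) / len(cohort)


def compare(target: Metric, cohort: list[Metric]) -> dict:
    """Place the target's value within the cohort without judging success."""
    values = [m.value for m in cohort if m.value is not None]
    if target.value is None or not values:
        # Report the gap explicitly instead of inventing a number.
        return {"metric": target.name, "evidence": "missing",
                "rank": None, "cohort_median": None}
    return {
        "metric": target.name,
        "evidence": target.evidence,
        "rank": percentile_rank(target.value, values),
        "cohort_median": median(values),
    }


if __name__ == "__main__":
    cohort = [Metric("rework_edits", v, "proxy") for v in (1, 2, 3, 4, 10)]
    target = Metric("rework_edits", 7, "proxy")
    print(compare(target, cohort))
```

Carrying the evidence label through to the result keeps a proxy-based rank from being read as a ground-truth verdict, and a metric with no log evidence is reported as missing rather than scored.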

Example use cases:

  • Assess whether a target Codex session (given by ID or ID prefix) performed notably worse or better than the surrounding cohort of 30–100 sessions from the same Rudder development context.
  • Diagnose whether a session’s low effectiveness came from wrong skill routing, weak source-of-truth checks, validation gaps, wrong worktree/runtime, tooling/permission blockers, or unclear user-intent extraction.
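Selecting the target by ID prefix and building a deduplicated recent cohort might look like the sketch below. The record fields `id` and `started_at`, the function names, and the rule of keeping only the latest record per session are assumptions for illustration; the skill's real log schema is not described here.

```python
from typing import Iterable


def resolve_session(prefix: str, session_ids: Iterable[str]) -> str:
    """Return the single session ID that starts with the prefix."""
    matches = sorted({sid for sid in session_ids if sid.startswith(prefix)})
    if len(matches) != 1:
        # Refuse to guess when the prefix is ambiguous or unknown.
        raise LookupError(f"prefix {prefix!r} matched {len(matches)} sessions")
    return matches[0]


def recent_cohort(records: list[dict], target_id: str,
                  window: int = 100) -> list[dict]:
    """Keep the latest record per session, drop the target, and return
    the `window` most recent sessions, newest first."""
    latest: dict[str, dict] = {}
    for rec in records:
        sid = rec["id"]
        if sid == target_id:
            continue  # the target is compared against the cohort, not part of it
        if sid not in latest or rec["started_at"] > latest[sid]["started_at"]:
            latest[sid] = rec
    ordered = sorted(latest.values(),
                     key=lambda r: r["started_at"], reverse=True)
    return ordered[:window]


if __name__ == "__main__":
    records = [
        {"id": "a1", "started_at": 1},
        {"id": "a1", "started_at": 5},  # duplicate entry for session a1
        {"id": "b2", "started_at": 3},
        {"id": "c3", "started_at": 4},
    ]
    target = resolve_session("c", [r["id"] for r in records])
    print([r["id"] for r in recent_cohort(records, target, window=2)])
```

Excluding the target from its own cohort avoids biasing the comparison, and failing on an ambiguous prefix is safer than silently benchmarking the wrong session.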

Quick Start

Compare the Codex session id 019e3f7c-d54b-71b3-ba08-2ee20ce6be27 against the most recent 100 local Rudder-related Codex sessions and report which proxy metrics and failure classes explain any performance gap.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: codex-session-benchmark-maintainer
Download link: https://github.com/Undertone0809/rudder/archive/main.zip#codex-session-benchmark-maintainer

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
