megaplan-bakeoff
Community
Run fair multi-profile LLM bake-offs with megaplan.
Category: Product & Management
Tags: benchmarking, cost optimization, quality scoring, prompt hygiene, LLM bake-off, blind evaluation, pipeline robustness
Author: peteromallet
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
It helps you compare different megaplan profile mixes on the same task without wasting money or trusting misleading outputs, producing a fair “winner” based on blind, rubric-driven assessment.
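Blind, rubric-driven selection can be illustrated with a minimal Python sketch. It assumes each profile produces one text output and that a judge (for example, a sub-agent) returns per-criterion scores on a 0-10 scale. `Candidate`, `RUBRIC` and its weights, `blind`, `rubric_score`, and `pick_winner` are hypothetical illustrations, not megaplan's API. The judge only ever sees shuffled outputs relabeled A, B, C, and so on; profile names are restored only after every score is in.

```python
import random
import string
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# Hypothetical record for one bake-off output (megaplan's real format is not described here).
@dataclass
class Candidate:
    profile: str  # profile name: must never reach the judge
    output: str   # artifact the judge reads


# Hypothetical rubric: criterion -> weight. Weights are invented for illustration and sum to 1.0.
RUBRIC: Dict[str, float] = {"correctness": 0.5, "completeness": 0.3, "style": 0.2}

# A judge maps an anonymized output to per-criterion scores (0-10).
Judge = Callable[[str], Dict[str, float]]


def blind(candidates: List[Candidate], seed: int = 0) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Shuffle candidates and relabel them A, B, C, ... so neither order nor names leak."""
    if len(candidates) > len(string.ascii_uppercase):
        raise ValueError("this sketch supports at most 26 candidates")
    shuffled = list(candidates)
    random.Random(seed).shuffle(shuffled)
    blinded: Dict[str, str] = {}  # label -> output (the only thing the judge sees)
    key: Dict[str, str] = {}      # label -> profile (kept away from the judge)
    for label, cand in zip(string.ascii_uppercase, shuffled):
        blinded[label] = cand.output
        key[label] = cand.profile
    return blinded, key


def rubric_score(scores: Dict[str, float]) -> float:
    """Weighted sum over the rubric; rejects incomplete score sheets."""
    missing = set(RUBRIC) - set(scores)
    if missing:
        raise ValueError(f"judge skipped criteria: {sorted(missing)}")
    return sum(weight * scores[criterion] for criterion, weight in RUBRIC.items())


def pick_winner(judge: Judge, candidates: List[Candidate], seed: int = 0) -> Tuple[str, Dict[str, float]]:
    """Score every candidate blind, then unblind once all scores are in."""
    blinded, key = blind(candidates, seed)
    totals = {label: rubric_score(judge(text)) for label, text in blinded.items()}
    best = max(totals, key=lambda label: totals[label])
    return key[best], {key[label]: total for label, total in totals.items()}


if __name__ == "__main__":
    def toy_judge(text: str) -> Dict[str, float]:
        # Stand-in for a blinded sub-agent: rewards longer answers, for demonstration only.
        n = float(min(len(text), 10))
        return {"correctness": n, "completeness": n, "style": 5.0}

    winner, scores = pick_winner(toy_judge, [
        Candidate("cheap-mix", "short"),
        Candidate("premium-mix", "a much longer answer"),
    ])
    print(winner, scores)
```

Keeping the label-to-profile `key` out of the judge's input is what makes the comparison blind; the seeded shuffle also prevents the judge from inferring a profile from its position in the list.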
Core Features & Use Cases
- Multi-profile concurrent bake-offs: Run the same idea across N profiles to find which mix delivers the best quality per unit of cost.
- Smoke testing and launch hygiene: Validate routing/model behavior in doc-mode first to catch failures cheaply before code-mode runs.
- Blind assessment workflow: Enforce sub-agent blinding, rubric scoring, and style quotes so evaluation is consistent and evaluators never know which profile produced which output.
- Pre-merge validation gate: Detect empty diffs and other misdirections before selecting or merging results into main.
- Reporting patterns for decision-making: Produce comparison tables and cost-adjusted conclusions that summarize trade-offs and production readiness.
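The pre-merge gate and cost-adjusted reporting described in the list above can be sketched in Python. This is a hypothetical illustration, not the skill's implementation: `Result`, `passes_gate`, `rank`, `report`, and the `min_quality` floor are invented names. The sketch assumes each profile's run yields a blind rubric score, a dollar cost, and a diff against `main`; `branch_diff` shows one way to obtain that diff with a plain `git diff base...branch`.

```python
import subprocess
from dataclasses import dataclass
from typing import List


# Hypothetical per-profile result; field names and units are assumptions, not megaplan's format.
@dataclass
class Result:
    profile: str
    quality: float   # blind rubric score, e.g. on a 0-10 scale
    cost_usd: float  # total spend for this profile's run
    diff: str        # unified diff the run produced against main


def branch_diff(base: str, branch: str) -> str:
    """Changes on `branch` since it forked from `base` (git's three-dot diff)."""
    proc = subprocess.run(
        ["git", "diff", f"{base}...{branch}"],
        capture_output=True, text=True, check=True,
    )
    return proc.stdout


def passes_gate(result: Result) -> bool:
    """Pre-merge gate: an empty or whitespace-only diff has nothing to merge, whatever it scored."""
    return bool(result.diff.strip())


def rank(results: List[Result], min_quality: float = 0.0) -> List[Result]:
    """Drop gated or sub-threshold runs, then order by quality per dollar, best first."""
    eligible = [r for r in results if passes_gate(r) and r.quality >= min_quality]
    for r in eligible:
        if r.cost_usd <= 0:
            raise ValueError(f"{r.profile}: cost must be positive to compute quality per dollar")
    return sorted(eligible, key=lambda r: r.quality / r.cost_usd, reverse=True)


def report(results: List[Result], min_quality: float = 0.0) -> str:
    """Plain-text comparison table of the runs that survive the gate."""
    lines = ["profile | quality | cost ($) | quality/$"]
    for r in rank(results, min_quality):
        lines.append(f"{r.profile} | {r.quality:.1f} | {r.cost_usd:.2f} | {r.quality / r.cost_usd:.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    runs = [
        Result("cheap-mix", 6.0, 0.50, "+ added line\n"),
        Result("premium-mix", 9.0, 3.00, "+ added line\n"),
        Result("empty-mix", 10.0, 0.10, ""),  # top score, but changed nothing: gated out
    ]
    print(report(runs))
    print(report(runs, min_quality=7.0))
```

Ranking purely by quality per dollar favors cheap but mediocre runs, which is why the sketch adds a `min_quality` floor: with a floor of 7.0, only `premium-mix` remains eligible even though `cheap-mix` has the better ratio.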
Quick Start
Tell the megaplan bakeoff runner to execute a light-robustness, blind-scored bake-off for your task idea across your chosen profiles, then pick and merge the winner.
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: megaplan-bakeoff Download link: https://github.com/peteromallet/arnold/archive/main.zip#megaplan-bakeoff Please download this .zip file, extract it, and install it in the .claude/skills/ directory.