aibdd-red-evaluate

Official

Veto false red before Green proceeds

Author: Waterball-Software-Academy
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

It keeps the downstream Green phase from consuming an invalid or misleading Red run. The skill evaluates whether each red signal is both legal (the runner executed correctly) and meaningful (the failure reflects a user-observable behavior gap).

Core Features & Use Cases

  • Hard-gate validation (Type A): Detects false red caused by runner errors, invalid step definitions, missing/incorrect scenario-to-step grounding, and evidence boundary issues.
  • Hollow-red semantic detection (Type B): Flags red failures that do not establish a real user-observable behavior gap (e.g., harness/environment issues or assertions against non-public internals).
  • Evidence-driven reporting: Reads runner-native test reports, runtime-visible feature files, and runtime-visible step definition files, then emits a structured PASS/FAIL evaluation report.
  • Workflow guardrail for AIBDD: Produces a verdict that either allows Green to proceed or vetoes it when findings exist, enforcing disciplined RED → GREEN → REFACTOR flow.
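The gating rule described in the list above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the names `FindingType`, `Finding`, and `RedEvaluation`, as well as the sample scenario, are hypothetical. The only behavior taken from the description is that any finding, whether Type A or Type B, vetoes Green.

```python
from dataclasses import dataclass, field
from enum import Enum


class FindingType(Enum):
    # Hypothetical labels mirroring the two categories described above.
    HARD_GATE = "A"   # runner errors, invalid steps, grounding or evidence-boundary issues
    HOLLOW_RED = "B"  # red that does not establish a user-observable behavior gap


@dataclass(frozen=True)
class Finding:
    type: FindingType
    scenario: str
    reason: str


@dataclass
class RedEvaluation:
    findings: list[Finding] = field(default_factory=list)

    @property
    def verdict(self) -> str:
        # Any finding at all turns the report into a FAIL.
        return "FAIL" if self.findings else "PASS"

    @property
    def green_may_proceed(self) -> bool:
        # Green is allowed only when the Red run passed evaluation.
        return self.verdict == "PASS"


clean = RedEvaluation()
vetoed = RedEvaluation([
    Finding(
        FindingType.HOLLOW_RED,
        "Checkout applies coupon",  # made-up scenario name
        "assertion targets a non-public internal",
    )
])

print(clean.verdict, clean.green_may_proceed)    # PASS True
print(vetoed.verdict, vetoed.green_may_proceed)  # FAIL False
```

Treating both finding types as equally blocking keeps the gate simple: the report can still distinguish Type A from Type B for the reader, while the verdict stays a single yes/no decision.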

Quick Start

Provide the Red handoff payload (or explicit artifact pointers), including the target feature files, the runner-native test report path, the runtime feature and step roots, and the step-definition files touched. Then run this skill to produce a PASS/FAIL Red evaluation report.
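As a rough illustration of what such a handoff might contain, the sketch below models the payload as a plain dictionary and checks that every listed input is present. The key names, the file paths, and the `missing_inputs` helper are all assumptions made for this example; the skill's real payload schema is not documented here.

```python
# Hypothetical key names, one per input listed in the Quick Start.
REQUIRED_KEYS = (
    "feature_files",
    "test_report_path",
    "runtime_feature_root",
    "runtime_step_root",
    "touched_step_definition_files",
)


def missing_inputs(payload: dict) -> list[str]:
    """Return the required keys that are absent or empty in the payload."""
    return [key for key in REQUIRED_KEYS if not payload.get(key)]


# All paths below are invented placeholders.
example_payload = {
    "feature_files": ["features/checkout.feature"],
    "test_report_path": "reports/red-run.json",
    "runtime_feature_root": "features/",
    "runtime_step_root": "features/steps/",
    "touched_step_definition_files": ["features/steps/checkout_steps.py"],
}

print(missing_inputs(example_payload))                # []
print(missing_inputs({"feature_files": []}))          # every required key
```

Checking for gaps before evaluation makes an incomplete handoff visible early, rather than letting it surface later as an ambiguous evaluation result.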

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: aibdd-red-evaluate
Download link: https://github.com/Waterball-Software-Academy/aixbdd/archive/main.zip#aibdd-red-evaluate

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
