coding-agent-robustness

Name: coding-agent-robustness
Author: daedalus

Community

Benchmark coding agents under stress.

Software Engineering #automation #testing #llm #evaluation #robustness #probes #coding-agent


Version: 1.0.0

Installs: 0

System Documentation

What problem does it solve?

Systematic stress-testing and robustness measurement of coding agents (AI coding assistants, LLM-based code generators, or agentic coding systems). Use this skill whenever you want to evaluate an agent's reliability, resilience to adversarial inputs, and ability to recover from errors, culminating in a structured robustness report.

Core Features & Use Cases

Taxonomy of eight orthogonal robustness dimensions, covering adversarial correctness, spec underspecification tolerance, consistency under reformulation, error recovery, security awareness, hallucination rate, graceful degradation, and refusal calibration.
Probe generation protocol, scoring rubrics, and a structured report template to guide comprehensive audits of any coding system.
A complete execution pipeline with references to probe templates, automation via run_probes.py, and a sandboxed execution workflow for safely evaluating agents delivered through chat interfaces, IDE plugins, or API wrappers.
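The eight dimensions above can be represented as a small data model for tallying rubric scores into a report. The following is a minimal Python sketch under stated assumptions: `Dimension`, `ProbeResult`, and `build_report` are hypothetical names, not the interface of run_probes.py, and each rubric score is assumed to be normalized to [0, 1] with higher meaning more robust (so a hallucination-rate probe would record 1 minus the observed rate).

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from statistics import mean


class Dimension(Enum):
    # The eight robustness dimensions listed in the skill description.
    ADVERSARIAL_CORRECTNESS = "adversarial correctness"
    UNDERSPECIFICATION = "spec underspecification tolerance"
    REFORMULATION_CONSISTENCY = "consistency under reformulation"
    ERROR_RECOVERY = "error recovery"
    SECURITY_AWARENESS = "security awareness"
    HALLUCINATION_RATE = "hallucination rate"
    GRACEFUL_DEGRADATION = "graceful degradation"
    REFUSAL_CALIBRATION = "refusal calibration"


@dataclass(frozen=True)
class ProbeResult:
    probe_id: str
    dimension: Dimension
    score: float  # assumed rubric score in [0, 1]; higher means more robust

    def __post_init__(self) -> None:
        # Reject scores outside the assumed normalized range.
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score out of range: {self.score}")


def build_report(results: list[ProbeResult]) -> dict[str, float | None]:
    """Mean rubric score per dimension; None marks a dimension with no probes."""
    report: dict[str, float | None] = {}
    for dim in Dimension:
        scores = [r.score for r in results if r.dimension is dim]
        report[dim.value] = round(mean(scores), 3) if scores else None
    return report
```

For example, two adversarial-correctness probes scored 1.0 and 0.5 yield `0.75` for that dimension, while every dimension without probes stays `None`. Keeping untested dimensions as `None` rather than `0` lets the report distinguish "not measured" from "failed".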

Quick Start

Run the probe suite against your coding agent to generate a robustness report.
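How run_probes.py executes probes is not documented here. Purely as an illustration, the sketch below shows one plausible shape for the execution step: run agent-generated Python in a throwaway directory with a timeout, then score it against stdin/stdout test cases. `run_candidate` and `score_probe` are hypothetical helpers. A subprocess in a temporary directory limits side effects but is not a security sandbox; untrusted code should run in a container or VM.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile
from pathlib import Path


def run_candidate(code: str, stdin: str = "", timeout_s: float = 5.0) -> tuple[bool, str]:
    """Run agent-generated code; return (exited_cleanly, stdout).

    A timeout or non-zero exit code counts as a failure.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate.py"
        script.write_text(code)
        try:
            proc = subprocess.run(
                [sys.executable, "-I", str(script)],  # -I: isolated mode, ignores user site/env
                input=stdin,
                capture_output=True,
                text=True,
                timeout=timeout_s,
                cwd=workdir,
            )
        except subprocess.TimeoutExpired:
            return False, ""
        return proc.returncode == 0, proc.stdout


def score_probe(code: str, cases: list[tuple[str, str]]) -> float:
    """Fraction of (stdin, expected_stdout) cases the candidate passes."""
    if not cases:
        return 0.0
    passed = 0
    for stdin, expected in cases:
        ok, out = run_candidate(code, stdin)
        if ok and out.strip() == expected.strip():
            passed += 1
    return passed / len(cases)
```

Usage: a candidate that doubles an integer read from stdin passes two of three cases below, because the malformed input `"x"` makes it crash, so `score_probe` returns 2/3.

```python
candidate = "n = int(input())\nprint(n * 2)\n"
print(score_probe(candidate, [("2", "4"), ("-3", "-6"), ("x", "")]))
```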

Dependency Matrix

Required Modules

None required

Components

scripts, references