coding-agent-robustness
Community
Benchmark coding agents under stress.
Author: daedalus
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This skill provides systematic stress-testing and robustness measurement for coding agents (AI coding assistants, LLM-based code generators, or agentic coding systems). Use it whenever you want to evaluate an agent's reliability, resilience to adversarial inputs, and ability to recover from errors. The evaluation culminates in a structured robustness report.
Core Features & Use Cases
- A taxonomy of eight orthogonal robustness dimensions: adversarial correctness, tolerance of underspecified specs, consistency under reformulation, error recovery, security awareness, hallucination rate, graceful degradation, and refusal calibration.
- A probe generation protocol, scoring rubrics, and a structured report template to guide comprehensive audits of any coding system.
- A complete execution pipeline with references to probe templates, automation via run_probes.py, and a sandboxed execution workflow for safe evaluation across chat, IDE plugins, or API wrappers.
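As a rough illustration of how per-probe scores might roll up into the per-dimension report described above, here is a minimal sketch. The identifiers (`DIMENSIONS`, `ProbeResult`, `summarize`, `render_report`) and the 0.0 to 1.0 scoring scale are assumptions made for this example; they are not taken from the skill's actual rubrics or report template.

```python
from dataclasses import dataclass
from statistics import mean

# The eight dimensions named in the feature list. The snake_case identifiers
# are illustrative, not the skill's own naming.
DIMENSIONS = [
    "adversarial_correctness",
    "underspecification_tolerance",
    "consistency_under_reformulation",
    "error_recovery",
    "security_awareness",
    "hallucination_rate",
    "graceful_degradation",
    "refusal_calibration",
]


@dataclass
class ProbeResult:
    dimension: str
    probe_id: str
    score: float  # assumed scale: 0.0 (probe failed) to 1.0 (probe passed)


def summarize(results):
    """Average probe scores per dimension; unprobed dimensions map to None."""
    by_dim = {d: [] for d in DIMENSIONS}
    for r in results:
        if r.dimension not in by_dim:
            raise ValueError(f"unknown dimension: {r.dimension}")
        if not 0.0 <= r.score <= 1.0:
            raise ValueError(f"score out of range for {r.probe_id}: {r.score}")
        by_dim[r.dimension].append(r.score)
    return {d: (mean(s) if s else None) for d, s in by_dim.items()}


def render_report(summary):
    """Render one plain-text line per dimension, in taxonomy order."""
    lines = ["Robustness report"]
    for d in DIMENSIONS:
        value = summary[d]
        text = "not probed" if value is None else f"{value:.2f}"
        lines.append(f"{d}: {text}")
    return "\n".join(lines)
```

Keeping unprobed dimensions visible as "not probed" rather than dropping them makes gaps in an audit obvious in the final report.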
Quick Start
Run the probe suite against your coding agent to generate a robustness report.
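The listing does not document the command-line interface of run_probes.py, so the sketch below does not call it. Instead it shows, under stated assumptions, what a single probe run could look like: a hypothetical `agent` callable takes a prompt and returns Python source, which is executed in a child interpreter with a timeout and compared against expected output. The names `run_candidate` and `run_probe` are invented for this example, and a child process with a timeout is only a weak form of isolation, not a real sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_candidate(code, timeout_s=5.0):
    """Run agent-produced Python in a child interpreter; return (ok, stdout)."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, "-I", str(script)],  # -I: isolated mode
                capture_output=True,
                text=True,
                timeout=timeout_s,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            # A hang counts as a failed run.
            return False, ""
        return proc.returncode == 0, proc.stdout.strip()


def run_probe(agent, prompt, expected_stdout):
    """Score one probe: 1.0 if the agent's code runs and prints the expected text."""
    code = agent(prompt)  # hypothetical agent interface: prompt in, source out
    ok, out = run_candidate(code)
    return 1.0 if ok and out == expected_stdout else 0.0
```

For a real audit, the child process would typically run inside a container or VM with no network access and a throwaway filesystem.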
Dependency Matrix
Required Modules
None required
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: coding-agent-robustness Download link: https://github.com/daedalus/skills/archive/main.zip#coding-agent-robustness Please download this .zip file, extract it, and install it in the .claude/skills/ directory.