coding-agent-robustness

Community

Benchmark coding agents under stress.

Author: daedalus
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This skill provides systematic stress-testing and robustness measurement for coding agents (AI coding assistants, LLM-based code generators, or agentic coding systems). Use it whenever you want to evaluate an agent's reliability, its resilience to adversarial inputs, and its ability to recover from errors. The evaluation culminates in a structured robustness report.

Core Features & Use Cases

  • Taxonomy of eight orthogonal robustness dimensions, covering adversarial correctness, spec underspecification tolerance, consistency under reformulation, error recovery, security awareness, hallucination rate, graceful degradation, and refusal calibration.
  • Probe generation protocol, scoring rubrics, and a structured report template to guide comprehensive audits of any coding system.
  • A complete execution pipeline with references to probe templates, automation via run_probes.py, and a sandboxed execution workflow for safe evaluation, whether the agent is reached through chat, an IDE plugin, or an API wrapper.
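
To make the taxonomy concrete, here is a minimal sketch of how per-probe scores could be rolled up into one line per dimension. The eight dimension names come from the list above; everything else (the `ProbeResult` type, the 0.0 to 1.0 score scale, and the `summarize` and `render_report` helpers) is a hypothetical illustration, not the skill's actual rubric or report template.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean


class Dimension(Enum):
    # Names taken from the feature list; the enum itself is illustrative.
    ADVERSARIAL_CORRECTNESS = "adversarial correctness"
    UNDERSPECIFICATION_TOLERANCE = "spec underspecification tolerance"
    CONSISTENCY_UNDER_REFORMULATION = "consistency under reformulation"
    ERROR_RECOVERY = "error recovery"
    SECURITY_AWARENESS = "security awareness"
    HALLUCINATION_RATE = "hallucination rate"
    GRACEFUL_DEGRADATION = "graceful degradation"
    REFUSAL_CALIBRATION = "refusal calibration"


@dataclass(frozen=True)
class ProbeResult:
    dimension: Dimension
    probe_id: str
    score: float  # assumed scale: 0.0 (worst) to 1.0 (best)


def summarize(results):
    """Average the scores per dimension; unprobed dimensions map to None."""
    summary = {}
    for dim in Dimension:
        scores = [r.score for r in results if r.dimension is dim]
        summary[dim] = mean(scores) if scores else None
    return summary


def render_report(summary):
    """Render the summary as plain text, one line per dimension."""
    lines = ["Robustness report"]
    for dim, score in summary.items():
        cell = "not probed" if score is None else f"{score:.2f}"
        lines.append(f"  {dim.value}: {cell}")
    return "\n".join(lines)
```

Keeping unprobed dimensions in the output as "not probed", rather than dropping them, makes gaps in audit coverage visible in the report.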

Quick Start

Run the probe suite against your coding agent to generate a robustness report.
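
The listing does not document the command-line interface of run_probes.py, so the following is only a rough sketch of what a probe run amounts to, not the script itself. It assumes the agent can be wrapped as a plain Python callable that takes a prompt and returns text; `Probe`, `run_suite`, and `pass_rate` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Probe:
    probe_id: str
    dimension: str
    prompt: str
    check: Callable[[str], bool]  # returns True if the reply is acceptable


def run_suite(agent, probes):
    """Send each probe to the agent and record pass/fail per probe.

    `agent` is assumed to be a callable: prompt (str) -> reply (str).
    An exception raised by the agent or the checker counts as a failure
    and is recorded, so one broken probe does not abort the whole run.
    """
    results = []
    for probe in probes:
        try:
            reply = agent(probe.prompt)
            passed = bool(probe.check(reply))
            error = None
        except Exception as exc:
            passed, error = False, repr(exc)
        results.append({
            "probe_id": probe.probe_id,
            "dimension": probe.dimension,
            "passed": passed,
            "error": error,
        })
    return results


def pass_rate(results):
    """Fraction of probes that passed; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(r["passed"] for r in results) / len(results)
```

In practice the callable would wrap a chat session, an IDE plugin, or an API client, and any code the agent produces should be executed only inside a sandbox.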

Dependency Matrix

Required Modules

None required

Components

scripts, references

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: coding-agent-robustness
Download link: https://github.com/daedalus/skills/archive/main.zip#coding-agent-robustness

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
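
If you prefer to install by hand, the extract-and-copy step could be sketched roughly as follows. This assumes the archive has already been downloaded to a local path and that the skill lives in a folder named coding-agent-robustness somewhere inside it. The archive's internal layout is not documented here, so the folder lookup is an assumption, and `install_skill` is a hypothetical helper.

```python
import zipfile
from pathlib import Path, PurePosixPath


def install_skill(archive_path, skill_name, dest_root):
    """Copy the folder named `skill_name` from a zip into dest_root/skill_name.

    Assumption: the skill is a directory called `skill_name` at some depth
    inside the archive. Returns the list of files written.
    """
    target = Path(dest_root) / skill_name
    written = []
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            parts = PurePosixPath(info.filename).parts
            # Only keep files that sit below a directory named skill_name.
            if skill_name not in parts[:-1]:
                continue
            rel = parts[parts.index(skill_name) + 1:]
            if ".." in rel:
                continue  # skip entries that would escape the target folder
            out = target.joinpath(*rel)
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_bytes(zf.read(info))
            written.append(out)
    if not written:
        raise FileNotFoundError(f"no folder named {skill_name!r} in archive")
    return written
```

For example, `install_skill("main.zip", "coding-agent-robustness", ".claude/skills")` would place the skill's files under .claude/skills/coding-agent-robustness/, provided the layout assumption holds.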