redteam-autoresearch
Official
Bounded red-team autoresearch for guardrails.
Author: superagent-ai
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This skill orchestrates a bounded red-team autoresearch loop that generates labeled guardrail training data for LLM safety. Attacker and judge roles simulate real-world adversarial probing under explicit authorization, and the resulting datasets are kept locally for study.
Core Features & Use Cases
- End-to-end autoresearch workflow: target profiling, seed research, batch generation, deterministic mutators, model querying, judging with StrongREJECT, recording, and archive-based novelty tracking.
- Data export for guardrails: prepares labeled prompts, responses, and metadata for training guardrail classifiers and detectors.
- Benchmark-ready: supports holdout seeds, difficulty strata, and macro/micro attack success rate (ASR) reporting for model comparisons.
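The macro/micro ASR distinction can be illustrated with a minimal sketch. It assumes that micro ASR pools every attempt together, while macro ASR averages the per-stratum rates so that each difficulty stratum counts equally. The skill's actual reporting code is not shown in this listing, so the function name `asr_report` and the `(stratum, success)` record shape are hypothetical.

```python
from collections import defaultdict


def asr_report(records):
    """Compute per-stratum, micro, and macro attack success rate (ASR).

    `records` is an iterable of (stratum, success) pairs, where `success`
    is the judge's boolean verdict. This record shape is an assumption
    made for illustration, not the skill's real schema.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [successes, attempts]
    for stratum, success in records:
        counts[stratum][0] += int(bool(success))
        counts[stratum][1] += 1

    if not counts:
        raise ValueError("no records to report on")

    # Per-stratum ASR: successes divided by attempts within each stratum.
    per_stratum = {name: s / n for name, (s, n) in counts.items()}

    # Micro ASR: pool all attempts, so large strata dominate.
    total_success = sum(s for s, _ in counts.values())
    total_attempts = sum(n for _, n in counts.values())
    micro = total_success / total_attempts

    # Macro ASR: unweighted mean of per-stratum rates.
    macro = sum(per_stratum.values()) / len(per_stratum)

    return {"per_stratum": per_stratum, "micro": micro, "macro": macro}


# Toy data: four "easy" attempts (one success), one "hard" attempt (success).
example = [
    ("easy", True), ("easy", False), ("easy", False), ("easy", False),
    ("hard", True),
]
report = asr_report(example)
```

On this toy data the micro ASR is 2/5 and the macro ASR is the mean of 1/4 and 1/1. The two numbers diverge whenever strata differ in size, which is why reporting both is useful when comparing models.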
Quick Start
Set up a run workspace and start the bounded red-team autoresearch workflow to generate guardrail training data.
Dependency Matrix
Required Modules
openai, pyyaml, python-dotenv, tenacity, tqdm, sentence-transformers
Components
scripts, references
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: redteam-autoresearch
Download link: https://github.com/superagent-ai/skills/archive/main.zip#redteam-autoresearch
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.