redteam-autoresearch

Official

Bounded red-team autoresearch for guardrails.

Author: superagent-ai
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This skill orchestrates a bounded red-team autoresearch loop that generates labeled guardrail training data for LLM safety. Attackers and judges simulate real-world adversarial probing under explicit authorization, and the loop produces datasets that can be studied locally.

Core Features & Use Cases

  • End-to-end autoresearch workflow: target profiling, seed research, batch generation, deterministic mutators, model querying, judging with StrongREJECT, recording, and archive-based novelty tracking.
  • Data export for guardrails: prepares labeled prompts, responses, and metadata for training guardrail classifiers and detectors.
  • Benchmark-ready: supports holdout seeds, difficulty strata, and macro/micro attack success rate (ASR) reporting for model comparisons.
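The macro/micro distinction in the last item can be illustrated with a minimal sketch. It assumes ASR stands for attack success rate, that each judged attempt has already been reduced to a pass/fail flag, and that macro averaging is taken over difficulty strata. The function name `asr_report` and the record layout are hypothetical and are not taken from the skill's scripts.

```python
from collections import defaultdict


def asr_report(records):
    """Compute per-stratum, micro, and macro ASR.

    `records` is an iterable of (stratum, success) pairs, where `success`
    is truthy when the judge counted the attempt as a successful attack.
    This layout is an assumption made for illustration.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for stratum, success in records:
        totals[stratum] += 1
        hits[stratum] += int(bool(success))
    if not totals:
        raise ValueError("no records to score")

    # Per-stratum rate: successes divided by attempts within each stratum.
    per_stratum = {s: hits[s] / totals[s] for s in totals}
    # Micro: pool every attempt, so large strata dominate.
    micro = sum(hits.values()) / sum(totals.values())
    # Macro: average the per-stratum rates, so each stratum counts equally.
    macro = sum(per_stratum.values()) / len(per_stratum)
    return {"per_stratum": per_stratum, "micro": micro, "macro": macro}


example = [
    ("easy", True), ("easy", True), ("easy", False), ("easy", False),
    ("hard", False),
]
report = asr_report(example)
```

With the sample above, the pooled (micro) rate is 2 of 5 attempts, while the macro rate averages 0.5 for "easy" and 0.0 for "hard". Reporting both makes it visible when one stratum with many attempts is driving the headline number.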

Quick Start

Set up a run workspace and start the bounded red-team autoresearch workflow to generate guardrail training data.

Dependency Matrix

Required Modules

openai, pyyaml, python-dotenv, tenacity, tqdm, sentence-transformers
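A quick way to check whether these modules are already available is sketched below. The mapping from package name to import name reflects how these packages are normally imported (for example, pyyaml is imported as `yaml`); the helper `missing_modules` is a hypothetical name and is not part of the skill.

```python
import importlib.util

# Package name (as installed) mapped to the module name used in `import`.
REQUIRED = {
    "openai": "openai",
    "pyyaml": "yaml",
    "python-dotenv": "dotenv",
    "tenacity": "tenacity",
    "tqdm": "tqdm",
    "sentence-transformers": "sentence_transformers",
}


def missing_modules(import_names):
    """Return the package names whose import module cannot be located."""
    return [
        package
        for package, module in import_names.items()
        if importlib.util.find_spec(module) is None
    ]
```

Calling `missing_modules(REQUIRED)` returns an empty list when everything is installed; otherwise it lists the packages still to be installed.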

Components

scripts, references

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: redteam-autoresearch
Download link: https://github.com/superagent-ai/skills/archive/main.zip#redteam-autoresearch

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
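For readers who prefer to do the extraction step by hand, the following is a minimal sketch, assuming the .zip has already been downloaded and that the skill lives in a folder named `redteam-autoresearch` somewhere inside the archive. The exact layout of the archive is not stated on this page, so the code searches for that folder name rather than relying on a fixed path. The function name `extract_skill` is hypothetical.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_skill(zip_path, skill_name, dest_root):
    """Copy files found under a folder named `skill_name` in the archive
    into dest_root/skill_name and return the list of paths written."""
    target = Path(dest_root) / skill_name
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            parts = PurePosixPath(info.filename).parts
            if skill_name not in parts:
                continue
            # Keep only the path below the skill folder.
            rel = parts[parts.index(skill_name) + 1:]
            # Skip empty remainders and entries that try to climb out.
            if not rel or ".." in rel:
                continue
            out = target.joinpath(*rel)
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_bytes(archive.read(info))
            written.append(out)
    return written
```

A call such as `extract_skill("main.zip", "redteam-autoresearch", ".claude/skills")` would place the skill's files under `.claude/skills/redteam-autoresearch/`, matching the directory named in the request above.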
