Name: benchjack
Author: benchjack

System Documentation

What problem does it solve?

BenchJack provides a security-audit workflow for AI benchmarks to uncover vulnerabilities in evaluation pipelines, such as data leakage, isolation issues, and prompt-injection risks.

Core Features & Use Cases

Automated security audits of AI benchmarks that reveal evaluation vulnerabilities (classes V1–V8).
Hybrid static and AI-assisted analysis that uses a suite of checks and custom rules to map risk across tasks.
Structured findings and task mappings that guide authors in hardening benchmarks and improving reliability.
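
To make "structured findings and task mappings" concrete, here is a minimal Python sketch of what such data might look like. The `Finding` class, its field names, and `map_findings_to_tasks` are hypothetical illustrations, not BenchJack's actual output schema. Only the V1–V8 labels come from the feature list above, and the sketch attaches no meaning to any individual label.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One audit finding (hypothetical schema, not BenchJack's real format)."""
    vuln_class: str  # a label in the range "V1".."V8"
    task_id: str     # benchmark task the finding applies to
    detail: str      # free-text description (placeholder text here)


def map_findings_to_tasks(findings):
    """Group vulnerability classes by task to show which tasks need hardening."""
    by_task = defaultdict(list)
    for finding in findings:
        by_task[finding.task_id].append(finding.vuln_class)
    # De-duplicate and sort the classes so the mapping is stable and readable.
    return {task: sorted(set(classes)) for task, classes in by_task.items()}


# Placeholder findings; the task IDs and details are invented for illustration.
findings = [
    Finding("V2", "task-001", "placeholder detail"),
    Finding("V5", "task-001", "placeholder detail"),
    Finding("V2", "task-007", "placeholder detail"),
]
task_map = map_findings_to_tasks(findings)
print(task_map)
```

A per-task mapping like this lets a benchmark author sort tasks by how many vulnerability classes touch them and fix the most exposed tasks first.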

Quick Start

Run the BenchJack scanner on a benchmark repository to reveal evaluation vulnerabilities.

Please help me install this Skill:
Name: benchjack
Download link: https://github.com/benchjack/benchjack/archive/main.zip#benchjack
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
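
For readers who prefer to do the extraction step by hand, the following Python sketch unpacks an already-downloaded archive into a skills directory using only the standard library. The function name `install_skill_from_zip` is hypothetical. The sketch assumes the archive contains a single top-level folder (GitHub branch archives do) and ignores the `#benchjack` fragment in the link, since the listing does not say how it should be interpreted. The demo builds a tiny stand-in archive with a placeholder file instead of downloading anything.

```python
import tempfile
import zipfile
from pathlib import Path


def install_skill_from_zip(zip_path, skills_dir, name):
    """Extract a skill archive into <skills_dir>/<name>.

    Assumption: the archive has one top-level folder, which is stripped.
    """
    target = Path(skills_dir) / name
    target.mkdir(parents=True, exist_ok=True)
    root = target.resolve()
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            parts = Path(info.filename).parts[1:]  # drop the top-level folder
            if not parts or info.is_dir():
                continue
            dest = target.joinpath(*parts)
            # Refuse entries that would land outside the target directory.
            if root not in dest.resolve().parents:
                raise ValueError(f"unsafe path in archive: {info.filename}")
            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.write_bytes(zf.read(info))
    return target


# Demo with a stand-in archive; the file name inside it is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "main.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("benchjack-main/SKILL.md", "placeholder skill file\n")
    installed = install_skill_from_zip(
        archive, Path(tmp) / ".claude" / "skills", "benchjack"
    )
    installed_files = sorted(p.name for p in installed.rglob("*") if p.is_file())
    print(installed.name, installed_files)
```

In real use, `zip_path` would point at the downloaded `main.zip` and `skills_dir` at the project's `.claude/skills/` directory.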
