flinch-probe
Community
Measure LM word suppression with a flinch radar.
Author: daedalus
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Measures how much a language model suppresses charged vocabulary relative to fluency, enabling objective audits of model behavior and the detection of hidden censorship or bias in generated text.
Core Features & Use Cases
- Six-axis flinch profiling across Anti-China, Anti-America, Anti-Europe, Slurs, Sexual, and Violence to quantify suppression.
- Local or API-based probing using log-probabilities, with a fixed 0–100 flinch scale for cross-model comparisons.
- Quick benchmarking against baselines and generation of visualizations (radar charts and per-axis reports) for governance and safety teams.
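The listing does not give the scoring formula, so the following is only a minimal sketch of how a log-probability gap could be mapped onto a fixed 0–100 scale and averaged per axis. The function names `flinch_score` and `axis_profile`, the `max_gap` parameter, and the linear clipping rule are all assumptions made for illustration; they are not taken from the skill's scripts.

```python
# Hypothetical sketch: NOT the skill's actual formula, only an illustration
# of turning log-probability gaps into a fixed 0-100 "flinch" scale.

# The six axes named in the feature list above.
AXES = ["Anti-China", "Anti-America", "Anti-Europe", "Slurs", "Sexual", "Violence"]


def flinch_score(charged_logprob: float, neutral_logprob: float,
                 max_gap: float = 10.0) -> float:
    """Map one log-probability gap to a 0-100 score.

    charged_logprob: log-probability the model assigns to the charged word.
    neutral_logprob: log-probability of a fluent, neutral alternative
                     (assumed here to serve as the fluency baseline).
    max_gap:         assumed gap, in nats, that saturates the scale at 100.
    """
    gap = neutral_logprob - charged_logprob   # positive when the charged word is less likely
    clipped = min(max(gap, 0.0), max_gap)     # no negative scores, cap at max_gap
    return 100.0 * clipped / max_gap


def axis_profile(samples: dict) -> dict:
    """Average per-probe scores into one score per axis.

    samples maps an axis name to a list of (charged_logprob, neutral_logprob) pairs.
    Axes with no probes are reported as 0.0.
    """
    profile = {}
    for axis in AXES:
        pairs = samples.get(axis, [])
        scores = [flinch_score(c, n) for c, n in pairs]
        profile[axis] = sum(scores) / len(scores) if scores else 0.0
    return profile
```

Because the scale is fixed and clipped rather than normalized per model, scores computed this way would be directly comparable across models, which is the property the feature list describes.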
Quick Start
Run a full flinch scan on a compatible model and review the generated flinch_results.json and radar chart.
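The listing does not document the layout of flinch_results.json, so the sketch below assumes a top-level `"axes"` object mapping each axis name to its 0–100 score. That key, and the helper names `load_results` and `per_axis_report`, are hypothetical; adjust them to whatever the scan actually writes.

```python
# Hypothetical sketch: the "axes" key and overall JSON layout are assumptions,
# not the documented schema of flinch_results.json.
import json
from pathlib import Path


def load_results(path: str = "flinch_results.json") -> dict:
    """Read the scan output from disk as a dictionary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def per_axis_report(results: dict, width: int = 20) -> str:
    """Render a plain-text report, highest flinch score first.

    Each line shows the axis name, its score, and a bar whose length is
    proportional to the score on the 0-100 scale.
    """
    axes = results["axes"]  # assumed key holding {axis_name: score}
    ranked = sorted(axes.items(), key=lambda item: item[1], reverse=True)
    lines = []
    for axis, score in ranked:
        filled = round(width * score / 100)
        lines.append(f"{axis:<14}{score:6.1f}  {'#' * filled}")
    return "\n".join(lines)
```

After a scan has finished, `print(per_axis_report(load_results()))` would give a quick terminal summary to read alongside the radar chart.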
Dependency Matrix
Required Modules
- torch
- numpy
- transformers
- openai
- anthropic
- matplotlib
Components
- scripts
- references
💻 Claude Code Installation
Recommended: let Claude install the skill automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: flinch-probe Download link: https://github.com/daedalus/skills/archive/main.zip#flinch-probe Please download this .zip file, extract it, and install it in the .claude/skills/ directory.