mechanistic-interventions

Official

Plan and validate causal interventions.

Author: concordance-co
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Helps you move from benchmark-driven analysis to causal, mechanism-oriented interventions. It covers activation patching, interchange interventions, control design, read-vs-write distinctions, and intervention-specific success criteria, and it points to follow-up work on attention and routing.

Core Features & Use Cases

  • Intervention framing: define the exact behavior to change, the success criteria, the malformed-output criteria, and the intended direction of change.
  • Site choice from the computation story: select plausible sites based on localization cues and timing, not ease of patching.
  • Paired design and controls: prefer matched donor-target pairs, single-layer first tests, same-label controls, and purposeful control strategies.
  • Interpretation discipline: track intended-direction flips, reverse-direction flips, malformed outputs, and same-label instability; separate causal evidence from broad destabilization.
  • Follow-on mechanism work: point to likely next steps such as attention follow-up, routing/MoE follow-up, narrower span decomposition, and read-vs-write comparisons.
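
The paired design and outcome bookkeeping described above can be sketched on a toy model. This is an illustrative sketch only, not the skill's own implementation: the two-layer `LAYERS` model, the `label` readout, the `MALFORMED` margin, and every function name are hypothetical stand-ins. It also assumes that "intended direction" is one global (from, to) label pair, so a flip the other way counts as a reverse-direction flip.

```python
from collections import Counter

# Hypothetical toy model: two layers over a 2-element hidden state.
LAYERS = [
    lambda h: [h[0] + h[1], h[0] - h[1]],
    lambda h: [2.0 * h[0], h[1]],
]

def run(x, patch=None):
    """Run the toy model and cache each layer's output.
    `patch` is an optional (layer, unit, value) triple that overwrites
    one unit of one layer's output before the next layer runs."""
    h, cache = list(x), []
    for i, layer in enumerate(LAYERS):
        h = layer(h)
        if patch is not None and patch[0] == i:
            h[patch[1]] = patch[2]
        cache.append(list(h))
    return h, cache

def label(out, margin=0.5):
    """Toy readout: 'A' or 'B', or 'MALFORMED' if the score is too small."""
    score = out[0]
    if abs(score) < margin:
        return "MALFORMED"
    return "A" if score > 0 else "B"

def classify(base, donor, patched, intended):
    """Bucket one patched run; `intended` is a (from_label, to_label) pair."""
    if patched == "MALFORMED":
        return "malformed"
    if base == donor:  # same-label control pair
        return "same_label_stable" if patched == base else "same_label_unstable"
    if patched == base:
        return "no_change"
    if (base, patched) == intended:
        return "intended_flip"
    if (patched, base) == intended:
        return "reverse_flip"
    return "other_change"

def patch_pairs(pairs, layer, unit, intended):
    """Patch one unit at one layer from donor into target, for every pair."""
    tally = Counter()
    for target, donor in pairs:
        base_out, _ = run(target)
        donor_out, donor_cache = run(donor)
        patched_out, _ = run(target, patch=(layer, unit, donor_cache[layer][unit]))
        tally[classify(label(base_out), label(donor_out),
                       label(patched_out), intended)] += 1
    return tally

# Matched (target, donor) pairs; the last one is a same-label control.
PAIRS = [([1, 0], [-1, 0]), ([-1, 0], [1, 0]), ([1, 0], [2, 0])]

for unit in (0, 1):
    print(unit, dict(patch_pairs(PAIRS, layer=0, unit=unit, intended=("A", "B"))))
```

In this toy the readout depends only on unit 0, so patching unit 0 moves labels while patching unit 1 leaves them unchanged. Keeping malformed outputs and same-label instability in separate buckets is what lets a targeted effect be told apart from broad destabilization.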

Quick Start

Draft an intervention plan detailing the target behavior, success criteria, plausible intervention sites, and a paired-control design.
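
One way to make such a plan checkable is to write it down as a small data structure. The sketch below is an assumption about what a plan might contain, based only on the items listed in this document; the `Site` and `InterventionPlan` classes, their field names, and the example values are hypothetical and not part of the skill package.

```python
from dataclasses import dataclass, field

@dataclass
class Site:
    layer: int
    rationale: str  # why the computation story points at this site

@dataclass
class InterventionPlan:
    target_behavior: str
    intended_direction: tuple   # (from_label, to_label)
    success_criteria: list
    malformed_criteria: list
    sites: list                 # Site objects
    pairs: list                 # (target_id, donor_id) tuples
    same_label_controls: list = field(default_factory=list)

    def problems(self):
        """Return human-readable gaps in the plan; an empty list means none found."""
        issues = []
        if not self.target_behavior.strip():
            issues.append("target behavior is not stated")
        if len(self.intended_direction) != 2:
            issues.append("intended direction must be a (from, to) pair")
        if not self.success_criteria:
            issues.append("no success criteria")
        if not self.malformed_criteria:
            issues.append("no malformed-output criteria")
        if not self.sites:
            issues.append("no intervention sites")
        for site in self.sites:
            if not site.rationale.strip():
                issues.append(f"site at layer {site.layer} has no rationale")
        if not self.pairs:
            issues.append("no donor-target pairs")
        if not self.same_label_controls:
            issues.append("no same-label controls")
        return issues

# Hypothetical example values, for illustration only.
plan = InterventionPlan(
    target_behavior="model answers 'A' on prompts of type X",
    intended_direction=("A", "B"),
    success_criteria=["patched output label equals 'B'"],
    malformed_criteria=["output is neither 'A' nor 'B'"],
    sites=[Site(layer=0, rationale="earliest layer where the labels separate")],
    pairs=[("target_1", "donor_1")],
)
print(plan.problems())
```

Here the example plan omits same-label controls on purpose, so `problems()` reports that gap before any patching is run.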

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: mechanistic-interventions
Download link: https://github.com/concordance-co/xenon/archive/main.zip#mechanistic-interventions

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
