activation-patching

Official

Activation patching for causal discovery

Author: ndif-team
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Activation patching provides a causal intervention framework for identifying which model components (layers, heads, or positions) drive a specific behavior: activations are swapped between runs and the resulting change in the output is measured.

Core Features & Use Cases

  • Causal intervention across transformer components (layers, attention heads, token positions) to locate components critical for a behavior.
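  • Three-run patching workflow: run the clean prompt, run the corrupted prompt, then rerun with activations from one run patched into the other and measure the change in output to quantify each component's causal effect.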
  • Interpretability outputs and guidance for diagnosing circuits or computational stages in neural networks.
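
To make "measure impact" in the three-run workflow concrete, here is a minimal sketch of one common way to score a patch: the fraction of the clean-versus-corrupted gap that the patched run recovers. The function name `normalized_effect` and the choice of metric are assumptions for illustration, not part of this skill's documented interface.

```python
def normalized_effect(clean: float, corrupted: float, patched: float) -> float:
    """Fraction of the clean-vs-corrupted gap recovered by a patched run.

    All three arguments are the same scalar metric (for example a logit
    difference) measured on the clean, corrupted, and patched runs.
    A value near 0 means the patch changed nothing; a value near 1 means
    the patch restored the clean behavior.
    """
    gap = clean - corrupted
    if gap == 0:
        # Hypothetical guard: with no gap there is nothing to attribute.
        raise ValueError("clean and corrupted metrics are identical")
    return (patched - corrupted) / gap
```

Normalizing this way keeps scores comparable across prompts whose raw metrics differ in scale, though values outside the 0 to 1 range are possible when a patch overshoots or hurts.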

Quick Start

Run a clean prompt and a corrupted prompt, patch activations layer-by-layer, and measure the effect on outputs.
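
The layer-by-layer loop described above can be sketched on a toy model. Everything here is an assumption made for illustration: the three hand-written "layers", the two-dimensional residual stream, the helper names `run` and `patch_sweep`, and the choice to patch clean activations into the corrupted run. It is not the skill's own code and does not use a real transformer.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]
Layer = Callable[[Vector], Vector]

# Toy residual model (hypothetical): each layer reads the residual stream
# and returns a contribution that is added back to it.
LAYERS: List[Layer] = [
    lambda x: [0.5 * x[0], 0.0],  # layer 0: amplifies dim 0
    lambda x: [0.0, 2.0 * x[0]],  # layer 1: moves dim 0 information into dim 1
    lambda x: [0.0, 0.1 * x[1]],  # layer 2: small adjustment of dim 1
]


def run(
    layers: List[Layer],
    x: Vector,
    patch: Optional[Dict[int, Vector]] = None,
    cache: Optional[Dict[int, Vector]] = None,
) -> float:
    """Run the toy model and return the metric (dim 1 of the final residual).

    patch: maps a layer index to a contribution that replaces that layer's output.
    cache: if given, records the contribution each layer actually added.
    """
    resid = list(x)
    for i, layer in enumerate(layers):
        out = layer(resid)
        if patch is not None and i in patch:
            out = patch[i]  # the intervention: overwrite this layer's output
        if cache is not None:
            cache[i] = out
        resid = [r + o for r, o in zip(resid, out)]
    return resid[1]


def patch_sweep(layers: List[Layer], clean_x: Vector, corrupted_x: Vector) -> List[float]:
    """Return, per layer, the fraction of the clean metric recovered by patching it."""
    clean_cache: Dict[int, Vector] = {}
    clean_metric = run(layers, clean_x, cache=clean_cache)  # run 1: clean
    corrupted_metric = run(layers, corrupted_x)             # run 2: corrupted
    gap = clean_metric - corrupted_metric
    if gap == 0:
        raise ValueError("clean and corrupted runs give the same metric")
    effects = []
    for i in range(len(layers)):
        # run 3: corrupted input, with layer i's output taken from the clean run
        patched_metric = run(layers, corrupted_x, patch={i: clean_cache[i]})
        effects.append((patched_metric - corrupted_metric) / gap)
    return effects


effects = patch_sweep(LAYERS, clean_x=[1.0, 0.0], corrupted_x=[0.0, 0.0])
for i, e in enumerate(effects):
    print(f"layer {i}: recovered fraction = {e:.2f}")
```

In this toy setup layer 1 receives the highest score, since it is the component that writes the signal into the dimension the metric reads. On a real model the same loop would cache and overwrite hidden states through the framework's hook or tracing mechanism instead of a Python dictionary.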

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: activation-patching
Download link: https://github.com/ndif-team/skills/archive/main.zip#activation-patching

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.