activation-patching
Official
Activation patching for causal discovery
Data & Analytics
#neural-networks #transformers #model-interpretability #activation-patching #ablation-study #causal-intervention #logit-diff
Author: ndif-team
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Activation patching is a causal intervention technique: it identifies which model components (layers, attention heads, or token positions) drive a specific behavior by swapping activations between runs and observing how the output changes.
Core Features & Use Cases
- Causal intervention across transformer components (layers, attention heads, token positions) to locate components critical for a behavior.
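- Three-run patching workflow: a clean run, a corrupted run, and a patched run in which activations from one run are substituted into the other; the resulting change in output quantifies each component's causal effect.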
- Interpretability outputs and guidance for diagnosing circuits or computational stages in neural networks.
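One common way to turn "measure impact" into a number is a normalized logit difference. The notation below is an assumption for illustration and is not taken from the skill itself: LD is the logit of the correct answer minus the logit of an incorrect answer, and c is the component being patched. It assumes clean activations are patched into the corrupted run, so a value near 0 means the patch recovered nothing and a value near 1 means it restored the clean behavior.

```latex
\mathrm{LD} = \mathrm{logit}_{\text{correct}} - \mathrm{logit}_{\text{incorrect}},
\qquad
\mathrm{effect}(c) =
  \frac{\mathrm{LD}_{\text{patched}(c)} - \mathrm{LD}_{\text{corrupted}}}
       {\mathrm{LD}_{\text{clean}} - \mathrm{LD}_{\text{corrupted}}}
```

The denominator must be nonzero, so the clean and corrupted prompts should be chosen so that the model's answer actually differs between them.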
Quick Start
Run a clean prompt and a corrupted prompt, patch activations layer-by-layer, and measure the effect on outputs.
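The steps above can be sketched on a toy model. This is a minimal illustration in plain Python, not the skill's actual API: the three-layer residual network, its weights, and the helper names (`layer_update`, `run`, `logit_diff`) are all hypothetical stand-ins for a real transformer and its hooks. It patches each layer's clean contribution into the corrupted run and reports the normalized change in logit difference.

```python
import math

# Hypothetical toy weights: one 2x2 matrix per layer of a tiny residual network.
WEIGHTS = [
    [[0.9, -0.4], [0.2, 0.1]],
    [[0.05, 0.0], [0.0, 0.05]],
    [[-0.3, 0.8], [0.6, -0.2]],
]

def layer_update(weight, hidden):
    """Contribution a layer adds to the residual stream: W @ tanh(hidden)."""
    act = [math.tanh(v) for v in hidden]
    return [sum(w * a for w, a in zip(row, act)) for row in weight]

def run(inputs, patch=None):
    """Forward pass through the toy model.

    `patch` optionally maps a layer index to an update that replaces the
    one the layer would have computed. Returns (logits, per-layer updates).
    """
    hidden = list(inputs)
    cache = []
    for i, weight in enumerate(WEIGHTS):
        update = layer_update(weight, hidden)
        if patch is not None and i in patch:
            update = patch[i]  # intervention: substitute the cached activation
        cache.append(update)
        hidden = [h + u for h, u in zip(hidden, update)]
    return hidden, cache

def logit_diff(logits):
    """Logit of the (assumed) correct class minus the incorrect one."""
    return logits[0] - logits[1]

clean_input = [1.0, -1.0]     # stands in for the clean prompt
corrupt_input = [-1.0, 1.0]   # stands in for the corrupted prompt

# Run 1: clean, caching every layer's contribution.
clean_logits, clean_cache = run(clean_input)
# Run 2: corrupted baseline.
corrupt_logits, _ = run(corrupt_input)

clean_ld = logit_diff(clean_logits)
corrupt_ld = logit_diff(corrupt_logits)

# Run 3 (once per layer): corrupted input with one clean activation patched in.
effects = []
for layer in range(len(WEIGHTS)):
    patched_logits, _ = run(corrupt_input, patch={layer: clean_cache[layer]})
    patched_ld = logit_diff(patched_logits)
    effects.append((patched_ld - corrupt_ld) / (clean_ld - corrupt_ld))

for layer, effect in enumerate(effects):
    print(f"layer {layer}: normalized effect = {effect:.3f}")
```

The toy model uses a residual stream on purpose: in a purely sequential stack, patching any full layer output would restore everything downstream, so every layer would look equally important. With a real model the same loop would typically be written with the framework's hook or tracing mechanism rather than a `patch` argument.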
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: let Claude install the skill automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: activation-patching Download link: https://github.com/ndif-team/skills/archive/main.zip#activation-patching Please download this .zip file, extract it, and install it in the .claude/skills/ directory.