attribution-patching
Official
Scale circuit analysis with gradient patching.
Data & Analytics
#analysis #attribution #neural-networks #transformers #gradient #interpretability #activation-patching
Author: ndif-team
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
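Gradient-based attribution patching provides a scalable alternative to exact activation patching. Exact patching needs a separate forward pass for every component it tests; attribution patching instead uses gradients to form a first-order approximation of each patch effect, typically from two forward passes (clean and corrupted) and one backward pass. This makes analysis across thousands of components practical.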
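The approximation described above can be written as a first-order Taylor expansion. The notation is an assumption made for illustration and does not come from the skill itself: $a$ is one component's activation, $M$ is the scalar metric, and the clean activation is patched into the corrupted run (the opposite direction works the same way with the roles swapped).

$$
M\big(x_{\text{corr}};\ a \leftarrow a_{\text{clean}}\big) - M\big(x_{\text{corr}}\big) \;\approx\; \big(a_{\text{clean}} - a_{\text{corr}}\big)^{\top} \left.\frac{\partial M}{\partial a}\right|_{a = a_{\text{corr}}}
$$

The left-hand side is what exact activation patching measures with one forward pass per component. The right-hand side reuses a single gradient computation for every component at once, at the cost of ignoring higher-order terms.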
Core Features & Use Cases
- Efficiently estimate per-component contributions by multiplying the difference between clean and corrupted activations by the gradient of the metric obtained from a backward pass.
- Supports batch processing across multiple prompts and layers, enabling large-scale circuit analysis.
- Useful for rapid hypothesis testing, instrumentation, and screening before targeted, exact patching.
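To make the estimate-then-verify workflow in the list above concrete, here is a minimal NumPy sketch on a toy two-layer network. Everything in it is hypothetical and chosen for illustration: the weights, the inputs, the helper names (`hidden`, `metric_from_hidden`, `metric_grad_wrt_hidden`), and the choice to patch clean activations into the corrupted run. It is not the skill's implementation and does not use the skill's API; the gradient is written out by hand so the example needs no autodiff library.

```python
import numpy as np

# Hypothetical toy model with fixed weights (illustrative only):
#   h      = tanh(W1 @ x)          <- hidden units are the "components"
#   metric = w3 @ tanh(W2 @ h)     <- scalar quantity we attribute
W1 = np.array([[0.5, -0.3], [0.2, 0.4], [-0.6, 0.1]])
W2 = np.array([[0.7, -0.5, 0.3], [-0.2, 0.6, 0.4]])
w3 = np.array([1.0, -0.8])


def hidden(x):
    """Forward pass up to the hidden activations."""
    return np.tanh(W1 @ x)


def metric_from_hidden(h):
    """Forward pass from the hidden activations to the scalar metric."""
    return float(w3 @ np.tanh(W2 @ h))


def metric_grad_wrt_hidden(h):
    """Hand-derived gradient of the metric with respect to h."""
    z = np.tanh(W2 @ h)
    return W2.T @ (w3 * (1.0 - z ** 2))


# Assumed inputs: the corrupted input is a small perturbation of the clean one.
x_clean = np.array([1.0, 0.5])
x_corrupt = np.array([0.9, 0.6])

h_clean = hidden(x_clean)
h_corrupt = hidden(x_corrupt)

# Attribution patching: one gradient evaluation on the corrupted run
# yields an estimate for every hidden unit at once.
grad = metric_grad_wrt_hidden(h_corrupt)
attribution = (h_clean - h_corrupt) * grad

# Exact activation patching: one extra forward pass per hidden unit.
baseline = metric_from_hidden(h_corrupt)
exact = np.empty_like(h_corrupt)
for i in range(h_corrupt.size):
    patched = h_corrupt.copy()
    patched[i] = h_clean[i]  # swap in the clean value for unit i only
    exact[i] = metric_from_hidden(patched) - baseline

print("attribution estimate:", attribution)
print("exact patch effect:  ", exact)
```

The two arrays agree closely here because the clean and corrupted inputs are deliberately close together, which keeps the linear approximation tight. When the activations differ more, the estimate can drift from the exact effect, which is one reason to treat attribution patching as a screening step before targeted exact patching.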
Quick Start
Run attribution-patching on a model to generate layer-wise attributions for a given prompt.
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: attribution-patching
Download link: https://github.com/ndif-team/skills/archive/main.zip#attribution-patching
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.