causal-tracing
Official
Trace how model components causally shape outputs.
Data & Analytics
#transformers #interpretability #language-models #causal-analysis #causal-tracing #activation-tracing
Author: ndif-team
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Causal tracing identifies which intermediate computations causally mediate the relationship between inputs and outputs, revealing not just what correlates with behavior but what causes it.
Core Features & Use Cases
- Three types of causal effects: total, direct, and indirect, plus interchange interventions for testing causal relationships across runs.
- Position- and layer-specific tracing: diagnose how different tokens and network layers contribute to final predictions.
- Use cases include debugging model behavior, validating hypotheses about information flow, and guiding interventions to alter outputs.
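The three effects listed above can be illustrated on a toy model with a single mediator. This is a minimal sketch, not the skill's implementation: the functions `hidden`, `output`, and `run` are hypothetical stand-ins for a real network, and the model is deliberately linear so the arithmetic is easy to follow.

```python
def hidden(x):
    # Mediator: an intermediate computation that depends on the input.
    return 2.0 * x


def output(x, h):
    # The output depends on the input directly (3 * x) and through the mediator (h).
    return 3.0 * x + 1.0 * h


def run(x, h_override=None):
    # Run the toy model, optionally overwriting the mediator (an intervention).
    h = hidden(x) if h_override is None else h_override
    return output(x, h)


base, source = 1.0, 4.0

# Total effect: swap the whole input from base to source.
total_effect = run(source) - run(base)

# Direct effect: use the source input but hold the mediator at its base value.
direct_effect = run(source, h_override=hidden(base)) - run(base)

# Indirect effect: keep the base input but patch in the mediator from the
# source run (an interchange intervention).
indirect_effect = run(base, h_override=hidden(source)) - run(base)

print(total_effect, direct_effect, indirect_effect)  # 15.0 9.0 6.0
```

Here the direct and indirect effects sum to the total effect only because the toy model is linear. In a real language model the decomposition is generally not additive.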
Quick Start
Run a tracing session on a language model with base and source prompts to compute total, direct, and indirect causal effects.
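As a rough illustration of what such a session computes, the sketch below sweeps every (layer, position) pair of a toy model, patching the hidden state from the source run into the base run and recording how the output changes. The `forward` function and its `patch` argument are hypothetical and stand in for a real language model and tracing library. They are not the skill's API.

```python
def forward(tokens, n_layers=2, patch=None):
    """Toy causal 'model'; patch=(layer, pos, value) overwrites one hidden state."""
    states = [list(tokens)]  # layer 0 holds the token embeddings
    for layer in range(n_layers + 1):
        if layer > 0:
            prev = states[-1]
            # Each position keeps its own state and reads half of its left neighbour.
            states.append([prev[p] + (0.5 * prev[p - 1] if p > 0 else 0.0)
                           for p in range(len(prev))])
        if patch is not None and patch[0] == layer:
            states[layer][patch[1]] = patch[2]
    return states[-1][-1], states  # output is the final layer at the last position


base = [1.0, 0.0, 2.0]
source = [3.0, 0.0, 2.0]  # differs from base only at position 0

base_out, _ = forward(base)
source_out, source_states = forward(source)
total_effect = source_out - base_out

# Interchange intervention at every (layer, position): copy one hidden state
# from the source run into the base run and measure the change in output.
effects = {}
for layer in range(3):
    for pos in range(3):
        patched_out, _ = forward(base, patch=(layer, pos, source_states[layer][pos]))
        effects[(layer, pos)] = patched_out - base_out

for layer in range(3):
    print(layer, [effects[(layer, pos)] for pos in range(3)])
```

In this toy setup the nonzero entries fall at (0, 0), (1, 1), and (2, 2), tracing the path along which the changed token reaches the final position.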
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: causal-tracing Download link: https://github.com/ndif-team/skills/archive/main.zip#causal-tracing Please download this .zip file, extract it, and install it in the .claude/skills/ directory.