causal-tracing

Official

Trace how model components causally shape outputs.

Author: ndif-team
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Causal tracing identifies which intermediate computations causally mediate the relationship between inputs and outputs, revealing not just what correlates with behavior but what causes it.

Core Features & Use Cases

  • Three types of causal effect: total, direct, and indirect, plus interchange interventions, which swap an activation from a source run into a base run to test whether that activation causally drives the output.
  • Position- and layer-specific tracing: diagnose how individual tokens and network layers contribute to the final prediction.
  • Typical uses: debugging model behavior, validating hypotheses about information flow, and guiding interventions that alter outputs.
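As an illustration of position- and layer-specific tracing, the sketch below sweeps an interchange intervention over every (layer, position) pair of a toy model. It is not the skill's API, which this page does not show: the model, `WEIGHTS`, `run`, and `trace` are hypothetical names chosen for the example, and each "activation" is a single number rather than a vector.

```python
# Toy stand-in for a transformer: each layer adds the previous position's
# activation to the current one. WEIGHTS is a hypothetical choice.
WEIGHTS = [1.0, 1.0]

def run(tokens, patch=None):
    """Run the toy model. patch=(layer, pos, value) overwrites one activation."""
    state = list(tokens)
    if patch is not None and patch[0] == 0:
        state[patch[1]] = patch[2]
    acts = [list(state)]  # layer 0 holds the (possibly patched) inputs
    for layer, w in enumerate(WEIGHTS, start=1):
        state = [
            state[p] + (w * state[p - 1] if p > 0 else 0.0)
            for p in range(len(state))
        ]
        if patch is not None and patch[0] == layer:
            state[patch[1]] = patch[2]
        acts.append(list(state))
    return state[-1], acts  # output is read from the last position

def trace(base, source):
    """Patch each source activation into the base run; record the output change."""
    base_out, _ = run(base)
    _, source_acts = run(source)
    return [
        [
            run(base, (layer, pos, source_acts[layer][pos]))[0] - base_out
            for pos in range(len(base))
        ]
        for layer in range(len(source_acts))
    ]

# The two inputs differ only at position 0.
effects = trace(base=[1.0, 0.0, 0.0], source=[2.0, 0.0, 0.0])
for layer, row in enumerate(effects):
    print(layer, row)
```

In this toy, the nonzero entries move one position to the right per layer, which is the kind of information-flow picture a layer-by-position sweep is meant to expose.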

Quick Start

Run a tracing session on a language model with base and source prompts to compute total, direct, and indirect causal effects.
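The three effects can be made concrete with a minimal sketch on a toy model with a single mediator. This is an assumption-laden illustration rather than the skill's interface: `mediator`, `output`, and `causal_effects` are hypothetical, and plain numbers stand in for the base and source prompts.

```python
def mediator(x):
    # Hypothetical intermediate computation (the component being traced).
    return 2.0 * x

def output(x, h):
    # The output depends on the input directly and through the mediator h.
    return 3.0 * x + 0.5 * h

def causal_effects(base, source):
    h_base, h_source = mediator(base), mediator(source)
    y_base = output(base, h_base)
    # Total effect: change the input and let the mediator follow.
    total = output(source, h_source) - y_base
    # Direct effect: change the input but hold the mediator at its base value.
    direct = output(source, h_base) - y_base
    # Indirect effect: keep the base input but patch in the source mediator.
    indirect = output(base, h_source) - y_base
    return total, direct, indirect

total, direct, indirect = causal_effects(base=1.0, source=4.0)
print(total, direct, indirect)
```

Because this toy is linear, the total effect equals the sum of the direct and indirect effects. In a real nonlinear network that decomposition generally holds only approximately.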

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: causal-tracing
Download link: https://github.com/ndif-team/skills/archive/main.zip#causal-tracing

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.