dataops-root-cause-analysis

Community

Pinpoint DataOps failures in minutes.

Authorivanshamaev
Version1.0.0
Installs0

System Documentation

What problem does it solve?

This Skill helps teams quickly identify the true root cause of DataOps incidents across pipeline failures, data quality anomalies, and downstream operational impact, instead of repeatedly applying superficial fixes.

Core Features & Use Cases

  • Failure taxonomy + structured RCA workflow: Classify incidents by infrastructure, data, logic, dependencies, configuration, and concurrency, then drive investigation in that order.
  • Airflow, Spark, and Kafka diagnosis playbooks: Triage task states, scheduler health, retry patterns, Spark OOM/skew/FetchFailed/serialization issues, and Kafka consumer lag spikes with targeted checks.
  • Data quality anomaly investigation + timeline reconstruction: Detect volume/freshness/distribution anomalies (e.g., z-scores, null spikes) and rebuild an incident timeline across components for postmortems.

Quick Start

Ask the AI to perform a root cause analysis for an Airflow DAG failure on your incident date and to produce the most likely failure taxonomy path with the exact SQL and command checks to run.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: dataops-root-cause-analysis
Download link: https://github.com/ivanshamaev/de-agent-skills/archive/main.zip#dataops-root-cause-analysis

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
View Source Repository

Agent Skills Search Helper

Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.