dataops-root-cause-analysis
CommunityPinpoint DataOps failures in minutes.
Authorivanshamaev
Version1.0.0
Installs0
System Documentation
What problem does it solve?
This Skill helps teams quickly identify the true root cause of DataOps incidents across pipeline failures, data quality anomalies, and downstream operational impact, instead of repeatedly applying superficial fixes.
Core Features & Use Cases
- Failure taxonomy + structured RCA workflow: Classify incidents by infrastructure, data, logic, dependencies, configuration, and concurrency, then drive investigation in that order.
- Airflow, Spark, and Kafka diagnosis playbooks: Triage task states, scheduler health, retry patterns, Spark OOM/skew/FetchFailed/serialization issues, and Kafka consumer lag spikes with targeted checks.
- Data quality anomaly investigation + timeline reconstruction: Detect volume/freshness/distribution anomalies (e.g., z-scores, null spikes) and rebuild an incident timeline across components for postmortems.
Quick Start
Ask the AI to perform a root cause analysis for an Airflow DAG failure on your incident date and to produce the most likely failure taxonomy path with the exact SQL and command checks to run.
Dependency Matrix
Required Modules
None requiredComponents
Standard package💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: dataops-root-cause-analysis Download link: https://github.com/ivanshamaev/de-agent-skills/archive/main.zip#dataops-root-cause-analysis Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
Agent Skills Search Helper
Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.