query-dag-failure-debugger
OfficialClassify Airflow DAG failures with live triage.
Authorrazorpay
Version1.0.0
Installs0
System Documentation
What problem does it solve?
Triages and investigates Airflow DAG failures across Spark/EMR and Trino/Presto, providing a bucketed classification, concise root-cause reasoning, and live investigation options using MCP data sources.
Core Features & Use Cases
- Classifies failures into 9 buckets (Airflow orchestration, upstream data, SPARK_RESOURCE, SPARK_RUNTIME, SQL_FAILURE, QUERY_ENGINE_TRANSIENT, PERMISSION_ACCESS, INFRASTRUCTURE, UNKNOWN)
- Pulls evidence from Coralogix, Friday MCPs (AWS, K8s), and Grafana to inform triage
- Performs Phase 1 classification quickly from the DAG code and task logs, with an optional Phase 2 live investigation
- Suitable for Airflow-backed DAGs spanning Spark/EMR and Trino/Presto workloads
Quick Start
Paste an Airflow task error or log snippet to initialize a triage for the target DAG.
Dependency Matrix
Required Modules
None requiredComponents
references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: query-dag-failure-debugger Download link: https://github.com/razorpay/trino-gateway/archive/main.zip#query-dag-failure-debugger Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
Agent Skills Search Helper
Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.