trino-self-healing-platform
CommunityKeep Trino running with autonomous self-healing.
Software Engineering#trino#airflow#self-healing#watchdog#iceberg maintenance#query anomaly detection#rca automation
Authorivanshamaev
Version1.0.0
Installs0
System Documentation
What problem does it solve?
It reduces on-call burden by autonomously detecting and recovering from common Trino cluster failures (hung queries, worker loss, OOM-killed queries) and by proactively maintaining query performance and Iceberg health.
Core Features & Use Cases
- Hung Query Killer: Classifies queries by source (interactive vs batch) and kills those exceeding time thresholds.
- Memory Pressure Relief: Detects critical memory pressure and terminates low-priority queries to restore stability.
- Iceberg Auto-Maintenance: Triggers Iceberg compaction and stale statistics ANALYZE based on small-file and staleness signals.
- RCA Generation: Produces structured incident root-cause analysis using a Claude-based workflow when alerts fire.
- Use Case: When Prometheus/AlertManager signals elevated latency or query failures, run this skill to kill stuck queries, relieve memory pressure, compact/ANALYZE impacted Iceberg tables, and generate an RCA report for operators.
Quick Start
Ask the agent to enable the Trino self-healing watchdog and configure it to run every 15 minutes against your Trino coordinator, using AlertManager webhooks and Airflow scheduling.
Dependency Matrix
Required Modules
None requiredComponents
references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: trino-self-healing-platform Download link: https://github.com/ivanshamaev/de-agent-skills/archive/main.zip#trino-self-healing-platform Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
Agent Skills Search Helper
Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.