aiops-autonomous-incident-response

Community

Diagnose incidents and start recovery fast.

Author: ivanshamaev
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill reduces mean time to recovery (MTTR) by autonomously diagnosing data platform incidents from alert signals and assembling a structured draft of the root cause analysis (RCA) and recommended runbook actions.

Core Features & Use Cases

  • Alert-to-diagnosis loop: classifies severity, then runs an LLM-driven diagnosis workflow using Kubernetes, Prometheus, Airflow metadata, and pod logs to confirm likely failure modes.
  • Automated RCA draft generation: correlates logs/metrics/events into a failure taxonomy and produces an incident report suitable for postmortem follow-up.
  • Self-healing with approval gate: executes only low-risk corrective actions automatically and routes medium/high-risk actions through Slack for human approval.
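The approval gate in the last feature can be pictured with a minimal Python sketch. It is illustrative only and not the Skill's actual code: the `Risk` levels, the `Action` type, and the action names are hypothetical, and the Slack hand-off is reduced to a list of pending actions.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class Action:
    name: str   # hypothetical action name, e.g. "restart_pod"
    risk: Risk


def route_action(action: Action) -> str:
    """Return "auto" for low-risk actions, "approval" for everything else."""
    return "auto" if action.risk is Risk.LOW else "approval"


def plan(actions):
    """Split actions into those run automatically and those awaiting a human."""
    auto = [a.name for a in actions if route_action(a) == "auto"]
    pending = [a.name for a in actions if route_action(a) == "approval"]
    return auto, pending


if __name__ == "__main__":
    auto, pending = plan([
        Action("restart_pod", Risk.LOW),
        Action("clear_airflow_task", Risk.MEDIUM),
        Action("scale_down_cluster", Risk.HIGH),
    ])
    print("run automatically:", auto)
    print("send to Slack for approval:", pending)
```

Defaulting every non-low risk level to the approval path means a newly added risk level cannot be executed automatically by accident.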

Quick Start

Use the aiops-autonomous-incident-response skill to handle a firing Alertmanager alert for your data platform and produce an incident report with recommended follow-up actions.
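As a rough illustration of the input side, the sketch below pulls the firing alerts out of a payload shaped like an Alertmanager webhook body and condenses each one into the fields you might paste into the prompt. The sample payload, the alert name, and the helper names are made up for this example, and the `severity` and `namespace` labels are a common convention rather than something Alertmanager guarantees.

```python
import json

# Hypothetical sample shaped like an Alertmanager webhook body.
SAMPLE_PAYLOAD = json.dumps({
    "version": "4",
    "status": "firing",
    "receiver": "aiops-skill",
    "alerts": [
        {
            "status": "firing",
            "labels": {
                "alertname": "AirflowDagFailed",
                "severity": "critical",
                "namespace": "airflow",
            },
            "annotations": {"summary": "DAG daily_load failed"},
            "startsAt": "2024-01-01T00:00:00Z",
        },
        {
            "status": "resolved",
            "labels": {"alertname": "PodRestarting", "severity": "warning"},
            "annotations": {"summary": "Pod restarted"},
            "startsAt": "2024-01-01T00:00:00Z",
        },
    ],
})


def firing_alerts(payload: dict) -> list:
    """Keep only the alerts that are still firing."""
    return [a for a in payload.get("alerts", []) if a.get("status") == "firing"]


def to_prompt_context(alert: dict) -> dict:
    """Condense one alert into the fields worth handing to the skill."""
    labels = alert.get("labels", {})
    return {
        "alertname": labels.get("alertname", "unknown"),
        "severity": labels.get("severity", "unknown"),  # label is a convention
        "namespace": labels.get("namespace", "unknown"),
        "summary": alert.get("annotations", {}).get("summary", ""),
        "started_at": alert.get("startsAt", ""),
    }


if __name__ == "__main__":
    payload = json.loads(SAMPLE_PAYLOAD)
    for alert in firing_alerts(payload):
        print(to_prompt_context(alert))
```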

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: aiops-autonomous-incident-response
Download link: https://github.com/ivanshamaev/de-agent-skills/archive/main.zip#aiops-autonomous-incident-response

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
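If you prefer to install by hand, the extraction step can be sketched in Python as below. This is an assumption-laden sketch, not an official installer: it assumes you have already downloaded the archive yourself, that the skill lives in a folder named after the skill somewhere inside it (the exact path within the repository is not stated here), and the `extract_skill` helper is hypothetical.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_skill(zip_path, skill_name, dest_root=".claude/skills"):
    """Copy the folder named `skill_name` from the archive into `dest_root`.

    Returns the list of files written. The location of the folder inside
    the archive is not assumed; any path containing `skill_name` matches.
    """
    dest_root = Path(dest_root)
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or skill_name not in parts:
                continue
            # Keep the path from the skill folder downward.
            rel_parts = parts[parts.index(skill_name):]
            if ".." in rel_parts:
                continue  # skip suspicious entries
            target = dest_root.joinpath(*rel_parts)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(info))
            written.append(target)
    return written


if __name__ == "__main__":
    import sys

    # Usage: python install_skill.py path/to/main.zip
    if len(sys.argv) == 2:
        files = extract_skill(sys.argv[1], "aiops-autonomous-incident-response")
        print(f"wrote {len(files)} file(s)")
```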