Alerting & Incident Management


Ensure service reliability with robust alerting.

Author: vertivolatam
Version: 1.0.0

System Documentation

What problem does it solve?

This Skill addresses the critical need for proactive service monitoring and rapid response to system failures, minimizing downtime and impact on users.

Core Features & Use Cases

  • Effective Alerting: Design and implement alerts that signal actual problems, not just noise.
  • Incident Response: Establish clear workflows for handling and resolving incidents efficiently.
  • On-Call Management: Configure on-call rotations and escalation policies to ensure timely responses.
  • Runbook Automation: Create and link runbooks for guided incident resolution.
  • Use Case: When a critical service experiences a spike in errors, this Skill ensures that the right on-call engineer is paged immediately via PagerDuty and given a link to a runbook with troubleshooting steps, and that a dedicated Slack channel is created for incident coordination.
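One common way to keep alerts tied to real problems is to page only when a condition persists rather than on a single bad sample. In Prometheus this is an alerting rule's `for` clause, and runbook links usually travel as an annotation. The Python below is not Prometheus configuration. It is a minimal, hypothetical model of that pending-then-firing behavior, assuming an error-ratio threshold and a `runbook_url` annotation (a widespread naming convention, not something this Skill specifies). The `AlertRule` class and the example URL are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    """Hypothetical error-rate rule mirroring Prometheus's `for` semantics:
    the condition must hold continuously for `for_seconds` before it fires."""
    name: str
    threshold: float      # error ratio that counts as a problem, e.g. 0.05 = 5%
    for_seconds: int      # how long the condition must persist before paging
    runbook_url: str      # hypothetical runbook link carried on the notification
    pending_since: Optional[int] = None
    state: str = "inactive"

    def evaluate(self, now: int, errors: int, total: int) -> str:
        """Update and return the state: "inactive", "pending", or "firing"."""
        ratio = errors / total if total else 0.0
        if ratio <= self.threshold:
            # Condition cleared: a pending or firing alert drops back to inactive.
            self.pending_since = None
            self.state = "inactive"
            return self.state
        if self.pending_since is None:
            self.pending_since = now  # first breach starts the pending clock
        held_for = now - self.pending_since
        self.state = "firing" if held_for >= self.for_seconds else "pending"
        return self.state

    def notification(self) -> dict:
        """Labels and annotations a notifier would receive while the alert fires."""
        return {
            "labels": {"alertname": self.name, "severity": "critical"},
            "annotations": {"runbook_url": self.runbook_url},
        }


if __name__ == "__main__":
    rule = AlertRule("HighErrorRate", threshold=0.05, for_seconds=300,
                     runbook_url="https://runbooks.example.com/high-error-rate")
    # A one-minute blip never pages: it clears before the 5-minute window elapses.
    print(rule.evaluate(0, errors=20, total=100))   # pending
    print(rule.evaluate(60, errors=1, total=100))   # inactive
    # A sustained spike pages only once it has held for 300 seconds.
    for t in (120, 300, 420):
        print(t, rule.evaluate(t, errors=20, total=100))  # pending, pending, firing
```

Requiring the condition to hold for a window trades a few minutes of detection latency for fewer pages caused by transient blips; a suitable `for_seconds` depends on how much delay the service can tolerate.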

Quick Start

Configure a complete alerting and incident management system using Prometheus, Alertmanager, and PagerDuty.
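Between Alertmanager and PagerDuty, each alert becomes a PagerDuty Events API v2 event. Alertmanager already ships a native `pagerduty_configs` receiver, so a real setup would normally just configure that; the sketch below only illustrates the mapping. It assumes Alertmanager's webhook payload as input (an `alerts` list whose entries carry `status`, `labels`, `annotations`, and `fingerprint`). `to_pagerduty_events` is a hypothetical helper, and the label and annotation names it reads (`severity`, `instance`, `summary`, `runbook_url`) are conventions you would adapt to your own rules.

```python
import json
from typing import Any

# Severities accepted in a PagerDuty Events API v2 payload.
PD_SEVERITIES = {"critical", "error", "warning", "info"}


def to_pagerduty_events(webhook: dict[str, Any], routing_key: str) -> list[dict[str, Any]]:
    """Translate one Alertmanager webhook body into PagerDuty Events API v2 events."""
    events = []
    for alert in webhook.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        event: dict[str, Any] = {
            "routing_key": routing_key,
            # Alertmanager marks each alert as "firing" or "resolved".
            "event_action": "resolve" if alert.get("status") == "resolved" else "trigger",
        }
        # Reusing Alertmanager's stable per-alert fingerprint as the dedup key lets
        # a later "resolve" event close the same PagerDuty incident.
        if alert.get("fingerprint"):
            event["dedup_key"] = alert["fingerprint"]
        if event["event_action"] == "trigger":
            severity = labels.get("severity", "error")
            event["payload"] = {
                "summary": annotations.get("summary", labels.get("alertname", "alert")),
                "source": labels.get("instance", "alertmanager"),
                "severity": severity if severity in PD_SEVERITIES else "error",
                "custom_details": labels,
            }
            if "runbook_url" in annotations:
                # Surface the runbook link directly on the incident.
                event["links"] = [{"href": annotations["runbook_url"], "text": "Runbook"}]
        events.append(event)
    return events


if __name__ == "__main__":
    sample = {
        "status": "firing",
        "alerts": [{
            "status": "firing",
            "fingerprint": "a1b2c3",
            "labels": {"alertname": "HighErrorRate", "severity": "critical",
                       "instance": "checkout-api"},
            "annotations": {"summary": "Checkout error rate above 5%",
                            "runbook_url": "https://runbooks.example.com/high-error-rate"},
        }],
    }
    print(json.dumps(to_pagerduty_events(sample, "YOUR_ROUTING_KEY"), indent=2))
```

Each event would then be POSTed as JSON to `https://events.pagerduty.com/v2/enqueue`. Keying deduplication on the fingerprint means repeated notifications for the same alert update one incident instead of opening new ones.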

Dependency Matrix

Required Modules

None required

Components

scripts, references

💻 Claude Code Installation

Recommended: let Claude install the Skill automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: Alerting & Incident Management
Download link: https://github.com/vertivolatam/monorepo/archive/main.zip#alerting-incident-management

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
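To install by hand instead, the steps in that prompt (download the archive, extract it, place the Skill under `.claude/skills/`) can be sketched in Python. This assumes, without confirmation from the source, that the Skill's files sit in a directory named `alerting-incident-management` somewhere inside the repository archive; `extract_skill` is a hypothetical helper, not part of the Skill.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_skill(zip_source, skill_name: str, dest_root: Path) -> Path:
    """Copy everything under the first directory named `skill_name` in a repo
    archive into dest_root/skill_name. `zip_source` is a path or file-like object."""
    target = dest_root / skill_name
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or skill_name not in parts:
                continue
            rel = parts[parts.index(skill_name) + 1:]
            # Ignore a file named exactly `skill_name` and refuse path traversal.
            if not rel or ".." in rel:
                continue
            out_path = target.joinpath(*rel)
            out_path.parent.mkdir(parents=True, exist_ok=True)
            out_path.write_bytes(zf.read(info))
    return target
```

For example, download `https://github.com/vertivolatam/monorepo/archive/main.zip` (the `#alerting-incident-management` fragment is not sent to the server, so the download is the whole repository), then call `extract_skill("main.zip", "alerting-incident-management", Path(".claude/skills"))`.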
