sre-expert

Official

Master SRE: SLOs, incidents, and ops excellence.

Author: personamanagmentlayer
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill provides guidance and tools for implementing and managing Site Reliability Engineering (SRE) practices, helping teams keep their systems available, reliable, and performant.

Core Features & Use Cases

  • SLO/SLI Management: Define, track, and calculate compliance for Service Level Objectives and Indicators.
  • Incident Management: Create, update, and report on incidents, including MTTR calculation.
  • Monitoring & Alerting: Implement best practices for the four golden signals and define alert rules.
  • Chaos Engineering: Design and run experiments to proactively identify system weaknesses.
  • Use Case: A team can use this Skill to define SLOs for their API, track them against real-time metrics, and manage any incidents that arise, ensuring they meet their reliability targets.
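
The SLO compliance and MTTR calculations mentioned above can be sketched with the standard library alone. This is a minimal illustration, not the Skill's own code: the names slo_compliance, error_budget_remaining, Incident, and mttr are hypothetical, and the sketch assumes an event-based SLI (good events divided by total events) and incidents that carry an open and a resolve timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


def slo_compliance(good_events: int, total_events: int) -> float:
    """Fraction of events that met the SLI (1.0 when there was no traffic)."""
    if total_events == 0:
        return 1.0
    return good_events / total_events


def error_budget_remaining(good_events: int, total_events: int, target: float) -> float:
    """Share of the error budget still unspent; negative once the SLO is breached."""
    allowed_bad = (1.0 - target) * total_events
    if allowed_bad == 0:
        # No traffic, or a 100% target: there is no budget to spend.
        return 0.0
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


@dataclass
class Incident:
    # Hypothetical record shape; the Skill's real incident model may differ.
    opened_at: datetime
    resolved_at: datetime


def mttr(incidents: List[Incident]) -> timedelta:
    """Mean time to resolve: average of (resolved_at - opened_at)."""
    if not incidents:
        return timedelta(0)
    total = sum((i.resolved_at - i.opened_at for i in incidents), timedelta(0))
    return total / len(incidents)
```

For example, with a 99.9% target, 10,000 requests, and 5 failures, error_budget_remaining(9995, 10000, 0.999) comes out at roughly 0.5: half of the allowed 10 failures have been used.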

Quick Start

Use the sre-expert skill to define standard SLOs for a web service.
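
As a rough idea of what "standard SLOs for a web service" might look like as data, here is a small sketch using dataclasses and enum, both of which appear in the module list. The class names, SLO names, targets (99.9% availability, 95% of requests under 300 ms), and the 30-day window are illustrative assumptions, not values documented for this Skill.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class SLIType(Enum):
    AVAILABILITY = "availability"
    LATENCY = "latency"


@dataclass(frozen=True)
class SLO:
    name: str
    sli_type: SLIType
    target: float      # required fraction of good events, e.g. 0.999
    window_days: int   # length of the rolling evaluation window

    def allowed_bad_minutes(self) -> float:
        """Error budget expressed as minutes of the window (time-based view)."""
        return self.window_days * 24 * 60 * (1.0 - self.target)


def default_web_service_slos() -> List[SLO]:
    # Illustrative starting points only; tune targets to the service in question.
    return [
        SLO("api-availability", SLIType.AVAILABILITY, target=0.999, window_days=30),
        SLO("api-latency-under-300ms", SLIType.LATENCY, target=0.95, window_days=30),
    ]
```

Under these assumptions, the availability SLO leaves a budget of about 43.2 minutes of unavailability per 30-day window.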

Dependency Matrix

Required Modules

prometheus_client, numpy, random, datetime, typing, enum, time, dataclasses

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: sre-expert
Download link: https://github.com/personamanagmentlayer/pcl/archive/main.zip#sre-expert

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.