sre-expert
Official
Master SRE: SLOs, incidents, and ops excellence.
Author: personamanagmentlayer
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill provides expert guidance and tools to implement and manage Site Reliability Engineering (SRE) best practices, ensuring high availability, reliability, and performance of systems.
Core Features & Use Cases
- SLO/SLI Management: Define, track, and calculate compliance for Service Level Objectives and Indicators.
- Incident Management: Create, update, and report on incidents, including MTTR calculation.
- Monitoring & Alerting: Implement best practices for the four golden signals and define alert rules.
- Chaos Engineering: Design and run experiments to proactively identify system weaknesses.
- Use Case: A team can use this Skill to define SLOs for their API, track them against real-time metrics, and manage any incidents that arise, ensuring they meet their reliability targets.
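The SLO compliance and MTTR calculations mentioned above can be sketched roughly as follows. This is a minimal illustration, not the Skill's actual code: the `Incident` class and the functions `slo_compliance`, `error_budget_remaining`, and `mttr` are hypothetical names, and the event-count model (good events divided by total events) is an assumed SLI definition.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Incident:
    """Hypothetical incident record; field names are assumptions."""
    opened_at: datetime
    resolved_at: datetime


def slo_compliance(good_events: int, total_events: int) -> float:
    """Fraction of events that met the SLI (1.0 when there was no traffic)."""
    if total_events == 0:
        return 1.0
    return good_events / total_events


def error_budget_remaining(good_events: int, total_events: int, target: float) -> float:
    """Share of the error budget still unspent; negative means overspent.

    Assumes 0 < target < 1, so the budget (1 - target) is non-zero.
    """
    if total_events == 0:
        return 1.0
    allowed_bad = (1.0 - target) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


def mttr(incidents: List[Incident]) -> timedelta:
    """Mean time to resolve: average of (resolved_at - opened_at)."""
    if not incidents:
        return timedelta(0)
    total = sum((i.resolved_at - i.opened_at for i in incidents), timedelta(0))
    return total / len(incidents)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0)
    incidents = [
        Incident(start, start + timedelta(minutes=30)),
        Incident(start, start + timedelta(minutes=90)),
    ]
    print("compliance:", slo_compliance(9_990, 10_000))
    print("budget left:", error_budget_remaining(9_995, 10_000, target=0.999))
    print("MTTR:", mttr(incidents))
```

Computing the error budget from event counts rather than wall-clock time keeps the same functions usable for availability and latency SLIs, as long as each request can be classified as good or bad.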
Quick Start
Use the sre-expert skill to define standard SLOs for a web service.
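As an illustration of what "standard SLOs for a web service" might look like once defined, here is a small sketch. Everything in it is an assumption for demonstration: the `SLO` and `SLIType` names, the 99.9% availability and 95% latency targets, the 300 ms threshold, and the 30-day window are hypothetical and are not taken from the Skill itself.

```python
from dataclasses import dataclass
from enum import Enum


class SLIType(Enum):
    """Kinds of indicator an SLO can be measured against (illustrative)."""
    AVAILABILITY = "availability"
    LATENCY = "latency"


@dataclass(frozen=True)
class SLO:
    """Hypothetical SLO definition; not the Skill's real schema."""
    name: str
    sli_type: SLIType
    target: float          # required fraction of good events, e.g. 0.999
    window_days: int = 30  # rolling evaluation window

    @property
    def error_budget(self) -> float:
        """Fraction of events allowed to be bad within the window."""
        return 1.0 - self.target


def downtime_budget_minutes(slo: SLO) -> float:
    """Minutes of full outage the window tolerates (time-based reading)."""
    return slo.error_budget * slo.window_days * 24 * 60


# Example targets only; real values depend on the service and its users.
WEB_SERVICE_SLOS = [
    SLO("api-availability", SLIType.AVAILABILITY, target=0.999),
    SLO("api-latency-under-300ms", SLIType.LATENCY, target=0.95),
]

if __name__ == "__main__":
    for slo in WEB_SERVICE_SLOS:
        print(slo.name, slo.sli_type.value, f"budget={slo.error_budget:.3%}")
```

The downtime reading of the budget is mainly meaningful for the availability SLO; for the latency SLO the budget is better read as the share of requests allowed to exceed the threshold.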
Dependency Matrix
Required Modules
prometheus_client, numpy, random, datetime, typing, enum, time, dataclasses
Components
scripts, references
💻 Claude Code Installation
Recommended: let Claude install the Skill automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: sre-expert Download link: https://github.com/personamanagmentlayer/pcl/archive/main.zip#sre-expert Please download this .zip file, extract it, and install it in the .claude/skills/ directory.