sre
CommunityDiagnose Kubernetes incidents with confidence.
Authorionfury
Version1.0.0
Installs0
System Documentation
What problem does it solve?
Diagnose Kubernetes incidents by performing a structured, read-only investigation to identify root causes.
Core Features & Use Cases
- 5 Whys Analysis: Apply iterative questioning to reach actionable root causes without jumping to conclusions.
- Phased Investigation: Triage, data collection, correlation, root-cause determination, and remediation guidance.
- Multi-Source Data: Integrates logs, events, metrics, and configurations to build a cohesive incident picture.
- Guided Playbook: Provides repeatable steps for common Kubernetes incidents (pod failures, CrashLoopBackOff, networking issues, deployment failures, and Flux/Helm problems).
- Remediation Guidance: Offers immediate and longer-term mitigation strategies and prevention suggestions.
Quick Start
Run the cluster-health.sh script to capture a read-only health snapshot of the target Kubernetes cluster and namespace.
Dependency Matrix
Required Modules
None requiredComponents
scripts
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: sre Download link: https://github.com/ionfury/homelab/archive/main.zip#sre Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
Agent Skills Search Helper
Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.