sre

Community

Diagnose Kubernetes incidents with confidence.

Author: ionfury
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

It diagnoses Kubernetes incidents through a structured, read-only investigation that identifies root causes.

Core Features & Use Cases

  • 5 Whys Analysis: Apply iterative questioning to reach actionable root causes without jumping to conclusions.
  • Phased Investigation: Proceeds through triage, data collection, correlation, root-cause determination, and remediation guidance.
  • Multi-Source Data: Integrates logs, events, metrics, and configurations to build a cohesive incident picture.
  • Guided Playbook: Provides repeatable steps for common Kubernetes incidents (pod failures, CrashLoopBackOff, networking issues, deployment failures, and Flux/Helm problems).
  • Remediation Guidance: Offers immediate and longer-term mitigation strategies and prevention suggestions.
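
As an illustration of the multi-source correlation step listed above, the following is a minimal sketch of merging timestamped records from several sources into one chronological timeline. The `Record` type, the `build_timeline` function, and the sample data are hypothetical and are not taken from the skill itself; they only show one plausible way such a correlation could be modeled.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    """One observation from a single data source (hypothetical shape)."""
    timestamp: datetime
    source: str   # e.g. "event", "log", "metric", "config"
    message: str


def build_timeline(*sources):
    """Flatten any number of record lists and sort them by timestamp."""
    merged = [record for source in sources for record in source]
    # sorted() is stable, so records with equal timestamps keep input order.
    return sorted(merged, key=lambda record: record.timestamp)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    events = [Record(datetime(2024, 1, 1, 12, 0, 5), "event", "Back-off restarting failed container")]
    logs = [
        Record(datetime(2024, 1, 1, 12, 0, 1), "log", "connection refused"),
        Record(datetime(2024, 1, 1, 12, 0, 9), "log", "starting up"),
    ]
    for record in build_timeline(events, logs):
        print(record.timestamp.isoformat(), record.source, record.message)
```

Reading the merged timeline top to bottom makes it easier to see which signal appeared first, which is a useful starting point for the first "why" in a 5 Whys chain.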

Quick Start

Run the cluster-health.sh script to capture a read-only health snapshot of the target Kubernetes cluster and namespace.
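
The arguments and contents of cluster-health.sh are not documented on this page, so the sketch below does not reproduce that script. It only illustrates, under stated assumptions, what a read-only snapshot might collect: it builds a list of `kubectl` invocations restricted to read-only verbs. The function name `snapshot_commands`, the `READ_ONLY_VERBS` allowlist, and the choice of resources are hypothetical.

```python
import shlex

# Assumption: these kubectl verbs are treated as read-only for the snapshot.
READ_ONLY_VERBS = {"get", "describe", "logs", "top"}


def snapshot_commands(namespace, context=None):
    """Return kubectl argv lists for a read-only snapshot of one namespace."""
    base = ["kubectl"]
    if context:
        # --context selects a kubeconfig context; omitted when not given.
        base += ["--context", context]
    commands = [
        base + ["get", "nodes", "-o", "wide"],
        base + ["get", "pods", "-n", namespace, "-o", "wide"],
        base + ["get", "events", "-n", namespace, "--sort-by=.lastTimestamp"],
        base + ["top", "pods", "-n", namespace],  # needs metrics-server in the cluster
    ]
    # Guard: refuse to return anything that is not on the read-only allowlist.
    for command in commands:
        verb = command[len(base)]
        if verb not in READ_ONLY_VERBS:
            raise ValueError(f"refusing non-read-only verb: {verb}")
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for command in snapshot_commands("default"):
        print(shlex.join(command))
```

Keeping the commands as data and checking each verb against an allowlist is one way to make the read-only guarantee explicit before anything is executed.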

Dependency Matrix

Required Modules

None required

Components

scripts

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: sre
Download link: https://github.com/ionfury/homelab/archive/main.zip#sre

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
