talos-cluster-ops
OfficialDiagnose and fix Talos Kubernetes issues fast
Software Engineering#incident response#cluster operations#longhorn#argo cd#cilium#talos linux#kubernetes troubleshooting
AuthorProbably-Group
Version1.0.0
Installs0
System Documentation
What problem does it solve?
It resolves slow, broken, or unstable Talos-based Kubernetes clusters by guiding investigation across hardware, Talos, Kubernetes, Cilium networking, Longhorn storage, GitOps drift, and security signals, then proposing remediation that requires explicit approval.
Core Features & Use Cases
- Autonomous investigation workflows for pod crashes, node failures, networking symptoms, storage problems, and GitOps sync failures with structured, multi-phase evidence gathering.
- Approval-gated remediation with explicit risk levels, rollback expectations, and optional snapshot requirements to reduce operational mistakes.
- Cross-layer correlation using Talos health/logs, Kubernetes events/logs/metrics, Cilium/Hubble flow evidence, Longhorn replica health, ArgoCD drift checks, and security signals (Falco/Tetragon/Trivy/SPIRE).
Quick Start
Tell your AI to investigate a failing workload with: "Investigate why my pod rabbitmq-0 in production is crashing using talos-cluster-ops and propose the safest remediation I can approve."
Dependency Matrix
Required Modules
None requiredComponents
Standard package💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: talos-cluster-ops Download link: https://github.com/Probably-Group/Dev-AID/archive/main.zip#talos-cluster-ops Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
Agent Skills Search Helper
Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.