troubleshoot-kubernetes-cluster-state

Official

Diagnose Kubernetes control plane failures fast

Authornetdata
Version1.0.0
Installs0

System Documentation

What problem does it solve?

It helps you pinpoint why Kubernetes Cluster State is failing or degraded by guiding structured troubleshooting of control plane availability, performance, etcd health, node health, kubelet health, pod lifecycle, and related signals observed through Netdata’s MCP surface.

Core Features & Use Cases

  • Signal-driven triage: Routes an investigation through distinct failure archetypes such as control plane unavailable/slow, reconciliation stall, node failures, and admission blockage, using Netdata operator-playbook-aligned signal domains.
  • MCP-based evidence gathering: Performs MCP queries for Kubernetes Cluster State health, including focused checks for api server health, etcd leader existence/latency, API request tail latency (p99), WAL fsync latency, and leader changes.
  • Remediation validation loop: Re-runs the same MCP verification queries after applying remediation to confirm signals return to expected ranges and the issue is truly resolved.

Quick Start

Use this skill when diagnosing Kubernetes Cluster State incidents by asking your agent to troubleshoot k8s cluster state for control plane unavailability and validate the fix by re-running the MCP verification queries.

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: troubleshoot-kubernetes-cluster-state
Download link: https://github.com/netdata/skills/archive/main.zip#troubleshoot-kubernetes-cluster-state

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
View Source Repository

Agent Skills Search Helper

Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.