troubleshoot-zookeeper
Official
Triage ZooKeeper incidents fast via Netdata MCP
System Documentation
What problem does it solve?
It helps diagnose Apache ZooKeeper failures by mapping common incident archetypes (quorum loss, write stalls, GC pause cascades, session storms, heap/OOM, and snapshot stalls) to the exact Netdata signals you need, using MCP queries for structured triage and remediation recommendations.
Core Features & Use Cases
- Operator-playbook diagnostic tree: Routes a coding agent through ZooKeeper health domains (availability, latency, throughput, connections/sessions, replication/sync, data tree/memory, errors/integrity, JVM/resources, security) to narrow root cause.
- MCP-based signal verification: Performs discovery and verification via list_metrics and query_metrics, and anomaly ranking via find_anomalous_metrics, against the relevant zookeeper.* contexts.
- Remediation confirmation loop: Re-runs the same MCP verification queries after applying remediation to confirm signals return to expected ranges.
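The discovery, verification, and anomaly-ranking steps above can be sketched as a sequence of MCP tool calls. The block below is a minimal sketch, not the skill's actual implementation: it only builds the JSON-RPC 2.0 `tools/call` request bodies that an MCP client would send. The tool names come from this page; every argument name (`q`, `metric`, `after`) and the helper names `tool_call` and `triage_requests` are assumptions made for illustration, so check the tool schemas your Netdata MCP server advertises before relying on them.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC 2.0 expects one id per request.
_ids = count(1)


def tool_call(name, arguments):
    """Build an MCP JSON-RPC 2.0 `tools/call` request for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def triage_requests(window_minutes=15):
    """Return the three requests for one triage pass, in order.

    ASSUMPTION: the argument names below are hypothetical placeholders,
    not the documented schema of the Netdata MCP tools.
    """
    after = -window_minutes * 60  # relative window start, in seconds
    return [
        # 1. Discovery: which zookeeper.* contexts exist?
        tool_call("list_metrics", {"q": "zookeeper"}),
        # 2. Verification: pull recent values for those contexts.
        tool_call("query_metrics", {"metric": "zookeeper.*", "after": after}),
        # 3. Anomaly ranking over the same window.
        tool_call("find_anomalous_metrics", {"after": after}),
    ]


if __name__ == "__main__":
    for request in triage_requests(window_minutes=30):
        print(json.dumps(request))
```

Keeping request construction separate from transport means the same three requests can be rebuilt unchanged for the confirmation pass after remediation.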
Quick Start
Ask an AI agent to troubleshoot Apache ZooKeeper with Netdata over MCP: query the zookeeper.* contexts for the last 15 to 30 minutes, select the matching failure archetype, apply the remediation, and verify that the signals recover.
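The final step, verifying that signals recover, amounts to checking post-remediation samples against an expected range. The following is a minimal sketch under stated assumptions: the context name `zookeeper.example_latency`, the range values, and the helpers `Expected`, `out_of_range`, and `confirm_recovery` are all hypothetical, and the samples are assumed to have already been extracted from a `query_metrics` response.

```python
from dataclasses import dataclass


@dataclass
class Expected:
    """Operator-chosen acceptable range for one context (hypothetical values)."""
    low: float
    high: float


def out_of_range(samples, expected):
    """Return the samples that fall outside the expected range."""
    return [v for v in samples if not (expected.low <= v <= expected.high)]


def confirm_recovery(post_fix, expectations):
    """Return {context: [violating values]} for contexts not yet recovered.

    A context with no post-remediation samples counts as not recovered,
    because a missing signal cannot confirm anything.
    """
    failing = {}
    for context, expected in expectations.items():
        if context not in post_fix:
            failing[context] = []
            continue
        bad = out_of_range(post_fix[context], expected)
        if bad:
            failing[context] = bad
    return failing


# ASSUMPTION: placeholder context name and range, for illustration only.
expectations = {"zookeeper.example_latency": Expected(low=0, high=50)}
healthy = {"zookeeper.example_latency": [12, 18, 9]}
still_bad = {"zookeeper.example_latency": [12, 140, 9]}

print(confirm_recovery(healthy, expectations))
print(confirm_recovery(still_bad, expectations))
```

An empty result means every tracked context is back inside its range; anything else lists the values that still violate it, which is a natural trigger for another triage pass.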
Dependency Matrix
Required Modules
None required
Components
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: troubleshoot-zookeeper
Download link: https://github.com/netdata/skills/archive/main.zip#troubleshoot-zookeeper
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.