troubleshoot-zookeeper

Official

Triage ZooKeeper incidents fast via Netdata MCP

Author: netdata
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

It helps diagnose Apache ZooKeeper failures by mapping common incident archetypes (quorum loss, write stalls, GC pause cascades, session storms, heap/OOM, and snapshot stalls) to the Netdata signals that identify each one. MCP queries drive structured triage and back the remediation recommendations.

Core Features & Use Cases

  • Operator-playbook diagnostic tree: Routes a coding agent through ZooKeeper health domains (availability, latency, throughput, connections/sessions, replication/sync, data tree/memory, errors/integrity, JVM/resources, security) to narrow root cause.
  • MCP-based signal verification: Discovers and verifies signals with list_metrics and query_metrics, and ranks anomalies with find_anomalous_metrics, against the relevant zookeeper.* contexts.
  • Remediation confirmation loop: Re-runs the same MCP verification queries after applying remediation to confirm signals return to expected ranges.
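
To make the MCP-based verification step more concrete, here is a minimal sketch of the requests an agent might send. The JSON-RPC envelope (method "tools/call" with a tool name and arguments) follows the Model Context Protocol, and the three tool names come from the feature list above. The argument keys ("q", "metric", "after", "before") are assumptions for illustration only; the real argument schema should be read from the server's tool listing.

```python
import json


def mcp_tool_call(request_id, name, arguments):
    """Wrap a tool invocation in an MCP JSON-RPC 2.0 'tools/call' request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def triage_requests(window_minutes=30):
    """Build discovery, anomaly-ranking, and verification requests in order.

    The argument keys below are hypothetical placeholders, not the
    documented schema of the Netdata MCP tools.
    """
    # Hypothetical relative time window: seconds back from "now".
    window = {"after": -window_minutes * 60, "before": 0}
    calls = [
        ("list_metrics", {"q": "zookeeper."}),
        ("find_anomalous_metrics", dict(window)),
        ("query_metrics", {"metric": "zookeeper.*", **window}),
    ]
    return [
        mcp_tool_call(i, name, args)
        for i, (name, args) in enumerate(calls, start=1)
    ]


if __name__ == "__main__":
    for request in triage_requests(window_minutes=15):
        print(json.dumps(request))
```

Building the same list again after remediation gives the agent an identical set of queries to re-run, which is what the confirmation loop relies on.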

Quick Start

Ask an AI agent to troubleshoot Apache ZooKeeper with Netdata over MCP: query the zookeeper.* contexts for the last 15 to 30 minutes, select the matching failure archetype, apply the remediation, and verify that the signals recover.
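
The final "verify that the signals recover" step can be sketched as a range check over the latest readings. This is an illustrative helper, not part of the skill: the signal names and thresholds are hypothetical, and real expected ranges depend on the cluster and the archetype being triaged.

```python
def out_of_range(readings, expected):
    """Return the signals whose latest value is missing or outside (low, high)."""
    failing = {}
    for signal, (low, high) in expected.items():
        value = readings.get(signal)
        # A missing reading is treated as unverified, so it is reported too.
        if value is None or not (low <= value <= high):
            failing[signal] = value
    return failing


# Hypothetical signal names and ranges, for illustration only.
EXPECTED = {"avg_latency_ms": (0, 50), "outstanding_requests": (0, 10)}

if __name__ == "__main__":
    before = {"avg_latency_ms": 400, "outstanding_requests": 3}
    after = {"avg_latency_ms": 12, "outstanding_requests": 1}
    print(out_of_range(before, EXPECTED))
    print(out_of_range(after, EXPECTED))
```

An empty result after remediation means every checked signal is back inside its expected range; any remaining entry points at where to continue the triage.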

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: troubleshoot-zookeeper
Download link: https://github.com/netdata/skills/archive/main.zip#troubleshoot-zookeeper

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.