agent-health-monitoring

Name: agent-health-monitoring
Author: cosmicstack-labs

Official

Keep multi-agent systems healthy in production

Software Engineering #multi-agent #observability #incident response #alerting #anomaly detection #agent monitoring #latency metrics

Version: 1.0.0

Installs: 0

System Documentation

What problem does it solve?

Production multi-agent systems can fail silently: agents may become unresponsive, get stuck in loops, slow down, or produce errors without triggering standard infrastructure alerts.

Core Features & Use Cases

Agent vital-sign monitoring: Tracks response rate, latency (P50/P95/P99), error rate, step count, tool success rate, token consumption, context pressure, and hallucination/quality proxies.
Liveness, readiness, and deep checks: Verifies the agent is alive, ready to accept work, and capable of producing valid outputs via diagnostic test tasks.
Anomaly detection and severity-based alerting: Detects statistically unusual behavior and routes incidents by P0–P3 severity to the appropriate channels.
Operational dashboards and runbooks: Provides the panel set and incident playbooks needed to investigate and recover quickly (unresponsive agents, error spikes, token budget anomalies).
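
To make the vital-sign and anomaly-detection features above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the skill's actual implementation: the function names (`latency_summary`, `z_score`, `severity`), the choice of a z-score test, and the P0–P3 thresholds are all hypothetical and would need tuning for a real system.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (pct in 1..100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(latencies_ms):
    """Return the P50/P95/P99 latency figures for one metrics window."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}


def z_score(baseline, current):
    """How many standard deviations `current` sits from the baseline mean."""
    mean = statistics.fmean(baseline)
    stdev = statistics.pstdev(baseline)
    if stdev == 0:
        # A flat baseline: any deviation at all is treated as extreme.
        return 0.0 if current == mean else math.inf
    return (current - mean) / stdev


def severity(z):
    """Map a z-score to a severity label, or None if within normal range.

    The thresholds below are assumptions chosen for illustration only.
    """
    magnitude = abs(z)
    if magnitude >= 6:
        return "P0"
    if magnitude >= 4:
        return "P1"
    if magnitude >= 3:
        return "P2"
    if magnitude >= 2:
        return "P3"
    return None


if __name__ == "__main__":
    window = [120, 135, 110, 980, 125, 130, 118, 122, 140, 127]
    print(latency_summary(window))

    baseline_error_rates = [0.010, 0.012, 0.011, 0.009, 0.013]
    print(severity(z_score(baseline_error_rates, 0.08)))
```

In this sketch the baseline is a list of recent values for a single metric (for example, error rate per window), and the current value is compared against it. A production system would likely use a longer baseline, account for seasonality, and apply the same check per agent and per metric before routing alerts.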

Quick Start

Ask your agent to “Check agent health for all agents, detect anomalies from the last metrics window, and generate the monitoring dashboard plus the recommended incident response steps for any P0/P1 alerts.”
