quota_throttle_expert

Name: quota_throttle_expert
Author: aiappsgbb

Official

Diagnose AOAI TPM throttling and fix capacity

Software Engineering #rate limiting #capacity planning #foundry #azure cli #app insights #aoai #tpm throttling

Version: 1.0.0

Installs: 0

System Documentation

What problem does it solve?

It helps you determine why Foundry-hosted agents are receiving HTTP 429 responses (“Quota exceeded” / “OperationLimitExceeded”) by comparing token usage recorded in App Insights against the deployment’s configured TPM capacity.

Core Features & Use Cases

Capacity vs utilization diagnosis: Pulls deployment sku.capacity (TPM) and correlates it with token usage patterns over the failing window in App Insights.
Burst and bottleneck classification: Detects whether throttling is sustained saturation, short-lived burst behavior, noisy-neighbor dominance, or regional/standard-SKU quota limits.
Actionable remediation: Recommends the single most appropriate next step: scale capacity or migrate to PTU, introduce rate limiting via APIM, or request a quota increase and revisit the SKU strategy.
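
The capacity-versus-utilization comparison and the sustained/burst distinction described above can be illustrated with a minimal Python sketch. This is not the skill's actual implementation: the function name `diagnose`, the `ThrottleDiagnosis` type, the 90% saturation threshold, and the 50% "sustained" cutoff are all assumptions chosen for illustration. The input is assumed to be a list of per-minute token totals already extracted from App Insights, and `capacity_tpm` is assumed to be expressed in tokens per minute.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ThrottleDiagnosis:
    peak_tpm: int
    capacity_tpm: int
    utilization_pct: float
    classification: str


def diagnose(
    tokens_per_minute: Sequence[int],
    capacity_tpm: int,
    saturation_threshold: float = 0.9,  # assumed: a minute is "hot" at >= 90% of capacity
    sustained_fraction: float = 0.5,    # assumed: >= 50% hot minutes means sustained
) -> ThrottleDiagnosis:
    """Classify a failing window from per-minute token totals (hypothetical helper)."""
    if capacity_tpm <= 0:
        raise ValueError("capacity_tpm must be positive")
    if not tokens_per_minute:
        raise ValueError("tokens_per_minute must not be empty")

    peak = max(tokens_per_minute)
    # Count the minutes in which usage reached the saturation threshold.
    hot_minutes = sum(
        1 for t in tokens_per_minute if t >= saturation_threshold * capacity_tpm
    )
    hot_fraction = hot_minutes / len(tokens_per_minute)

    if hot_minutes == 0:
        classification = "below_capacity"
    elif hot_fraction >= sustained_fraction:
        classification = "sustained_saturation"
    else:
        classification = "burst"

    return ThrottleDiagnosis(
        peak_tpm=peak,
        capacity_tpm=capacity_tpm,
        utilization_pct=round(100 * peak / capacity_tpm, 1),
        classification=classification,
    )


if __name__ == "__main__":
    # One hot minute out of four: classified as a burst under the assumed thresholds.
    print(diagnose([1000, 2000, 9500, 1500], capacity_tpm=10_000))
```

Noisy-neighbor and regional quota cases are left out of this sketch because they need per-caller and subscription-level data that a single per-minute series does not carry.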

Quick Start

Use quota_throttle_expert to diagnose 429 throttling for your Foundry hosted model deployment over the last two hours and output peak TPM, capacity, utilization percentage, classification, and the single recommended Azure CLI action for remediation.
