quota_throttle_expert

Official

Diagnose AOAI TPM throttling and fix capacity

Author: aiappsgbb
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

It helps you determine why Foundry hosted agents are receiving HTTP 429 responses (“Quota exceeded” / “OperationLimitExceeded”) by comparing the token utilization recorded in App Insights against the deployment’s configured tokens-per-minute (TPM) capacity.

Core Features & Use Cases

  • Capacity vs. utilization diagnosis: Pulls the deployment's sku.capacity (TPM) and correlates it with token usage patterns in App Insights over the failing window.
  • Burst and bottleneck classification: Detects whether throttling comes from sustained saturation, short-lived bursts, a single noisy-neighbor caller dominating usage, or regional/standard-SKU quota limits.
  • Actionable remediation: Recommends the most appropriate next step: scale capacity or migrate to PTU, introduce rate limiting via APIM, or request a quota increase or revisit the SKU strategy.
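
The capacity-versus-utilization check and the burst/saturation split described above can be sketched in a few lines of Python. This is only an illustration, not the skill's actual implementation: the names `Diagnosis` and `classify_throttling`, the 90% saturation threshold, and the 50% "sustained" threshold are all assumptions made for the example. It also assumes the per-minute token counts have already been pulled from App Insights and that the capacity has already been converted to tokens per minute. Noisy-neighbor detection is left out because it needs a per-caller breakdown.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Diagnosis:
    peak_tpm: int
    capacity_tpm: int
    utilization_pct: float
    classification: str


def classify_throttling(
    tokens_per_minute: Sequence[int],
    capacity_tpm: int,
    saturation_ratio: float = 0.9,  # assumed: a minute is "hot" at >= 90% of capacity
    sustained_share: float = 0.5,   # assumed: "sustained" if >= 50% of minutes are hot
) -> Diagnosis:
    """Classify a failing window from per-minute token counts (hypothetical helper)."""
    if capacity_tpm <= 0:
        raise ValueError("capacity_tpm must be positive")
    if not tokens_per_minute:
        raise ValueError("tokens_per_minute must not be empty")

    peak = max(tokens_per_minute)
    hot_minutes = sum(
        1 for tokens in tokens_per_minute
        if tokens >= saturation_ratio * capacity_tpm
    )
    hot_share = hot_minutes / len(tokens_per_minute)

    if hot_minutes == 0:
        # Deployment TPM does not explain the 429s; look at regional or
        # subscription-level quota instead.
        label = "below capacity"
    elif hot_share >= sustained_share:
        label = "sustained saturation"
    else:
        label = "burst"

    utilization = round(100 * peak / capacity_tpm, 1)
    return Diagnosis(peak, capacity_tpm, utilization, label)
```

For example, `classify_throttling([1000, 9500, 1200, 800], 10000)` reports a peak of 9500 TPM at 95.0% utilization and labels the window a burst, since only one of the four minutes is near capacity.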

Quick Start

Use quota_throttle_expert to diagnose 429 throttling for your Foundry hosted model deployment over the last two hours and output peak TPM, capacity, utilization percentage, classification, and the single recommended Azure CLI action for remediation.

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: quota_throttle_expert
Download link: https://github.com/aiappsgbb/awesome-gbb/archive/main.zip#quota-throttle-expert

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
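
If you prefer to do the extraction step by hand, it could look roughly like the Python sketch below. The helper name `install_skill` is hypothetical, and the skill's folder name inside the archive is an assumption you would need to confirm against the downloaded .zip. The sketch works on an archive that is already on disk and copies only the files under the named folder into the skills directory.

```python
import zipfile
from pathlib import Path


def install_skill(archive: Path, skill_folder: str, skills_root: Path) -> Path:
    """Copy files under `skill_folder` in the archive to skills_root/skill_folder.

    Hypothetical helper: the folder name inside the archive is an assumption.
    """
    target = (skills_root / skill_folder).resolve()
    copied = 0
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            parts = Path(info.filename).parts
            if info.is_dir() or skill_folder not in parts:
                continue
            # Keep only the path below the skill folder.
            rest = parts[parts.index(skill_folder) + 1:]
            if not rest:
                continue
            destination = (target / Path(*rest)).resolve()
            # Refuse entries that would escape the target directory.
            if not destination.is_relative_to(target):
                raise ValueError(f"unsafe path in archive: {info.filename}")
            destination.parent.mkdir(parents=True, exist_ok=True)
            destination.write_bytes(zf.read(info))
            copied += 1
    if copied == 0:
        raise FileNotFoundError(f"no files found under {skill_folder!r}")
    return target
```

A call such as `install_skill(Path("main.zip"), "quota-throttle-expert", Path(".claude/skills"))` would then place the files in `.claude/skills/quota-throttle-expert/`, assuming that folder name matches the archive layout.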