quota_throttle_expert
Official
Diagnose AOAI TPM throttling and fix capacity
Software Engineering
#rate limiting #capacity planning #foundry #azure cli #app insights #aoai #tpm throttling
Author: aiappsgbb
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
It helps you determine why Foundry hosted agents are hitting 429 “Quota exceeded”/“OperationLimitExceeded” by linking App Insights token utilization to the deployment’s configured TPM capacity.
Core Features & Use Cases
- Capacity vs utilization diagnosis: Pulls the deployment's sku.capacity (TPM) and correlates it with token usage patterns over the failing window in App Insights.
- Burst and bottleneck classification: Detects whether throttling is sustained saturation, short-lived burst behavior, noisy-neighbor dominance, or regional/standard-SKU quota limits.
- Actionable remediation: Recommends the most appropriate next step: scale capacity or migrate to PTU (provisioned throughput units), introduce rate limiting via APIM (Azure API Management), or request a quota increase or change SKU strategy.
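The capacity-versus-utilization check and the sustained-versus-burst split described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function name `diagnose`, the `ThrottleDiagnosis` type, and both thresholds (`saturation_ratio`, `sustained_fraction`) are hypothetical. It assumes you have already exported per-minute token totals from App Insights and converted the deployment capacity to tokens per minute. Noisy-neighbor and regional quota cases need per-caller and subscription-level data, so they are left out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ThrottleDiagnosis:
    peak_tpm: int
    capacity_tpm: int
    utilization_pct: float
    classification: str


def diagnose(
    tokens_per_minute: Sequence[int],
    capacity_tpm: int,
    saturation_ratio: float = 0.9,    # hypothetical: a minute is "hot" at >= 90% of capacity
    sustained_fraction: float = 0.5,  # hypothetical: "sustained" if >= 50% of minutes are hot
) -> ThrottleDiagnosis:
    """Classify a failing window from per-minute token totals (illustrative only)."""
    if capacity_tpm <= 0:
        raise ValueError("capacity_tpm must be positive")
    if not tokens_per_minute:
        raise ValueError("tokens_per_minute must not be empty")

    peak = max(tokens_per_minute)
    # Count the minutes that ran at or above the saturation threshold.
    hot = sum(1 for t in tokens_per_minute if t >= saturation_ratio * capacity_tpm)
    hot_fraction = hot / len(tokens_per_minute)

    if hot == 0:
        # Usage never approached capacity, so look elsewhere (e.g. quota or other callers).
        classification = "not_capacity_bound"
    elif hot_fraction >= sustained_fraction:
        classification = "sustained_saturation"
    else:
        classification = "burst"

    return ThrottleDiagnosis(
        peak_tpm=peak,
        capacity_tpm=capacity_tpm,
        utilization_pct=round(100 * peak / capacity_tpm, 1),
        classification=classification,
    )
```

For example, `diagnose([1000, 2000, 9500, 1500], capacity_tpm=10000)` reports a peak of 9500 TPM at 95.0% utilization and classifies the window as `burst`, because only one of the four minutes crossed the threshold.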
Quick Start
Use quota_throttle_expert to diagnose 429 throttling for your Foundry hosted model deployment over the last two hours and output peak TPM, capacity, utilization percentage, classification, and the single recommended Azure CLI action for remediation.
Dependency Matrix
Required Modules
None required
Components
references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: quota_throttle_expert
Download link: https://github.com/aiappsgbb/awesome-gbb/archive/main.zip#quota-throttle-expert
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.