ops-resilience

Official

Build AWS resilience with time-bound retries.

Author: aws-samples
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This skill handles AWS control plane failures with time-limited retries and fallback approaches. Use it when API calls fail, time out, or return 5xx errors. It covers retry strategies, the distinction between the control plane and the data plane, and alternative approaches when the control plane is unresponsive.

Core Features & Use Cases

  • Standard retry patterns with exponential backoff and a hard deadline to avoid infinite loops.
  • Time-limited retry logic whose deadline and backoff settings are read from the environment, so retries can be tuned per deployment.
  • Guidance on choosing control plane vs data plane fallbacks across services (EC2, RDS, S3, ECS/EKS, Lambda, etc.).
  • Real-world use cases: resilience in automated AWS workflows, incident recovery, and cross-region failover strategies.
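
As an illustration of the first two features, here is a minimal sketch of a retry loop with exponential backoff, jitter, and a hard deadline. It is not taken from the skill itself: the function name `retry_with_deadline`, the `DeadlineExceeded` exception, and the environment variable names `OPS_RETRY_DEADLINE_SECONDS` and `OPS_RETRY_BASE_DELAY_SECONDS` are all hypothetical, chosen only for this example.

```python
import os
import random
import time


class DeadlineExceeded(Exception):
    """Raised when the retry budget runs out before a call succeeds."""


def retry_with_deadline(call, deadline_seconds=None, base_delay=None,
                        max_delay=20.0, is_retryable=lambda exc: True,
                        sleep=time.sleep, clock=time.monotonic):
    """Call `call()` until it succeeds or the deadline passes."""
    # Hypothetical environment variable names; the skill does not document its own.
    if deadline_seconds is None:
        deadline_seconds = float(os.environ.get("OPS_RETRY_DEADLINE_SECONDS", "60"))
    if base_delay is None:
        base_delay = float(os.environ.get("OPS_RETRY_BASE_DELAY_SECONDS", "1"))

    deadline = clock() + deadline_seconds
    attempt = 0
    while True:
        try:
            return call()
        except Exception as exc:
            if not is_retryable(exc):
                raise  # non-retryable errors surface immediately
            remaining = deadline - clock()
            if remaining <= 0:
                # Hard stop: never loop past the deadline.
                raise DeadlineExceeded(
                    f"gave up after {attempt + 1} attempts") from exc
            # Full jitter: sleep a random time up to the capped exponential
            # delay, and never sleep past the deadline.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(min(random.uniform(0, delay), remaining))
            attempt += 1
```

The `sleep` and `clock` parameters exist so the loop can be exercised with a fake clock in tests; in normal use the defaults apply, for example `retry_with_deadline(lambda: client.describe_instances())` with whatever client object your tooling already has.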

Quick Start

Configure your tooling to detect control plane failures and apply a timeout-bound retry loop, then switch to data-plane fallbacks when the control plane is impaired.
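
The flow described above could be sketched as follows. This is an assumption-laden outline rather than the skill's own code: `is_control_plane_failure` and `call_with_fallback` are hypothetical names, and the failure check simply treats timeouts, connection errors, and HTTP 5xx responses as signs of control plane trouble. The status lookup reads `exc.response["ResponseMetadata"]["HTTPStatusCode"]`, which is where botocore's `ClientError` stores it, but it is done by duck typing so the sketch runs without boto3 installed.

```python
import time


def is_control_plane_failure(exc):
    """Return True for timeouts, connection errors, and HTTP 5xx responses."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return True
    # Read the HTTP status defensively; exceptions without a `response`
    # mapping are not treated as control plane failures.
    response = getattr(exc, "response", None) or {}
    status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")
    return isinstance(status, int) and 500 <= status < 600


def call_with_fallback(control_plane_call, data_plane_fallback,
                       deadline_seconds=30.0, delay_seconds=2.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Retry the control plane call until the deadline, then fall back."""
    deadline = clock() + deadline_seconds
    while True:
        try:
            return control_plane_call()
        except Exception as exc:
            if not is_control_plane_failure(exc):
                raise  # e.g. a 4xx validation error: retrying will not help
            if clock() + delay_seconds >= deadline:
                # Time budget spent: stop calling the control plane and
                # use the data plane path instead.
                return data_plane_fallback()
            sleep(delay_seconds)
```

What the fallback does depends on the service. One plausible choice is to keep using an already-known resource (a cached endpoint or an existing instance) instead of asking the control plane to describe or create one.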

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: ops-resilience
Download link: https://github.com/aws-samples/sample-migration-agentic-cli-assistant/archive/main.zip#ops-resilience

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
