sagemaker-hyperpod

Community

Provision and manage SageMaker HyperPod clusters with ease.

Author: dgallitelli
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Provisions and manages HyperPod clusters for distributed ML training in SageMaker, so teams can deploy scalable GPU- and Trainium-based clusters quickly and with governance.

Core Features & Use Cases

  • Unified provisioning across EKS and Slurm for HyperPod clusters
  • Seamless job submission and monitoring for distributed training workloads
  • Prerequisites checks, quota awareness, and add-on compatibility validation
  • Troubleshooting guidance and production-grade best practices

Quick Start

Install the HyperPod CLI, run hyp init to create a cluster stack, validate the configuration with hyp validate, and deploy the cluster with hyp create.
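
The Quick Start sequence can be sketched as a small Python wrapper that runs the three commands in order and stops at the first failure. This is an illustrative sketch, not part of the skill: `run_quick_start` and `QUICK_START_STEPS` are hypothetical names, and because the arguments each `hyp` command expects are not stated here, none are passed. The real CLI will likely need more (for example, a template or region), so treat the command lines as placeholders. By default it only prints the plan.

```python
import shlex
import subprocess

# Command names are taken from the Quick Start text.
# Assumption: no arguments are shown because the source does not list any;
# the real CLI may require additional arguments for each step.
QUICK_START_STEPS = [
    ("create the cluster stack configuration", ["hyp", "init"]),
    ("validate the configuration", ["hyp", "validate"]),
    ("deploy the cluster", ["hyp", "create"]),
]


def run_quick_start(runner=subprocess.run, dry_run=True):
    """Run the Quick Start steps in order and return the command lines handled.

    With dry_run=True the commands are only printed. Otherwise each command
    is executed through `runner`, and the first non-zero exit code aborts
    the sequence with a RuntimeError.
    """
    handled = []
    for description, argv in QUICK_START_STEPS:
        line = shlex.join(argv)
        print(f"{description}: {line}")
        if not dry_run:
            result = runner(argv, check=False)
            if result.returncode != 0:
                raise RuntimeError(f"step failed: {line}")
        handled.append(line)
    return handled
```

Calling `run_quick_start()` prints the three planned steps without touching any AWS resources; passing `dry_run=False` executes them, which assumes the HyperPod CLI is installed and AWS credentials are configured.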

Dependency Matrix

Required Modules

jq, awscli, bc
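
A quick way to confirm these modules are present before using the skill is to look for their executables on `PATH`. The sketch below reads the required modules as jq, awscli, and bc. `REQUIRED_TOOLS` and `missing_tools` are hypothetical names, and the mapping of `awscli` to the `aws` executable is an assumption about how the dependency is installed.

```python
import shutil

# Module name as listed -> executable expected on PATH.
# Assumption: the "awscli" module is detected via its `aws` executable.
REQUIRED_TOOLS = {"jq": "jq", "awscli": "aws", "bc": "bc"}


def missing_tools(which=shutil.which):
    """Return the sorted names of required modules whose executable is not found."""
    return sorted(
        name for name, exe in REQUIRED_TOOLS.items() if which(exe) is None
    )
```

An empty list from `missing_tools()` means all three executables were found. The `which` parameter exists only so the lookup can be replaced in tests.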

Components

scripts, references

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: sagemaker-hyperpod
Download link: https://github.com/dgallitelli/aws-hyperpod-skill/archive/main.zip#sagemaker-hyperpod

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.