sagemaker-hyperpod
Community
Provision and manage SageMaker HyperPod clusters with ease.
Author: dgallitelli
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This skill provisions and manages HyperPod clusters for distributed ML training in SageMaker, so teams can deploy scalable GPU- and Trainium-based clusters quickly and with governance.
Core Features & Use Cases
- Unified provisioning across EKS and Slurm for HyperPod clusters
- Seamless job submission and monitoring for distributed training workloads
- Prerequisites checks, quota awareness, and add-on compatibility validation
- Troubleshooting guidance and production-grade best practices
Quick Start
Install the HyperPod CLI, run hyp init to create a cluster stack, validate the configuration with hyp validate, and deploy the cluster with hyp create.
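The Quick Start sequence can be sketched as a small Python wrapper. This is a minimal sketch, not the skill's own script: it assumes the hyp executable is already on PATH and runs the three commands exactly as named above, with no extra arguments. The real CLI may need a template name or flags, so check its help output first. The names QUICK_START_STEPS and run_quick_start are hypothetical.

```python
import shutil
import subprocess

# Commands exactly as named in the Quick Start. Any extra arguments the
# real CLI needs are deliberately not guessed here (assumption: none).
QUICK_START_STEPS = [
    ["hyp", "init"],      # create a cluster stack
    ["hyp", "validate"],  # validate the configuration
    ["hyp", "create"],    # deploy the cluster
]


def run_quick_start(dry_run=True):
    """Run (or, by default, only print) the Quick Start steps in order."""
    executed = []
    for step in QUICK_START_STEPS:
        line = " ".join(step)
        if dry_run:
            print(f"[dry-run] {line}")
        else:
            if shutil.which(step[0]) is None:
                raise FileNotFoundError(f"{step[0]} not found on PATH")
            # check=True stops the sequence at the first failing step.
            subprocess.run(step, check=True)
        executed.append(line)
    return executed


if __name__ == "__main__":
    run_quick_start(dry_run=True)
```

The dry-run default only prints the commands, which makes it safe to review the order before anything touches an AWS account.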
Dependency Matrix
Required Modules
jq, awscli, bc
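A quick way to confirm the required modules are present is to look for their executables on PATH. The sketch below assumes the required modules are jq, awscli, and bc, and that awscli is invoked through the aws command. The names REQUIRED_EXECUTABLES and missing_modules are hypothetical and not part of the skill.

```python
import shutil

# Executable looked up for each required module. Assumption: the modules
# are jq, awscli, and bc; the awscli package installs the `aws` command.
REQUIRED_EXECUTABLES = {"jq": "jq", "awscli": "aws", "bc": "bc"}


def missing_modules(which=shutil.which):
    """Return the required modules whose executable is not on PATH."""
    return [
        module
        for module, exe in REQUIRED_EXECUTABLES.items()
        if which(exe) is None
    ]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing required modules: " + ", ".join(missing))
    else:
        print("All required modules found.")
```

The which parameter exists only so the lookup can be swapped out in tests; normal use calls missing_modules() with no arguments.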
Components
scripts, references
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: sagemaker-hyperpod
Download link: https://github.com/dgallitelli/aws-hyperpod-skill/archive/main.zip#sagemaker-hyperpod
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
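For a manual install, the extract step of the request above can be sketched in Python. This is a sketch under stated assumptions, not a documented installer: it assumes the .zip has already been downloaded to a local path, that the archive has a single top-level folder (as GitHub branch archives do), and that the skill lives in a subfolder named after the URL fragment (sagemaker-hyperpod). The function name install_skill is hypothetical.

```python
import zipfile
from pathlib import Path, PurePosixPath


def install_skill(zip_path, skill_name, skills_dir=".claude/skills"):
    """Copy <top-level>/<skill_name>/... from the zip into skills_dir/skill_name."""
    target = Path(skills_dir) / skill_name
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            parts = PurePosixPath(info.filename).parts
            # parts[0] is the archive's top-level folder, parts[1] the
            # assumed skill folder; skip everything else and directories.
            if len(parts) < 3 or parts[1] != skill_name or info.is_dir():
                continue
            if ".." in parts:
                continue  # ignore entries that would escape the target
            out = target.joinpath(*parts[2:])
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_bytes(zf.read(info))
    return target
```

Calling install_skill("main.zip", "sagemaker-hyperpod") from a project root would place the skill's files under .claude/skills/sagemaker-hyperpod. If the repository layout differs from the assumption, the function copies nothing and the archive should be inspected by hand.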