launch-nemo-rl
Community
Launch and debug NeMo-RL on Kubernetes
Software Engineering
#kubernetes, #ray, #job debugging, #nemo-rl, #cli orchestration, #hydra overrides, #kubectl logs
Author: yo-steven
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
It helps you reliably launch, monitor, stop, and debug NeMo-RL training recipes on a shared Kubernetes cluster using the nrl-k8s CLI, including troubleshooting hung or failed jobs and retrieving logs.
Core Features & Use Cases
- Ephemeral vs long-lived execution: Run one-shot RayJobs that tear down automatically, or iterate on a reusable long-lived RayCluster.
- Config-safe iteration with infra+recipe pairs: Use the correct NeMo-RL recipe and matching K8s/Ray infra files, then apply Hydra entrypoint overrides to test changes without forking recipes.
- Operational observability & debugging: Validate with `check`, inspect with `status`, list and fetch logs with job/role commands, and use Ray dashboard APIs when needed.
- Cluster lifecycle control: Bring clusters up/down, reuse clusters when specs drift, and manage deployments alongside RayClusters.
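As a rough illustration of two ideas from the list above (Hydra entrypoint overrides and log retrieval), here is a minimal Python sketch. It does not call nrl-k8s, whose exact commands are not shown on this page. It only renders Hydra-style `key=value` override tokens and builds a plain `kubectl logs` argument list. The helper names, the override keys, the pod name, and the namespace are all hypothetical placeholders, not values from the skill.

```python
import shlex


def hydra_overrides(changes):
    """Render a flat dict as Hydra-style `key=value` override tokens."""
    tokens = []
    for key, value in changes.items():
        # Use lowercase true/false for booleans, as is conventional in YAML configs.
        if isinstance(value, bool):
            value = "true" if value else "false"
        tokens.append(f"{key}={value}")
    return tokens


def kubectl_logs_command(pod, namespace, tail=200):
    """Build a `kubectl logs` argument list for a single pod."""
    return ["kubectl", "logs", pod, "--namespace", namespace, f"--tail={tail}"]


if __name__ == "__main__":
    # Hypothetical override keys; real keys depend on the recipe being launched.
    overrides = hydra_overrides({"trainer.max_steps": 10, "logger.enabled": False})
    print(" ".join(shlex.quote(token) for token in overrides))

    # Hypothetical pod and namespace names.
    print(shlex.join(kubectl_logs_command("example-head-pod", "example-namespace")))
```

Building the command as a list rather than a single string keeps it safe to pass to `subprocess.run` without shell quoting concerns.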
Quick Start
Ask the AI to launch your NeMo-RL recipe on Kubernetes by running the appropriate nrl-k8s command for either an ephemeral RayJob or a long-lived RayCluster.
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: launch-nemo-rl
Download link: https://github.com/yo-steven/skills-exploration-20260522/archive/main.zip#launch-nemo-rl
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.