launch-nemo-rl

Community

Launch and debug NeMo-RL on Kubernetes

Author: yo-steven
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

It helps you reliably launch, monitor, stop, and debug NeMo-RL training recipes on a shared Kubernetes cluster using the nrl-k8s CLI, including troubleshooting hung or failed jobs and retrieving logs.

Core Features & Use Cases

  • Ephemeral vs long-lived execution: Run one-shot RayJobs that tear down automatically, or iterate on a reusable long-lived RayCluster.
  • Config-safe iteration with infra+recipe pairs: Use the correct NeMo-RL recipe and matching K8s/Ray infra files, then apply Hydra entrypoint overrides to test changes without forking recipes.
  • Operational observability & debugging: Validate with check, inspect with status, list and fetch logs with job/role commands, and use Ray dashboard APIs when needed.
  • Cluster lifecycle control: Bring clusters up/down, reuse clusters when specs drift, and manage deployments alongside RayClusters.
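For the "Ray dashboard APIs" fallback mentioned above, the following is a minimal sketch of listing jobs and fetching logs through Ray's Jobs REST endpoints. It is not part of nrl-k8s. It assumes the Ray dashboard has been made reachable at http://127.0.0.1:8265 (for example through a port-forward); the address, the helper names, and the placeholder submission ID are all illustrative.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Ray dashboard is reachable locally on its default port.
DASHBOARD = "http://127.0.0.1:8265"


def jobs_url(base: str) -> str:
    """URL of the Ray Jobs API endpoint that lists submitted jobs."""
    return f"{base.rstrip('/')}/api/jobs/"


def job_logs_url(base: str, submission_id: str) -> str:
    """URL of the Ray Jobs API endpoint that returns one job's logs."""
    quoted = urllib.parse.quote(submission_id, safe="")
    return f"{base.rstrip('/')}/api/jobs/{quoted}/logs"


def get_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(jobs: list) -> list:
    """Reduce job records to (id, status) pairs for a quick overview."""
    return [
        (job.get("submission_id") or job.get("job_id") or "?",
         job.get("status", "UNKNOWN"))
        for job in jobs
    ]


# Example usage (requires a reachable dashboard, so it is left commented out):
# for job_id, status in summarize(get_json(jobs_url(DASHBOARD))):
#     print(job_id, status)
# print(get_json(job_logs_url(DASHBOARD, "<submission-id>")).get("logs", ""))
```

The URL builders are kept separate from the network call so the request paths can be checked without a live cluster.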

Quick Start

Ask the AI to launch your NeMo-RL recipe on Kubernetes by running the appropriate nrl-k8s command for either an ephemeral RayJob or a long-lived RayCluster.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install the skill automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: launch-nemo-rl
Download link: https://github.com/yo-steven/skills-exploration-20260522/archive/main.zip#launch-nemo-rl

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
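If you would rather install by hand, the download-extract-install steps above can be sketched roughly as follows. This assumes the archive contains a directory named launch-nemo-rl somewhere beneath its top-level folder, which the URL fragment suggests but the listing does not confirm. The function names are illustrative.

```python
import io
import urllib.request
import zipfile
from pathlib import Path, PurePosixPath

SKILL_NAME = "launch-nemo-rl"
ARCHIVE_URL = (
    "https://github.com/yo-steven/skills-exploration-20260522/archive/main.zip"
)


def download(url: str, timeout: float = 30.0) -> bytes:
    """Fetch the archive into memory."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def install_skill(zip_bytes: bytes, skills_dir: Path,
                  skill_name: str = SKILL_NAME) -> list:
    """Copy files under the first `skill_name` directory into skills_dir/skill_name.

    Assumption: the skill lives in a directory called `skill_name` inside the
    archive. Everything outside that directory is ignored.
    """
    written = []
    target_root = skills_dir / skill_name
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            parts = PurePosixPath(info.filename).parts
            if skill_name not in parts:
                continue
            # Keep only the path below the skill directory.
            rel_parts = parts[parts.index(skill_name) + 1:]
            if not rel_parts or ".." in rel_parts:
                continue  # skip odd entries and path-traversal attempts
            dest = target_root.joinpath(*rel_parts)
            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.write_bytes(archive.read(info))
            written.append(dest)
    return written


# Example usage (performs a network download, so it is left commented out):
# install_skill(download(ARCHIVE_URL), Path(".claude/skills"))
```

An empty return value means no launch-nemo-rl directory was found in the archive, in which case the layout assumption above does not hold.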
