Name: verl-rl-training
Author: Orchestra-Research

System Documentation

What problem does it solve?

This Skill addresses the complexity and resource demands of training Large Language Models (LLMs) with Reinforcement Learning (RL) at scale. It provides a framework for RL-based post-training on distributed, multi-GPU hardware.

Core Features & Use Cases

Scalable RL Training: Supports training models with up to 671B parameters using distributed backends such as FSDP and Megatron-LM.
Flexible RL Algorithms: Implements PPO, GRPO, RLOO, REINFORCE++, and more, with support for custom reward functions.
Backend Agnosticism: Switch between rollout engines (vLLM, SGLang) and training backends (FSDP, Megatron-LM) without changing the training algorithm.
Use Case: Fine-tune a Llama-3 model using GRPO on a math reasoning dataset to improve its problem-solving capabilities, leveraging a multi-GPU cluster for efficient training.
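
To make the "custom reward functions" feature and the math reasoning use case concrete, here is a minimal sketch of a rule-based reward for GSM8K-style problems. It relies on the GSM8K convention that a solution ends with "#### <number>". The function names (extract_final_answer, math_reward) and the two-argument signature are hypothetical and chosen for illustration; the signature that verl actually expects for a custom reward function may differ, so treat this as a starting point rather than the framework's API.

```python
import re
from typing import Optional

# GSM8K solutions end with a line of the form "#### <number>".
_FINAL_ANSWER = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def extract_final_answer(text: str) -> Optional[str]:
    """Return the last '#### <number>' value in text, without thousands separators."""
    matches = _FINAL_ANSWER.findall(text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def math_reward(response: str, ground_truth: str) -> float:
    """Hypothetical reward: 1.0 if the final numeric answer matches, else 0.0."""
    predicted = extract_final_answer(response)
    if predicted is None:
        return 0.0
    try:
        matches_truth = float(predicted) == float(ground_truth.replace(",", ""))
    except ValueError:
        # The ground truth was not a plain number; treat it as no match.
        return 0.0
    return 1.0 if matches_truth else 0.0
```

A binary exact-match reward like this is easy to verify but gives no partial credit; a real setup might also reward correct formatting or penalize overly long responses.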

Quick Start

Use the verl-rl-training skill to launch a GRPO training job for a Qwen2.5-7B model on the GSM8K dataset using 8 GPUs.
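
For readers unfamiliar with GRPO, the core idea is that several responses are sampled for the same prompt and each response's reward is normalized against the rest of its group, so no separate value model is needed. The sketch below illustrates that normalization in plain Python. The function name and the eps default are assumptions for illustration, and it uses the population standard deviation, whereas actual implementations (including verl's) may use a different estimator or additional terms.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the mean and std of its own group.

    rewards holds the scalar rewards of all responses sampled for ONE prompt.
    eps (an assumed default) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four sampled answers to one prompt, two correct and two wrong.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers receive positive advantages and incorrect ones negative advantages; when every response in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal.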

Please help me install this Skill:
Name: verl-rl-training
Download link: https://github.com/Orchestra-Research/AI-Research-SKILLs/archive/main.zip#verl-rl-training
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
