rl

Name: rl
Author: hrushi2501

Community

Boost RL generalization via epistemic reward shaping.

Education & Research #generalization #reinforcement-learning #ppo #teacher-student #epistemic-reward-shaping #gridworld #onnx-export


Version: 1.0.0

Installs: 0

System Documentation

What problem does it solve?

This Skill packages a reproducible, self-contained dual-agent reinforcement learning workflow that demonstrates epistemic reward shaping. It lets researchers study how a teacher network can shape a student's learning via task-specific reward maps and per-task delta signals, with the goal of improving generalization to unseen environments.
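As a rough illustration of how a teacher-produced reward map could shape the student's reward, here is a minimal sketch. The function name `shaped_reward`, the `shaping_weight` parameter, and the additive combination rule are assumptions made for this example; the notebook may combine the signals differently.

```python
# Hypothetical sketch: the names and the additive rule are assumptions,
# not taken from the epistemic_rl.ipynb notebook.
GRID_SIZE = 8  # the teacher outputs an 8x8 reward map


def shaped_reward(env_reward, reward_map, position, shaping_weight=1.0):
    """Add the teacher's reward-map value at the agent's cell to the environment reward."""
    row, col = position
    if not (0 <= row < GRID_SIZE and 0 <= col < GRID_SIZE):
        raise ValueError(f"position {position} is outside the {GRID_SIZE}x{GRID_SIZE} grid")
    return env_reward + shaping_weight * reward_map[row][col]


# Example map: zero everywhere except a bonus on one cell.
reward_map = [[0.0] * GRID_SIZE for _ in range(GRID_SIZE)]
reward_map[2][3] = 0.5
```

With this map, a student standing on cell (2, 3) gets the environment reward plus the 0.5 bonus, while every other cell leaves the environment reward unchanged.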

Core Features & Use Cases

Dual PPO Student networks with separate optimizers and memory buffers.
Teacher network that outputs an 8x8 reward map conditioned on an epistemic state.
Deterministic, notebook-based training without external RL libraries.
Per-task delta-based teacher reward, per-task success tracking, and a curriculum of 10 training tasks plus 3 unseen test tasks.
ONNX export for frontend visualization and JSON/episode logs for analysis.
Generalization evaluation on unseen gridworld layouts to quantify transfer.
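The per-task success tracking and delta-based teacher reward listed above could look roughly like the sketch below. The listing does not say how the delta is computed, so this assumes it is the change in a rolling per-task success rate after each episode; `TaskSuccessTracker`, the `window` parameter, and the integer task IDs are hypothetical names chosen for illustration.

```python
from collections import defaultdict, deque


class TaskSuccessTracker:
    """Tracks a rolling success rate per task (assumed design, not the notebook's)."""

    def __init__(self, window=20):
        # One bounded history of 0.0/1.0 outcomes per task ID.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def success_rate(self, task_id):
        episodes = self.history[task_id]
        return sum(episodes) / len(episodes) if episodes else 0.0

    def update(self, task_id, success):
        """Record one episode outcome and return the change in that task's success rate."""
        before = self.success_rate(task_id)
        self.history[task_id].append(1.0 if success else 0.0)
        return self.success_rate(task_id) - before


# Assumed ID scheme for the curriculum: 10 training tasks plus 3 unseen test tasks.
TRAIN_TASKS = list(range(10))
TEST_TASKS = list(range(10, 13))
```

Under this assumption, the value returned by `update` would serve as the teacher's reward for that task: positive when the student improves on the task and negative when it regresses. Keeping the test task IDs disjoint from the training IDs makes it harder to leak unseen layouts into training by accident.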

Quick Start

Open the epistemic_rl.ipynb notebook and run the training loop under either the teacher-guided condition or the random baseline condition. This reproduces the PPOStudent and TeacherAgent interactions and exports the ONNX models.

