rl
Community
Boost RL generalization via epistemic reward shaping.
System Documentation
What problem does it solve?
This Skill packages a reproducible, self-contained dual-agent reinforcement learning workflow that demonstrates epistemic reward shaping. It lets researchers study how a teacher network shapes a student's learning through task-specific reward maps and per-task delta signals, with the aim of improving generalization to unseen environments.
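As a rough illustration of how a teacher's reward map could shape the student's per-step reward, here is a minimal sketch in plain Python. It assumes an 8x8 grid, an additive combination of environment reward and map value, and a `scale` factor; the function name `shaped_reward` and all values are hypothetical and are not taken from the notebook.

```python
from typing import List, Tuple

GRID = 8  # assumed grid size, matching an 8x8 reward map


def shaped_reward(
    env_reward: float,
    reward_map: List[List[float]],
    pos: Tuple[int, int],
    scale: float = 1.0,
) -> float:
    """Add the teacher's map value at the student's cell to the environment reward.

    Assumption: shaping is additive. The notebook may combine the two differently.
    """
    row, col = pos
    if not (0 <= row < GRID and 0 <= col < GRID):
        raise ValueError(f"position {pos} is outside the {GRID}x{GRID} grid")
    return env_reward + scale * reward_map[row][col]


# Hypothetical teacher output: a bonus on a single cell, zero elsewhere.
reward_map = [[0.0] * GRID for _ in range(GRID)]
reward_map[7][7] = 0.5

on_bonus_cell = shaped_reward(1.0, reward_map, (7, 7))
off_bonus_cell = shaped_reward(-0.01, reward_map, (0, 0))
```

With this additive form, a map of all zeros leaves the environment reward unchanged, which gives a natural unshaped reference point for comparison.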
Core Features & Use Cases
- Dual PPO Student networks with separate optimizers and memory buffers.
- Teacher network that outputs an 8x8 reward map conditioned on an epistemic state.
- Deterministic, notebook-based training without external RL libraries.
- Per-task delta-based teacher reward, per-task success tracking, and a curriculum of 10 training tasks plus 3 unseen test tasks.
- ONNX export for frontend visualization and JSON/episode logs for analysis.
- Generalization evaluation on unseen gridworld layouts to quantify transfer.
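The per-task success tracking and delta-based teacher reward listed above can be sketched roughly as follows. This is one plausible reading, not the notebook's implementation: it assumes the delta is the change in a task's windowed success rate between consecutive episodes on that task. The class `TaskTracker`, its `window` parameter, and the task-id ranges are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

TRAIN_TASKS = list(range(10))     # assumed ids for the 10 training tasks
TEST_TASKS = list(range(10, 13))  # assumed ids for the 3 unseen test tasks


class TaskTracker:
    """Tracks per-task episode outcomes and returns a delta signal per episode."""

    def __init__(self, window: int = 20) -> None:
        self.window = window  # number of recent episodes used for the rate
        self.history: Dict[int, List[bool]] = defaultdict(list)
        self.prev_rate: Dict[int, float] = defaultdict(float)

    def success_rate(self, task_id: int) -> float:
        """Success rate over the most recent `window` episodes of one task."""
        recent = self.history[task_id][-self.window:]
        return sum(recent) / len(recent) if recent else 0.0

    def record(self, task_id: int, success: bool) -> float:
        """Log an outcome and return the change in that task's success rate."""
        self.history[task_id].append(success)
        rate = self.success_rate(task_id)
        delta = rate - self.prev_rate[task_id]
        self.prev_rate[task_id] = rate
        return delta


tracker = TaskTracker(window=4)
first_delta = tracker.record(0, True)    # rate goes 0.0 -> 1.0
second_delta = tracker.record(0, False)  # rate goes 1.0 -> 0.5
other_task_delta = tracker.record(1, True)  # task 1 is tracked independently
```

Keeping a separate history per task means an improvement on one layout does not mask a regression on another, which is the property a per-task signal is meant to provide.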
Quick Start
Open the epistemic_rl.ipynb notebook and run the training loop under either the teacher-guided condition or the random baseline condition. This reproduces the PPOStudent and TeacherAgent interactions and exports the ONNX models.
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: rl Download link: https://github.com/hrushi2501/rl-teacher/archive/main.zip#rl Please download this .zip file, extract it, and install it in the .claude/skills/ directory.