deep-q-rl

Name: deep-q-rl
Author: thistleknot

Community

Train smarter policies from scored moves.

Software Engineering #reinforcement learning #self-play #value function #mcts #dqn #action sampling #mistake correction


Version: 1.0.0

Installs: 0

System Documentation

What problem does it solve?

This skill trains agents on discrete-action decision problems that expose a per-state score. It learns a value function from those scores and improves the policy with a progressively narrowed, search-guided strategy.

Core Features & Use Cases

Dense score-based learning: builds a Q-style value head from a per-state evaluate(state) score that correlates with eventual success, instead of relying only on sparse terminal rewards.
Russian Doll MCTS with value-head leaves: runs progressive narrowing search so wide action spaces remain tractable, using the network (and a heuristic fallback) to evaluate search leaves.
AHA mistake correction: detects drops in evaluation after a chosen action during training and applies an immediate corrective replay signal.
Training progress annealing: anneals MCTS iteration counts, exploration, and funnel widths as the value function becomes more reliable.
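The annealing and AHA ideas above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: the names SearchBudget, annealed_budget, and aha_signal are hypothetical, and the linear schedule, the start and end values, the direction of annealing (budgets shrink as training progresses), and the 0.2 drop threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchBudget:
    """Hypothetical container for the quantities the skill says it anneals."""
    mcts_iterations: int
    exploration: float
    funnel_widths: tuple[int, ...]  # candidates kept at each narrowing stage


def lerp(start: float, end: float, t: float) -> float:
    """Linear interpolation from start (t=0) to end (t=1)."""
    return start + (end - start) * t


def annealed_budget(progress: float) -> SearchBudget:
    """Map training progress in [0, 1] to a search budget.

    Assumption: search gets cheaper and narrower as the value function
    becomes more reliable. All numbers here are illustrative only.
    """
    t = min(max(progress, 0.0), 1.0)  # clamp out-of-range progress
    widths = ((32, 12), (16, 6), (8, 3))  # (start, end) per stage
    return SearchBudget(
        mcts_iterations=round(lerp(400, 100, t)),
        exploration=lerp(1.5, 0.5, t),
        funnel_widths=tuple(round(lerp(w0, w1, t)) for w0, w1 in widths),
    )


def aha_signal(score_before: float, score_after: float,
               threshold: float = 0.2) -> float:
    """Return a corrective value when the mover's score dropped sharply.

    Both scores must be from the perspective of the player who moved.
    If evaluate() reports the current player's perspective and the turn
    has passed to the opponent, negate the post-move score before calling.
    Returns the negated drop if it exceeds threshold, otherwise 0.0.
    """
    drop = score_before - score_after
    return -drop if drop > threshold else 0.0
```

A training loop could call annealed_budget(step / total_steps) before each search and store any non-zero aha_signal alongside the offending (state, action) pair in the replay buffer. How the skill actually weights that signal is not specified in this listing.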

Use it for environments like board games, turn-based strategy, or any simulation where you can enumerate discrete actions, encode state tensors, and compute a current-player-perspective scalar score that correlates with ultimate success.

Quick Start

Use the deep-q-rl skill to train an agent: implement the ScoredEnvironment interface (encode_state, evaluate, legal_actions, apply, and is_terminal) for your environment, then run self-play or rollout-based training with Russian Doll MCTS and AHA enabled.
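As a rough sketch of what implementing the interface might look like, the example below defines the five methods named above for a toy take-away game. Only the method names come from this page. The signatures, the TakeAwayGame class, the greedy_action and self_play helpers, and the use of plain Python lists instead of tensors are assumptions for illustration, and the one-ply greedy policy stands in for Russian Doll MCTS, which is not reproduced here.

```python
from typing import Any, Protocol, Sequence


class ScoredEnvironment(Protocol):
    """Assumed shape of the interface; only the method names are from the docs."""
    def encode_state(self, state: Any) -> Sequence[float]: ...
    def evaluate(self, state: Any) -> float: ...
    def legal_actions(self, state: Any) -> Sequence[Any]: ...
    def apply(self, state: Any, action: Any) -> Any: ...
    def is_terminal(self, state: Any) -> bool: ...


class TakeAwayGame:
    """Toy game: players alternately take 1-3 stones; taking the last one wins.

    A state is (stones_left, player_to_move).
    """

    def encode_state(self, state):
        stones, player = state
        return [float(stones), float(player)]  # a real encoder would build a tensor

    def evaluate(self, state):
        # Current-player-perspective score: a multiple of 4 is lost for the mover.
        stones, _ = state
        return -1.0 if stones % 4 == 0 else 1.0

    def legal_actions(self, state):
        stones, _ = state
        return [n for n in (1, 2, 3) if n <= stones]

    def apply(self, state, action):
        stones, player = state
        return (stones - action, 1 - player)

    def is_terminal(self, state):
        return state[0] == 0


def greedy_action(env: ScoredEnvironment, state):
    """One-ply lookahead: pick the move that leaves the opponent worst off."""
    # evaluate() is from the next mover's perspective, so negate it.
    return max(env.legal_actions(state),
               key=lambda a: -env.evaluate(env.apply(state, a)))


def self_play(env: ScoredEnvironment, state):
    """Play one game, collecting (encoded_state, score) pairs as dense targets."""
    samples = []
    while not env.is_terminal(state):
        samples.append((env.encode_state(state), env.evaluate(state)))
        state = env.apply(state, greedy_action(env, state))
    winner = 1 - state[1]  # the player left to move at the end has lost
    return samples, winner


samples, winner = self_play(TakeAwayGame(), (10, 0))
```

The collected pairs show why a dense evaluate matters: every visited state yields a training target, not just the terminal one.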
