reasoning_curation_sampler
Community
Balance reasoning classes by token budget.
Education & Research
#curriculum design #token budgeting #dataset curation #reasoning transfer #stratified sampling #SFT training
Author: thistleknot
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
It addresses unstable training behavior caused by mixing reasoning classes with very different lengths and structures, which increases gradient variance and weakens reasoning transfer.
Core Features & Use Cases
- Conditional Stratification: samples by class first, then applies per-class length filtering to preserve each class’s natural length profile.
- Token Budget Equality: up-weights classes by inverse expected token cost so each class contributes equal token exposure.
- Isomorphic Anchoring: pairs structurally similar but semantically distinct samples to encourage reasoning transfer rather than topical clustering.
- Batch Construction Artifacts: produces a batched stream plus guidance maps for sampling weights, length filters, and curriculum pairing.
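The first two features above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the sample fields `cls` and `tokens`, the function names, and the 10th/90th percentile length window are all assumptions made for the example. Each class is weighted by the inverse of its mean token count, so the expected token contribution per draw is the same for every class, and the length filter is computed within each class rather than globally.

```python
import random
from collections import defaultdict
from statistics import mean


def group_lengths(samples):
    """Collect token counts per class (hypothetical 'cls' and 'tokens' fields)."""
    by_cls = defaultdict(list)
    for s in samples:
        by_cls[s["cls"]].append(s["tokens"])
    return by_cls


def class_weights(samples):
    """Sampling weight per class, proportional to 1 / mean token count."""
    inv = {c: 1.0 / mean(v) for c, v in group_lengths(samples).items()}
    total = sum(inv.values())
    return {c: w / total for c, w in inv.items()}


def length_bounds(samples, lo_q=0.1, hi_q=0.9):
    """Per-class length window taken from within-class quantiles (assumed cutoffs)."""
    bounds = {}
    for c, v in group_lengths(samples).items():
        v = sorted(v)
        bounds[c] = (v[int(lo_q * (len(v) - 1))], v[int(hi_q * (len(v) - 1))])
    return bounds


def sample_batch(samples, batch_size, rng):
    """Pick a class first, then a sample from that class's length-filtered pool."""
    weights = class_weights(samples)
    bounds = length_bounds(samples)
    pools = defaultdict(list)
    for s in samples:
        lo, hi = bounds[s["cls"]]
        if lo <= s["tokens"] <= hi:
            pools[s["cls"]].append(s)
    classes = sorted(pools)
    picks = rng.choices(classes, weights=[weights[c] for c in classes], k=batch_size)
    return [rng.choice(pools[c]) for c in picks]
```

For example, with one class averaging 100 tokens and another averaging 400, the weights come out to 0.8 and 0.2, so each class contributes about 80 expected tokens per draw. Calling `sample_batch(data, 16, random.Random(0))` would then return 16 samples drawn under those weights.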
Quick Start
Use reasoning_curation_sampler to build a stratified, token-balanced SFT batch stream for a multi-class reasoning dataset with tight per-class length constraints.
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: reasoning_curation_sampler Download link: https://github.com/thistleknot/skills/archive/main.zip#reasoning-curation-sampler Please download this .zip file, extract it, and install it in the .claude/skills/ directory.