checkpoints

Official

Reliable ML checkpointing for resume and sampling

Author: thinking-machines-lab
Version: 1.0.0

System Documentation

What problem does it solve?

Checkpointing covers two needs: resuming interrupted training runs and exporting sampler weights for evaluation. It provides structured saving, loading, and lifecycle management, and tracks each checkpoint through a CheckpointRecord.

Core Features & Use Cases

  • Save the full training state (weights and optimizer state) with save_state so an interrupted run can be resumed.
  • Save only the sampler weights with save_weights_for_sampler for sampling or export, or call save_weights_and_get_sampling_client to get a ready-to-use sampling client in one step.
  • Manage checkpoints through the REST API and CLI (list, publish, set TTL, delete) to control their lifecycle and keep their provenance clear.
  • Serialize and deserialize checkpoint metadata with CheckpointRecord for reliable tracking and reproducibility.
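To make the record idea concrete, here is a minimal sketch of a checkpoint record that round-trips through JSON. CheckpointRecordSketch and all of its fields (name, kind, path, step, ttl_seconds) are hypothetical illustrations, not the actual CheckpointRecord definition in tinker-cookbook; the path value is a placeholder, not a real checkpoint location format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class CheckpointRecordSketch:
    """Hypothetical stand-in for CheckpointRecord; field names are assumptions."""

    name: str                          # human-readable checkpoint name
    kind: str                          # "state" (save_state) or "sampler" (save_weights_for_sampler)
    path: str                          # location reported by the save call (placeholder here)
    step: int                          # training step at which the checkpoint was taken
    ttl_seconds: Optional[int] = None  # None means the checkpoint does not expire

    def to_json(self) -> str:
        # Sorted keys give a stable, diff-friendly serialized form.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "CheckpointRecordSketch":
        # Rebuild the record from the dict produced by to_json.
        return cls(**json.loads(text))
```

Writing one serialized record per line (JSON Lines) is one simple way to keep an append-only history of checkpoints next to a run; the real cookbook may store records differently.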

Quick Start

Save the full training state with save_state for resumption, or export only the sampler weights with save_weights_for_sampler for fast evaluation.
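The split between the two save calls matters when picking a resume point: only save_state checkpoints include optimizer state, so sampler-only checkpoints cannot resume training. The sketch below picks the newest full-state checkpoint from a list of records. Representing records as plain dicts with "kind" and "step" keys is an assumption for illustration, and latest_resumable is a hypothetical helper; the actual loading call on the training client is not shown.

```python
from typing import Iterable, Mapping, Optional


def latest_resumable(records: Iterable[Mapping]) -> Optional[Mapping]:
    """Return the newest full-state checkpoint record, or None to start fresh.

    Hypothetical helper: records are dicts with "kind" ("state" or "sampler")
    and "step" keys. Sampler-only checkpoints are skipped because they carry
    no optimizer state and are meant for sampling/evaluation, not resumption.
    """
    candidates = [r for r in records if r.get("kind") == "state"]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["step"])


records = [
    {"name": "s100", "kind": "state", "step": 100},
    {"name": "w150", "kind": "sampler", "step": 150},
    {"name": "s200", "kind": "state", "step": 200},
    {"name": "w250", "kind": "sampler", "step": 250},
]
print(latest_resumable(records)["name"])  # prints "s200"
```

Even though the sampler checkpoint at step 250 is newer, resuming from step 200 is the only option that restores optimizer state.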

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically by pasting the text below into Claude Code.

Please help me install this Skill:
Name: checkpoints
Download link: https://github.com/thinking-machines-lab/tinker-cookbook/archive/main.zip#checkpoints

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
