Precision + Numerics Stabilizer (bf16/fp16 Done Right)
Community
Safe, reliable mixed-precision training
Author: sovr610
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Mixed-precision training can fail silently when run unattended. This skill adds a precision layer and numerics instrumentation that detect NaN/Inf values, monitor gradient norms, and write reproducible failure snapshots.
Core Features & Use Cases
- Automatic detection of NaN values, Inf values, and gradient overflow, with auto-abort or debug snapshots.
- GradScaler integration for fp16 with automatic scale management and safe unscale/clip ordering.
- Flexible precision modes: bf16, fp16, or fp32 with autocast constraints and master-weights-in-fp32 invariant.
- Numerics sentinel: gradient norms, logit monitoring, activation checks, and weight non-finite checks.
- Snapshotting: 7-file failure snapshots, written atomically, that bundle diagnostic data and repro scripts so a failure can be reproduced.
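The atomic-write idea behind the snapshot feature can be illustrated with a minimal sketch. It assumes the usual pattern of writing to a temporary file in the target directory and then renaming it into place, which may differ from what this skill actually does. The helper name `atomic_write_json`, the file name `meta.json`, and the payload fields are hypothetical and are not taken from the skill's 7-file layout.

```python
import json
import os
import tempfile


def atomic_write_json(path, payload):
    """Write payload as JSON so readers never observe a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # The temp file must live in the same directory so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(payload, f, indent=2, sort_keys=True)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomic rename over the final name
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Hypothetical usage: record why a run was aborted.
snapshot_dir = tempfile.mkdtemp(prefix="numerics_snapshot_")
meta_path = os.path.join(snapshot_dir, "meta.json")
atomic_write_json(meta_path, {"step": 1234, "reason": "non-finite loss"})
```

Because the rename is the last step, a crash mid-write leaves either the previous file or no file at all, never a truncated one.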
Quick Start
Integrate this stabilizer into your PyTorch training workflow and run your loop to enable automatic NaN detection, mixed-precision safety, and reproducible failure snapshots.
Dependency Matrix
Required Modules
torch, numpy, pytest
Components
scripts, references, assets
💻 Claude Code Installation
Recommended: Let Claude install automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: Precision + Numerics Stabilizer (bf16/fp16 Done Right)
Download link: https://github.com/sovr610/refffiy/archive/main.zip#precision-numerics-stabilizer-bf16-fp16-done-right
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.