Name: Data Loader Throughput + Sequence Packing
Author: sovr610

System Documentation

What problem does it solve?

Data Loader Throughput + Sequence Packing provides a structured approach to measuring and optimizing the end-to-end data pipeline for large-scale training. It helps identify data stalls (steps where compute sits idle waiting for input) and eliminate compute wasted on padding tokens.
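To make "wasted compute from padding" concrete, here is a minimal sketch that estimates the share of token slots occupied by padding when every batch is padded to its longest sequence. The function name `padding_waste` and the sample lengths are illustrative assumptions, not part of the Skill.

```python
def padding_waste(lengths, batch_size):
    """Fraction of token slots that are padding when each batch
    is padded to the length of its longest sequence."""
    real_tokens = 0
    padded_slots = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        real_tokens += sum(batch)
        padded_slots += max(batch) * len(batch)
    if padded_slots == 0:
        return 0.0
    return 1.0 - real_tokens / padded_slots


# Hypothetical sequence lengths: a few long documents mixed with short ones.
lengths = [512, 32, 48, 500, 40, 36, 490, 44]

# Batching in arrival order mixes long and short sequences.
unsorted_waste = padding_waste(lengths, batch_size=4)

# Sorting by length first is a crude stand-in for length bucketing.
bucketed_waste = padding_waste(sorted(lengths), batch_size=4)

print(f"arrival order: {unsorted_waste:.1%} padding")
print(f"length-sorted: {bucketed_waste:.1%} padding")
```

Grouping sequences of similar length lowers the padded fraction in this toy input. Sequence packing goes further by filling each block with several sequences.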

Core Features & Use Cases

Audits and improves input data throughput by instrumenting the data loading and GPU compute phases.
Supports deterministic per-rank sharding, streaming and memmap backends, and bucketing/padding strategies to maximize effective tokens per second.
Offers utilities for sequence packing (pretraining blocks and SFT boundary-aware packing) and integrated metrics reporting to guide configuration.
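The two packing styles named above can be sketched as follows. This is an illustration under stated assumptions, not the Skill's own utilities: `pack_pretraining_blocks` and `pack_boundary_aware` are hypothetical names, the pretraining variant drops the trailing partial block, and the boundary-aware variant uses simple first-fit placement.

```python
def pack_pretraining_blocks(docs, block_size, eos_id):
    """Concatenate tokenized documents, separated by eos_id, and cut the
    stream into fixed-size blocks. Documents may span block boundaries;
    the trailing partial block is dropped (an assumption of this sketch)."""
    stream = []
    for doc in docs:
        stream.extend(doc)
        stream.append(eos_id)
    n_full = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_full)]


def pack_boundary_aware(examples, max_len):
    """First-fit packing that keeps every example whole, as SFT-style
    packing requires: an example is never split across two bins."""
    bins = []  # each bin holds a list of whole examples
    used = []  # tokens already placed in each bin
    for ex in examples:
        if len(ex) > max_len:
            raise ValueError("example longer than max_len")
        for i, filled in enumerate(used):
            if filled + len(ex) <= max_len:
                bins[i].append(ex)
                used[i] += len(ex)
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([ex])
            used.append(len(ex))
    return bins


blocks = pack_pretraining_blocks([[1, 2, 3], [4, 5], [6, 7, 8, 9]],
                                 block_size=4, eos_id=0)
sft_bins = pack_boundary_aware([[1] * 5, [2] * 3, [3] * 4, [4] * 2], max_len=8)
```

When examples are packed into one row, training code typically also needs attention masks or position resets so that examples do not attend to each other. That part is outside this sketch.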

Quick Start

Configure a synthetic dataset and run the six-phase pipeline to observe throughput gains and iterate on packing and sharding settings.
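The kind of measurement involved can be sketched with a synthetic data source and a timed loop that separates time spent waiting on data from time spent in the training step. This is not the Skill's six-phase pipeline. `synthetic_batches`, `measure_stall`, and the sleep-based delays are assumptions chosen for illustration.

```python
import time


def synthetic_batches(num_batches, load_delay_s):
    """Hypothetical synthetic dataset: sleeps to simulate loading cost."""
    for i in range(num_batches):
        time.sleep(load_delay_s)
        yield [i] * 8


def measure_stall(batches, step_fn):
    """Time the wait for each batch separately from the step itself."""
    data_s = 0.0
    compute_s = 0.0
    steps = 0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)
        t2 = time.perf_counter()
        data_s += t1 - t0      # time blocked on the data source
        compute_s += t2 - t1   # time spent in the step function
        steps += 1
    total = data_s + compute_s
    return {
        "steps": steps,
        "data_s": data_s,
        "compute_s": compute_s,
        "data_fraction": data_s / total if total else 0.0,
    }


# Simulated step: the sleep stands in for forward/backward compute.
stats = measure_stall(synthetic_batches(5, 0.002), lambda b: time.sleep(0.001))
print(stats)
```

A high `data_fraction` suggests the input pipeline is the bottleneck. On a real GPU, kernels launch asynchronously, so the step would need to synchronize with the device before the clock is read for the compute figure to be meaningful.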

Please help me install this Skill:
Name: Data Loader Throughput + Sequence Packing
Download link: https://github.com/sovr610/refffiy/archive/main.zip#data-loader-throughput-sequence-packing
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
