book-sft-pipeline

Community

Turn books into model-ready SFT datasets.

Authoryeeehaooo
Version1.0.0
Installs0

System Documentation

What problem does it solve?

This Skill automates turning full-length books into SFT datasets and author-style training pipelines. It handles extraction from books, segmentation into training chunks, generation of diverse prompts, and packaging of data for LoRA fine-tuning on base models.

Core Features & Use Cases

  • Text extraction: Convert books (ePub/text) into clean, model-ready text.
  • Intelligent segmentation: Produce 150-400 word chunks with overlaps to preserve narrative style.
  • Diverse instruction generation: Use multiple templates and system prompts to prevent memorization of source text.
  • Dataset construction: Build JSONL datasets compatible with Tinker-style LoRA fine-tuning workflows.
  • Use Case: Train a small to mid-size model to imitate a specific author’s voice using a modular SFT pipeline.

Quick Start

Copy the SKILL.md into your project skills folder and run the conceptual pipeline example to validate the flow and artifacts.

Dependency Matrix

Required Modules

None required

Components

scriptsreferences

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: book-sft-pipeline
Download link: https://github.com/yeeehaooo/agent-kit/archive/main.zip#book-sft-pipeline

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
View Source Repository

Agent Skills Search Helper

Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.