book-sft-pipeline
CommunityTurn books into model-ready SFT datasets.
Authoryeeehaooo
Version1.0.0
Installs0
System Documentation
What problem does it solve?
This Skill automates turning full-length books into SFT datasets and author-style training pipelines. It handles extraction from books, segmentation into training chunks, generation of diverse prompts, and packaging of data for LoRA fine-tuning on base models.
Core Features & Use Cases
- Text extraction: Convert books (ePub/text) into clean, model-ready text.
- Intelligent segmentation: Produce 150-400 word chunks with overlaps to preserve narrative style.
- Diverse instruction generation: Use multiple templates and system prompts to prevent memorization of source text.
- Dataset construction: Build JSONL datasets compatible with Tinker-style LoRA fine-tuning workflows.
- Use Case: Train a small to mid-size model to imitate a specific author’s voice using a modular SFT pipeline.
Quick Start
Copy the SKILL.md into your project skills folder and run the conceptual pipeline example to validate the flow and artifacts.
Dependency Matrix
Required Modules
None requiredComponents
scriptsreferences
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: book-sft-pipeline Download link: https://github.com/yeeehaooo/agent-kit/archive/main.zip#book-sft-pipeline Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
Agent Skills Search Helper
Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.