hai-datapipe

Official

Clean, chunk, and pipeline text for AI-ready data

Author: hai-series
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Text data often arrives noisy and unstructured, so it needs repetitive cleaning, chunking, and pipeline orchestration before AI processing. This Skill provides a deterministic, inline way to preprocess and structure text for downstream tasks.

Core Features & Use Cases

  • Text Cleaning: remove HTML tags, URLs, emails, and normalize whitespace.
  • Flexible Chunking: multiple modes (sentence, paragraph, markdown, word, character, custom) to fit downstream models.
  • Pipeline Orchestration: chain clean, transform, and chunk steps into a reusable workflow for RAG readiness or document ingestion.
  • Use Case: prepare large collections of web content for embedding and indexing by a vector store.
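
The cleaning and chunking features above can be illustrated with a minimal Python sketch. This listing does not show the Skill's actual API, so the function names (`clean_text`, `chunk_sentences`, `chunk_words`) and the regular expressions are assumptions made for illustration. The patterns are deliberately simple and will not handle every HTML or URL edge case.

```python
import re

# Simplified patterns, assumed for illustration (not taken from hai-datapipe).
TAG_RE = re.compile(r"<[^>]+>")                      # HTML tags
URL_RE = re.compile(r"https?://\S+")                 # http/https URLs
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")  # basic email addresses
WS_RE = re.compile(r"\s+")                           # any run of whitespace


def clean_text(text: str) -> str:
    """Remove tags, URLs, and emails, then collapse whitespace."""
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = EMAIL_RE.sub(" ", text)
    return WS_RE.sub(" ", text).strip()


def chunk_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def chunk_words(text: str, size: int) -> list[str]:
    """Group words into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


if __name__ == "__main__":
    raw = "<p>Hello   world</p> see https://example.com or mail a@b.com now"
    print(clean_text(raw))
    print(chunk_sentences("One. Two! Three?"))
    print(chunk_words("a b c d e", 2))
```

Because every step is a pure function over strings, the same input always produces the same chunks, which matches the deterministic behavior the Skill describes.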

Quick Start

Invoke the datapipe API to clean and chunk text, then run a pipeline on your raw input to obtain structured chunks.
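
As a rough picture of what running a pipeline on raw input might look like, here is a small Python sketch that chains steps in order. The `Pipeline` class and the step functions (`normalize_whitespace`, `split_paragraphs`, `to_records`) are hypothetical names chosen for this example and are not the Skill's documented interface.

```python
import re
from typing import Any, Callable


class Pipeline:
    """Hypothetical helper: applies each step to the output of the previous one."""

    def __init__(self, *steps: Callable[[Any], Any]) -> None:
        self.steps = list(steps)

    def run(self, data: Any) -> Any:
        for step in self.steps:
            data = step(data)
        return data


def normalize_whitespace(text: str) -> str:
    """Collapse spaces within each line but keep blank lines as paragraph breaks."""
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(lines).strip()


def split_paragraphs(text: str) -> list[str]:
    """Split on blank lines and join the lines inside each paragraph."""
    parts = re.split(r"\n{2,}", text)
    return [p.replace("\n", " ") for p in parts if p.strip()]


def to_records(chunks: list[str]) -> list[dict]:
    """Attach a position index to each chunk, e.g. for later embedding."""
    return [{"index": i, "text": c} for i, c in enumerate(chunks)]


if __name__ == "__main__":
    pipeline = Pipeline(normalize_whitespace, split_paragraphs, to_records)
    raw = "  First   para\nline two \n\n\n Second para  "
    for record in pipeline.run(raw):
        print(record)
```

Keeping each step as a plain callable means the same pipeline object can be reused across documents, and steps can be reordered or swapped without touching the others.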

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: hai-datapipe
Download link: https://github.com/hai-series/hai-framework/archive/main.zip#hai-datapipe

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.