hai-datapipe
Official
Clean, chunk, and run text through processing pipelines to produce AI-ready data
Author: hai-series
Version: 1.0.0
System Documentation
What problem does it solve?
Text data often arrives noisy and unstructured, so it needs repetitive cleaning, chunking, and pipeline orchestration before AI processing. This Skill handles that preprocessing deterministically and inline, producing clean, structured text for downstream tasks.
Core Features & Use Cases
- Text Cleaning: remove HTML tags, URLs, emails, and normalize whitespace.
- Flexible Chunking: multiple modes (sentence, paragraph, markdown, word, character, custom) to fit downstream models.
- Pipeline Orchestration: chain clean, transform, and chunk steps into a reusable workflow that prepares data for retrieval-augmented generation (RAG) or document ingestion.
- Use Case: prepare large collections of web content for embedding and indexing by a vector store.
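The cleaning and chunking features listed above can be illustrated with a minimal Python sketch. It assumes regex-based cleaning and simple splitting rules for each mode. The names `clean_text`, `chunk_text`, `mode`, `size`, and `splitter` are hypothetical and are not the hai-datapipe API. The URL and email patterns are deliberately simple and will miss edge cases.

```python
import re
from typing import Callable, List, Optional

# Hypothetical illustration only: these regexes and function names are
# assumptions, not the hai-datapipe implementation.
_TAG_RE = re.compile(r"<[^>]+>")
_EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
_URL_RE = re.compile(r"https?://\S+|www\.\S+")


def clean_text(text: str) -> str:
    """Remove HTML tags, emails, and URLs, then normalize whitespace.

    Blank lines are kept (collapsed to one) so paragraph chunking still works.
    """
    text = _TAG_RE.sub(" ", text)
    text = _EMAIL_RE.sub(" ", text)
    text = _URL_RE.sub(" ", text)
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces/tabs
    text = re.sub(r" *\n *", "\n", text)    # trim spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line in a row
    return text.strip()


def chunk_text(
    text: str,
    mode: str = "sentence",
    size: int = 200,
    splitter: Optional[Callable[[str], List[str]]] = None,
) -> List[str]:
    """Split text into chunks; `size` counts words or characters.

    Every chunk is stripped and empty chunks are dropped.
    """
    if mode == "sentence":
        # Heuristic: break after ".", "!" or "?" followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", text)
    elif mode == "paragraph":
        parts = re.split(r"\n\s*\n", text)
    elif mode == "markdown":
        # Start a new chunk at each ATX heading ("#" to "######").
        parts = re.split(r"(?m)^(?=#{1,6} )", text)
    elif mode in ("word", "character"):
        if size <= 0:
            raise ValueError("size must be positive")
        if mode == "word":
            words = text.split()
            parts = [" ".join(words[i:i + size]) for i in range(0, len(words), size)]
        else:
            parts = [text[i:i + size] for i in range(0, len(text), size)]
    elif mode == "custom":
        if splitter is None:
            raise ValueError("custom mode needs a splitter callable")
        parts = splitter(text)
    else:
        raise ValueError(f"unsupported mode: {mode!r}")
    return [p.strip() for p in parts if p.strip()]
```

The sentence rule is only a heuristic: abbreviations such as "e.g." will be split in the wrong place, so a real deployment may want a proper sentence tokenizer behind the "custom" mode.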
Quick Start
Invoke the datapipe API to clean and chunk text, then run a pipeline on your raw input to obtain structured chunks.
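The clean, transform, then chunk workflow can be pictured as an ordered list of text steps followed by a chunker. The minimal Python sketch below assumes that shape. `Pipeline`, `run`, `run_many`, and the example step functions are hypothetical names chosen for illustration, not the hai-datapipe API.

```python
import re
from typing import Callable, Iterable, List, Sequence

Step = Callable[[str], str]           # text in, text out
Chunker = Callable[[str], List[str]]  # text in, chunks out


class Pipeline:
    """Hypothetical sketch: apply text steps in order, then chunk the result."""

    def __init__(self, steps: Sequence[Step], chunker: Chunker) -> None:
        self.steps = list(steps)
        self.chunker = chunker

    def run(self, raw: str) -> List[str]:
        text = raw
        for step in self.steps:
            text = step(text)
        return self.chunker(text)

    def run_many(self, docs: Iterable[str]) -> List[List[str]]:
        # The same deterministic steps are reused for every document.
        return [self.run(doc) for doc in docs]


# Example steps (assumptions for illustration).
def strip_html(text: str) -> str:
    return re.sub(r"<[^>]+>", " ", text)


def normalize_whitespace(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> List[str]:
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


# Clean (strip_html, normalize_whitespace), transform (str.lower), then chunk.
pipe = Pipeline([strip_html, normalize_whitespace, str.lower], split_sentences)
```

Keeping each step a plain `str -> str` function makes the pipeline easy to test and reorder. Any step can be swapped for the cleaning or chunking logic a project actually uses.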
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: hai-datapipe Download link: https://github.com/hai-series/hai-framework/archive/main.zip#hai-datapipe Please download this .zip file, extract it, and install it in the .claude/skills/ directory.