dataset-cleaning

Community

Clean and normalize scraped skill records

Author: zhang-ming-hui
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Scraped skill records often contain duplicates, malformed fields, and inconsistently normalized values, all of which degrade indexing quality and retrieval reliability.

Core Features & Use Cases

  • Deterministic record normalization: Applies repeatable rules to standardize formats while preserving the underlying meaning.
  • Deduplication and validation for indexing compatibility: Reduces duplicate/malformed entries and ensures the cleaned dataset still matches the index-ready record shape.
  • Checkpoint-friendly repair workflow: Uses crawl checkpoints and sample datasets to safely improve quality without losing traceability.
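The first two features above can be illustrated with a minimal Python sketch. This is not the skill's actual implementation: the field names (name, author, description), the REQUIRED_FIELDS tuple, the (author, name) duplicate key, and every function name are assumptions made for illustration only.

```python
import re
import unicodedata

# Hypothetical index-ready shape; these field names are assumptions,
# not the schema the skill actually uses.
REQUIRED_FIELDS = ("name", "author", "description")


def normalize_record(record):
    """Return a copy with every string field NFKC-normalized,
    internal whitespace collapsed, and ends trimmed."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = unicodedata.normalize("NFKC", value)
            value = re.sub(r"\s+", " ", value).strip()
        cleaned[key] = value
    return cleaned


def is_valid(record):
    """A record is index-ready if every required field is a non-empty string."""
    return all(
        isinstance(record.get(field), str) and record[field]
        for field in REQUIRED_FIELDS
    )


def clean_records(records):
    """Normalize, validate, and deduplicate records.

    Returns (kept, rejected). Rejected records are returned unmodified
    so they can still be traced back to the original scrape.
    """
    seen = set()
    kept, rejected = [], []
    for raw in records:
        record = normalize_record(raw)
        if not is_valid(record):
            rejected.append(raw)
            continue
        # Assumed duplicate key: case-insensitive (author, name) pair.
        key = (record["author"].casefold(), record["name"].casefold())
        if key in seen:
            continue  # drop the later duplicate, keep the first occurrence
        seen.add(key)
        kept.append(record)
    return kept, rejected


sample = [
    {"name": " Dataset-Cleaning ", "author": "zhang-ming-hui", "description": "Clean  records"},
    {"name": "dataset-cleaning", "author": "Zhang-Ming-Hui", "description": "duplicate entry"},
    {"name": "", "author": "someone", "description": "missing name"},
]
kept, rejected = clean_records(sample)
```

Because the rules depend only on the input and the records are processed in order, running the sketch twice on the same data yields the same output, which is the sense of "deterministic" assumed here.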

Quick Start

Use the dataset-cleaning skill to clean skills_data_500.json (or skills_data_500.csv) using skills_checkpoint.json, and produce index-compatible cleaned output with consistently normalized fields.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: dataset-cleaning
Download link: https://github.com/zhang-ming-hui/ackownledge/archive/main.zip#dataset-cleaning

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
