manual-dataset-cleaning

Community

Keep provenance while cleaning mirrored datasets.

Author: Pans0020
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill helps you collect and clean interview/dataset records that are mirrored across multiple final artifacts (CSV, JSON, and Markdown), preserving source provenance and preventing the copies from drifting apart.

Core Features & Use Cases

  • Provenance-first collection: capture platform, source URL, post ID, title, author, and raw snapshots before any normalization.
  • Conservative extraction & normalization: keep raw question text, create a cleaned/standardized version separately, and avoid hallucinating missing fields.
  • Synchronized multi-file cleaning: choose a source of truth (CSV/JSON) and apply narrow, anchored edits so mirrored artifacts stay consistent.
  • Deduplication with auditability: merge duplicates only when meaning matches, while preserving all source links and metadata.
  • Verification-driven safety: use parser-based checks, row counts, field allowlists, and mismatch reporting instead of visual inspection.
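The verification-driven feature above can be illustrated with a small parser-based check. This is only a sketch under stated assumptions, not the Skill's actual implementation: the field names (`post_id`, `source_url`, `raw_question`, `clean_question`), the `ALLOWED_FIELDS` allowlist, and the `compare_mirrors` helper are all hypothetical and chosen for illustration. It parses a CSV and its JSON mirror, then reports row-count, allowlist, and per-field mismatches instead of relying on visual inspection.

```python
import csv
import io
import json

# Hypothetical allowlist; the real column set depends on your dataset.
ALLOWED_FIELDS = {"post_id", "source_url", "raw_question", "clean_question"}


def as_text(value):
    """Normalize a parsed value to text so CSV and JSON values are comparable."""
    return "" if value is None else str(value)


def compare_mirrors(csv_text, json_text, key="post_id"):
    """Return a list of human-readable mismatches between a CSV and its JSON mirror."""
    csv_rows = list(csv.DictReader(io.StringIO(csv_text)))
    json_rows = json.loads(json_text)
    problems = []

    # Check 1: both artifacts must hold the same number of records.
    if len(csv_rows) != len(json_rows):
        problems.append(f"row count: csv={len(csv_rows)} json={len(json_rows)}")

    # Check 2: no artifact may carry fields outside the allowlist.
    for name, rows in (("csv", csv_rows), ("json", json_rows)):
        for row in rows:
            extra = sorted(str(f) for f in set(row) - ALLOWED_FIELDS)
            if extra:
                problems.append(f"{name} {as_text(row.get(key))}: unexpected fields {extra}")

    # Check 3: every record must match field by field, joined on the key.
    json_by_key = {as_text(r.get(key)): r for r in json_rows}
    csv_keys = set()
    for row in csv_rows:
        row_key = as_text(row.get(key))
        csv_keys.add(row_key)
        other = json_by_key.get(row_key)
        if other is None:
            problems.append(f"{row_key}: missing from json")
            continue
        for field in sorted(ALLOWED_FIELDS):
            if as_text(row.get(field)) != as_text(other.get(field)):
                problems.append(f"{row_key}: {field} differs")

    # Check 4: records present only in the JSON mirror.
    for row_key in json_by_key:
        if row_key not in csv_keys:
            problems.append(f"{row_key}: missing from csv")
    return problems
```

An empty list means the two artifacts agree on these checks; any entry names the record and the field that drifted, which keeps follow-up edits narrow.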

Quick Start

Ask your agent to follow the Skill's workflow: manually collect Nowcoder interview records, then synchronize the cleanup across your Markdown table, CSV, and JSON files while preserving source URLs and raw text snapshots.
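To make "preserving source URLs and raw text snapshots" concrete, here is a minimal sketch of a provenance-carrying record and a conservative merge. The `Source` and `Question` classes, their field names, and `merge_duplicates` are hypothetical illustrations, not part of the Skill. Matching on case- and whitespace-normalized text is an assumption used as a strict stand-in for "meaning matches"; anything looser should be reviewed by hand.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Source:
    """Provenance captured before any normalization (hypothetical field names)."""
    platform: str
    source_url: str
    post_id: str
    title: str
    author: str
    raw_text: str  # raw snapshot; never edited during cleaning


@dataclass
class Question:
    """A cleaned question plus every source it was seen in."""
    clean_question: str
    sources: list = field(default_factory=list)


def normalize_key(text):
    """Case-fold and collapse whitespace; deliberately strict to avoid false merges."""
    return " ".join(text.casefold().split())


def merge_duplicates(questions):
    """Merge questions whose normalized text is identical, keeping all sources."""
    merged = {}
    for q in questions:
        k = normalize_key(q.clean_question)
        if k not in merged:
            # Copy the source list so the input objects are not mutated.
            merged[k] = Question(q.clean_question, list(q.sources))
        else:
            for s in q.sources:
                if s not in merged[k].sources:
                    merged[k].sources.append(s)
    return list(merged.values())
```

Because each `Source` keeps its own raw snapshot, merging two records never discards the original wording or the link it came from.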

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: manual-dataset-cleaning
Download link: https://github.com/Pans0020/opencode-skills/archive/main.zip#manual-dataset-cleaning

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
