manual-dataset-cleaning

Author: Pans0020

Keep provenance while cleaning mirrored datasets.

Category: Data & Analytics
Tags: provenance, data validation, deduplication, data synchronization, dataset cleaning, nowcoder, csv json markdown

Version: 1.0.0

Installs: 0

System Documentation

What problem does it solve?

This Skill helps you collect and clean interview or dataset records that are mirrored across several output files (CSV, JSON, Markdown), preserving source provenance and keeping the files from drifting out of sync.

Core Features & Use Cases

Provenance-first collection: capture platform, source URL, post ID, title, author, and raw snapshots before any normalization.
Conservative extraction & normalization: keep raw question text, create a cleaned/standardized version separately, and avoid hallucinating missing fields.
Synchronized multi-file cleaning: choose a source of truth (CSV/JSON) and apply narrow, anchored edits so mirrored artifacts stay consistent.
Deduplication with auditability: merge duplicates only when meaning matches, while preserving all source links and metadata.
Verification-driven safety: use parser-based checks, row counts, field allowlists, and mismatch reporting instead of visual inspection.
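
The verification step above can be sketched as follows. This is a minimal illustration, not the Skill's actual implementation: the field names in `ALLOWED_FIELDS`, the `post_id` key, and the helper names are all hypothetical, and the sketch assumes the JSON mirror stores every value as a string so it can be compared directly with the CSV.

```python
import csv
import io
import json

# Hypothetical field allowlist; adjust to the real dataset's columns.
ALLOWED_FIELDS = {"post_id", "source_url", "question_raw", "question_clean"}


def load_csv_rows(text):
    """Parse CSV text with a real parser instead of eyeballing it."""
    return list(csv.DictReader(io.StringIO(text)))


def load_json_rows(text):
    """Parse JSON text; assumed to be a list of flat objects."""
    return json.loads(text)


def index_by(rows, key, name, report):
    """Index rows by key, reporting rows that have no usable key."""
    indexed = {}
    for row in rows:
        value = row.get(key)
        if not value:
            report.append(f"{name}: row without {key}")
            continue
        indexed[value] = row
    return indexed


def compare_mirrors(csv_rows, json_rows, key="post_id"):
    """Return a list of mismatch messages; an empty list means in sync."""
    report = []
    # Row counts should match before any field-level comparison.
    if len(csv_rows) != len(json_rows):
        report.append(f"row count: csv={len(csv_rows)} json={len(json_rows)}")
    # Field allowlist: flag any column that should not be there.
    for name, rows in (("csv", csv_rows), ("json", json_rows)):
        for row in rows:
            extra = set(row) - ALLOWED_FIELDS
            if extra:
                report.append(f"{name} {row.get(key)}: unexpected fields {sorted(extra)}")
    # Field-by-field comparison of records that share a key.
    csv_by_key = index_by(csv_rows, key, "csv", report)
    json_by_key = index_by(json_rows, key, "json", report)
    for k in sorted(set(csv_by_key) | set(json_by_key)):
        a, b = csv_by_key.get(k), json_by_key.get(k)
        if a is None or b is None:
            report.append(f"{k}: missing in {'csv' if a is None else 'json'}")
            continue
        for field_name in sorted(ALLOWED_FIELDS):
            if a.get(field_name) != b.get(field_name):
                report.append(f"{k}: {field_name} differs")
    return report
```

Running `compare_mirrors` after every edit and treating a non-empty report as a failure gives a mechanical check in place of visual inspection. A Markdown table could be checked the same way once it is parsed into rows.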

Quick Start

Ask the Skill for its workflow to manually collect Nowcoder interview records, then synchronize the cleanup across your Markdown table, CSV, and JSON files while preserving source URLs and raw text snapshots.
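
To make the idea of preserving source URLs and raw text snapshots concrete, here is a small sketch of a provenance-carrying record and a conservative merge. The class names, field names, and functions are hypothetical and not taken from the Skill itself. Deciding whether two questions really mean the same thing needs human judgment; the sketch stands in for that with an exact match on whitespace-normalized, case-folded text, which is deliberately strict.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Provenance captured before any normalization (hypothetical fields)."""
    platform: str
    source_url: str
    post_id: str
    title: str
    author: str
    raw_snapshot: str


@dataclass
class Question:
    raw_text: str    # kept verbatim, never overwritten
    clean_text: str  # standardized copy stored separately
    sources: list = field(default_factory=list)


def normalize(text):
    """Collapse whitespace only; no rewording and no filling of gaps."""
    return " ".join(text.split())


def collect(raw_text, source):
    """Create a record that keeps the raw text next to its cleaned form."""
    return Question(raw_text=raw_text, clean_text=normalize(raw_text), sources=[source])


def merge_duplicates(questions):
    """Merge records whose cleaned text matches, keeping every source."""
    merged = {}
    for q in questions:
        key = q.clean_text.casefold()
        if key in merged:
            # Duplicate: keep the first record's text, append the provenance.
            merged[key].sources.extend(q.sources)
        else:
            # Copy so the caller's input records are not mutated.
            merged[key] = Question(q.raw_text, q.clean_text, list(q.sources))
    return list(merged.values())
```

Because each `Source` carries its own `raw_snapshot`, merging two records loses no original wording: the merged entry still lists every URL and every snapshot it came from.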
