manual-dataset-cleaning
Community
Keep provenance while cleaning mirrored datasets.
Data & Analytics
Tags: provenance, data validation, deduplication, data synchronization, dataset cleaning, nowcoder, csv json markdown
Author: Pans0020
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill helps you collect and clean interview or dataset records that are mirrored across several final artifacts (for example a Markdown table, a CSV file, and a JSON file). It preserves each record's source provenance and keeps the CSV, JSON, and Markdown copies from drifting apart.
Core Features & Use Cases
- Provenance-first collection: capture platform, source URL, post ID, title, author, and raw snapshots before any normalization.
- Conservative extraction & normalization: keep the raw question text, create a cleaned/standardized version separately, and never invent values for missing fields.
- Synchronized multi-file cleaning: choose a source of truth (CSV/JSON) and apply narrow, anchored edits so mirrored artifacts stay consistent.
- Deduplication with auditability: merge duplicates only when their meaning matches, and keep every source link and metadata field from the merged records.
- Verification-driven safety: use parser-based checks, row counts, field allowlists, and mismatch reporting instead of visual inspection.
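The provenance-first, raw-versus-cleaned, and auditable-merge ideas above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Skill's own code: the `Record` fields, `normalize`, and `dedupe` are hypothetical names, and "same meaning" is approximated by an exact match on normalized, case-folded text, a deliberately strict stand-in that leaves paraphrases for human review.

```python
import re
import unicodedata
from dataclasses import dataclass, field


@dataclass
class Record:
    # Provenance captured before any cleaning (hypothetical field names).
    platform: str
    source_url: str
    post_id: str
    title: str
    author: str
    raw_question: str           # verbatim snapshot; never modified
    cleaned_question: str = ""  # derived copy, filled in by dedupe()
    sources: list = field(default_factory=list)  # provenance of every merged duplicate


def normalize(text: str) -> str:
    """Conservative cleanup: Unicode NFKC, collapse whitespace runs, trim ends.
    It never rewrites wording or fills in missing content."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def dedupe(records: list[Record]) -> list[Record]:
    """Merge records whose normalized, case-folded question text is identical.
    Exact matching is an assumed, strict stand-in for 'same meaning'."""
    merged: dict[str, Record] = {}
    for rec in records:
        rec.cleaned_question = normalize(rec.raw_question)
        key = rec.cleaned_question.casefold()
        # Keep the full provenance of this copy, including its raw snapshot.
        provenance = {
            "platform": rec.platform,
            "url": rec.source_url,
            "post_id": rec.post_id,
            "title": rec.title,
            "author": rec.author,
            "raw_question": rec.raw_question,
        }
        if key in merged:
            # Keep the first record; append where the duplicate came from.
            merged[key].sources.append(provenance)
        else:
            rec.sources = [provenance]
            merged[key] = rec
    return list(merged.values())


if __name__ == "__main__":
    # Placeholder URLs, not real Nowcoder posts.
    rows = [
        Record("nowcoder", "https://example.com/p/1", "1", "T1", "a", "What is  a B-tree?"),
        Record("nowcoder", "https://example.com/p/2", "2", "T2", "b", "what is a B-tree?"),
    ]
    for rec in dedupe(rows):
        print(rec.cleaned_question, [s["post_id"] for s in rec.sources])
        # prints: What is a B-tree? ['1', '2']
```

Because the merged record carries a `sources` entry for every duplicate, each merge stays auditable: you can always trace a cleaned question back to every post and raw snapshot it came from.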
Quick Start
Ask Claude to apply this Skill's workflow to manually collect Nowcoder interview records, then synchronize the cleanup across your Markdown table, CSV, and JSON files while preserving source URLs and raw text snapshots.
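One way to keep the three files consistent is sketched below, assuming JSON is chosen as the source of truth. It swaps the Skill's narrow, anchored edits for full regeneration of the CSV and Markdown copies, then applies the parser-based checks the Skill lists: a field allowlist, row counts, and per-field mismatch reporting. The `FIELDS` allowlist and all function names are hypothetical; adapt them to your dataset.

```python
import csv
import io
import json

# Hypothetical field allowlist; replace with your dataset's actual columns.
FIELDS = ["post_id", "title", "source_url", "raw_question", "cleaned_question"]


def load_truth(json_text: str) -> list[dict]:
    """Parse the JSON source of truth and reject fields outside the allowlist."""
    rows = json.loads(json_text)
    for row in rows:
        extra = set(row) - set(FIELDS)
        if extra:
            raise ValueError(f"post {row.get('post_id')}: unexpected fields {sorted(extra)}")
    return rows


def to_csv(rows: list[dict]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty strings
    return buf.getvalue()


def md_cell(value) -> str:
    # Escape pipes and flatten newlines so a cell cannot break the table.
    return str(value).replace("|", "\\|").replace("\n", " ")


def to_markdown(rows: list[dict]) -> str:
    lines = ["| " + " | ".join(FIELDS) + " |", "|" + "---|" * len(FIELDS)]
    for row in rows:
        lines.append("| " + " | ".join(md_cell(row.get(f, "")) for f in FIELDS) + " |")
    return "\n".join(lines) + "\n"


def verify_csv(rows: list[dict], csv_text: str) -> list[str]:
    """Re-parse the CSV and report every mismatch instead of eyeballing it."""
    parsed = list(csv.DictReader(io.StringIO(csv_text)))
    problems = []
    if len(parsed) != len(rows):
        problems.append(f"row count: json={len(rows)} csv={len(parsed)}")
    for want, got in zip(rows, parsed):
        for f in FIELDS:
            if str(want.get(f, "")) != got.get(f, ""):
                problems.append(f"post {want.get('post_id')}: field {f!r} differs")
    return problems


def verify_markdown(rows: list[dict], md_text: str) -> list[str]:
    """Check that the table has one body line per JSON record."""
    body = md_text.strip().splitlines()[2:]  # skip header and separator rows
    if len(body) != len(rows):
        return [f"row count: json={len(rows)} markdown={len(body)}"]
    return []
```

A typical run loads the JSON with `load_truth`, writes the outputs of `to_csv` and `to_markdown` to disk, and fails the cleanup if either verify function returns a non-empty list. Regeneration rules out drift by construction; if you prefer hand-applied anchored edits, the verify functions can still be run afterward against the edited files.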
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: manual-dataset-cleaning Download link: https://github.com/Pans0020/opencode-skills/archive/main.zip#manual-dataset-cleaning Please download this .zip file, extract it, and install it in the .claude/skills/ directory.