stata-data-cleaning
Community
Clean and transform messy data in Stata fast
Data & Analytics
#data validation #data cleaning #reproducibility #missing values #stata #data wrangling #econ research
Author: franklee16
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Stata researchers often work with messy, inconsistent datasets that contain missing-value codes, duplicates, outliers, and poorly labeled variables, all of which undermine reliable analysis.
Core Features & Use Cases
- Reproducible cleaning pipeline: Generates a Stata do-file structure with setup, logging, and documented transformations so results can be replicated.
- Common data-quality fixes: Handles duplicates, missing-value decoding, string normalization, derived variables, and basic integrity checks using assertions.
- Analysis-ready outputs: Produces a validated, labeled dataset plus a codebook for downstream modeling and reporting.
- Use Case: Turn raw survey or administrative data with coded missing values and inconsistent text fields into a clean, analysis-ready dataset suitable for regression or panel construction.
Quick Start
Ask the AI to create a Stata cleaning do-file for my raw dataset raw_survey_data.dta, documenting each transformation, validating assumptions with asserts, and saving cleaned_analysis_data.dta with a codebook.
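A request like the one above might yield a do-file along the following lines. This is only a minimal sketch, not the skill's actual output: the variable names (respondent_id, income, age, region), the missing-value codes (-99 and -88), the log filename, and the Stata version are all assumptions made for illustration. Only the two dataset filenames come from the prompt.

```stata
* Minimal sketch of a cleaning do-file.
* Variable names and missing-value codes below are hypothetical.
version 17
clear all
set more off

* Setup and logging so the run can be replicated.
capture log close
log using "clean_survey.log", replace text

use "raw_survey_data.dta", clear

* Drop exact duplicate observations, then confirm the ID is unique.
duplicates report
duplicates drop
isid respondent_id

* Decode numeric missing-value codes (assumed here to be -99 and -88)
* into Stata extended missing values.
mvdecode income age, mv(-99 = .a \ -88 = .b)

* Normalize an inconsistent text field: lowercase, trim outer spaces.
replace region = strtrim(strlower(region))

* Derived variable with a label.
generate log_income = ln(income) if income > 0 & !missing(income)
label variable log_income "Natural log of income"

* Validate assumptions; the do-file stops if an assert fails.
assert inrange(age, 0, 120) if !missing(age)

* Save the cleaned data and write a compact codebook to the log.
compress
save "cleaned_analysis_data.dta", replace
codebook, compact

log close
```

Using extended missing values (.a, .b) rather than plain . keeps the original reason for missingness recoverable, and placing the assert lines before save means a failed check prevents an invalid dataset from being written.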
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: stata-data-cleaning Download link: https://github.com/franklee16/academic-research-skills/archive/main.zip#stata-data-cleaning Please download this .zip file, extract it, and install it in the .claude/skills/ directory.