stata-data-cleaning

Community

Clean and transform messy data in Stata fast

Author: franklee16
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Researchers working in Stata often receive messy, inconsistent datasets that contain coded missing values (such as -99 for "refused"), duplicate records, outliers, and poorly labeled variables. Left uncorrected, these problems make reliable analysis impossible.

Core Features & Use Cases

  • Reproducible cleaning pipeline: Generates a Stata do-file with setup, logging, and documented transformations so that results can be replicated.
  • Common data-quality fixes: Handles duplicate records, missing-value decoding, string normalization, derived variables, and basic integrity checks using assert statements.
  • Analysis-ready outputs: Produces a validated, labeled dataset plus a codebook for downstream modeling and reporting.
  • Use Case: Turn raw survey or administrative data with coded missing values and inconsistent text fields into a clean, analysis-ready dataset suitable for regression or panel construction.
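As a rough illustration of the do-file structure described above, here is a minimal hand-written sketch; it is not output produced by the skill. The variable names (respondent_id, age, region, income), the missing-value codes -99 and -98, the version number, and the log file name are all assumptions chosen for illustration.

```stata
* clean_survey.do: minimal cleaning pipeline sketch
* ASSUMPTIONS (hypothetical): variables respondent_id, age, region, income;
* in numeric variables, -99 = refused and -98 = don't know.

version 16                     // assumed Stata version; pins command behavior
clear all
set more off

capture log close
log using "clean_survey.log", replace text

use "raw_survey_data.dta", clear

* 1. Duplicates: report them, then drop rows identical on every variable
duplicates report
duplicates drop

* 2. Decode survey missing-value codes into extended missing values
quietly ds, has(type numeric)
mvdecode `r(varlist)', mv(-99=.a \ -98=.b)

* 3. Normalize free text: trim ends, collapse inner spaces, lowercase
replace region = strlower(stritrim(strtrim(region)))

* 4. Derived variables, labeled as they are created
generate double log_income = ln(income) if income > 0 & !missing(income)
label variable log_income "Log of household income"

* 5. Integrity checks: the do-file stops here if an assumption fails
isid respondent_id
assert inrange(age, 0, 120) if !missing(age)

* 6. Label, compress, document, and save
label variable age "Age in years"
label data "Cleaned survey data"
compress
codebook, compact              // codebook is written into the log
save "cleaned_analysis_data.dta", replace

log close
```

Running the sketch with do clean_survey.do halts at the first failing isid or assert, so a broken assumption stops the pipeline instead of silently producing a bad dataset. Note the !missing() guards: Stata treats missing values as larger than any number, so conditions such as income > 0 are true for missing observations unless excluded explicitly.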

Quick Start

Example prompt: "Create a Stata cleaning do-file for my raw dataset raw_survey_data.dta. Document each transformation, validate assumptions with assert statements, and save cleaned_analysis_data.dta along with a codebook."

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install the skill automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: stata-data-cleaning
Download link: https://github.com/franklee16/academic-research-skills/archive/main.zip#stata-data-cleaning

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.