chinese-social-data
Community
Access and preprocess Chinese social science datasets quickly.
Category: Data & Analytics
Tags: #encoding #data cleaning #survey data #china #variable harmonization #administrative codes #chinese text preprocessing
Author: Yuuqq
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
It removes friction in finding, loading, cleaning, harmonizing, and preparing Chinese social science data so you can move from raw files to analysis-ready datasets quickly.
Core Features & Use Cases
- Survey dataset access & loading: Handles common Chinese survey microdata formats (Stata/SPSS/SAS) and typical encoding issues.
- Cleaning & harmonization workflows: Converts survey-specific missing-value codes to standard missing values and harmonizes variables across waves (e.g., CGSS).
- China-specific administrative and text workflows: Supports administrative division code parsing (GB/T 2260) and Chinese text preprocessing (jieba segmentation, stopwords, term dictionary).
- Social media collection & privacy guardrails: Provides a research-oriented approach for collecting and cleaning social media text while emphasizing PIPL compliance and de-identification.
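As an illustration of the administrative division code parsing mentioned above, here is a minimal sketch. It relies only on the documented layout of six-digit GB/T 2260 codes (two digits each for province, prefecture, and county). The names `DivisionCode` and `parse_division_code` are hypothetical and are not taken from the skill's scripts; mapping codes to place names would need a lookup table that is not shown here.

```python
import re
from typing import NamedTuple


class DivisionCode(NamedTuple):
    """Hypothetical container for the parts of a GB/T 2260 code."""
    province: str    # province-level code, e.g. "440000"
    prefecture: str  # prefecture-level code, e.g. "440300"
    county: str      # the full six-digit code as given
    level: str       # "province", "prefecture", or "county"


_CODE_RE = re.compile(r"[0-9]{6}")


def parse_division_code(code) -> DivisionCode:
    """Split a six-digit GB/T 2260 code into its hierarchical parts."""
    text = str(code).strip()
    if not _CODE_RE.fullmatch(text):
        raise ValueError(f"expected a six-digit code, got {code!r}")

    # Digits 1-2: province; digits 3-4: prefecture; digits 5-6: county.
    province = text[:2] + "0000"
    prefecture = text[:4] + "00"

    if text[2:] == "0000":
        level = "province"
    elif text[4:] == "00":
        level = "prefecture"
    else:
        level = "county"
    return DivisionCode(province, prefecture, text, level)


if __name__ == "__main__":
    print(parse_division_code("440305"))
    print(parse_division_code(110000))
```

Survey files often store these codes as integers, so the sketch converts its input to a string before validating it.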
Quick Start
Load your CGSS or CFPS dataset file, clean its missing-value codes, harmonize key variables across waves, and output an analysis-ready pandas DataFrame for downstream modeling.
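The workflow above might look roughly like the following sketch. Everything in it is an assumption for illustration: the missing-value codes, the raw variable names (`a2`, `a8a`, and so on), and the helper names are hypothetical and should be replaced with values from the codebook of the wave you are using. It is not the skill's own implementation.

```python
import pandas as pd

# Assumed special codes (e.g. "refused", "don't know"); check your codebook.
MISSING_CODES = [-1, -2, -3, -8]

# Hypothetical per-wave mapping from raw variable names to harmonized names.
RENAME_BY_WAVE = {
    2015: {"a2": "gender", "a8a": "income"},
    2017: {"A2": "gender", "A8a": "income"},
}


def load_wave(path):
    """Read a Stata file, keeping numeric codes instead of value labels."""
    return pd.read_stata(path, convert_categoricals=False)


def clean_missing_codes(df, codes=MISSING_CODES):
    """Replace survey-specific missing codes in numeric columns with NaN."""
    out = df.copy()
    for col in out.select_dtypes("number").columns:
        out[col] = out[col].mask(out[col].isin(codes))
    return out


def harmonize_waves(waves, rename_by_wave=RENAME_BY_WAVE):
    """Rename each wave's variables, clean them, and stack the waves."""
    frames = []
    for wave, df in waves.items():
        mapping = rename_by_wave[wave]
        part = df[list(mapping)].rename(columns=mapping)
        part = clean_missing_codes(part)
        part.insert(0, "wave", wave)  # added after cleaning so it is never masked
        frames.append(part)
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    # Small in-memory frames stand in for load_wave("...") on real files.
    w2015 = pd.DataFrame({"a2": [1, 2, -8], "a8a": [50000, -2, 30000]})
    w2017 = pd.DataFrame({"A2": [2, -3], "A8a": [-1, 80000]})
    print(harmonize_waves({2015: w2015, 2017: w2017}))
```

Columns that contain missing codes come back as floating point, because pandas represents the replaced values as NaN.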
Dependency Matrix
Required Modules
numpy, pandas, jieba, requests, time, re
Components
scripts, references, assets
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: chinese-social-data
Download link: https://github.com/Yuuqq/claude-social-science-skills/archive/main.zip#chinese-social-data
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.