research-data-management
Community
Manage FAIR research data with anonymization and DVC.
System Documentation
What problem does it solve?
This Skill addresses the challenge of keeping research datasets well documented, privacy-preserving, and reproducibly versioned from collection through archival, without relying on manual or ad hoc procedures.
Core Features & Use Cases
- Automated codebook generation: Create variable-level documentation (types, missingness, ranges, and example values) directly from pandas DataFrames.
- Privacy-preserving release workflows: Apply pseudonymization to identifier columns (e.g., salted SHA-256 hashing) and reduce re-identification risk with k-anonymity checks plus quasi-identifier generalization (e.g., age binning, ZIP generalization).
- Reproducible data versioning with DVC: Track large data files with DVC and define cacheable pipelines as dvc.yaml stages, with their parameters kept in params.yaml.
- FAIR-compliant dataset documentation: Produce a README_data.md template capturing provenance, licensing, citation, and contact fields.
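To illustrate what variable-level codebook generation can look like, here is a minimal pandas sketch. It is not the skill's own implementation: the function name `make_codebook` and the output columns (`variable`, `dtype`, `n_missing`, `pct_missing`, `n_unique`, `min`, `max`, `examples`) are hypothetical choices, and "range" is assumed to mean min/max for numeric, non-boolean columns only.

```python
import pandas as pd


def make_codebook(df: pd.DataFrame, n_examples: int = 3) -> pd.DataFrame:
    """Return one codebook row per column of `df` (hypothetical schema)."""
    rows = []
    for col in df.columns:
        s = df[col]
        non_null = s.dropna()
        # Only numeric, non-boolean columns get a min/max range.
        numeric = pd.api.types.is_numeric_dtype(s) and not pd.api.types.is_bool_dtype(s)
        has_range = numeric and not non_null.empty
        rows.append(
            {
                "variable": col,
                "dtype": str(s.dtype),
                "n_missing": int(s.isna().sum()),
                "pct_missing": round(100 * float(s.isna().mean()), 1),
                "n_unique": int(non_null.nunique()),
                "min": non_null.min() if has_range else None,
                "max": non_null.max() if has_range else None,
                # First few distinct observed values, rendered as strings.
                "examples": ", ".join(str(v) for v in non_null.unique()[:n_examples]),
            }
        )
    return pd.DataFrame(rows)


if __name__ == "__main__":
    demo = pd.DataFrame({"age": [34, 51, None], "site": ["A", "B", "A"]})
    print(make_codebook(demo).to_string(index=False))
```

The resulting DataFrame can be written out with `to_csv` or `to_markdown` and shipped next to the data. Note that pandas stores the integer `age` column as float64 once a missing value is present, so its examples appear as `34.0, 51.0`.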
Quick Start
Use the research-data-management skill to generate a codebook for your DataFrame, pseudonymize specified PII columns, shift and generalize quasi-identifiers, validate k-anonymity, and output a FAIR-ready README_data.md along with the DVC pipeline files you need for reproducible versioning.
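The privacy steps in this workflow (pseudonymize identifiers, shift and generalize quasi-identifiers, then check k-anonymity) can be sketched in pandas roughly as follows. This is an illustration under stated assumptions, not the skill's API: all function and column names are hypothetical, "shift" is assumed to mean per-subject random date shifting, and the 10-year age bands and 3-digit ZIP prefixes are example generalization levels.

```python
import hashlib

import numpy as np
import pandas as pd


def pseudonymize(ids: pd.Series, salt: str) -> pd.Series:
    """Replace each identifier with a salted SHA-256 hex digest; missing values stay missing."""
    def digest(v):
        return hashlib.sha256((salt + str(v)).encode("utf-8")).hexdigest()
    return ids.map(lambda v: digest(v) if pd.notna(v) else v)


def bin_age(age: pd.Series, width: int = 10) -> pd.Series:
    """Generalize exact ages into fixed-width bands such as '30-39'."""
    def label(a):
        if pd.isna(a):
            return None
        lo = int(a // width) * width
        return f"{lo}-{lo + width - 1}"
    return age.map(label)


def generalize_zip(zips: pd.Series, keep: int = 3) -> pd.Series:
    """Keep the first `keep` digits of a 5-digit ZIP and mask the rest (assumes no missing ZIPs)."""
    # zfill restores leading zeros lost if ZIPs were parsed as integers.
    return zips.astype(str).str.zfill(5).str[:keep] + "*" * (5 - keep)


def shift_dates(df, id_col, date_col, max_days=180, seed=None):
    """Shift dates by one random offset per subject, preserving intervals within a subject."""
    rng = np.random.default_rng(seed)
    ids = df[id_col].unique()
    # integers() excludes the upper bound, so +1 makes the range symmetric.
    offsets = pd.Series(rng.integers(-max_days, max_days + 1, size=len(ids)), index=ids)
    return pd.to_datetime(df[date_col]) + pd.to_timedelta(df[id_col].map(offsets), unit="D")


def is_k_anonymous(df: pd.DataFrame, quasi_identifiers: list, k: int) -> bool:
    """True if every combination of quasi-identifier values occurs in at least k rows."""
    sizes = df.groupby(quasi_identifiers, dropna=False).size()
    return bool(len(sizes)) and int(sizes.min()) >= k


if __name__ == "__main__":
    raw = pd.DataFrame({
        "participant_id": ["P001", "P002", "P003", "P004"],
        "visit_date": ["2024-01-05", "2024-01-19", "2024-02-02", "2024-02-16"],
        "age": [34, 37, 52, 58],
        "zip": ["02139", "02141", "94110", "94117"],
    })
    released = raw.assign(
        visit_date=shift_dates(raw, "participant_id", "visit_date", seed=42),
        # Placeholder salt: in practice load it from a secret store, never from the repo.
        participant_id=pseudonymize(raw["participant_id"], salt="load-from-a-secret-store"),
        age=bin_age(raw["age"]),
        zip=generalize_zip(raw["zip"]),
    )
    print(released)
    print("2-anonymous on (age, zip):", is_k_anonymous(released, ["age", "zip"], k=2))
```

The k-anonymity check only covers the quasi-identifiers you pass in, so choosing that list is the real privacy decision; if the check fails, coarsen the generalization (wider age bands, shorter ZIP prefixes) and re-check. Because identifiers such as `P001` are low-entropy, a salted hash is only as safe as the salt's secrecy; Python's `hmac` module with a secret key is a common alternative.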
Dependency Matrix
Required Modules
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: research-data-management Download link: https://github.com/xjtulyc/awesome-rosetta-skills/archive/main.zip#research-data-management Please download this .zip file, extract it, and install it in the .claude/skills/ directory.