research-data-management

Community

Manage FAIR research data with anonymization and DVC.

Author: xjtulyc
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill keeps research datasets well documented, privacy-protected, and reproducibly versioned from collection through archival, replacing manual or ad-hoc procedures.

Core Features & Use Cases

  • Automated codebook generation: Create variable-level documentation (types, missingness, ranges, and example values) directly from pandas DataFrames.
  • Privacy-preserving release workflows: Apply pseudonymization to identifier columns (e.g., salted SHA-256 hashing) and reduce re-identification risk with k-anonymity checks plus quasi-identifier generalization (e.g., age binning, ZIP generalization).
  • Reproducible data versioning with DVC: Track large data files with DVC and define cacheable pipelines using dvc.yaml stages and params.yaml.
  • FAIR-compliant dataset documentation: Produce a README_data.md template capturing provenance, licensing, citation, and contact fields.
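As an illustration of the codebook feature, here is a minimal sketch of how variable-level documentation could be derived from a pandas DataFrame. The function name `build_codebook`, its `n_examples` parameter, and the output columns are assumptions made for this example; they are not taken from the Skill's source.

```python
import pandas as pd


def build_codebook(df: pd.DataFrame, n_examples: int = 3) -> pd.DataFrame:
    """Summarize each column: dtype, missingness, numeric range, example values.

    Hypothetical helper for illustration; not the Skill's actual API.
    """
    rows = []
    for name in df.columns:
        col = df[name]
        non_null = col.dropna()
        # Report a range only for numeric (non-boolean) columns with data.
        has_range = (
            pd.api.types.is_numeric_dtype(col)
            and not pd.api.types.is_bool_dtype(col)
            and len(non_null) > 0
        )
        rows.append(
            {
                "variable": name,
                "dtype": str(col.dtype),
                "n_missing": int(col.isna().sum()),
                "pct_missing": round(float(col.isna().mean()) * 100, 1),
                "min": non_null.min() if has_range else None,
                "max": non_null.max() if has_range else None,
                # First few distinct non-missing values, in order of appearance.
                "examples": non_null.unique()[:n_examples].tolist(),
            }
        )
    return pd.DataFrame(rows).set_index("variable")


# Toy data, invented for this example.
df = pd.DataFrame(
    {
        "age": [34, 51, None, 29],
        "site": ["A", "B", "B", None],
    }
)
codebook = build_codebook(df)
print(codebook)
```

Example values are copied verbatim from the data, so in a real release they would be taken only from columns that are safe to publish, or from the already pseudonymized table.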

Quick Start

Use the research-data-management skill to generate a codebook for your DataFrame, pseudonymize the PII columns you specify, shift and generalize quasi-identifiers, and validate k-anonymity. The skill then outputs a FAIR-ready README_data.md together with the DVC pipeline files needed for reproducible versioning.
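The privacy steps in this workflow can be sketched roughly as follows. This is an illustrative example, not the Skill's implementation: the helper names (`pseudonymize`, `generalize_age`, `generalize_zip`, `min_group_size`), the toy data, and the hard-coded salt are all assumptions. It also assumes 5-digit ZIP codes and no missing ages. In practice the salt would be kept secret, for example loaded from an environment file rather than written in code.

```python
import hashlib

import pandas as pd


def pseudonymize(series: pd.Series, salt: str) -> pd.Series:
    """Replace each non-missing value with a salted SHA-256 hex digest."""

    def _hash(value):
        if pd.isna(value):
            return value
        return hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()

    return series.map(_hash)


def generalize_age(age: pd.Series, width: int = 10) -> pd.Series:
    """Bin integer ages into labels such as '20-29' (assumes no missing ages)."""
    lower = (age // width) * width
    upper = lower + width - 1
    return lower.astype(int).astype(str) + "-" + upper.astype(int).astype(str)


def generalize_zip(zip_code: pd.Series, keep: int = 3) -> pd.Series:
    """Keep the first `keep` digits of a 5-digit ZIP and mask the rest."""
    return zip_code.astype(str).str[:keep] + "*" * (5 - keep)


def min_group_size(df: pd.DataFrame, quasi_identifiers: list) -> int:
    """Size of the smallest group sharing the same quasi-identifier values.

    A table satisfies k-anonymity for a given k when this value is >= k.
    """
    return int(df.groupby(quasi_identifiers, dropna=False).size().min())


# Toy data, invented for this example.
raw = pd.DataFrame(
    {
        "participant_id": ["p01", "p02", "p03", "p04"],
        "age": [23, 27, 41, 45],
        "zip": ["02139", "02141", "94110", "94117"],
    }
)

SALT = "demo-salt-not-for-production"  # assumption: a real salt stays secret

released = raw.copy()
released["participant_id"] = pseudonymize(raw["participant_id"], SALT)
released["age"] = generalize_age(raw["age"])
released["zip"] = generalize_zip(raw["zip"])

k_before = min_group_size(raw, ["age", "zip"])
k_after = min_group_size(released, ["age", "zip"])
print(f"smallest group before: {k_before}, after: {k_after}")
```

On this toy table every raw row is unique on age and ZIP, while the generalized table groups rows in pairs, so it passes a k-anonymity check with k = 2. Wider bins or shorter ZIP prefixes raise k at the cost of analytic detail.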

Dependency Matrix

Required Modules

python, pandas, dvc, numpy, python-dotenv

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: research-data-management
Download link: https://github.com/xjtulyc/awesome-rosetta-skills/archive/main.zip#research-data-management

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.