research-data-management

Name: research-data-management
Author: xjtulyc

Community

Manage FAIR research data with anonymization and DVC.

Education & Research #anonymization #research data management #fair metadata #dataset documentation #codebook generation #dvc version control #k-anonymity

Version: 1.0.0

Installs: 0

System Documentation

What problem does it solve?

This Skill solves the challenge of keeping research datasets well-documented, privacy-protecting, and reproducibly versioned from collection through archival, without relying on manual or ad-hoc procedures.

Core Features & Use Cases

Automated codebook generation: Create variable-level documentation (types, missingness, ranges, and example values) directly from pandas DataFrames.
Privacy-preserving release workflows: Apply pseudonymization to identifier columns (e.g., salted SHA-256 hashing) and reduce re-identification risk with k-anonymity checks plus quasi-identifier generalization (e.g., age binning, ZIP generalization).
Reproducible data versioning with DVC: Track large data files with DVC and define cacheable pipelines using dvc.yaml stages and params.yaml.
FAIR-compliant dataset documentation: Produce a README_data.md template capturing provenance, licensing, citation, and contact fields.
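
As an illustration of the codebook feature, here is a minimal sketch of how variable-level documentation could be derived from a pandas DataFrame. The function name make_codebook, its column names, and the sample data are assumptions made for this example and are not taken from the Skill itself.

```python
import pandas as pd


def make_codebook(df: pd.DataFrame, n_examples: int = 3) -> pd.DataFrame:
    """Build one codebook row per column (hypothetical helper, not the Skill's API)."""
    rows = []
    for name in df.columns:
        col = df[name]
        non_null = col.dropna()
        # Ranges are only reported for numeric, non-boolean columns.
        numeric = (
            pd.api.types.is_numeric_dtype(col)
            and not pd.api.types.is_bool_dtype(col)
        )
        has_range = numeric and len(non_null) > 0
        rows.append({
            "variable": name,
            "dtype": str(col.dtype),
            "n_missing": int(col.isna().sum()),
            "pct_missing": round(float(col.isna().mean()) * 100, 1),
            "min": non_null.min() if has_range else None,
            "max": non_null.max() if has_range else None,
            # First few distinct non-missing values, as text.
            "examples": ", ".join(map(str, non_null.unique()[:n_examples])),
        })
    return pd.DataFrame(rows).set_index("variable")


# Illustrative data only.
df = pd.DataFrame({
    "age": [34, 51, None, 29],
    "site": ["A", "B", "B", None],
})
codebook = make_codebook(df)
print(codebook)
```

The resulting table can be written out with codebook.to_csv(...) or similar and stored next to the dataset it describes.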

Quick Start

Use the research-data-management skill to generate a codebook for your DataFrame, pseudonymize specified PII columns, shift and generalize quasi-identifiers, validate k-anonymity, and output a FAIR-ready README_data.md along with the DVC pipeline files you need for reproducible versioning.
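
The privacy steps named above can be sketched roughly as follows. This is a standard-library-only illustration working on a list of dicts rather than a DataFrame; the helper names (pseudonymize, bin_age, generalize_zip, min_group_size), the salt, and the records are all assumptions for this example, not the Skill's actual interface.

```python
import hashlib
from collections import Counter


def pseudonymize(value: str, salt: str) -> str:
    """Salted SHA-256 hash of an identifier, returned as a hex string."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def bin_age(age: int, width: int = 10) -> str:
    """Generalize an exact age into a fixed-width band, e.g. 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Keep the leading digits of a ZIP code and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def min_group_size(records, quasi_identifiers) -> int:
    """Size of the smallest group sharing the same quasi-identifier values.

    A dataset is k-anonymous for these columns if this value is >= k.
    """
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values())


SALT = "replace-with-a-secret-salt"  # assumption: kept out of the released data

# Illustrative records only.
raw = [
    {"participant_id": "p001", "age": 34, "zip": "71004"},
    {"participant_id": "p002", "age": 37, "zip": "71049"},
    {"participant_id": "p003", "age": 52, "zip": "71061"},
    {"participant_id": "p004", "age": 58, "zip": "71010"},
]

released = [
    {
        "participant_id": pseudonymize(r["participant_id"], SALT),
        "age": bin_age(r["age"]),
        "zip": generalize_zip(r["zip"]),
    }
    for r in raw
]

k_before = min_group_size(raw, ["age", "zip"])
k_after = min_group_size(released, ["age", "zip"])
print(f"k before generalization: {k_before}, after: {k_after}")
```

In this toy data every raw record is unique on (age, zip), while the generalized release has at least two records per combination. Pseudonymized identifiers remain linkable by anyone who learns the salt, so the salt has to be stored separately from the released files.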
