hugging-face-datasets

Official

Manage Hugging Face datasets with SQL.

Author: huggingface
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Hugging Face datasets often require manual setup for repo creation, configuration, and data processing. This Skill provides an end-to-end workflow to initialize, configure, and edit datasets, plus SQL-based discovery, transformation, and export capabilities.

Core Features & Use Cases

  • Dataset lifecycle management: initialize repos, configure system prompts, and manage content with templates.
  • SQL-based querying and transformation: query HF datasets using DuckDB, describe schemas, sample data, join datasets, and export to Parquet/JSONL.
  • HF Hub integration: push results to new datasets, manage access, and organize multi-split workflows.
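As a rough illustration of the SQL-based querying feature, the sketch below shows one way a query against a table named `data` could be bound to a Hub dataset through DuckDB's `hf://` path scheme. The helper names (`hf_dataset_path`, `bind_data_table`, `run_query`) and the default `**/*.parquet` pattern are assumptions made for this example; the Skill's own scripts may resolve files differently.

```python
from typing import Optional


def hf_dataset_path(repo_id: str, pattern: str = "**/*.parquet",
                    revision: Optional[str] = None) -> str:
    """Build a DuckDB-readable hf:// path for files in a Hub dataset repo.

    Hypothetical helper: assumes the dataset stores Parquet files that
    match `pattern`.
    """
    if repo_id.count("/") != 1:
        raise ValueError("repo_id must look like 'username/dataset'")
    ref = f"@{revision}" if revision else ""
    return f"hf://datasets/{repo_id}{ref}/{pattern}"


def bind_data_table(sql: str, repo_id: str, pattern: str = "**/*.parquet") -> str:
    """Wrap `sql` so that the name `data` refers to the dataset's Parquet files.

    Limitation of this sketch: `sql` must not start with its own WITH clause.
    """
    path = hf_dataset_path(repo_id, pattern).replace("'", "''")
    return f"WITH data AS (SELECT * FROM read_parquet('{path}')) {sql}"


def run_query(sql: str, repo_id: str):
    """Execute the bound query and return all rows.

    Requires the duckdb package and network access; recent DuckDB releases
    load the httpfs extension needed for hf:// paths automatically.
    """
    import duckdb  # imported lazily so the helpers above work without it

    return duckdb.sql(bind_data_table(sql, repo_id)).fetchall()


if __name__ == "__main__":
    # Print the generated SQL instead of running it, so no network is needed.
    print(bind_data_table("SELECT * FROM data LIMIT 5", "your-username/your-dataset"))
```

Keeping the path construction separate from execution makes the generated SQL easy to inspect before anything is sent over the network.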

Quick Start

Run these commands in order:

  • Create a new dataset: uv run scripts/dataset_manager.py init
  • Bootstrap the dataset with chat templates: uv run scripts/dataset_manager.py quick_setup --template chat --repo_id "your-username/your-dataset"
  • Inspect the result: uv run scripts/sql_manager.py query --dataset "your-username/your-dataset" --sql "SELECT * FROM data"
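The Quick Start commands can also be driven from a small Python script. This is a minimal sketch: the function names `quick_start_commands` and `run_quick_start` are hypothetical, and only the commands and flags already shown in the Quick Start are used. By default it performs a dry run that prints each command without executing it.

```python
import shlex
import subprocess
from typing import List


def quick_start_commands(repo_id: str, template: str = "chat") -> List[List[str]]:
    """Return the three Quick Start commands as argument lists."""
    return [
        ["uv", "run", "scripts/dataset_manager.py", "init"],
        ["uv", "run", "scripts/dataset_manager.py", "quick_setup",
         "--template", template, "--repo_id", repo_id],
        ["uv", "run", "scripts/sql_manager.py", "query",
         "--dataset", repo_id, "--sql", "SELECT * FROM data"],
    ]


def run_quick_start(repo_id: str, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in quick_start_commands(repo_id):
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_quick_start("your-username/your-dataset")
```

Passing argument lists rather than a single shell string avoids quoting problems with the SQL text.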

Dependency Matrix

Required Modules

duckdb, huggingface_hub, datasets

Components

scripts

💻 Claude Code Installation

Recommended: let Claude install the Skill automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: hugging-face-datasets
Download link: https://github.com/huggingface/skills/archive/main.zip#hugging-face-datasets

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.