convention-data-handling

Community

Standardize data handling for quality and efficiency.

Author: sunLeee
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Data engineers and analysts often struggle with inconsistent handling of missing data, outliers, vectorization, and scaling across pipelines, which can cause unreliable results and inefficient processing. This skill provides a standardized, actionable guide to data-handling conventions that improve quality, reproducibility, and performance.

Core Features & Use Cases

  • Missing data strategies: guidelines for detection, imputation, and validation to ensure data completeness.
  • Outlier handling: detection using robust methods and practical remediation such as capping or transformation.
  • Vectorization best practices: prefer vectorized pandas/NumPy operations over explicit Python loops for speed and memory efficiency.
  • Scaling for large datasets: memory-aware data types and chunked processing to enable scalable analytics.
  • Use Case: apply conventions to a customer analytics dataset to improve model reliability and processing times.
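As a rough illustration of the first two conventions, the sketch below detects missing values, imputes them, validates the result, and then caps outliers. It assumes pandas and NumPy are installed. The `customers` table, the column names, and the helper functions `impute_median` and `cap_outliers_iqr` are hypothetical, and median imputation and the 1.5 × IQR rule are just one common choice of strategy, not something the skill is confirmed to prescribe.

```python
from typing import Sequence

import numpy as np
import pandas as pd


def impute_median(df: pd.DataFrame, columns: Sequence[str]) -> pd.DataFrame:
    """Return a copy of df with NaNs in the given columns filled by the column median."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].fillna(out[col].median())
    return out


def cap_outliers_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Cap values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# Hypothetical customer analytics data (illustrative values only).
customers = pd.DataFrame({
    "age": [34.0, np.nan, 29.0, 41.0, 38.0],
    "monthly_spend": [120.0, 95.0, 110.0, 5000.0, 105.0],
})

# Detection: count missing values per column.
missing_per_column = customers.isna().sum()

# Imputation: fill missing ages with the median age.
cleaned = impute_median(customers, ["age"])

# Validation: confirm the imputed column is complete.
assert cleaned["age"].isna().sum() == 0

# Remediation: cap extreme spend values instead of dropping the rows.
cleaned["monthly_spend"] = cap_outliers_iqr(cleaned["monthly_spend"])
```

Capping keeps every row, which matters when the outlying record still carries useful information in its other columns. Dropping or transforming the value (for example with a log) are alternatives worth weighing per dataset.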

Quick Start

Apply these conventions to a new dataset by identifying missing values, selecting an imputation strategy, detecting outliers, choosing vectorized operations, and selecting memory-efficient data types.
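The last two steps of that sequence, vectorized operations and memory-efficient data types, might look like the sketch below, together with the chunked processing mentioned under scaling. It assumes pandas and NumPy are installed. The `orders` table and its columns are hypothetical, and an in-memory CSV stands in for a file that would be too large to load at once.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical orders table (illustrative values only).
orders = pd.DataFrame({
    "quantity": [1, 3, 2, 5],
    "unit_price": [9.99, 4.50, 20.00, 1.25],
    "segment": ["retail", "retail", "wholesale", "retail"],
})

# Vectorized: one column-wise expression instead of a Python loop over rows.
orders["total"] = orders["quantity"] * orders["unit_price"]

# Memory-aware types: downcast small integers and store
# low-cardinality strings as categoricals.
orders["quantity"] = pd.to_numeric(orders["quantity"], downcast="integer")
orders["segment"] = orders["segment"].astype("category")

# Chunked processing: aggregate a CSV piece by piece so that only
# one chunk is held in memory at a time.
csv_text = orders[["quantity", "unit_price"]].to_csv(index=False)
running_total = 0.0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    running_total += (chunk["quantity"] * chunk["unit_price"]).sum()
```

With a real file, the path would replace `io.StringIO(csv_text)` and the chunk size would be chosen to fit available memory. Whether downcasting and categoricals actually save memory depends on the value ranges and cardinality of each column, so it is worth checking `DataFrame.memory_usage(deep=True)` before and after.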

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install the skill automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: convention-data-handling
Download link: https://github.com/sunLeee/optimization/archive/main.zip#convention-data-handling

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.