convention-data-handling
Community
Standardize data handling for quality and efficiency.
System Documentation
What problem does it solve?
Data engineers and analysts often struggle with inconsistent handling of missing data, outliers, vectorization, and scaling across pipelines, which can cause unreliable results and inefficient processing. This skill provides a standardized, actionable guide to data-handling conventions that improve quality, reproducibility, and performance.
Core Features & Use Cases
- Missing data strategies: guidelines for detection, imputation, and validation to ensure data completeness.
- Outlier handling: detection using robust methods and practical remediation such as capping or transformation.
- Vectorization best practices: prefer pandas/NumPy vectorized operations over explicit Python loops for speed and memory efficiency.
- Scaling for large datasets: memory-aware data types and chunked processing to enable scalable analytics.
- Use Case: apply conventions to a customer analytics dataset to improve model reliability and processing times.
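The vectorization and scaling items above can be illustrated with a minimal sketch. The DataFrame, its column names (`price`, `quantity`, `region`), and the chunk size are hypothetical and chosen only for illustration. An in-memory CSV stands in for a large file on disk.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical order data; column names are illustrative only.
df = pd.DataFrame({
    "price": [10.0, 20.0, 30.0, 40.0],
    "quantity": [1, 2, 3, 4],
    "region": ["north", "south", "north", "south"],
})

# Loop version: one Python-level iteration per row (slow on large frames).
loop_total = [row.price * row.quantity for row in df.itertuples()]

# Vectorized version: a single column-wise multiplication.
df["total"] = df["price"] * df["quantity"]

# Memory-aware dtypes: downcast integers, store repeated strings as category.
df["quantity"] = pd.to_numeric(df["quantity"], downcast="integer")
df["region"] = df["region"].astype("category")

# Chunked processing: aggregate a CSV piece by piece instead of loading it whole.
# The in-memory buffer is a stand-in for a real file path.
csv_text = df.to_csv(index=False)
running_sum = 0.0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    running_sum += chunk["total"].sum()
```

Both versions compute the same totals. The vectorized form moves the arithmetic out of the Python interpreter loop, which is where the speed gain usually comes from on larger frames.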
Quick Start
Apply these conventions to a new dataset by identifying missing values, selecting an imputation strategy, detecting outliers, choosing vectorized operations, and selecting memory-efficient data types.
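As a rough sketch of the first steps in this workflow, the example below counts missing values, imputes with the median, flags outliers, and caps them. The dataset, the column name `monthly_spend`, the choice of median imputation, and the 1.5 × IQR rule are all assumptions made for illustration. The skill itself may recommend different strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical customer data; names and values are illustrative only.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "monthly_spend": [50.0, 55.0, np.nan, 60.0, 65.0, 1000.0],
})

# 1. Detect missing values per column.
missing_counts = df.isna().sum()

# 2. Impute with the median, which is less sensitive to extreme values than the mean.
median_spend = df["monthly_spend"].median()
df["monthly_spend"] = df["monthly_spend"].fillna(median_spend)

# 3. Detect outliers with the interquartile range (IQR) rule (assumed 1.5x fences).
q1 = df["monthly_spend"].quantile(0.25)
q3 = df["monthly_spend"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (df["monthly_spend"] < lower) | (df["monthly_spend"] > upper)

# 4. Remediate by capping to the IQR fences (vectorized via clip).
df["monthly_spend_capped"] = df["monthly_spend"].clip(lower=lower, upper=upper)

# 5. Validate that no missing values remain after imputation.
assert df["monthly_spend"].notna().all()
```

Writing the capped values to a separate column keeps the original data available for auditing, which helps reproducibility.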
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: convention-data-handling Download link: https://github.com/sunLeee/optimization/archive/main.zip#convention-data-handling Please download this .zip file, extract it, and install it in the .claude/skills/ directory.