data-patterns

Community

Build robust RAG data pipelines.

Author: pvliesdonk
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill collects patterns and best practices for building reliable data pipelines for Retrieval-Augmented Generation (RAG) systems. It covers data preparation, storage, and validation.

Core Features & Use Cases

  • Chunking Strategies: Offers various methods (fixed-size, semantic, document-aware) for optimal text splitting.
  • Vector Store Selection: Guides users on choosing the right vector database based on scale, features, and hosting needs.
  • Data Validation: Implements robust validation using Pydantic and Pandera for structured data and embeddings.
  • Schema Evolution: Defines strategies for managing changes in data schemas over time.
  • RAG Evaluation: Provides metrics and approaches for assessing retrieval and end-to-end RAG performance.
  • Use Case: When developing a RAG system for customer support documentation, this Skill helps select the best chunking strategy for technical articles, choose an appropriate vector store like Qdrant for scalability, and implement validation to ensure data quality before indexing.
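To make the RAG Evaluation item above concrete, here is a minimal sketch of two common retrieval metrics, recall@k and reciprocal rank. It illustrates the general technique and is not taken from the Skill itself; the function names and the document IDs in the usage lines are hypothetical.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0  # nothing to find; treat as zero rather than dividing by zero
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical query: the retriever returned three IDs, two documents are relevant.
retrieved_ids = ["a", "b", "c"]
relevant_ids = {"b", "d"}
print(recall_at_k(retrieved_ids, relevant_ids, k=2))
print(reciprocal_rank(retrieved_ids, relevant_ids))
```

Averaging the reciprocal rank over a set of queries gives mean reciprocal rank (MRR). End-to-end evaluation of generated answers needs additional measures that this sketch does not cover.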

Quick Start

Use the data-patterns skill to explore chunking strategies for technical documentation.
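As an illustration of the simplest strategy listed above, fixed-size chunking, the sketch below splits text into character windows with a configurable overlap. It is an assumption about what such a strategy typically looks like, not the Skill's own implementation; `chunk_fixed` and its parameters are hypothetical names.

```python
def chunk_fixed(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most `size` characters.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a boundary still appears whole in at least one chunk.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(chunk_fixed("abcdefghij", size=4, overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

Character windows ignore sentence and heading boundaries. The semantic and document-aware strategies mentioned in the feature list address that, at the cost of more complex splitting logic.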

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: data-patterns
Download link: https://github.com/pvliesdonk/agents.md/archive/main.zip#data-patterns

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.