airflow-starrocks-backfill

Community

Backfill StarRocks safely with Airflow

Author: ivanshamaev
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

It lets you reprocess and reload historical StarRocks partitions after upstream data or transformation logic has changed, without introducing duplicates, race conditions, or data loss.

Core Features & Use Cases

  • Idempotent partition backfill: Uses deterministic Broker Load labels per (table, date) so reruns can safely skip already-finished loads.
  • Atomic partition replacement: Recomputes each partition with StarRocks INSERT OVERWRITE, after first ensuring preconditions such as the target partition existing.
  • Airflow-driven orchestration: Provides two DAG patterns (catchup-based and programmatic date-range) with safety guardrails like max_active_runs=1.
  • Operational safety: Includes partition pre-creation, polling each load until it reaches FINISHED or CANCELLED, and documented anti-patterns to avoid (e.g., append-based reloads).
  • Progress tracking & observability: Suggests a backfill tracking table and example queries to monitor durations and row counts.
  • Concurrency control for speed: Shows a parallel backfill with a configurable worker limit so it does not overwhelm the StarRocks backend (BE) nodes.
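The deterministic-label and polling ideas above can be sketched in plain Python. This is a minimal sketch, not the skill's own code: the label format `backfill_<table>_<YYYYMMDD>_<version>`, the `version` suffix, and the injected `get_state` callable are hypothetical. In a real DAG, `get_state` would run something like `SHOW LOAD WHERE LABEL = '...'` and return the `State` column. Because StarRocks will not accept a new load whose label was already used by a successful load in the same database, a stable label doubles as a deduplication key, while a CANCELLED label can be resubmitted.

```python
import re
import time
from datetime import date
from typing import Callable, Optional

# Terminal states reported for a StarRocks load job.
TERMINAL_STATES = {"FINISHED", "CANCELLED"}


def broker_load_label(table: str, day: date, version: str = "v1") -> str:
    """Build a deterministic Broker Load label for one (table, date) pair.

    Same inputs always give the same label, so a rerun reuses it instead of
    creating a second load. The `version` suffix is a hypothetical convention
    for deliberately forcing a fresh label after transformation logic changes.
    """
    # Replace dots and other punctuation so "db.table" yields a safe label.
    safe_table = re.sub(r"[^A-Za-z0-9_]", "_", table)
    return f"backfill_{safe_table}_{day:%Y%m%d}_{version}"


def should_submit(previous_state: Optional[str]) -> bool:
    """Submit if no prior load exists or it was CANCELLED.

    FINISHED means the data is already loaded (skip); any in-flight state
    means the caller should wait on the existing load rather than resubmit.
    """
    return previous_state is None or previous_state == "CANCELLED"


def wait_for_load(
    get_state: Callable[[str], str],
    label: str,
    poll_seconds: float = 10.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a load until it reaches FINISHED or CANCELLED, or raise on timeout."""
    waited = 0.0
    while True:
        state = get_state(label)
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"load {label} still {state} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

Injecting `sleep` keeps the poller testable without real waits. A task that gets back CANCELLED should fail loudly, so Airflow marks the run as failed instead of silently moving on.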

Quick Start

Run the backfill DAG in Airflow over a defined historical date range with max_active_runs=1. Make sure the target partitions exist first, and use deterministic Broker Load labels so that reruns are safe and idempotent.
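The Quick Start flow (walk a date range, pre-create each partition, then overwrite it) can be sketched without Airflow installed. This is a hedged illustration, not the skill's DAG: the daily RANGE partition naming `pYYYYMMDD`, the `<table>_staging` source table, the `dt` column, and the injected `execute` callable are all hypothetical. Verify the `ADD PARTITION IF NOT EXISTS` form against your StarRocks version. Inside Airflow, `run_backfill` would be the body of a task, and `max_active_runs=1` on the DAG keeps runs from overlapping.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from typing import Callable, List


def date_range(start: date, end: date) -> List[date]:
    """Inclusive list of days from start to end."""
    if end < start:
        raise ValueError("end must not be before start")
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def partition_sql(table: str, day: date) -> List[str]:
    """SQL to rebuild one daily partition (naming and source are hypothetical)."""
    name = f"p{day:%Y%m%d}"
    nxt = day + timedelta(days=1)
    return [
        # 1. Pre-create the partition so INSERT OVERWRITE has a target.
        f"ALTER TABLE {table} ADD PARTITION IF NOT EXISTS {name} "
        f'VALUES [("{day:%Y-%m-%d}"), ("{nxt:%Y-%m-%d}"))',
        # 2. Atomically replace the partition contents (no append-based reload).
        f"INSERT OVERWRITE {table} PARTITION ({name}) "
        f"SELECT * FROM {table}_staging WHERE dt = '{day:%Y-%m-%d}'",
    ]


def run_backfill(
    table: str,
    start: date,
    end: date,
    execute: Callable[[str], None],
    max_workers: int = 1,
) -> List[date]:
    """Rebuild every day in [start, end]; max_workers caps concurrent partitions."""

    def rebuild(day: date) -> date:
        # Statements for a single day always run in order on one worker.
        for stmt in partition_sql(table, day):
            execute(stmt)
        return day

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, regardless of completion order.
        return list(pool.map(rebuild, date_range(start, end)))
```

Keeping `max_workers=1` reproduces a strictly sequential backfill; raising it trades StarRocks BE load for wall-clock time, and `execute` must then be safe to call from several threads.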

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: airflow-starrocks-backfill
Download link: https://github.com/ivanshamaev/de-agent-skills/archive/main.zip#airflow-starrocks-backfill

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.