airflow-starrocks-pipeline

Community

Orchestrate StarRocks loads with Airflow

Author: ivanshamaev
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

It helps you orchestrate StarRocks ingestion workflows from Airflow reliably, including asynchronous Broker Load polling and Routine Load lifecycle management, so ETL pipelines don’t hang on unfinished loads or run against stale metadata.

Core Features & Use Cases

  • Broker Load orchestration (S3 batch ingestion): Create partitions, trigger Broker Load jobs with a deterministic label, and poll SHOW LOAD ... until the job reaches FINISHED, failing the task if it ends as CANCELLED.
  • Stream Load micro-batch pattern: Send NDJSON payloads via HTTP with a computed label and validate StarRocks response status.
  • Routine Load lifecycle control: Pause for schema changes, apply DDL, and resume while handling job state transitions safely.
  • Post-load optimization: Run ANALYZE TABLE for partitions to refresh statistics and improve query planning.
  • Airflow integration details: Use MySqlHook for MySQL-compatible DDL/DML and an HTTP hook or requests for Stream Load; pass labels between tasks via XCom and template partition-aware DAG parameters.
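
The Stream Load status check from the list above could look roughly like the following. This is a minimal sketch, not the skill's own code: `check_stream_load_response` and `max_filtered_rows` are hypothetical names, and the field names and status strings (`Status`, `ExistingJobStatus`, `NumberFilteredRows`, `Publish Timeout`, `Label Already Exists`) are assumed from the StarRocks Stream Load response format and should be checked against your cluster version.

```python
import json


def check_stream_load_response(body: str, max_filtered_rows: int = 0) -> dict:
    """Parse a Stream Load HTTP response body and raise if the load did not succeed.

    Field names and status strings are assumptions about the StarRocks
    response format; verify them against your StarRocks version.
    """
    result = json.loads(body)
    status = result.get("Status")

    if status == "Label Already Exists":
        # A retry re-sent a label that StarRocks has already seen. Treat it as
        # success only if the earlier job with this label actually finished.
        if result.get("ExistingJobStatus") == "FINISHED":
            return result
        raise RuntimeError(f"label reused but earlier job not finished: {result}")

    # "Publish Timeout" is assumed to mean the data was committed but may
    # become visible with a delay, so it is accepted here.
    if status not in ("Success", "Publish Timeout"):
        raise RuntimeError(f"Stream Load failed: {result.get('Message', status)}")

    filtered = int(result.get("NumberFilteredRows", 0))
    if filtered > max_filtered_rows:
        raise ValueError(f"{filtered} rows were filtered out by StarRocks")

    return result
```

An HTTP 200 alone is not enough here, because the load outcome is reported inside the JSON body; that is why the check reads `Status` rather than the HTTP status code.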

Quick Start

Ask the agent to generate an Airflow DAG that performs partition-aware Broker Load into a StarRocks table, polls until completion, then runs ANALYZE TABLE ... PARTITION (...) and basic row-count validation for the same date.
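
The label and polling steps of such a DAG can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the output the agent will produce: `build_label`, `wait_for_load`, and the `fetch_state` callable are hypothetical names. In an Airflow task, `fetch_state` would typically run a SHOW LOAD query through MySqlHook and return the job's State column. Restricting the label to letters, digits, and underscores is a conservative assumption.

```python
import re
import time
from typing import Callable


def build_label(table: str, ds: str, suffix: str = "") -> str:
    """Build a deterministic load label from the table name and logical date.

    The same inputs always give the same label, so a retried task reuses the
    label instead of submitting a second load for the same partition.
    """
    raw = "_".join(part for part in (table, ds, suffix) if part)
    # Conservative assumption: keep only letters, digits, and underscores.
    return re.sub(r"[^A-Za-z0-9_]", "_", raw)


def wait_for_load(
    fetch_state: Callable[[], str],
    timeout_s: float = 3600.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_state() until the load is FINISHED, CANCELLED, or times out."""
    deadline = clock() + timeout_s
    while True:
        state = fetch_state()
        if state == "FINISHED":
            return state
        if state == "CANCELLED":
            raise RuntimeError("Broker Load job was cancelled")
        if clock() >= deadline:
            raise TimeoutError(f"load still in state {state!r} after {timeout_s}s")
        sleep(interval_s)
```

Passing `fetch_state` in as a callable keeps the loop independent of Airflow, so it can be unit-tested with a fake before being wired into a task; the explicit timeout is what keeps a pipeline from waiting forever on a load that never leaves PENDING or LOADING.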

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: airflow-starrocks-pipeline
Download link: https://github.com/ivanshamaev/de-agent-skills/archive/main.zip#airflow-starrocks-pipeline

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
