openlineage

Name: openlineage
Author: ivanshamaev

Community

Track data lineage across your pipelines

Category: Software Engineering
Tags: dbt, data lineage, airflow, spark, openlineage, marquez, column lineage

Version: 1.0.0

Installs: 0

System Documentation

What problem does it solve?

This Skill helps you capture and visualize end-to-end data lineage so you can understand how datasets are produced, how columns flow through transformations, and what would be impacted by changing upstream data.

Core Features & Use Cases

OpenLineage-ready lineage events: Model and emit RunEvent/Job/Dataset relationships with correct START/COMPLETE/FAIL semantics.
Marquez backend setup & API workflows: Run a local/reference lineage backend and query namespaces, jobs, datasets, and lineage graphs for impact analysis.
Integration across Airflow, Spark, and dbt: Configure common emitters and enrich events to link parent orchestration runs to child execution runs.
Column-level lineage and facets: Attach schema facets, columnLineage mappings, SQL facets, and output statistics for fine-grained impact analysis and auditing.
Custom emitters: Build bespoke OpenLineage clients in Python to emit lineage when tooling does not provide automatic instrumentation.
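As a rough illustration of the event model and column-level facets listed above, the following sketch builds a START/COMPLETE pair of OpenLineage run events as plain dictionaries, using only the standard library. The producer URI, namespace, job and dataset names, column names, and the schema-version strings in the URLs are all assumptions made for the example; check the OpenLineage spec for the versions your backend expects. It is a minimal sketch, not the emitter this Skill generates.

```python
import json
import uuid
from datetime import datetime, timezone

# Assumed values for illustration only; none of these come from the Skill itself.
PRODUCER = "https://example.com/my-custom-emitter"
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"
COLUMN_LINEAGE_SCHEMA = (
    "https://openlineage.io/spec/facets/1-0-1/"
    "ColumnLineageDatasetFacet.json#/$defs/ColumnLineageDatasetFacet"
)


def dataset(namespace, name, facets=None):
    """Return an OpenLineage dataset reference as a plain dict."""
    return {"namespace": namespace, "name": name, "facets": facets or {}}


def run_event(event_type, run_id, job_name, inputs=(), outputs=(), namespace="demo"):
    """Build one run event; event_type is START, COMPLETE, or FAIL."""
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
        "run": {"runId": run_id},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": list(inputs),
        "outputs": list(outputs),
    }


# Column-level lineage: output column "total" is derived from input column "amount".
column_lineage = {
    "_producer": PRODUCER,
    "_schemaURL": COLUMN_LINEAGE_SCHEMA,
    "fields": {
        "total": {
            "inputFields": [
                {"namespace": "demo", "name": "raw.orders", "field": "amount"}
            ]
        }
    },
}

# START and COMPLETE must share the same runId so the backend sees one run.
run_id = str(uuid.uuid4())
source = dataset("demo", "raw.orders")
target = dataset("demo", "analytics.order_totals", {"columnLineage": column_lineage})

start = run_event("START", run_id, "build_order_totals", inputs=[source])
complete = run_event(
    "COMPLETE", run_id, "build_order_totals", inputs=[source], outputs=[target]
)

print(json.dumps(complete, indent=2))
```

Each dictionary can then be serialized and sent as the JSON body of an HTTP POST to the backend's lineage endpoint (for Marquez, typically /api/v1/lineage). On failure, a FAIL event with the same runId would be sent in place of COMPLETE.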

Quick Start

To use this Skill with a Spark or Airflow pipeline, install and configure the OpenLineage integration so that it sends lineage events to a Marquez backend, then query the lineage graph for the affected dataset.
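The final step, querying the lineage graph for impact analysis, might look like the sketch below. It assumes a Marquez API listening on http://localhost:5000 and a response whose "graph" list holds nodes with "id" and "outEdges" fields; the sample graph, namespace, and dataset names are made up for the example, and the helper names (lineage_url, fetch_lineage, downstream) are hypothetical, not part of any library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

MARQUEZ_URL = "http://localhost:5000"  # assumed local Marquez API address


def lineage_url(namespace, dataset, depth=5, base=MARQUEZ_URL):
    """Build the lineage query URL for one dataset node."""
    query = urlencode({"nodeId": f"dataset:{namespace}:{dataset}", "depth": depth})
    return f"{base}/api/v1/lineage?{query}"


def fetch_lineage(namespace, dataset, depth=5):
    """Fetch and parse the lineage response (requires a running Marquez)."""
    with urlopen(lineage_url(namespace, dataset, depth)) as resp:
        return json.load(resp)


def downstream(graph, start_id):
    """Follow outEdges from start_id and return every reachable node id."""
    edges = {
        node["id"]: [edge["destination"] for edge in node.get("outEdges", [])]
        for node in graph
    }
    seen, stack = set(), [start_id]
    while stack:
        for nxt in edges.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Made-up graph in the assumed response shape, so the sketch runs offline.
sample_graph = [
    {
        "id": "dataset:demo:raw.orders",
        "outEdges": [
            {
                "origin": "dataset:demo:raw.orders",
                "destination": "job:demo:build_order_totals",
            }
        ],
    },
    {
        "id": "job:demo:build_order_totals",
        "outEdges": [
            {
                "origin": "job:demo:build_order_totals",
                "destination": "dataset:demo:analytics.order_totals",
            }
        ],
    },
    {"id": "dataset:demo:analytics.order_totals", "outEdges": []},
]

impacted = downstream(sample_graph, "dataset:demo:raw.orders")
print(sorted(impacted))
```

Against a live backend, the sample graph would be replaced with fetch_lineage("demo", "raw.orders")["graph"]. The node ids returned are the jobs and datasets that a change to the upstream dataset could affect.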
