openlineage

Community

Track data lineage across your pipelines

Author: ivanshamaev
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill helps you capture and visualize end-to-end data lineage so you can understand how datasets are produced, how columns flow through transformations, and what would be impacted by changing upstream data.

Core Features & Use Cases

  • OpenLineage-ready lineage events: Model and emit RunEvent/Job/Dataset relationships with correct START/COMPLETE/FAIL semantics.
  • Marquez backend setup & API workflows: Run a local/reference lineage backend and query namespaces, jobs, datasets, and lineage graphs for impact analysis.
  • Integration across Airflow, Spark, and dbt: Configure common emitters and enrich events to link parent orchestration runs to child execution runs.
  • Column-level lineage and facets: Attach schema facets, columnLineage mappings, SQL facets, and output statistics for fine-grained impact analysis and auditing.
  • Custom emitters: Build bespoke OpenLineage clients in Python to emit lineage when tooling does not provide automatic instrumentation.
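The event model and column-level facet described in the list above can be sketched with the standard library alone. This is a minimal illustration, not the Skill's own code: it builds OpenLineage-style events as plain dictionaries and attaches a columnLineage facet to an output dataset. The producer URI, the spec version segments in the schema URLs, and the helper names (build_run_event, column_lineage_facet) are assumptions made for the example; check the schema URLs against the published spec before relying on them.

```python
import json
import uuid
from datetime import datetime, timezone

# Assumed values for illustration only; verify against the published OpenLineage spec.
PRODUCER = "https://example.com/my-custom-emitter"  # hypothetical producer URI
RUN_EVENT_SCHEMA = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"
COLUMN_LINEAGE_SCHEMA = (
    "https://openlineage.io/spec/facets/1-0-1/ColumnLineageDatasetFacet.json"
)

VALID_EVENT_TYPES = {"START", "COMPLETE", "FAIL"}


def column_lineage_facet(mapping):
    """Build a columnLineage facet.

    mapping: output column name -> list of (namespace, dataset, field) tuples
    naming the input columns it is derived from.
    """
    return {
        "_producer": PRODUCER,
        "_schemaURL": COLUMN_LINEAGE_SCHEMA,
        "fields": {
            column: {
                "inputFields": [
                    {"namespace": ns, "name": ds, "field": field}
                    for ns, ds, field in sources
                ]
            }
            for column, sources in mapping.items()
        },
    }


def build_run_event(event_type, run_id, job_namespace, job_name,
                    inputs=(), outputs=()):
    """Return one run event as a JSON-serializable dict."""
    if event_type not in VALID_EVENT_TYPES:
        raise ValueError(f"unsupported event type: {event_type!r}")
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": list(inputs),
        "outputs": list(outputs),
        "producer": PRODUCER,
        "schemaURL": RUN_EVENT_SCHEMA,
    }


# One run: the START and COMPLETE events share the same runId.
run_id = str(uuid.uuid4())
source = {"namespace": "warehouse", "name": "raw.orders"}
target = {
    "namespace": "warehouse",
    "name": "analytics.orders",
    "facets": {
        "columnLineage": column_lineage_facet(
            {"total": [("warehouse", "raw.orders", "amount")]}
        )
    },
}
start_event = build_run_event("START", run_id, "etl", "load_orders", inputs=[source])
complete_event = build_run_event(
    "COMPLETE", run_id, "etl", "load_orders", inputs=[source], outputs=[target]
)
payload = json.dumps(complete_event)  # body a custom emitter would send to the backend
```

Reusing one runId across START and COMPLETE (or FAIL) is what lets a backend stitch the events into a single run. A custom emitter would send each payload to the backend's lineage endpoint over HTTP.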

Quick Start

To use this Skill on a Spark or Airflow pipeline, install and configure the OpenLineage integration so that it sends lineage events to a Marquez backend, then query the lineage graph for the affected dataset.
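The final step, querying the lineage graph for a dataset, might look like the sketch below. It assumes a Marquez instance reachable at http://localhost:5000 and uses Marquez's lineage endpoint with a dataset node ID. The namespace and dataset names are placeholders, and the helper names (lineage_url, fetch_lineage) are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def lineage_url(base_url, namespace, dataset, depth=3):
    """Build the lineage query URL for one dataset node."""
    node_id = f"dataset:{namespace}:{dataset}"
    query = urllib.parse.urlencode({"nodeId": node_id, "depth": depth})
    return f"{base_url.rstrip('/')}/api/v1/lineage?{query}"


def fetch_lineage(base_url, namespace, dataset, depth=3):
    """Fetch the lineage graph as parsed JSON (requires a running backend)."""
    url = lineage_url(base_url, namespace, dataset, depth)
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


# Placeholder names; replace with the namespace and dataset you want to inspect.
url = lineage_url("http://localhost:5000/", "analytics", "public.orders")
print(url)
```

Calling fetch_lineage with the same arguments returns the graph around the dataset. Walking its edges downstream from the node shows which jobs and datasets a change would affect.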

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: openlineage
Download link: https://github.com/ivanshamaev/de-agent-skills/archive/main.zip#openlineage

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.