openlineage
Community
Track data lineage across your pipelines
Author: ivanshamaev
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill helps you capture and visualize end-to-end data lineage so you can understand how datasets are produced, how columns flow through transformations, and what would be impacted by changing upstream data.
Core Features & Use Cases
- OpenLineage-ready lineage events: Model jobs, runs, and datasets, and emit RunEvents with correct START/COMPLETE/FAIL semantics.
- Marquez backend setup & API workflows: Run Marquez, the OpenLineage reference backend, locally, and query its namespaces, jobs, datasets, and lineage graphs for impact analysis.
- Integration across Airflow, Spark, and dbt: Configure the common emitters for each tool and enrich events so that parent orchestration runs link to the child execution runs they trigger.
- Column-level lineage and facets: Attach schema facets, columnLineage mappings, SQL facets, and output statistics for fine-grained impact analysis and auditing.
- Custom emitters: Build bespoke OpenLineage clients in Python to emit lineage when tooling does not provide automatic instrumentation.
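The event model behind these features can be illustrated with a minimal custom emitter. This is a sketch that builds raw OpenLineage RunEvent JSON with only the standard library rather than the official `openlineage-python` client. It shows a START/COMPLETE pair sharing one `runId`, a parent facet linking the run to an orchestrator run, and a `columnLineage` facet on the output dataset. The producer URI, the spec and facet schema versions, the dataset and job names, and the `http://localhost:5000/api/v1/lineage` Marquez endpoint are all assumptions to adapt to your setup.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple
from urllib import request

# Hypothetical producer URI identifying this emitter; replace with your own.
PRODUCER = "https://example.com/custom-lineage-emitter"
# Spec/facet schema versions below are assumptions; pin them to the OpenLineage spec you target.
RUN_EVENT_SCHEMA = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"
PARENT_SCHEMA = "https://openlineage.io/spec/facets/1-0-0/ParentRunFacet.json#/$defs/ParentRunFacet"
COLUMN_LINEAGE_SCHEMA = ("https://openlineage.io/spec/facets/1-0-1/"
                         "ColumnLineageDatasetFacet.json#/$defs/ColumnLineageDatasetFacet")


def dataset(namespace: str, name: str, facets: Optional[dict] = None) -> dict:
    """A dataset reference as it appears in a RunEvent's inputs or outputs."""
    return {"namespace": namespace, "name": name, "facets": facets or {}}


def column_lineage(mapping: Dict[str, List[Tuple[str, str, str]]]) -> dict:
    """columnLineage facet: output column -> [(namespace, dataset, input column), ...]."""
    return {
        "_producer": PRODUCER,
        "_schemaURL": COLUMN_LINEAGE_SCHEMA,
        "fields": {
            out_col: {"inputFields": [{"namespace": ns, "name": ds, "field": col}
                                      for ns, ds, col in sources]}
            for out_col, sources in mapping.items()
        },
    }


def run_event(event_type: str, run_id: str, job_namespace: str, job_name: str,
              inputs: List[dict], outputs: List[dict],
              parent: Optional[dict] = None) -> dict:
    """One RunEvent; the START and the final COMPLETE/FAIL event must reuse run_id."""
    run_facets = {}
    if parent is not None:
        # Parent facet links this run to the orchestrating run (e.g. an Airflow task).
        run_facets["parent"] = {
            "_producer": PRODUCER,
            "_schemaURL": PARENT_SCHEMA,
            "run": {"runId": parent["runId"]},
            "job": {"namespace": parent["namespace"], "name": parent["name"]},
        }
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id, "facets": run_facets},
        "job": {"namespace": job_namespace, "name": job_name, "facets": {}},
        "inputs": inputs,
        "outputs": outputs,
        "producer": PRODUCER,
        "schemaURL": RUN_EVENT_SCHEMA,
    }


def emit(event: dict, url: str = "http://localhost:5000/api/v1/lineage") -> int:
    """POST one event as JSON; the URL assumes Marquez's default local API port."""
    body = json.dumps(event).encode("utf-8")
    req = request.Request(url, data=body, method="POST",
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return resp.status


# Usage: a START/COMPLETE pair for one run, with column lineage on the output.
run_id = str(uuid.uuid4())
orders = dataset("postgres://db:5432", "shop.public.orders")
revenue = dataset("postgres://db:5432", "shop.public.daily_revenue", {
    "columnLineage": column_lineage(
        {"revenue": [("postgres://db:5432", "shop.public.orders", "amount")]}),
})
parent = {"runId": str(uuid.uuid4()), "namespace": "airflow", "name": "daily_dag.build_revenue"}
start = run_event("START", run_id, "shop", "build_daily_revenue", [orders], [], parent)
complete = run_event("COMPLETE", run_id, "shop", "build_daily_revenue", [orders], [revenue], parent)
# emit(start); emit(complete)  # uncomment with a running Marquez instance
```

Reusing the same `runId` is what lets a backend stitch START and COMPLETE (or FAIL) into a single run. The parent facet is what connects, for example, a Spark job's run to the Airflow task that launched it.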
Quick Start
For a Spark or Airflow pipeline, use this skill to install and configure the OpenLineage integration so that it sends lineage events to a Marquez backend, then query the lineage graph for the affected dataset.
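The final Quick Start step, querying the lineage graph for impact analysis, might look like the following sketch. It assumes Marquez's `GET /api/v1/lineage` endpoint with `nodeId` (formatted as `dataset:<namespace>:<name>` or `job:<namespace>:<name>`) and `depth` query parameters. It also assumes a response shaped as `{"graph": [{"id": ..., "outEdges": [{"origin": ..., "destination": ...}]}]}`. Verify both against the API reference for your Marquez version. The `downstream` helper and the sample graph are hypothetical illustrations.

```python
import json
from collections import deque
from urllib import parse, request


def fetch_lineage(node_id: str, base_url: str = "http://localhost:5000", depth: int = 20) -> dict:
    """GET the lineage graph around node_id from a Marquez API (assumed local default URL)."""
    query = parse.urlencode({"nodeId": node_id, "depth": depth})
    with request.urlopen(f"{base_url}/api/v1/lineage?{query}") as resp:
        return json.load(resp)


def downstream(graph: dict, start_id: str) -> set:
    """Breadth-first walk over outEdges: every job/dataset id reachable from start_id."""
    nodes = {node["id"]: node for node in graph["graph"]}
    seen, queue = {start_id}, deque([start_id])
    while queue:
        current = queue.popleft()
        for edge in nodes.get(current, {}).get("outEdges", []):
            nxt = edge["destination"]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start_id}  # exclude the starting node itself


# Hypothetical graph in the assumed response shape: orders -> job -> daily_revenue.
sample = {"graph": [
    {"id": "dataset:warehouse:public.orders",
     "outEdges": [{"origin": "dataset:warehouse:public.orders",
                   "destination": "job:etl:build_daily_revenue"}]},
    {"id": "job:etl:build_daily_revenue",
     "outEdges": [{"origin": "job:etl:build_daily_revenue",
                   "destination": "dataset:warehouse:public.daily_revenue"}]},
    {"id": "dataset:warehouse:public.daily_revenue", "outEdges": []},
]}
print(sorted(downstream(sample, "dataset:warehouse:public.orders")))
# ['dataset:warehouse:public.daily_revenue', 'job:etl:build_daily_revenue']
```

Against a live backend, `downstream(fetch_lineage("dataset:<namespace>:<name>"), "dataset:<namespace>:<name>")` would list the jobs and datasets affected by a change to that dataset. Following only `outEdges` restricts the result to downstream impact, because the endpoint also returns upstream nodes.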
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: openlineage Download link: https://github.com/ivanshamaev/de-agent-skills/archive/main.zip#openlineage Please download this .zip file, extract it, and install it in the .claude/skills/ directory.