datahub-catalog

Community

Ingest metadata and connect data lineage in DataHub.

Author: ivanshamaev
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill sets up DataHub metadata ingestion so you can centralize dataset metadata, capture lineage, and make data engineering assets searchable and discoverable.

Core Features & Use Cases

  • Deploy and operate DataHub: Use the Docker Compose quickstart or a Kubernetes Helm deployment, including the Kafka, Elasticsearch, and MySQL components.
  • Ingest metadata from many sources: Generate and run ingestion recipes for PostgreSQL, Hive, Spark, dbt, Airflow, Kafka, and S3, with stateful ingestion and optional profiling.
  • Emit metadata programmatically with the Python SDK: Use DatahubRestEmitter and MetadataChangeProposalWrapper (MCP) to publish entity aspects such as schema, ownership, tags, glossary terms, and lineage.
  • Support lineage at dataset and column granularity: Attach table-level upstream lineage and FineGrainedLineage field-to-field mappings.
  • Discover assets: Use the DataHub UI, REST search APIs, or GraphQL lineage traversal to explore impact and dependencies.
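
The programmatic emission and lineage features listed above can be sketched roughly as follows. This is a minimal sketch, not the Skill's own code: it assumes the acryl-datahub package is installed and a GMS endpoint is reachable, and the helper names (dataset_urn, field_urn, emit_lineage) are hypothetical. The two URN helpers are plain string formatting that follows DataHub's dataset and schema-field URN layout, so they run without the SDK; the SDK is imported only inside emit_lineage.

```python
"""Sketch: table-level and column-level lineage for one downstream dataset."""


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # Dataset URN layout: data platform, fully qualified name, environment.
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def field_urn(dataset: str, field_path: str) -> str:
    # Schema-field URNs wrap the parent dataset URN plus the field path.
    return f"urn:li:schemaField:({dataset},{field_path})"


def emit_lineage(gms_server: str, upstream: str, downstream: str, column: str) -> None:
    # Imported lazily so the URN helpers above work without the SDK installed.
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        DatasetLineageTypeClass,
        FineGrainedLineageClass,
        FineGrainedLineageDownstreamTypeClass,
        FineGrainedLineageUpstreamTypeClass,
        UpstreamClass,
        UpstreamLineageClass,
    )

    lineage = UpstreamLineageClass(
        # Table-level edge: downstream is derived from upstream.
        upstreams=[
            UpstreamClass(dataset=upstream, type=DatasetLineageTypeClass.TRANSFORMED)
        ],
        # Column-level edge: the same-named column maps field to field.
        fineGrainedLineages=[
            FineGrainedLineageClass(
                upstreamType=FineGrainedLineageUpstreamTypeClass.FIELD_SET,
                upstreams=[field_urn(upstream, column)],
                downstreamType=FineGrainedLineageDownstreamTypeClass.FIELD,
                downstreams=[field_urn(downstream, column)],
            )
        ],
    )
    emitter = DatahubRestEmitter(gms_server=gms_server)
    emitter.emit(MetadataChangeProposalWrapper(entityUrn=downstream, aspect=lineage))
```

For example, with hypothetical tables, emit_lineage("http://localhost:8080", dataset_urn("postgres", "shop.public.orders"), dataset_urn("postgres", "shop.analytics.orders_clean"), "customer_id") would attach both edges to the downstream table. Emitting an upstreamLineage aspect replaces the one already stored for that dataset, so include every upstream you want to keep.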

Quick Start

Activate this skill and ask your agent to ingest metadata from a dbt project by preparing a dbt ingestion recipe and running datahub ingest against your DataHub GMS endpoint.
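
As an illustration of what such a dbt recipe might contain, here is a minimal sketch that builds the recipe as a Python dict and optionally runs it in-process. It assumes the acryl-datahub package with its dbt plugin is installed; the function names (build_dbt_recipe, run_recipe), the file paths in the usage note, and the default GMS address are hypothetical placeholders, not values taken from the Skill.

```python
"""Sketch: build a dbt ingestion recipe and run it against a GMS endpoint."""
from typing import Any, Dict


def build_dbt_recipe(
    manifest_path: str,
    catalog_path: str,
    target_platform: str,
    gms_server: str = "http://localhost:8080",  # assumed local GMS address
) -> Dict[str, Any]:
    # A recipe pairs one source (where metadata comes from)
    # with one sink (where it is sent).
    return {
        "source": {
            "type": "dbt",
            "config": {
                "manifest_path": manifest_path,      # dbt manifest.json
                "catalog_path": catalog_path,        # dbt catalog.json
                "target_platform": target_platform,  # warehouse dbt runs against
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": gms_server},
        },
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    # Imported lazily so the recipe can be built and inspected without the SDK.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()  # raise if the run reported failures
```

A call such as run_recipe(build_dbt_recipe("target/manifest.json", "target/catalog.json", "postgres")) would run the ingestion from Python. The same source and sink structure, written as a YAML file, is what datahub ingest -c recipe.yml consumes.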

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: datahub-catalog
Download link: https://github.com/ivanshamaev/de-agent-skills/archive/main.zip#datahub-catalog

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
