Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/OpenLineage/OpenLineage
An Open Standard for lineage metadata collection
https://github.com/OpenLineage/OpenLineage
Last synced: 2 months ago
JSON representation
An Open Standard for lineage metadata collection
- Host: GitHub
- URL: https://github.com/OpenLineage/OpenLineage
- Owner: OpenLineage
- License: apache-2.0
- Created: 2020-10-24T21:45:05.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2024-10-29T09:59:40.000Z (3 months ago)
- Last Synced: 2024-10-29T11:58:03.401Z (3 months ago)
- Language: Java
- Homepage: http://openlineage.io
- Size: 44.2 MB
- Stars: 1,750
- Watchers: 46
- Forks: 304
- Open Issues: 257
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Governance: GOVERNANCE.md
Awesome Lists containing this project
- awesome-data-catalogs - OpenLineage
- jimsghstars - OpenLineage/OpenLineage - An Open Standard for lineage metadata collection (Java)
README
## Badges
[![CircleCI](https://circleci.com/gh/OpenLineage/OpenLineage/tree/main.svg?style=shield)](https://circleci.com/gh/OpenLineage/OpenLineage/tree/main)
[![status](https://img.shields.io/badge/status-active-brightgreen.svg)](#status)
[![Slack](https://img.shields.io/badge/slack-chat-blue.svg)](http://bit.ly/OpenLineageSlack)
[![license](https://img.shields.io/badge/license-Apache_2.0-blue.svg)](https://github.com/OpenLineage/OpenLineage/blob/main/LICENSE)
[![maven](https://img.shields.io/maven-central/v/io.openlineage/openlineage-java.svg)](https://search.maven.org/search?q=g:io.openlineage)
[![CII Best Practices](https://bestpractices.coreinfrastructure.org/projects/4888/badge)](https://bestpractices.coreinfrastructure.org/projects/4888)## Overview
OpenLineage is an Open standard for metadata and lineage collection designed to instrument jobs as they are running.
It defines a generic model of run, job, and dataset entities identified using consistent naming strategies.
The core lineage model is extensible by defining specific facets to enrich those entities.## Status
OpenLineage is an [LF AI & Data Foundation](https://lfaidata.foundation/projects/openlineage) incubation project under active development, and we'd love your help!
## Problem
### Before
- Duplication of effort: each project has to instrument all jobs
- Integrations are external and can break with new versions![Before OpenLineage](doc/before-ol.svg)
### With OpenLineage
- Effort of integration is shared
- Integration can be pushed in each project: no need to play catch up![With OpenLineage](doc/with-ol.svg)
## Scope
OpenLineage defines the metadata for running jobs and the corresponding events.
A configurable backend allows the user to choose what protocol to send the events to.
![Scope](doc/scope.svg)## Core model
![Model](doc/datamodel.svg)
A facet is an atomic piece of metadata attached to one of the core entities.
See the spec for more details.## Spec
The [specification](spec/OpenLineage.md) is defined using OpenAPI and allows extension through custom facets.## Integration matrix
The OpenLineage repository contains integrations with several systems.
| Name| Table-level lineage| Column-level lineage |
| ----| ------------------ | -------------------- |
|[Apache Spark](https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark)| :white_check_mark: | :white_check_mark:1 |
|[Apache Airflow](https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow)| :white_check_mark: | :white_check_mark:2 |
|[Dagster](https://github.com/OpenLineage/OpenLineage/tree/main/integration/dagster)| :white_check_mark: | :x: |
|[dbt](https://github.com/OpenLineage/OpenLineage/tree/main/integration/dbt) |:white_check_mark: | :x: |
|[Flink](https://github.com/OpenLineage/OpenLineage/tree/main/integration/flink)|:white_check_mark: | :x: |1. Does not support `SELECT *` queries with JDBC.
2. Supports SQL-based operators other than BigQuery.## Related projects
- [Marquez](https://marquezproject.ai/): Marquez is an [LF AI & DATA](https://lfaidata.foundation/) project to collect, aggregate, and visualize a data ecosystem's metadata. It is the reference implementation of the OpenLineage API.
- [OpenLineage collection implementation](https://github.com/MarquezProject/marquez/blob/main/api/src/main/java/marquez/api/OpenLineageResource.java)
- [Egeria](https://egeria.odpi.org/): Egeria offers open metadata and governance for enterprises - automatically capturing, managing and exchanging metadata between tools and platforms, no matter the vendor.## Community
- Website: [openlineage.io](http://openlineage.io)
- Slack: [OpenLineage.slack.com](http://bit.ly/OpenLineageSlack)
- Not a member? Join [here](https://bit.ly/lineageslack).
- Twitter: [@OpenLineage](https://twitter.com/OpenLineage)
- Mailing list: [openlineage-tsc](https://lists.lfaidata.foundation/g/openlineage-tsc)
- Wiki: [OpenLineage+Home](https://wiki.lfaidata.foundation/display/OpenLineage/OpenLineage+Home)
- LinkedIn: [13927795](https://www.linkedin.com/groups/13927795/)
- YouTube: [channel](https://www.youtube.com/channel/UCRMLy4AaSw_ka-gNV9nl7VQ)
- Mastodon: [@[email protected]]([email protected])## Talks
- [Data+AI Summit June 2023. Cross-Platform Data Lineage with OpenLineage](https://www.databricks.com/dataaisummit/session/cross-platform-data-lineage-openlineage/)
- [Berlin Buzzwords June 2023. Column-Level Lineage is Coming to the Rescue](https://youtu.be/xFVSZCCbZlY)
- [Berlin Buzzwords June 2022. Cross-Platform Data Lineage with OpenLineage](https://www.youtube.com/watch?v=pLBVGIPuwEo)
- [Berlin Buzzwords June 2021. Observability for Data Pipelines with OpenLineage](https://2021.berlinbuzzwords.de/member/julien-le-dem)
- [Data Driven NYC February 2021. Data Observability and Pipelines: OpenLineage and Marquez](https://mattturck.com/datakin/)
- [Big Data Technology Warsaw Summit February 2021. Data lineage and Observability with Marquez and OpenLineage](https://bigdatatechwarsaw.eu/edition-2021/)
- [Metadata Day 2020. OpenLineage Lightning Talk](https://www.youtube.com/watch?v=anlV5Er_BpM)
- [Open Core Summit 2020. Observability for Data Pipelines: OpenLineage Project Launch](https://www.coss.community/coss/ocs-2020-breakout-julien-le-dem-3eh4)## Contributing
See [CONTRIBUTING.md](https://github.com/OpenLineage/OpenLineage/blob/main/CONTRIBUTING.md) for more details about how to contribute.
## Report a Vulnerability
If you discover a vulnerability in the project, please [open an issue](https://github.com/OpenLineage/OpenLineage/issues/new/choose) and attach the "security" label.
----
SPDX-License-Identifier: Apache-2.0\
Copyright 2018-2024 contributors to the OpenLineage project