https://github.com/mobiletelesystems/data-rentgen
NextGen DataLineage service
- Host: GitHub
- URL: https://github.com/mobiletelesystems/data-rentgen
- Owner: MobileTeleSystems
- License: apache-2.0
- Created: 2024-06-27T08:01:47.000Z (5 months ago)
- Default Branch: develop
- Last Pushed: 2024-11-11T13:38:49.000Z (9 days ago)
- Last Synced: 2024-11-11T14:33:13.868Z (9 days ago)
- Topics: kafka-consumer, lineage, openlineage, rest-api
- Language: Python
- Homepage: https://data-rentgen.readthedocs.io/
- Size: 946 KB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.rst
- Contributing: CONTRIBUTING.rst
- License: LICENSE.txt
- Security: SECURITY.rst
README
.. _readme:

|Logo|

.. |Logo| image:: docs/_static/logo_wide_white_text.svg
    :alt: Data.Rentgen logo
    :target: https://github.com/MobileTeleSystems/data-rentgen

|Repo Status| |PyPI| |PyPI License| |PyPI Python Version| |Docker image| |Documentation|
|Build Status| |Coverage| |pre-commit.ci|

.. |Repo Status| image:: https://www.repostatus.org/badges/latest/concept.svg
    :target: https://www.repostatus.org/#concept
.. |PyPI| image:: https://img.shields.io/pypi/v/data-rentgen
    :target: https://pypi.org/project/data-rentgen/
.. |PyPI License| image:: https://img.shields.io/pypi/l/data-rentgen.svg
    :target: https://github.com/MobileTeleSystems/data-rentgen/blob/develop/LICENSE.txt
.. |PyPI Python Version| image:: https://img.shields.io/pypi/pyversions/data-rentgen.svg
    :target: https://badge.fury.io/py/data-rentgen
.. |Docker image| image:: https://img.shields.io/docker/v/mtsrus/data-rentgen?sort=semver&label=docker
    :target: https://hub.docker.com/r/mtsrus/data-rentgen
.. |Documentation| image:: https://readthedocs.org/projects/data-rentgen/badge/?version=stable
    :target: https://data-rentgen.readthedocs.io/
.. |Build Status| image:: https://github.com/MobileTeleSystems/data-rentgen/workflows/Tests/badge.svg
    :target: https://github.com/MobileTeleSystems/data-rentgen/actions
.. |Coverage| image:: https://codecov.io/github/MobileTeleSystems/data-rentgen/graph/badge.svg?token=s0JztGZbq3
    :target: https://codecov.io/github/MobileTeleSystems/data-rentgen
.. |pre-commit.ci| image:: https://results.pre-commit.ci/badge/github/MobileTeleSystems/data-rentgen/develop.svg
    :target: https://results.pre-commit.ci/latest/github/MobileTeleSystems/data-rentgen/develop

What is Data.Rentgen?
---------------------

Data.Rentgen is a Data Motion Lineage service compatible with the OpenLineage specification.

**Note**: the service is under active development and is not yet ready for use.
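
For orientation, the sketch below shows roughly what an OpenLineage-style run event looks like when built as a plain Python dict. It is an illustration only, not Data.Rentgen code: the helper ``make_run_event``, the producer URL, and the dataset names are hypothetical, and the event is abbreviated (the real specification defines more fields, such as ``schemaURL`` and facets).

```python
import json
import uuid
from datetime import datetime, timezone


def make_run_event(event_type, job_namespace, job_name, inputs, outputs, run_id=None):
    """Build an abbreviated OpenLineage-style run event as a plain dict.

    Hypothetical helper for illustration; it is not part of Data.Rentgen
    or of any OpenLineage client library.
    """
    return {
        "eventType": event_type,  # e.g. "START", "RUNNING", "COMPLETE", "FAIL"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": "https://example.com/hypothetical-producer",  # assumed value
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        # Datasets are identified by a (namespace, name) pair.
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
    }


if __name__ == "__main__":
    event = make_run_event(
        "COMPLETE",
        job_namespace="spark",
        job_name="my_app.my_operation",
        inputs=[("hdfs://namenode", "/data/in")],
        outputs=[("postgres://db:5432", "public.out")],
    )
    print(json.dumps(event, indent=2))
```

Events of this kind carry a timestamp, a run, a job, and input and output datasets, which is the information a lineage service needs to relate runs to datasets over time.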

Goals
-----

* Collect lineage events produced by OpenLineage clients and integrations (Spark, Airflow).
* Support consuming large volumes of lineage events by using Kafka as an event buffer and storing data in tables partitioned by event timestamp.
* Store operation-grained events (instead of job-grained events, as in Marquez) for finer detail.
* Provide an API for building run ↔ dataset lineage, as well as parent run → child run lineage.
* Build lineage graphs within specific time boundaries (unlike Marquez, where lineage is built only for the last job run).
* Build lineage graphs at different granularity, e.g. merge all individual Spark operations into a Spark applicationId or Spark applicationName.

Non-goals
---------

* This is **not** a Data Catalog. Use Datahub or OpenMetadata instead.
* Static Data Lineage like view → table is not supported.
* Column-level lineage is currently collected by OpenLineage but not yet consumed by Data.Rentgen.

.. documentation

Documentation
-------------

See https://data-rentgen.readthedocs.io/