{"id":20710423,"url":"https://github.com/mobiletelesystems/data-rentgen","last_synced_at":"2026-01-26T10:15:22.526Z","repository":{"id":257362936,"uuid":"820804591","full_name":"MobileTeleSystems/data-rentgen","owner":"MobileTeleSystems","description":"NextGen DataMotion Lineage","archived":false,"fork":false,"pushed_at":"2026-01-20T14:15:40.000Z","size":17781,"stargazers_count":14,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"develop","last_synced_at":"2026-01-20T22:28:55.832Z","etag":null,"topics":["airflow","dbt","flink","hive","lineage","openlineage","rest-api","spark"],"latest_commit_sha":null,"homepage":"https://data-rentgen.readthedocs.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MobileTeleSystems.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.rst","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-06-27T08:01:47.000Z","updated_at":"2026-01-20T14:16:52.000Z","dependencies_parsed_at":"2024-10-28T14:42:57.470Z","dependency_job_id":"fc728ee0-10b9-4266-adc4-85634f710f5d","html_url":"https://github.com/MobileTeleSystems/data-rentgen","commit_stats":null,"previous_names":["mobiletelesystems/data-rentgen"],"tags_count":13,"template":false,"template_full_name":null,"purl":"pkg:github/MobileTeleSystems/data-rentgen","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MobileTeleSystems%2Fdata-rentgen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/MobileTeleSystems%2Fdata-rentgen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MobileTeleSystems%2Fdata-rentgen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MobileTeleSystems%2Fdata-rentgen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MobileTeleSystems","download_url":"https://codeload.github.com/MobileTeleSystems/data-rentgen/tar.gz/refs/heads/develop","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MobileTeleSystems%2Fdata-rentgen/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28774301,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-26T09:42:00.929Z","status":"ssl_error","status_checked_at":"2026-01-26T09:42:00.591Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","dbt","flink","hive","lineage","openlineage","rest-api","spark"],"created_at":"2024-11-17T02:11:54.571Z","updated_at":"2026-01-26T10:15:22.496Z","avatar_url":"https://github.com/MobileTeleSystems.png","language":"Python","readme":".. _readme:\n\n|Logo|\n\n.. 
|Logo| image:: docs/_static/logo_wide_white_text.svg\n    :alt: Data.Rentgen logo\n    :target: https://github.com/MobileTeleSystems/data-rentgen\n\n|Repo Status| |Docker image| |PyPI| |PyPI License| |PyPI Python Version| |Documentation|\n|Build Status| |Coverage| |pre-commit.ci|\n\n.. |Repo Status| image:: https://www.repostatus.org/badges/latest/wip.svg\n    :target: https://www.repostatus.org/#wip\n.. |Docker image| image:: https://img.shields.io/docker/v/mtsrus/data-rentgen?sort=semver\u0026label=docker\n    :target: https://hub.docker.com/r/mtsrus/data-rentgen\n.. |PyPI| image:: https://img.shields.io/pypi/v/data-rentgen\n    :target: https://pypi.org/project/data-rentgen/\n.. |PyPI License| image:: https://img.shields.io/pypi/l/data-rentgen.svg\n    :target: https://github.com/MobileTeleSystems/data-rentgen/blob/develop/LICENSE.txt\n.. |PyPI Python Version| image:: https://img.shields.io/pypi/pyversions/data-rentgen.svg\n    :target: https://badge.fury.io/py/data-rentgen\n.. |Documentation| image:: https://readthedocs.org/projects/data-rentgen/badge/?version=stable\n    :target: https://data-rentgen.readthedocs.io/\n.. |Build Status| image:: https://github.com/MobileTeleSystems/data-rentgen/workflows/Tests/badge.svg\n    :target: https://github.com/MobileTeleSystems/data-rentgen/actions\n.. |Coverage| image:: https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/\n    MTSOnGithub/03e73a82ecc4709934540ce8201cc3b4/raw/data-rentgen_badge.json\n    :target: https://github.com/MobileTeleSystems/data-rentgen/actions\n.. 
|pre-commit.ci| image:: https://results.pre-commit.ci/badge/github/MobileTeleSystems/data-rentgen/develop.svg\n    :target: https://results.pre-commit.ci/latest/github/MobileTeleSystems/data-rentgen/develop\n\nWhat is Data.Rentgen?\n---------------------\n\nData.Rentgen is a Data Motion Lineage service, compatible with the `OpenLineage \u003chttps://openlineage.io/\u003e`_ specification.\n\nCurrently we support consuming lineage from:\n\n* Apache Spark\n* Apache Airflow\n* Apache Hive\n* Apache Flink\n* dbt\n\n**Note**: the service is under active development, so it does not have a stable API yet.\n\nGoals\n-----\n\n* Collect lineage events produced by OpenLineage clients \u0026 integrations.\n* Store operation-grained events for finer detail.\n* Provide an API for fetching both job/run ↔ dataset lineage and dataset ↔ dataset lineage.\n\nFeatures\n--------\n\n* Support consuming large volumes of lineage events, using Apache Kafka as an event buffer.\n* Store data in tables partitioned by event timestamp, to speed up lineage graph resolution.\n* Lineage graph is built with user-specified time boundaries.\n* Lineage graph can be built with different granularity, e.g. by merging all individual Spark commands into Spark applicationId or Spark applicationName.\n* Column-level lineage support.\n* Authentication support.\n\nNon-goals\n---------\n\n* This is **not** a Data Catalog. DataRentgen doesn't track dataset schema changes, owners and so on. Use `Datahub \u003chttps://datahubproject.io/\u003e`_ or `OpenMetadata \u003chttps://open-metadata.org/\u003e`_ instead.\n* Static Data Lineage like view → table is not supported.\n\nLimitations\n-----------\n\n* OpenLineage has integrations with Trino, Debezium and some other lineage sources. DataRentgen support may be added later.\n* DataRentgen parses only a limited set of OpenLineage facets, and doesn't store custom facets. This may change in the future.\n\n.. 
documentation\n\nDocumentation\n-------------\n\nSee https://data-rentgen.readthedocs.io/\n\nScreenshots\n-----------\n\nLineage graph\n~~~~~~~~~~~~~\n\nDataset-level lineage graph\n\n.. image:: docs/entities/dataset_lineage.png\n    :alt: Dataset-level lineage graph\n\nDataset column-level lineage graph\n\n.. image:: docs/entities/dataset_column_lineage.png\n    :alt: Dataset column-level lineage graph\n\nJob-level lineage graph\n\n.. image:: docs/entities/job_lineage.png\n    :alt: Job-level lineage graph\n\nRun-level lineage graph\n\n.. image:: docs/entities/run_lineage.png\n    :alt: Run-level lineage graph\n\nDatasets\n~~~~~~~~\n\n.. image:: docs/entities/dataset_list.png\n    :alt: Datasets list\n\nRuns\n~~~~\n\n.. image:: docs/entities/run_list.png\n    :alt: Runs list\n\nSpark application\n~~~~~~~~~~~~~~~~~\n\n.. image:: docs/integrations/spark/job_details.png\n    :alt: Spark application details\n\nSpark run\n~~~~~~~~~\n\n.. image:: docs/integrations/spark/run_details.png\n    :alt: Spark run details\n\nSpark command\n~~~~~~~~~~~~~~~\n\n.. image:: docs/integrations/spark/operation_details.png\n    :alt: Spark command details\n\nHive query\n~~~~~~~~~~\n\n.. image:: docs/integrations/hive/operation_details.png\n    :alt: Hive query details\n\nAirflow DagRun\n~~~~~~~~~~~~~~~\n\n.. image:: docs/integrations/airflow/dag_run_details.png\n    :alt: Airflow DagRun details\n\nAirflow TaskInstance\n~~~~~~~~~~~~~~~~~~~~~\n\n.. image:: docs/integrations/airflow/task_run_details.png\n    :alt: Airflow TaskInstance details\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmobiletelesystems%2Fdata-rentgen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmobiletelesystems%2Fdata-rentgen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmobiletelesystems%2Fdata-rentgen/lists"}