{"id":19774015,"url":"https://github.com/badoo/exasol-data-lineage","last_synced_at":"2025-04-30T18:32:48.730Z","repository":{"id":66759189,"uuid":"244689903","full_name":"badoo/exasol-data-lineage","owner":"badoo","description":"Exasol data lineage scripts","archived":false,"fork":false,"pushed_at":"2021-07-26T14:25:41.000Z","size":23,"stargazers_count":7,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-06T03:51:05.578Z","etag":null,"topics":["data-lineage","exasol","exasol-db","lua"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/badoo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-03T16:44:50.000Z","updated_at":"2025-02-19T13:08:49.000Z","dependencies_parsed_at":"2023-09-09T17:30:57.435Z","dependency_job_id":null,"html_url":"https://github.com/badoo/exasol-data-lineage","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/badoo%2Fexasol-data-lineage","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/badoo%2Fexasol-data-lineage/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/badoo%2Fexasol-data-lineage/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/badoo%2Fexasol-data-lineage/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/badoo","download_url":"https://c
odeload.github.com/badoo/exasol-data-lineage/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251761432,"owners_count":21639616,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-lineage","exasol","exasol-db","lua"],"created_at":"2024-11-12T05:11:45.099Z","updated_at":"2025-04-30T18:32:48.723Z","avatar_url":"https://github.com/badoo.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Exasol Data Lineage\n\nAn Exasol script written in Lua that performs data lineage analysis.\n\n## How it works?\n\nThe script analyzes SQL without running it, using a built-in SQL parsing library. For each output column, it identifies a list of source columns.  \n\n## Features\n\n* determines the origin of output columns\n* multiple schemas\n* multiple source columns\n\n### Supported SQL constructs\n\n* CTE\n* UNION\n* FROM\n* JOINS\n* EMITS\n* Subqueries\n* LOCAL keyword\n* quoted identifiers\n* expression columns\n* table and column aliases\n\n## Limitations\n\n* the script doesn't check SQL syntax\n* only one statement at a time\n* ON and USING clauses and WHERE conditions are not analyzed yet\n\n## Installation\n\n1. Connect to the Exasol cluster\n2. Open the schema in which you want to install the scripts\n3. Execute the *.sql files from the scripts directory  \n\n## How to use?\n\nThe SQL_DATA_LINEAGE script expects 2 arguments:\n1. SQL statement. SELECT and CREATE VIEW statements are accepted.\n2. Current schema. 
If null value passed, script takes current schema from session.\n\n## Examples\n \n```sql\nEXECUTE SCRIPT FN.SQL_DATA_LINEAGE(\n    'CREATE OR REPLACE VIEW test_view AS SELECT * FROM users',\n    'TEST_DATA_LINEAGE'\n)\n```\n\nOutput\n\n```text\n+-------------+--------------------+--------------------+--------------------+----------+--------+------------------+\n| COLUMN_NAME | SOURCE_SCHEMA_NAME | SOURCE_OBJECT_NAME | SOURCE_COLUMN_NAME | FNAME    | IS_AGG | ORDINAL_POSITION |\n+-------------+--------------------+--------------------+--------------------+----------+--------+------------------+\n| USER_ID     | TEST_DATA_LINEAGE  | USERS              | USER_ID            | (null)   | false  | 1                |\n| NAME        | TEST_DATA_LINEAGE  | USERS              | NAME               | (null)   | false  | 2                |\n| REGISTERED  | TEST_DATA_LINEAGE  | USERS              | REGISTERED         | (null)   | false  | 3                |\n| STATUS      | TEST_DATA_LINEAGE  | USERS              | STATUS             | (null)   | false  | 4                |\n+-------------+--------------------+--------------------+--------------------+----------+--------+------------------+\n```\n\n```sql\nEXECUTE SCRIPT FN.SQL_DATA_LINEAGE(\n    '\n    WITH\n        users AS (\n            SELECT\n                  user_id\n                , name\n                , status AS status_id\n            FROM users\n            WHERE status != 3\n        ),\n\n        status AS (\n            SELECT\n                  id AS status_id\n                , name AS status_name\n            FROM dim_status\n        )\n\n    SELECT\n          a.*\n        , COALESCE(b.status_name, ''Unknown'') AS status_name\n    FROM users a\n    LEFT JOIN status b ON (a.status_id = b.status_id)\n    ',\n    'TEST_DATA_LINEAGE'\n)\n```\n\nOutput\n\n```text\n+-------------+--------------------+--------------------+--------------------+----------+--------+------------------+\n| COLUMN_NAME | 
SOURCE_SCHEMA_NAME | SOURCE_OBJECT_NAME | SOURCE_COLUMN_NAME | FNAME    | IS_AGG | ORDINAL_POSITION |\n+-------------+--------------------+--------------------+--------------------+----------+--------+------------------+\n| USER_ID     | TEST_DATA_LINEAGE  | USERS              | USER_ID            | (null)   | false  | 1                |\n| NAME        | TEST_DATA_LINEAGE  | USERS              | NAME               | (null)   | false  | 2                |\n| STATUS_ID   | TEST_DATA_LINEAGE  | USERS              | STATUS             | (null)   | false  | 3                |\n| STATUS_NAME | TEST_DATA_LINEAGE  | DIM_STATUS         | NAME               | COALESCE | false  | 4                |\n+-------------+--------------------+--------------------+--------------------+----------+--------+------------------+\n```\n\n## Running tests\n\n* install [PyEXASOL](https://github.com/badoo/pyexasol) driver\n* set Exasol credentials in tests/config.py\n\n```shell script\ncd tests/\npython -m unittest test_sql_data_lineage.py\n```\n\n## Authors\n\n* Dmitry Umarov \u003cd.umarov@team.bumble.com\u003e","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbadoo%2Fexasol-data-lineage","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbadoo%2Fexasol-data-lineage","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbadoo%2Fexasol-data-lineage/lists"}