https://github.com/badoo/exasol-data-lineage
Exasol data lineage scripts
- Host: GitHub
- URL: https://github.com/badoo/exasol-data-lineage
- Owner: badoo
- License: mit
- Created: 2020-03-03T16:44:50.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2021-07-26T14:25:41.000Z (over 4 years ago)
- Last Synced: 2025-04-06T03:51:05.578Z (10 months ago)
- Topics: data-lineage, exasol, exasol-db, lua
- Language: Python
- Size: 22.5 KB
- Stars: 7
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# Exasol Data Lineage
An Exasol script, written in Lua, that performs data lineage analysis.
## How it works
The script analyzes SQL without running it, using the built-in SQL parsing library. For each output column, it identifies the list of source columns that column is derived from.
## Features
* determines the origin of each output column
* multiple schemas
* multiple source columns
### Supported SQL constructs
* CTE
* UNION
* FROM
* JOINS
* EMITS
* Subqueries
* LOCAL keyword
* quoted identifiers
* expression columns
* table and column aliases
## Limitations
* the script doesn't check SQL syntax
* only one statement at a time
* ON and USING clauses and WHERE conditions are not analyzed yet
## Installation
1. Connect to the Exasol cluster
2. Open the schema in which you want to install the scripts
3. Execute the `*.sql` files from the `scripts` directory
## How to use
The `SQL_DATA_LINEAGE` script expects two arguments:
1. SQL statement. `SELECT` and `CREATE VIEW` statements are accepted.
2. Current schema. If a null value is passed, the script takes the current schema from the session.
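
Because the statement is passed as an SQL string literal, any single quotes inside it have to be doubled, as the second example below does with `''Unknown''`. The following is a minimal Python sketch of building the call text. The helper names `sql_literal` and `build_lineage_call` are hypothetical and not part of this repository, and the `FN` schema is taken from the examples, so it may differ in your installation.

```python
from typing import Optional


def sql_literal(value: Optional[str]) -> str:
    """Render a Python string as an SQL string literal, or NULL for None."""
    if value is None:
        return "NULL"
    # Single quotes inside the literal are escaped by doubling them.
    return "'" + value.replace("'", "''") + "'"


def build_lineage_call(statement: str, schema: Optional[str] = None,
                       script_schema: str = "FN") -> str:
    """Build the EXECUTE SCRIPT text for SQL_DATA_LINEAGE.

    script_schema is the schema the scripts were installed into; "FN" is
    an assumption copied from the README examples.
    """
    return "EXECUTE SCRIPT {}.SQL_DATA_LINEAGE({}, {})".format(
        script_schema, sql_literal(statement), sql_literal(schema)
    )


if __name__ == "__main__":
    # No schema given: the second argument is rendered as NULL.
    print(build_lineage_call(
        "SELECT COALESCE(name, 'Unknown') AS name FROM users"
    ))
```

The resulting string can then be sent through whichever client you use to connect to Exasol.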
## Examples
```sql
EXECUTE SCRIPT FN.SQL_DATA_LINEAGE(
'CREATE OR REPLACE VIEW test_view AS SELECT * FROM users',
'TEST_DATA_LINEAGE'
)
```
Output
```text
+-------------+--------------------+--------------------+--------------------+----------+--------+------------------+
| COLUMN_NAME | SOURCE_SCHEMA_NAME | SOURCE_OBJECT_NAME | SOURCE_COLUMN_NAME | FNAME | IS_AGG | ORDINAL_POSITION |
+-------------+--------------------+--------------------+--------------------+----------+--------+------------------+
| USER_ID | TEST_DATA_LINEAGE | USERS | USER_ID | (null) | false | 1 |
| NAME | TEST_DATA_LINEAGE | USERS | NAME | (null) | false | 2 |
| REGISTERED | TEST_DATA_LINEAGE | USERS | REGISTERED | (null) | false | 3 |
| STATUS | TEST_DATA_LINEAGE | USERS | STATUS | (null) | false | 4 |
+-------------+--------------------+--------------------+--------------------+----------+--------+------------------+
```
```sql
EXECUTE SCRIPT FN.SQL_DATA_LINEAGE(
    '
    WITH
    users AS (
        SELECT
            user_id
            , name
            , status AS status_id
        FROM users
        WHERE status != 3
    ),
    status AS (
        SELECT
            id AS status_id
            , name AS status_name
        FROM dim_status
    )
    SELECT
        a.*
        , COALESCE(b.status_name, ''Unknown'') AS status_name
    FROM users a
        LEFT JOIN status b ON (a.status_id = b.status_id)
    ',
    'TEST_DATA_LINEAGE'
)
```
Output
```text
+-------------+--------------------+--------------------+--------------------+----------+--------+------------------+
| COLUMN_NAME | SOURCE_SCHEMA_NAME | SOURCE_OBJECT_NAME | SOURCE_COLUMN_NAME | FNAME | IS_AGG | ORDINAL_POSITION |
+-------------+--------------------+--------------------+--------------------+----------+--------+------------------+
| USER_ID | TEST_DATA_LINEAGE | USERS | USER_ID | (null) | false | 1 |
| NAME | TEST_DATA_LINEAGE | USERS | NAME | (null) | false | 2 |
| STATUS_ID | TEST_DATA_LINEAGE | USERS | STATUS | (null) | false | 3 |
| STATUS_NAME | TEST_DATA_LINEAGE | DIM_STATUS | NAME | COALESCE | false | 4 |
+-------------+--------------------+--------------------+--------------------+----------+--------+------------------+
```
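
The result has one row per (output column, source column) pair, so an output column with several sources appears on several rows. As a rough sketch of how a client might consume it, the Python snippet below groups rows by output column. The function name `group_sources` and the tuple layout are assumptions for illustration: the tuples mirror the column order of the output above, and `EXAMPLE_ROWS` is copied from the second example.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (COLUMN_NAME, SOURCE_SCHEMA_NAME, SOURCE_OBJECT_NAME, SOURCE_COLUMN_NAME,
#  FNAME, IS_AGG, ORDINAL_POSITION), following the output shown above.
Row = Tuple[str, str, str, str, Optional[str], bool, int]

EXAMPLE_ROWS: List[Row] = [
    ("USER_ID", "TEST_DATA_LINEAGE", "USERS", "USER_ID", None, False, 1),
    ("NAME", "TEST_DATA_LINEAGE", "USERS", "NAME", None, False, 2),
    ("STATUS_ID", "TEST_DATA_LINEAGE", "USERS", "STATUS", None, False, 3),
    ("STATUS_NAME", "TEST_DATA_LINEAGE", "DIM_STATUS", "NAME", "COALESCE", False, 4),
]


def group_sources(rows: Iterable[Row]) -> Dict[str, List[str]]:
    """Map each output column to its fully qualified source columns."""
    lineage: Dict[str, List[str]] = {}
    # Sort by ORDINAL_POSITION so output columns keep their SELECT order.
    for column, schema, obj, source_column, _fname, _is_agg, _pos in sorted(
        rows, key=lambda row: row[6]
    ):
        lineage.setdefault(column, []).append(
            "{}.{}.{}".format(schema, obj, source_column)
        )
    return lineage


if __name__ == "__main__":
    for column, sources in group_sources(EXAMPLE_ROWS).items():
        print(column, "<-", ", ".join(sources))
```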
## Running tests
* install the [PyEXASOL](https://github.com/badoo/pyexasol) driver
* set Exasol credentials in `tests/config.py`
```shell
cd tests/
python -m unittest test_sql_data_lineage.py
```
## Authors
* Dmitry Umarov