https://github.com/soyhuce/etl_snowflake_lineage_purview
Send Snowflake lineage to the Purview Atlas API, derived from Snowflake object dependencies
- Host: GitHub
- URL: https://github.com/soyhuce/etl_snowflake_lineage_purview
- Owner: Soyhuce
- Created: 2024-09-05T08:33:01.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-10-29T14:24:25.000Z (3 months ago)
- Last Synced: 2024-11-24T20:50:09.777Z (about 2 months ago)
- Language: Python
- Size: 316 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Script ETL Snowflake Lineage to Purview
![Python 3.11](https://img.shields.io/badge/python-3.11-blue.svg)
Extracts Snowflake object dependencies from a Snowflake view, transforms the view rows into Apache Atlas Process objects, and loads those processes into the Purview Atlas API with a bulk create request.
## Run ETL Script
### Prerequisites
- The Snowflake view `PRD_RAW.PURVIEW.BLOB_AND_TABLE_MAPPING_VW` must exist in the Snowflake account, and the Snowflake user running the script must have read access to it.
- An Azure Active Directory Enterprise Application (identified by client_id and client_secret) must exist in Azure AD and must be granted the appropriate write access to the data domain in the Purview console.
### Expected environment variables
These environment variables are read and validated by *pydantic-settings*.
| Variable                 | Description                                                          |
| ------------------------ | -------------------------------------------------------------------- |
| PURVIEW_TENANT_ID | Purview account tenant id |
| PURVIEW_CLIENT_ID | Azure AD Enterprise Application client id for Purview API access |
| PURVIEW_CLIENT_SECRET | Azure AD Enterprise Application client secret for Purview API access |
| PURVIEW_ACCOUNT | Purview account name |
| SNOWFLAKE_USER | Snowflake user name used to query lineage view |
| SNOWFLAKE_PASSWORD | Snowflake user password |
| SNOWFLAKE_WAREHOUSE | Snowflake warehouse used to query the view |
| SNOWFLAKE_AUDIT_DATABASE | Snowflake view database |
| SNOWFLAKE_AUDIT_SCHEMA | Snowflake view schema |
### Run Script
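Every variable in the table above has to be set before the script starts. The project relies on *pydantic-settings* for this check. As a dependency-free illustration of the same fail-fast idea, the sketch below uses a plain dataclass instead. `EtlSettings` and `load_settings` are hypothetical names and are not taken from the repository.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass(frozen=True)
class EtlSettings:
    """Hypothetical settings holder; field names mirror the documented variables."""

    purview_tenant_id: str
    purview_client_id: str
    purview_client_secret: str
    purview_account: str
    snowflake_user: str
    snowflake_password: str
    snowflake_warehouse: str
    snowflake_audit_database: str
    snowflake_audit_schema: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> EtlSettings:
    """Build EtlSettings from the environment, failing if a variable is unset or empty."""
    env = os.environ if env is None else env
    names = [f.name for f in fields(EtlSettings)]
    # Environment variable names are the upper-cased field names.
    missing = [n.upper() for n in names if not env.get(n.upper())]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return EtlSettings(**{n: env[n.upper()] for n in names})
```

Calling `load_settings()` with no argument reads `os.environ`; passing a dictionary makes the check easy to exercise without touching the real environment.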
1. Install dependencies with `poetry install`.
2. Run the script with `poetry run python main.py`.
## ETL
### ETL Process
```mermaid
sequenceDiagram
participant snowflake as Snowflake
participant etl as ETL
participant ad as Azure AD
participant pur as Purview
etl ->> ad: login to Azure AD application using client id / client secret
activate ad
ad -->> etl: return token for Purview login
deactivate ad
etl ->> snowflake: create snowflake connection using snowflake user / password
activate snowflake
snowflake -->> etl: snowflake connection
etl ->> snowflake: query lineage view to extract snowflake objects dependencies
snowflake -->> etl: object dependencies rows
etl ->> snowflake: close connection
deactivate snowflake
etl ->> etl: validate result and transform rows as Atlas Entity
loop for each Snowflake dependency
etl ->> pur: check if Atlas Entities source and target exist via qualifiedName
pur -->> etl: entity found or not found
alt both entities exist
etl ->> etl: create Atlas Process
else one or both entities do not exist
etl ->> etl: skip without creating Atlas Process
end
end
etl ->> pur: send created atlas processes via bulk create
```
### Mapping Snowflake --> Apache Atlas
| Snowflake view column | Apache Atlas JSON key |
| --------------------- | --------------------- |
| *_DATABASE | qualified_name |
| *_SCHEMA | qualified_name |
| *_OBJECT_NAME | name & qualified_name |
| *_OBJECT_DOMAIN | typeName |
| DEPENDENCY_TYPE | NOT_USED |
| STAGE_URL | qualified_name |
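
The mapping above, combined with the existence check from the ETL process, can be sketched roughly as follows. This is an illustration under several assumptions rather than the repository's actual code: the `SOURCE` and `TARGET` column prefixes, the `DB/SCHEMA/NAME` qualified name layout, and all function names are hypothetical, and `STAGE_URL` handling is left out. The real qualified names must match whatever Purview has already catalogued. The `entity_exists` callable stands in for the lookup against Purview.

```python
from typing import Callable, Dict, Iterable, List

# ASSUMPTION: the view exposes SOURCE_* and TARGET_* column groups.
SOURCE, TARGET = "SOURCE", "TARGET"


def qualified_name(row: Dict[str, str], prefix: str) -> str:
    """Placeholder layout; the real format depends on how Purview names the assets."""
    return "/".join(
        row[f"{prefix}_{part}"] for part in ("DATABASE", "SCHEMA", "OBJECT_NAME")
    )


def to_entity(row: Dict[str, str], prefix: str) -> dict:
    """Map one side of a dependency row to a minimal Atlas entity dictionary."""
    return {
        "typeName": row[f"{prefix}_OBJECT_DOMAIN"],
        "attributes": {
            "name": row[f"{prefix}_OBJECT_NAME"],
            "qualifiedName": qualified_name(row, prefix),
        },
    }


def to_ref(entity: dict) -> dict:
    """Reference an existing entity by type and qualifiedName."""
    return {
        "typeName": entity["typeName"],
        "uniqueAttributes": {"qualifiedName": entity["attributes"]["qualifiedName"]},
    }


def build_processes(
    rows: Iterable[Dict[str, str]], entity_exists: Callable[[str], bool]
) -> List[dict]:
    """Create one Atlas Process per row whose source and target both exist."""
    processes = []
    for row in rows:
        src, dst = to_entity(row, SOURCE), to_entity(row, TARGET)
        src_qn = src["attributes"]["qualifiedName"]
        dst_qn = dst["attributes"]["qualifiedName"]
        if not (entity_exists(src_qn) and entity_exists(dst_qn)):
            continue  # one or both entities are missing: skip this dependency
        processes.append(
            {
                "typeName": "Process",
                "attributes": {
                    "name": f"{src['attributes']['name']} -> {dst['attributes']['name']}",
                    "qualifiedName": f"{src_qn}->{dst_qn}",
                    "inputs": [to_ref(src)],
                    "outputs": [to_ref(dst)],
                },
            }
        )
    return processes


def bulk_payload(processes: List[dict]) -> dict:
    """Wrap the processes in the body shape used for an Atlas bulk entity request."""
    return {"entities": processes}
```

Injecting `entity_exists` keeps the transformation free of network calls, so it can be checked against a plain set of known qualified names before any request is sent to Purview.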