https://github.com/tokern/data-lineage
Generate and Visualize Data Lineage from query history
data-governance data-lineage jupyter postgresql python
- Host: GitHub
- URL: https://github.com/tokern/data-lineage
- Owner: tokern
- License: mit
- Created: 2020-03-17T04:55:39.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2023-08-04T07:24:15.000Z (over 1 year ago)
- Last Synced: 2024-11-20T05:53:24.039Z (about 2 months ago)
- Topics: data-governance, data-lineage, jupyter, postgresql, python
- Language: Python
- Homepage: https://tokern.io/data-lineage/
- Size: 2.46 MB
- Stars: 311
- Watchers: 9
- Forks: 46
- Open Issues: 32
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - tokern/data-lineage - Generate and Visualize Data Lineage from query history (Python)
README
# Tokern Lineage Engine
[![CircleCI](https://circleci.com/gh/tokern/data-lineage.svg?style=svg)](https://circleci.com/gh/tokern/data-lineage)
[![codecov](https://codecov.io/gh/tokern/data-lineage/branch/master/graph/badge.svg)](https://codecov.io/gh/tokern/data-lineage)
[![PyPI](https://img.shields.io/pypi/v/data-lineage.svg)](https://pypi.python.org/pypi/data-lineage)
[![image](https://img.shields.io/pypi/l/data-lineage.svg)](https://pypi.org/project/data-lineage/)
[![image](https://img.shields.io/pypi/pyversions/data-lineage.svg)](https://pypi.org/project/data-lineage/)

Tokern Lineage Engine is a _fast_ and _easy-to-use_ application to collect, visualize and analyze column-level data lineage in databases, data warehouses and data lakes in AWS and RDS.

Tokern Lineage helps you:
* browse column-level data lineage visually using [kedro-viz](https://github.com/quantumblacklabs/kedro-viz)
* analyze lineage graphs programmatically using the powerful [networkx graph library](https://networkx.org/)

## Resources
* Demo of Tokern Lineage App
![data-lineage](https://user-images.githubusercontent.com/1638298/118261607-688a7100-b4d1-11eb-923a-5d2407d6bd8d.gif)
* Check out an [example data lineage notebook](http://tokern.io/docs/data-lineage/example/).
* Check out [the post on using data lineage for cost control](https://tokern.io/blog/data-lineage-on-redshift/) for an example of how data lineage can be used in production.

## Quick Start
### Install a demo using Docker and Docker Compose
Download the docker-compose file from the GitHub repository:
```shell
# in a new directory run
wget https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/catalog-demo.yml
# or run
curl https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/catalog-demo.yml -o catalog-demo.yml
```

Run docker-compose:

```shell
docker-compose -f catalog-demo.yml up -d
```
Check that the containers are running:

```shell
docker ps
```

```
CONTAINER ID   IMAGE                            COMMAND   CREATED       STATUS       PORTS                  NAMES
3f4e77845b81   tokern/data-lineage-viz:latest   ...       4 hours ago   Up 4 hours   0.0.0.0:8000->80/tcp   tokern-data-lineage-visualizer
1e1ce4efd792   tokern/data-lineage:latest       ...       5 days ago    Up 5 days                           tokern-data-lineage
38be15bedd39   tokern/demodb:latest             ...       2 weeks ago   Up 2 weeks                          tokern-demodb
```

### Try out Tokern Lineage App
Head to `http://localhost:8000/` to open the Tokern Lineage app.
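If you set up the demo repeatedly, the `docker ps` check from the demo steps can be scripted. The sketch below is a hypothetical helper, not part of data-lineage: `check_lineage_containers` is an illustrative name, and the expected container names are copied from the sample `docker ps` listing, so adjust them if your compose file names the containers differently. It relies only on `docker ps --format '{{.Names}}'`, POSIX `sh` and `grep`.

```shell
# Hypothetical helper (not shipped with data-lineage): report which demo
# containers are missing. Reads one running container name per line on stdin.
# Expected names are taken from the sample `docker ps` output in this README.
check_lineage_containers() {
  running=$(cat)
  missing=0
  for name in tokern-data-lineage-visualizer tokern-data-lineage tokern-demodb; do
    # -x: match the whole line, -F: treat the name as a fixed string
    if ! printf '%s\n' "$running" | grep -qxF "$name"; then
      echo "missing: $name"
      missing=1
    fi
  done
  # 0 when every expected container is running, 1 otherwise
  return "$missing"
}

# Usage:
#   docker ps --format '{{.Names}}' | check_lineage_containers && echo "demo is up"
```

Matching whole lines with `grep -x` keeps `tokern-data-lineage` from being counted as running just because `tokern-data-lineage-visualizer` is.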
### Install Tokern Lineage Engine
```shell
# in a new directory run
wget https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/tokern-lineage-engine.yml
# or run
curl https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/tokern-lineage-engine.yml -o tokern-lineage-engine.yml
```

Run docker-compose:

```shell
docker-compose -f tokern-lineage-engine.yml up -d
```
If you want to use an external Postgres database, change the following parameters in `tokern-lineage-engine.yml`:
* CATALOG_HOST
* CATALOG_USER
* CATALOG_PASSWORD
* CATALOG_DB

You can also override the default values using environment variables:

```shell
CATALOG_HOST=... CATALOG_USER=... CATALOG_PASSWORD=... CATALOG_DB=... docker-compose -f ... up -d
```

For more advanced usage of environment variables with docker-compose, [refer to the docker-compose docs](https://docs.docker.com/compose/environment-variables/).
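If you override the catalog settings often, docker-compose can also read variables from a `.env` file in the project directory instead of the command line. This sketch assumes that `tokern-lineage-engine.yml` references the values through `${CATALOG_HOST}`-style substitution, which the override described above implies but this README does not show; the helper name `write_catalog_env` and the example values in the usage comment are hypothetical.

```shell
# Hypothetical helper (not part of data-lineage): write catalog overrides to
# DIR/.env. ASSUMPTION: tokern-lineage-engine.yml uses ${CATALOG_*} substitution.
write_catalog_env() {
  if [ "$#" -ne 5 ]; then
    echo "usage: write_catalog_env DIR HOST USER PASSWORD DB" >&2
    return 2
  fi
  # Subshell keeps the umask change local; a newly created .env holds a
  # password, so make it readable only by the current user.
  (
    umask 077
    printf 'CATALOG_HOST=%s\nCATALOG_USER=%s\nCATALOG_PASSWORD=%s\nCATALOG_DB=%s\n' \
      "$2" "$3" "$4" "$5" > "$1/.env"
  )
}

# Usage (placeholder values), from the directory holding the compose file:
#   write_catalog_env . db.example.com lineage change-me tokern
#   docker-compose -f tokern-lineage-engine.yml up -d
```

Variables already set in your shell take precedence over values in `.env`, so a one-off override on the command line still wins when both are present.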
**Pro-tip**
If you want to connect to a database running on the host machine, set:

```yaml
CATALOG_HOST: host.docker.internal # For Mac or Windows
#OR
CATALOG_HOST: 172.17.0.1 # Linux
```

## Supported Technologies
* Postgres
* AWS Redshift
* Snowflake

### Coming Soon
* SparkSQL
* Presto

## Documentation
For advanced usage, please refer to [data-lineage documentation](https://tokern.io/docs/data-lineage/index.html)
## Survey
Please take this [survey](https://forms.gle/p2oEQBJnpEguhrp3A) if you are a user or are considering using data-lineage. Your responses will help us prioritize features.