https://github.com/getindata/tpc-h-data-pipelines-demo
- Host: GitHub
- URL: https://github.com/getindata/tpc-h-data-pipelines-demo
- Owner: getindata
- Created: 2022-06-03T08:45:31.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2022-07-07T18:02:01.000Z (almost 4 years ago)
- Last Synced: 2025-01-24T02:31:01.841Z (over 1 year ago)
- Language: Dockerfile
- Size: 1.2 MB
- Stars: 2
- Watchers: 6
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# TPC-H Data Pipelines Project
## Description
This is an example of a [Data Pipelines](https://data-pipelines-cli.readthedocs.io/en/latest/index.html) project whose data pipelines simulate the operations of a fictional company.
Its purpose is to show what an advanced [Data Pipelines](https://data-pipelines-cli.readthedocs.io/en/latest/index.html) project can look like, so you can use it as a reference
when you work on your own [Data Pipelines](https://data-pipelines-cli.readthedocs.io/en/latest/index.html) project.
Browse the repository to see what `models`, `tests` and `seeds` look like, which should make implementing them
in your own project easier.
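In dbt, a test is a query that returns the rows that fail a check; the test passes when no rows come back. The built-in `unique` and `not_null` generic tests are typical examples. The Python sketch below only mimics what those two tests check, using a few hypothetical rows shaped like the TPC-H `orders` table. It is an illustration, not code from this repository: dbt itself runs these checks as SQL against the warehouse, and the helper names here are made up.

```python
from collections import Counter
from typing import Any, Iterable, List, Mapping


def not_null_failures(rows: Iterable[Mapping[str, Any]], column: str) -> List[Mapping[str, Any]]:
    """Hypothetical helper: return rows where `column` is missing or None (what `not_null` flags)."""
    return [row for row in rows if row.get(column) is None]


def unique_failures(rows: Iterable[Mapping[str, Any]], column: str) -> List[Any]:
    """Hypothetical helper: return values of `column` seen more than once, ignoring nulls (what `unique` flags)."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return sorted(value for value, n in counts.items() if n > 1)


# Made-up rows using TPC-H `orders` column names; key 2 is duplicated and one key is null.
orders = [
    {"o_orderkey": 1, "o_custkey": 370},
    {"o_orderkey": 2, "o_custkey": 781},
    {"o_orderkey": 2, "o_custkey": 1234},
    {"o_orderkey": None, "o_custkey": 1369},
]

print(unique_failures(orders, "o_orderkey"))         # [2]
print(len(not_null_failures(orders, "o_orderkey")))  # 1
```

Like dbt's `unique` test, the sketch skips null values when looking for duplicates, leaving nulls to the `not_null` check.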
### If you want to:
* see a simple example of a template for a [Data Pipelines](https://data-pipelines-cli.readthedocs.io/en/latest/index.html) project
* walk through a demo of [Data Pipelines](https://data-pipelines-cli.readthedocs.io/en/latest/index.html)
* see an explanation of how to use [Data Pipelines](https://data-pipelines-cli.readthedocs.io/en/latest/index.html)

check the [first-steps-with-data-pipelines](https://github.com/getindata/first-steps-with-data-pipelines) repository.
## Data used in this project example
We used data from [TPC-H](https://www.tpc.org/tpch/), a benchmark whose generated data sets are used to evaluate
decision support systems. This project simulates the data and processes of a fictional company. The data is stored in a
[GCP BigQuery](https://cloud.google.com/bigquery) dataset.
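Several TPC-H queries compute revenue from the `lineitem` table as `l_extendedprice * (1 - l_discount)`. A model in a project like this one would express such an aggregation in SQL; the Python sketch below only illustrates the arithmetic on hypothetical rows. It is not taken from this repository's models, and the function name is made up for illustration.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Any, Dict, Iterable, Mapping


def revenue_per_order(lineitems: Iterable[Mapping[str, Any]]) -> Dict[int, Decimal]:
    """Hypothetical helper: sum l_extendedprice * (1 - l_discount) per l_orderkey."""
    totals: Dict[int, Decimal] = defaultdict(Decimal)  # Decimal() starts at 0
    for item in lineitems:
        totals[item["l_orderkey"]] += item["l_extendedprice"] * (1 - item["l_discount"])
    return dict(totals)


# Made-up line items using TPC-H `lineitem` column names; Decimal avoids float rounding.
lineitem = [
    {"l_orderkey": 1, "l_extendedprice": Decimal("1000.00"), "l_discount": Decimal("0.05")},
    {"l_orderkey": 1, "l_extendedprice": Decimal("200.00"), "l_discount": Decimal("0.10")},
    {"l_orderkey": 2, "l_extendedprice": Decimal("500.00"), "l_discount": Decimal("0.00")},
]

print(revenue_per_order(lineitem))
```

Order 1 yields 950 + 180 = 1130 and order 2 yields 500, since it has no discount.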
### Resources:
- More about [data-pipelines-cli](https://data-pipelines-cli.readthedocs.io/en/latest/usage.html#)
- More about dbt [in the docs](https://docs.getdbt.com/docs/introduction)
- [Discourse](https://discourse.getdbt.com/) for commonly asked questions and answers about `dbt`
- Rendering project templates with [Copier](https://copier.readthedocs.io/en/stable/)
- Data pipelines orchestration with [Airflow](https://airflow.apache.org/)
- More about [TPC-H](https://www.tpc.org/tpch/)