https://github.com/getindata/tpc-h-data-pipelines-demo



# TPC-H Data Pipelines Project

## Description

This is an example [Data Pipelines](https://data-pipelines-cli.readthedocs.io/en/latest/index.html) project whose pipelines simulate the operations of a fictional company.
Its purpose is to show what an advanced [Data Pipelines](https://data-pipelines-cli.readthedocs.io/en/latest/index.html) project can look like, so you can use it as a reference
while working on your own [Data Pipelines](https://data-pipelines-cli.readthedocs.io/en/latest/index.html) project.
Browse the repository to see what the ```models```, ```tests``` and ```seeds``` look like, which should make implementing them
in your own project easier.

### If you want to:

* see a simple template for a [Data Pipelines](https://data-pipelines-cli.readthedocs.io/en/latest/index.html) project
* walk through a demo of [Data Pipelines](https://data-pipelines-cli.readthedocs.io/en/latest/index.html)
* see an explanation of how to use [Data Pipelines](https://data-pipelines-cli.readthedocs.io/en/latest/index.html)

see the [first-steps-with-data-pipelines](https://github.com/getindata/first-steps-with-data-pipelines) repository.

## Data used in this project example

We used data from [TPC-H](https://www.tpc.org/tpch/), a collection of data sets used for benchmarking
decision support systems. This project uses that data to simulate the data and processes of a fictional company. We loaded the data into a
[GCP BigQuery](https://cloud.google.com/bigquery) dataset.
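
To give a feel for the shape of TPC-H data, here is a minimal sketch that joins two TPC-H-style tables and computes revenue per order. It is only an illustration, not code from this repository: an in-memory SQLite database stands in for the BigQuery dataset, the tables carry only a few of the TPC-H columns, the sample rows are made up, and the `revenue_per_order` helper is hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the BigQuery dataset.
conn = sqlite3.connect(":memory:")

# Simplified subset of the TPC-H `orders` and `lineitem` columns.
conn.executescript(
    """
    CREATE TABLE orders (
        o_orderkey  INTEGER PRIMARY KEY,
        o_custkey   INTEGER,
        o_orderdate TEXT
    );
    CREATE TABLE lineitem (
        l_orderkey      INTEGER,
        l_linenumber    INTEGER,
        l_extendedprice REAL,
        l_discount      REAL
    );
    """
)

# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, "1995-01-15"), (2, 11, "1995-02-03")],
)
conn.executemany(
    "INSERT INTO lineitem VALUES (?, ?, ?, ?)",
    [(1, 1, 100.0, 0.5), (1, 2, 200.0, 0.0), (2, 1, 400.0, 0.25)],
)


def revenue_per_order(connection):
    """Return (order key, discounted revenue) pairs, ordered by order key."""
    query = """
        SELECT o.o_orderkey,
               SUM(l.l_extendedprice * (1 - l.l_discount)) AS revenue
        FROM orders AS o
        JOIN lineitem AS l ON l.l_orderkey = o.o_orderkey
        GROUP BY o.o_orderkey
        ORDER BY o.o_orderkey
    """
    return connection.execute(query).fetchall()


print(revenue_per_order(conn))  # [(1, 250.0), (2, 300.0)]
```

In a dbt-based project, an aggregation like this would typically live in a SQL model rather than in Python; the sketch only shows the kind of relationship the data contains.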

### Resources:

- More about [data-pipelines-cli](https://data-pipelines-cli.readthedocs.io/en/latest/usage.html#)
- More about dbt [in the docs](https://docs.getdbt.com/docs/introduction)
- [Discourse](https://discourse.getdbt.com/) for commonly asked questions and answers about `dbt`
- Rendering project templates with [Copier](https://copier.readthedocs.io/en/stable/)
- Data pipelines orchestration with [Airflow](https://airflow.apache.org/)
- More about [TPC-H](https://www.tpc.org/tpch/)