Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/xpcosmos/jaffle-shop
Modern Data Stack with DBT, PySpark, PostgreSQL and Docker
dbt docker docker-compose pyspark python spark
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/xpcosmos/jaffle-shop
- Owner: xpcosmos
- Created: 2024-07-25T22:30:20.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-08-03T20:02:19.000Z (3 months ago)
- Last Synced: 2024-09-29T07:01:40.805Z (about 2 months ago)
- Topics: dbt, docker, docker-compose, pyspark, python, spark
- Language: Python
- Homepage:
- Size: 6.29 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# ETL flow with Spark, Postgres, Docker and dbt
## Overview
This dbt project transforms raw data into a clean, structured format for analysis. It uses dbt to build data models, run tests, and generate documentation. The project was motivated by a course offered by dbt, and the completion badge can be viewed by [clicking here](https://api.accredible.com/v1/auth/invite?code=ca670980af5f4d0e59f1&credential_id=512ed5f1-176e-489b-ba28-6c41001e8e45&url=https%3A%2F%2Fcredentials.getdbt.com%2F512ed5f1-176e-489b-ba28-6c41001e8e45&ident=639e5f9b-990e-4d90-94a4-dfec9ed7555b/).
The project materializes the transformation lineage shown below:
![lineage](src/lineage.png)
## Requirements
- **DBT version**: 1.8 or higher
- **Python version**: 3.11 or higher
- **Supported databases:** PostgreSQL

## Setup
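Before starting, you can check your local toolchain against the requirements listed above. This is a minimal sketch, not part of the repository: `version_ge` is a hypothetical helper that compares dotted version strings with `sort -V` (assumed available, as in GNU coreutils), and the dbt check assumes dbt was installed with pip into the same interpreter that `python3` points to.

```bash
#!/usr/bin/env bash
# Hypothetical prerequisite check (sketch, not shipped with the repository).

# version_ge CURRENT REQUIRED -> succeeds when CURRENT >= REQUIRED.
# Uses `sort -V` (version sort): if REQUIRED sorts first, CURRENT is at least REQUIRED.
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Python: ask the interpreter for its own version string, e.g. "3.11.9".
py_version="$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null)"
if [ -n "$py_version" ] && version_ge "$py_version" "3.11"; then
  echo "Python $py_version OK"
else
  echo "Python 3.11 or higher is required (found: ${py_version:-none})"
fi

# dbt: read the installed dbt-core version from Python package metadata, if present.
dbt_version="$(python3 -c 'from importlib.metadata import version; print(version("dbt-core"))' 2>/dev/null)"
if [ -n "$dbt_version" ] && version_ge "$dbt_version" "1.8"; then
  echo "dbt-core $dbt_version OK"
else
  echo "dbt-core 1.8 or higher is required (found: ${dbt_version:-none})"
fi
```

The script only reports what it finds and never exits with an error, so it is safe to run before anything is installed.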
### DBT
1. **Install DBT**

   Follow the [dbt Core installation guide](https://docs.getdbt.com/docs/core/installation-overview). Because this project targets PostgreSQL, install the `dbt-postgres` adapter alongside `dbt-core`.

2. **Clone the Repository**

   ```bash
   git clone https://github.com/xpcosmos/jaffle-shop.git
   cd jaffle-shop
   ```

3. **Configure DBT Profile**

   Edit `profiles.yml` with your database connection details, and keep sensitive values such as passwords in environment variables rather than in the file.
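A profile that reads its credentials from the environment could look like the sketch below. Everything in it is an assumption, not taken from the repository: the profile name `jaffle_shop` (it must match the `profile:` key in `dbt_project.yml`), the `POSTGRES_*` variable names, and Postgres being published on `localhost:5432`. The script refuses to overwrite an existing `profiles.yml`.

```bash
#!/usr/bin/env bash
# Sketch only: writes a dbt profile for the local Postgres container.
# Hypothetical values, check them against the repository before use:
#   - "jaffle_shop" must match the `profile:` key in dbt_project.yml.
#   - Postgres is assumed to be published on localhost:5432 by docker compose.
#   - POSTGRES_* variable names are illustrative; dbt resolves them through
#     its env_var() Jinja function when it parses the profile.
if [ -e profiles.yml ]; then
  # Never overwrite an existing profile (the repository may ship one).
  echo "profiles.yml already exists; edit it instead" >&2
else
  # Quoting 'EOF' keeps the shell from expanding anything in the template.
  cat > profiles.yml <<'EOF'
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: postgres
      host: "{{ env_var('POSTGRES_HOST', 'localhost') }}"
      port: 5432
      user: "{{ env_var('POSTGRES_USER') }}"
      password: "{{ env_var('POSTGRES_PASSWORD') }}"
      dbname: "{{ env_var('POSTGRES_DB') }}"
      schema: public
      threads: 4
EOF
fi
```

With the variables exported (for example `export POSTGRES_USER=...`), `dbt debug` checks that dbt can find the profile and connect to the database. dbt looks for `profiles.yml` in the current directory before falling back to `~/.dbt/`, and `--profiles-dir` points it to another location.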
## Usage
1. **Create Database and insert data**

   Use the command below to start the PostgreSQL container in the background, create the database, and load the data into it.

   ```bash
   docker compose up -d
   ```

2. **Run DBT transformation**

   Once dbt is installed and the container is running, build the models. `dbt build` runs and tests the transformation flow in dependency order:

   ```bash
   dbt build
   ```

## Documentation
Generate the documentation site and serve it locally (`dbt docs serve` listens on port 8080 by default):
```bash
dbt docs generate
dbt docs serve
```

## Contributing
Contributions are welcome! Please fork the repository and open a pull request with your changes.