https://github.com/josephmachado/simple_dbt_project
Code for dbt tutorial
analytics datawarehouse dbt postgres tutorial
Last synced: 6 months ago
- Host: GitHub
- URL: https://github.com/josephmachado/simple_dbt_project
- Owner: josephmachado
- Created: 2020-04-25T18:17:21.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-05-31T19:16:35.000Z (over 1 year ago)
- Last Synced: 2025-04-15T02:58:01.395Z (6 months ago)
- Topics: analytics, datawarehouse, dbt, postgres, tutorial
- Homepage: https://www.startdataengineering.com/post/dbt-data-build-tool-tutorial
- Size: 43.4 MB
- Stars: 156
- Watchers: 4
- Forks: 82
- Open Issues: 1
Metadata Files:
- Readme: README.md
Repository for the following blogs
1. [dbt(data build tool) Tutorial](https://www.startdataengineering.com/post/dbt-data-build-tool-tutorial/)
# Setup
## Prerequisites
1. [python ^3.11](https://www.python.org/downloads/)
2. [git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)

In addition to the tools, you also need to know what dbt is; you can learn about it here: [dbt tutorial](https://www.startdataengineering.com/post/dbt-data-build-tool-tutorial/).
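As a quick sanity check before cloning, the prerequisites above can be verified from Python. This is a minimal sketch, not part of the repository: the function name `missing_prerequisites` and its parameters are hypothetical, and it only checks the interpreter's major/minor version and whether `git` is on the `PATH`.

```python
import shutil
import sys
from typing import Callable, Optional, Sequence


def missing_prerequisites(
    min_python: Sequence[int] = (3, 11),
    version_info: Optional[Sequence[int]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list:
    """Return human-readable problems; an empty list means nothing is missing.

    `version_info` and `which` are injectable (hypothetical parameters) so the
    check can be exercised without changing the real environment.
    """
    problems = []
    # Compare only (major, minor), e.g. (3, 11)
    current = tuple((version_info or sys.version_info)[:2])
    if current < tuple(min_python):
        problems.append(
            f"Python {current[0]}.{current[1]} found, "
            f"need {min_python[0]}.{min_python[1]} or newer"
        )
    # shutil.which returns None when the executable is not on PATH
    if which("git") is None:
        problems.append("git not found on PATH")
    return problems
```

Calling `print(missing_prerequisites())` prints an empty list when both tools are available.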
Clone the git repo as shown below:
```bash
git clone https://github.com/josephmachado/simple_dbt_project.git
cd simple_dbt_project
```

## Demo on CodeSpaces
Here is a demo of how to run this on CodeSpaces:
[Demo video](https://youtu.be/UEVuIKmx5X0)

Set up a Python virtual environment as shown below:
```bash
rm -rf myenv
# set up venv and run dbt
python -m venv myenv
source myenv/bin/activate
pip install -r requirements.txt
```

# Run dbt
Run dbt commands as shown below:
```bash
dbt clean
dbt deps
dbt snapshot
dbt run
dbt test
dbt docs generate
dbt docs serve
```

Go to [http://localhost:8080](http://localhost:8080) to see the dbt documentation. If you are running this on GitHub CodeSpaces, follow [this section]() to expose port 8080 for access from your browser.
Press Ctrl + C to stop the documentation server.
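If you prefer to script the command sequence, for example to stop at the first failing step, the following sketch runs the same dbt commands in order through Python's `subprocess` module. It is an illustration under assumptions rather than part of the repository: `DBT_STEPS` and `run_dbt_steps` are hypothetical names, and `dbt docs serve` is left out because it blocks until you stop it.

```python
import subprocess
from typing import Callable, List, Sequence

# Same commands as in the README, minus `dbt docs serve` (it blocks until stopped).
DBT_STEPS: List[List[str]] = [
    ["dbt", "clean"],
    ["dbt", "deps"],
    ["dbt", "snapshot"],
    ["dbt", "run"],
    ["dbt", "test"],
    ["dbt", "docs", "generate"],
]


def run_dbt_steps(
    steps: Sequence[Sequence[str]] = DBT_STEPS,
    runner: Callable = subprocess.run,
) -> List[str]:
    """Run each step in order and return the commands that finished.

    With the default runner, a failing command raises
    subprocess.CalledProcessError and the remaining steps are skipped.
    """
    finished = []
    for step in steps:
        # check=True makes subprocess.run raise on a non-zero exit code
        runner(list(step), check=True)
        finished.append(" ".join(step))
    return finished
```

Calling `run_dbt_steps()` from the project root, with the virtual environment active, runs the real commands; passing a different `runner` lets you dry-run the sequence.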
# Create snapshots
Let's test dbt snapshots by inserting some data into the source customer table. Since we are using DuckDB and the base table is essentially the data in [customers.csv](./raw_data/customers.csv), we append the new customer data to this file as shown below:
```bash
# Remove header from ./raw_data/customer_new.csv
# and append it to ./raw_data/customers.csv
echo "" >> ./raw_data/customers.csv
tail -n +2 ./raw_data/customer_new.csv >> ./raw_data/customers.csv
# NOTE: Windows users need to do this manually or via PowerShell
```

Run the snapshot and create the models again.
```bash
dbt snapshot
dbt run
```

To reset `customers.csv` afterwards, drop the appended lines:

```bash
# reset customers.csv
head -n -5 ./raw_data/customers.csv > temp
cat temp > ./raw_data/customers.csv
rm temp
```

Let's open a Python REPL and check our data, as shown below:
```python
import duckdb
con = duckdb.connect("dbt.duckdb")
results = con.execute("select * from snapshots.customers_snapshot where customer_id = 82").fetchall()
for row in results:
    print(row)
# NOTE: You will see 2 rows printed
exit()
```
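The two rows appear because a dbt snapshot keeps every version of a record and marks its validity window. The sketch below labels which version is current, assuming the snapshot uses dbt's default meta-columns `dbt_valid_from` and `dbt_valid_to`, where `dbt_valid_to` is NULL for the current version. The helper names `label_snapshot_rows` and `fetch_customer_history` are hypothetical and not part of the repository.

```python
from typing import Any, Dict, Iterable, List, Sequence


def label_snapshot_rows(
    columns: Sequence[str], rows: Iterable[Sequence[Any]]
) -> List[Dict[str, Any]]:
    """Turn raw result tuples into dicts and flag the currently valid version."""
    labelled = []
    for row in rows:
        record = dict(zip(columns, row))
        # Assumption: default dbt behaviour, where the current version
        # has dbt_valid_to = NULL (None in Python)
        record["is_current"] = record.get("dbt_valid_to") is None
        labelled.append(record)
    return labelled


def fetch_customer_history(
    customer_id: int, db_path: str = "dbt.duckdb"
) -> List[Dict[str, Any]]:
    """Read all snapshot versions of one customer, oldest first."""
    import duckdb  # imported lazily so label_snapshot_rows works without duckdb

    con = duckdb.connect(db_path)
    try:
        cur = con.execute(
            "select * from snapshots.customers_snapshot "
            "where customer_id = ? order by dbt_valid_from",
            [customer_id],
        )
        # Each description entry starts with the column name
        columns = [col[0] for col in cur.description]
        return label_snapshot_rows(columns, cur.fetchall())
    finally:
        con.close()
```

For example, `fetch_customer_history(82)` should return one dict per snapshot version of that customer, with `is_current` set to `True` only on the latest one.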