Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/j-sephb-lt-n/dbt-example
Showcase of DBT functionality with a toy dataset
https://github.com/j-sephb-lt-n/dbt-example
Last synced: about 23 hours ago
JSON representation
Showcase of DBT functionality with a toy dataset
- Host: GitHub
- URL: https://github.com/j-sephb-lt-n/dbt-example
- Owner: J-sephB-lt-n
- License: gpl-3.0
- Created: 2024-07-05T20:36:09.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-08-01T19:26:40.000Z (3 months ago)
- Last Synced: 2024-08-01T21:54:44.984Z (3 months ago)
- Language: Python
- Size: 50.8 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# dbt-example
Showcase of dbt functionality with a toy dataset - I use this repo to remind myself of the features of DBT and how they work.
I am using a local DuckDB database.
Why use DBT? Because it encourages and facilitates data warehouse and software best practices through it's design:
- Data lineage (e.g. logging, snapshots)
- Documentation
- Testing
- Modularisation
- Separation of environments (e.g. dev, staging, prod)
- DRY
- Version control
The DBT best practice guide is amazing:
After running each step, you can see the changes made within the database itself using (e.g. run this in the duckdb cli):
```sql
select table_catalog, table_schema, table_name, table_type from information_schema.tables;
``````bash
python -m venv venv
source venv/bin/activate
pip install --no-deps -r requirements.txt
dbt --version
dbt debug # check that everything set up correctlypython -m simdata.init_db --db_name 'dev' # creates data in database databases/dev.duckdb
dbt source freshness # check freshness of input data
dbt run # run dbt models
dbt test # run all testsdbt snapshot # create snapshot tables
python -m simdata.update_payment_status_example --db_name 'dev' # adds a row to the bank table so can see pay_status_snapshot table working
dbt run # refresh pipeline to take into account new row in bank table
dbt snapshotdbt docs generate
dbt docs serve# tidy up
dbt clean # clean up dbt project
rm databases/* # remove databases
```although it is outside the repo, my ~/.dbt/profiles.yml looks like this:
```yaml
dbt_example:
target: dev
outputs:
dev:
type: duckdb
path: "databases/dev.duckdb"
schema: main # default target schema
threads: 4
```here are links to specific things in this project:
- [A custom generic data test](./tests/generic/test_is_valid_id.sql) (you can see it applied [here](./models/staging/data_tests.yml))
- You can run `python -m simdata.add_non_valid_order_id --db_name dev` to add a non-valid ID into the source data if you want to see the test fail (you will need to run `dbt run` before `dbt test` in order to see the test fail)