https://github.com/thesis/mezo-dbt
- Host: GitHub
- URL: https://github.com/thesis/mezo-dbt
- Owner: thesis
- Created: 2025-07-22T12:06:54.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-08-08T12:42:44.000Z (11 months ago)
# dbt Data Warehouse Transformations for Mezo
## Set up the dbt project locally
### Prerequisites
- Install [gcloud](https://cloud.google.com/sdk/docs/install)
- VS Code or any other code editor
- Optional: [dbt Power User for VS Code](https://marketplace.visualstudio.com/items?itemName=innoverio.vscode-dbt-power-user)
### Clone the Repository
```sh
git clone https://github.com/thesis/mezo-dbt
cd mezo-dbt
```
### Install Dependencies
- Install [uv](https://docs.astral.sh/uv/getting-started/installation/#installing-uv)
- Install Python dependencies (with uv):
```sh
uv sync
source .venv/bin/activate  # Activate the virtual environment
```
### Configure dbt [profiles.yml](https://docs.getdbt.com/docs/core/connect-data-platform/profiles.yml) locally
- Create a `.dbt` folder in your home directory if it doesn't exist, then create and open `profiles.yml`:
```sh
mkdir -p ~/.dbt  # -p avoids an error if the folder already exists
touch ~/.dbt/profiles.yml
# Open the file in VS Code (requires the `code` CLI:
# https://code.visualstudio.com/docs/configure/command-line#_launching-from-command-line)
code ~/.dbt/profiles.yml
# Or open it with vim, if you know how to close it
vim ~/.dbt/profiles.yml
```
- Add your BigQuery configuration to `profiles.yml`:
```yml
mezo:
  outputs:
    dev:
      type: bigquery
      method: oauth
      project:  # set this to your GCP project ID
      dataset: dbt_yourname
      location: EU
      threads: 4
  target: dev
```
- Authenticate with gcloud (creates local credentials JSON automatically):
```sh
gcloud auth login --enable-gdrive-access --update-adc
```
### Test your setup
```sh
dbt debug
```
### Install dbt Dependencies
```sh
dbt deps
```
For other dbt commands, see the dbt command reference: [https://docs.getdbt.com/reference/dbt-commands](https://docs.getdbt.com/reference/dbt-commands)
### This project uses [pre-commit](https://pre-commit.com/)
To run the checks locally, use:
```sh
pre-commit run --all-files --config .pre-commit-config_local.yaml
```
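To run the checks automatically before each commit, save the following script as `.git/hooks/pre-commit` and make it executable. The `INSTALL_PYTHON` path is specific to the original author's machine; if it does not exist on yours, the script falls back to the `pre-commit` found on your `PATH`. Running `pre-commit install --config .pre-commit-config_local.yaml` should generate an equivalent hook with the correct path for your environment.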
```bash
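#!/usr/bin/env bash
# start templated
# NOTE: machine-specific path; point this at the python3 inside your own virtualenv.
INSTALL_PYTHON=/Users/benedikt/Documents/gitrepos/crfe-orc-cloud-composer/.venv/bin/python3
ARGS=(hook-impl --config=.pre-commit-config_local.yaml --hook-type=pre-commit)
# end templated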
HERE="$(cd "$(dirname "$0")" && pwd)"
ARGS+=(--hook-dir "$HERE" -- "$@")
if [ -x "$INSTALL_PYTHON" ]; then
exec "$INSTALL_PYTHON" -mpre_commit "${ARGS[@]}"
elif command -v pre-commit > /dev/null; then
exec pre-commit "${ARGS[@]}"
else
echo '`pre-commit` not found. Did you forget to activate your virtualenv?' 1>&2
exit 1
fi
```
## How to Set Up a Goldsky Table
To set up a new table using Goldsky data in BigQuery:
### Contact Goldsky Support
- Email [Goldsky support](mailto:support@goldsky.com) to request that a new table be imported into the `mezo-prod-dp-dwh-lnd-goldsky-cs-0` Google Cloud Storage (GCS) bucket.
- As of this writing, the [Goldsky documentation](https://docs.goldsky.com/mirror/extensions/channels/aws-s3) is limited and self-service setup is not available, so the connection must be established through support.
### Organize Data in GCS
- For each import, create a separate folder in the GCS bucket.
- The folder structure should follow this pattern: `event_type=<event_type>/event_date=<event_date>/` (e.g., `event_type=donated/event_date=2025-05-22/`).
- This structure enables Hive partitioning of the table. For more details, see the [BigLake partitioned data documentation](https://cloud.google.com/bigquery/docs/create-cloud-storage-table-biglake#create-biglake-partitioned-data).
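As a minimal sketch of the layout described above, the helper below builds the object prefix for one import. The function name `goldsky_prefix` is hypothetical and not part of this repository. The bucket name is taken from this section, but whether the partitions sit at the bucket root or under a sub-path is not stated here, so the full URI is an assumption.

```sh
# Hypothetical helper: build the Hive-style prefix for one Goldsky import.
# Usage: goldsky_prefix <event_type> <event_date as YYYY-MM-DD>
goldsky_prefix() {
  event_type="$1"
  event_date="$2"
  if [ -z "$event_type" ]; then
    echo "event_type must not be empty" >&2
    return 1
  fi
  # Reject dates that are not in YYYY-MM-DD form
  case "$event_date" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "event_date must be YYYY-MM-DD, got: $event_date" >&2; return 1 ;;
  esac
  printf 'event_type=%s/event_date=%s/\n' "$event_type" "$event_date"
}

# Assumption: partitions live directly under the bucket root.
BUCKET="mezo-prod-dp-dwh-lnd-goldsky-cs-0"
echo "gs://${BUCKET}/$(goldsky_prefix donated 2025-05-22)"
```

Keeping the `key=value` form in every path segment is what lets BigQuery infer `event_type` and `event_date` as partition columns.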
### Update dbt Source Configuration
- Edit the [models/00_sources/goldsky.yml](models/00_sources/goldsky.yml) file to add the new table definition.
- Use the existing entries in the file as a template for the new one.
- Ensure all relevant metadata, columns, and partitioning information are included.
### Register the Table in BigQuery
- The table will be created in BigQuery using the [dbt-external-tables](https://github.com/dbt-labs/dbt-external-tables) package.
- After updating the YAML file, run the following dbt command to create the external tables:
```sh
dbt run-operation stage_external_sources
```
- This command registers the external tables in BigQuery based on your configuration. It also runs automatically during deployment and in CI.
## Update the External Table in BigQuery
If the structure of the source file (e.g., a Google Sheet) changes:
- Edit the corresponding YAML file in `models/00_sources/`.
- Adjust the schema, columns, or partitioning as needed.
- Re-stage the external table:
```sh
dbt run-operation stage_external_sources
```
- These steps are also run automatically via GitHub Actions, but for local testing, you must run them manually.
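When only one source has changed, re-staging can be narrowed down. The sketch below prints the command for a single source, optionally forcing the external table to be recreated. The `select` argument and the `ext_full_refresh` variable are options documented by the dbt-external-tables package. The helper name `stage_cmd` is hypothetical, and the source name `goldsky` is an assumption based on the file name `goldsky.yml`, so check the `name:` field in your YAML.

```sh
# Hypothetical helper: print the staging command for one dbt source.
# Usage: stage_cmd <source_name> [true|false]   (second argument: full refresh)
stage_cmd() {
  src="$1"
  full_refresh="${2:-false}"
  cmd="dbt run-operation stage_external_sources --args 'select: ${src}'"
  if [ "$full_refresh" = "true" ]; then
    # Recreate the external table instead of only creating it when missing
    cmd="$cmd --vars 'ext_full_refresh: true'"
  fi
  echo "$cmd"
}

# Assumption: the source is named "goldsky".
stage_cmd goldsky true
# To execute instead of printing: eval "$(stage_cmd goldsky true)"
```

Printing the command first makes it easy to review before anything is run against BigQuery.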
## 📖 Documentation
This project includes autogenerated dbt documentation, hosted with GitHub Pages.
👉 [View the dbt docs](https://thesis.github.io/mezo-dbt/#!/overview)
The documentation site is automatically updated via GitHub Actions when changes are merged into the repository.