# dbt Data Warehouse Transformations for Mezo

## Set up the dbt project locally

### Prerequisites

- Install [gcloud](https://cloud.google.com/sdk/docs/install)
- VSCode or any other code editor
- Optional: [dbt power user for vscode](https://marketplace.visualstudio.com/items?itemName=innoverio.vscode-dbt-power-user)

### Clone the Repository

```sh
git clone https://github.com/thesis/mezo-dbt
cd mezo-dbt
```

### Install Dependencies

- Install [uv](https://docs.astral.sh/uv/getting-started/installation/#installing-uv)
- Install Python dependencies (with uv):

```sh
uv sync
source .venv/bin/activate #Activate the venv
```

### Configure dbt [profiles.yml](https://docs.getdbt.com/docs/core/connect-data-platform/profiles.yml) locally

- Create a .dbt folder in your home directory if it doesn’t exist:

```sh
mkdir -p ~/.dbt
touch ~/.dbt/profiles.yml
# Open the file in VS Code (requires the `code` command, see
# https://code.visualstudio.com/docs/configure/command-line#_launching-from-command-line)
code ~/.dbt/profiles.yml
# Or open it with vim, if you know how to close it
vim ~/.dbt/profiles.yml
```

- Edit the `profiles.yml` file inside it and add your BigQuery configuration:

```yml
mezo:
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: # set this to your GCP project ID
      dataset: dbt_yourname
      location: EU
      threads: 4
  target: dev
```

- Authenticate with gcloud (creates local credentials JSON automatically):

```sh
gcloud auth login --enable-gdrive-access --update-adc
```

### Test your setup

```sh
dbt debug
```

### Install dbt Dependencies

```sh
dbt deps
```

For other dbt commands check:
[https://docs.getdbt.com/reference/dbt-commands](https://docs.getdbt.com/reference/dbt-commands)

### This project uses [pre-commit](https://pre-commit.com/)

To run checks locally use:

```sh
pre-commit run --all-files --config .pre-commit-config_local.yaml
```

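Use the following hook to run the checks before each commit. Git runs it when it is saved as an executable file at `.git/hooks/pre-commit`; the `INSTALL_PYTHON` path is machine-specific and must point to your own virtualenv: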

```bash
#!/usr/bin/env bash
# Python interpreter of the virtualenv where pre-commit is installed.
# The value below is machine-specific; replace it with your own venv's interpreter.
INSTALL_PYTHON=/Users/benedikt/Documents/gitrepos/crfe-orc-cloud-composer/.venv/bin/python3
ARGS=(hook-impl --config=.pre-commit-config_local.yaml --hook-type=pre-commit)

# end templated

HERE="$(cd "$(dirname "$0")" && pwd)"
ARGS+=(--hook-dir "$HERE" -- "$@")

# Prefer the venv interpreter; fall back to a pre-commit found on PATH.
if [ -x "$INSTALL_PYTHON" ]; then
    exec "$INSTALL_PYTHON" -mpre_commit "${ARGS[@]}"
elif command -v pre-commit > /dev/null; then
    exec pre-commit "${ARGS[@]}"
else
    echo '`pre-commit` not found. Did you forget to activate your virtualenv?' 1>&2
    exit 1
fi
```
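
A hook of this shape appears to be what `pre-commit install` writes to `.git/hooks/pre-commit`, so it should not need to be copied by hand. A minimal sketch, assuming you run it from the repository root with the project virtualenv active; `install_local_hook` is a hypothetical helper name, not part of this repository:

```sh
# Hypothetical helper: install the git pre-commit hook using the local config.
# Assumes the current directory is the repository root and that `pre-commit`
# is on PATH (for example after `source .venv/bin/activate`).
install_local_hook() {
  pre-commit install --config .pre-commit-config_local.yaml
}
```

Call `install_local_hook` once after cloning. The generated hook should then point `INSTALL_PYTHON` at your own interpreter.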

## How to Set Up a Goldsky Table

To set up a new table using Goldsky data in BigQuery:

### Contact Goldsky Support

Email [Goldsky support](mailto:support@goldsky.com) to request that a new table be imported into the `mezo-prod-dp-dwh-lnd-goldsky-cs-0` Google Cloud Storage (GCS) bucket. As of this writing, the [Goldsky documentation](https://docs.goldsky.com/mirror/extensions/channels/aws-s3) is limited and self-service setup is not available, so you must contact support to establish the connection.

### Organize Data in GCS

- For each import, create a separate folder in the GCS bucket.
- The folder structure should follow this pattern: `event_type=<event_type>/event_date=<event_date>/` (e.g., `event_type=donated/event_date=2025-05-22/`).
- This structure enables Hive partitioning of the table. For more details, see the [BigLake partitioned data documentation](https://cloud.google.com/bigquery/docs/create-cloud-storage-table-biglake#create-biglake-partitioned-data).
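
The folder pattern above can be sketched as a small shell helper that builds the Hive-style prefix for one event type and date. The bucket name comes from this README; the helper name `partition_prefix` and the `YYYY-MM-DD` check are assumptions made for illustration:

```sh
# Bucket name as given in this README.
BUCKET="mezo-prod-dp-dwh-lnd-goldsky-cs-0"

# Hypothetical helper: print the Hive-style partition prefix.
#   $1: event type (e.g. donated)
#   $2: event date in YYYY-MM-DD form (e.g. 2025-05-22)
partition_prefix() {
  event_type="$1"
  event_date="$2"
  # Reject dates that are not in YYYY-MM-DD form.
  case "$event_date" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "event_date must be YYYY-MM-DD" >&2; return 1 ;;
  esac
  printf 'event_type=%s/event_date=%s/\n' "$event_type" "$event_date"
}
```

For example, `partition_prefix donated 2025-05-22` prints `event_type=donated/event_date=2025-05-22/`. To inspect a partition, append that prefix to the import folder's path in `gs://$BUCKET/` and pass the result to `gcloud storage ls`.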

### Update dbt Source Configuration

- Edit the [models/00_sources/goldsky.yml](models/00_sources/goldsky.yml) file to add the new table definition.
- Use the existing configurations in the file as a template for your new entry.
- Ensure all relevant metadata, columns, and partitioning information are included.

### Register the Table in BigQuery

- The table will be created in BigQuery using the [dbt-external-tables](https://github.com/dbt-labs/dbt-external-tables) package.
- After updating the YAML file, run the following dbt command to create the external tables:

```sh
dbt run-operation stage_external_sources
```

- This command registers the external tables in BigQuery based on your configuration. It also runs automatically during deployment and in the CI process.

## Update the External Table in BigQuery

If the source file (e.g., a Google Sheet) changes structure:

- Edit the corresponding YAML file in `models/00_sources/`.
- Adjust schema, columns, or partitioning as needed.
- Re-stage the external table:

```sh
dbt run-operation stage_external_sources
```

- These steps are also run automatically via GitHub Actions, but for local testing, you must run them manually.
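
According to the dbt-external-tables README, passing `--vars "ext_full_refresh: true"` recreates external tables that already exist, and `--args "select: ..."` limits staging to the selected sources. Check whether your structure change needs this. A minimal sketch for local testing; `restage_external_tables` is a hypothetical helper, and the source name `goldsky` in the usage note is an assumption based on the file name `goldsky.yml`:

```sh
# Hypothetical helper: re-stage external tables with a full refresh.
#   $1 (optional): dbt source name to restrict staging to
restage_external_tables() {
  if [ -n "${1:-}" ]; then
    # Recreate only the external tables of the given source.
    dbt run-operation stage_external_sources \
      --vars "ext_full_refresh: true" \
      --args "select: $1"
  else
    # Recreate every external table defined in the project.
    dbt run-operation stage_external_sources \
      --vars "ext_full_refresh: true"
  fi
}
```

Run `restage_external_tables goldsky` to refresh a single source, or `restage_external_tables` with no argument to refresh all of them.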

## 📖 Documentation

This project includes autogenerated dbt documentation, hosted with GitHub Pages.

👉 [View the dbt docs](https://thesis.github.io/mezo-dbt/#!/overview)

The documentation site is automatically updated via GitHub Actions when changes are merged into the repository.