https://github.com/frizzleqq/f1-dbt-duckdb
Data pipelines with Dagster, dbt and DuckDB, using Ergast F1 data as the source
dagster dbt duckdb ergast-data
- Host: GitHub
- URL: https://github.com/frizzleqq/f1-dbt-duckdb
- Owner: frizzleqq
- Created: 2022-08-21T13:45:04.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2025-04-11T13:44:45.000Z (about 1 year ago)
- Last Synced: 2025-04-11T14:10:09.495Z (about 1 year ago)
- Topics: dagster, dbt, duckdb, ergast-data
- Language: Python
- Homepage:
- Size: 800 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# F1 warehouse with DuckDB
Project with
* Orchestration: [dagster](https://docs.dagster.io/)
* Transformation & Testing: [dbt](https://docs.getdbt.com/)
* Processing Engine & Database: [DuckDB](https://duckdb.org/)
* Data Source: [Ergast API](http://ergast.com/mrd/) (Ergast is no longer being updated post 2024 season)
Cutout of the dagster lineage graph:

Example data in MotherDuck:

## Development
* Python >= 3.10 https://www.python.org/downloads/
This project uses `pyproject.toml` to describe package metadata and [uv](https://github.com/astral-sh/uv) to manage dependencies.
### Install uv
https://docs.astral.sh/uv/getting-started/installation/
### Setup environment
Create virtual environment
```bash
uv venv
```
The following command installs the Python dependencies:
* The `[dev]` extra also installs the development tools.
* The `--editable` flag makes the CLI script available.
```bash
uv pip install --editable ".[dev]"
```
Sync `uv` environment:
```bash
uv sync --extra dev
```
### Activate virtual environment
Bash:
```bash
source .venv/bin/activate
```
Windows:
```powershell
.venv\Scripts\activate
```
### Dagster
Dagster uses environment variables located in [.env](.env).
Please copy the `.env_template` over to `.env` or set the `DATA_DIR` environment variable.
Start the local Dagster server:
```bash
dagster dev
```
### Dagster CLI
Launch a Dagster job from the command line:
```bash
dagster job execute -m foneplatform -j ergast_job
```
### Development Tools
* Code linting/formatting: `ruff`
* Type checking: `mypy`
* SQL linting/formatting: `sqlfluff`
## Environments
The `ENVIRONMENT` environment variable selects where the data is stored:
* `dev`: Creates a local `f1.duckdb` database within `DATA_DIR` (defaults to `data` within the project directory).
* `md`: Connects to MotherDuck and stores the `f1` database there, authenticating with `MOTHERDUCK_TOKEN`.

Use the `.env` file to set these environment variables.
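
The selection logic described above could look roughly like the following. This is only a sketch, not the project's actual code: the function name `resolve_database`, the fallback to `dev` when `ENVIRONMENT` is unset, and the `md:f1` connection string are assumptions made for illustration.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_database(env: Mapping[str, str] = os.environ) -> str:
    """Return a DuckDB connection string for the configured environment.

    Hypothetical helper: the name and the "dev" fallback are assumptions.
    """
    environment = env.get("ENVIRONMENT", "dev")  # assumed default
    if environment == "dev":
        # Local database file inside DATA_DIR ("data" if unset).
        data_dir = Path(env.get("DATA_DIR", "data"))
        return str(data_dir / "f1.duckdb")
    if environment == "md":
        # MotherDuck needs a token; this sketch only checks that it is present.
        if not env.get("MOTHERDUCK_TOKEN"):
            raise ValueError("MOTHERDUCK_TOKEN must be set when ENVIRONMENT=md")
        return "md:f1"  # assumed MotherDuck connection string for the f1 database
    raise ValueError(f"Unsupported ENVIRONMENT: {environment!r}")
```

The returned string can be passed to `duckdb.connect(...)`. Taking the environment as a mapping makes the function easy to exercise without touching the real process environment.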
## dbt
Install dbt_utils:
```bash
dbt deps --project-dir="./dbt" --profiles-dir="./dbt"
```
Run models:
```bash
dbt run --project-dir="./dbt" --profiles-dir="./dbt"
```
Run tests:
```bash
dbt test --project-dir="./dbt" --profiles-dir="./dbt"
```
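
If you prefer to run the three dbt steps from one script, a minimal sketch could wrap the same commands with `subprocess`. The helper names `dbt_command` and `run_dbt_pipeline` are hypothetical and not part of this project; the sketch assumes `dbt` is on the `PATH` of the active virtual environment.

```python
import subprocess

DBT_DIR = "./dbt"  # project and profiles directory used by the commands above


def dbt_command(subcommand: str, dbt_dir: str = DBT_DIR) -> list[str]:
    """Build the argument list for one dbt invocation."""
    return [
        "dbt",
        subcommand,
        f"--project-dir={dbt_dir}",
        f"--profiles-dir={dbt_dir}",
    ]


def run_dbt_pipeline(dbt_dir: str = DBT_DIR) -> None:
    """Run deps, run and test in order, stopping at the first failure."""
    for subcommand in ("deps", "run", "test"):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(dbt_command(subcommand, dbt_dir), check=True)
```

Calling `run_dbt_pipeline()` from the project root is then equivalent to running the three commands by hand.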
### sqlfluff
Run SQL linter on dbt models:
> **_NOTE:_** This may require setting the `DATA_DIR` environment variable to the `data` directory containing the DuckDB database.
```bash
sqlfluff lint ./dbt/models/core
```
## Staging
Staging is done by a Dagster multi-asset ([./foneplatform/assets/ergast.py](./foneplatform/assets/ergast.py)):
1. Download the ZIP of CSV files (http://ergast.com/downloads/f1db_csv.zip).
1. Read the CSV files using DuckDB and store the asset result in DuckDB using the `DuckDBDuckDBIOManager`.
1. dbt then creates its models in the same DuckDB database.
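
To make the first two steps concrete, here is a minimal sketch in plain Python. It is not the code in `ergast.py`: it leaves out Dagster and the IO manager, and the function names (`download_zip`, `extract_csvs`, `load_csvs`) and the one-table-per-file naming are assumptions for illustration.

```python
import io
import zipfile
from pathlib import Path
from urllib.request import urlopen

ERGAST_CSV_URL = "http://ergast.com/downloads/f1db_csv.zip"  # URL from step 1


def download_zip(url: str = ERGAST_CSV_URL) -> bytes:
    """Download the archive and return its raw bytes."""
    with urlopen(url) as response:
        return response.read()


def extract_csvs(zip_bytes: bytes, target_dir: Path) -> list[Path]:
    """Extract the CSV members of the archive and return their paths."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = sorted(n for n in archive.namelist() if n.endswith(".csv"))
        archive.extractall(target_dir, members=names)
    return [target_dir / name for name in names]


def load_csvs(database: str, csv_paths: list[Path]) -> None:
    """Create one DuckDB table per CSV file, named after the file stem (assumed naming)."""
    import duckdb  # third-party; imported lazily so the helpers above stay stdlib-only

    con = duckdb.connect(database)
    try:
        for path in csv_paths:
            # Table name and path are interpolated directly into the SQL,
            # which is acceptable only for trusted local files.
            con.execute(
                f"CREATE OR REPLACE TABLE {path.stem} AS "
                f"SELECT * FROM read_csv_auto('{path.as_posix()}')"
            )
    finally:
        con.close()
```

A run would chain the three calls, for example `load_csvs("data/f1.duckdb", extract_csvs(download_zip(), Path("data/csv")))`. In the project itself, the write into DuckDB is handled by the IO manager rather than by explicit SQL.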