{"id":17277860,"url":"https://github.com/florianwilhelm/wald-stack-demo","last_synced_at":"2025-10-11T17:27:24.206Z","repository":{"id":81535291,"uuid":"584673013","full_name":"FlorianWilhelm/wald-stack-demo","owner":"FlorianWilhelm","description":"🌳 WALD Stack Demo 🏎️","archived":false,"fork":false,"pushed_at":"2024-09-18T19:01:30.000Z","size":27933,"stargazers_count":32,"open_issues_count":0,"forks_count":6,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-30T08:07:04.605Z","etag":null,"topics":["airbyte","bigquery","data-analysis","dbt","lightdash","python","snowflake","snowpark"],"latest_commit_sha":null,"homepage":"https://waldstack.org","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FlorianWilhelm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-03T08:09:12.000Z","updated_at":"2024-12-13T07:15:58.000Z","dependencies_parsed_at":null,"dependency_job_id":"12ee23ca-692d-4d0f-b9ca-e1bee4882049","html_url":"https://github.com/FlorianWilhelm/wald-stack-demo","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/FlorianWilhelm/wald-stack-demo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FlorianWilhelm%2Fwald-stack-demo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FlorianWilhelm%2Fwald-stack-demo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FlorianWilhelm%2Fwald-stack-demo/releases","manifests_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/FlorianWilhelm%2Fwald-stack-demo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FlorianWilhelm","download_url":"https://codeload.github.com/FlorianWilhelm/wald-stack-demo/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FlorianWilhelm%2Fwald-stack-demo/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279008117,"owners_count":26084396,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-11T02:00:06.511Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airbyte","bigquery","data-analysis","dbt","lightdash","python","snowflake","snowpark"],"created_at":"2024-10-15T09:10:12.866Z","updated_at":"2025-10-11T17:27:24.188Z","avatar_url":"https://github.com/FlorianWilhelm.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/FlorianWilhelm/wald-stack-demo/main/assets/images/wald-stack-social-banner.png\" alt=\"WALD stack logo\" width=\"600\" role=\"img\"\u003e\n\u003c/div\u003e\n\n# WALD: The Modern \u0026 Sustainable Analytics Stack\n\nThe name **WALD**-stack stems from the four technologies it is composed of, i.e. 
a cloud-computing **W**arehouse\nlike [Snowflake] or [Google BigQuery], the open-source data integration engine [**A**irbyte], the open-source full-stack\nBI platform [**L**ightdash], and the open-source data transformation tool [**D**BT].\n\nThis demonstration project showcases the WALD-stack in a minimal example. It makes use of the\n[Kaggle Formula 1 World Championship dataset] and the data warehouse [Snowflake]. To allow the definition of\n[Python]-based models within [dbt Core], Snowflake's [Snowpark] feature is also enabled. For analytics and BI\nwe use the graphical BI-tool [Lightdash], which is a suitable addition from the dbt-ecosystem.\n\nThe WALD-stack is sustainable since it consists mainly of open-source technologies; however, all technologies are also\noffered as managed cloud services. The data warehouse itself, i.e. [Snowflake] or [Google BigQuery], is the only non-open-source\ntechnology in the WALD-stack. In the case of Snowflake, only the clients, e.g. [snowflake-connector-python] and\n[snowflake-snowpark-python], are available as open-source software.\n\nTo manage the Python environment and dependencies in this demonstration, we make use of [Mambaforge], which is a faster\nand open-source alternative to [Anaconda].\n\n🎬 **Check out the [slides] of the [PyConDE / PyData talk about the WALD Stack].**\n\n## Getting started\n\n1. Setting up the data **W**arehouse [Snowflake], i.e.:\n   1. [register a 30-day free trial Snowflake account] and choose the standard edition, AWS as cloud provider and any\n      region you want,\n   2. check the Snowflake e-mail for your *account-identifier*, which is specified by the URL you are given, e.g.\n      `https://\u003caccount_identifier\u003e.snowflakecomputing.com`,\n   3. [log into Snowflake's Snowsight UI] using your *account-identifier*,\n   4. 
check if [Snowflake's TPC-H sample database] `SNOWFLAKE_SAMPLE_DATA` is available under \u003ckbd\u003eData\u003c/kbd\u003e » \u003ckbd\u003eDatabases\u003c/kbd\u003e\n      or create it under \u003ckbd\u003eData\u003c/kbd\u003e » \u003ckbd\u003ePrivate Sharing\u003c/kbd\u003e » \u003ckbd\u003eSAMPLE_DATA\u003c/kbd\u003e and name it `SNOWFLAKE_SAMPLE_DATA`.\u003cbr\u003e\n   5. create a new database named `MY_DB` with owner `ACCOUNTADMIN` by clicking \u003ckbd\u003eData\u003c/kbd\u003e » \u003ckbd\u003eDatabases\u003c/kbd\u003e » \u003ckbd\u003e+ Database\u003c/kbd\u003e (upper right corner)\n      and entering `MY_DB` in the emerging New Database form,\n   6. [activate Snowpark and third-party packages] by clicking on your login name followed by \u003ckbd\u003eSwitch Role\u003c/kbd\u003e » \u003ckbd\u003eORGADMIN\u003c/kbd\u003e.\n      Only if \u003ckbd\u003eORGADMIN\u003c/kbd\u003e doesn't show in the drop-down menu, go to \u003ckbd\u003eWorksheets\u003c/kbd\u003e » \u003ckbd\u003e+ Worksheet\u003c/kbd\u003e and execute:\n      ```SQL\n      use role accountadmin;\n\n      grant role orgadmin to user YOUR_USERNAME;\n      ```\n      This should add `ORGADMIN` to the list. Now click \u003ckbd\u003eAdmin\u003c/kbd\u003e » \u003ckbd\u003eBilling\u003c/kbd\u003e » \u003ckbd\u003eTerms \u0026 Billing\u003c/kbd\u003e,\n      and click \u003ckbd\u003eEnable\u003c/kbd\u003e next to `Anaconda Python packages`. The Anaconda Packages (Preview Feature) dialog opens,\n      and you need to agree to the terms by clicking \u003ckbd\u003eAcknowledge \u0026 Continue\u003c/kbd\u003e.\n   7. choose a warehouse (which is a compute-cluster in Snowflake-speak) by clicking on \u003ckbd\u003eWorksheets\u003c/kbd\u003e and selecting\n      \u003ckbd\u003eTutorial 1: Sample queries on TPC-H data\u003c/kbd\u003e. 
Now click on the role button showing \u003ckbd\u003eACCOUNTADMIN · No Warehouse\u003c/kbd\u003e\n      on the upper right and select the warehouse \u003ckbd\u003eCOMPUTE_WH\u003c/kbd\u003e or create a new one. Note the name of the warehouse\n      for the dbt setup later,\n   8. execute *all* statements from the tutorial worksheet to see if everything was set up correctly.\n\n2. Setting up [**D**BT] and [Snowpark] locally, i.e.:\n   1. clone this repository with `git clone https://github.com/FlorianWilhelm/wald-stack-demo.git`,\n   2. change into the repository with `cd wald-stack-demo`,\n   3. make sure you have [Mambaforge] installed,\n   4. set up the mamba/conda environment `wald-stack` with:\n      ```\n      mamba env create -f environment.yml\n      ```\n   5. activate the environment with `mamba activate wald-stack`,\n   6. create a directory `~/.dbt/` and a file `profiles.yml` in it, with content:\n      ```yaml\n      default:\n        outputs:\n          dev:\n            account: your_account-identifier\n            database: MY_DB\n            password: your_password\n            role: accountadmin\n            schema: WALD_STACK_DEMO\n            threads: 1\n            type: snowflake\n            user: your_username\n            warehouse: COMPUTE_WH\n        target: dev\n      ```\n      and set `account`, `password` as well as `user` accordingly. **Note** that `account` is the Snowflake Account identifier,\n      e.g. `DWABNEV.LRB61572`, but the `.` replaced by `-`, e.g. `DWABNEV-LRB61572`.\n      Also check that the value of `warehouse` corresponds to the one you have in Snowflake,\n   7. test that your connection works by running `dbt debug` in the directory of this repo. You should see \"All checks passed!\"-message.\n\n3. Setting up [**A**irbyte] locally, i.e.:\n   1. make sure you have [docker] installed,\n   2. 
install it with:\n      ```commandline\n      git clone https://github.com/airbytehq/airbyte.git\n      cd airbyte\n      docker compose up\n      ```\n   3. check if the front-end comes up at [http://localhost:8000](http://localhost:8000) and log in with\n      username `airbyte` and password `password`,\n   4. enter some e-mail address and click continue. The main dashboard should show up.\n\n4. Set up [**L**ightdash] locally, i.e.:\n   1. make sure you have [docker] installed,\n   2. install Lightdash locally by following the [local deployment instructions], i.e.:\n      ```commandline\n      cd .. # to leave \"wald-stack-demo\" if necessary\n      git clone https://github.com/lightdash/lightdash\n      cd lightdash\n      ./scripts/install.sh # and choose \"Custom install\", enter the path to your dbt project from above\n      ```\n   3. check if the front-end comes up at [http://localhost:8080](http://localhost:8080).\n   4. install the `lightdash` CLI command following the [how-to-install-the-lightdash-cli] docs.\n   5. authenticate the CLI and connect the `wald_stack` dbt project by running `lightdash login http://localhost:8080`.\n\n\u003e **Note**\n\u003e If you use [Colima] as a Docker alternative, the installation script will fail, caused by the function supposed to start Docker Desktop. A simple fix is to comment out the line calling the `start_docker` function (line 417). Be sure that your Docker daemon is already running.\n\u003e Additionally IPv6 is not properly implemented, which results in not being able to authenticate lightdash CLI using `localhost` as host. Use `lightdash login http://127.0.0.1:8080` instead to force IPv4.\n\n\u003e **Note**\n\u003e If you have improvements for this example, please consider contributing back by creating a pull request. To have it\n\u003e all nice and tidy, please make sure to install \u0026 setup [pre-commit], i.e. 
`pip install pre-commit` and `pre-commit install`,\n\u003e so that all your commits conform automatically to the style guides used in this project.\n\n## Demonstration of the WALD-stack\n\nTo demonstrate the power of the WALD stack we will:\n\n1. ingest a Formula 1 dataset into [Snowflake] using Snowflake's internal capabilities,\n2. use [Airbyte] to exemplify how external data sources, in our case a csv file with weather information, can be ingested into Snowflake,\n3. use [dbt] to transform the raw data using SQL and Python, leveraging [Snowpark] for data analysis as well as to train a simple [Scikit-Learn] model and predict the position in a race,\n4. use [Lightdash] to visualise the results and demonstrate its ad-hoc analysis capabilities.\n\n### Ingesting the Formula 1 Dataset\n\nTo have some data to play around with, we are going to use the [Kaggle Formula 1 World Championship dataset], which is conveniently\navailable in an S3 bucket. To ingest the data into Snowflake, just execute the script [ingest_formula1_from_s3_to_snowflake.sql]\nwithin a worksheet of the Snowsight UI. Just select all rows and hit the run button.\n\nThe following figure shows database entities, relationships, and characteristics of the data:\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/FlorianWilhelm/wald-stack-demo/main/assets/images/db-schema.png\" alt=\"Formula 1 database schemas\" width=\"800\" role=\"img\"\u003e\n\u003c/div\u003e\n\n### Ingesting the weather data with Airbyte\n\nTo get our hands on some data we can ingest into our warehouse, let's take some [weather data from opendatasoft], which\nis located in the `seeds` folder. 
For Airbyte to find it, we need to copy it into the running Airbyte [docker] container with:\n```commandline\ndocker cp seeds/cameri_weather.csv airbyte-server:/tmp/workspace/cameri_weather.csv\n```\nNote that this is purely for testing the stack; in a production setting, one\nwould rather choose an S3 bucket or a completely different data source like [Kafka].\n\nBefore we start using Airbyte, let's first set up a new database and schema for the data we are about to ingest.\nOpen a worksheet in Snowsight and execute:\n```sql\nCREATE DATABASE WEATHER;\nUSE DATABASE WEATHER;\nCREATE SCHEMA RAW;\n```\n\nLet's fire up the Airbyte web UI under [http://localhost:8000](http://localhost:8000) where you should see this after having logged in:\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/FlorianWilhelm/wald-stack-demo/main/assets/images/airbyte-welcome.png\" alt=\"Welcome screen of Airbyte\" width=\"500\" role=\"img\"\u003e\n\u003c/div\u003e\n\nNow click on \u003ckbd\u003eCreate your first connection\u003c/kbd\u003e, select `File` as source type, and fill out the form like this:\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/FlorianWilhelm/wald-stack-demo/main/assets/images/airbyte-source.png\" alt=\"Source selection of Airbyte\" width=\"500\" role=\"img\"\u003e\n\u003c/div\u003e\n\nFor the `Reader Options`, just copy \u0026 paste the following string:\n\n```json\n{\"sep\":\";\", \"header\": 0, \"names\": [\"ghcn_din\", \"date\", \"prcp\", \"snow\", \"tmax\", \"tmin\", \"elevation\", \"name\", \"coord\", \"country_code\"]}\n```\n\nHit \u003ckbd\u003eSet up Source\u003c/kbd\u003e and select \u003ckbd\u003eSnowflake\u003c/kbd\u003e in the next form as destination type. Now you should see a detailed form\nto set up the Snowflake destination. Enter the values like this with the corresponding settings from the Snowflake setup\nfrom above. 
Remember that the `host` url follows the schema `\u003caccount_identifier\u003e.snowflakecomputing.com`.\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/FlorianWilhelm/wald-stack-demo/main/assets/images/airbyte-destination.png\" alt=\"Destination selection of Airbyte\" width=\"500\" role=\"img\"\u003e\n\u003c/div\u003e\n\nThen hit \u003ckbd\u003eSet up destination\u003c/kbd\u003e and see a new form popping up. We just stick with the sane defaults provided to us.\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/FlorianWilhelm/wald-stack-demo/main/assets/images/airbyte-setup-details.png\" alt=\"Setup details of Airbyte connection\" width=\"500\" role=\"img\"\u003e\n\u003c/div\u003e\n\nAfter hitting \u003ckbd\u003eSet up connection\u003c/kbd\u003e, you should see that Airbyte starts syncing our weather data to Snowflake.\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/FlorianWilhelm/wald-stack-demo/main/assets/images/airbyte-sync.png\" alt=\"Airbyte syncs the weather data\" width=\"500\" role=\"img\"\u003e\n\u003c/div\u003e\n\nAfter roughly a minute, the sync should be successfully completed.\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/FlorianWilhelm/wald-stack-demo/main/assets/images/airbyte-sync-succeeded.png\" alt=\"Airbyte sync succeeded\" width=\"500\" role=\"img\"\u003e\n\u003c/div\u003e\n\nAirbyte has a lot more to offer since it has hundreds of sources and destinations for syncing. For our demonstration, however, that is all we need.\nNote that Airbyte integrates nicely with [dbt] and you can even specify your dbt transformations in Airbyte directly. 
There is much more to discover here :-)\nIt should also be noted that uploading a simple csv file into Snowflake could also have been done using [dbt's seed] command.\n\n### **D**BT\n\nSince everything is already set up for you in this repository, just don't forget to activate the mamba environment with `mamba activate wald-stack` before\nyou run dbt with `dbt run` in the directory of this repo. You should see an output like this:\n```commandline\n16:30:55  Running with dbt=1.3.1\n16:30:55  Found 22 models, 17 tests, 0 snapshots, 0 analyses, 501 macros, 0 operations, 3 seed files, 9 sources, 0 exposures, 0 metrics\n16:30:55\n16:30:57  Concurrency: 1 threads (target='dev')\n16:30:57\n16:30:57  1 of 22 START sql view model WALD_STACK_DEMO.stg_f1_circuits ................... [RUN]\n16:30:58  1 of 22 OK created sql view model WALD_STACK_DEMO.stg_f1_circuits .............. [SUCCESS 1 in 0.75s]\n16:30:58  2 of 22 START sql view model WALD_STACK_DEMO.stg_f1_constructors ............... [RUN]\n16:30:59  2 of 22 OK created sql view model WALD_STACK_DEMO.stg_f1_constructors .......... [SUCCESS 1 in 1.06s]\n16:30:59  3 of 22 START sql view model WALD_STACK_DEMO.stg_f1_drivers .................... [RUN]\n16:31:00  3 of 22 OK created sql view model WALD_STACK_DEMO.stg_f1_drivers ............... [SUCCESS 1 in 0.75s]\n16:31:00  4 of 22 START sql view model WALD_STACK_DEMO.stg_f1_lap_times .................. [RUN]\n16:31:00  4 of 22 OK created sql view model WALD_STACK_DEMO.stg_f1_lap_times ............. [SUCCESS 1 in 0.73s]\n16:31:00  5 of 22 START sql view model WALD_STACK_DEMO.stg_f1_pit_stops .................. [RUN]\n16:31:01  5 of 22 OK created sql view model WALD_STACK_DEMO.stg_f1_pit_stops ............. [SUCCESS 1 in 0.72s]\n16:31:01  6 of 22 START sql view model WALD_STACK_DEMO.stg_f1_races ...................... [RUN]\n16:31:02  6 of 22 OK created sql view model WALD_STACK_DEMO.stg_f1_races ................. 
[SUCCESS 1 in 0.77s]\n16:31:02  7 of 22 START sql view model WALD_STACK_DEMO.stg_f1_results .................... [RUN]\n16:31:03  7 of 22 OK created sql view model WALD_STACK_DEMO.stg_f1_results ............... [SUCCESS 1 in 0.70s]\n16:31:03  8 of 22 START sql view model WALD_STACK_DEMO.stg_f1_status ..................... [RUN]\n16:31:03  8 of 22 OK created sql view model WALD_STACK_DEMO.stg_f1_status ................ [SUCCESS 1 in 0.67s]\n...\n```\nUsing the Snowsight UI you can now explore the created tables in the database `MY_DB`. From an analyst's perspective,\nthe tables created from [models/marts/aggregates] are interesting as here Python is used to retrieve summary statistics\nabout pit stops by constructor in table `FASTEST_PIT_STOPS_BY_CONSTRUCTOR` and the 5-year rolling average of pit stop times\nalongside the average for each year is shown in table `LAP_TIMES_MOVING_AVG`.\n\nFrom a data scientist's perspective, it's really nice to see how easy it is to use [Scikit-Learn] to train an ML-model,\nstore it away using a Snowflake stage and loading it again for prediction. Check out the files under [models/marts/ml]\nto see how easy that is with [Snowpark] and also take a look at the resulting tables `TRAIN_TEST_POSITION` and `PREDICT_POSITION`.\n\nBesides transformations, [dbt] has much more to offer like unit tests. Run some predefined unit test examples with `dbt test`.\nAnother outstanding feature of dbt is how easy it is to create useful documentation for your users and yourself. To test\nit just run `dbt docs generate` followed by `dbt docs serve --port 8081` (on the default port 8080 Lightdash is running)\nand open [http://localhost:8081](http://localhost:8081). 
In this web UI you can explore your tables, columns, metrics, etc.\nand even get a useful lineage graph of your data:\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/FlorianWilhelm/wald-stack-demo/main/assets/images/dbt-lineage-graph.png\" alt=\"dbt lineage graph\" width=\"800\" role=\"img\"\u003e\n\u003c/div\u003e\n\nFinally, don't forget to check out the [References \u0026 Resources](#references--resources) for more information on learning dbt.\n\n### **L**ightdash\n\nThe Lightdash Web UI lets you do two basic things: run *ad-hoc queries* or construct queries with the intent\nto save their results as *charts*. Different charts can then be placed within *dashboards*. Charts and dashboards can be\norganized within *spaces*. Here is a basic view of Lightdash:\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/FlorianWilhelm/wald-stack-demo/main/assets/images/lightdash-overview.png\" alt=\"Main view of Lightdash and a click on the new-button\" width=\"800\" role=\"img\"\u003e\n\u003c/div\u003e\n\nFor demonstration purposes, let's run an ad-hoc query to take a look at the [weather_analysis][weather-analsysis] table. For that, just hit\n\u003ckbd\u003e+ New\u003c/kbd\u003e and select \u003ckbd\u003eQuery using SQL runner\u003c/kbd\u003e. All we need to do is to select the table `weather_analysis` from\nthe left menu, adjust the query and hit the \u003ckbd\u003e▶ Run query\u003c/kbd\u003e button. That should look like this:\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/FlorianWilhelm/wald-stack-demo/main/assets/images/lightdash-adhoc-query.png\" alt=\"Results of an ad-hoc query\" width=\"800\" role=\"img\"\u003e\n\u003c/div\u003e\n\nNow let's try to construct a chart by clicking on \u003ckbd\u003e+ New\u003c/kbd\u003e and selecting \u003ckbd\u003eQuery from tables\u003c/kbd\u003e. 
We select from\nthe left menu the table `Int lap times years` and choose the *metric* `Lap times in seconds` followed by the *dimensions* `Race name`\nand `Driver year` and filter for the race names Italian and British Grand Prix. We then hit \u003ckbd\u003eConfigure\u003c/kbd\u003e and group\nby `Race name` and also set a horizontal bar chart. The result looks like this:\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/FlorianWilhelm/wald-stack-demo/main/assets/images/lightdash-chart-grandprix.png\" alt=\"Horizontal bar chart of lap times over years for two Grands Prix\" width=\"800\" role=\"img\"\u003e\n\u003c/div\u003e\n\nIf you wonder about the concept of metrics and dimensions that dbt and Lightdash are using, you can find a [good introduction here](https://docs.lightdash.com/get-started/setup-lightdash/intro-metrics-dimensions).\n\nWe can now hit the \u003ckbd\u003eSave chart\u003c/kbd\u003e-button and save it into one of our spaces. If you don't have one yet, you can create one at that point.\nIn the chart view that appears, click on \u003ckbd\u003e...\u003c/kbd\u003e and select \u003ckbd\u003eAdd chart to dashboard\u003c/kbd\u003e. Select a dashboard or create a new one.\nNow use \u003ckbd\u003eBrowse\u003c/kbd\u003e » \u003ckbd\u003eAll dashboards\u003c/kbd\u003e to find your newly created dashboard. The following figure shows a similar dashboard with two charts\nand a small explanation box.\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/FlorianWilhelm/wald-stack-demo/main/assets/images/lightdash-dashboard.png\" alt=\"Dashboard with two charts in Lightdash\" width=\"800\" role=\"img\"\u003e\n\u003c/div\u003e\n\nThe workflow with Lightdash is that you mostly work with whatever IDE you like to create tables, metrics, and dimensions within your dbt project.\nOnce you are happy with your changes, just prepend `lightdash` to your `dbt` commands like `run`, `build`, `compile`. 
For instance, if you altered\nthe table [int_lap_times_years.sql][int_lab_times_years.sql], just run `lightdash dbt run -s int_lap_times_years` to update everything. In Lightdash you then hit\n \u003ckbd\u003e↻ Refresh dbt\u003c/kbd\u003e to load the changes.\n\n### Conclusion\n\nWe have only scratched the surface of what's possible with the WALD stack using a simple example, but we did it end to end.\nThere is much more to discover and the dbt ecosystem is growing constantly. Many established tools are also starting to integrate\nwith it. For instance, the data pipeline integration tool [dagster] also plays nicely with dbt, as shown in the [dagster dbt integration] docs.\nIf you need help with your WALD-stack or have general questions, don't hesitate to consult us at [inovex].\n\n## What else is there to see here?\n\nIn the `notebooks` directory, you'll find two notebooks that demonstrate how [dbt] as well as the\n[snowflake-connector-python] can also be used directly to execute queries, for instance for debugging. In both cases\nthe subsystems of [dbt], and thus also the retrieval of the credentials, are used so that no credentials need to be\npassed.\n\n## Typical commands\n\n### dbt\n\n* **run all models**: `dbt run`\n* **run all tests**: `dbt test`\n* **execute snapshots**: `dbt snapshot`\n* **load seed csv-files**: `dbt seed`\n* **run + test + snapshot + seed in DAG order**: `dbt build`\n* **download dependencies**: `dbt deps`\n* **generate docs and lineage**: `dbt docs generate`\n\n### Lightdash\n\n* **restart**: `docker compose -f docker-compose.yml start`\n* **stop**: `docker compose -f docker-compose.yml stop`\n* **bring down and clean volumes**: `docker compose -f docker-compose.yml down -v`\n* **lightdash CLI**: `lightdash`\n\n## References \u0026 Resources\n\u003ca name=\"references--resources\"\u003e\u003c/a\u003e\n\nThe following resources were used for this demonstration project besides the ones already mentioned:\n\n* [A Beginner’s Guide to DBT (data build tool)] by Jessica Le\n* [Snowpark for 
Python Blog Post] by Caleb Baechtold\n* [Overview Quickstart ML with Snowpark for Python] by Snowflake\n* [Advanced Quickstart ML with Snowpark for Python] by Snowflake\n* [Quickstart Data Engineering with Snowpark for Python and dbt] by Snowflake\n* [Upgrade to the Modern Analytics Stack: Doing More with Snowpark, dbt, and Python] by Ripu Jain and Anders Swanson\n* [dbt cheetsheet] by Bruno S. de Lima\n\n## Credits\n\nThe dbt, Snowpark part of this demonstration is heavily based on the [python-snowpark-formula1 repository] as well as\nthe awesome \"Advanced Analytics\" online workshop by [Hope Watson] from [dbt labs] held on January 25th, 2023. Check out\nthe similar tutorial [Generating ML-Ready Pipelines with dbt and Snowpark] by her.\n\n## ToDos\n\n- [ ] Clean up the Python code especially in the ml part.\n\n[**A**irbyte]:https://airbyte.com/\n[Airbyte]:https://airbyte.com/\n[Google BigQuery]: https://cloud.google.com/bigquery\n[Snowflake]: https://www.snowflake.com/\n[Snowpark]: https://www.snowflake.com/snowpark/\n[**L**ightdash]: https://www.lightdash.com/\n[dbt]: https://www.getdbt.com/\n[**D**BT]: https://www.getdbt.com/\n[dbt Core]: https://github.com/dbt-labs/dbt-core\n[Tableau]: https://www.tableau.com/\n[Lightdash]: https://github.com/lightdash/lightdash\n[snowflake-connector-python]: https://github.com/snowflakedb/snowflake-connector-python\n[snowflake-snowpark-python]: https://github.com/snowflakedb/snowpark-python\n[Mambaforge]: https://github.com/conda-forge/miniforge#mambaforge\n[register a 30-day free trial Snowflake account]: https://trial.snowflake.com/?owner=SPN-PID-545753\n[Snowflake's TPC-H sample database]: https://docs.snowflake.com/en/user-guide/sample-data-tpch.html\n[log into Snowflake's Snowsight UI]: https://app.snowflake.com/\n[activate Snowpark and third-party packages]: https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-packages.html\n[A Beginner’s Guide to DBT (data build tool)]: 
https://pttljessy.medium.com/a-beginners-guide-to-dbt-data-build-tool-part-4-dbt-automation-test-and-templating-3656114a4d8d\n[Upgrade to the Modern Analytics Stack: Doing More with Snowpark, dbt, and Python]: https://www.snowflake.com/blog/modern-analytics-stack-snowpark-dbt-python/\n[docker]: https://www.docker.com/\n[local deployment instructions]: https://docs.lightdash.com/get-started/setup-lightdash/install-lightdash/#deploy-locally-with-our-installation-script\n[dbt cheetsheet]: https://github.com/bruno-szdl/cheatsheets/blob/main/dbt_cheat_sheet.pdf\n[Anaconda]: https://www.anaconda.com/products/distribution\n[Python]: https://www.python.org/\n[weather data from opendatasoft]: https://public.opendatasoft.com/explore/dataset/noaa-daily-weather-data/\n[Kafka]: https://kafka.apache.org/\n[dbt's seed]: https://docs.getdbt.com/docs/build/seeds\n[Snowpark for Python Blog Post]: https://medium.com/snowflake/operationalizing-snowpark-python-part-one-892fcb3abba1\n[Overview Quickstart ML with Snowpark for Python]: https://quickstarts.snowflake.com/guide/getting_started_snowpark_machine_learning/\n[Advanced Quickstart ML with Snowpark for Python]: https://quickstarts.snowflake.com/guide/machine_learning_with_snowpark_python\n[Quickstart Data Engineering with Snowpark for Python and dbt]: https://quickstarts.snowflake.com/guide/data_engineering_with_snowpark_python_and_dbt\n[Kaggle Formula 1 World Championship dataset]: https://www.kaggle.com/datasets/rohanrao/formula-1-world-championship-1950-2020\n[python-snowpark-formula1 repository]: https://github.com/dbt-labs/python-snowpark-formula1/\n[Hope Watson]: https://www.linkedin.com/in/hopewatson/\n[dbt labs]: https://www.getdbt.com/\n[ingest_formula1_from_s3_to_snowflake.sql]: ./setup_scripts/ingest_formula1_from_s3_to_snowflake.sql\n[Scikit-Learn]: https://scikit-learn.org/\n[models/marts/aggregates]: ./models/marts/aggregates/\n[models/marts/ml]: ./models/marts/ml/\n[how-to-install-the-lightdash-cli]: 
https://docs.lightdash.com/guides/cli/how-to-install-the-lightdash-cli\n[int_lab_times_years.sql]: ./models/intermediate/int_lap_times_years.sql\n[pre-commit]: https://pre-commit.com/\n[weather-analsysis]: ./models/marts/core/weather_analysis.sql\n[dagster]: https://dagster.io/\n[dagster dbt integration]: https://docs.dagster.io/integrations/dbt\n[inovex]: https://www.inovex.de/en/\n[Colima]: https://github.com/abiosoft/colima\n[slides]: assets/slides/inovex-wald-stack-pycon-pydata-small.pdf\n[PyConDE / PyData talk about the WALD Stack]: https://pretalx.com/pyconde-pydata-berlin-2023/talk/TP7ABB/\n[Generating ML-Ready Pipelines with dbt and Snowpark]: https://www.youtube.com/watch?v=K9nVAaLTAIM\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflorianwilhelm%2Fwald-stack-demo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fflorianwilhelm%2Fwald-stack-demo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflorianwilhelm%2Fwald-stack-demo/lists"}