{"id":21514932,"url":"https://github.com/getindata/dbt-intro","last_synced_at":"2026-03-19T20:41:51.063Z","repository":{"id":137895171,"uuid":"485665486","full_name":"getindata/dbt-intro","owner":"getindata","description":"Introductory repository to dbt with the use of data-pipelines-cli  Topics Resources","archived":false,"fork":false,"pushed_at":"2022-04-26T06:45:55.000Z","size":6,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-01-24T02:31:06.576Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/getindata.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-04-26T06:44:45.000Z","updated_at":"2022-04-28T11:26:14.000Z","dependencies_parsed_at":null,"dependency_job_id":"3ee012ce-1680-4d90-a36d-bd717d355dbe","html_url":"https://github.com/getindata/dbt-intro","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getindata%2Fdbt-intro","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getindata%2Fdbt-intro/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getindata%2Fdbt-intro/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getindata%2Fdbt-intro/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/getindata","download_url":"https://codeload.github.co
m/getindata/dbt-intro/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244066192,"owners_count":20392407,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-23T23:53:37.484Z","updated_at":"2026-01-04T02:06:11.292Z","avatar_url":"https://github.com/getindata.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"## dbt-resources\n\nThis section contains resources that will help you build your dbt knowledge.\nIdeal for starting your new dbt adventure!\n\nContents:\n1. What is dbt and why are companies using it?\n  https://seattledataguy.substack.com/p/what-is-dbt-and-why-are-companies?s=r\n2. Hacker News discussion on dbt (January 2022)\n  https://news.ycombinator.com/item?id=29424445\n3. Up \u0026 Running: data pipeline with BigQuery and dbt\n  https://getindata.com/blog/up-running-data-pipeline-bigquery-dbt/\n4. Overview of testing options with dbt\n  https://datacoves.com/post/an-overview-of-testing-options-in-dbt-data-build-tool\n5. Integrating Airflow and dbt\n  https://www.astronomer.io/guides/airflow-dbt/\n6. Auto-generating an Airflow DAG using the dbt manifest\n  https://engineering.autotrader.co.uk/2021/09/15/auto-generated-airflow-dag-for-dbt.html\n7. Creating a dbt project on Windows\n  https://www.youtube.com/watch?v=5rNquRnNb4E\n8. 5 tips to improve your dbt project\n  https://www.youtube.com/watch?v=qOx8l_QFz9I\n9. The future of the modern data stack (December 2020)\n  https://blog.getdbt.com/future-of-the-modern-data-stack/\n10. 
dbt Official Documentation\n  https://docs.getdbt.com/docs/introduction\n\n# Exercise\n## Setting up the environment\n1. Go to: https://console.cloud.google.com/vertex-ai/workbench/list/instances?project=dataops-demo-342817\n2. If you don't see the project or you see an error, click the project selector to the right of the Google Cloud Platform logo, type `dataops-demo-342817` and select it.\n\n![Screenshot 2022-04-25 at 22 34 56](https://user-images.githubusercontent.com/77925576/165170378-c7ed628d-4f5c-4d30-be2c-0aaca3ae08a1.png)\n\n3. Click New Notebook in the top bar, then \"Customize...\".\n![Screenshot 2022-04-25 at 22 33 26](https://user-images.githubusercontent.com/77925576/165170160-a08af36a-d022-4c5d-b5cd-a181576a6f76.png)\n\n4. Type a notebook name (preferably your name). In the Environment section, choose Debian 10 and \"Custom container\".\n5. Provide a link to the image: `gcr.io/getindata-images-public/jupyterlab-dataops:bigquery-1.0.5`\n![Screenshot 2022-04-25 at 22 42 09](https://user-images.githubusercontent.com/77925576/165171403-93633875-3f5c-429c-a40a-014a863cd10d.png)\n\n6. In the Machine configuration section, choose the n1-standard-2 machine type (2 vCPUs, 7.5 GB RAM, ~0.074 USD/hour).\n7. Leave everything else at the defaults.\n8. Create the Jupyter notebook.\n9. Wait until it is configured, then click Open JupyterLab.\n\nYou can find the full documentation of our GID Data Platform Tool at https://github.com/getindata/data-pipelines-cli and https://data-pipelines-cli.readthedocs.io/en/latest/index.html\n\n## Inside the notebook with data-pipelines-cli\nYou are now inside a managed Vertex AI Workbench instance, which will serve as the development environment for our transformations. This image lets you open:\n- a VSCode instance\n- CloudBeaver, an open-source SQL IDE\n- dbt docs\n- a Python 3 interactive terminal\n\n1. Now, open a VSCode instance. 
At the top, click 'Explore' and open your home directory, so you can easily create new files and track changes to directories inside VSCode.\n  \n  \u003e-\u003e Tip: In the toolbar, click 'Explore' and then 'Open Folder'. Click OK. You should now be in the `jovyan` directory.\n  \n![Screenshot 2022-04-25 at 22 59 10](https://user-images.githubusercontent.com/77925576/165173963-c2aaa4c9-d68b-4709-8ddf-1e1c63f79fe6.png)\n\n2. Open a new terminal instance.\n\n![Screenshot 2022-04-25 at 23 01 27](https://user-images.githubusercontent.com/77925576/165174292-ed5b1cc0-0516-40ec-89f9-aa6de7de833f.png)\n\n3. Go to the work directory with `cd work` and run `dp init https://github.com/getindata/data-pipelines-cli-init-example`. This initializes data-pipelines-cli in the environment. Provide any username when prompted.\n  \n  \u003e-\u003e Tip: the first time you copy and paste, Chrome may ask for permission to access your clipboard. Accept it.\n\n4. Run `dp create .`. This command creates a full data-pipelines-cli environment with a dbt project at its core. IMPORTANT: provide __dataops-demo-342817__ as the GCP project name.\n  \n  \u003e-\u003e Tip: when prompted, you can simply press ENTER to accept the default values. Don't do this for the GCP Project ID!\n  \n  \u003e-\u003e Tip: use underscores (_) in names, not hyphens or spaces.\n  \n  \u003e-\u003e Tip: example of provided values:\n  \n  ![Screenshot 2022-04-25 at 23 08 50](https://user-images.githubusercontent.com/77925576/165175393-660a9fec-9a07-4179-93bd-abd337f9d285.png)\n\n5. Run `git init`. The data-pipelines-cli tool is tightly coupled with CI/CD, so we need to initialize a git repository, even though we won't use CI/CD in this exercise.\n6. Run these commands in the following order:\n   - `git add .`\n   - `git config --global user.email \"you@example.com\"`\n   - `git config --global user.name \"Your Name\"`\n   - `git commit -m 'Initial'`\n7. Your environment is now ready to run some dbt code!\n\n## Running dbt transformations\n1. 
Set up some seeds to load your static data into the warehouse. Provide a .csv file with the actual data to be loaded and a .yml file with its definition. Put them under the `seeds` directory; you can create subdirectories inside `seeds` for clarity.\n  \n  \u003e-\u003e Tip: you can find the documentation on seeds at https://docs.getdbt.com/docs/building-a-dbt-project/seeds\n\n2. Next, set up data sources under the `models` directory; they act as the starting point for your transformations. Look up the table names in BigQuery under the `raw_data` schema.\n  \n  \u003e-\u003e Tip: at any point in this tutorial, you can run `dp seed`, `dp run` and `dp test` to see how your pipelines behave against the database.\n  \n  \u003e-\u003e Tip: run `dp --help` to see a list of available commands.\n\n3. Add tests in .yml files based on patterns you see in the data (please do this in real-life scenarios too!). Look for columns that should be unique and not null (dbt's built-in `unique` and `not_null` tests).\n  \n  \u003e-\u003e Tip: you can find the documentation on tests at https://docs.getdbt.com/docs/building-a-dbt-project/tests\n\n4. Write your models inside the `models` directory. You can create subdirectories there; a good practice is to group models by the schema names you want to have. Put their tests in .yml files.\n  \n  \u003e-\u003e Tip: you can find the documentation on models at https://docs.getdbt.com/docs/building-a-dbt-project/building-models\n\n  \u003e-\u003e Tip: ideas for transformations based on the example data:\n  \u003e\n  \u003e - map the identifiers found in `raw_mapping.country` to real country names\n  \u003e - find out which country had the highest total sales\n  \u003e - calculate monthly revenue\n\n5. Run everything and check the results in your personal schema.\n\n6. Enrich your seeds, sources and models with descriptions and additional tests, e.g. with the dbt-expectations package: 
https://github.com/calogica/dbt-expectations\n\n7. Run `dp docs-serve` in the terminal, then open dbt docs in a new Vertex AI Workbench window.\n\n![Screenshot 2022-04-25 at 23 32 24](https://user-images.githubusercontent.com/77925576/165178605-707da95b-ebee-4e11-a495-ed27c3fb1c14.png)\n\n8. In dbt docs, open the 'Lineage Graph' to see the DAG of your new project:\n\n![Screenshot 2022-04-25 at 23 33 45](https://user-images.githubusercontent.com/77925576/165178762-2d1a9222-8051-4a1e-9640-17ef2d77d02f.png)\n\n![Screenshot 2022-04-25 at 23 34 51](https://user-images.githubusercontent.com/77925576/165178936-88c02bf2-1e27-4615-92cf-2612a928a5cd.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgetindata%2Fdbt-intro","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgetindata%2Fdbt-intro","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgetindata%2Fdbt-intro/lists"}