{"id":28471001,"url":"https://github.com/datnguye/sqlmesh-demo","last_synced_at":"2025-07-01T20:32:22.071Z","repository":{"id":190748345,"uuid":"683277906","full_name":"datnguye/sqlmesh-demo","owner":"datnguye","description":"POC of sqlmesh with Jaffle Shop","archived":false,"fork":false,"pushed_at":"2023-09-05T03:24:26.000Z","size":399,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-07T10:07:26.350Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datnguye.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":"audits/assert_positive_order_ids.sql","citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-08-26T04:06:15.000Z","updated_at":"2024-05-13T06:44:29.000Z","dependencies_parsed_at":"2023-08-26T06:51:04.247Z","dependency_job_id":"8d28122a-7c36-4746-b257-6ae3db028580","html_url":"https://github.com/datnguye/sqlmesh-demo","commit_stats":null,"previous_names":["datnguye/sqlmesh-demo"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/datnguye/sqlmesh-demo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datnguye%2Fsqlmesh-demo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datnguye%2Fsqlmesh-demo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datnguye%2Fsqlmesh-demo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datnguye%2Fsqlmesh-demo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datnguye","download_url":"https://codel
oad.github.com/datnguye/sqlmesh-demo/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datnguye%2Fsqlmesh-demo/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263033079,"owners_count":23403090,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-07T10:07:26.876Z","updated_at":"2025-07-01T20:32:22.045Z","avatar_url":"https://github.com/datnguye.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# sqlmesh-demo\n\nPOC of `sqlmesh` with Jaffle Shop data, as an engineer trying to step out of the `dbt` world!\n\nActually trying to mimic the [dbt Jaffle Shop project](https://github.com/dbt-labs/jaffle_shop) and get a first impression of this cool tool 🌟\n\n_Environment used_:\n\n- sqlmesh v0.30\n- postgres 15 as the gateway\n- Windows 11\n- VSCode with dbt extensions installed\n\n👉 **Quick final look of this POC:** 👈\n\n![CLL](assets/CLL-sqlmesh.png)\n\n## 1. Getting familiar with sqlmesh CLI\n\n```bash\n# 1. Create virtual environment \u0026 Activate it\npython -m venv .env\n.\\.env\\Scripts\\activate.bat\n\n\n# 2. Install sqlmesh and the dependencies if any\npip install -r requirements.txt # sqlmesh\nsqlmesh --version # 0.28.0\nsqlmesh --help\n    # Usage: sqlmesh [OPTIONS] COMMAND [ARGS]...\n\n    #   SQLMesh command line tool.\n\n    # Options:\n    #   --version          Show the version and exit.\n    #   -p, --paths TEXT   Path(s) to the SQLMesh config/project.\n    #   --config TEXT      Name of the config object. 
Only applicable to\n    #                      configuration defined using Python script.\n    #   --gateway TEXT     The name of the gateway.\n    #   --ignore-warnings  Ignore warnings.\n    #   --help             Show this message and exit.\n\n    # Commands:\n    #   audit                   Run audits.\n    #   create_external_models  Create a schema file containing external model...\n    #   dag                     Renders the dag using graphviz.\n    #   diff                    Show the diff between the current context and a...\n    #   evaluate                Evaluate a model and return a dataframe with a...\n    #   fetchdf                 Runs a sql query and displays the results.\n    #   format                  Format all models in a given directory.\n    #   ide                     Start a browser-based SQLMesh IDE.\n    #   info                    Print information about a SQLMesh project.\n    #   init                    Create a new SQLMesh repository.\n    #   invalidate              Invalidates the target environment, forcing its...\n    #   migrate                 Migrate SQLMesh to the current running version.\n    #   plan                    Plan a migration of the current context's...\n    #   prompt                  Uses LLM to generate a SQL query from a prompt.\n    #   render                  Renders a model's query, optionally expanding...\n    #   rollback                Rollback SQLMesh to the previous migration.\n    #   run                     Evaluates the DAG of models using the built-in...\n    #   table_diff              Show the diff between two tables.\n    #   test                    Run model unit tests.\n    #   ui                      Start a browser-based SQLMesh UI.\n\n# 3. 
Initialize the project skeleton with Postgres dialect, and do the very first runs\nsqlmesh init postgres\n    # (repo)\n    # ├───audits\n    # │       assert_positive_order_ids.sql\n    # ├───macros\n    # │       __init__.py\n    # ├───models\n    # │       full_model.sql\n    # │       incremental_model.sql\n    # │       seed_model.sql\n    # ├───seeds\n    # │       seed_data.csv\n    # └───tests\n    #         test_full_model.yaml\nsqlmesh info\n    # Models: 3\n    # Macros: 11\n    # Data warehouse connection succeeded\n    # Test connection succeeded\nsqlmesh plan [dev]\n    # ======================================================================\n    # Successfully Ran 1 tests against duckdb\n    # ----------------------------------------------------------------------\n    # New environment `prod` will be created from `prod`\n    # Summary of differences against `prod`:\n    # └── Added Models:\n    #     ├── sqlmesh_example.seed_model\n    #     ├── sqlmesh_example.incremental_model\n    #     └── sqlmesh_example.full_model\n    # Models needing backfill (missing dates):\n    # ├── sqlmesh_example.seed_model: 2023-08-25 - 2023-08-25\n    # ├── sqlmesh_example.incremental_model: 2020-01-01 - 2023-08-25\n    # └── sqlmesh_example.full_model: 2020-01-01 - 2023-08-25\n    # Apply - Backfill Tables [y/n]: y\n    # Creating new model versions ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 3/3 • 0:00:00\n    # All model versions have been created successfully\n    # Evaluating models ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 3/3 • 0:00:00\n    # All model batches have been executed successfully\n    # Virtually Updating 'prod' ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 0:00:00\n    # The target environment has been updated successfully\nsqlmesh run\n    # No models scheduled to run at this time.\nsqlmesh audit\n    # Found 1 audit(s).\n    # assert_positive_order_ids PASS.\n    # Finished with 0 audit errors and 0 audits skipped.\n    # 
Done.\nsqlmesh test\n    # .\n    # ----------------------------------------------------------------------\n    # Ran 1 test in 0.162s\n    # OK\n```\n\n**First impressions**:\n\n- So far so good, the installation took quite a long time, especially when installing `pandas` 🏃‍♂️\n- Lots of useful CLI commands 👍\n- New concept with `plan` \u0026 `apply`, and the virtual environment defaults to `prod` 👍\n- It creates DuckDB files by default and does the transformation smoothly 🎉\n- Project skeleton looks similar to dbt, but not quite, there are new things such as: `audit`, only `config.yml` (not dealing with `dbt_project.yml` and `profiles.yml`) 🎉\n- Everything is based on `model` e.g. for a seed file we need to create a corresponding model.sql file, same for source files 👀\n- No model global config 👀\n- Each model has its own config (kind, cron, grain) and a `SELECT` statement, a similar idea to dbt, but no Jinja syntax here! 🎉\n- Great Web IDE with data lineage 🎉\n- Oh wow! `sqlmesh prompt` command - LLM 🎉\n- No package ecosystem, but it seems to rely on Python's own ecosystem, which is huge 💯\n\n**Struggling?!**:\n\n- How to run a specific model and debug the compiled sql code?\n  - The `sqlmesh run` command only allows running everything with a date range\n  - The `sqlmesh plan` command seems to be the same. Oh! `sqlmesh plan --select-model \u003cmodel\u003e` will do\n  - The `sqlmesh render \u003cmodel\u003e` command seems to be helpful to see the compiled SQL code\n  - The `sqlmesh evaluate \u003cmodel\u003e` command is my goal here, voila!\n- How to generate the project documentation site and host it on GitHub Pages? With the `sqlmesh ui` command? It seems impossible as of v0.28?!\n- What are the steps we should perform in CI/CD? I will find out later!\n\n## 2. 
Model development and Testing\n\n- **Adding seeds \u0026 models**\n  - It is not so quick to add the seeds because I need to create the corresponding model files and explicitly specify the list of columns and datatypes 😢\n  - `sqlmesh evaluate raw_customers` produces an error \"Cannot find snapshot for 'raw_customers'\" 😢\n    - Oh! I need to pass in the model name! So, I should run `sqlmesh evaluate jf.raw_customers` ✅\n  - To see the compiled sql code, let's use `sqlmesh render jf.stg_customers` 👍\n  - Try to duplicate the model name within the model config, and `sqlmesh plan` will complain: `Error: Duplicate key 'jf.raw_customers' found in UniqueKeyDict\u003cmodels\u003e. Call dict.update(...) if this is intentional.` 👍\n  - Auto-completion works very nicely when coding the model 🎉\n  - A built-in linter with `sqlmesh format`, hmm...the result does not look as great as I imagined, but still looks ok!👍\n  - No incremental run or full refresh run; it deals with a date range or a full load by default 👍\n  - Great readability experience with zero Jinja code 🎉\n  - Not a great experience with CLI typing because the command is long, easy to mistype and hence takes time 👎, but it is fine with the Web IDE 💕\n  - Aha! The semicolon is an important bit here for the parts not related to SQL e.g. model config, macros 👀\n  - [Loop or Control flow ops](https://sqlmesh.readthedocs.io/en/stable/concepts/macros/sqlmesh_macros/#control-flow-operators) is cool even if it takes some time to get familiar with 👍\n    - Found a limitation: `@EACH(@payment_methods, x -\u003e ... as @x_amount)` will fail, but works like a charm when changed to `@EACH(@payment_methods, x -\u003e ... 
as amount_@x)` ⚠️\n      - Let's write a sqlmesh macro for that ✅:\n\n      ```python\n      from sqlmesh import macro\n\n      @macro()\n      def make_order_amounts(\n          evaluator,\n          payment_method_values=[],\n          column__payment_method: str = \"payment_method\",\n          column__amount: str = \"amount\",\n      ):\n          # Each item is a string literal expression; item.name is its unquoted value\n          return [\n              f\"\"\"SUM(\n                  CASE \n                      WHEN {column__payment_method} = {item}\n                          THEN {column__amount}\n                      ELSE 0\n                  END\n              ) AS {item.name}_amount\n              \"\"\"\n              for item in payment_method_values.expressions\n          ]\n      ```\n\n      ```sql\n      @DEF(payment_methods, ARRAY['credit_card', 'coupon', 'bank_transfer', 'gift_card']);\n\n      select @make_order_amounts(@payment_methods)\n      ...\n      ```\n\n  - It seems the CLI logs are not exposed anywhere - hard to debug when something goes wrong 🤔\n    - There we go! Let's set the env variable `SQLMESH_DEBUG=1` ✅\n  - In the model kind of `INCREMENTAL_BY_UNIQUE_KEY`, the `unique_key` config is a tuple e.g. `(key1, key2)`, if I write it as an array `[key1, key2]`, it hangs the `sqlmesh` command(s) ⚠️\n  - Plan \u0026 Apply in Postgres randomly fails with an error message ❗, doing it again succeeds ⚠️\n\n    ```bash\n    Failed to create schema 'jf': duplicate key value violates unique constraint \"pg_namespace_nspname_index\"\n    DETAIL:  Key (nspname)=(jf) already exists.\n\n    Failed to create schema 'jf': duplicate key value violates unique constraint \"pg_namespace_nspname_index\"\n    DETAIL:  Key (nspname)=(jf) already exists.\n    ```\n\n    - This is related to the concurrency config (defaults to 4 tasks), definitely an issue with the Postgres adapter, but we can work around it by setting it to `1` ✅\n  - It takes some time to get familiar with the new concept [Virtual Environment](https://tobikodata.com/virtual-data-environments.html). 
Several schemas get created within the `plan` operation, including: your configured schema \u0026 the environment schemas 👍\n    - Each env schema physically looks like this:\n      - Schema name is auto-prefixed by `sqlmesh__` e.g. `sqlmesh__jf` 👀\n      - Table name is auto-prefixed by the configured schema name, and auto-suffixed with a hash e.g. `jf__orders__1347386500` 👀\n  - Recommended to [join Slack community](https://tobikodata.com/slack), the team was very supportive when I asked questions🙏\n\n- **Adding audits and tests**:\n  - Audit is reusable - `@this_model` represents the attached model 🎉\n    - Let's get familiar with the `audit` command e.g. `sqlmesh audit --model jf.orders` 🏃\n    - I need to add it to each model config 👀\n    - It is NOT fully reusable if the column name varies ⚠️\n      - NO! Actually it allows parameterizing the audit with a variable e.g. `@column is null` and then in the model config using `audits [assert_not_null(column=order_id)]`\n    - Similar idea to dbt singular test - write the sql to fail the case 👍\n      - I can join with other models if needed ✅\n    - Modifying an audit requires applying a plan first ❓\n    - Quite easy to create the equivalents of dbt generic tests as audits: `not null`, `unique`, `accepted values`, `relationships` 👍\n      - Actually they have those [built-in audits here](https://sqlmesh.readthedocs.io/en/stable/concepts/audits/#built-in-audits) 🎉\n    - The log output of audit does not seem to mention the attached model ❓\n\n      ```log\n      (.env) C:\\Users\\DAT\\Documents\\Sources\\sqlmesh-demo\u003esqlmesh audit                  \n      Found 21 audit(s).\n      assert_positive_order_ids PASS.\n      assert_not_null PASS.\n      assert_unique PASS.\n      assert_not_null PASS.\n      assert_not_null PASS.\n      assert_not_null PASS.\n      assert_not_null PASS.\n      assert_not_null PASS.\n      assert_not_null PASS.\n      assert_not_null PASS.\n      assert_accepted_values PASS.\n      
assert_unique PASS.\n      assert_relationships PASS.\n      assert_not_null PASS.\n      assert_unique PASS.\n      assert_not_null PASS.\n      assert_unique PASS.\n      assert_accepted_values PASS.\n      assert_not_null PASS.\n      assert_unique PASS.\n      assert_accepted_values PASS.\n\n      Finished with 0 audit errors and 0 audits skipped.\n      Done.\n      ```\n\n  - Test is really `unit test` (with `duckdb` dialect) 🎉\n    - I need to add yml file(s) to the `(repo)/tests` directory 👀\n    - Might take time to implement because it requires manually faking the data: both input and output ⚠️\n    - Let's get familiar with the `test` command e.g. `sqlmesh test -k test_full` 🏃\n    - There is a risk that new SQL syntax (in the modern DWH) might not be supported in DuckDB ⚠️\n\n- **Additional stuff**:\n  - DRY with common functions\n    - Let's try [Python macro](https://sqlmesh.readthedocs.io/en/stable/concepts/macros/sqlmesh_macros/#python-macro-functions) 👀\n      - So far it's perfect 👍 until I tried passing List arguments -- it just hangs ⚠️\n        - Oh! 
The docs are out of date, the team has advised the correct syntax ✅:\n\n        ```python\n        from sqlmesh import macro\n\n        @macro()\n        def make_indicators(evaluator, string_column, string_values):\n            return [\n                f\"CASE WHEN {string_column} = {value} THEN {value} ELSE NULL END AS {string_column.name}_{value.name}\"\n                for value in string_values.expressions\n            ]\n        ```\n\n      - The syntax takes a while to get familiar with (same experience as when I started writing Jinja) but the readability is better ✅\n  - Column Level Security (aka Masking Policy), for example, in Snowflake ❓\n    - Let's try [Pre/Post Statement](https://sqlmesh.readthedocs.io/en/stable/concepts/models/seed_models/#pre-and-post-statements) or [Statement](https://sqlmesh.readthedocs.io/en/stable/concepts/models/overview/#statements), better to get an understanding of the [Model Concept](https://sqlmesh.readthedocs.io/en/stable/concepts/models/sql_models/#model-ddl) first 👀\n      - Voila I successfully managed it with sqlmesh macro + pre/post statement ✅\n      - Here is a sample:\n        - `(repo)/macros/snow-mask-ddl/schema_name.masking_func_name.sql` -- multiple masking funcs created in the `snow-mask-ddl` dir\n\n        ```sql\n        CREATE MASKING POLICY IF NOT EXISTS @schema.mp_customer_name AS (\n            masked_column string\n        ) RETURNS string -\u003e\n            CASE\n                WHEN CURRENT_ROLE() IN ('ANALYST') THEN masked_column\n                WHEN CURRENT_ROLE() IN ('SYSADMIN') THEN SHA2(masked_column)\n                ELSE '**********'\n            END;\n        ```\n\n        - 2 main macros to create \u0026 apply the func:\n\n        ```python\n        import os\n        from sqlmesh import macro\n\n        @macro()\n        def apply_masking_policy(evaluator, column: str, func: str, params: str):\n            param_list = str(params).split(\"|\")\n            return \"\"\"\n                INSERT INTO {table}(id) VALUES ('{value}') --faking 
sql here\n                \"\"\".format(\n                table=str(func), value=f\"{column}-applied-{func}-with-{','.join(param_list)}\"\n            )\n\n        @macro()\n        def create_masking_policy(evaluator, func: str):\n            ddl_dir = os.path.dirname(os.path.realpath(__file__))\n            ddl_file = f\"{ddl_dir}/snow-mask-ddl/{func}.sql\"\n            func_parts = str(func).split(\".\")\n            assert len(func_parts) == 2\n\n            schema = func_parts[0]\n            with open(ddl_file, \"r\") as file:\n                content = file.read()\n                return content.replace(\"@schema\", schema)\n        ```\n\n        - And, use it in the model\n\n        ```sql\n        MODEL (\n          name jf.customers,\n          ...\n        );\n\n        @create_masking_policy(common.mp_customer_name);\n\n        /model sql code here/;\n\n        @apply_masking_policy(last_name, common.mp_customer_name, first_name | last_name)\n        ```\n\n  - Metadata analysis\n    - State gets stored into the gateway database under the schema named `sqlmesh` by default 👍\n      - Snapshot gets stored in the `_snapshots` table, one row per model \u0026 version 👍\n      - Seed gets stored in the `_seeds` table, which contains all seed data in a column 👀 -- this will definitely have a size limitation\n      - Model schedule gets stored in the `_intervals` table 👀\n      - Run time per model or per run -- cannot find the info ❓\n  \n  - SQLMesh macros in another Python package -- seems not supported ❓\n  - Column Level Lineage, yes it is available in the `docs` site by using the `sqlmesh ui` command 🎉\n\n## 3. 
Set up CI/CD\n\nCI/CD Bot is available for GitHub PRs, let's try [getting started](https://sqlmesh.readthedocs.io/en/stable/integrations/github/#getting-started) 🚀\n\nCurrently only GitHub Actions is supported ⚠️\n\n- Only 1 job for the CI/CD Bot is defined in the workflow; when triggered, the bot will run 4 additional jobs:\n  - SQLMesh Unit Test\n    - Behind the scenes, it is `sqlmesh test` ✅\n  - SQLMesh Has Required Approved\n    - If your pipeline has approval configs, it will require an approval before doing data gapless deployments to production ✅\n  - SQLMesh PR Environment Synced\n    - Behind the scenes, it creates/updates the PR Environment `sqlmesh_demo_{pull_request_number}` ✅\n    - Not sure what the command behind it is, guessing `sqlmesh plan sqlmesh_demo_{pull_request_number} --auto-apply` 👀\n  - SQLMesh Prod Environment Synced\n    - Does the deployment to production, recommended to [add the approval setting](https://sqlmesh.readthedocs.io/en/stable/integrations/github/#enforce-that-certain-reviewers-have-approved-of-the-pr-before-it-can-be-merged) 👍\n    - Requires 2 bits: ✅\n      - Approval (optional)\n      - SQLMesh PR Environment Synced done successfully\n\n👉 Check a sample [PR's pipeline](https://github.com/datnguye/sqlmesh-demo/actions/runs/6071256412) here\n\n**Happy Engineering** 🎉\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatnguye%2Fsqlmesh-demo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatnguye%2Fsqlmesh-demo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatnguye%2Fsqlmesh-demo/lists"}