{"id":24087987,"url":"https://github.com/tinybirdco/mv_example","last_synced_at":"2025-10-26T21:02:55.653Z","repository":{"id":113972959,"uuid":"451783789","full_name":"tinybirdco/MV_example","owner":"tinybirdco","description":null,"archived":false,"fork":false,"pushed_at":"2022-01-25T07:54:04.000Z","size":105,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-06-26T18:43:58.641Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tinybirdco.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-01-25T07:49:49.000Z","updated_at":"2022-01-25T07:49:49.000Z","dependencies_parsed_at":null,"dependency_job_id":"2e77407a-70ac-446c-854f-6d7df682ede0","html_url":"https://github.com/tinybirdco/MV_example","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tinybirdco/MV_example","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2FMV_example","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2FMV_example/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2FMV_example/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2FMV_example/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tinybirdco","download_url":"https://codeload.github.com/tinybirdco/MV_exam
ple/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2FMV_example/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281171978,"owners_count":26455541,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-26T02:00:06.575Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-10T03:56:52.772Z","updated_at":"2025-10-26T21:02:55.624Z","avatar_url":"https://github.com/tinybirdco.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# MVs example\n\nWe have created this repo to show some things you can do with your data in Tinybird to boost the performance of your queries.\n\nIn this use case, we have assumed that the goal is to compute the monthly billing, with the ability to filter by workspace_id, user_id and date.\n\nSince we don't have solid knowledge of the data and its meaning, we have made the following assumptions:\n- The billing can be calculated using the column *totalBalance*\n- We only need to work with the columns *workspaceId*, *userId*, *ts* and *totalBalance*\n\n## Approaches\n\nWhen dealing with a huge amount of data, you need to take advantage of tools such as materialized views (MVs). 
Using them will give your queries much better performance.\nYou can read more about materialized views and how to use them in these guides:\n- [Create Materialized Views with Transformation Pipes](https://guides.tinybird.co/guide/materialized-views)\n- [Calculating data on ingestion with Materialized Views](https://guides.tinybird.co/guide/materialized-columns)\n\nWe have defined two different approaches to solve this use case so that you can see which one better fits your needs.\n\n#### 1. Materializing the monthly billing using SummingMergeTree\n\nIn this first approach we will use the MV to store the data pre-computed per month.\n\nTo do that, we have created the pipe *mv_grouped_data_sum*:\n```\nNODE retrieve\nSQL \u003e\n\n    SELECT\n        workspaceId,\n        userId,\n        toStartOfMonth(ts) AS month,\n        sum(totalBalance) AS month_balance\n    FROM usage_1\n    GROUP BY\n        workspaceId,\n        userId,\n        month\n\nTYPE materialized\nDATASOURCE ds_mv_grouped_data_sum\n```\n\n\nThat way, we will be saving the data in the datasource *ds_mv_grouped_data_sum*, whose schema will be:\n\n```\nSCHEMA \u003e\n    `workspaceId` String,\n    `userId` String,\n    `month` Date,\n    `month_balance` Nullable(Int64)\n\nENGINE \"SummingMergeTree\"\nENGINE_PARTITION_KEY \"toYYYYMM(month)\"\nENGINE_SORTING_KEY \"workspaceId, userId, month\"\n```\nSince the data is already stored pre-computed, queries over it will be much faster.\n\nTo retrieve the monthly billing for a specific workspace we just have to use the pipe called *api_grouped_data_sum* where:\n- *workspaceId* and *month* are parameters (passed as *workspace_id* and *month_selected*)\n- you can optionally filter by *userId* (passed as *user_id*)\n\n```\n%\nSELECT\n    workspaceId,\n    {% if defined(user_id) %}\n        userId,\n    {% end %}\n    month,\n    sum(month_balance) as month_balance\nFROM ds_mv_grouped_data_sum\nwhere workspaceId = {{String(workspace_id, 
'00015f9b-6ca4-4c17-b42a-4c63b31eb78c')}}\nand month = {{String(month_selected, '2021-12-01')}}\n{% if defined(user_id) %}\n    and userId = {{String(user_id, 'UfTzY4mFAbSrTNDEN0gvXBVSXjz1')}}\n{% end %}\nGROUP BY\n    workspaceId,\n    {% if defined(user_id) %}\n        userId,\n    {% end %}\n    month\n```\n\nThe Data Flow defined for this case is the following:\n\n![data flow sum month](img/month_sum.png)\n\u003e Data Flow created to compute the monthly billing using SummingMergeTree\n\n#### 2. Materializing the daily billing using SummingMergeTree\n\nIn this second approach we will use the MV to store the data pre-computed per day.\nThis approach is more flexible, since you can query the data by day, by a selected range of dates or by month.\n\nTo do that, we have created the pipe *mv_grouped_data_sum_daily*:\n```\nNODE retrieve\nSQL \u003e\n\n    SELECT\n        workspaceId,\n        userId,\n        toDate(ts) AS day,\n        sum(totalBalance) AS daily_balance\n    FROM usage_1\n    GROUP BY\n        workspaceId,\n        userId,\n        day\n\nTYPE materialized\nDATASOURCE ds_mv_grouped_data_sum_daily\n```\n\nThat way, we will be saving the data in the datasource *ds_mv_grouped_data_sum_daily*, whose schema will be:\n\n```\nSCHEMA \u003e\n    `workspaceId` String,\n    `userId` String,\n    `day` Date,\n    `daily_balance` Nullable(Int64)\n\nENGINE \"SummingMergeTree\"\nENGINE_PARTITION_KEY \"toYYYYMM(day)\"\nENGINE_SORTING_KEY \"workspaceId, userId, day\"\n```\n\nSince the data is stored pre-computed by day, if you want to compute the billing for a range of dates or per month, the final aggregation will be computed \"on the fly\" at query time. 
Therefore, it's expected to be a bit slower than approach 1; that's the trade-off for more flexibility (but don't worry, the difference is a matter of milliseconds!)\n\nTo retrieve the daily billing for a specific workspace we just have to use the pipe called *api_grouped_data_sum_daily*, where *workspaceId* and *day* are parameters. Additionally, you can optionally filter by *userId*:\n\n```\nNODE daily_bill_per_WS\nSQL \u003e\n\n    %\n    SELECT\n        workspaceId,\n        {% if defined(user_id) %}\n            userId,\n        {% end %}\n        day,\n        sum(daily_balance) as daily_balance\n    FROM ds_mv_grouped_data_sum_daily\n    where workspaceId = {{String(workspace_id, '00015f9b-6ca4-4c17-b42a-4c63b31eb78c')}}\n    and day = {{String(day_selected, '2021-12-15')}}\n    {% if defined(user_id) %}\n        and userId = {{String(user_id, 'UfTzY4mFAbSrTNDEN0gvXBVSXjz1')}}\n    {% end %}\n    GROUP BY\n        workspaceId,\n        {% if defined(user_id) %}\n            userId,\n        {% end %}\n        day\n```\n\nIn case we want to retrieve the billing for a specific workspace over a range of dates, we have to use the pipe called *api_grouped_data_sum_range*, where *workspaceId* and the start and end days of the range are parameters:\n\n```\nNODE range_bill_per_WS\nSQL \u003e\n\n    %\n    SELECT\n        workspaceId,\n        {% if defined(user_id) %}\n            userId,\n        {% end %}\n        sum(daily_balance) as daily_balance\n    FROM ds_mv_grouped_data_sum_daily\n    where workspaceId = {{String(workspace_id, '002a29aa-8ae1-4bde-9835-89b62ccf9558')}}\n    and day between {{String(day_start, '2021-12-05')}} AND {{String(day_end, '2021-12-15')}}\n    {% if defined(user_id) %}\n        and userId = {{String(user_id, 'xn10QBBYPLRgvoelDITwL0f0Dw13')}}\n    {% end %}\n    GROUP BY\n        workspaceId\n        {% if defined(user_id) %}\n            ,\n            userId\n        {% end %}\n```\n\nIf we want to retrieve the monthly billing for a specific 
workspace, we will have to use the pipe called *api_grouped_data_sum_monthly*, where *workspaceId* and *month* are parameters:\n\n```\n%\nSELECT\n    workspaceId,\n    {% if defined(user_id) %}\n        userId,\n    {% end %}\n    toStartOfMonth(day) as month,\n    sum(daily_balance) as monthly_balance\nFROM ds_mv_grouped_data_sum_daily\nwhere workspaceId = {{String(workspace_id, '002a29aa-8ae1-4bde-9835-89b62ccf9558')}}\nand month = {{String(month_selected, '2021-12-01')}}\n{% if defined(user_id) %}\n    and userId = {{String(user_id, 'xn10QBBYPLRgvoelDITwL0f0Dw13')}}\n{% end %}\nGROUP BY\n    workspaceId,\n    {% if defined(user_id) %}\n        userId,\n    {% end %}\n    month\n```\n\nThe Data Flow defined for this case is the following:\n\n![data flow sum daily](img/day_sum.png)\n\u003e Data Flow created to compute the daily, monthly and date-range billing using SummingMergeTree\n\n## Creating API endpoints\n\nAll the data-retrieval pipes explained above can easily be published as API endpoints so the data can be consumed directly.\n\nYou can read more about this [in this guide](https://guides.tinybird.co/guide/intro-to-exposing-data).\n\n## How to reproduce the example\n\nTo reproduce the example in your workspace you have to download all the files contained in the folders datasources, pipes and endpoints of this repo and upload all of them to Tinybird.\n\nIMPORTANT! *We have created all these files based on the schema of the datasource \"usage_1\" you have in the workspace at the moment. If you change that schema, these files should be updated accordingly.\nLet us know if this happens and you need any help*.\n\nYou can do that by following these steps:\n\n1. Clone this repo to a local folder\n2. Install [Tinybird CLI](https://docs.tinybird.co/cli.html)\n3. Go to the folder that contains the files:\n```\n    cd tb-project/\n```\n4. 
Log in to your workspace by running the following command in the CLI and providing your token (you can find it in the token section of the corresponding workspace):\n```\n    tb auth --region us-east\n```\n5. Upload all the files by using the command:\n\n```\n    tb push --push-deps\n```\n\nOnce you upload all these files to your workspace, you will have a Data Flow similar to the following:\n![data flow created](img/data_flow.png)\n\u003e Data Flow created\n\n## Additional comments\n\nAs explained in the previous section, this example has been created so that it can work in your workspace without requiring any changes.\n\nHowever, we have detected some possible improvements that can be applied to the definition of the *usage_1* datasource.\n\nFirstly, we recommend that you use the following datasource schema:\n```\nSCHEMA \u003e\n    `documentId` String `json:$.documentId`,\n    `id` String `json:$.id`,\n    `inputRiskLevel` Int16 `json:$.inputRiskLevel`,\n    `inputWordCount` Int16 `json:$.inputWordCount`,\n    `outputWordCount` Int16 `json:$.outputWordCount`,\n    `projectId` String `json:$.projectId`,\n    `requestId` String `json:$.requestId`,\n    `skillId` String `json:$.skillId`,\n    `translation` String `json:$.translation`,\n    `ts` DateTime `json:$.ts`,\n    `userId` String `json:$.userId`,\n    `workspaceId` String `json:$.workspaceId`,\n    `nValue` Int16 `json:$.nValue`,\n    `planBalance` Nullable(Int32) `json:$.planBalance`,\n    `planPaused` Nullable(UInt8) `json:$.planPaused`,\n    `planUnlimited` Nullable(UInt8) `json:$.planUnlimited`,\n    `totalBalance` Nullable(Int32) `json:$.totalBalance`,\n    `trialing` Nullable(UInt8) `json:$.trialing`,\n    `deviceToken` Nullable(String) `json:$.deviceToken`\n\nENGINE \"MergeTree\"\nENGINE_PARTITION_KEY \"toYYYYMM(ts)\"\nENGINE_SORTING_KEY \"workspaceId, userId, ts\"\n```\n\nAs you can see, we have changed:\n- ENGINE_PARTITION_KEY: controls how the stored data is partitioned.\n- ENGINE_SORTING_KEY: it is really 
important and it should be aligned with your use case. If you are filtering by workspaceId, userId and date, all these columns should be part of the index, and therefore they should be set in the ENGINE_SORTING_KEY. This change would significantly improve query performance.\n\nAdditionally, you can analyze the content of each column in more depth and try changing its type from String to LowCardinality if the cardinality of the column is not high (for instance, *translation*). This kind of change reduces the memory needed to store and retrieve the data.\n\nDefining the datasource schema is currently only possible using the CLI.\nTo do that, you will have to upload it by running the command:\n```\n    tb push datasource/usage_1.datasource\n```\n\nAfter that, you will have to append data to it, following the process you currently use.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftinybirdco%2Fmv_example","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftinybirdco%2Fmv_example","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftinybirdco%2Fmv_example/lists"}