{"id":13513658,"url":"https://github.com/citusdata/citus-example-ad-analytics","last_synced_at":"2025-04-15T19:00:04.034Z","repository":{"id":8915308,"uuid":"59712239","full_name":"citusdata/citus-example-ad-analytics","owner":"citusdata","description":"Reference App for Ad Analytics, using Ruby on Rails.","archived":false,"fork":false,"pushed_at":"2023-06-01T09:06:52.000Z","size":148,"stargazers_count":80,"open_issues_count":4,"forks_count":21,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-04-15T18:59:54.974Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"CSS","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/citusdata.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-05-26T02:10:53.000Z","updated_at":"2025-03-23T06:48:09.000Z","dependencies_parsed_at":"2024-11-01T17:30:40.446Z","dependency_job_id":"eab3cd98-4309-44cd-a737-bc260d39368f","html_url":"https://github.com/citusdata/citus-example-ad-analytics","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/citusdata%2Fcitus-example-ad-analytics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/citusdata%2Fcitus-example-ad-analytics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/citusdata%2Fcitus-example-ad-analytics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/citusdata%2Fcitus-example-ad-analytics/manifests","owner_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/citusdata","download_url":"https://codeload.github.com/citusdata/citus-example-ad-analytics/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249135822,"owners_count":21218365,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T05:00:34.080Z","updated_at":"2025-04-15T19:00:04.000Z","avatar_url":"https://github.com/citusdata.png","language":"CSS","readme":"## Citus Example: Ad Analytics\n\nExample app that uses the distributed Citus database to provide a realtime ad analytics dashboard.\n\n## Deploying on Citus Cloud and Heroku\n\n1. Migrate the database locally: `rake db:migrate`\n2. Load the test data into the database: `rake test_data:load_bulk`\n3. Load the summary data into the database: `rake rollup:initial`\n4. 
Run the app locally: `rails s -b 0.0.0.0 -p 3000`\n\nAfter starting the data load task, visit the app to see an example of what you can build with Citus.\n\nNote: If you get an error like \"could not find the source blob\" on Heroku deploy, just click the Deploy button again.\n\n## Screenshots\n\n\u003cimg src=\"http://cl.ly/0y430z3l122y/Screen%20Shot%202016-06-01%20at%206.15.02%20PM.png\" width=\"600\" /\u003e\n\n## Schema Diagram\n\n\u003cimg src=\"http://cl.ly/0n3G0Q453p1X/schema_diagram.png\" width=\"600\" /\u003e\n\nWe're distributing only the part of our dataset that we expect to take significant amounts of space, specifically\n`ads`, `clicks` and `impressions`.\n\nWe use `ad_id` as the common shard key for the hash distribution, in order to have data for a specific ad colocated on one shard.\n\n## Feature Highlight: Colocated Joins\n\n\u003e To join two large tables efficiently, it is advised that you distribute them on the same columns you used to join the tables. In this case, the Citus master knows which shards of the tables might match with shards of the other table by looking at the distribution column metadata. This allows Citus to prune away shard pairs which cannot produce matching join keys. The joins between remaining shard pairs are executed in parallel on the workers and then the results are returned to the master.\n\u003cbr\u003ehttps://www.citusdata.com/docs/citus/5.1/dist_tables/querying.html#colocated-joins\n\nIn this demo app we're showing co-located joins between the `ads` table and the `impressions` and `clicks` tables. One example query:\n\n```sql\nSELECT ads.campaign_id, COUNT(*)\n  FROM ads\n       JOIN impressions ON (ads.id = ad_id)\n WHERE ads.campaign_id IN (1,2,3,4,5,6,7,8,9,10,11)\n GROUP BY ads.campaign_id\n```\n\nAnother co-located join we use finds the data for the daily click-through-rate graph on e.g. 
http://citus-example-ad-analytics.herokuapp.com/campaigns/1 - which uses the roll-up tables and looks like this:\n\n\u003cimg src=\"http://cl.ly/1u1P2o0x1D2W/Screen%20Shot%202016-06-23%20at%205.49.26%20PM.png\" /\u003e\n\n```sql\nSELECT ads.name,\n           extract(epoch from idr.date) AS day,\n           CASE WHEN idr.count \u003e 0 THEN COALESCE(cdr.count, 0) / idr.count::float\n           ELSE NULL\n           END AS ctr\n      FROM ads\n           JOIN impression_daily_rollups idr ON (idr.ad_id = ads.id)\n           JOIN click_daily_rollups cdr ON (idr.ad_id = cdr.ad_id AND idr.date = cdr.date)\n     WHERE ads.campaign_id = 2 AND idr.date BETWEEN '2016-05-25 00:00:00 UTC' AND '2016-06-24 23:59:59 UTC'\n     ORDER BY 2\n```\n\n## Feature Highlight: Daily Rollups\n\nThis demo app also shows how to work with historical data effectively. Since our impressions/clicks data is append-only, we can make a few optimizations for all data that is older than the current day.\n\nSpecifically, we can roll up the data into daily count values, so we avoid having to read the entire table when we want to find the total number of clicks for a given ad or campaign.\n\n```\ncitus=\u003e \\d impression_daily_rollups\nTable \"public.impression_daily_rollups\"\n Column |  Type  | Modifiers\n--------+--------+-----------\n ad_id  | uuid   | not null\n count  | bigint | not null\n date   | date   | not null\nIndexes:\n    \"impression_daily_rollups_pkey\" PRIMARY KEY, btree (ad_id, date)\n\ncitus=\u003e \\d click_daily_rollups\nTable \"public.click_daily_rollups\"\n Column |  Type  | Modifiers\n--------+--------+-----------\n ad_id  | uuid   | not null\n count  | bigint | not null\n date   | date   | not null\nIndexes:\n    \"click_daily_rollups_pkey\" PRIMARY KEY, btree (ad_id, date)\n```\n\nYou can see the task that runs daily here: https://github.com/citusdata/citus-example-ad-analytics/blob/master/lib/tasks/rollup.rake#L24\n\n## Feature Highlight: Single-node transactions\n\nWith 
Citus you can use transactions in your code, as long as they only touch a single node. This can also be used to update multiple tables which are co-located.\n\nIn this app this is used to allow Rails' `counter_cache: true` and `touch: true` to update the parent record correctly.\n\nExample:\n\n```\nirb(main):003:0\u003e impression.destroy\nBEGIN\nDELETE FROM \"impressions\" WHERE \"impressions\".\"impression_id\" = 'fffff511-7012-4c5e-8431-5f97efd72926' AND \"impressions\".\"ad_id\" = '7fc94c84-f39f-4c7d-bf9e-bdbf5211a2f9'\nSELECT  \"ads\".* FROM \"ads\" WHERE \"ads\".\"id\" = '7fc94c84-f39f-4c7d-bf9e-bdbf5211a2f9' LIMIT 1\nUPDATE \"ads\" SET \"impressions_count\" = COALESCE(\"impressions_count\", 0) - 1 WHERE \"ads\".\"id\" = '7fc94c84-f39f-4c7d-bf9e-bdbf5211a2f9'\nUPDATE \"ads\" SET \"updated_at\" = '2016-07-22 23:52:59.667746' WHERE \"ads\".\"id\" = '7fc94c84-f39f-4c7d-bf9e-bdbf5211a2f9'\nCOMMIT\n```\n\n## Feature Highlight: BRIN indices to find recent data\n\nIn order to also include recent data into count values that are displayed, we're using a [BRIN index](https://www.postgresql.org/docs/9.5/static/brin-intro.html) on `impressions.seen_at` and `clicks.clicked_at` to quickly find the recent records which are not contained in the roll-up tables yet.\n\nYou can see an example query on the campaign index and detail pages, e.g.\n\n```sql\nSELECT ad_id, COUNT(*)\n         FROM ads\n         JOIN clicks ON (ads.id = ad_id)\n        WHERE ads.campaign_id = 1\n              AND clicked_at \u003e now()::date\n        GROUP BY ad_id\n```\n\nThe distributed EXPLAIN output for this query shows how it uses the lossy BRIN index to find the values on the worker nodes:\n\n```\n                                                                             QUERY PLAN                                                                             
\n--------------------------------------------------------------------------------------------------------------------------------------------------------------------\n Distributed Query into pg_merge_job_6969\n   Executor: Real-Time\n   Task Count: 16\n   Tasks Shown: One of 16\n   -\u003e  Task\n         Node: host=ec2-52-1-243-13.compute-1.amazonaws.com port=5432 dbname=citus\n         -\u003e  HashAggregate  (cost=456.49..456.50 rows=1 width=16) (actual time=0.660..0.660 rows=0 loops=1)\n               Group Key: clicks.ad_id\n               -\u003e  Nested Loop  (cost=12.91..456.48 rows=2 width=16) (actual time=0.658..0.658 rows=0 loops=1)\n                     Join Filter: (ads.id = clicks.ad_id)\n                     Rows Removed by Join Filter: 970\n                     -\u003e  Seq Scan on ads_102137 ads  (cost=0.00..1.64 rows=1 width=16) (actual time=0.009..0.013 rows=1 loops=1)\n                           Filter: (campaign_id = 6)\n                           Rows Removed by Filter: 50\n                     -\u003e  Bitmap Heap Scan on clicks_102169 clicks  (cost=12.91..453.39 rows=116 width=16) (actual time=0.100..0.459 rows=970 loops=1)\n                           Recheck Cond: (clicked_at \u003e (now())::date)\n                           Rows Removed by Index Recheck: 48\n                           Heap Blocks: lossy=19\n                           -\u003e  Bitmap Index Scan on clicks_clicked_at_brin_102169  (cost=0.00..12.88 rows=116 width=0) (actual time=0.083..0.083 rows=1280 loops=1)\n                                 Index Cond: (clicked_at \u003e (now())::date)\n             Planning time: 0.484 ms\n             Execution time: 0.753 ms\n Master Query\n   -\u003e  HashAggregate  (cost=0.00..0.15 rows=10 width=0) (actual time=0.001..0.001 rows=0 loops=1)\n         Group Key: intermediate_column_6969_0\n         -\u003e  Seq Scan on pg_merge_job_6969  (cost=0.00..0.00 rows=0 width=0) (actual time=0.001..0.001 rows=0 loops=1)\n Planning time: 7.291 
ms\n(28 rows)\n```\n\n## LICENSE\n\nCopyright (c) 2023, Citus Data Inc\n\nLicensed under the MIT license - feel free to incorporate the code in your own projects!\n","funding_links":[],"categories":["CSS"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcitusdata%2Fcitus-example-ad-analytics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcitusdata%2Fcitus-example-ad-analytics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcitusdata%2Fcitus-example-ad-analytics/lists"}