{"id":13878107,"url":"https://github.com/ankane/active_hll","last_synced_at":"2025-11-17T14:14:45.940Z","repository":{"id":65467912,"uuid":"592901003","full_name":"ankane/active_hll","owner":"ankane","description":"HyperLogLog for Rails and Postgres","archived":false,"fork":false,"pushed_at":"2025-10-22T05:17:29.000Z","size":53,"stargazers_count":83,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-10-22T06:09:08.576Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ankane.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-01-24T19:24:38.000Z","updated_at":"2025-10-22T05:17:33.000Z","dependencies_parsed_at":"2023-02-16T03:30:52.317Z","dependency_job_id":"1d90e314-b824-4024-8d49-8813f49c946c","html_url":"https://github.com/ankane/active_hll","commit_stats":{"total_commits":25,"total_committers":1,"mean_commits":25.0,"dds":0.0,"last_synced_commit":"05c86a950b8473147f5b58ec8b9d7e8a4bd80bbc"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/ankane/active_hll","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Factive_hll","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Factive_hll/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Factive_hll/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Factive_hll/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ankane","download_url":"https://codeload.github.com/ankane/active_hll/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Factive_hll/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":284894595,"owners_count":27080732,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-17T02:00:06.431Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-06T08:01:39.990Z","updated_at":"2025-11-17T14:14:45.935Z","avatar_url":"https://github.com/ankane.png","language":"Ruby","funding_links":[],"categories":["Ruby"],"sub_categories":[],"readme":"# Active HLL\n\n:fire: HyperLogLog for Rails and Postgres\n\nFor fast, approximate count-distinct queries\n\n[![Build Status](https://github.com/ankane/active_hll/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/active_hll/actions)\n\n## Installation\n\nFirst, install the [hll extension](https://github.com/citusdata/postgresql-hll) on your database server:\n\n```sh\ncd /tmp\ncurl -L https://github.com/citusdata/postgresql-hll/archive/refs/tags/v2.19.tar.gz | tar xz\ncd postgresql-hll-2.19\nmake\nmake install # may need sudo\n```\n\nThen add this line to your application’s Gemfile:\n\n```ruby\ngem \"active_hll\"\n```\n\nAnd run:\n\n```sh\nbundle install\nrails generate active_hll:install\nrails db:migrate\n```\n\n## Getting Started\n\nHLLs provide an approximate count of unique values (like unique visitors). By rolling up data by day, you can quickly get an approximate count over any date range.\n\nCreate a table with an `hll` column\n\n```ruby\nclass CreateEventRollups \u003c ActiveRecord::Migration[8.1]\n  def change\n    create_table :event_rollups do |t|\n      t.date :time_bucket, index: {unique: true}\n      t.hll :visitor_ids\n    end\n  end\nend\n```\n\nYou can use [batch](#batch) and [stream](#stream) approaches to build HLLs\n\n### Batch\n\nTo generate HLLs from existing data, use the `hll_agg` method\n\n```ruby\nhlls = Event.group_by_day(:created_at).hll_agg(:visitor_id)\n```\n\n\u003e Install [Groupdate](https://github.com/ankane/groupdate) to use the `group_by_day` method\n\nAnd store the result\n\n```ruby\nEventRollup.upsert_all(\n  hlls.map { |k, v| {time_bucket: k, visitor_ids: v} },\n  unique_by: [:time_bucket]\n)\n```\n\nFor a large number of HLLs, use SQL to generate and upsert in a single statement\n\n### Stream\n\nTo add new data to HLLs, use the `hll_add` method\n\n```ruby\nEventRollup.where(time_bucket: Date.current).hll_add(visitor_ids: [\"visitor1\", \"visitor2\"])\n```\n\nor the `hll_upsert` method (experimental)\n\n```ruby\nEventRollup.hll_upsert({time_bucket: Date.current, visitor_ids: [\"visitor1\", \"visitor2\"]})\n```\n\n## Querying\n\nGet approximate unique values for a time range\n\n```ruby\nEventRollup.where(time_bucket: 30.days.ago.to_date..Date.current).hll_count(:visitor_ids)\n```\n\nGet approximate unique values by time bucket\n\n```ruby\nEventRollup.group(:time_bucket).hll_count(:visitor_ids)\n```\n\nGet approximate unique values by month\n\n```ruby\nEventRollup.group_by_month(:time_bucket, time_zone: false).hll_count(:visitor_ids)\n```\n\nGet the union of multiple HLLs\n\n```ruby\nEventRollup.hll_union(:visitor_ids)\n```\n\n## Data Protection\n\nCardinality estimators like HyperLogLog do not [preserve privacy](https://arxiv.org/pdf/1808.05879.pdf), so protect `hll` columns the same as you would the raw data.\n\nFor instance, you can check membership with high probability using:\n\n```sql\nSELECT\n    time_bucket,\n    visitor_ids = visitor_ids || hll_hash_text('visitor1') AS likely_member\nFROM\n    event_rollups;\n```\n\n## Data Retention\n\nData should only be retained for as long as it’s needed. Delete older data with:\n\n```ruby\nEventRollup.where(\"time_bucket \u003c ?\", 2.years.ago).delete_all\n```\n\nThere’s not a way to remove data from an HLL, so to delete data for a specific user, delete the underlying data and recalculate the rollup.\n\n## Hosted Postgres\n\nThe `hll` extension is available on a number of [hosted providers](https://github.com/ankane/active_hll/issues/4).\n\n## History\n\nView the [changelog](CHANGELOG.md)\n\n## Contributing\n\nEveryone is encouraged to help improve this project. Here are a few ways you can help:\n\n- [Report bugs](https://github.com/ankane/active_hll/issues)\n- Fix bugs and [submit pull requests](https://github.com/ankane/active_hll/pulls)\n- Write, clarify, or fix documentation\n- Suggest or add new features\n\nTo get started with development:\n\n```sh\ngit clone https://github.com/ankane/active_hll.git\ncd active_hll\nbundle install\nbundle exec rake test\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankane%2Factive_hll","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fankane%2Factive_hll","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankane%2Factive_hll/lists"}