{"id":15685340,"url":"https://github.com/ankane/datasketches-ruby","last_synced_at":"2025-11-17T14:15:26.555Z","repository":{"id":45499149,"uuid":"331169567","full_name":"ankane/datasketches-ruby","owner":"ankane","description":"Sketch data structures for Ruby","archived":false,"fork":false,"pushed_at":"2025-04-03T22:26:15.000Z","size":83,"stargazers_count":12,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-05-07T17:13:15.963Z","etag":null,"topics":["hyperloglog"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ankane.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-01-20T02:28:36.000Z","updated_at":"2025-04-03T22:26:15.000Z","dependencies_parsed_at":"2023-11-14T02:26:09.354Z","dependency_job_id":"e3415e3c-cb71-4eae-9ab8-b7acdcbdffd4","html_url":"https://github.com/ankane/datasketches-ruby","commit_stats":{"total_commits":118,"total_committers":1,"mean_commits":118.0,"dds":0.0,"last_synced_commit":"8d0967707efe144322d1cad80390f14e1001e012"},"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fdatasketches-ruby","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fdatasketches-ruby/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fdatasketches-ruby/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fdatasketches-ruby/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ankane","download_url":"https://codeload.github.com/ankane/datasketches-ruby/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252922339,"owners_count":21825639,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hyperloglog"],"created_at":"2024-10-03T17:24:54.122Z","updated_at":"2025-11-17T14:15:21.529Z","avatar_url":"https://github.com/ankane.png","language":"C++","readme":"# DataSketches Ruby\n\n[DataSketches](https://datasketches.apache.org/) - sketch data structures - for Ruby\n\n[![Build Status](https://github.com/ankane/datasketches-ruby/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/datasketches-ruby/actions)\n\n## Installation\n\nAdd this line to your application’s Gemfile:\n\n```ruby\ngem \"datasketches\"\n```\n\n## Sketch Families\n\nDistinct counting\n\n- [CPC sketch](#cpc-sketch)\n- [HyperLogLog sketch](#hyperloglog-sketch)\n- [Theta sketch](#theta-sketch)\n\nMost frequent\n\n- [Frequent item sketch](#frequent-item-sketch)\n\nQuantiles and histograms\n\n- [KLL sketch](#kll-sketch)\n\nSampling\n\n- [VarOpt sketch](#varopt-sketch)\n\n## CPC Sketch\n\nCreate a sketch\n\n```ruby\nsketch = DataSketches::CpcSketch.new\n```\n\nAdd data\n\n```ruby\nsketch.update(1)\nsketch.update(2.0)\nsketch.update(\"three\")\n```\n\nEstimate the count\n\n```ruby\nsketch.estimate\n```\n\nSave a sketch\n\n```ruby\ndata = sketch.serialize\n```\n\nLoad a sketch\n\n```ruby\nsketch = DataSketches::CpcSketch.deserialize(data)\n```\n\nGet the union\n\n```ruby\nu = DataSketches::CpcUnion.new(14)\nu.update(sketch1)\nu.update(sketch2)\nu.result\n```\n\n## HyperLogLog Sketch\n\nCreate a sketch\n\n```ruby\nsketch = DataSketches::HllSketch.new(14)\n```\n\nAdd data\n\n```ruby\nsketch.update(1)\nsketch.update(2.0)\nsketch.update(\"three\")\n```\n\nEstimate the count\n\n```ruby\nsketch.estimate\n```\n\nSave a sketch\n\n```ruby\ndata = sketch.serialize_updatable\n# or\ndata = sketch.serialize_compact\n```\n\nLoad a sketch\n\n```ruby\nsketch = DataSketches::HllSketch.deserialize(data)\n```\n\nGet the union\n\n```ruby\nu = DataSketches::HllUnion.new(14)\nu.update(sketch1)\nu.update(sketch2)\nu.result\n```\n\n## Theta Sketch\n\nCreate a sketch\n\n```ruby\nsketch = DataSketches::UpdateThetaSketch.new\n```\n\nAdd data\n\n```ruby\nsketch.update(1)\nsketch.update(2.0)\nsketch.update(\"three\")\n```\n\nEstimate the count\n\n```ruby\nsketch.estimate\n```\n\nSave a sketch\n\n```ruby\ndata = sketch.serialize\n```\n\nLoad a sketch\n\n```ruby\nsketch = DataSketches::UpdateThetaSketch.deserialize(data)\n```\n\nGet the union\n\n```ruby\nu = DataSketches::ThetaUnion.new\nu.update(sketch1)\nu.update(sketch2)\nu.result\n```\n\nGet the intersection\n\n```ruby\ni = DataSketches::ThetaIntersection.new\ni.update(sketch1)\ni.update(sketch2)\ni.result\n```\n\nCompute A not B\n\n```ruby\nd = DataSketches::ThetaANotB.new\nd.compute(a, b)\n```\n\n## Frequent Item Sketch\n\nCreate a sketch\n\n```ruby\nsketch = DataSketches::FrequentStringsSketch.new(64)\n```\n\nAdd data\n\n```ruby\nsketch.update(\"a\")\nsketch.update(\"b\")\nsketch.update(\"c\")\n```\n\nEstimate the frequency of an item\n\n```ruby\nsketch.estimate(\"a\")\n```\n\nSave a sketch\n\n```ruby\ndata = sketch.serialize\n```\n\nLoad a sketch\n\n```ruby\nsketch = DataSketches::FrequentStringsSketch.deserialize(data)\n```\n\n## KLL Sketch\n\nCreate a sketch\n\n```ruby\nsketch = DataSketches::KllIntsSketch.new\n# or\nsketch = DataSketches::KllFloatsSketch.new\n```\n\nAdd data\n\n```ruby\nsketch.update(1)\nsketch.update(2)\nsketch.update(3)\n```\n\nGet quantiles\n\n```ruby\nsketch.quantile(0.5)\n```\n\nGet the minimum and maximum values from the stream\n\n```ruby\nsketch.min_value\nsketch.max_value\n```\n\nSave a sketch\n\n```ruby\ndata = sketch.serialize\n```\n\nLoad a sketch\n\n```ruby\nsketch = DataSketches::KllIntsSketch.deserialize(data)\n```\n\nMerge sketches\n\n```ruby\nsketch.merge(sketch2)\n```\n\n## VarOpt Sketch\n\nCreate a sketch\n\n```ruby\nsketch = DataSketches::VarOptSketch.new(14)\n```\n\nAdd data\n\n```ruby\nsketch.update(1)\nsketch.update(2.0)\nsketch.update(\"three\")\n```\n\nSample data\n\n```ruby\nsketch.samples\n```\n\n## Credits\n\nThis library is modeled after the DataSketches [Python API](https://github.com/apache/datasketches-cpp/tree/master/python).\n\n## History\n\nView the [changelog](https://github.com/ankane/datasketches-ruby/blob/master/CHANGELOG.md)\n\n## Contributing\n\nEveryone is encouraged to help improve this project. Here are a few ways you can help:\n\n- [Report bugs](https://github.com/ankane/datasketches-ruby/issues)\n- Fix bugs and [submit pull requests](https://github.com/ankane/datasketches-ruby/pulls)\n- Write, clarify, or fix documentation\n- Suggest or add new features\n\nTo get started with development:\n\n```sh\ngit clone --recursive https://github.com/ankane/datasketches-ruby.git\ncd datasketches-ruby\nbundle install\nbundle exec rake compile\nbundle exec rake test\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankane%2Fdatasketches-ruby","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fankane%2Fdatasketches-ruby","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankane%2Fdatasketches-ruby/lists"}