{"id":13879329,"url":"https://github.com/groupon/locality-uuid.rb","last_synced_at":"2025-07-16T15:32:06.272Z","repository":{"id":8394811,"uuid":"9973092","full_name":"groupon/locality-uuid.rb","owner":"groupon","description":null,"archived":false,"fork":false,"pushed_at":"2015-06-29T00:27:29.000Z","size":143,"stargazers_count":18,"open_issues_count":1,"forks_count":7,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-10-29T05:07:52.913Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/groupon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-05-10T02:37:53.000Z","updated_at":"2020-09-24T17:08:21.000Z","dependencies_parsed_at":"2022-07-31T01:40:02.569Z","dependency_job_id":null,"html_url":"https://github.com/groupon/locality-uuid.rb","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/groupon%2Flocality-uuid.rb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/groupon%2Flocality-uuid.rb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/groupon%2Flocality-uuid.rb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/groupon%2Flocality-uuid.rb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/groupon","download_url":"https://codeload.github.com/groupon/locality-uuid.rb/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":2261
43895,"owners_count":17580245,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-06T08:02:17.388Z","updated_at":"2024-11-24T08:31:15.874Z","avatar_url":"https://github.com/groupon.png","language":"Ruby","funding_links":[],"categories":["Ruby"],"sub_categories":[],"readme":"Locality UUID (vB) Ruby\n=======================\n\nThis is a UUID class intended to help control data locality when inserting into a distributed data \nsystem, such as MongoDB or HBase. There is also [a Java implementation](http://github.com/groupon/locality-uuid.java).\nThis version does not conform to any external standard or spec. Developed at Groupon in Palo Alto\nby Peter Bakkum and Michael Craig.\n\nProblems we encountered with other UUID solutions:\n\n- Collisions when used with heavy concurrency.\n- Difficult to retrieve useful information, such as the timestamp, from the id.\n- Time-based versions (such as v1) place this component at the front, meaning all generated ids \n    start with the same bytes for some time period. This was disadvantageous for us because all\n    of these writes were hashed to the same shards of our distributed databases. Thus, only one\n    machine was receiving writes at a given time.\n\nSolutions:\n\n- We have tested thoroughly in a concurrent environment and include both the PID and MAC Address \n    in the UUID. 
In the first block of the UUID we have a counter which we increment with a large\n    prime number, ensuring that the counter in a single process takes a long time to wrap around\n    to the same value.\n- The UUID layout is very simple and documented below. Retrieving the millisecond-precision timestamp\n    is as simple as copying the last segment and converting from hex to decimal.\n- To ensure that generated ids are evenly distributed in terms of the content of their first few bytes,\n    we increment this value with a large number. This means that DB writes using these values are\n    evenly distributed across the cluster. We ALSO allow toggling into sequential mode, so that the\n    first few bytes of the UUID are consistent and writes done with these keys hash to the same machine\n    in the cluster, when this characteristic is desirable. Thus, this library is a tool for managing\n    the locality of reads and writes of data inserted with UUID keys.\n\nThis library has both Java and Ruby implementations, and is also accessible from the command line.\n\nFormat\n------\n\nThis generates UUIDs in the following format:\n\n```\n   wwwwwwww-xxxx-byyy-yyyy-zzzzzzzzzzzz\n\nw: highly variable counter value\nx: process id\nb: literal hex 'b' representing the UUID version\ny: fragment of machine MAC address\nz: UTC timestamp (milliseconds since epoch)\n```\n\nExample:\n\n```\n   20be0ffc-314a-bd53-7a50-013a65ca76d2\n\ncounter     : 3,488,672,514\nprocess id  : 12,618\nMAC address : __:__:_d:53:7a:50\ntimestamp   : 1,350,327,498,450 (Mon, 15 Oct 2012 18:58:18.450 UTC)\n```\n\nExample Use\n-----------\n\nInstall the gem:\n\n```bash\ngem install locality-uuid\n```\n\nUse it in a program:\n\n```ruby\nrequire 'locality-uuid'\n\nuuid = UUID.new\n\nputs \"UUID          : #{uuid.to_s}\"\nputs \"raw bytes     : #{uuid.bytes.unpack(\"H*\")[0]}\"\nputs \"process id    : #{uuid.pid}\"\nputs \"MAC fragment  : #{uuid.mac}\"\nputs \"timestamp     : #{uuid.timestamp}\"\nputs 
\"version       : #{uuid.version}\"\n\ncopy = UUID.new(uuid.to_s)\n\nputs \"copied        : #{(uuid == copy)}\"\n```\n\nOr just run from the command line:\n\n```\nlocality-uuid\n```\n\nNotes\n-----\n\nThis UUID version was designed to have easily readable PID, MAC address, and\ntimestamp values, with a regularly incremented count. The motivations for this\nimplementation are to reduce the chance of duplicate ids, store more useful\ninformation in UUIDs, and ensure that the first few characters vary for successively\ngenerated ids, which can be important for splitting ids over a cluster. The UUID\ngenerator is also designed to be be thread-safe without locking.\n\nUniqueness is supported by the millisecond precision timestamp, the MAC address\nof the generating machine, the 2 byte process id, and a 4 byte counter. Thus,\na UUID is guaranteed to be unique in an id space if each machine allows 65,536 processes or less,\ndoes not share the last 28 bits of its MAC address with another machine in the id\nspace, and generates fewer than 4,294,967,295 ids per millisecond in a process.\n\n___Counter___\nThe counter value is reversed, such that the least significant 4-bit block is the first\ncharacter of the UUID. This is useful because it makes the earlier bits of the UUID\nchange more often. 
Note that the counter is not incremented by 1 each time, but rather\nby a large prime number, so that successive values differ significantly while\nit still takes many iterations to return to the same value.\n\nExamples of sequentially generated ids in the default counter mode:\n```\nc8c9cef9-7a7f-bd53-7a50-013e4e2afbde\n14951cfa-7a7f-bd53-7a50-013e4e2afbde\n6f5169fb-7a7f-bd53-7a50-013e4e2afbde\nba2da6fc-7a7f-bd53-7a50-013e4e2afbde\n06f8f3fc-7a7f-bd53-7a50-013e4e2afbde\n51c441fd-7a7f-bd53-7a50-013e4e2afbde\nac809efe-7a7f-bd53-7a50-013e4e2afbde\nf75cdbff-7a7f-bd53-7a50-013e4e2afbde\n```\n\nNote the high variability of the first few characters.\n\nThe counter can also be toggled into sequential mode to effectively reverse this logic.\nThis is useful because it means you can control the locality of your data as you generate\nids across a cluster. Sequential mode works by creating an initial value based on a hash\nof the current date and hour. This means it can be discovered independently on distributed\nmachines. The value is then incremented by one for each id generated. If you use key-based\nsharding, data inserted with these ids should have some locality.\n\n\nExamples of sequentially generated ids in sequential counter mode:\n```\nf5166777-7a7f-bd53-7a50-013e4e2afc26\nf5166778-7a7f-bd53-7a50-013e4e2afc26\nf5166779-7a7f-bd53-7a50-013e4e2afc26\nf516677a-7a7f-bd53-7a50-013e4e2afc26\nf516677b-7a7f-bd53-7a50-013e4e2afc26\nf516677c-7a7f-bd53-7a50-013e4e2afc26\nf516677d-7a7f-bd53-7a50-013e4e2afc26\nf516677e-7a7f-bd53-7a50-013e4e2afc26\n```\n\n___PID___\nThis value is just the current process id modulo 65,536. In my experience, most Linux\nmachines do not allow PID numbers to go this high, but OS X machines do.\n\n___MAC Address___\nThe last 28 bits of the first active MAC address found on the machine. If no active\nMAC address is found, this is filled in with zeroes.\n\n___Timestamp___\nThis is the number of milliseconds since the Unix epoch, in UTC. 
To convert to a time manually first\ncopy the last segment of the UUID, convert to decimal, then use a time library to\ncount up from 1970-1-1 0:00:00.000 UTC.\n\n\nAPI\n---\n\n__UUID.initialize__\n\nGenerate a new UUID object.\n\n__UUID.initialize (String uuid)__\n\nConstruct a UUID with the given String, must be of the form `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx`\nwhere `x` matches `[0-9a-f]`.\n\n__UUID.initialize (UUID uuid)__\n\nConstruct a UUID with the given UUID. This creates a copy.\n\n__UUID::use_sequential_ids()__\n\nToggle into sequential mode, so ids are generated in order.\n\n__UUID::use_variable_ids()__\n\nToggle into variable mode, so the first few characters of each id vary during generation. This is the default mode.\n\n__UUID.bytes() -\u003e String__\n\nGet raw byte content of UUID.\n\n__UUID.to_s() -\u003e String__\n\nGet UUID String in the standard format.\n\n__UUID.version() -\u003e String__\n\nReturn the UUID version character, which is 'b' for ids generated by this library.\n\n__UUID.pid() -\u003e Fixnum__\n\nReturn the PID embedded in the UUID.\n\n__UUID.timestamp() -\u003e Time__\n\nReturn timestamp embedded in UUID, which is set at generation.\n\n__UUID.mac() -\u003e String__\n\nGet the embedded MAC Address fragment. This will be a String of hex characters.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgroupon%2Flocality-uuid.rb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgroupon%2Flocality-uuid.rb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgroupon%2Flocality-uuid.rb/lists"}