{"id":17398424,"url":"https://github.com/vweevers/zipfian-integer","last_synced_at":"2025-04-30T05:22:47.711Z","repository":{"id":34914886,"uuid":"189721607","full_name":"vweevers/zipfian-integer","owner":"vweevers","description":"Get an integer between min and max with skew towards either.","archived":false,"fork":false,"pushed_at":"2022-05-20T22:09:21.000Z","size":216,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-25T01:58:22.527Z","etag":null,"topics":["nodejs","number-generator","random","zipf","zipfian"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vweevers.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-06-01T10:49:49.000Z","updated_at":"2022-05-20T22:09:08.000Z","dependencies_parsed_at":"2022-09-11T23:51:59.963Z","dependency_job_id":null,"html_url":"https://github.com/vweevers/zipfian-integer","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vweevers%2Fzipfian-integer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vweevers%2Fzipfian-integer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vweevers%2Fzipfian-integer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vweevers%2Fzipfian-integer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vweevers","download_url":"https://codeload.github.com/vweevers/zipfian-integer/t
ar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251645989,"owners_count":21620848,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["nodejs","number-generator","random","zipf","zipfian"],"created_at":"2024-10-16T14:56:13.302Z","updated_at":"2025-04-30T05:22:47.691Z","avatar_url":"https://github.com/vweevers.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# zipfian-integer\n\n\u003e **Get an integer between a min and max with skew towards either.**  \n\u003e A JS port of [Apache Commons Math](http://commons.apache.org/math/)'s `ZipfRejectionInversionSampler`.\n\n[![npm status](http://img.shields.io/npm/v/zipfian-integer.svg)](https://www.npmjs.org/package/zipfian-integer)\n[![node](https://img.shields.io/node/v/zipfian-integer.svg)](https://www.npmjs.org/package/zipfian-integer)\n[![Travis build status](https://img.shields.io/travis/vweevers/zipfian-integer.svg?label=travis)](http://travis-ci.org/vweevers/zipfian-integer)\n[![JavaScript Style Guide](https://img.shields.io/badge/code_style-standard-brightgreen.svg)](https://standardjs.com)\n\n## Table of Contents\n\n\u003cdetails\u003e\u003csummary\u003eClick to expand\u003c/summary\u003e\n\n- [Usage](#usage)\n- [About](#about)\n- [Visual Example](#visual-example)\n- [API](#api)\n  - [`sample = zipfian(min, max, skew[, rng])`](#sample--zipfianmin-max-skew-rng)\n  - [`num = sample()`](#num--sample)\n- [Install](#install)\n- [Development](#development)\n  - [Verify](#verify)\n  - [Benchmark](#benchmark)\n- 
[License](#license)\n\n\u003c/details\u003e\n\n## Usage\n\n```js\nconst zipfian = require('zipfian-integer')\nconst sample = zipfian(1, 100, 0.2)\n\nconsole.log(sample())\nconsole.log(sample())\n```\n\nThis logs two random integers between 1 and 100 with a `skew` of 0.2, thus more frequently returning integers \u0026lt; 50. You can optionally inject a (seeded) random number generator. The following example always returns the same integers in sequence unless you change the seed:\n\n```js\nconst random = require('pseudo-math-random')('a seed')\nconst sample = zipfian(1, 100, 0.2, random)\n```\n\n## About\n\nThis module is an optionally deterministic random number generator. With a `skew` parameter of 0 it produces integers with a uniform distribution over the range `min` to `max`. As `skew` increases, it produces integers with a Zipfian distribution over that range: integers near the `min` become rapidly more likely than integers near the `max`.\n\n\u003e :bulb: **Zipf's law states that given some corpus of natural language utterances, the frequency of any word is [inversely proportional](https://en.wikipedia.org/wiki/Inversely_proportional) to its rank in the [frequency table](https://en.wikipedia.org/wiki/Frequency_table). Thus the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, etc.**  \n\u003e _[Zipf's law](https://en.wikipedia.org/wiki/Zipf%27s_law)_\n\nFor example, words in a typical English text have a Zipfian skew of [`1.07`](https://medium.com/@jasoncrease/zipf-54912d5651cc). 
Suppose we have an array of all 171,000 English words, ordered by frequency, and want to randomly pick a word while favoring the naturally most frequent words like \"the\" and \"of\":\n\n```js\nconst corpus = ['the', 'of', ..., 'ragtop', 'eucatastrophe']\nconst randomIndex = zipfian(0, corpus.length - 1, 1.07)\n\nconsole.log(corpus[randomIndex()])\nconsole.log(corpus[randomIndex()])\n```\n\n\u003e :bulb: **The same relationship occurs in many other rankings unrelated to language.**  \n\u003e _[Zipf's law](https://en.wikipedia.org/wiki/Zipf%27s_law)_\n\nThis means we're not limited to words in the English language. We can make use of Zipf's law in benchmarks of a key-value store, for example! For convenience, `zipfian-integer` also supports a negative `skew`, which more frequently returns integers near the `max`. If the integers represent the keyspace of a key-value store with numeric keys in insertion order, then a positive `skew` favors the oldest keys while a negative `skew` favors the newest keys.\n\nBy using the integer returned by `zipfian-integer` not as a rank (which would then require a lookup) but as the key itself, we have an O(1) way to target keys in a keyspace of any size. If we also inject a seeded random number generator into `zipfian-integer`, we can make every benchmark target the exact same keys: for instance, to benchmark reads of the oldest keys, or uniformly distributed writes across the entire keyspace.\n\nThe algorithm is fast, accurate and has a constant memory footprint. Other solutions, like [`prob.js`](https://github.com/bramp/prob.js), build a lookup table, which costs time and memory. 
Performance of `zipfian-integer` depends mostly on `skew` and your choice of random number generator (see benchmarks below).\n\n## Visual Example\n\n\u003ctable\u003e\n\u003ctr\u003e\n  \u003ctd\u003e\u003cimg src=\"https://raw.githubusercontent.com/vweevers/zipfian-integer/7f2a2b874e3bc068b48952c4e698ad3d9463e8c7/img/1.png\" /\u003e\u003c/td\u003e\n  \u003ctd\u003e\u003csub\u003e\u003ccode\u003eskew=1\u003c/code\u003e vs \u003ccode\u003eskew=-1\u003c/code\u003e\u003c/sub\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n  \u003ctd\u003e\u003cimg src=\"https://raw.githubusercontent.com/vweevers/zipfian-integer/72014f2434d05f2874f3b1434952b017de8889e5/img/1b.png\" /\u003e\u003c/td\u003e\n  \u003ctd\u003e\u003csub\u003e\u003ccode\u003eskew=2\u003c/code\u003e vs \u003ccode\u003eskew=-2\u003c/code\u003e\u003c/sub\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n  \u003ctd\u003e\u003cimg src=\"https://raw.githubusercontent.com/vweevers/zipfian-integer/7f2a2b874e3bc068b48952c4e698ad3d9463e8c7/img/2.png\" /\u003e\u003c/td\u003e\n  \u003ctd\u003e\u003csub\u003e\u003ccode\u003eskew=1\u003c/code\u003e vs \u003ccode\u003eskew=-1\u003c/code\u003e, same seed\u003c/sub\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n  \u003ctd\u003e\u003cimg src=\"https://raw.githubusercontent.com/vweevers/zipfian-integer/7f2a2b874e3bc068b48952c4e698ad3d9463e8c7/img/3.png\" /\u003e\u003c/td\u003e\n  \u003ctd\u003e\u003csub\u003e\u003ccode\u003eskew=0\u003c/code\u003e\u003c/sub\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n## API\n\n### `sample = zipfian(min, max, skew[, rng])`\n\nCreate a new random number generator with a Zipfian distribution. The `skew` must be a floating-point number. The `rng`, if provided, must be a function that returns a random floating-point number between 0 (inclusive) and 1 (exclusive). 
It defaults to [`Math.random`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/random).\n\n### `num = sample()`\n\nGet a random integer between `min` (inclusive) and `max` (inclusive).\n\n## Install\n\nWith [npm](https://npmjs.org) do:\n\n```\nnpm install zipfian-integer\n```\n\n## Development\n\n### Verify\n\nA small test is included to verify `zipfian-integer` results against those of the Apache Commons Math original. First generate test data (a few million combinations of parameters):\n\n- Install JDK and [Maven](https://maven.apache.org/)\n- `cd test/java`\n- `mvn compile`\n- `mvn -q exec:java \u003e ../../data.ndjson`\n\nThen verify it:\n\n- `cd ../..`\n- `npm i`\n- `node test/verify.js data.ndjson`\n\n### Benchmark\n\n```\n$ node benchmark.js\nnode v10.14.1\n\nn=1e2  skew=+0.0 pseudo-math-random x 8,446,287 ops/sec ±0.67%\nn=1e2  skew=+0.0 Math.random x 9,556,211 ops/sec ±0.34%\nn=1e6  skew=+0.0 pseudo-math-random x 8,141,930 ops/sec ±0.43%\nn=1e6  skew=+0.0 Math.random x 9,509,349 ops/sec ±0.42%\nn=1e12 skew=+0.0 pseudo-math-random x 7,418,569 ops/sec ±0.44%\nn=1e12 skew=+0.0 Math.random x 8,905,792 ops/sec ±0.28%\nn=1e2  skew=+1.0 pseudo-math-random x 12,010,890 ops/sec ±0.41%\nn=1e2  skew=+1.0 Math.random x 19,650,279 ops/sec ±0.55%\nn=1e6  skew=+1.0 pseudo-math-random x 11,954,408 ops/sec ±0.53%\nn=1e6  skew=+1.0 Math.random x 19,752,283 ops/sec ±0.64%\nn=1e12 skew=+1.0 pseudo-math-random x 11,579,715 ops/sec ±0.51%\nn=1e12 skew=+1.0 Math.random x 17,908,808 ops/sec ±0.51%\nn=1e2  skew=-0.5 pseudo-math-random x 7,907,162 ops/sec ±0.40%\nn=1e2  skew=-0.5 Math.random x 9,388,148 ops/sec ±0.52%\nn=1e6  skew=-0.5 pseudo-math-random x 7,879,909 ops/sec ±0.35%\nn=1e6  skew=-0.5 Math.random x 9,196,799 ops/sec ±0.36%\nn=1e12 skew=-0.5 pseudo-math-random x 7,250,634 ops/sec ±0.30%\nn=1e12 skew=-0.5 Math.random x 8,636,395 ops/sec ±0.46%\n\nFastest is:\nn=1e6  skew=+1.0 Math.random\n```\n\n## License\n\nThe code of this 
port is licensed MIT © 2019-present Vincent Weevers. The original code (Apache Commons Math v3.6.1) is licensed under the Apache License 2.0. For details please see the full [LICENSE](LICENSE). The `NOTICE` of Apache Commons Math follows:\n\n```\nApache Commons Math\nCopyright 2001-2016 The Apache Software Foundation\n\nThis product includes software developed at\nThe Apache Software Foundation (http://www.apache.org/).\n\nThis product includes software developed for Orekit by\nCS Systèmes d'Information (http://www.c-s.fr/)\nCopyright 2010-2012 CS Systèmes d'Information\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvweevers%2Fzipfian-integer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvweevers%2Fzipfian-integer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvweevers%2Fzipfian-integer/lists"}