{"id":17867317,"url":"https://github.com/alexbuccheri/random_sampling","last_synced_at":"2025-10-28T06:35:12.741Z","repository":{"id":253242077,"uuid":"842659599","full_name":"AlexBuccheri/random_sampling","owner":"AlexBuccheri","description":"Personal random sampling testing","archived":false,"fork":false,"pushed_at":"2024-09-24T16:23:25.000Z","size":1573,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-21T23:04:24.597Z","etag":null,"topics":["fortran","random-number-generators","random-sampling"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AlexBuccheri.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-14T19:53:46.000Z","updated_at":"2024-09-24T16:23:28.000Z","dependencies_parsed_at":"2024-09-12T00:59:49.888Z","dependency_job_id":"082e4a23-cc24-4f79-98de-773118dbe4e5","html_url":"https://github.com/AlexBuccheri/random_sampling","commit_stats":null,"previous_names":["alexbuccheri/random_sampling"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AlexBuccheri/random_sampling","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexBuccheri%2Frandom_sampling","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexBuccheri%2Frandom_sampling/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexBuccheri%2Frandom_sampling/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/AlexBuccheri%2Frandom_sampling/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AlexBuccheri","download_url":"https://codeload.github.com/AlexBuccheri/random_sampling/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexBuccheri%2Frandom_sampling/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281397340,"owners_count":26493908,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-28T02:00:06.022Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fortran","random-number-generators","random-sampling"],"created_at":"2024-10-28T09:48:09.818Z","updated_at":"2025-10-28T06:35:12.722Z","avatar_url":"https://github.com/AlexBuccheri.png","language":"Jupyter Notebook","readme":"# Random Number Implementation and Validation\n\nCompilable code is built with:\n\n```shell\ncmake -S . -B cmake-build\ncmake --build cmake-build\n```\n\nSome analysis is performed in the [jupyter] folder. 
This is done by automatically wrapping the fortran shared library\nwith the fantastic [gfort2py](https://github.com/rjfarmer/gfort2py).\n\n## TODOs\n\n**Sorting**\n\n- [ ] Migrate wrapped GSL C calling example here\n- [ ] Add a pytoml, such that one can straightforwardly install the python dependencies\n\n**Mapping a random number to an interval**\n\n- [ ] Test Lemire's algorithm in the [integer mapping module](src/fortran/integer_mapping.f90)\n  * Note, there _could_ be issues arising from transcribing from C to fortran\n\n**Sampling without replacement**\n\nTest:\n- [ ] My choice of random seed precision (`uint`)\n- [ ] Hidden shuffle\n   * Contains bug/s\n- [ ] Time all algorithms tested in the [notebook](jupyter/sampling_without_replacement.ipynb)\n\nImplement:\n- [ ] Weighted version of [reservoir sampling](https://en.wikipedia.org/wiki/Reservoir_sampling#Weighted_random_sampling)\n  * Ideally where the indices and weights are evaluated on-the-fly\n  * Also see the paper [Weighted random sampling with a reservoir](https://doi.org/10.1016/j.ipl.2005.11.003)\n\n\n## PRNGs\n\nImplemented [XOR](src/fortran/xorshifts.f90), which is straightforward and has a large period of over a million. \nHowever, one has to be careful with the handling of signed vs unsigned integers when transcribing\nfrom C.\n\nA 32-bit unsigned int in C can hold the values $[0, 2^{32} - 1]$; however, fortran does not support this data type. \nInstead, signed `int32` has the range $[-2^{31}, 2^{31} - 1]$. I use a bit mask to remap negative values:\n\n```fortran\n! A mask that has all the bits set to 1 except the most significant bit \n! i.e. the sign bit in a 32-bit signed integer\niand(x, Z'7FFFFFFF')\n```\n\nThis leaves $[0, 2^{31}-1]$ unchanged. However, for negative values, $x$ is represented using two's complement notation. \nWhen you apply `iand(x, Z'7FFFFFFF')`, you are effectively masking out the sign bit. 
This operation maps a negative \nnumber to a non-negative one by clearing the sign bit and keeping only the lower 31 bits, i.e. it returns $x + 2^{31}$. \nNote that this is not the 32-bit unsigned reinterpretation of $x$ (which would be $x + 2^{32}$), so the masked values span $[0, 2^{31} - 1]$.\n\n```fortran\ninteger(int32) :: x\nx = -12345678_int32  ! x = -12345678 (in two's complement)\nx = iand(x, Z'7FFFFFFF')  ! Masking the sign bit\n! x becomes 2135137970 (the lower 31 bits of -12345678, i.e. -12345678 + 2**31)\n```\n\nThere are smarter things one can do. See this [GitHub reference](https://github.com/Jonas-Finkler/fortran-xorshift-64-star/blob/main/src/random.f90)\nby Jonas Finkler, or [MR 2528](https://gitlab.com/octopus-code/octopus/-/merge_requests/2528/) for Octopus; however, the \nabove is currently sufficient for my needs.\n\n\n## Mapping integers to a smaller range\n\nMapping $[0, P)$ to $[a, b)$ is fine when the values are real,\nbut mapping to a smaller range of integers will inevitably result in duplicate numbers,\neven when sampling uniformly.\n\nSee:\n* [Lemire's mapping](src/cpp/lemire_mapping.cpp)\n* Mapping in the [XOR](src/fortran/integer_mapping.f90) module\n\n\n## Random Sampling a Population without Replacement\n\nMy use cases require random sampling without replacement.\nAlgorithms that randomly sample a population without replacement include:\n\n* Reservoir sampling\n\t* A couple of versions are shown on [wikipedia](https://en.wikipedia.org/wiki/Reservoir_sampling)\n    * My [implementations](src/fortran/reservoir_sampling.f90)\n\n* Skip and Gap Sampling (Vitter's Algorithm)\n\t* Can be more efficient than standard Reservoir Sampling, especially for large streams\n\t\t* [Original paper](http://www.ittc.ku.edu/~jsv/Papers/Vit84.sampling.pdf)  with algorithms A - D, and followed up [here](http://www.ittc.ku.edu/~jsv/Papers/Vit87.RandomSampling.pdf)\n\t\t* [Reservoir Algorithms: Random Sampling with a Reservoir](https://richardstartin.github.io/posts/reservoir-sampling#reservoir-algorithms-random-sampling-with-a-reservoir). 
This link is quite thorough and covers Algorithms A, D, R, X, Z, L\n\t\t* Some more details on Knuth's Algorithm L [here](http://guptamukul.blogspot.com/2009/12/understanding-algorithm-l_05.html)\t\n\t* [Blog post](http://erikerlandson.github.io/blog/2014/09/11/faster-random-samples-with-gap-sampling/) on gap sampling\n\t\t* Quite short\n\t\t* Touches on Poisson distribution, which is also utilised by hidden shuffle - worth a read, but the code is Java\n\n* [Hidden Shuffle](http://wrap.warwick.ac.uk/150064). This gives a python implementation, and claims it's more efficient than the above methods\n  * My [python implementation](src/python/hidden_shuffle.py), transcribed from the paper\n  * My [fortran implementation](src/fortran/hidden_shuffle.f90)\n\n* Hash-Based Sampling\n\n* Simple Random Sampling with Sorting\n\t* Efficient when the range (m) is small w.r.t. N (i.e. $2^{32}$)\n\t* Guarantees uniqueness of selected items.\n\nSome overviews on the problem, and related algorithms:\n * Looks like a good, recent [paper](https://arxiv.org/pdf/2104.05091) \"Simple, Optimal Algorithms for Random Sampling Without Replacement\" giving an overview of the methods listed here\n * For way more detail and code examples, see this [gist](https://peteroupc.github.io/randomfunc.html)\n\nFortran implementation references:\n* [Suite of old apps](https://people.math.sc.edu/Burkardt/f_src/rnglib/rnglib.html)\n* [XOR GitHub reference](https://github.com/Jonas-Finkler/fortran-xorshift-64-star/blob/main/src/random.f90) \n* [MersenneTwister-Lab in C](https://github.com/MersenneTwister-Lab/XSadd)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexbuccheri%2Frandom_sampling","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falexbuccheri%2Frandom_sampling","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexbuccheri%2Frandom_sampling/lists"}