{"id":15356349,"url":"https://github.com/saravanabalagi/sampling_utils","last_synced_at":"2025-07-03T01:35:53.557Z","repository":{"id":57463774,"uuid":"223403091","full_name":"saravanabalagi/sampling_utils","owner":"saravanabalagi","description":"Python tools to sample randomly with dont pick closest n elements constraints. Also contains a batch generator for the same to sample with replacement and with repeats if necessary.","archived":false,"fork":false,"pushed_at":"2019-12-02T15:23:39.000Z","size":15,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-06-03T15:39:42.848Z","etag":null,"topics":["conditional-sampling","constrained-sampling","dont-pick-closest","sampling","sampling-methods","sampling-with-constraints"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/sampling-utils/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/saravanabalagi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-11-22T12:55:53.000Z","updated_at":"2023-08-16T11:31:41.000Z","dependencies_parsed_at":"2022-09-26T20:41:28.497Z","dependency_job_id":null,"html_url":"https://github.com/saravanabalagi/sampling_utils","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/saravanabalagi/sampling_utils","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saravanabalagi%2Fsampling_utils","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saravanabalagi%2Fsampling_utils/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/saravanabalagi%2Fsampling_utils/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saravanabalagi%2Fsampling_utils/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/saravanabalagi","download_url":"https://codeload.github.com/saravanabalagi/sampling_utils/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saravanabalagi%2Fsampling_utils/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263244688,"owners_count":23436478,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["conditional-sampling","constrained-sampling","dont-pick-closest","sampling","sampling-methods","sampling-with-constraints"],"created_at":"2024-10-01T12:28:27.502Z","updated_at":"2025-07-03T01:35:53.532Z","avatar_url":"https://github.com/saravanabalagi.png","language":"Python","readme":"# Sampling Utils\n\n![PyPI Version](https://img.shields.io/pypi/v/sampling_utils)\n![PyPI Licence](https://img.shields.io/pypi/l/sampling_utils)\n![PyPI Wheel](https://img.shields.io/pypi/wheel/sampling_utils)\n\nPython tools for random sampling under a dont-pick-closest-`n`-elements constraint. 
\nIt also contains a batch generator that samples under the same constraint, with replacement and with repeats if necessary.\n\n## Installation\n\nInstall using `pip`:\n\n```sh\npip install sampling_utils\n```\n\n## Usage\n\n### Dont Pick Closest\n\n```python\nfrom sampling_utils import sample_from_list\nsample_from_list([1,2,3,4,5,6,7,8], dont_pick_closest=2)\n```\nYou are guaranteed to get samples that are more than `dont_pick_closest` apart\u003csup\u003e#\u003c/sup\u003e (in value, not in indices). \nHere, the absolute difference between any `sample` and `any_other_sample` is always greater than 2.\n\nFor example, if 2 is picked, no other item in the range [2-`dont_pick_closest`, 2+`dont_pick_closest`] will be picked.\n\nAnother example, looped 5 times:\n```python\nfor _ in range(5):\n    print(sample_from_list([1,2,3,4,5,6,8,9,10,12,14], dont_pick_closest=2))\n\n# Example output (results are random)\n# [5, 10, 2, 14]\n# [9, 6, 14, 1]\n# [3, 8, 12]\n# [10, 3, 6, 14]\n# [2, 5, 8, 12]\n```\n\nIf 12 is sampled, sampling 10 or 14 is not allowed since `dont_pick_closest` is 2. \nIn other words, if `n` is sampled, then sampling anything else from `[n-dont_pick_closest, ... n-1, n, n+1, ... n+dont_pick_closest]`\nis not allowed (if present in the list).\n\n\u003csup\u003e#\u003c/sup\u003eReferred to as the **dont_pick_closest rule** hereafter. \n\n\n### Number of samples\n\nYou can also specify how many samples you want from the list using the `num_samples` parameter. \nBy default, you get the maximum possible number of samples (without replacement).  
\n\n```python\nfor _ in range(5):\n    print(sample_from_list([1,2,3,4,5,6,8,9,10,12,14], dont_pick_closest=2, num_samples=2))\n\n# Example output (results are random)\n# [8, 2]\n# [6, 3]\n# [12, 1]\n# [4, 10]\n# [9, 1]\n```\n\nIf you request more samples than is possible, you will get an error saying so.\n\n### Min and max samples\n\nYou may just want to know how many samples you can draw from a given list while obeying the **dont_pick_closest rule**.\n\n```python\nfrom sampling_utils import get_min_samples, get_max_samples\nprint('Min', get_min_samples([1,2,3,4,5,6,8,9,10,12,14], dont_pick_closest=2))\nprint('Max', get_max_samples([1,2,3,4,5,6,8,9,10,12,14], dont_pick_closest=2))\n\n# Output\n# Min 3\n# Max 4\n```\n\n### Sampling without replacement successively / Generating batches of samples for one epoch\n\nIf you want to sample successively without replacement, i.e. draw as many samples as possible from the list without repeating any element, \nyou can use `batch_rand_generator` as shown below. \nThis is particularly useful for generating batches of data \nuntil no more batches can be generated (equivalent to one epoch).  \n\n```python\nfrom sampling_utils import batch_rand_generator\nfrom sampling_utils import get_batch_generator_elements\n\nbatch_size = 2\nbrg = batch_rand_generator([1,2,3,4,5,6,8,9,10,12,14], batch_size=batch_size, dont_pick_closest=2)\nprint(get_batch_generator_elements(brg, batch_size=batch_size))\n# Example output (results are random)\n# [[1, 4], [8, 5], [14, 3], [2, 6]]\n```\nNotice that the elements  \n\n- within each batch obey the **dont_pick_closest rule** _(e.g. 1 and 4 from batch 1)_\n- from different batches need not obey the rule _(e.g. 4 and 5 from batches 1 and 2 respectively)._\n\n## Contributing\n\nPull requests are very welcome.\n\n1. Fork the repo\n1. Create a new branch named after the feature\n1. Check that things work with a Jupyter notebook\n1. Raise a pull request\n\n## Licence\n\nPlease see the attached [Licence](LICENCE).","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaravanabalagi%2Fsampling_utils","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsaravanabalagi%2Fsampling_utils","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaravanabalagi%2Fsampling_utils/lists"}