https://github.com/ori88c/non-replacement-random-item-sampler
An efficient random item sampler that ensures O(1) sampling complexity, and equal selection probability for all items across cycles. Each cycle ensures unique, non-repeating item selections, with each item sampled only once per cycle. Upon cycle completion, the sampler automatically refreshes, initiating a new cycle to repeat the process.
https://github.com/ori88c/non-replacement-random-item-sampler
constant-time cycles efficient-sampling no-repetitions non-replacement random random-element random-item random-permutations random-sample random-samples random-select random-selection random-selector random-shuffle rounds sampler sampling shuffle
Last synced: 5 months ago
JSON representation
An efficient random item sampler that ensures O(1) sampling complexity, and equal selection probability for all items across cycles. Each cycle ensures unique, non-repeating item selections, with each item sampled only once per cycle. Upon cycle completion, the sampler automatically refreshes, initiating a new cycle to repeat the process.
- Host: GitHub
- URL: https://github.com/ori88c/non-replacement-random-item-sampler
- Owner: ori88c
- License: apache-2.0
- Created: 2024-11-10T04:09:01.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-10T04:13:03.000Z (over 1 year ago)
- Last Synced: 2025-10-12T09:43:46.980Z (8 months ago)
- Topics: constant-time, cycles, efficient-sampling, no-repetitions, non-replacement, random, random-element, random-item, random-permutations, random-sample, random-samples, random-select, random-selection, random-selector, random-shuffle, rounds, sampler, sampling, shuffle
- Language: TypeScript
- Homepage:
- Size: 46.9 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
Non-Replacement Random Item Sampler
The `NonReplacementRandomSampler` class implements a random item sampler that ensures equal selection probability for all items across cycles. Each cycle guarantees unique, non-repeating item selections, with each item sampled only **once per cycle**. Upon cycle completion — when all items have been selected — the sampler automatically refreshes, seamlessly initiating a new cycle to repeat the process.
Use case examples include:
* __Customer Feedback Surveys with Limited Fatigue__: Suppose a company has a pool of 1000 customers and wants to conduct feedback surveys over time without exhausting the same group of customers repeatedly. Using `NonReplacementRandomSampler`, each survey cycle ensures that each customer is sampled only once, minimizing survey fatigue within the cycle. When the cycle is exhausted (after surveying all 1000 customers), a new cycle begins, allowing the company to start another round of sampling while still ensuring each customer is only surveyed once per cycle. This approach **maintains variety** and minimizes the risk of "survey burnout" in any particular customer.
* __Randomized Load Testing of Distributed Servers__: In a distributed server environment with 50 servers, a company wants to run randomized load tests to simulate production traffic without overloading the same servers repeatedly. Using `NonReplacementRandomSampler`, each server is selected once per testing cycle, ensuring balanced load testing across all servers. When all servers have been tested, a new cycle automatically begins, allowing for continued load testing across the server pool without disproportionate load on any individual server within a cycle.
## Table of Contents :bookmark_tabs:
* [Key Features](#key-features)
* [API](#api)
* [Getter Methods](#getter-methods)
* [Use Case Example: Randomized Playlist Generator](#use-case-example)
* [Algorithm](#algorithm)
* [License](#license)
- __Random Sampling without Repetitions__: The sampler’s internal state is tracked by a cycle number, starting at 1. Each cycle guarantees unique, non-repeating item selections, with each item sampled only **once per cycle**. Upon cycle completion — when all items have been selected — the sampler automatically refreshes, seamlessly initiating a new cycle to repeat the process.
- __Manually Trigger a New Cycle__: The `startNewCycle` method initiates a new cycle, making all input items available again for sampling. This is useful in scenarios where the non-replacement attribute is not required for extended periods, allowing a new cycle to begin without waiting for full cycle exhaustion.
- __Sample Multiple Unique Items at Once__: The `sampleGroupFromCycle` method samples multiple **unique** items from the current cycle in a single operation.
- __Efficiency :gear:__: All methods have time and space complexities of O(1), except for `sampleGroupFromCycle`, where both time and space complexities are O(groupSize).
- __Comprehensive documentation :books:__: The class is thoroughly documented, enabling IDEs to provide helpful tooltips that enhance the coding experience.
- __Tests :test_tube:__: **Fully covered** by comprehensive unit tests.
- **TypeScript** support.
- No external runtime dependencies: Only development dependencies are used.
- ES2020 Compatibility: The `tsconfig` target is set to ES2020, ensuring compatibility with ES2020 environments.
The `NonReplacementRandomSampler` class provides the following methods:
* __sample__: Returns a random item from the remaining items in the current cycle. In other words, an item that has not been sampled yet in the current cycle.
* __sampleGroupFromCycle__: Returns an array containing `groupSize` unique items from the current cycle, i.e., items that have not been previously sampled in the current cycle. To avoid ambiguity, these items will also *not* be sampled again in the same cycle. It is equivalent to executing the `sample` method multiple times, with the difference that the group size is limited by the number of remaining items in the cycle. This distinction is necessary to ensure a *unique* group of items.
* __startNewCycle__: Manually initiates a new cycle, making all input items available again for sampling.
If needed, refer to the code documentation for a more comprehensive description.
The `NonReplacementRandomSampler` class provides the following getter methods to reflect the current state:
* __size__: The total number of items available for sampling. Remains constant throughout the instance’s lifespan.
* __currentCycle__: The current cycle number. The cycle number increments in either of the following cases: Either all items in the current cycle have been sampled (cycle exhaustion), **or** the `startNewCycle` method is manually triggered.
* __remainingItemsInCurrentCycle__: The number of items not yet sampled in the current cycle.
To eliminate any ambiguity, all getter methods have **O(1)** time and space complexity.
## Use Case Example: Randomized Playlist Generator :man_technologist:
Consider a component responsible for selecting songs randomly for a playlist. Once all songs in the list have been played, the component automatically begins a new cycle, reshuffling the songs for fresh playback.
```ts
import { NonReplacementRandomSampler } from 'non-replacement-random-item-sampler';
interface Song {
id: string;
title: string;
artist: string;
durationInSeconds: number;
genre: string;
releaseYear: number;
azureStorageUri: string; // URI to access the song file in Azure Storage
}
class RandomizedPlaylistGenerator {
private readonly _songSampler: NonReplacementRandomSampler;
constructor(songs: Song[]) {
// Note: Refer to the Ownership Transfer section in the NonReplacementRandomSampler
// constructor documentation.
this._songSampler = new NonReplacementRandomSampler(songs);
}
public get remainingSongsUntilNextShuffle(): number {
return this._songSampler.remainingItemsInCurrentCycle;
}
public sampleNext(): Song {
return this._songSampler.sample();
}
public sampleGroupFromPlaylist(groupSize: number): Song[] {
return this._songSampler.sampleGroupFromCycle(groupSize);
}
public shuffle(): void {
this._songSampler.startNewCycle();
}
}
```
The sampler’s non-replacement requirement presents a complexity challenge, as using array-mutating methods like `Array.splice` would result in an undesirable O(n) complexity per sample.
To address this, we conceptually divide the internal items array into two distinct parts:
* __Available Prefix__: A prefix of items that are still available for sampling in the current cycle.
* __Unavailable Suffix__: A suffix of items that have already been sampled within the current cycle.
The algorithm samples a random index from the Available Prefix and **swaps** it with the **new** start of the Unavailable Suffix. Each sample operation therefore **decreases** the length of the prefix by 1 and **increases** the length of the suffix by 1. As a result, each sampled item is excluded from further selections during the current cycle.
For example, consider an array `[A, B, C, D]`. Initially, the Available Prefix length is 4, allowing us to sample a random index within [0, 3]. Suppose we sample index 1 (item `B`). We then swap `B` with `D`, transforming the array to `[A, D, C, B]` and reducing the Available Prefix length to 3. Now, `B` is in the Unavailable Suffix, preventing it from being selected again within this cycle.
On the next sampling attempt, a random index within [0, 2] is chosen, say 0 (item `A`). We swap `A` with `C`, resulting in `[C, D, A, B]`. Now both `A` and `B` are in the Unavailable Suffix, ensuring neither is sampled again during the current cycle. This process continues until all items are exhausted from the Available Prefix, completing the cycle.
[Apache 2.0](LICENSE)