You are given a list of numbers (unknown length).
Let's say the length is 10.
GetRandom (List) is called once. If implemented correctly, each number has a 1/10 chance of being returned.
GetRandom (List) is called 100 times. If implemented correctly, each number will appear 10 times in the result.
Now you have to do the same for a flow of numbers.
GetRandom is called (Stream, 5). This adds 5 to the transmission. The sequence is of length N = 1, then 5 is returned (probability = 1 / N = 1)
GetRandom (Stream, 3) is called. 3 is added to the transmission. N = 2. 3 or 5 is returned (prob = 1/2).
How will it be checked if this is correct?
If GetRandom (Stream) (without adding more numbers) is called 10 times when the length of the list is 2, each number (3 and 5) must be returned ~ 5 times.
GetRandom is called (Stream, 7). 7 is added to Stream. N = 3. One of the 3 numbers (5, 3, 7) (probability = 1/3) is returned.
But how will it be checked if this is correct?
If GetRandom (Stream) is called 10 times when N = 3, each number is returned ~ 3 times.
So far so good?
Alright, here is my algorithm:
N = 0
Pointer = 0
GetRandom(Stream, Number = NULL):
Pointer += 1
if Number is NOT NULL:
N += 1
if Pointer == N:
Pointer = 1 # Reset
return Stream(Pointer) # Assume 1-based indexes
This simply goes through all the numbers in order / round-robin.
If GetRandom (Stream) is called in a Stream with 100 numbers, 1000 times, each number will appear exactly 10 times.
If GetRandom (Stream, 77) is called in a Stream with 100 numbers (77 is number 101), while the Pointer was reset to the initial location 1. Then, when GetRandom (Stream) is called 101 times, in the call 101, 77 will go out, which satisfies the required probability of 1/101. If called 202 times, on call 202, 77 will be issued, which satisfies 2/202.
Why bother with reservoir sampling k / k + 1, etc.?