Some interesting interview Q&A around “randomness”

1 How to sample a random Element from an infinite stream?

For an average developed this is a insane question! or at-least it was for me!!!

If we know the length of the stream we can use in-build or numpy random function to generate a number.

import numpy as np
np.random.randint(0, len(stream_data))

What is random probability ? Its picking an element an element from N numbers i.e 1/N

How to sample from a stream where we don’t know the N ? Interesting right!

It’s true we don’t know the what N is actually, however can we exploit the formula 1/N by replacing N with i, where i is the current index of the element in the stream?

  • Case 1: When there is only one element in the stream
def select_rand(big_stream):
rand_element = None
for i, e in enumerate(big_stream):
if i == 0: # could have been if(len(big_Stream)==0),hold on!
rand_element = e
return rand_element
  • Case 2: Consider stream length greater than 1 i.e the length varying from (1, i+1), starting from the index 1 to i+1. +1 since enumerate start from 0 and we wanted to consider the last element also for sampling

— Ok good! but how to sample a random element from given index i ? Lets see what is the probability of getting 1 in the range of [1, i+1]?

1/i+1

random.randint(1, i+1) == 1 

1 is selected because the length of the stream starts from index 1, use this property to select the element.

import random

def select_rand(big_stream):
rand_element = None

for i, e in enumerate(big_stream):
if i == 0:
rand_element = e
if random.randint(1, i + 1) == 1:
rand_element = e
return rand_element

In more technical terms:

For i = 0, we would've picked uniformly from [0, 0].

For i > 0, before the loop began, any element K in [0, i - 1] had 1 / i chance of being chosen as the random element. We want K to have 1 / (i + 1) chance of being chosen after the iteration. This is the case since the chance of having being chosen already but not getting swapped with the ith element is 1 / i * (1 - (1 / (i + 1))) which is 1 / i * i / (i + 1) or 1 / (i + 1)

This is solved using loop invariant the dailycodingproblem.com mail says :), I have no idea what that means! Good one to get subscribed to.

2 Using a function rand5() that returns an integer from 1 to 5 (inclusive) with uniform probability, implement a function rand7() that returns an integer from 1 to 7 (inclusive)

import numpy as np
from collections import Counter
N = 1000
Counter(np.random.randint(1,6,N) + np.random.randint(1,6,N)//2)
# Counter({1: 392, 7: 847, 3: 2007, 2: 1209, 6: 1570, 5: 1973, 4: 2002})

A simple guy in pursuit of of AI and Deep Learning with Big Data tools :) @ https://www.linkedin.com/in/mageswaran1989/