Let's say I need to simulate the following discrete distribution:

$$
P(X = k) =
\begin{cases}
\frac{1}{2^N}, & \text{if $k = 1$} \\
1 - \frac{1}{2^N}, & \text{if $k = 0$}
\end{cases}
$$

The most obvious way is to draw $N$ random bits and check whether they are all equal to 0 (or 1). However, information theory says

$$
\begin{align}
S &= -\sum_{i} P_i \log{P_i} \\
&= -\frac{1}{2^N} \log{\frac{1}{2^N}} - \left(1 - \frac{1}{2^N}\right) \log{\left(1 - \frac{1}{2^N}\right)} \\
&= \frac{1}{2^N} \log{2^N} + \left(1 - \frac{1}{2^N}\right) \log{\frac{2^N}{2^N - 1}} \\
&\rightarrow 0
\end{align}
$$

So the minimum number of random bits actually required *decreases* as $N$ grows large. How is this possible?
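A quick numeric check of that limit (a small Python snippet; I'm assuming the $\log$ in the entropy is base 2 so that $S$ is measured in bits):

```python
import math

def entropy_bits(N):
    """Shannon entropy (in bits) of P(X=1) = 1/2^N, P(X=0) = 1 - 1/2^N."""
    p = 1.0 / 2**N
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

# Entropy shrinks toward 0 as N grows.
for N in (1, 2, 4, 8, 16, 32):
    print(N, entropy_bits(N))
```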

Please assume we are running on a computer where fair random bits are the only source of randomness, so you cannot simply toss a biased coin.
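For reference, the "obvious way" I mention above is something like this (a minimal Python sketch; `random.getrandbits` is just a stand-in for whatever fair-bit source the machine provides):

```python
import random

def sample_naive(N):
    """Return 1 with probability 1/2^N, else 0, by drawing N fair bits
    and checking whether they are all zero."""
    # This always consumes N random bits, regardless of the outcome.
    return 1 if random.getrandbits(N) == 0 else 0
```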