# Las Vegas algorithm for finding 00000 in bit string

Problem 1: Consider the following problem: given a binary string $$w = a_1 a_2 \cdots a_n \in \{0,1\}^*$$, decide whether $$w$$ contains 00000 as a substring (i.e., whether $$w$$ contains five consecutive 0's). There is an obvious $$O(n)$$-time algorithm. You will investigate the hidden constant factor, concerning the number of bits that need to be examined (i.e., the number of $$a_i$$'s that need to be evaluated) to solve this problem.

1. (10/100 points) First argue that every deterministic algorithm solving this problem needs to evaluate $$\Omega(n)$$ bits in the worst case.
2. (30/100 points) Prove that for a sufficiently large constant $$c$$, for any string $$a_1 \cdots a_c$$ of length $$c$$ that does not contain 00000, there must exist two indices $$i, j \in \{1, \ldots, c\}$$ with $$a_i = a_j = 1$$ and $$2 \leq |i-j| \leq 5$$.
3. (60/100 points) Design a Las Vegas randomized algorithm solving this problem (with zero error probability) that evaluates an expected $$\alpha n + o(n)$$ number of bits for any input, for some constant $$\alpha$$ strictly smaller than $$1$$. (You do not need to optimize the constant $$\alpha$$.)

Above is a question some friends and I have looked at to prepare for a qualifying exam; it is itself from a past qualifying exam. Parts 1 and 2 are fine, but what we are having trouble with is part 3. I think the idea behind this problem is to imagine that evaluating a bit $$a_i$$ is really expensive, so we would like to get the right answer while minimizing how many of these bits we evaluate. That said, it is ambiguous to us whether we care about (1) the number of distinct bits we evaluate or (2) the total number of times we evaluate any bit. Clearly, if we were worried about the latter, we could just store the result of each bit evaluation and avoid doing an expensive evaluation again, but we are not sure. For now, I assume the latter case, and specifically that if we want the value of the same bit twice, we must evaluate it each time and incur $$2$$ units of the cost we are trying to minimize.

Now I have some idea for this problem, and to help explain it, let us first consider a simple deterministic algorithm with the pseudocode written below. We loop from the start of the string to the end, checking every meaningful 5-bit window. If we ever find a $$1$$ bit while scanning a 5-bit window, that window cannot be $$00000$$, but a window starting at the position right after that $$1$$ bit might be. Thus, we update our starting point to the position right after that $$1$$ bit and resume from there.

On input $$s$$:

1. set $$i \leftarrow 1$$
2. while( $$i \leq (n-4)$$ ):
• set $$b \leftarrow \text{true}$$
• for( $$j = 0$$ to $$4$$ ):
• if( $$a_{i+j} = 1$$ ):
• $$i \leftarrow (i+j+1)$$
• $$b \leftarrow \text{false}$$
• break for loop.
• if( $$b = \text{true}$$ ):
• return TRUE
3. return FALSE
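As a sanity check, here is a runnable Python version of this deterministic scan (0-indexed, with names of my own choosing; the list `a` plays the role of the bit string):

```python
def contains_00000(a):
    """Return True iff the 0/1 list a contains five consecutive 0s.

    Mirrors the pseudocode above, except 0-indexed: scan the 5-bit
    window starting at i; on seeing a 1 at offset j, restart at i+j+1.
    """
    n = len(a)
    i = 0
    while i <= n - 5:
        all_zero = True
        for j in range(5):
            if a[i + j] == 1:
                i = i + j + 1   # next window starts right after the 1 bit
                all_zero = False
                break
        if all_zero:
            return True         # the current window is 00000
    return False
```

Skipping past each discovered $$1$$ bit is exactly what keeps this at $$O(n)$$ evaluations: no bit is probed more than once.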

My idea for a Las Vegas algorithm was to keep the same algorithm but modify it slightly by performing the inner loop in random order, making the pseudocode:

On input $$s$$:

1. set $$i \leftarrow 1$$
2. while( $$i \leq (n-4)$$ ):
• set $$b \leftarrow \text{true}$$
• for( $$j = 0$$ to $$4$$ in random order):
• if( $$a_{i+j} = 1$$ ):
• $$i \leftarrow (i+j+1)$$
• $$b \leftarrow \text{false}$$
• break for loop.
• if( $$b = \text{true}$$ ):
• return TRUE
3. return FALSE
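For concreteness, here is a Python sketch of this randomized variant, instrumented to count every bit probe (repeated probes of the same position counted separately, matching the cost model I assumed above; the names are mine):

```python
import random

def contains_00000_lv(a, rng=None):
    """Las Vegas variant: probe each 5-bit window in random order.

    Returns (answer, probes), where probes counts every bit evaluation,
    including repeated evaluations of the same position across windows.
    The answer is always correct; only the probe count is random.
    """
    rng = rng or random.Random()
    n = len(a)
    probes = 0
    i = 0
    while i <= n - 5:
        all_zero = True
        for j in rng.sample(range(5), 5):  # the offsets 0..4, shuffled
            probes += 1
            if a[i + j] == 1:
                i = i + j + 1
                all_zero = False
                break
        if all_zero:
            return True, probes
    return False, probes
```

Whatever order the probes come in, the window is declared all-zero only after all five positions are seen, so the zero-error property is preserved.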

The positive thing going for this algorithm is that if there exists a $$1$$ bit in the 5-bit window the inner loop examines, we will find it within at most $$3$$ loop iterations in expectation. However, if I define the bit string $$s$$ to be
$$s = 100001\,100001\,100001 \cdots 100001,$$
then the algorithm requires looking at $$6$$ bits in expectation (potentially some of the same ones multiple times) to get past each $$100001$$ block: the block contributes two windows, each containing a single $$1$$ bit that takes an expected $$3$$ probes to find. This implies that on this input we will evaluate $$n$$ bits in expectation (the number of distinct bits seen may be less) before we can answer whether $$00000$$ is contained in $$s$$. Thus, this algorithm does not seem sufficient.
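To double-check the $$6$$-probes-per-block claim empirically, here is a small self-contained simulation on $$(100001)^k$$ (the helper name, seed, and trial counts are my own choices):

```python
import random

def lv_probe_count(a, rng):
    """Probes used by the random-order window scan on the 0/1 list a."""
    n, probes, i = len(a), 0, 0
    while i <= n - 5:
        all_zero = True
        for j in rng.sample(range(5), 5):
            probes += 1
            if a[i + j] == 1:
                i, all_zero = i + j + 1, False
                break
        if all_zero:
            break  # found 00000 (never happens on the input below)
    return probes

rng = random.Random(42)
blocks = 40                          # s = (100001)^40, so n = 240
s = [1, 0, 0, 0, 0, 1] * blocks
trials = 4000
avg = sum(lv_probe_count(s, rng) for _ in range(trials)) / trials
# Each block has two windows whose lone 1 bit takes an expected 3
# probes to locate, so avg should be close to 6 * blocks = n = 240.
```

In my runs the average lands within a fraction of a probe of $$n$$, consistent with the analysis above.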

Does anyone have thoughts on how to approach this problem, or do you think we should actually be counting the number of distinct bits we evaluate? If the latter, then I could potentially see other ways to tackle this problem.