# Is there a linear-time solution to the minimum window substring problem, provided the characters in the substring must be in order?

Suppose there are two strings, $$S$$ and $$T$$, and we want to find the length $$l$$ of the shortest substring of $$S$$ which contains all the characters in $$T$$, in order. (Assume the length of $$T$$ is bounded, so it doesn’t need to be considered for determining time complexity.)

Here are some examples:

• $$S$$ = “abcabc”, $$T$$ = “abc”, $$l = 3$$, “(abc)abc”.
• $$S$$ = “abcabc”, $$T$$ = “acb”, $$l = 5$$, “(abcab)c”.
• $$S$$ = “ben thinks bananas are the best”, $$T$$ = “bans”, $$l=7$$, “ben thinks (bananas) are the best”

I found a solution to this using recursion which I believe is $$O(n^2)$$.

``````def ordered_moving_window(S, T, include_prefix=False):
# Recursion base case
if len(T) == 0:
return 0

shortest_length = float("inf")

for i in range(len(S)):
if S(i) == T(0):
# Recursively find all the other chars in the string
length = ordered_moving_window(S(i+1:), T(1:), include_prefix=True) + 1

# If this is being called recursively, then we need to include
# all the characters before the first character we found
# in our length calculation
if include_prefix:
length += i

shortest_length = min(length, shortest_length)

return shortest_length

print(ordered_moving_window("abcabc", "abc")) # 3
print(ordered_moving_window("abcabc", "acb")) # 5
print(ordered_moving_window("ben thinks bananas are the best", "bans")) # 7
``````

I believe the worst case for this particular solution is something like $$S$$ = “aaaaaaaab”, $$T$$ = “ab”, where the solution would need to traverse down the list (which is $$O(n)$$) for each “a” in $$S$$ (of which there are $$n$$), leading to a total time complexity of $$O(n^2)$$.

A few years ago, my professor assigned me a version of this problem, and I submitted this $$O(n^2)$$ solution. In my solution writeup, I also mentioned that I didn’t believe an $$O(n)$$ solution was possible. My justification was similar to:

Consider a hypothetical $$O(n)$$ algorithm where $$T$$ = “abc”. For $$S$$ = “ababc”, the algorithm would need to ignore the first “ab”, identify the solution “ab(abc)”, and return $$l = 3$$. However, for $$S$$ = “abac”, the algorithm would need to consider the first “ab” to identify the solution “(abac)” and return $$l = 4$$. Since the algorithm cannot look ahead to figure out whether it should consider or ignore a certain character, it cannot make this decision, and therefore such an algorithm cannot exist.

However, my professor disagreed with me, stating that an $$O(n)$$ solution to this problem is possible.

Since then, I’ve learned of the non-ordered minimum window substring problem, which is similar to the problem I listed earlier, but without the requirement that the characters be in order:

• $$S$$ = “abcabc”, $$T$$ = “acb”, $$l=3$$, “(abc)abc”
• $$S$$ = “ben thinks bananas are the best”, $$T$$ = “bans”, $$l = 5$$, “ben think(s ban)anas are the best”

This can be solved in $$O(n)$$ time complexity by keeping pointers to the start and end of the window, using a map to keep track of the characters in the window, advancing the end pointer when the sequence is not valid, and advancing the start pointer when the sequence is valid:

``````def moving_window_substring(S, T):
# Build a table of the count of each character in T
# (i.e., the chars we need our window to have)
needed_chars = {}
for char in T:
if char in needed_chars:
needed_chars(char) += 1
else:
needed_chars(char) = 1

# Set up the window
start = 0
end = 0
shortest_length = float("inf")
needed_chars_count = len(T)

# Grow the window until we have all the chars we need
while end < len(S):
end_char = S(end)

if end_char in needed_chars:
# If we needed this character, decrement the number of chars we need
needed_chars(end_char) -= 1
if needed_chars(end_char) >= 0:
needed_chars_count -= 1

if needed_chars_count == 0:
# The window is valid, we have all the chars we need
# Start shortening the window until it is no longer valid
while True:
start_char = S(start)

if start_char in needed_chars:
if needed_chars(start_char) < 0:
needed_chars(start_char) += 1
else:
# We need this character, break out of the loop
break

start += 1

shortest_length = min(shortest_length, end - start + 1)

end += 1

return shortest_length

print(moving_window_substring("abcabc", "abc")) # 3
print(moving_window_substring("abcabc", "acb")) # 3
print(moving_window_substring("ben thinks bananas are the best", "bans")) # 5
print(moving_window_substring("ben thinks bananas are the bans", "bans")) # 4
``````

Is there a similar $$O(n)$$-time-complexity solution to the ordered version of the problem?