Is there a linear-time solution to the minimum window substring problem, provided the characters in the substring must be in order?

Suppose there are two strings, $S$ and $T$, and we want to find the length $l$ of the shortest substring of $S$ which contains all the characters in $T$, in order. (Assume the length of $T$ is bounded, so it doesn’t need to be considered for determining time complexity.)

Here are some examples:

  • $S$ = “abcabc”, $T$ = “abc”, $l = 3$, “(abc)abc”.
  • $S$ = “abcabc”, $T$ = “acb”, $l = 5$, “(abcab)c”.
  • $S$ = “ben thinks bananas are the best”, $T$ = “bans”, $l=7$, “ben thinks (bananas) are the best”

I found a solution to this using recursion which I believe is $O(n^2)$.

def ordered_moving_window(S, T, include_prefix=False):
    # Recursion base case
    if len(T) == 0:
        return 0

    shortest_length = float("inf")

    for i in range(len(S)):
        if S(i) == T(0):
            # Recursively find all the other chars in the string
            length = ordered_moving_window(S(i+1:), T(1:), include_prefix=True) + 1

            # If this is being called recursively, then we need to include
            # all the characters before the first character we found
            # in our length calculation
            if include_prefix:
                length += i
            shortest_length = min(length, shortest_length)

    return shortest_length

print(ordered_moving_window("abcabc", "abc")) # 3
print(ordered_moving_window("abcabc", "acb")) # 5
print(ordered_moving_window("ben thinks bananas are the best", "bans")) # 7

I believe the worst case for this particular solution is something like $S$ = “aaaaaaaab”, $T$ = “ab”, where the solution would need to traverse down the list (which is $O(n)$) for each “a” in $S$ (of which there are $n$), leading to a total time complexity of $O(n^2)$.

A few years ago, my professor assigned me a version of this problem, and I submitted this $O(n^2)$ solution. In my solution writeup, I also mentioned that I didn’t believe an $O(n)$ solution was possible. My justification was similar to:

Consider a hypothetical $O(n)$ algorithm where $T$ = “abc”. For $S$ = “ababc”, the algorithm would need to ignore the first “ab”, identify the solution “ab(abc)”, and return $l = 3$. However, for $S$ = “abac”, the algorithm would need to consider the first “ab” to identify the solution “(abac)” and return $l = 4$. Since the algorithm cannot look ahead to figure out whether it should consider or ignore a certain character, it cannot make this decision, and therefore such an algorithm cannot exist.

However, my professor disagreed with me, stating that an $O(n)$ solution to this problem is possible.

Since then, I’ve learned of the non-ordered minimum window substring problem, which is similar to the problem I listed earlier, but without the requirement that the characters be in order:

  • $S$ = “abcabc”, $T$ = “acb”, $l=3$, “(abc)abc”
  • $S$ = “ben thinks bananas are the best”, $T$ = “bans”, $l = 5$, “ben think(s ban)anas are the best”

This can be solved in $O(n)$ time complexity by keeping pointers to the start and end of the window, using a map to keep track of the characters in the window, advancing the end pointer when the sequence is not valid, and advancing the start pointer when the sequence is valid:

def moving_window_substring(S, T):
    # Build a table of the count of each character in T
    # (i.e., the chars we need our window to have)
    needed_chars = {}
    for char in T:
        if char in needed_chars:
            needed_chars(char) += 1
            needed_chars(char) = 1

    # Set up the window
    start = 0
    end = 0
    shortest_length = float("inf")
    needed_chars_count = len(T)

    # Grow the window until we have all the chars we need
    while end < len(S):
        end_char = S(end)

        if end_char in needed_chars:
            # If we needed this character, decrement the number of chars we need
            needed_chars(end_char) -= 1
            if needed_chars(end_char) >= 0:
                needed_chars_count -= 1

        if needed_chars_count == 0:
            # The window is valid, we have all the chars we need
            # Start shortening the window until it is no longer valid
            while True:
                start_char = S(start)

                if start_char in needed_chars:
                    if needed_chars(start_char) < 0:
                        needed_chars(start_char) += 1
                        # We need this character, break out of the loop

                start += 1

            shortest_length = min(shortest_length, end - start + 1)

        end += 1

    return shortest_length            

print(moving_window_substring("abcabc", "abc")) # 3
print(moving_window_substring("abcabc", "acb")) # 3
print(moving_window_substring("ben thinks bananas are the best", "bans")) # 5
print(moving_window_substring("ben thinks bananas are the bans", "bans")) # 4

Is there a similar $O(n)$-time-complexity solution to the ordered version of the problem?