Python – HTML Paragraph Identifier

I have various texts that are inconsistent in how paragraphs are formatted. Some texts use the <P> </P> tags to specify a paragraph, while others use two break tags (<br><br>). Sometimes the user forgets to close a <p> tag too. I need to “normalise” these texts so that they all specify paragraphs in the same way. Below is some Python code I wrote to achieve this goal. The code scans the input for the start or end of a paragraph and puts each paragraph it finds into an array, which I can then output in whatever format the business wants.
I wrote it in Python, but that is not the language the actual solution will be implemented in. The real language is so obscure and old that there is no point in posting it here. It doesn’t even have regular expressions, which is why I avoid them in the Python code. The plan is to translate this Python code into the obscure old language we use at work.
The PARAGRAPH_DELIMITERS tuple contains all of the strings that mark the start or end of a paragraph.

PARAGRAPH_DELIMITERS = ("<p>", "</p>", "<br><br>", "<br></br><br></br>", "<br/><br/>")

class Input:
    def __init__(self, html):
        self.pos = 0
        self.html = html
        self.length = len(html)

    def get_next_char(self):
        # Return the character at the current position without advancing.
        return self.html[self.pos]
    
 
def get_paragraphs(input):
    paragraphs = []
    paragraph = ""
    while input.pos < input.length:
        if new_paragraph(input):
            # We found a new paragraph, add it to the list of paragraphs.
            if paragraph != "":
                paragraphs.append(paragraph)
                paragraph = ""
        else:
            paragraph += input.get_next_char()
            input.pos += 1
     
    if paragraph != "":
        # The input ended inside a paragraph; keep the trailing text too.
        paragraphs.append(paragraph)

    return paragraphs

def new_paragraph(input):
    if input.get_next_char() != "<":
        return False

    lookahead_pos = input.pos
    new_paragraph_found = False
    potential_paragraph_delimiter = ""
    while lookahead_pos < input.length:
        if input.html[lookahead_pos] == " ":
            # Skip spaces, so "< p >" and "<br> <br>" still match.
            lookahead_pos += 1
        else:
            potential_paragraph_delimiter += input.html[lookahead_pos].lower()
            match_possible = False
            for delimiter in PARAGRAPH_DELIMITERS:
                if delimiter.startswith(potential_paragraph_delimiter):
                    match_possible = True
                    if len(potential_paragraph_delimiter) == len(delimiter):
                        new_paragraph_found = True
            if not match_possible:
                break
            if new_paragraph_found:
                break
            lookahead_pos += 1

    if new_paragraph_found:
        input.pos = lookahead_pos + 1
    return new_paragraph_found



html = "<p>hello this is paragraph 1<P>hello this is paragraph 2</p><p>hellow this is paragraph 3<br><br>this is paragraph 4"
input = Input(html)
paragraphs = get_paragraphs(input)
print(paragraphs)
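
For the output step, a minimal sketch might look like the following, assuming the business wants every paragraph wrapped in a <p>...</p> pair. The render_paragraphs helper is a hypothetical name, not part of the code above, and it sticks to a plain loop and string concatenation so that it should translate to a language without regular expressions.

```python
def render_paragraphs(paragraphs):
    """Wrap each non-empty paragraph in <p>...</p>, one per line.

    Hypothetical helper: the output format is an assumption, not a requirement
    stated in the question.
    """
    rendered = []
    for text in paragraphs:
        text = text.strip()
        if text:  # skip paragraphs that contain only whitespace
            rendered.append("<p>" + text + "</p>")
    return "\n".join(rendered)


# Sample list in the same shape that get_paragraphs() returns.
sample = ["hello this is paragraph 1", "hello this is paragraph 2"]
print(render_paragraphs(sample))
```

Swapping the "<p>" and "</p>" strings for another pair, or joining the paragraphs with "<br><br>" instead, would cover the other formats mentioned at the top.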