We introduce a similar but simpler scheme, named Stupid Backoff, that does not generate normalized probabilities. The main difference is that we don’t apply any discounting and instead directly use the relative frequencies (S is used instead of P to emphasize that these are not probabilities but scores)
That’s a quote from the paper that introduces the Stupid Backoff language model. https://www.aclweb.org/anthology/D07-1090.pdf
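For reference, this is the scoring function as I read it in the paper, where f(·) is the raw n-gram count, N is the size of the training corpus, and α is a fixed backoff factor (the paper uses 0.4 in its experiments):

```latex
S(w_i \mid w_{i-k+1}^{i-1}) =
\begin{cases}
  \dfrac{f(w_{i-k+1}^{i})}{f(w_{i-k+1}^{i-1})} & \text{if } f(w_{i-k+1}^{i}) > 0 \\[2ex]
  \alpha \, S(w_i \mid w_{i-k+2}^{i-1}) & \text{otherwise}
\end{cases}
\qquad
S(w_i) = \frac{f(w_i)}{N}
```

The recursion ends at the unigram score S(w_i).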
I’m curious, is there an acceptable/accurate way to convert those frequencies into probabilities?
The scores are fine for prediction and for ranking candidate words by likelihood, but they are not acceptable if you want to generate text from an accurate probability distribution.
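To make the question concrete, here is a minimal sketch of what I mean by "normalizing". It assumes a toy corpus, a bigram model, and helper names (`count_ngrams`, `score`, `normalized`) that are my own and not from the paper. For a fixed context, it computes the Stupid Backoff score of every vocabulary word and divides by the sum.

```python
from collections import Counter

ALPHA = 0.4  # backoff factor used in the paper's experiments


def count_ngrams(tokens, max_order):
    """Count all n-grams of order 1..max_order, keyed by tuple of words."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def score(word, context, counts, total, alpha=ALPHA):
    """Stupid Backoff score S(word | context); context is a tuple of words."""
    penalty = 1.0
    while context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            # Seen n-gram: plain relative frequency, no discounting.
            return penalty * counts[ngram] / counts[context]
        # Unseen n-gram: drop the oldest context word, multiply by alpha.
        context = context[1:]
        penalty *= alpha
    # Base case: unigram relative frequency.
    return penalty * counts[(word,)] / total


def normalized(context, vocab, counts, total):
    """Divide each score by the sum over the vocabulary for this context."""
    scores = {w: score(w, context, counts, total) for w in vocab}
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}


# Toy corpus (an assumption, purely for illustration).
tokens = "the cat sat on the mat the cat ate".split()
counts = count_ngrams(tokens, max_order=2)
total = len(tokens)
vocab = sorted(set(tokens))

context = ("the",)
raw = {w: score(w, context, counts, total) for w in vocab}
probs = normalized(context, vocab, counts, total)

print(sum(raw.values()))    # greater than 1: raw scores are not a distribution
print(sum(probs.values()))  # 1 up to floating-point error
```

In this toy case the raw scores for the context "the" sum to more than 1, because the seen continuations already use up all the relative-frequency mass and the backed-off words add more on top. Dividing by the sum gives numbers that add up to 1 and keep the same ranking within a context, at the cost of a pass over the whole vocabulary per context. What I can't tell is whether the resulting numbers are an accurate estimate of the true distribution.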
Is there an attribute of the algorithm that makes it such that normalizing the frequencies into probabilities isn’t an accurate reflection of the probability dist