I have looked at a few papers on this, such as this one, which basically says that full-text search work is overwhelmingly focused on English, with few resources for languages written without spaces between words (like Chinese or Tibetan) and little for languages with complex morphology like Arabic.
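To make the spacing problem concrete, here is a minimal Python illustration. It assumes a naive tokenizer that only splits on whitespace. An English phrase splits into words, but a Chinese sentence, which has no spaces between words, comes back as a single token. An exact-token lookup for a word inside it therefore fails, even though a plain substring check succeeds.

```python
# Naive tokenization (an assumption for illustration): split on whitespace only.
english = "full text search"
chinese = "我喜欢学习中文"  # "I like studying Chinese"; no spaces between words

english_tokens = english.split()
chinese_tokens = chinese.split()

print(english_tokens)  # ['full', 'text', 'search']
print(chinese_tokens)  # ['我喜欢学习中文']: the whole sentence is one "word"

# The word 中文 ("Chinese") is not any token, so a token-based index misses it...
print("中文" in chinese_tokens)  # False
# ...even though it does occur in the text as a substring.
print("中文" in chinese)  # True
```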
However, when I search on Google in different languages, it shows matching results. How is it doing this?
Admittedly, the searches above use very popular keywords (key texts in those languages), but Google still finds relevant results. Searching for word fragments also finds matches (in bold):
What might Google be doing (at a high, theoretical level) to accomplish this? Is it just using character n-grams and statistical analysis, or is something fancier going on? I'm wondering what the general state of the art is.
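For reference, the "character n-grams" idea can be sketched as a toy inverted index. This is a hypothetical illustration, not a claim about what Google or bible.com actually do; the names `char_ngrams` and `NgramIndex` are made up. It indexes overlapping character bigrams within each whitespace-separated chunk, intersects the postings lists for a query's bigrams, then confirms each candidate with a substring check. Because it never needs to know where words begin or end, unsegmented scripts and word fragments are handled the same way.

```python
from collections import defaultdict


def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Overlapping character n-grams, computed within each whitespace-separated chunk.

    For scripts written without spaces (e.g. Chinese), a chunk is a whole
    phrase or sentence, so no word segmenter is needed. Chunks shorter than
    n produce no grams.
    """
    grams: set[str] = set()
    for chunk in text.casefold().split():
        grams.update(chunk[i:i + n] for i in range(len(chunk) - n + 1))
    return grams


class NgramIndex:
    """Toy in-memory index: n-gram -> set of document ids (hypothetical)."""

    def __init__(self, n: int = 2) -> None:
        self.n = n
        self.postings: defaultdict[str, set[int]] = defaultdict(set)
        self.docs: dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        self.docs[doc_id] = text
        for gram in char_ngrams(text, self.n):
            self.postings[gram].add(doc_id)

    def search(self, query: str) -> list[int]:
        q = query.casefold()
        chunks = q.split()
        if not chunks:
            return []
        if any(len(c) < self.n for c in chunks):
            # Too short to form an n-gram: fall back to a linear scan.
            return sorted(d for d, t in self.docs.items() if q in t.casefold())
        grams = char_ngrams(q, self.n)
        # A document can only contain the query if it contains every query n-gram.
        candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        # Shared n-grams are necessary but not sufficient (they may occur in a
        # different order), so verify each candidate with a substring check.
        return sorted(d for d in candidates if q in self.docs[d].casefold())


index = NgramIndex(n=2)
index.add(1, "我喜欢学习中文")
index.add(2, "Full Text Search")
index.add(3, "ᐊᒡᓔᑦ ᐃᑦᔪᕐᖕᓁᑦᑐᑦ")

print(index.search("中文"))  # [1]
print(index.search("ext"))  # [2]: a word fragment
print(index.search("ᐃᑦᔪᕐ"))  # [3]
```

The single-character fallback is a linear scan, which is fine for a toy but would not scale; a real engine would also need ranking rather than a plain sorted list of matches.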
Other sites, like bible.com, seem to have full-text search across 900+ languages (for example, a query like ᐊᒡᓔᑦ ᐃᑦᔪᕐᖕᓁᑦᑐᑦ). What are they doing?