Google can automatically translate more languages. The translation service Google Translate will nearly double the number of source languages it supports, from 133 to 243, the data company announced Thursday. The spectrum ranges from languages with large numbers of speakers, such as Cantonese or Punjabi (in a version written in the Shahmukhi script), to less common dialects such as Southern Low Franconian or the Rio Grande Hunsrück dialect spoken in southern Brazil.
“We’re now using artificial intelligence to expand the range of supported languages,” writes Google employee Isaac Caswell on the company blog. “Thanks to our large language model PaLM 2, we are introducing 110 new languages to Google Translate, our largest expansion to date.” Just last year, Google Translate expanded its repertoire by 33 languages, to what was then 131 plus the traditional and simplified versions of written Chinese. This time Cantonese has been added separately. It was particularly difficult to train, Caswell reports, “because Cantonese in writing often overlaps with Mandarin.” This makes it difficult to automatically find Cantonese texts and feed them into large language models (LLMs).
Romani, which has been spoken in Germany and Austria for centuries, also posed a challenge for the programmers, as it is spread across Europe in many dialects. The result is an LLM that produces a mix that is not actually spoken in this form: it is based on Southern Vlax, but also includes elements from the northern branch and the Balkan branches.
A quarter of the new languages are of African origin. There is now also a separate version of Portuguese that is spoken by only a minority of all Lusophones: the variety used in Portugal. Many creole languages are included as well, for example from Jamaica, Mauritius, Papua New Guinea and the Seychelles. Politically explosive newcomers include Tibetan, Ossetian and the language of the Crimean Tatars. Google has published a list of the newly supported languages.
The GUI Challenge
The new languages cannot yet be selected in the user interface. This means that translation from the new languages is possible, but not yet into the new languages. Offline translation packages cannot be found in the app yet either. How Google will design the interface for so many languages remains to be seen. The company has set itself the goal of one day automatically translating 1,000 languages into one another. That would mean supporting about one in seven languages.
Although languages are constantly disappearing, there are still more than 7,000 of them. However, drawing a precise line between a language and a dialect is difficult and often (politically) controversial. In Germany, Yiddish, North Frisian, Romani, Saterland Frisian, Sorbian and South Jutish are particularly at risk of extinction.
Hallucinations: Let’s chew?
Google says it has finally met the Faroe Islands’ long-standing demand to be supported by Google Translate. But language detection does not actually work in practice yet. Google Translate currently misidentifies the Faroese phrase “Mær gongst væl, takk, og tygum?” (“I’m fine, thank you, and you?”) as Icelandic. The result is this piece of Dadaist nonsense: “Please will I groan and let’s chew?”
Strange things happen when you tell Google Translate the wrong source language, whether by accident or on purpose. Google has not yet taught its AI the courage to admit gaps, for example in the form of a “does not understand” error message. If “Mær gongst væl, takk, og tygum?” is translated from “German” into English, the output reads: “We are counting, well, and thinking?” Conversely, if English is wrongly specified as the source language, the system comes up with this surprising German output, which translates as: “What will happen to you, your father and your son?”
(DS)