This blog posting represents the views of the author, David Fosberry. Those opinions may change over time. They do not constitute an expert legal or financial opinion.

If you have comments on this blog posting, please email me.

The Opinion Blog is organised by threads, so each post is identified by a thread number ("Major" index) and a post number ("Minor" index). If you want to view the index of blogs, click here to download it as an Excel spreadsheet.



Why Was "Russia" Translated To "Mordor"?

Posted on 10th January 2016


People seem to be surprised by this (as described in this BBC report). They shouldn't be: it is a consequence of how Google Translate works.

There were complaints that Google Translate was translating "Russia" to "Mordor", Russia's Foreign Minister Sergey Lavrov's surname to "sad little horse" and "Russians" to "occupiers". Apparently this has now been "fixed", but I suspect that there will be more such cases in the future.

Google Translate was created using stochastic (statistical analysis) methods. It was trained on huge volumes of documents that had been translated into a number of languages (the original documents came from the EU, where most documents are translated into all the languages of the member states). Since then it has been fed other sets of translated documents, many of them from social media, and it has also learned from those users of Google Translate who take the time to suggest improved translations.

One point to note from this is that Google Translate has no idea of the meaning of what it translates. If a lot of its source material translates "Russia" to "Mordor", the software will believe that this is a valid translation, and will not understand the insult.

The other point to note is the part that social media plays in providing learning material for Google Translate. In some ways this is good: the software can keep up with the evolution of language use, can learn slang and dialect, and can cope with text that is not grammatically correct or complete. In other ways it is not so good, as in this case, where viral social media content can warp the software's knowledge base and produce incorrect translations.
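To make this concrete, here is a deliberately simplified sketch in Python. It is not how Google Translate is actually implemented (real statistical systems align whole phrases and weigh many more factors); it only assumes that the preferred translation of a word is whichever target word has most often been paired with it in the training material. The function names `build_table` and `translate_word`, and the corpus counts, are hypothetical and chosen purely for illustration.

```python
from collections import Counter, defaultdict

def build_table(aligned_pairs):
    """Count how often each source word was paired with each target word.

    Hypothetical toy model: real systems align phrases, not single words.
    """
    table = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        table[src][tgt] += 1
    return table

def translate_word(table, src):
    """Return the most frequently seen translation, or the word unchanged."""
    if src not in table:
        return src
    return table[src].most_common(1)[0][0]

# Made-up corpus: 50 neutral pairings (Ukrainian -> Russian for "Russia").
corpus = [("Росія", "Россия")] * 50
print(translate_word(build_table(corpus), "Росія"))  # Россия

# A flood of viral posts using the insulting nickname outweighs them.
corpus += [("Росія", "Мордор")] * 80
print(translate_word(build_table(corpus), "Росія"))  # Мордор
```

The toy model never "knows" that one of the candidates is an insult; it only counts, so whichever pairing is most frequent in its material wins.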

People need to understand how the tools they use work, so that they understand their limitations and potential biases. I use Google Translate quite often, and sometimes I have to spend a lot of time and effort to get a good translation (translating forwards and backwards, and adjusting the words and grammar of the starting text until I get a suitable result); sometimes even that fails, and I have to translate by other means.

Of course, Google Translate could be improved. What I would like is to be able to tell the software that a certain word or phrase is the subject or the object, that certain elements form a list, and that certain words form a noun phrase or verb phrase. I would also like to force a certain word to be translated as a particular word (which is already possible) and have the grammar of the translated sentence adjusted to match (which is not). I am sure that this functionality will be included in the future, hopefully soon.