Massively Multilingual: AI, language and how data powers communication | Stillman Translations

Ever heard of Massively Multilingual? It’s changing the way machine translation works. 

“… perhaps the way [of translation] is to descend, from each language, down to the common base of human communication — the real but as yet undiscovered universal language — and then re-emerge by whatever particular route is convenient.” — Warren Weaver, 1949  


As technology and data bring out the best of each other, we polish and renew all our tools. Among them are language and translation tools. The past decade has paved the way for the rise of machine learning. You can travel abroad, take a picture of a street sign and ask the system to translate it. But… accuracy is still not top-notch.  

So what’s the sitch? Passionate language experts, like us, are as attentive as ever to new models such as massively multilingual neural machine translation (NMT), a natural language processing approach. One of the main goals of machine translation researchers and developers today is to create a single model that supports all languages, dialects, and modalities. Google even has a research team dedicated to building a universal Neural Machine Translation system that translates between every pair of languages.  

What’s taking so long? Data. The data set-up needed is immense. For their experiments, the Google team used over 25 billion parallel sentence pairs spanning 103 languages, crawled from the ‘wild’ web. Not to mention the dataset noise (bad-quality examples) that comes with raw web-crawled data, the differing degrees of linguistic similarity between languages, and more. 
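To give a feel for what "filtering noisy web-crawled data" involves, here is a minimal sketch of one common cleaning heuristic: dropping sentence pairs whose lengths are wildly mismatched or that have an empty side. Real pipelines use far more elaborate filters; the threshold and example sentences below are invented for illustration.

```python
def filter_parallel_pairs(pairs, max_ratio=3.0):
    """Keep (source, target) pairs that pass basic sanity checks."""
    kept = []
    for src, tgt in pairs:
        src_len, tgt_len = len(src.split()), len(tgt.split())
        if src_len == 0 or tgt_len == 0:
            continue  # drop pairs with an empty side
        ratio = max(src_len, tgt_len) / min(src_len, tgt_len)
        if ratio > max_ratio:
            continue  # drop pairs with an implausible length mismatch
        kept.append((src, tgt))
    return kept

noisy = [
    ("the cat sleeps", "le chat dort"),
    ("hello", ""),                          # empty target: noise
    ("yes", "a very long unrelated sentence scraped by mistake from a menu"),
]
clean = filter_parallel_pairs(noisy)
```

Only the first pair survives; the other two are typical of the junk that raw web crawls produce at scale.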

But first things first, how did we get here? 


When Charles Babbage first proposed the idea of a programmable computing machine in 1834, he imagined it being used to translate the languages of other nations. And 120 years later, it actually happened. New York witnessed the work of the first automatic language translation machine. One that converted brief statements from Russian into English. 

Fast forward to our day and age: for years, statistical machine translation models were the workhorse. They analyze enormous amounts of existing translations and search for statistical patterns in this input. Early systems were word-based; later ones searched for complete phrases. From those patterns, they pick the most probable translation. It was a great step, considering we started with rule-based translation, which works more like looking a word up in a dictionary and replacing it one for one. 

( Read more about this in our previous article: Machine translation: everything you need to know)
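The statistical idea can be sketched very simply: a phrase table records how often each source phrase was seen aligned with each candidate translation, and decoding picks the most frequent (most probable) candidate. The phrases and counts below are invented for illustration; real systems combine many more statistics.

```python
# Toy phrase table: source phrase -> {candidate translation: observed count}
phrase_table = {
    "good morning": {"buenos días": 980, "buena mañana": 20},
    "thank you":    {"gracias": 990, "te agradezco": 10},
}

def best_translation(phrase):
    """Return the most frequently observed translation for a phrase."""
    candidates = phrase_table.get(phrase)
    if not candidates:
        return None  # out-of-vocabulary phrase: the model has no evidence
    # choose the candidate with the highest observed count
    return max(candidates, key=candidates.get)
```

Note what this approach cannot do: a phrase it has never seen gets no translation at all, which is exactly the data-scarcity problem discussed later in this article.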

How is neural machine translation different? 

Neural machine translation (NMT) uses an artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model. It takes into account the whole input sentence at each step of the way. And it processes multiple languages with this one single translation model. Insights gained through training in one language can be applied to the translation of other languages.  
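In the published multilingual NMT setup from Google, one model serves every language pair thanks to a simple trick: an artificial token naming the desired target language is prepended to the source sentence, and the network is otherwise unchanged. The sketch below shows only that input-formatting step; the token format follows the published convention, while the sentences are invented.

```python
def tag_for_target(sentence, target_lang):
    """Prepend a target-language token so one model can serve all pairs."""
    return f"<2{target_lang}> {sentence}"

# The same source sentence, routed to two different target languages
# by the same single model:
batch = [
    tag_for_target("How are you?", "es"),  # translate into Spanish
    tag_for_target("How are you?", "ja"),  # translate into Japanese
]
```

Because the language choice lives in the data rather than in the architecture, adding a new language pair requires no new model, only new training examples.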


The TAUS Global Content Summit is a Silicon Valley event that brings together localization industry experts to share information and jumpstart conversations. Guess what one of the topics was this year? 

The first thing these experts (among them the platform owners of Lilt, Unbabel, Transifex, and Smartling) can all agree on is the importance of data. Data is more important than algorithms. For successful localization, and for content that feels as if it were created locally, the raw material is critical. It is also harder to filter than you might imagine. This is why the process still usually relies on human translation. But the end goal is a balance between human translation and technology, one that would optimize the localization process to everyone’s benefit.  


Just a decade ago, data and AI were niche concepts that were only vaguely understood. Yet even with today’s popular awareness, a rising problem remains: research shows “everyone wants to do the model work, not the data work”. 

It’s sort of hard to grasp, but multilingual models take an input and map it into a language-agnostic space. A land where all languages are charted and any input phrase with the same meaning points to the same area. But good, applicable data to map and compare is hard to come by. It’s costly and time-consuming to collect. 
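A minimal sketch of that "language-agnostic space" idea: if a multilingual encoder maps sentences with the same meaning to nearby vectors, translations can be matched by how close their vectors are (cosine similarity). The three-dimensional vectors below are made up to stand in for real encoder outputs, which typically have hundreds of dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical embeddings: the English and Spanish sentences share a
# meaning, so their vectors should lie close together; the third does not.
emb = {
    "the dog runs":   [0.90, 0.10, 0.20],
    "el perro corre": [0.85, 0.15, 0.25],
    "tax law update": [0.10, 0.90, 0.80],
}

same_meaning = cosine(emb["the dog runs"], emb["el perro corre"])
diff_meaning = cosine(emb["the dog runs"], emb["tax law update"])
```

Under this toy geometry, the translation pair scores far higher than the unrelated pair, which is exactly the property that makes such a space useful for mapping between languages.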

Also, it’s unequally distributed. There are high-resource languages, with loads of quality papers and links available, like French, German and Spanish. And low-resource languages, such as Yoruba, Sindhi, and Hawaiian. And where data is limited, output quality plummets. This may not be a problem for everyday chats about food or current events. But technically complex contracts, documentation, and essays have an altogether different vocabulary, and mixing the languages does not produce useful results. 

There are people investing, though. Facebook considers this model to play a crucial role in the company’s globalization goal. Not only because the user experience will be better, but also because it makes for a safer internet environment. For example, it plays a critical role in detecting hate speech accurately, without flagging non-violent cases or brazenly overlooking harmful ones.  

Also, at least half of the roughly 7,000 languages currently spoken will no longer exist by the end of this century. Multilingual machine translation could partially prevent this: it could extend translation into new languages even when parallel data is unavailable. But we still have a long way to go for this to happen.  


Bilingual or monolingual datasets are a given. Now multilingual, domain-specific datasets will be the trending topic. Brands strive to communicate globally in a local manner, and community-based platforms such as the TAUS HLP Platform are beginning to arise: platforms with tailored, openly contributed datasets. Language services providers with years of texts collected, generated, and processed realize they have the necessary input to create highly valuable training datasets.  

And so a new branch of AI emerges: lifelong learning machines. These will pull and feed data into AI systems continually and indefinitely, retaining knowledge and selectively transferring it. Researchers at Western University, Canada, present this mechanism in their paper A Deep Learning Framework for Lifelong Machine Learning.

Meanwhile, our dedicated team of language professionals is here to help you out with any content you need.