Why is Google’s live translation so bad? We asked some experts
Last week, Google quietly changed a line on its Pixel Buds support page, which now reads: “Google Translate is available on all Assistant-optimized headphones and Android phones.” This feature was previously exclusive to Pixel Buds and Pixel phone owners. And although the company did not go out of its way to announce the change officially, this small tweak is noteworthy.
To understand why, take a brief look at the history of Google’s earphones. Google introduced its glossy wireless earbuds last year to great anticipation, after teasing the product with the promise of a game-changing tool: live translation. Just tapping the Buds and saying “Help me speak [language]” opens the Google Translate app on your phone, which until now had to be a Pixel. From there, you can speak a sentence, which is translated and transcribed into the target language on your phone, and then read aloud. On paper, Google’s new technology should have interpreters fearing for their jobs.
An onstage demo of the live translation tool at the product launch received a round of applause, but when the device started shipping, the response was more skeptical: the quality of the translation did not match what the public expected.
Tech Insider tested it in ten different languages. The device successfully translated basic questions such as “where is the nearest hospital,” but as soon as the sentences became more complicated, or if the speaker had an accent, things got lost in translation. Our own reviewer came to the conclusion that live translation was “a little tricky,” with Google Assistant struggling to understand the words being spoken.
Says senior consumer technology analyst Daniel Gleeson: “Getting natural language right is very difficult. It would be a huge achievement for Google, and on the day they do it, they will be shouting from the rooftops.” Perhaps that is why the Pixel Buds support page update was kept so quiet, some might say.
Google’s problem does not arise from the translation process itself. In fact, the company has been improving its translation game for the past few years. In 2016, it turned Google Translate into an AI-driven system based on deep learning. Until then, the tool had translated each word separately and applied grammatical rules to make the sentence grammatically correct, which led to the fragmented translations we know all too well. Neural networks, on the other hand, consider the whole sentence and predict what the correct output might be, based on the large data sets of text they have been trained on. Through machine learning, these systems are able to take the context of a sentence into account in order to deliver a more accurate translation.
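The difference between translating each word in isolation and taking the rest of the sentence into account can be sketched in a few lines of Python. This is only a toy illustration: the tiny English-to-Spanish lexicon, the cue words and both function names are hypothetical, and the context step is a simple word-overlap count, not a neural network and not how Google Translate works.

```python
# Hypothetical toy lexicon: each English word maps to candidate Spanish
# translations, each paired with cue words that favor that candidate.
LEXICON = {
    "bank": [
        ("banco", {"money", "loan", "account"}),   # financial institution
        ("orilla", {"river", "boat", "water"}),    # edge of a river
    ],
    "river": [("río", set())],
    "money": [("dinero", set())],
}


def translate_word_by_word(sentence: str) -> list[str]:
    """Translate each word in isolation, always taking the first candidate."""
    output = []
    for word in sentence.lower().split():
        candidates = LEXICON.get(word)
        # Unknown words are passed through unchanged.
        output.append(candidates[0][0] if candidates else word)
    return output


def translate_with_context(sentence: str) -> list[str]:
    """Pick, for each word, the candidate whose cue words best match
    the other words in the sentence (ties fall back to the first one)."""
    words = sentence.lower().split()
    output = []
    for word in words:
        candidates = LEXICON.get(word)
        if not candidates:
            output.append(word)
            continue
        context = set(words) - {word}
        best = max(candidates, key=lambda c: len(c[1] & context))
        output.append(best[0])
    return output


print(translate_word_by_word("river bank"))   # ['río', 'banco']
print(translate_with_context("river bank"))   # ['río', 'orilla']
```

The output is a list of words rather than grammatical Spanish; the point is only that the isolated lookup picks the financial sense of “bank” regardless of its neighbors, while the context-aware version does not.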
Integrating machine learning has been part of the mission of Google Brain, the company’s branch dedicated to deep learning. Google Brain has also brought neural networks to the other tool that live translation depends on, and that seems to be where everything goes wrong: speech recognition. Google Assistant is trained on hours and hours of speech, to which it applies machine learning tools to identify patterns, so that it eventually recognizes exactly what you are saying when you ask for a translation.
Except that it doesn’t. So if Google has been able to use neural networks with a certain degree of success in text-to-text translation, why can’t the Assistant consistently recognize speech using the same method? Matic Horvat, a natural language processing researcher at the University of Cambridge, says it all comes down to the data used to train the neural network.
“Systems adapt to the training dataset they are fed,” he says. “The quality of speech recognition also degrades when you present it with things it has never heard before. If your training dataset is conversational speech, it will not do very well at recognizing speech in a busy environment, for example.”
Interference is the enemy of any computer scientist working to improve speech recognition technology. Last year, Google allocated €300,000 from its Digital News Innovation Fund to London-based Trint, which specializes in automated transcription, albeit with a different algorithm from Google’s. That algorithm, however, is not ideal either for dealing with the basic problem of interference.
The company’s website, in fact, devotes a whole section to recommendations on how to record speech in a quiet environment. It also claims to work with a 5 to 10 percent margin of error, but makes it clear that this applies to clear recordings. There are no official statistics for recordings that include overlapping speech or background noise. “The biggest challenge is to explain to our users that we are only as good as the audio they give us,” said Trint CEO Jeff Kofman. “With echoes, noise or heavy accents, the algorithm will make mistakes.”
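The article does not say how Trint measures its margin of error. One common way to score a transcript is word error rate: the number of word substitutions, insertions and deletions needed to turn the machine’s output into the correct text, divided by the length of the correct text. The sketch below assumes that metric; the function name and the sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between hypothesis and reference,
    divided by the number of words in the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one wrong word out of ten.
spoken = "please tell me where the nearest hospital is right now"
heard = "please tell me where the nearest hostel is right now"
print(word_error_rate(spoken, heard))  # 0.1
```

On this measure, a 5 to 10 percent error would mean roughly one wrong word in every ten to twenty, under the assumption that the figure refers to words at all.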
The challenges posed by live speech mean that training is the most expensive and time-consuming part of creating a neural network. And keeping live translation on a limited number of devices, as Google did with Pixel Buds, certainly does not help the system to learn. The more speech it is able to process, the more data it can add to its algorithms, and the more the machine can learn to recognize unusual speech patterns. Google did not put forward a spokesperson for interview, but pointed us to a blog post on Google Assistant.
For Gleeson, this is one of the reasons why Google is moving to expand the feature to additional hardware. “One of the most difficult problems with speech recognition is gathering enough data on specific accents, colloquialisms and idioms, all of which are highly regional,” he says. “Keeping the feature on the Pixel was never going to allow Google to reach those regions in high enough volumes to process sufficient data.”
Data collection, however, comes with some disadvantages. The best-performing neural networks are those with the most data, but that data is held on CPUs that grow in size with the amount of information they store. Those CPUs are still far from being integrated into mobile devices, which makes real-time speech processing impossible today. In fact, every time you use Google Assistant, the spoken words are sent off to be processed in a data center before being returned to your phone. None of the computation is done locally, as existing phones cannot store the data that neural networks need to process speech.
Although Google Assistant is able to complete that process quickly, Horvat said, it is still a long way from real-time speech recognition. One of the company’s challenges today, if it is to make features such as live translation more seamless, is to work out how to integrate neural network processing into cell phones.
Engineers, in fact, are already working on producing small external chips that are suited to running neural networks efficiently and can be integrated into phones. Earlier this month, for example, Huawei announced an AI chip that the company says could be used to train neural network algorithms in minutes.
Although Google has its own chip called the Edge TPU, it is designed for enterprise use and not yet for smartphones. For Horvat, this is Google’s Achilles heel: as a software company, Google does not have much control over manufacturers, so it cannot ensure the development of a product that would make local neural network processing available on all Android devices, unlike Apple, for example.
In the near future, Google may be forced to take small steps to improve its speech recognition technology. And while live translation has drawn a lot of criticism, for industry analyst Neil Shah, partner and research director for IoT and ecosystems at Counterpoint, expanding its reach is a way for the company to get ahead of the competition: “Google has access to two billion Android users,” he said. “It’s better placed to scale faster than the competition, and to train with a greater flow of input data, as more users start using the latest voice interactions on Android phones.”
Daniel Gleeson agrees. Whether or not reviews of the feature keep to a tone of gentle mockery, Google’s move will ultimately lead to significant improvements. As with all AI products, the tool needs to learn, so by definition it arrives on the market unfinished. He says: “You run the risk of people saying that it doesn’t work as promised, but that is the only way to get there.” Interpreters do not have to worry about their jobs just yet.