Google claims to have created a new translation system, known as Translatotron, that uses artificial intelligence (AI) to translate speech from one language to another directly, without converting it to text.
The artificial intelligence unit of the Mountain View, California, giant announced the new system, which proposes a sequence-to-sequence model that eliminates the need for an intermediate text representation during translation.
Translatotron works as a single process rather than dividing the task into separate phases, as is the case with today's translation systems, which typically chain speech recognition, text translation and speech synthesis.
With this system, the researchers say they can achieve faster translation, avoid errors introduced in the speech recognition and translation steps, and "directly retain the speaker's original voice after the translation".
As Google details on its official blog, the system is also better at handling words that do not need to be translated, such as proper names.
How Translatotron works
The Google system takes spectrograms of the captured voice signal as input and generates new spectrograms of the speech in the language chosen for the translation. It also employs a neural vocoder, which converts the resulting spectrograms into time-domain sound waves.
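To make the idea of a spectrogram concrete, here is a minimal sketch of how one can be computed from a raw audio signal. It is not Google's code: the function name, the frame length of 512 samples, the hop of 128 samples and the 16 kHz sample rate are all assumptions chosen for illustration, and it uses only NumPy.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=512, hop=128):
    """Return an array of shape (num_frames, frame_len // 2 + 1).

    Each row is the magnitude spectrum of one windowed slice of the signal.
    frame_len and hop are illustrative values, not Translatotron settings.
    """
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    # rfft keeps only the non-negative frequencies of a real-valued signal.
    return np.abs(np.fft.rfft(frames, axis=1))

# One second of a 440 Hz tone sampled at an assumed 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

spec = magnitude_spectrogram(tone)
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 512  # centre frequency of the strongest bin
```

For the test tone, the strongest frequency bin sits close to 440 Hz, which is the kind of time-frequency picture a speech-to-speech model reads as input and produces as output.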
In addition, an optional speaker encoder can be added that learns the characteristics of a person's speech and encodes them so that their tone of voice is preserved when the translated speech is synthesized.
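One common way to pass a speaker's characteristics to a model is to summarize a reference recording as a fixed-size vector and attach that vector to every frame the synthesizer sees. The sketch below illustrates only that conditioning step. Mean pooling stands in for a trained speaker encoder, and all names and array sizes are hypothetical rather than taken from Google's system.

```python
import numpy as np

def toy_speaker_embedding(reference_frames):
    """Stand-in for a trained speaker encoder: mean-pool, then L2-normalize."""
    pooled = reference_frames.mean(axis=0)
    return pooled / np.linalg.norm(pooled)

def condition_on_speaker(encoded_frames, speaker_embedding):
    """Append the same speaker vector to every encoded frame."""
    tiled = np.tile(speaker_embedding, (encoded_frames.shape[0], 1))
    return np.concatenate([encoded_frames, tiled], axis=1)

rng = np.random.default_rng(0)
reference = rng.random((50, 8))   # 50 frames of reference speech, 8 features each
encoded = rng.random((20, 16))    # 20 encoded frames of the utterance to translate

embedding = toy_speaker_embedding(reference)
conditioned = condition_on_speaker(encoded, embedding)
```

Each conditioned frame now carries 16 content features plus the 8-dimensional speaker vector, so a decoder reading these frames has access to the voice characteristics at every step.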
Throughout training, Google uses multitask objectives that predict transcripts of the source and target speech while the model generates the translated spectrogram.
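Training with several objectives at once is usually expressed as a weighted sum of losses: a main loss on the generated spectrogram plus a smaller auxiliary loss on a side prediction. The following is a minimal sketch of that pattern under stated assumptions: the auxiliary task is treated as a per-step classification over symbols, the weight of 0.1 is arbitrary, and none of the function names come from Google's implementation.

```python
import numpy as np

def spectrogram_loss(predicted, target):
    """Mean squared error between predicted and reference spectrograms."""
    return float(np.mean((predicted - target) ** 2))

def cross_entropy(logits, labels):
    """Average cross-entropy of integer labels under softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def multitask_loss(pred_spec, target_spec, aux_logits, aux_labels, aux_weight=0.1):
    """Main spectrogram loss plus a down-weighted auxiliary loss (assumed weight)."""
    main = spectrogram_loss(pred_spec, target_spec)
    auxiliary = cross_entropy(aux_logits, aux_labels)
    return main + aux_weight * auxiliary

# Toy inputs: 5 spectrogram frames of 4 bins, and 3 auxiliary steps over 4 symbols.
target_spec = np.zeros((5, 4))
pred_spec = target_spec + 1.0          # every bin is off by exactly 1
aux_logits = np.zeros((3, 4))          # uniform prediction over the 4 symbols
aux_labels = np.array([0, 2, 3])

total = multitask_loss(pred_spec, target_spec, aux_logits, aux_labels)
```

In this toy case the spectrogram error is 1.0 and the auxiliary term is 0.1 times log(4), so the auxiliary task nudges training without dominating the main objective.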