AI Interpreting
People often ask me whether my work as an interpreter will soon be taken over by Artificial Intelligence (AI). After all, translators and interpreters are often found at the top of the list of jobs most affected by AI. I find this a little worrying, but at the same time I am fascinated by the progress we have seen in the past few years. So it’s time to find out what AI is actually capable of.
I was able to attend a presentation of an AI interpretation system, which is currently hailed as the “best there is”.
The setup:
The system was asked to simultaneously translate a speech from German into English, French, Spanish and Italian. The text chosen was the script of an Annual General Meeting. This is a pre-written text, which was read out in clear, accent-free German. As a listener, I was able to select subtitles, or listen to the audio, in any of these languages.
The output:
I am an English interpreter, so I was particularly interested in the English translation rendered by the AI interpreter. The German-English language combination can be expected to perform comparatively well, because the volume of training data is considerably larger than for the other combinations tested here.
The voice of the AI interpreter was male and spoke with an English accent. So far, so good. Listening, however, was extremely cumbersome right from the start. The AI interpreter tries not to leave out a single word, which means it has to speak very fast indeed. The time delay between the original voice and the interpretation was much longer than it is with a human interpreter. Apart from the speed, understanding was made difficult by the fact that pauses were missing where they should have been and appeared in unusual places instead. Word stress, pauses, variations in speed and volume of speech were all missing. These are acoustic cues that help convey meaning beyond words. Once I had stopped feeling dizzy, I started concentrating on the actual content of the speech. It was possible to get a vague idea of what was going on, but no more than that. There were some hilarious errors, such as “we have to put a tooth on”, the literal translation of a German idiom that means stepping things up a gear. For the most part, all I heard were sequences of words without meaning.
The process:
AI interpreting is a process that consists of several steps, illustrated with a short code sketch after the list:
- Step 1: Speech recognition: Even though our speaker spoke clearly and without an accent, the automatic speech recognition struggled. This is a serious problem, since the output of the speech recognition is the basis for the next step, the machine translation.
- Step 2: Machine translation: Machine translation, as we know it from tools such as DeepL, has some basic problems that become very obvious here:
- The machine is a generalist, but our speech was on financial topics. Because the AI interpreter is “context blind”, it cannot tell whether the German “Bank” should be translated as “bank” or “bench”.
- The AI interpreter translates EVERY SINGLE WORD, but not the meaning. In spoken language, we use a lot of redundancy, repeat ourselves unnecessarily, go back to correct ourselves, and do not always use correct grammar. The AI interpreter reproduces all of that. Human interpreters smooth out such imperfections, so that ultimately they translate the content and the intention rather than the words.
- Step 3: AI-generated voice: Choosing a pleasant voice and a preferred accent seems to be the easy part. But when we speak, we use a lot of cues that go beyond our choice of words: we adapt and vary our speed, pause, use intonation, and vary our volume within a sentence. The AI interpreter can do none of that.
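For the technically curious, this cascade can be sketched in a few lines of Python. To be clear, this is my own illustration, not the architecture of the system I saw demonstrated: the components (openai-whisper for speech recognition, a Helsinki-NLP model via Hugging Face transformers for translation, pyttsx3 for the voice) are openly available stand-ins, and the sketch processes a finished recording rather than a live stream, which is the much harder problem.

```python
# Minimal sketch of the three-step cascade described above:
# speech recognition -> machine translation -> synthetic voice.
# Stand-in components, purely for illustration:
#   pip install openai-whisper transformers sentencepiece pyttsx3

import whisper                      # step 1: automatic speech recognition
from transformers import pipeline   # step 2: machine translation
import pyttsx3                      # step 3: text-to-speech

def ai_interpret(audio_path: str) -> str:
    # Step 1: transcribe the German audio. Any recognition error made
    # here is passed on, uncorrected, to the translation step.
    asr_model = whisper.load_model("base")
    german_text = asr_model.transcribe(audio_path, language="de")["text"]

    # Step 2: translate the transcript. The model sees nothing but this
    # text: no subject matter, no speaker, no audience. This is where
    # "Bank" becomes "bank" or "bench" by statistical guesswork.
    # (A real system would also have to chunk a long transcript.)
    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")
    english_text = translator(german_text)[0]["translation_text"]

    # Step 3: read the translation out loud. The voice is generic; the
    # original's stress, pauses and changes in tempo were already lost
    # in step 1 and cannot be reconstructed here.
    voice = pyttsx3.init()
    voice.say(english_text)
    voice.runAndWait()
    return english_text

if __name__ == "__main__":
    # Hypothetical input file, e.g. a recording of the AGM speech.
    print(ai_interpret("agm_speech.wav"))
```

Even this toy version makes the core weakness visible: each step sees only the text output of the previous one, so recognition errors are never corrected downstream, and the speaker’s intonation never reaches the voice at the end of the chain.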
The verdict:
At its current level, AI interpreting is not suitable even for simple, non-critical applications. Even though the test I attended did not include any particular challenges, such as a strong accent, grammatical mistakes or cultural references, the output was simply unusable. If there is such a thing as “more unusable”, the translations into French, Italian and Spanish were it. The large group of interpreters who had dialled in for the test agreed that they had not expected the results to be this bad.
The demonstration brought home the complexity of the interpreting process, which is a cognitive performance that depends on words as its building blocks. But building blocks can only be used to construct a stable structure if we actually understand what we are doing. That includes understanding the context, the cultural environment, knowing who the speaker is, who the audience are, and much more.
As interpreters, we do not translate mere words. We are like a filter that ensures that the output matches the speaker’s intention. Of course we need to know our vocabulary. But we also know who we are communicating with and how to do so in an appropriate manner. In other words, interpreting is a deeply human activity.

I am fully aware that this is just a snapshot. Immense investments are being pumped into developing the technology. I have no doubt that we will see progress in the next few years. But we still have a long way to go.
If you would like to book human interpreters for your next event, please do not hesitate to contact us.