AppTek Achieves Top Ranking at the International Workshop on Spoken Language Translation (IWSLT) 2021 Evaluation Campaign

MCLEAN, Va., Aug. 4, 2021 /PRNewswire/ — AppTek, a leader in Artificial Intelligence (AI), Machine Learning (ML), Automatic Speech Recognition (ASR), Neural Machine Translation (NMT), Text-to-Speech (TTS) and Natural Language Processing / Understanding (NLP/U) technologies, announced that its direct spoken language translation (SLT) system ranked first among end-to-end systems in the offline speech translation track at the 18th annual International Workshop on Spoken Language Translation (IWSLT 2021) evaluation campaign.

AppTek entered the competition to measure the performance of its end-to-end SLT system against other leading platforms developed by corporate and academic science teams around the world. In this year’s offline speech translation track, where the task was to automatically translate recorded TED talks from English to German, AppTek’s direct speech translation system achieved first place out of seven end-to-end submissions, as measured by the established Bilingual Evaluation Understudy (BLEU) MT evaluation metric.
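For readers unfamiliar with the metric: BLEU scores the n-gram overlap between a system translation and human reference translations, with a brevity penalty for overly short output. The following is a minimal, single-reference sketch assuming whitespace tokenization; production evaluations such as IWSLT's use standardized tooling (e.g., sacreBLEU) with proper tokenization and smoothing.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Toy sentence-level BLEU: uniform n-gram weights plus brevity penalty.

    A simplified illustration only, not the official IWSLT scoring script.
    """
    hyp, ref = hypothesis.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        # Clipped ("modified") n-gram matches against the reference.
        match = sum(min(count, r[g]) for g, count in h.items())
        total = max(sum(h.values()), 1)
        # Tiny floor avoids log(0) when no n-grams match.
        log_prec += math.log(max(match, 1e-9) / total) / max_n
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))
    return bp * math.exp(log_prec)
```

A perfect match scores 1.0; any divergence from the reference lowers the score, which is why BLEU serves as a proxy for closeness to human translations.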

"We are thrilled with these results. For the first time in company history, AppTek’s direct speech translation system outperformed our alternative cascaded approach," said Evgeny Matusov, Lead Science Architect, Machine Translation, at AppTek. "This marks a major milestone in both our ASR and MT technologies, and reflects the hard work, skill and innovation of our scientists."

Spoken language translation has traditionally been performed through cascaded approaches, whereby a speech transcript is first created by an automatic speech recognition system and then translated using an MT system. Recent advances in deep learning, however, have made it possible to address SLT in a different way. AppTek’s neural speech translation system directly translates audio in a source language into text in a target language, with no intermediate source-language text representation, which helps avoid the recognition errors that propagate through cascaded approaches. The speech translation model developed by AppTek scientists leverages not only speech data with respective translations, but also parallel text data, alongside elaborate data augmentation and pre-training methods.
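The architectural contrast can be sketched abstractly. All function names below are hypothetical placeholders standing in for the respective models, not AppTek APIs; the point is where the intermediate transcript does, or does not, appear.

```python
# Toy contrast of the two SLT architectures described above.
# transcribe / translate_text / translate_speech are hypothetical stand-ins.

def transcribe(audio: bytes) -> str:
    """ASR stage of a cascaded system: audio -> source-language text."""
    return "hello world"  # toy stand-in for a real recognizer

def translate_text(source_text: str) -> str:
    """MT stage of a cascaded system: source text -> target-language text."""
    return {"hello world": "hallo welt"}.get(source_text, "")

def translate_speech(audio: bytes) -> str:
    """End-to-end model: audio -> target-language text, no transcript."""
    return "hallo welt"  # toy stand-in for a direct model

def cascaded_slt(audio: bytes) -> str:
    # Any error made by transcribe() propagates into translate_text().
    return translate_text(transcribe(audio))

def direct_slt(audio: bytes) -> str:
    # A single model; no source-language text is ever produced.
    return translate_speech(audio)
```

In the cascaded pipeline a misrecognized word reaches the MT stage as-is, while the direct model never materializes a transcript for errors to pass through.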

Both cascaded and end-to-end models are evaluated in the IWSLT competition and are judged on their capability to produce translations close to target-language references. Cascaded approaches have outperformed end-to-end approaches for years, yet the latter have been steadily gaining ground. AppTek’s end-to-end SLT system ranked third overall among the 16 participating end-to-end and cascaded systems. The full IWSLT 2021 results can be found here.

"The state-of-the-art performance offered by AppTek’s end-to-end SLT system marks another step forward in our mission to deliver the next generation of speech-enabled technologies to the broadcast media and entertainment industry," said Kyle Maddock, AppTek’s SVP of Marketing. "These scientific breakthroughs combined with our TTS technologies are paving the road for faster, more accurate subtitling and automatic dubbing solutions."

AppTek scientists Parnia Bahar and Patrick Wilken will present the details of AppTek’s submission at this year’s online IWSLT conference on August 5-6, 2021.

About AppTek
AppTek is a global leader in artificial intelligence (AI) and machine learning (ML) technologies for automatic speech recognition (ASR), neural machine translation (NMT), and natural language understanding (NLU). The AppTek platform delivers industry-leading, real-time streaming and batch technology solutions in the cloud or on-premise for organizations across a breadth of worldwide markets such as media and entertainment, call centers, government, enterprise business, and more. Built by scientists and research engineers who are recognized among the best in the world, AppTek’s solutions cover a wide array of languages, dialects, and channels. For more information, please visit

Media Contact:
Kyle Maddock