TAUS Releases V2 of EPIC API, Featuring Considerable Improvements for 5 EU Languages

by Anne-Maj van der Meer 29 Jul 2024

Discover the enhanced TAUS EPIC API V2 for improved translation accuracy in 5 EU languages, reducing post-editing efforts by up to 60%.

The TAUS EPIC API was first released in October 2022. Since then, users have reported savings of between 25% and 60% on human post-editing effort and cost, and high-volume users in MT-only settings have used it to mitigate the risk of bad translation output. A generic model is available off the shelf, and via the API it can be easily integrated into existing platforms and content and translation workflows. To release V2 of the TAUS EPIC API, this generic model has undergone comprehensive further training.

The TAUS Data repository (consisting of 7+ billion words in about 600 language pairs and domains) has been instrumental in training the TAUS Estimate API models. To improve the generic model, the NLP team at TAUS has pulled millions of sentences from the repository in the IT, Healthcare, Commerce, Legal and Business domains for English into French, Italian, German and Spanish and curated high-quality training sets. After extensive training and analysis, the team is happy to report considerable improvements for these languages.

So what improvements can you expect when you start using V2 of the TAUS EPIC API? Below are some of the important updates:

V2 is highly sensitive to accuracy in translation, including proper names, numbers, and specific details.
It significantly penalizes additions, omissions, and word swaps in translations whenever they change the meaning or make sentences syntactically incorrect, while still accounting for natural differences in word order between languages.
It expects similar punctuation between source and target, with some flexibility for language-specific rules.
It can detect and penalize overly formal or informal styles compared to the source text.
V2 does a better job at estimating the quality of short sentences.
It provides a more spread-out range of scores, making it easier to distinguish between different levels of translation quality.
While it performs best on the languages it was trained on, it can also recognize similar error types in languages it was not trained on, with some limitations.

Examples

Version 2.0 notices mistranslations in less obvious cases. Here is a case of polysemy in English, where Spanish has different words depending on context: 'table' is 'tabla' for a data table but 'mesa' for a piece of furniture.

Source	Target	Score
To add a cell to a table row, you use the <td> tag.	Para agregar una celda a una fila de la tabla, usas la etiqueta <td>.	0.92
To add a cell to a table row, you use the <td> tag.	Para agregar una celda a una fila de la mesa, usas la etiqueta <td>.	0.67

Quality estimation often fails to respond to incorrect gradations, such as 'medium-sized' rendered as 'large' ('grandes' instead of 'moyennes'). Version 2.0 picks up such errors even when they are subtle relative to sentence length and word type.

Source	Target	Score
The Thundercloud Solution for E-Mail includes every option necessary for e-mail storage management designed for medium-sized businesses.	Thundercloud Solution for E-Mail comprend toutes les options nécessaires à la gestion du stockage des e-mails et est conçu pour les grandes entreprises.	0.82
The Thundercloud Solution for E-Mail includes every option necessary for e-mail storage management designed for medium-sized businesses.	Thundercloud Solution for E-Mail comprend toutes les options nécessaires à la gestion du stockage des e-mails et est conçu pour les moyennes entreprises.	0.91

Close-enough translations often go unnoticed by many types of quality estimation, which are also generally known to struggle with short sentences. Version 2.0 makes a clear distinction between 'software' and 'operating system' ('Betriebssystem').

Source	Target	Score
hardware and software requirements	Hardware- und Softwareanforderungen	0.94
hardware and software requirements	Hardware- und Betriebssystemanforderungen	0.85

Suggested Thresholds

Quality standards and expectations are of course subjective, so it is up to you and your use case to decide where to draw the line between good and bad quality. However, here are some guidelines from the NLP team for interpreting the scores and making decisions when using V2 of the generic model:

Scores above 0.9 generally indicate good translations
0.88-0.9 is a gray area (can be good, might have issues)
Below 0.88 usually indicates at least minor errors
Below 0.8 suggests serious errors
Below 0.7 indicates very poor quality
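
As a rough illustration, the guidelines above can be expressed as a small lookup. This is only a sketch and not part of the TAUS EPIC API: the function name interpret_score and the band labels are hypothetical, and the handling of the exact boundary values (0.9, 0.88, 0.8, 0.7) is an assumption based on the wording of the list. The sample scores are the ones from the example tables in this article.

```python
def interpret_score(score: float) -> str:
    """Map a quality score to the suggested bands listed above.

    Hypothetical helper for illustration only; the labels and the
    treatment of exact boundary values are assumptions.
    """
    if score > 0.9:
        return "good"            # above 0.9
    if score >= 0.88:
        return "gray area"       # 0.88-0.9: can be good, might have issues
    if score >= 0.8:
        return "minor errors"    # below 0.88
    if score >= 0.7:
        return "serious errors"  # below 0.8
    return "very poor"           # below 0.7


# Scores taken from the example tables in this article.
example_scores = [0.92, 0.67, 0.82, 0.91, 0.94, 0.85]

for s in example_scores:
    print(f"{s:.2f} -> {interpret_score(s)}")
```

Because the right cut-off depends on your use case, you would probably adjust these bands after comparing the scores with human judgments on your own content.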

Version 2 of the TAUS EPIC API was released on 8 July. To switch to V2, please follow the instructions here.

Interested in trying out the improved generic model? Sign up for a free trial and get access to 500,000 characters in Sandbox mode.
