For as long as there has been a translation industry there has been fierce competition, first around craftsmanship and later also among tools and technology. Hundreds of thousands of translators and tens of thousands of translation agencies and companies, all claiming to be best-in-class, have perhaps unwittingly and unintentionally done exactly the opposite of what they all strived for. They kept the industry small and fragmented.
So, following up on our Nunc Est Tempus publication from December 2017 in which we pressed individual language service providers to redesign their translation business, we now call on the community at large, the buyers and sellers, the operators and practitioners, the politicians and policymakers, to fix the translation ecosystem.
In this article we speak to the business side of the translation ecosystem. In a follow-on article, we shall speak to the policymakers.
The translation business seems to be doing very well. In the past two years, waves of new investments have helped existing companies expand and start-ups develop new platforms. And yet, as Ofer Shoshan, CEO of One Hour Translation, argues in this white paper, fragmentation remains very high and no company seems to be able to cross the one-billion-dollar valuation mark. By and large, we seem to keep doing things the way we have always done them. Why is that?
Because old habits die hard. And because we are all part of the existing ecosystem with its deeply rooted processes, pricing structures, agreements and practices. If we want to benefit from the new technology breakthroughs, we shall have to change the ecosystem that we all depend on. Without a modern ecosystem in place, innovation will be an uphill battle.
So we would like to make three recommendations to the buyers and sellers, operators and practitioners in the translation business: fix the knowledge gap, fix the operational gap, and fix the data gap.
MT researchers have cracked the code and say that human parity for machine translation is here... or near. Researchers in Artificial Intelligence predict that by 2024 machines will be “better” at translating a text than humans. At the TAUS Annual Conference in Vancouver this year, Chris Wendt, the MT program leader at Microsoft, argued that MT already beats human translators on most points.
A catastrophic mistake made by an MT engine can really upset us, but we tend to be more forgiving of human translation errors. Similarly, an accident with a self-driving car makes the headlines, while thousands of car crashes caused by humans every day go unreported. Necip Fazil Ayan, who leads AI and MT at Facebook, wondered during the Quantum Leap Conversation at the TAUS conference why we are so anxiously holding on to the distinction between MT and human translation. It’s simply translation. We’d like to call this the holistic translation world: every piece of content everywhere can now be translated into the language of the user. In our Nunc Est Tempus book we described the Modern Translation Pipeline as invisible, autonomous and data-driven. But how far are most of us everyday translation practitioners from having such a pipeline in place?
A few innovators are invading the translation industry with a total embrace of AI and MT in their platforms. But many of the stakeholders find themselves in a state of ignorance, denial or passivity. They don’t know (enough) about the technology breakthroughs and the implications they could have for their ambitions. So this knowledge gap is the first thing we need to fix. Not only does it hurt our own businesses, but our lack of progress will stifle innovation and growth across the entire industry, because we are all connected and dependent on each other.
It all starts with learning. We need to become fluent with the new tools and learn how to work with their input and output. Finding your way around the Trados and other CAT tool user interfaces is not enough anymore. MT should be part of every translation workflow. The translator becomes a reviewer, an editor, a computational linguist, a data analyst, a transcreator, a content marketeer, or a dialog polisher for chatbots, working with input and output from different technologies. Investments in training and human capital are the sine qua non for the current translation ecosystem to catch up with the times.
Once we master the new tools we need to define and document the new processes in the form of best practices, so that we can deploy them consistently across multi-vendor supply chains, and tweak and optimize them as necessary. Sharing knowledge on an industry-wide scale is the foundation for growth and innovation.
Of course there are already plenty of early adopters of MT, especially among TAUS members, who are beginning to realize benefits with cost savings and efficiency gains of 10% to 20%. But the full potential benefit of the new technology seems to be out of reach, because we are all stuck in an operational gap.
The new technology (MT and AI) allows us to translate and measure everything. Every bit of content that historically remained locked in its original language can be opened for end-users speaking many of the world’s different languages. However, in order for this to happen we need to upgrade our infrastructure. We need to ensure that all systems are connected, and that all content can be submitted to the translation pipeline. Our current operational models, metrics, tracking and pricing systems and agreements are still incompatible with this new reality of holistic translation.
To fix this operational gap, therefore, the industry community needs to align on basic production principles. For instance, a pricing model with fixed word rates may not be the right approach in the Modern Translation Pipeline environment, where production outputs per linguist’s hour vary from 50 to 5,000 words and where more and more content may even go through the pipes without being checked by a linguist at all. Fixed unit rates, the way they are used today, can be a source of mistrust, unfairness and risk for every party in the supply chain.
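The mismatch is easy to show with a few lines of arithmetic. In the sketch below, the per-word rate is purely illustrative (the article does not quote one); only the 50-to-5,000 words-per-hour range comes from the text:

```python
def effective_hourly_rate(words_per_hour: int, word_rate: float) -> float:
    """Earnings per hour under a fixed per-word rate."""
    return words_per_hour * word_rate

# Assumed rate for illustration only: $0.10 per word.
FIXED_WORD_RATE = 0.10

# Throughputs from the article: 50 words/hour (e.g. heavy
# transcreation) up to 5,000 words/hour (e.g. light MT review).
for wph in (50, 500, 5000):
    print(f"{wph:>5} words/hour -> ${effective_hourly_rate(wph, FIXED_WORD_RATE):,.2f}/hour")
```

At the same fixed rate, the linguist's effective pay spans a 100x range ($5 to $500 per hour in this sketch), which is why a single per-word price cannot be fair to every party across such different tasks.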
Measurement becomes crucial. Every step and every action in the global content production workflow needs to be tracked and measured. This is the key to optimizing the use of technology and human resources. Dashboards for translation are the new trend. Predictive analysis and quality estimation are the new buzzwords. But we must be careful to avoid getting caught in a Babelian confusion. To measure global content production across our supply chains, we must make sure that we base the measurements on common metrics and that we are comparing apples with apples. This is where the TAUS DQF metric has already come a long way as a standard for benchmarking internally in organizations as well as on an industry-wide scale. DQF plug-ins are available for an increasing number of translation workflows and CAT tools.
Tracking our global content production is vital to the virtual data loop that we described in the Modern Translation Pipeline chapter. And this leads us to the third recommendation for upgrading the translation ecosystem: fix the data gap.
Let’s not be mistaken: there is no single magic MT engine that translates better than all others. All MT engines available today are similar and make use of algorithms that are quite often available under open-source licenses. No, the difference really lies in the data that we feed into the engines. Data is the new oil, and that’s definitely true for the translation world. It is absolutely crucial for every translation operator, from individual translators up to corporate buyers of translations, to have full control over their translation data.
In recent years, data aggregation has been the exclusive domain of MT developers. They gathered their translation data mostly by crawling the web and by asking the users of their platforms and their customers for permission to use their translation data. These methods are cumbersome, error-prone and sensitive to concerns over confidentiality and ownership of data. Besides, the new-generation MT engines need high-quality, more domain-specific translation data, and they can also learn from metadata, such as edits and peripheral project attributes. For the translation ecosystem to catch up with the new technologies, we need to fix this data gap.
We believe that the most effective way to achieve this in the holistic translation world is through a data marketplace, where translators, language service providers, translation buyers and MT developers meet to buy, sell and exchange data. Small translation operators, translators and language service providers may generate sufficient data for their own needs, and use the marketplace primarily to sell data. Larger operators will generate a lot more data in-house and for their customers, but they will always find themselves in situations where new, ‘fresh’ or updated data help to tweak and fine-tune their engines. The data marketplace can be the place to go for every stakeholder in the translation industry.
The data marketplace can provide automated features for clustered search, cleaning and anonymization of data. It could also offer standard domain-specific corpora accumulated from data sets that are representative of industry verticals and niches. And a proper data marketplace ensures that all legal and privacy concerns are addressed and managed properly. See also our blog from March 27 of this year, Time to Build a Translation Market.
Fixing the ecosystem for the translation industry in the way described in this article is not very different from what has already been done in other industries, such as the financial and banking sector. Once that infrastructure was upgraded with central data clearing and standard interfaces for tracking and transferring money, financial services received a significant boost and the sector started to grow rapidly.
TAUS has the same vision for the translation industry. Fixing the translation ecosystem will remove the major pressures that fragment our industry. It will unleash oceans of content that need attention from a sector that will rapidly evolve into a plethora of new value-added services. Not business as usual, as Smith Yewell described it in our CEO interview last year.
Translators become reviewers (rather than post-editors), transcreators, data analysts and content marketeers. Language service providers become globalization consultancies. The industry will dissolve itself and become embedded as a function in different verticals, helping to enable global expansion and cross-cultural mediation.