
In a recent NPR news piece, "Uber Plans To Kill Surge Pricing, Though Drivers Say It Makes Job Worth It," Jeff Schneider, engineering lead at Uber, describes how the company is using machine learning to hack the problem of supply and demand.

Initially, Uber came up with "Surge Pricing" as a feature designed to help mitigate the phenomenon of demand outstripping supply in a certain area at a certain time. Concretely, that means that when some event causes lots of people to need transportation at the same time in generally the same location, the Uber fare will automatically increase. This increase incentivizes drivers to be in that location, which in turn ensures that enough drivers are on hand to meet the demand. From Uber's Understanding Surge help article:

A variety of circumstances cause fares to surge. For example, heavy rain, local sports events, and holidays can contribute to a temporary increase in demand for rides that requires surge pricing. […] Surge pricing has 2 main effects. Some riders may choose to wait a few minutes or take another form of transportation, causing demand in the area to decrease. Drivers encouraged by surging fares will head to areas where rides are needed most. Once demand for rides returns to normal levels, surge pricing ends.

However, according to the NPR piece, Uber considers surge pricing to be a temporary solution to a supply and demand problem that could be solved with better data, machine learning and predictive models. The idea is to take data on rider behavior and environmental circumstances and feed it into an algorithm that can learn to predict surges. Such an algorithm would allow Uber to tell drivers where to be and when, "so the surge pricing never has to happen."

So, can machine learning help us get smarter as well?

In the localization industry (and basically every other industry that I can think of), we are not immune to these same forces of supply and demand. We have supply in the form of translators and we have demand in the form of paying customers, or more directly in the form of consumers or end users.

Many in the industry are already trying to hack supply, getting more done with the same resources by using methods and technologies such as crowdsourcing, translation memories and machine translation. On the demand side, others are working together to bundle localization programs and projects and so manage demand.

Meanwhile, all localization stakeholders are trying their best to predict demand in order to ensure that supply will meet it. The common term for this is forecasting, and it is a well-known truth that everybody is terrible at it. Traditionally, forecasting is done heuristically, which is to say based on experience, and humans tend to be fairly bad at remembering enough data points to spot trends accurately.

Enter machine learning. What if we could feed all of the relevant data points (number of projects, number of words, required turnaround time, required quality, number of languages, stage of the development cycle, release dates, weather, holidays, available capacity and so on) into an algorithm that could predict when and where the next spike in demand will occur? How would that change the behavior of localization suppliers? How could it impact pricing and turnaround times? As far as I know, nobody has yet applied machine learning techniques to this particular problem in our industry, but even a modest degree of success would be extremely appealing.
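To make this concrete, here is a minimal sketch of what such a forecasting model might look like, built with scikit-learn on entirely synthetic data. The features and the way they drive demand are assumptions made for illustration, not findings from any real localization program.

```python
# A minimal demand-forecasting sketch. All features and the synthetic
# target below are illustrative assumptions, not real localization data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_weeks = 200

# Hypothetical weekly features: active projects, languages in flight,
# days until the next release, and whether the week contains a holiday.
projects = rng.integers(1, 30, n_weeks)
languages = rng.integers(1, 40, n_weeks)
days_to_release = rng.integers(0, 90, n_weeks)
holiday_week = rng.integers(0, 2, n_weeks)

# Synthetic target: weekly word volume that spikes as a release approaches
# and dips during holiday weeks (assumed purely for the sake of the example).
words = (
    2_000 * projects
    + 500 * languages
    + 40_000 * np.exp(-days_to_release / 10)
    - 15_000 * holiday_week
    + rng.normal(0, 5_000, n_weeks)
)

X = np.column_stack([projects, languages, days_to_release, holiday_week])
X_train, X_test, y_train, y_test = train_test_split(X, words, random_state=0)

model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)

predicted = model.predict(X_test)
print(f"Mean absolute error: {mean_absolute_error(y_test, predicted):,.0f} words/week")
```

Even a toy model like this makes the point: once the historical data points are captured in a consistent form, training and evaluating a predictor is the easy part. Gathering the right data is where the work lies.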

While this challenge of finding equilibrium is universal, there are many other cases specific to localization that are worth exploring. For example, what if we knew, prior to localization, which segments are most likely to cause difficulty (and therefore cost more) and which are most likely to be localized easily (and therefore cost less)? What if we could predict for which segments TMs and MT are most helpful (or harmful) to the translator or post-editor?
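One way to frame the segment question is as a simple classification problem. The sketch below assumes, purely for illustration, that past projects yield per-segment features (length, best TM fuzzy-match score, an MT quality estimate) and a difficulty label derived from observed post-editing effort; the data here is synthetic.

```python
# A minimal segment-difficulty classifier. Features, labels and the rule
# generating them are assumptions for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n_segments = 1_000

# Hypothetical per-segment features.
length = rng.integers(3, 60, n_segments)       # words in the source segment
fuzzy_match = rng.uniform(0, 1, n_segments)    # best TM fuzzy-match score
mt_estimate = rng.uniform(0, 1, n_segments)    # MT quality estimate

# Assumed rule for the synthetic label: long segments with weak TM and MT
# support count as "difficult" (1); everything else as "easy" (0).
score = 0.03 * length - 1.5 * fuzzy_match - 1.0 * mt_estimate
difficult = (score + rng.normal(0, 0.3, n_segments) > 0).astype(int)

X = np.column_stack([length, fuzzy_match, mt_estimate])
clf = LogisticRegression()
accuracy = cross_val_score(clf, X, difficult, cv=5).mean()
print(f"Cross-validated accuracy: {accuracy:.2f}")
```

In practice the labels would come from real post-editing data (edit distance, time spent), and the predicted probabilities could be used to price segments or to route them, for example to MT with light review versus full human translation.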

Such predictive power could revolutionize the localization workflow as we know it, allowing localization suppliers to focus the right resources on the right problems at the right time and allowing localization consumers to better manage budgets and time to market. Moreover, armed with the knowledge of what makes a segment easy (lower cost) or difficult (higher cost) to localize, localization teams could pass that knowledge upstream to the content authoring end of the content workflow, taking steps towards erasing the arbitrary line between source and target.

Machine learning is not a silver bullet; it is a tool like any other, and wielded by an amateur it will still only produce amateurish results. The transportation industry, and Uber specifically in this case, understands this, as do many other industries. Machine learning is a powerful tool, and if we want to wield it wisely and to great effect, we need to get smart. We need to understand its capabilities, limitations, pros and cons. We need to collect the right data for our models to learn from. But most importantly, we need to understand what kinds of questions we need answered.
