The TAUS Asia Conference in Beijing on March 22–23 is shaping up to be exciting. Here’s a sneak preview from my own vantage point.
For my second contribution, I'll be continuing my advocacy for TAUS’s collection of speech data as well as text data. I'll reprise my San Jose talk on the topic, starting from the uncontroversial observation that big data has been a decisive enabler in the separate development of machine translation and speech recognition. I'll go on to make the case that big (and good-quality) data will be equally important in developing the combination of these technologies – speech translation – and that TAUS can play an important role in coordinating data collection and sharing. The hope is to foster a virtuous circle, in which speech translation data improves speech translation, which in turn produces more data. I'll also argue for the importance of correction data – naturally enough, since my own company has concentrated on verification and correction of speech translation. I'll consider the economic impact of speech and speech translation data: Who will own it? How will it eventually affect employment? I'll touch on related ethical issues (privacy, security, prejudice) and on the use cases that might be lucrative or important enough to justify TAUS’s data collection efforts. Lastly, I'll consider the types of speech translation data that can be collected (monolingual vs. bilingual, spontaneous vs. scripted, etc.).
From a philosophical viewpoint, I'll risk the suggestion that perceptually grounded approaches to MT and other NLP can display intentionality, and can consequently provide the foundation for truly meaningful semantics. I think that perceptual grounding of this sort depends on the ability to learn and associate categories, and that this ability, in turn, is a necessary – though not sufficient – condition for higher cognitive processes. To lay the groundwork for these somewhat presumptuous claims, I'll survey the role of semantics in machine translation to date in terms of three paradigms: rule-based, statistical, and neural MT. Within the rule-based paradigm, we'll revisit the direct, transfer-based, and interlingua-based variants.
Well, that ought to be enough for one conference!
To hear more from Mark Seligman and be a part of the live automatic interpretation experience while hearing what the Asian translation/localization market is up to, join us in Beijing at the TAUS Asia Conference!