Machine translation

Machine translation, sometimes abbreviated MT (not to be confused with computer-aided translation, machine-aided human translation (MAHT), or interactive translation), is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one language to another.

On a basic level, MT performs simple substitution of words in one language for words in another, but that alone usually cannot produce a good translation of a text because recognition of whole phrases and their closest counterparts in the target language is needed. Solving this problem with corpus, statistical, and neural techniques is a rapidly growing field that is leading to better translations, handling differences in linguistic typology, translation of idioms, and the isolation of anomalies.
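The limits of plain word substitution can be illustrated with a minimal sketch. The two toy Spanish-to-English dictionaries and the function names below are assumptions made for illustration, not part of any real MT system.

```python
# Toy Spanish-to-English dictionary; the entries are illustrative assumptions.
WORD_DICT = {
    "tomar": "take",
    "el": "the",
    "pelo": "hair",
    "gato": "cat",
    "negro": "black",
}

# Phrase-level entries take priority over word-level lookup when they match.
PHRASE_DICT = {
    ("tomar", "el", "pelo"): "pull someone's leg",
}


def translate_word_by_word(sentence):
    """Replace each source word independently; unknown words pass through."""
    return " ".join(WORD_DICT.get(w, w) for w in sentence.lower().split())


def translate_with_phrases(sentence):
    """Match the longest known phrase first, then fall back to single words."""
    words = sentence.lower().split()
    max_len = max(len(p) for p in PHRASE_DICT)
    out = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in PHRASE_DICT:
                out.append(PHRASE_DICT[chunk])
                i += n
                break
        else:
            # No phrase matched at this position: translate one word.
            out.append(WORD_DICT.get(words[i], words[i]))
            i += 1
    return " ".join(out)


print(translate_word_by_word("tomar el pelo"))   # take the hair
print(translate_with_phrases("tomar el pelo"))   # pull someone's leg
print(translate_word_by_word("el gato negro"))   # the cat black
```

The idiom is only rendered sensibly once it is recognized as a whole phrase, and the last call shows that neither toy function handles word-order differences between the two languages.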

Current machine translation software often allows for customization by domain or profession (such as weather reports), improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is used. It follows that machine translation of government and legal documents more readily produces usable output than conversation or less standard text.

Improved output quality can also be achieved by human intervention: for example, some systems are able to translate more accurately if the user has unambiguously identified which words in the text are proper names. With the assistance of these techniques, MT has proven useful as a tool to assist human translators and, in a very limited number of cases, can even produce output that can be used as is (e.g., weather reports).
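One way the proper-name idea could work is to mask user-identified names before translation and restore them afterwards. The sketch below assumes a hypothetical `machine_translate` callable standing in for any MT engine, and the placeholder format `__NAME0__` is likewise an assumption.

```python
def protect_names(text, names):
    """Swap each user-identified name for a placeholder the engine will not translate."""
    mapping = {}
    for i, name in enumerate(names):
        token = f"__NAME{i}__"
        if name in text:
            text = text.replace(name, token)
            mapping[token] = name
    return text, mapping


def restore_names(text, mapping):
    """Put the original names back in place of their placeholders."""
    for token, name in mapping.items():
        text = text.replace(token, name)
    return text


def translate_protected(text, names, machine_translate):
    """Translate text while keeping the listed proper names untouched."""
    masked, mapping = protect_names(text, names)
    return restore_names(machine_translate(masked), mapping)


# Hypothetical toy engine that wrongly translates the surname "Baker".
TOY_DICT = {"Baker": "Panadero", "arrived": "llegó"}


def toy_mt(sentence):
    return " ".join(TOY_DICT.get(w, w) for w in sentence.split())


print(toy_mt("Baker arrived"))                                # Panadero llegó
print(translate_protected("Baker arrived", ["Baker"], toy_mt))  # Baker llegó
```

Simple substring replacement is used here for brevity; it would also mask a name that appears inside a longer word, which a more careful implementation would need to avoid.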

The progress and potential of machine translation have been much debated throughout its history. Since the 1950s, a number of scholars, first and most notably Yehoshua Bar-Hillel, have questioned the possibility of achieving fully automatic machine translation of high quality. Some critics claim that there are in-principle obstacles to automating the translation process.

Translation process

The human translation process may be described as:

Decoding the meaning of the source text; and
Re-encoding this meaning in the target language.

Behind this ostensibly simple procedure lies a complex cognitive operation. To decode the meaning of the source text in its entirety, the translator must interpret and analyse all the features of the text, a process that requires in-depth knowledge of the grammar, semantics, syntax, idioms, etc., of the source language, as well as the culture of its speakers. The translator needs the same in-depth knowledge to re-encode the meaning in the target language.

Therein lies the challenge in machine translation: how to program a computer that will “understand” a text as a person does, and that will “create” a new text in the target language that sounds as if it has been written by a person.

In its most general application, this is beyond current technology. Although it works much faster than a person, no automated translation program or procedure operating without human participation can produce output close to the quality a human translator can produce. What it can do, however, is provide a general, though imperfect, approximation of the original text, conveying the “gist” of it (a process called “gisting”). This is sufficient for many purposes, and it allows the finite and expensive time of a human translator to be reserved for those cases in which total accuracy is indispensable.

This problem may be approached in a number of ways, and accuracy has improved as these approaches have evolved.

MT Systems

Generic MT usually refers to platforms such as Google Translate, Bing, Yandex, and Naver. These platforms provide MT for ad hoc translations to millions of people. Companies can buy generic MT for batch pre-translation and connect to their own systems via API.

Customizable MT refers to MT software that has a basic component and can be trained to improve terminology accuracy in a chosen domain (medical, legal, IP, or a company’s own preferred terminology). For example, WIPO’s specialist MT engine translates patents more accurately than generalist MT engines, and eBay’s solution can understand and render into other languages hundreds of abbreviations used in electronic commerce.

Adaptive MT offers suggestions to translators as they type in their CAT tool, and learns from their input continuously in real time. Introduced by Lilt in 2016 and by SDL in 2017, adaptive MT is believed to improve translator productivity significantly and could challenge translation memory technology in the future.

There are over 100 providers of MT technologies. Some of them are strictly MT developers; others are translation firms and IT giants.

MT Approaches

There are three main approaches to machine translation:

First-generation rule-based (RbMT) systems rely on a large number of hand-written rules based on the grammar, syntax, and phraseology of a language.
Statistical systems (SMT) arrived with search and big data. With lots of parallel texts becoming available, SMT developers learned to pattern-match reference texts to find translations that are statistically most likely to be suitable. These systems train faster than RbMT, provided there is enough existing language material to reference.
Neural MT (NMT) uses machine learning technology to teach software how to produce the best result. This process consumes large amounts of processing power, which is why it is often run on graphics processing units (GPUs). NMT started gaining visibility in 2016. Many MT providers are now switching to this technology.

A combination of two different MT methods is called Hybrid MT.
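The statistical idea of finding the most likely translation from parallel texts can be sketched with simple co-occurrence counts. This is a deliberately simplified heuristic (a Dice coefficient over sentence-level co-occurrence), not the alignment and phrase models that production SMT toolkits use; the four-sentence corpus and all names are assumptions for illustration.

```python
from collections import Counter
from itertools import product

# Tiny English-Spanish parallel corpus; purely illustrative.
PARALLEL = [
    ("the house", "la casa"),
    ("the book", "el libro"),
    ("a house", "una casa"),
    ("a book", "un libro"),
]


def train(pairs):
    """Count how often words occur, and co-occur, across aligned sentence pairs."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        joint.update(product(s_words, t_words))
    return src_count, tgt_count, joint


def best_translation(word, model):
    """Pick the target word with the highest Dice score for the source word."""
    src_count, tgt_count, joint = model
    scores = {
        t: 2 * joint[(s, t)] / (src_count[s] + tgt_count[t])
        for (s, t) in joint
        if s == word
    }
    return max(scores, key=scores.get) if scores else None


model = train(PARALLEL)
print(best_translation("house", model))  # casa
print(best_translation("book", model))   # libro
```

Even this toy version shows why such systems depend on having enough reference material: a word that never appears in the corpus has no candidate translation at all, and with so few sentences the articles cannot be disambiguated.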

Availability: API, Cloud, Server, Desktop

Google, Microsoft, IBM, Amazon, Yandex, and many others run MT software on their own infrastructure and provide it as a cloud API service, priced per character. For example, it costs $20 to translate 1 million characters with Google Translate. In contrast, developers of customizable MT, including Systran and Promt, offer server and desktop products priced per license.
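Volume-based pricing makes it easy to estimate the cost of a batch before sending it. The sketch below uses the $20 per million characters figure quoted above as its default rate; the function name is hypothetical, and the assumption that every character, including whitespace, is billed may not match a given provider's rules.

```python
def translation_cost(text, usd_per_million_chars=20.0):
    """Estimate the cost in USD of translating text at a per-character rate."""
    # Assumption: every character, including spaces, counts toward the bill.
    return len(text) * usd_per_million_chars / 1_000_000


batch = ["Product description one.", "Product description two."]
total = sum(translation_cost(segment) for segment in batch)
print(f"Estimated cost: ${total:.6f}")
```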

In professional translation, MT is most often integrated into the CAT tool. Human linguists can pick a suggestion from MT as they go through the text if they do not find a better match in the translation memory.
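The fallback from translation memory to MT might be sketched as follows. The similarity measure (Python's `difflib.SequenceMatcher`), the 0.75 threshold, and the `machine_translate` callable are all assumptions for illustration; real CAT tools use their own fuzzy-matching algorithms and thresholds.

```python
from difflib import SequenceMatcher


def best_tm_match(segment, translation_memory):
    """Return (score, target) for the most similar stored source segment."""
    best = (0.0, None)
    for src, tgt in translation_memory.items():
        score = SequenceMatcher(None, segment.lower(), src.lower()).ratio()
        if score > best[0]:
            best = (score, tgt)
    return best


def suggest(segment, translation_memory, machine_translate, threshold=0.75):
    """Offer a translation memory match if it is good enough, otherwise an MT suggestion."""
    score, target = best_tm_match(segment, translation_memory)
    if target is not None and score >= threshold:
        return ("TM", target)
    return ("MT", machine_translate(segment))


# Hypothetical translation memory and stand-in MT engine.
tm = {"Press the start button.": "Appuyez sur le bouton Start."}
fake_mt = lambda s: "[MT] " + s

print(suggest("Press the start button.", tm, fake_mt))
print(suggest("The warranty lasts two years.", tm, fake_mt))
```

The first call returns the stored translation because the segment matches the memory exactly; the second falls through to the MT suggestion.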

Build Your Own MT Engine

There are open-source toolkits anyone can use to build their own engines for any domain and language combination. The most popular baseline toolkits are Moses for SMT, OpenNMT for NMT, and Apertium for RbMT. Training statistical and neural engines requires a large collection of parallel texts in two languages. Some organizations such as TAUS have made a service out of providing baseline data, which companies can further expand by adding their own specialist translations.

Evaluating MT Quality

Translation companies and departments typically evaluate MT quality by the effort it takes for a human to post-edit the output. It is often measured in pages per hour, or in the number of keystrokes per segment.
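When keystroke logs are not available, post-editing effort is sometimes approximated from the texts themselves. The sketch below uses character-level edit distance (Levenshtein distance) between the raw MT output and the post-edited version as a rough proxy; treating one edit as one keystroke is an assumption, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def edits_per_segment(mt_segments, post_edited_segments):
    """Average edit distance across paired MT and post-edited segments."""
    distances = [edit_distance(m, p)
                 for m, p in zip(mt_segments, post_edited_segments)]
    return sum(distances) / len(distances) if distances else 0.0


print(edit_distance("kitten", "sitting"))                          # 3
print(edits_per_segment(["kitten", "same"], ["sitting", "same"]))  # 1.5
```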

Specialists training MT engines rely on automated tests and metrics. They are better suited for A/B testing and experimentation and show the impact of the tiniest changes, where humans might not notice the difference.

The mainstay metric for automated testing is BLEU (bilingual evaluation understudy), which shows how closely MT output corresponds to a human translation of the same text. It compares the machine output with a reference translation and produces a score between 0 (worst) and 1 (best). While BLEU scores are widely used by MT researchers, they can be manipulated, and it takes a specialist to make sense of the results.
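The core of the BLEU calculation can be shown in a simplified, sentence-level form: clipped n-gram precision for n-gram lengths 1 to 4, combined by geometric mean and multiplied by a brevity penalty. The sketch assumes whitespace tokenization, a single reference, and no smoothing, so it is not a substitute for an established evaluation tool; the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against one reference, without smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipping: an n-gram is credited at most as often as it occurs in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat"
print(simple_bleu("the cat sat on the mat", reference))        # 1.0
print(simple_bleu("the cat sat on the mat today", reference))  # about 0.81
print(simple_bleu("a b c d", reference))                       # 0.0
```

Because the unsmoothed score drops to 0 as soon as one n-gram length has no match, short segments often score 0 even when they are partly correct, which is one reason the metric is usually reported over a whole test set rather than per sentence.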

Other MT quality metrics include METEOR, ROUGE, HyTER, and NIST. Quality metrics are the focus of the QT21 program supported by GALA.

Ethics for Translation Providers using MT

Confidentiality – Content translated by free MT platforms such as Google Translate and Microsoft Translator is not confidential. It is stored by the platform owners and may be reused for later translations.

Notifying the Client about MT Use – It is a point of debate in the industry whether a translation company should notify clients about the use of MT on their projects. Many pundits favor informing the customer, but some providers may not disclose it. Be sure to ask your provider if you have questions about MT usage.

What content is machine translation suitable for?

Machine translation can be a good choice for content that is neither mission-critical nor customer-facing, such as internal communications. In some cases, it can also be used for large volumes of content with lots of repetition, such as user-generated content, video transcripts, or product descriptions.

