

For starters, real speech is peppered with vocalized “ums” and “ahs,” awkward pauses, varying intonations, and vocal stresses, which are all absent in text. Consider what would happen if a speech translation system misinterpreted the subtle difference between these two statements:

Suffice it to say, grumpy offspring would be the end product. The gap exists between translating text and translating speech because some of the best machine translation systems today are taught using large volumes of high-quality text, which does not include the awkwardness that speech recognition systems deal with. So Microsoft Research set about searching for techniques to help close that gap. Among them was a software system the company developed to translate social media musings.

Before turning to social media, Microsoft’s translation system extracted text from published books and Web sources that had been translated from one language to another. The data was then fed into a machine-learning pipeline that Microsoft calls phrasal statistical machine translation (phrasal SMT). The system chops up the text into a collection of small phrases called n-grams, where n denotes the number of words in each phrase.

If the system is trying to translate, say, English to German, then the n-gram from a text in English is mapped to the n-gram of the equivalent text in German. This process teaches the computer what each phrase translates to. Once it has learned its fill from the n-gram alignment, the software is ready to encounter new, untranslated text.

When the machine is asked to translate a new phrase in English, the algorithm calculates the probability that the new English segment of text maps to one of the phrases it knows in German. The system then spits out the most probable translation.

Phrasal SMT excels at memorizing and matching data. For common phrases it can translate that exact phrase across several languages, and even if the words in the phrase are slightly reordered, it still works. But if the words in an uncommon phrase are reordered, the system gets confused. Some of the confusion arises because SMT doesn’t really understand grammar and so can’t shift from the rules of one language to those of another. For example, an English sentence usually runs subject, verb, object. But the same sentence in Japanese would be subject, object, verb.

This is why the Microsoft Research team pioneered a system known as syntactically informed phrasal statistical machine translation (syntactic SMT). It builds on the phrasal SMT foundation but also understands syntax. Instead of just matching common phrases, syntactic SMT breaks up a phrase into individual words and then maps each word over to the other language.

Cutting up phrases and connecting individual words may sound like a primitive approach, but it’s not. “That’s pretty much the best method,” says Chris Manning, professor of linguistics and computer science at Stanford. “Microsoft’s machine translation team has been one of the prominent developers in this area, and basically, that is the state of the art in machine translation at the moment.”
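
The phrasal SMT steps described in this article (cutting text into n-grams, aligning English phrases with their German counterparts, and then choosing the most probable match) can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not Microsoft's pipeline: the function names and the hand-aligned phrase pairs are hypothetical, and probabilities are simple relative frequencies. Production systems learn alignments automatically and combine many more signals.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_phrase_table(aligned_pairs):
    """Estimate P(target phrase | source phrase) by relative frequency."""
    counts = defaultdict(Counter)
    for source, target in aligned_pairs:
        counts[source][target] += 1
    table = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        table[source] = {t: c / total for t, c in targets.items()}
    return table


def most_probable(table, source):
    """Return (translation, probability), or None for an unseen phrase."""
    candidates = table.get(source)
    if not candidates:
        return None
    return max(candidates.items(), key=lambda item: item[1])


# Hypothetical, hand-aligned English-German phrase pairs (toy data only).
ALIGNED = [
    ("good morning", "guten Morgen"),
    ("good morning", "guten Morgen"),
    ("good morning", "guten Tag"),
    ("thank you", "danke"),
]

table = build_phrase_table(ALIGNED)
best = most_probable(table, "good morning")   # highest-probability candidate
unseen = most_probable(table, "morning good")  # reordered phrase, never seen
bigrams = ngrams("the cat sat".split(), 2)     # two-word phrases from a sentence
```

In this sketch the reordered phrase has no entry in the table, so the lookup returns `None`. That loosely mirrors the article's point that a system built on memorizing and matching phrases struggles once the words of an uncommon phrase are rearranged.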
