Transitioning entirely to neural machine translation

GuiA · on Aug 3, 2017

Will be interesting to see how it performs in the wild.

Here's a sample sentence that I've never seen any automatic translator get right:

"My cousin and her wife"

Any human would infer from the context that my cousin is a woman, in a same sex marriage. Yet Google, who also uses fancy deep learning, gives us, for Spanish:

"Mi primo y su esposa"

This should be "Mi prima y su esposa". And that's just for Spanish, a language relatively close to English. With more convoluted examples and more distant languages, it still breaks down pretty fast.

See also: examples of gender bias when using deep learning to translate from languages with gender neutral pronouns:

https://twitter.com/amuellerml/status/799658925326995456

https://twitter.com/nobody_indepth/status/799700696572526592

schoen · on Aug 4, 2017

I've been looking a little bit at anaphora resolution in AI and there's the trouble that your phrase could appear in many contexts where Google's translation would be completely correct! If "cousin" is the antecedent of "her" (which is the only possibility if this phrase occurs in isolation), then Google's translation is clearly wrong. But if there is a preceding or following sentence which mentions a third person, Google's translation can be correct because "her" can then refer to that person.

For example "my cousin and her wife think that Sarah has good taste in ice cream": here one likely anaphora resolution is "my cousin and Sarah's wife think that Sarah has good taste in ice cream". Or "when Sarah got married, she invited my cousin to the wedding; my cousin and her wife turned out to have gone to college together" (again "my cousin and Sarah's wife turned out to have gone to college together").

Anaphora resolution in general is one of the hardest problems for machine translation because it appears to require so much knowledge about the world to do it as well as human beings do. But also, different resolutions can be correct (or maximum-probability) in different contexts depending on the additional information! For instance, there's the Winograd Schema structure where a single pronoun would be interpreted as referring to different people depending on the surrounding context (but not grammar). Winograd's classic example was

The city council members refused the demonstrators a permit because they feared violence.

The city council members refused the demonstrators a permit because they advocated violence.

Disturbingly for machine translation, in the former sentence "they" refers to "the city council members", while in the latter sentence "they" refers to "the demonstrators", even though the syntax of the two sentences is identical!

This, in turn, means that if a translation task required knowing the antecedent of "they" in "The city council members refused the demonstrators a permit because they", the translation task would have no unique solution because the antecedent is ambiguous. Formally this is also true of every reference to, for example, family members when one language marks gender and another doesn't, even if there is a likely resolution offered by the local context, as in your sentence. There is no unique translation available. Finding the one intended by the speaker will require more context, while even finding the one that other speakers find most probable with limited contexts is sometimes among the most challenges AI problems today.

gok · on Aug 4, 2017

This is an excellent example where phrase-based statistical machine translation is troublesome; the language model score of "his wife" is going to be dramatically larger "her wife" in any world where there are a lot more hereto married people. Neural MT should have a much better shot at getting it.

toomanybeersies · on Aug 3, 2017

I assume this is a case of implicit bias from the training materials used.

hasenj · on Aug 4, 2017

Actually as a human I did not immediately infer the lesbian relationship part.

erik14th · on Aug 4, 2017

Works from english to portuguese in google, maybe because they have more data? Considering this example alone, the logical difference between english/spanish and english/portuguese are pretty much the same: gender is made explicit in the pronoun in spanish/portuguese(primo/prima) in contrast with english that relies on an adjective(her). About ambiguity, biases, well, what the "machine" learns are pretty much biases, meaning it's biased by the data it was trained on.

tmsldd · on Aug 3, 2017

using google translation to German gives "meine Cousine und ihre Frau".. as you cited it might work just because English/German are close languages.. But, yeah, DL is just a fancy (and incomplete) mapping.

star-trek-fleet · on Aug 3, 2017

Is there anything isn't fancy mapping?

Use that term to down play the effectiveness of the approach seems quite surprising. Consider human needs years of study to do the same thing, which usually also makes mistakes.

Even for the example, without much analysis, it's very easy to miss the part.

tmsldd · on Aug 4, 2017

Don't get me wrong... but If it makes you feel better, I use DL for most of my research / projects. Well, after 7 years dealing with DL I couldn't really find a better single word to describe it.

visarga · on Aug 4, 2017

How's "Fancy context dependent matching"? More dignified, right /s

In this case the context dependence part wasn't up to par, so it was just fancy mapping.

AndrewKemendo · on Aug 3, 2017

This is much more impressive than I think it will get credit for.

I mean really, having an always on, immediate access massive NLP DNN system translating every piece of text from any language on the worlds largest platform is a staggering feat.

Facebook has the most impressive applications of ML right now in my opinion. They have Yann Lecun to thank for that (and Mark for recruiting him).

amelius · on Aug 3, 2017

> having an always on, immediate access massive NLP DNN system translating every piece of text from any language on the worlds largest platform is a staggering feat.

Too bad that they also have an always on system that's tracking my doings and whereabouts. Seriously, we need to stop applauding these companies. Sad to see that LeCun and colleagues don't care to find an employer with more noble goals.

visarga · on Aug 4, 2017

I avoid FB as a social network, but I love their open source projects such as those in ML (PyTorch) and web design (React). Weirdly, I feel better working with FB products than the equivalent from Google (Angular and TensorFlow).

It seems that approval for the behavior of a company and it's open tech stack don't need to be correlated.

amelius · on Aug 4, 2017

Is that a rational argument or an emotional argument?

tanilama · on Aug 4, 2017

Do u have a list of such noble companies to offer?

jawbone3 · on Aug 4, 2017

Gapminder, amnesty, msf, save the children... wait did you ask for ad companies like fb and google that sell clicks but had a noble agenda? I think the mercenary nature of ads precludes most moral stances in deference to the client.

whatrusmoking · on Aug 3, 2017

??? Google's been doing this at scale for a while.

gwern · on Aug 3, 2017

Google switched Google Translate over just last year, November 2016. Considering Facebook is developing all this internally and the scale they operate at and the presumable hardware disadvantages (does FB have any equivalent of Google's TPUs?), switching over in production so quickly is impressive.

kuschku · on Aug 3, 2017

Only for very few language pairs, though. But for those translation massively improved, especially for German the old solution was basically unusable.

dmcy22 · on Aug 3, 2017

Google only supports Neural Machine Translation for a subset of languages. Whereas as Facebook is doing it for 2,000 translation directions.

Retric · on Aug 3, 2017

45 languages * 44 other langues ~= 2000 translation directions. There are ~4,500 languages with more than 1,000 speakers ~4,500 * 4,499 ~= 20,254,500 language pairs so that's a very long way from every language.

hood_syntax · on Aug 3, 2017

Frankly though, I imagine the vast majority of those languages are unappealing targets in terms of return on investment.

flamedoge · on Aug 4, 2017

and much smaller training data..

actuator · on Aug 3, 2017

I guess they really don't need to cover all of them though as most of the people speaking a language with low number of speakers would be able to converse in other mainstream languages as well. I can speak four languages but I rarely converse in two of them as speakers of the two languages are usually comfortable with a mainstream language as well.

rjeli · on Aug 4, 2017

The most obvious thing to me is to use an intermediate language - I assume this has already been discussed extensively, anyone have references?

edit: I should have scrolled down https://techcrunch.com/2016/11/22/googles-ai-translation-too...

jacquesm · on Aug 4, 2017

Even Swedish is barely in the top 100 with 8.7 million speakers.

riku_iki · on Aug 3, 2017

Google may also have higher bar for translation quality..

zitterbewegung · on Aug 3, 2017

Google was doing a statistical method of translation for a long time for Google Translate. Last year they switched over to Neural Machine Translation.

dmcy22 · on Aug 3, 2017

Impressive indeed. Would you happen to know if Facebook is able to translate without using English as an intermediate language? For example, from Dutch directly to French, without needing to go from Dutch to English to French.

zlynx · on Aug 3, 2017

Reminds me reading about Google's new translator. It apparently uses a unique internal language it developed during training. All the languages it knows go through this internal language / neural pattern.

From what I recall reading, Google has people researching this internal language to see if they can discover any new interesting things about human thought.

https://techcrunch.com/2016/11/22/googles-ai-translation-too...

dmcy22 · on Aug 3, 2017

Interesting, thanks for sharing!

etiam · on Aug 4, 2017

Isn't that what they are doing?

My impression of the pure DL approaches is that the preferred way to go about it is to map from each available language example, via a common latent representation, to every other available language example. The common currency is not English but a vector space representing something much more akin to the meaning underlying the statements.

landon32 · on Aug 4, 2017

I don't know Facebook's specifics, but typically in research they will train a wide variety of examples, e.g.

English -> French French -> English English -> German French -> German

tanilama · on Aug 3, 2017

Google created GNMT and put it into production first, then improve it so single model can tranlsate multiple languages.

peterburkimsher · on Aug 4, 2017

I made Pingtype, a site to translate Chinese to English word-for-word.

https://pingtype.github.io

I also made an English-to-Chinese version, https://pingtype.github.io/english.html

One thing that struck me is that even the largest online dictionaries (Oxford, Wiktionary) are totally inadequate. I'm continually adding new words. Another thing I noticed is that spaces are not always between words! Chinese doesn't have spaces at all, so I wrote a word spacing tool. When I rewrote the app for English, I thought I could use the space character, but I can't.

Many verbs wrap around nouns e.g. "put [the phone] down". The source dictionary data has a definition for "put something down", and my program has support for a special word: "something", which causes it to look ahead for the second half of the phrase.

Pingtype doesn't reorder the sentence, because it's intended for education. But it's easy to train, unlike machine learning models.

toomanybeersies · on Aug 4, 2017

I don't think that even on a trivial level it's possible to word-for-word translate most languages, except the most closely related ones, since they have very different language structures.

rpmcmurphy · on Aug 4, 2017

And then there is the German word backpfeifengesicht, a personal favorite along with schadenfreude.

visarga · on Aug 4, 2017

You can still solve those with a dictionary of translations and for the missing words, a sub-word embedding system that gleans the meaning from the phonetic form.

visarga · on Aug 4, 2017

I really love the triple layout - hanji, phonetic, translation. Could you do that on Japanese as well?

toomanybeersies · on Aug 3, 2017

It's always interesting to compare machine translation of French to my (albeit limited) translation of French. I tend to prefer cognates, even if they're a bit obscure, to more colloquial translations, since I think it preserves a bit more of the original text.

For instance, "Drôle" in French is usually translated to "Funny", but I would translate it to "Droll". I also try to keep the sentence structure closer to the original, which does make my translations look like they were written by Shakespeare.

Something that I really like in regards to the implementation of FB translation is that I can select languages that I don't want to be automatically translated. For French, German, and Spanish, I'd prefer to at least have a crack at it before translating it, but for other languages like Turkish, there's no point.

One thing I don't like so much though is that for automatically translated languages, they replace the original text and don't make it super obvious that it's a translation. I do feel like there's a bit of language colonialism going on with that.

therein · on Aug 3, 2017

Native Turkish speaker here. Very impressive, especially since the sentence they display in the post is actually quite colloquial in its use of the language.

otto_ortega · on Aug 4, 2017

Is there a more detailed example of how such neural systems work?

Like, what are the inputs? how the architecture of the network looks like?

I ask because sentences could be of any length, so I'm not sure what's feed to the model.

My only experience with NLP is setting a toy text classifier using naive-bayes.

juanpino · on Aug 4, 2017

You can read more about it in https://arxiv.org/abs/1409.0473

make3 · on Aug 3, 2017

I'm surprised they weren't already fully neural. NTM (neural machine translation) has been much better than everything else for a while now.

landon32 · on Aug 4, 2017

I'd imagine part of it is just having to test all the edge cases-- for every language is their full neural better?

That, and also building a good pipeline for it to work at scale.