Hacker News .hnnew | past | comments | ask | show | jobs | submitlogin

New neural machine translation architectures are experimenting with pairs of neural encoders / decoders, one pair for each language and a shared language independent vector space for the meaning of all words:

http://arxiv.org/abs/1406.1078

http://arxiv.org/abs/1409.3215

http://arxiv.org/abs/1410.8206

So the total number of models is still linear with the number of languages.

I do not know whether this new generation of translation models is leveraged by the google translation app though.

Also pairs of languages for which their are big amount of parallel training data will still be favored.



> Also pairs of languages for which their are big amount of parallel training data will still be favored.

wouldn't bible translations help?


It might but:

- the vocabulary and topics covered in the bible is quite different from today's written and spoken text, especially phone discussions or social network messages.

- other aligned corpora such as http://www.statmt.org/europarl/ are much larger than the bible (several millions of tokens for most pairs vs less than 1 million for the Bible)

Agreed that http://www.statmt.org/europarl/ does not cover non-European languages.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: