What: Cross-linguality and machine translation without bilingual data
Where: BCBL auditorium
Who: Eneko Agirre, Professor at the University of the Basque Country, member of the IXA Natural Language Processing group and the HiTZ research center, San Sebastian, Spain.
Machine translation is one of the most successful text processing applications. Current state-of-the-art systems leverage large amounts of translated text to learn how to translate, but is it possible to translate between two languages without any bilingual data? In this presentation we will show that it is. We will first map the word embedding spaces of two languages to each other, both with and without seed bilingual dictionaries. This allows us to produce accurate bilingual dictionaries from monolingual corpora alone, matching the quality of supervised methods. Based on these mappings, it is then possible to train machine translation systems without access to any bilingual data.
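To make the idea of mapping two embedding spaces concrete, here is a minimal sketch of the seeded (supervised) case only. It assumes the mapping is an orthogonal Procrustes transform learned from a small seed dictionary, which is one standard technique and not necessarily the exact method presented in the talk. Dictionary induction is then done by nearest-neighbour search in the mapped space. The function names `procrustes_map` and `induce_dictionary` and the synthetic "embeddings" are hypothetical, chosen only for illustration.

```python
import numpy as np

def normalize(emb):
    # Length-normalize each row so that dot products become cosine similarities.
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)

def procrustes_map(X, Z, seed_pairs):
    """Hypothetical helper: learn an orthogonal W minimizing ||X_s W - Z_s||_F,
    where X_s and Z_s are the rows given by the (source, target) seed pairs."""
    src_idx = [s for s, _ in seed_pairs]
    tgt_idx = [t for _, t in seed_pairs]
    # Closed-form solution: if X_s^T Z_s = U S V^T, then W = U V^T.
    U, _, Vt = np.linalg.svd(X[src_idx].T @ Z[tgt_idx])
    return U @ Vt

def induce_dictionary(X, Z, W):
    # For each source word, pick the target word with the highest cosine
    # similarity after mapping the source vectors into the target space.
    sims = normalize(X @ W) @ normalize(Z).T
    return sims.argmax(axis=1)

# Toy data (assumption): the "target" space is an exact rotation of the
# "source" space, so word i in one language corresponds to word i in the other.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
Q, _ = np.linalg.qr(rng.normal(size=(10, 10)))
Z = X @ Q

W = procrustes_map(X, Z, [(i, i) for i in range(20)])  # 20-pair seed dictionary
pred = induce_dictionary(X, Z, W)
print("dictionary accuracy:", (pred == np.arange(50)).mean())
```

Constraining W to be orthogonal preserves distances within each monolingual space, which is why a small seed dictionary suffices here. Real embeddings are only approximately isomorphic, so accuracy on real data would be lower than on this idealized toy example, and the seedless setting mentioned in the abstract requires additional machinery that this sketch does not cover.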