Faculty of Informatics and Statistics, Department of Information and Knowledge Engineering (DIKE)

Date and time: March 6, 2008 (10:30–12:00). Non-standard date or time!

Room: 403 NB


English-to-Czech Machine Translation: Should We Go Shallow or Deep?


  • Ondřej Bojar, ÚFAL, MFF UK Praha

The purpose of my talk is to introduce two rather different approaches to machine translation (MT) that I'm actively involved with. The first is so-called phrase-based MT, where sentences are treated as plain sequences of words, i.e. opaque symbols. An input sentence is segmented into "phrases", or rather n-grams, and each phrase is translated nearly independently.
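To make the phrase-based idea concrete, here is a minimal sketch of segmenting a sentence into known n-grams and translating each one on its own. It is an illustration only, not the system presented in the talk: the phrase table entries, the function names `segment` and `translate`, and the greedy longest-match strategy are all assumptions chosen for brevity. Real phrase-based systems typically score many candidate segmentations with learned probabilities rather than picking one greedily.

```python
# Toy phrase table (hypothetical entries, for illustration only):
# English n-grams as word tuples mapped to Czech strings.
PHRASE_TABLE = {
    ("the", "cat"): "kočka",
    ("cat",): "kočka",
    ("sleeps",): "spí",
}


def segment(words, table, max_len=3):
    """Greedy left-to-right segmentation, preferring the longest known n-gram."""
    phrases = []
    i = 0
    while i < len(words):
        # Fallback: an unknown word becomes a one-word phrase.
        chosen = (words[i],)
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            if candidate in table:
                chosen = candidate
                break
        phrases.append(chosen)
        i += len(chosen)
    return phrases


def translate(sentence, table=PHRASE_TABLE):
    """Translate each phrase independently; unknown phrases pass through unchanged."""
    words = sentence.lower().split()
    return " ".join(table.get(p, " ".join(p)) for p in segment(words, table))


if __name__ == "__main__":
    print(translate("The cat sleeps"))
```

Note that the words are handled purely as opaque symbols: nothing in the sketch knows about grammar, which is exactly the property that the deep-syntactic approach gives up in exchange for linguistic structure.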

A completely different approach is to automatically obtain a deep syntactic structure of the sentence (in our case, a tectogrammatical dependency tree), decompose the tree into treelets, and translate the treelets independently.

Both methods rely on large collections of training data, i.e. texts that were previously translated by humans. Because the two methods share training and evaluation data, we can directly compare their performance as well as their strong and weak points.

Downloads: slides 1 

