Faculty of Informatics and Statistics, Department of Information and Knowledge Engineering (DIKE)

Date and time: November 26, 2009 (10:30–12:00). Non-standard date or time!

Room: 403 NB


Lexical Association Measures and Collocation Extraction


  • Pavel Pecina, ÚFAL, MFF UK Praha

We present an extensive empirical evaluation of collocation extraction methods based on lexical association measures and their combinations. The experiments are performed on a set of collocation candidates extracted from the Prague Dependency Treebank, which has manual morphosyntactic annotation. The candidates were manually labeled as collocational or non-collocational. The evaluation measures how well each method ranks the candidates by their likelihood of forming collocations, and the methods are compared using precision-recall curves and mean average precision scores. We further study how lexical association measures can be combined and present empirical results for several combination methods that significantly improve on the state of the art in this task. We also propose a model reduction algorithm that substantially reduces the number of combined measures without a statistically significant difference in performance.
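The abstract does not name individual measures, so the following is only an illustrative sketch of the ranking-and-evaluation idea. It assumes pointwise mutual information (PMI), one widely used lexical association measure, to score bigram candidates, sorts them by score, and evaluates the ranking with non-interpolated average precision against manual labels. The function names (`pmi`, `rank_candidates`, `average_precision`), the counts, the candidate pairs, and the labels are all hypothetical toy values, not data or code from the Prague Dependency Treebank experiments, and the talk's exact evaluation protocol may differ.

```python
import math
from typing import Dict, List, Set, Tuple

Bigram = Tuple[str, str]


def pmi(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """Pointwise mutual information log2(P(xy) / (P(x) * P(y))) from raw counts."""
    return math.log2((f_xy * n) / (f_x * f_y))


def rank_candidates(pair_counts: Dict[Bigram, int],
                    unigram_counts: Dict[str, int],
                    n: int) -> List[Bigram]:
    """Score every observed candidate with PMI and sort from most to least associated."""
    scores = {
        (x, y): pmi(f, unigram_counts[x], unigram_counts[y], n)
        for (x, y), f in pair_counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def average_precision(ranked: List[Bigram], gold: Set[Bigram]) -> float:
    """Mean of the precision values at each rank where a true collocation appears."""
    hits = 0
    precisions = []
    for rank, cand in enumerate(ranked, start=1):
        if cand in gold:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(gold) if gold else 0.0


# Hypothetical toy corpus statistics (illustration only).
pair_counts = {
    ("strong", "tea"): 8,
    ("powerful", "tea"): 1,
    ("united", "states"): 30,
    ("of", "the"): 50,
}
unigram_counts = {"strong": 20, "powerful": 10, "tea": 15,
                  "united": 40, "states": 200, "of": 300, "the": 400}
n = 10_000  # hypothetical total number of bigram tokens

# Hypothetical manual labels: which candidates are true collocations.
gold = {("strong", "tea"), ("united", "states")}

ranked = rank_candidates(pair_counts, unigram_counts, n)
ap = average_precision(ranked, gold)
print(ranked)
print(round(ap, 3))
```

Here PMI ranks "powerful tea" above "united states", so the second true collocation is found only at rank 3, and the average precision is (1/1 + 2/3) / 2. Averaging such scores over several data folds or samples would yield a mean average precision; combining measures, as the abstract describes, would replace the single PMI score with a learned function of many measures.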

