Date and time: November 26, 2009 (10:30 – 12:00).
Room: 403 NB
Lexical Association Measures and Collocation Extraction
- Pavel Pecina, ÚFAL, MFF UK Praha
We present an extensive empirical evaluation of collocation extraction methods based on lexical association measures and their combination. The experiments are performed on a set of collocation candidates extracted from the Prague Dependency Treebank with manual morphosyntactic annotation. The candidates were manually labeled as collocational or non-collocational. The evaluation measures how well each method ranks the candidates by their likelihood of forming collocations. The methods are compared using precision-recall curves and mean average precision scores. Further, we study the possibility of combining lexical association measures and present empirical results for several combination methods that significantly improve on the state of the art in this task. We also propose a model reduction algorithm that significantly reduces the number of combined measures without a statistically significant difference in performance.
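The ranking-based evaluation described in the abstract can be illustrated with a minimal sketch. It is not the speaker's implementation: it assumes pointwise mutual information (PMI) as one example association measure, uses invented toy counts and labels, and computes plain average precision for a single ranked list, which may differ from the exact mean average precision definition used in the talk.

```python
import math

def pmi(f_xy, f_x, f_y, n):
    """Pointwise mutual information of a word pair from raw counts.

    f_xy: joint frequency of the pair, f_x / f_y: marginal frequencies,
    n: total number of pairs in the (hypothetical) corpus.
    """
    return math.log2((f_xy * n) / (f_x * f_y))

def average_precision(ranked_labels):
    """Mean of the precision values taken at the rank of each true collocation."""
    hits = 0
    precisions = []
    for rank, is_collocation in enumerate(ranked_labels, start=1):
        if is_collocation:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Hypothetical toy data: (candidate, f_xy, f_x, f_y, manual label).
# All counts and labels are invented for illustration only.
candidates = [
    ("red tape", 30, 100, 60, True),
    ("of the", 500, 5000, 8000, False),
    ("strong tea", 20, 200, 50, True),
    ("zebra lamp", 1, 2, 3, False),  # rare pair: PMI tends to overrate these
]
N = 100_000  # assumed corpus size

# Rank candidates by descending association score.
ranked = sorted(candidates, key=lambda c: pmi(c[1], c[2], c[3], N), reverse=True)
ap = average_precision([c[4] for c in ranked])

for name, f_xy, f_x, f_y, label in ranked:
    print(f"{name:12s} PMI={pmi(f_xy, f_x, f_y, N):6.2f} collocation={label}")
print(f"average precision = {ap:.3f}")
```

In this toy ranking the rare non-collocation lands on top, which lowers average precision. A single measure can misrank candidates in this way, which is one motivation for combining several measures.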