Faculty of Informatics and Statistics, Department of Information and Knowledge Engineering (DIKE)

Date and time: March 10, 2011 (10:30 – 12:00).

Room: 403 NB


Information in Czech healthcare documents. Where is it hidden and how to extract it?


  • Karel Zvára, Institute of Computer Science, Academy of Sciences of the Czech Republic, v.v.i., and First Faculty of Medicine, Charles University in Prague

Czech healthcare documentation usually takes the form of free text. I will give a brief overview of possible target structures (electronic health records), of code lists commonly used in the Czech Republic and abroad, and of the current state of my PhD thesis. I will show my approach to text tokenization, morphological analysis using dictionaries (a Czech iSpell-derived dictionary and the Czech versions of MeSH and ICD-10) and PoS tagging using regular expressions.
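
To give a rough idea of what such a pipeline can look like, here is a minimal Python sketch of the three steps named above: tokenization, dictionary lookup and regular-expression tagging. It is an illustration only, not the speaker's implementation. The tiny LEXICON stands in for the real dictionaries (iSpell-derived, MeSH, ICD-10), and the tag names, the patterns and the sample sentence are all assumptions made for this example.

```python
import re

# Hypothetical toy lexicon; a real system would load the iSpell-derived
# dictionary and the Czech MeSH / ICD-10 term lists instead.
LEXICON = {
    "pacient": "NOUN",
    "bolest": "NOUN",
    "hlavy": "NOUN",
    "má": "VERB",
}

# Numbers (with a decimal comma or point), then words, then single
# punctuation characters. Order matters: "37,5" must stay one token.
TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)?|\w+|[^\w\s]")

# Fallback patterns for tokens the lexicon does not know (assumed tag set).
PATTERNS = [
    (re.compile(r"\d+(?:[.,]\d+)?"), "NUM"),
    (re.compile(r"[A-Z]\d{2}"), "CODE"),   # shape of a three-character ICD-10 category
    (re.compile(r"[^\w\s]"), "PUNCT"),
]


def tokenize(text):
    """Split free text into number, word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def tag(tokens):
    """Tag each token by lexicon lookup first, then by regex shape."""
    tagged = []
    for token in tokens:
        label = LEXICON.get(token.lower())
        if label is None:
            label = next(
                (name for pattern, name in PATTERNS if pattern.fullmatch(token)),
                "UNK",
            )
        tagged.append((token, label))
    return tagged


if __name__ == "__main__":
    sentence = "Pacient má bolest hlavy, dg. G43, teplota 37,5."
    for token, label in tag(tokenize(sentence)):
        print(f"{token}\t{label}")
```

Looking a token up in the dictionary before falling back to patterns keeps known words from being overridden by a coincidental shape match. Anything matching neither source is left as UNK rather than guessed.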

Downloads: slides 

