Faculty of Informatics and Statistics, Department of Information and Knowledge Engineering (DIKE)

Date and time: October 15, 2009 (10:30 – 12:00). Non-standard date or time!

Room: 403 NB


Fuzzy ILP and Semantic Information Extraction from Texts


  • Jan Dědek, MFF, UK Praha

We address linguistic information extraction from Czech texts on the Web. Our method exploits existing linguistic tools originally created for a syntactically annotated corpus, the Prague Dependency Treebank (PDT 2.0). We propose a system that captures the text of web pages, annotates it linguistically with PDT tools, extracts data, and stores the data in an ontology. We present methods for learning queries over linguistically annotated data. Our experiments in the domain of traffic accident reports enable, for example, summarizing the number of injured people.
Inductive Logic Programming (ILP) plays one of the most interesting roles in our solution. We also present an ILP-based approach to fuzzy classification of textual web reports, built on Fuzzy Inductive Logic Programming. The main contributions are formal models, a prototype implementation, and some evaluation experiments.

