Date and time: October 15, 2009 (10:30–12:00).
Room: 403 NB
Fuzzy ILP and Semantic Information Extraction from Texts
- Jan Dědek, Faculty of Mathematics and Physics (MFF), Charles University (UK), Prague
We deal with information extraction from Czech web texts based on their linguistic analysis. Our method exploits existing linguistic tools originally created for a syntactically annotated corpus, the Prague Dependency Treebank (PDT 2.0). We propose a system that captures the text of web pages, annotates it linguistically with the PDT tools, extracts data, and stores the data in an ontology. We present methods for learning queries over linguistically annotated data. Our experiments in the domain of traffic accident reports make it possible, for example, to summarize the number of injured people.
Inductive Logic Programming (ILP) plays one of the most interesting roles in our solution. We also present an ILP-based approach to the fuzzy classification of textual web reports, built on Fuzzy Inductive Logic Programming. The main contributions are formal models, a prototype implementation, and evaluation experiments.
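The abstract does not describe the rule format, so the following is only a minimal sketch of how a fuzzy logic program can grade a report with a degree in [0, 1] rather than a yes/no label. Everything concrete here is hypothetical: the predicates `many_injured` and `high_damage`, the `ramp` thresholds, the target predicate `serious`, and the rule degrees are hand-written stand-ins for what a fuzzy ILP system would learn, and the Gödel (minimum) and product t-norms are simply two standard choices of fuzzy conjunction.

```python
from typing import Callable, Dict, List, Tuple

TNorm = Callable[[float, float], float]


def ramp(x: float, low: float, high: float) -> float:
    """Increasing membership function: 0 at or below `low`, 1 at or above `high`, linear in between."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def goedel(a: float, b: float) -> float:
    """Goedel (minimum) t-norm."""
    return min(a, b)


def product(a: float, b: float) -> float:
    """Product t-norm."""
    return a * b


def fuzzify(injured: int, damage_czk: float) -> Dict[str, float]:
    """Map raw extracted values to degrees of hypothetical fuzzy predicates (thresholds are assumptions)."""
    return {
        "many_injured": ramp(injured, 0, 4),
        "high_damage": ramp(damage_czk, 50_000, 500_000),
    }


# Hypothetical rules for serious(R), written as (rule degree, body atoms).
# A fuzzy ILP learner would induce such rules; here they are hand-written.
RULES: List[Tuple[float, List[str]]] = [
    (1.0, ["many_injured", "high_damage"]),  # serious(R) :- many_injured(R), high_damage(R).
    (0.6, ["many_injured"]),                 # serious(R) :- many_injured(R).
    (0.4, ["high_damage"]),                  # serious(R) :- high_damage(R).
]


def seriousness(facts: Dict[str, float], tnorm: TNorm) -> float:
    """Degree of serious(R): the best rule, each scored as tnorm(rule degree, tnorm folded over its body)."""
    best = 0.0
    for weight, body in RULES:
        body_degree = 1.0
        for atom in body:
            body_degree = tnorm(body_degree, facts[atom])
        best = max(best, tnorm(weight, body_degree))
    return best


if __name__ == "__main__":
    facts = fuzzify(injured=2, damage_czk=275_000)
    print(seriousness(facts, goedel))   # 0.5
    print(seriousness(facts, product))  # 0.3
```

The choice of t-norm changes the grade: with the minimum t-norm the two-atom rule and the `many_injured` rule both give 0.5, while the product t-norm penalizes the conjunction (0.25) so the weaker single-atom rule wins with 0.3. Taking the maximum over rules treats the rule set as a disjunction.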