Date and time: March 30, 2006 (10:30 – 12:00). Non-standard date or time!

Room: 403 NB


Information Extraction using Presentation Ontologies


  • Martin Labský, KIZI, VŠE Praha

We describe an approach to information extraction (IE) that attempts to integrate diverse sources of extraction knowledge. Our IE system, currently under construction, aims to perform reasonably well across a broad range of scenarios that differ widely in the amount of training data, the number of manually specified patterns, and the document formatting structure available. In our approach, the user first creates a presentation ontology, which describes the objects to be extracted from both a domain and a presentation point of view. Training data can then be used to improve extraction performance. We will present initial experience with extracting computer monitor descriptions from heterogeneous websites.

