Faculty of Informatics and Statistics, Department of Information and Knowledge Engineering (DIKE)

Date and time: August 30, 2012 (13:00 – 15:00). Non-standard date or time!

Room: 336 RB. Non-standard venue!


Textual information processing at the SWORD group (CIn-UFPE, Brazil)


  • Fred Freitas, Federal University of Pernambuco, Recife, Brazil

This talk describes the core work from 15 years of textual information processing research in the SWORD group (Semantic Web and Ontologies Research and Development group).

The three main axes will be:

- Information gathering systems on restricted Web domains. This kind of text processing relies on domain-related ontologies, which serve as a well-defined and understandable semantic model for the software. The systems to be presented draw inferences about the information available on the Web for these domains in order to retrieve, classify and extract it. As a proof of concept, we present experiments with good results in two distinct domains, showing that the solution is feasible, that it ports between domains, and that a high degree of reuse is achieved when porting.

- Blog crawlers. Due to the dynamic nature of the blogosphere, collecting and extracting relevant information from blogs has become hard and time-consuming. Since blogs have many points of variability, noise and specific problems to be addressed, such as social tagging, applications that can be easily adapted are needed. We present RetriBlog, a framework for the development of blog crawlers. The framework provides services to solve specific problems found in several web applications, such as content extraction and tag recommendation.

- Information extraction from unstructured corpora, based on Natural Language Processing techniques. This includes two systems: one based on adaptive methods and POS tagging, and a new one, still under development, whose main concern is extracting relation instances for ontology population using deep NLP (including WordNet, VerbNet, Hearst patterns and co-reference) in an Inductive Logic Programming environment.

Downloads: slides 1  slides 2  slides 3 
