Date and time: March 12, 2015 (10:30 – 12:00).
Room: 336 RB (non-standard venue!)
Revelation of the author's identity using machine learning and stylometry
- Jan Rygl, FI MU Brno
Many Internet users face the problem of anonymous documents and texts with counterfeit authorship. The number of questionable documents exceeds the capacity of human experts; therefore, a universal automated authorship identification system supporting all types of documents is needed.
The currently prevailing techniques build on machine learning: stylometry-based algorithms extract features from the text, and these features serve as input to the learning algorithms.
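As a rough illustration of what stylometric feature extraction can look like, the following minimal sketch turns a text into a fixed-length numeric vector that a machine learning classifier could consume. The function name `stylometric_features`, the short function-word list, and the choice of markers are assumptions made for this example only; they are not taken from ART or SIR.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption for
# illustration); real stylometric systems use far richer feature sets.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "it", "i")


def stylometric_features(text):
    """Map a text to a fixed-length list of simple style markers."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        # Empty input: return an all-zero vector of the same length.
        return [0.0] * (len(FUNCTION_WORDS) + 3)

    counts = Counter(words)
    total = len(words)

    # Relative frequency of each function word.
    features = [counts[w] / total for w in FUNCTION_WORDS]
    # Average word length in characters.
    features.append(sum(len(w) for w in words) / total)
    # Average sentence length in words.
    features.append(total / len(sentences))
    # Type-token ratio (vocabulary richness).
    features.append(len(counts) / total)
    return features


if __name__ == "__main__":
    print(stylometric_features("The cat sat. The dog ran."))
```

Vectors of this kind, computed for documents of known authorship, could then be used as training data for any standard classifier.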
At the NLP Centre, we have developed the Authorship Recognition Tool (ART) for the Ministry of the Interior. We are now working on the Style & Identity Recognizer (SIR), which solves stylometry-based tasks such as authorship recognition, translation detection, and age and gender prediction.
Techniques such as double-layer machine learning, similarity-based features, and authorship corpora will be presented.
Downloads: slides 1