ETD

Digital archive of theses discussed at the University of Pisa

 

Thesis etd-07032020-115029


Thesis type
Master's thesis (tesi di laurea magistrale)
Author
GAMBINI, MARGHERITA
URN
etd-07032020-115029
Thesis title
Developing and Experimenting Approaches for DeepFake Text Detection on Social Media
Department
INGEGNERIA DELL'INFORMAZIONE
Course of study
COMPUTER ENGINEERING
Supervisors
supervisor Dr. Tesconi, Maurizio
supervisor Dr. Falchi, Fabrizio
Keywords
  • deepfake
  • gpt2
  • text
  • detection
  • social media
Graduation session start date
20/07/2020
Availability
Full
Summary
Social media, by connecting people and spreading ideas, have long been an ideal vehicle for shaping and altering public opinion through bots, i.e. agents that behave like human users by liking, re-posting and publishing machine-generated messages that trick people into believing they are human-written. Even the cheapest text generation techniques (e.g. the search-and-replace method) can deceive humans, as the Net Neutrality scandal (2017) proved. Meanwhile, more powerful generative models have been released, from RNN-based methods to the GPT-2 language model: these deep neural networks can produce "deepfake texts", i.e. autonomously generated, non-formulaic texts that resemble human-written content. Although deepfake text can already be found on social media, no episode of misuse has yet been reported there; nevertheless, this generative capability is deeply worrying. It is therefore necessary to continuously probe the ability of language generation models to produce human-like social media texts by developing appropriate detectors. In particular, this work aimed to develop a GPT-2-based detector and to test it on a dataset composed of both human-written tweets and machine-generated tweets produced by GPT-2, RNNs and other unspecified deep generative models. The results are satisfactory: our detector discriminates deepfake texts from human-written ones with an accuracy of 91%. However, particular settings can degrade the performance of our GPT-2-based detector.