ETD

Digital archive of the theses defended at the Università di Pisa

Thesis etd-07032020-115029


Thesis type
Master's thesis
Author
GAMBINI, MARGHERITA
URN
etd-07032020-115029
Title
Developing and Experimenting Approaches for DeepFake Text Detection on Social Media
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
COMPUTER ENGINEERING
Supervisors
supervisor Dr. Tesconi, Maurizio
supervisor Dr. Falchi, Fabrizio
Keywords
  • deepfake
  • detection
  • gpt2
  • social media
  • text
Defense session start date
20/07/2020
Availability
Full
Abstract
Social media, by connecting people and spreading ideas, have always been an ideal vehicle for shaping and altering public opinion through bots, i.e. agents that behave like human users by liking, re-posting and publishing machine-generated messages that trick people into believing they are human-written. Even the cheapest text generation techniques (e.g. the search-and-replace method) can deceive humans, as the Net Neutrality scandal (2017) proved. Meanwhile, more powerful generative models have been released, from RNN-based methods to the GPT-2 language model: these deep neural networks can produce "Deepfake Texts", i.e. autonomously generated, non-formulaic texts that resemble human-written content. Although deepfake text can already be found on social media, no episode of misuse has occurred there yet; nevertheless, this generative capability is deeply worrying. It is therefore necessary to continuously probe the ability of language generation models to produce human-like social media texts by developing appropriate detectors. In particular, this work aimed at developing a GPT-2-based detector and testing it on a dataset composed of both human-written tweets and machine-generated tweets produced by GPT-2, RNNs and other deep generative models that are not further specified. The results are satisfactory, as our detector can discriminate deepfake texts from human ones with an accuracy of 91%. However, particular settings can harm the performance of our GPT-2-based detector.