Thesis etd-07032020-115029

Thesis type

Master's degree thesis

Author

GAMBINI, MARGHERITA

URN

etd-07032020-115029

Title

Developing and Experimenting Approaches for DeepFake Text Detection on Social Media

Department

INGEGNERIA DELL'INFORMAZIONE (Information Engineering)

Degree programme

COMPUTER ENGINEERING

Supervisors

supervisor Dr. Tesconi, Maurizio
supervisor Dr. Falchi, Fabrizio

Keywords

deepfake
detection
gpt2
social media
text

Graduation session start date

20/07/2020

Availability

Full

Abstract

By connecting people and spreading ideas, social media have always been an ideal vehicle for shaping and altering public opinion through bots, i.e. agents that imitate human users by liking, re-posting, and publishing machine-generated messages that trick people into believing they are human-written. Even the cheapest text generation techniques (e.g. the search-and-replace method) can deceive humans, as the Net Neutrality scandal (2017) proved. Meanwhile, more powerful generative models have been released, from RNN-based methods to the GPT-2 language model: these deep neural networks can produce "deepfake texts", i.e. autonomously generated, non-formulaic texts that resemble human-written content. Although deepfake texts can already be found on social media, there has so far been no episode of misuse on these platforms. This generative capability is nonetheless deeply worrying: it is necessary to continuously probe the ability of language generation models to produce human-like social media (SM) texts by developing appropriate detectors. In particular, this work aimed to develop a GPT-2-based detector and to test it on a dataset of both human-written and machine-generated tweets, the latter produced by GPT-2, RNNs, and other deep generative models that are not further specified. The results are satisfactory: our detector discriminates deepfake texts from human ones with an accuracy of 91%. However, particular settings can degrade the performance of our GPT-2-based detector.

File

File name	Size
masterTh...mbini.pdf	9.50 Mb

ETD

Digital archive of the theses defended at the Università di Pisa
