
ETD

Digital archive of theses discussed at the University of Pisa

Thesis etd-04142020-093303


Thesis type
Master's degree thesis
Author
PRIMAVERILI, ANDREA
URN
etd-04142020-093303
Title
Development of a Data Collection and Analysis System for Fake News Detection
Department
INFORMATION ENGINEERING
Degree programme
COMPUTER ENGINEERING
Supervisors
supervisor Marcelloni, Francesco
Keywords
  • Fake News Detection, Clustering, BERT, Streaming
Exam session start date
05/05/2020
Availability
Not available for consultation
Release date
05/05/2090
Abstract
The diffusion of social media has greatly increased the spread of fake news. Interest in automatic fake news detection tools has grown rapidly over the years, and this study addresses the problem by analysing the phenomenon in a streaming context. This scenario limits both the tools that can be used and the data available for building the model. Solutions published in the literature require data that is not available in a live context, such as the number of reactions to social media posts or data extracted from the network of users sharing the news. The problem is frequently treated with a supervised approach, which applies poorly in this context: the literature usually relies on fact-checking sites or on a human supervisor to build the dataset. Moreover, building a classifier that labels events as true or fake news requires a ground truth, which is not available in a streaming context where a news story may not have emerged yet. A library is proposed to collect data from Twitter, extracting the text of tweets and the content of the articles they share via URL. The features composing the dataset are extracted directly from the text through word embedding techniques, and the resulting data are analysed by proposing several techniques for building a model with an unsupervised approach that groups news through density-based clustering methods.
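The abstract does not name a specific density-based algorithm, so the following is only a minimal sketch under stated assumptions: it uses DBSCAN, one classic density-based method, with cosine distance over embedding vectors. The 2-D vectors are hypothetical stand-ins for BERT text embeddings of tweets or articles, and the `eps` and `min_pts` values are illustrative, not taken from the thesis.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1  # label for points that belong to no dense region


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def region_query(points: List[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j in range(len(points))
            if cosine_distance(points[i], points[j]) <= eps]


def dbscan(points: List[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN: returns a cluster id per point, or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = unvisited
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be absorbed as a border point
            continue
        # i is a core point: start a new cluster and expand it
        labels[i] = cluster_id
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point, not expanded further
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is also a core point
                seeds.extend(j_neighbours)
        cluster_id += 1
    return labels  # type: ignore[return-value]


# Hypothetical toy embeddings: two "news stories" plus one isolated post.
embeddings = [
    [1.0, 0.0], [0.98, 0.05], [0.95, 0.1],   # story A
    [0.0, 1.0], [0.05, 0.99], [0.1, 0.95],   # story B
    [0.7, -0.7],                              # unrelated post
]
labels = dbscan(embeddings, eps=0.05, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

A density-based method suits the streaming setting described above because it does not need the number of clusters in advance and leaves isolated posts as noise instead of forcing them into a group.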