ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-09132021-114204


Thesis type
Master's thesis
Author
MANNOCCI, LORENZO
URN
etd-09132021-114204
Title
Bot detection with unsupervised approach based on multivariate time series
Department
COMPUTER SCIENCE
Degree programme
DATA SCIENCE AND BUSINESS INFORMATICS
Supervisors
supervisor Prof. Monreale, Anna
co-supervisor Prof. Cresci, Stefano
co-supervisor Prof. Vakali, Athina
Keywords
  • autoencoder
  • multivariate time series
  • bot detection
  • unsupervised learning
Defence session start date
8 October 2021
Availability
Thesis not available for consultation
Abstract
An estimated 9% to 17% of Twitter accounts are bots, which contribute on average between 16% and 56% of tweets; other estimates put bots at about 15% of all active Twitter users in 2017 and 11% of all Facebook accounts in 2019. Most approaches proposed in the literature are supervised, and their main limitation is that they fail to classify previously unseen bots, which characteristically evolve over the years. Group-based approaches are therefore needed, which is the main reason we propose an unsupervised approach. The main contribution of our model is its use of multivariate time series, a novelty in the literature: other unsupervised methods based on temporal features use univariate time series and thereby lose much information. We then use an autoencoder to reduce the dimensionality of the time series to a latent space, which can be either a vector or a univariate time series. Finally, we use a density-based clustering algorithm to discover groups of bots and genuine users. Because the dataset used (cresci-17) is labelled, we can also evaluate the results. Moreover, the method allows us to recognize and separate not only bots from genuine users but also the different types of bots from one another.
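The final step of the pipeline described above, density-based clustering of the latent representations, can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, so DBSCAN is used here purely as an assumption. The latent vectors, the `eps` radius, and the `min_pts` threshold are hypothetical values chosen for illustration; they are not taken from the thesis or from the cresci-17 dataset.

```python
from math import dist

NOISE = -1  # label for points that belong to no dense region


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: one cluster label per point, NOISE (-1) for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # core point: keep growing
    return labels


# Hypothetical 2-D latent vectors, standing in for autoencoder output:
# two dense groups of accounts and one isolated account.
latent = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (10.0, 0.0),
]
labels = dbscan(latent, eps=0.3, min_pts=3)
print(labels)
```

With labelled data, each cluster could then be compared against the known account classes; the isolated point receives the noise label rather than being forced into a group, which is one reason a density-based method suits data where the number of bot families is not known in advance.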