ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-03262025-180149


Thesis type
Master's thesis
Author
AMADEI, DAVIDE
URN
etd-03262025-180149
Title
Exploring Multimodal Emotion Recognition on Social Media Content
Department
COMPUTER SCIENCE
Degree programme
COMPUTER SCIENCE
Supervisors
supervisor Prof. Bacciu, Davide
supervisor Prof. Passaro, Lucia C.
Keywords
  • affective computing
  • CLIP
  • image-text
  • multimodal emotion recognition
  • social media
Exam session start date
11/04/2025
Availability
Full
Abstract
Multimodal Emotion Recognition (MER) is an increasingly relevant challenge in Affective Computing, particularly in the context of social media, yet research in this area remains limited. Multilabel datasets, although less common, may be better suited to the task given the inherent subjectivity of emotions. To date, no studies have explored multilabel MER in the context of social media.
This thesis offers a first approach to the problem by proposing a multimodal framework that predicts the emotions conveyed by content posted on social media. The multilabel dataset used in the experiments consists of image-text pairs taken from tweets. The proposed framework adapts state-of-the-art transformer-based vision-language models, such as CLIP, to encode the image and the text, extracting meaningful information from the input data to ease classification. Data enrichment techniques and adjustments to the model are tested with the aim of improving the framework's performance.
The usefulness of a multimodal approach is verified by comparing the framework with two unimodal models, BERT and ViT, both also based on transformers. The framework is also compared with LLaVA, a state-of-the-art generative model, used for zero-shot classification. The comparison shows that, despite the versatility of LLaVA, a model specifically designed and trained for the task performs better.
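As an illustration of the multilabel setting described in the abstract, the following is a minimal sketch, not the thesis's actual implementation. It assumes that image and text embeddings have already been produced by some encoder (such as CLIP), fuses them by simple concatenation, and scores each emotion independently with a sigmoid so that several labels can apply to one post. The label set, the function names, the fusion choice, and the 0.5 threshold are all hypothetical.

```python
import math

# Hypothetical label set; the abstract does not list the emotion labels used.
EMOTIONS = ["joy", "sadness", "anger", "fear"]


def sigmoid(x: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(image_emb: list[float], text_emb: list[float]) -> list[float]:
    """Fuse the two modalities by concatenation (an assumed, simple choice)."""
    return image_emb + text_emb


def predict_emotions(image_emb, text_emb, weights, biases, threshold=0.5):
    """Score every label independently, so a post can receive
    several emotions at once, or none at all."""
    features = fuse(image_emb, text_emb)
    predicted = []
    for label, w, b in zip(EMOTIONS, weights, biases):
        # One linear scorer per label over the fused features.
        logit = sum(wi * xi for wi, xi in zip(w, features)) + b
        if sigmoid(logit) >= threshold:
            predicted.append(label)
    return predicted
```

Unlike a softmax classifier, which forces exactly one label, the per-label sigmoid lets the classifier return any subset of emotions, which matches the abstract's argument that emotions are inherently subjective.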