ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-07102024-094422


Thesis type
Master's thesis
Author
HIMMICHE, AIDA
URN
etd-07102024-094422
Title
Fine-tuning Generative Adversarial Networks for co-speech body gestures of a humanoid social robot
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING
Supervisors
supervisor Prof. Cimino, Mario Giovanni Cosimo Antonio
supervisor Galatolo, Federico Andrea
supervisor Cominelli, Lorenzo
supervisor Greco, Alberto
Keywords
  • co-speech gestures
  • Conditional Generative Adversarial Networks (cGAN)
  • Generative Adversarial Networks (GAN)
  • human-robot interaction
  • humanoid robot
  • machine learning
Defence session start date
26/07/2024
Availability
Not available for consultation
Release date
26/07/2094
Abstract
This thesis explores the integration of natural co-speech gestures into humanoid social robots to improve the realism and effectiveness of human-robot interaction (HRI). The objective is a gesture generation system that produces gestures synchronized with speech and appropriate to its context. The work contributes to HRI by presenting a machine-learning method that addresses the limitations of the predefined-script, data-driven, and probabilistic approaches proposed previously. The primary challenges addressed are the creation of a comprehensive dataset from the Trinity Speech-Gesture dataset by aligning the body motion data with the speech data, robust training of machine learning models on the preprocessed data, and extensive performance optimization. The focus is on the last step: fine-tuning Generative Adversarial Networks (GANs) with a generator-discriminator architecture, and a Conditional Generative Adversarial Network (cGAN) with an encoder-decoder architecture and a Sequence-to-Sequence (Seq2Seq) model, to generate suitable gestures from given speech data. The results showed improved gesture predictions, with the cGAN model achieving a high R² score on the test set despite the complexity of the task. Because the model's raw outputs are difficult to interpret directly, the naturalness of the generated gestures was evaluated by visualizing them in the 3D modeling software Blender.
File