Thesis etd-07102024-094422

Thesis type

Master's thesis

Author

HIMMICHE, AIDA

URN

etd-07102024-094422

Title

Fine-tuning Generative Adversarial Networks for co-speech body gestures of a humanoid social robot

Department

INGEGNERIA DELL'INFORMAZIONE

Degree programme

ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING

Supervisors

supervisor Prof. Cimino, Mario Giovanni Cosimo Antonio
supervisor Galatolo, Federico Andrea
supervisor Cominelli, Lorenzo
supervisor Greco, Alberto

Keywords

co-speech gestures
Conditional Generative Adversarial Networks (cGAN)
Generative Adversarial Networks (GAN)
human-robot interaction
humanoid robot
machine learning

Defense session start date

26/07/2024

Availability

Not available

Release date

26/07/2094

Abstract

This thesis explores the integration of natural co-speech gestures in humanoid social robots to improve the realism and effectiveness of human-robot interaction (HRI). The objective is to develop a gesture generation system that produces gestures that are synchronized with speech and appropriate to its context. The work contributes to HRI by presenting a machine-learning-focused method that addresses the limitations of previously proposed approaches based on predefined scripts, data-driven methods, and probabilistic models. The primary challenges addressed are the creation of a comprehensive dataset from the Trinity Speech-Gesture dataset by aligning the body motion and speech data, robust training of machine learning models after appropriate preprocessing of that data, and extensive performance optimization. The focus is on the last step: fine-tuning a Generative Adversarial Network (GAN) with a generator-discriminator architecture, and a Conditional Generative Adversarial Network (cGAN) with an encoder-decoder architecture and a Sequence-to-Sequence (Seq2Seq) model, to generate suitable gestures from given speech data. Results showed improved gesture predictions, with a high test R² score for the cGAN model despite the complexity of the task. Because the model's raw outputs are difficult to interpret directly, the naturalness of the generated gestures was evaluated by visualizing them in the 3D modeling software Blender.
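
For readers unfamiliar with the metric mentioned in the abstract, the following is a minimal sketch of how an R² (coefficient of determination) score can be computed for predicted values against ground truth. The function name, variable names, and toy numbers are hypothetical and purely illustrative; the record does not say how the thesis computed or aggregated its score across joints and frames.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variance of the targets around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for constant targets")
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data standing in for one joint angle over four frames.
target = [0.0, 1.0, 2.0, 3.0]
predicted = [0.1, 0.9, 2.1, 2.9]
score = r2_score(target, predicted)
print(f"R^2 = {score:.3f}")
```

A score of 1 means the predictions match the targets exactly, while a score of 0 means they do no better than always predicting the mean of the targets.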

File

File name	Size
The thesis is not available. Contact the author

ETD

Digital archive of theses defended at the Università di Pisa
