
ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-04062022-145234


Thesis type
Master's degree thesis
Author
FEDELE, ANDREA
URN
etd-04062022-145234
Title
Explaining Siamese Networks in Few-Shot Learning for Audio Data
Department
INFORMATICA
Degree programme
DATA SCIENCE AND BUSINESS INFORMATICS
Supervisors
supervisor Dr. Guidotti, Riccardo
second reader Prof. Ghelli, Giorgio
Keywords
  • audio
  • explainable artificial intelligence
  • one-shot learning
  • perturbation technique
  • siamese neural networks
  • sound classification
Graduation session start date
22/04/2022
Availability
Full
Abstract
Traditional Machine Learning models are not able to generalize correctly when queried on samples belonging to class distributions that were never seen during training. This is a critical issue, since real-world applications may need to adapt quickly without re-training. To overcome these limitations, few-shot learning frameworks have been proposed, and their applicability has been studied widely for computer vision tasks. Siamese Networks learn pairwise similarity in the form of a metric that can be easily extended to new, unseen classes. Unfortunately, the biggest downside of such systems is their lack of explainability. In this thesis we verify the applicability of Siamese Networks in the context of few-shot learning for audio inputs, and we propose a novel method to explain their outcomes. This objective is pursued through a perturbation-based method that quantifies how each input feature contributes to the final outcome by measuring the change in the mean prediction when that feature is perturbed. We conduct several experiments on two distinct datasets to assess the method's ability to explain Siamese Network outcomes in a C-way one-shot framework. Qualitative and quantitative results demonstrate that our method is able to reveal common intra-class characteristics and an erroneous reliance on silent sections. Classification weaknesses are also uncovered when audio clips are generated from heterogeneous sources and recorded in different environments.
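
The perturbation idea summarized in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the `similarity` function is a hypothetical stand-in for a trained Siamese Network, the inputs are toy 1-D vectors rather than audio features, and the Gaussian-noise perturbation, the segment count, and every function name are assumptions made only for illustration.

```python
import numpy as np

def similarity(a, b):
    """Hypothetical stand-in for a trained Siamese Network: score in (0, 1]."""
    return float(np.exp(-np.mean((a - b) ** 2)))

def one_shot_predict(query, support):
    """C-way one-shot: choose the support example most similar to the query."""
    scores = np.array([similarity(query, s) for s in support])
    return int(np.argmax(scores)), scores

def segment_contributions(query, reference, n_segments=4, n_samples=50, seed=0):
    """Contribution of each segment = drop in mean similarity when it is perturbed."""
    rng = np.random.default_rng(seed)
    base = similarity(query, reference)
    bounds = np.linspace(0, len(query), n_segments + 1, dtype=int)
    contributions = np.zeros(n_segments)
    for i in range(n_segments):
        lo, hi = bounds[i], bounds[i + 1]
        perturbed_scores = []
        for _ in range(n_samples):
            perturbed = query.copy()
            # Assumed perturbation: overwrite the segment with Gaussian noise.
            perturbed[lo:hi] = rng.normal(0.0, 1.0, size=hi - lo)
            perturbed_scores.append(similarity(perturbed, reference))
        contributions[i] = base - np.mean(perturbed_scores)
    return contributions

# Toy 3-way one-shot task: one support example per class.
support = [
    np.zeros(8),
    np.array([5, 5, 0, 0, 0, 0, 0, 0], dtype=float),
    np.array([0, 0, 0, 0, 0, 0, 5, 5], dtype=float),
]
query = support[1].copy()

pred, scores = one_shot_predict(query, support)
contribs = segment_contributions(query, support[pred])
print("predicted class:", pred)
print("segment contributions:", contribs)
```

In this toy setting the segment holding the non-zero part of the signal receives the largest contribution, because perturbing it lowers the similarity the most. A real study would replace `similarity` with the trained network and the segments with meaningful regions of the audio representation.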
File