
ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-03302022-120253


Type of thesis
Master's thesis
Author
SERAO, GIANLUCA
URN
etd-03302022-120253
Title
Development of deep learning models for fight detection in videos
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING
Supervisor
Prof. Gennaro, Claudio
Keywords
  • 3D convolutional neural networks
  • fight detection
  • optical flow
  • segment-based sampling
Defence session date
29/04/2022
Availability
Not available for consultation
Release date
29/04/2092
Abstract
Video action recognition has attracted considerable attention from the research community in recent years, owing to its importance in everyday applications such as human-machine interaction and surveillance. The task is to classify human actions from a sequence of still images. This is not easy: different actions may share similar motion patterns, making them hard to distinguish. Videos also pose appearance-related challenges, such as camera motion and illumination changes. It is therefore crucial to model motion and appearance information jointly in this context.

Early methods relied mainly on handcrafted features and motion-estimation techniques. With the rise of deep learning, most methods moved to deep networks such as Convolutional Neural Networks (CNNs) and Residual Networks (ResNets). Initially, the most successful approaches combined multiple networks based on 2D convolution to model the temporal dimension. More recently, networks based on 3D convolution have gained popularity, as they model temporal and appearance information at once.
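The key idea behind 3D convolution is that the kernel slides over time as well as space, so a single filter responds to spatio-temporal patterns instead of purely spatial ones. A minimal pure-Python sketch of the operation (illustrative only; the thesis's actual models are ResNet-style 3D networks, and `conv3d_valid` is a hypothetical name):

```python
def conv3d_valid(clip, kernel):
    """Naive 3D convolution with 'valid' padding over a clip shaped [T][H][W].

    The kernel covers a window in time (T) as well as height and width,
    so motion and appearance are captured by one operation.
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kT, kH, kW = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kT + 1):
        frame = []
        for h in range(H - kH + 1):
            row = []
            for w in range(W - kW + 1):
                acc = 0.0
                for dt in range(kT):
                    for dh in range(kH):
                        for dw in range(kW):
                            acc += clip[t + dt][h + dh][w + dw] * kernel[dt][dh][dw]
                row.append(acc)
            frame.append(row)
        out.append(frame)
    return out

# Tiny demo: a 2x2x2 clip of ones convolved with a 2x2x2 kernel of ones
# collapses to a single value equal to the number of kernel elements.
clip = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
print(conv3d_valid(clip, clip))  # [[[8.0]]]
```

In practice this is done by library primitives (e.g. `torch.nn.Conv3d`) with learned kernels; the sketch only shows why one 3D filter sees both dimensions at once.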

In this thesis, we develop a tool that detects violent actions, chiefly fights, in videos. To this end, we use several state-of-the-art ResNets based on 3D convolution or its approximations. Alongside these networks, we apply other computer vision techniques, such as optical flow and ad-hoc video sampling strategies. We train the architectures on several datasets, using two kinds of input: plain RGB videos and optical-flow videos. Finally, we test the models on previously unseen datasets to evaluate and compare their generalisation capabilities. The tests show that our methods reach state-of-the-art performance on all datasets.
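The segment-based sampling listed among the keywords spreads a small, fixed number of frames evenly over the whole video, so long clips are summarised without reading every frame. A hedged sketch of the idea (the function name and the deterministic "middle frame" choice are illustrative assumptions; training pipelines often draw a random frame per segment instead):

```python
def segment_sample(num_frames, num_segments):
    """Return one frame index per equal-length temporal segment.

    Deterministic variant: the middle frame of each segment, so the
    sampled frames cover the full duration of the video.
    """
    seg_len = num_frames / num_segments
    return [int(i * seg_len + seg_len / 2) for i in range(num_segments)]

# A 100-frame video reduced to 4 representative frames:
print(segment_sample(100, 4))  # [12, 37, 62, 87]
```

The sampled indices are then used to gather the corresponding frames (RGB or optical flow) into the clip fed to the 3D network.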