
ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-03302022-120253


Thesis type
Master's thesis
Author
SERAO, GIANLUCA
URN
etd-03302022-120253
Title
Development of deep learning models for fight detection in videos
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING
Supervisors
Supervisor Prof. Gennaro, Claudio
Keywords
  • fight detection
  • 3D convolutional neural networks
  • optical flow
  • segment-based sampling
Defence date
29/04/2022
Availability
Not available
Release date
29/04/2092
Abstract
Video action recognition has gained much attention from the research community in recent years owing to its importance in many everyday applications, such as human-machine interaction and surveillance. It aims to classify human actions from a sequence of still images. The task is not easy: different actions may share similar patterns, making them difficult to distinguish. Moreover, videos pose further challenges on the appearance side, such as camera motion and illumination changes. It is therefore crucial to model motion and appearance information jointly in this context.

Early methods were mainly based on handcrafted features and motion-estimation techniques. Later, with the rise of deep learning, most methods came to rely on deep networks such as Convolutional Neural Networks (CNNs) and Residual Networks (ResNets). Initially, the most successful approaches ran multiple 2D-convolutional networks in parallel to model the temporal dimension. More recently, networks based on 3D convolution have gained popularity, as they model temporal and appearance information at once.
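
As a concrete illustration of that last point: a 3D convolution slides its kernel along the temporal axis as well as the two spatial ones, so a single layer captures motion and appearance together. Below is a minimal sketch assuming PyTorch; the clip size and kernel size are illustrative and not taken from the thesis.

import torch
import torch.nn as nn

# A batch of 2 clips laid out as (batch, channels, time, height, width):
# 3 colour channels, 16 frames, 112x112 pixels each.
clips = torch.randn(2, 3, 16, 112, 112)

# A 3x3x3 kernel spans one temporal and two spatial dimensions, so each
# output feature mixes appearance with short-range motion across frames.
conv3d = nn.Conv3d(in_channels=3, out_channels=64, kernel_size=3, padding=1)

features = conv3d(clips)
print(features.shape)  # torch.Size([2, 64, 16, 112, 112])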

In this thesis, we develop a tool that detects violent actions, mainly fights, in videos. To this end, we use several state-of-the-art ResNets based on 3D convolution or its approximation. Alongside these networks, we use other computer vision techniques, such as optical flow and ad-hoc video sampling strategies. We train these architectures on several datasets, using two kinds of input: plain RGB videos and optical-flow videos. Finally, we test the models on previously unseen datasets to evaluate and compare their generalisation capabilities. Tests show that our methods reach state-of-the-art performance on all the datasets.
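
To make the two input-preparation steps mentioned above concrete, here is a minimal sketch assuming OpenCV and NumPy. The function names and all parameter values are illustrative rather than taken from the thesis, and Farneback's algorithm stands in for whatever flow estimator is actually used.

import cv2
import numpy as np

def sample_segment_indices(num_frames: int, num_segments: int) -> list:
    # Segment-based sampling: split the video into num_segments equal
    # chunks and draw one random frame index from each, so a fixed-length
    # clip still spans the whole action. Assumes num_frames >= num_segments.
    seg_len = num_frames // num_segments
    return [i * seg_len + np.random.randint(seg_len)
            for i in range(num_segments)]

def dense_flow(prev_frame: np.ndarray, next_frame: np.ndarray) -> np.ndarray:
    # Dense optical flow between two consecutive BGR frames using OpenCV's
    # Farneback method; returns an (H, W, 2) per-pixel displacement field
    # that can be stacked across frame pairs into an optical-flow "video".
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    return cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)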
File