
ETD

Digital archive of theses discussed at the University of Pisa

Thesis etd-03252024-111027


Thesis type
Master's thesis
Author
BELLIZZI, LEONARDO
URN
etd-03252024-111027
Title
Real-time heart rate estimation with vision transformers
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING
Supervisors
Supervisor Prof. Tonellotto, Nicola
Supervisor Prof. Ducange, Pietro
Supervisor Prof. Vallati, Carlo
Keywords
  • heart rate prediction
  • neural networks
  • PyTorch
  • vision transformers
  • web application
Defence date
17/04/2024
Availability
Not available for consultation
Release date
17/04/2094
Abstract
This thesis is positioned within the field of healthcare. The main objective is to predict heart rate from facial images, after extracting features from them with neural networks.
Specifically, the first objective is to find a neural network architecture able to perform this regression task at a level of performance comparable to that of baseline models. This model is the core of a web application built to predict heart rate (HR) in real time from the PC webcam stream, while also allowing the user to run predictions on recorded videos.
The first step involved a review of state-of-the-art methodologies, to identify the benchmark architectures in this domain. CNN-based architectures, known for their effectiveness in computer vision tasks, dominate this field as well. The EVMCNN and DeepPhys architectures stood out as benchmarks, outperforming their counterparts, and were therefore chosen as baseline models.
Taking the current state of the art into account, an approach based on the Vision Transformer (ViT) model was developed, with modifications to suit the regression task.
The ViT implementation follows the Transformer structure employed in Natural Language Processing tasks.
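The adaptation described above — a standard ViT with its classification head swapped for a single-output regression head — can be sketched in PyTorch as follows. This is a hypothetical minimal illustration, not the thesis code; all dimensions and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class ViTRegressor(nn.Module):
    """Minimal ViT-style regressor (illustrative sketch, not the thesis model).

    The image is split into patches, a learnable [CLS] token and positional
    embeddings are added, a Transformer encoder processes the sequence, and
    a single-output linear head replaces the usual classification head.
    """

    def __init__(self, img_size=64, patch_size=8, dim=128, depth=4, heads=4):
        super().__init__()
        n_patches = (img_size // patch_size) ** 2
        # Patch embedding as a strided convolution
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=dim * 4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Regression head: one scalar (the predicted HR) instead of class logits
        self.head = nn.Linear(dim, 1)

    def forward(self, x):
        b = x.shape[0]
        patches = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = torch.cat([self.cls_token.expand(b, -1, -1), patches], dim=1)
        z = self.encoder(tokens + self.pos_embed)
        return self.head(z[:, 0]).squeeze(-1)  # predict from the [CLS] token

model = ViTRegressor().eval()
with torch.no_grad():
    out = model(torch.randn(2, 3, 64, 64))  # batch of 2 feature images
```

The only structural change with respect to a classification ViT is the final linear layer, which maps the [CLS] representation to a single continuous value.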
The ECG-Fitness Database was used as the dataset, since it contains videos captured from 17 subjects engaged in six distinct physical activities, with each video linked to a corresponding CSV file containing heart rate data. A preprocessing phase was necessary to align the video frames with the ECG CSV files and ensure temporal coherence. Face detection was performed every ten frames, as a trade-off between data quantity and temporal information retention. The next step involved feature extraction from the cheek regions of the detected faces, through an algorithm that derives, from the raw frames, spectrograms representing temporal pixel variations while retaining the relevant information about heart-rate frequencies.
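The spectrogram idea can be sketched as follows: a temporal pixel signal (e.g. the mean cheek intensity per frame) is windowed and Fourier-transformed, keeping only the frequency band plausible for heart rate (roughly 0.7–4 Hz, i.e. 42–240 bpm). This is a hypothetical illustration of the principle, not the thesis algorithm; window sizes and band limits are assumptions.

```python
import numpy as np

def hr_band_spectrogram(signal, fps=30.0, win=128, hop=16, f_lo=0.7, f_hi=4.0):
    """Sliding-window magnitude spectrogram of a temporal pixel signal,
    restricted to the heart-rate frequency band (illustrative sketch)."""
    freqs = np.fft.rfftfreq(win, d=1.0 / fps)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    cols = []
    for start in range(0, len(signal) - win + 1, hop):
        frame = signal[start:start + win] * np.hanning(win)  # taper the window
        cols.append(np.abs(np.fft.rfft(frame))[band])
    return np.stack(cols, axis=1)  # shape: (freq_bins, time_steps)

# Synthetic cheek-pixel signal pulsing at 1.2 Hz (72 bpm) plus noise
np.random.seed(0)
fps = 30.0
t = np.arange(600) / fps  # 20 s of video at 30 fps
signal = 0.05 * np.sin(2 * np.pi * 1.2 * t) + 0.01 * np.random.randn(len(t))
spec = hr_band_spectrogram(signal, fps=fps)
```

On this synthetic signal the spectrogram's dominant frequency bin sits near 1.2 Hz, which is what lets a downstream network read the heart rate off such features.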
Two datasets were set up: one allowing feature images from the same person to appear in both the training and test sets, and one separated by individual, so that each person appeared in either the training set or the test set, but not both.
The models were trained and tested on both datasets. Although they suffered a performance degradation on the individual-separated dataset, as the DeepPhys researchers also reported, ViT consistently outperformed the other architectures in both scenarios, meeting the first objective.
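Comparing regressors on such a task is usually done with error metrics like MAE and RMSE over the predicted heart rates; a short sketch (the thesis' exact evaluation protocol is not reproduced here):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Mean absolute error and root mean squared error, the common metrics
    for comparing HR regressors (illustrative sketch)."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return {"mae": float(np.mean(np.abs(err))),
            "rmse": float(np.sqrt(np.mean(err ** 2)))}

# Hypothetical ground-truth vs predicted heart rates in bpm
m = regression_metrics([70, 80, 90], [72, 78, 95])
```

RMSE penalises large errors more heavily than MAE, so reporting both gives a fuller picture of how a model degrades on the harder, subject-separated split.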
Given its performance, ViT was chosen as the model for real-time heart rate estimation in the web application, meeting the second objective.