

Digital archive of theses discussed at the University of Pisa


Thesis etd-04052017-162845

Thesis type
Master's thesis
Thesis title
Design and implementation of a system for real-time moving object detection and classification in video streams
Course of study
Supervisors
Dott. Falchi, Fabrizio
Dott. Gennaro, Claudio
Dott. Amato, Giuseppe
Keywords
  • Object Classification
  • Object Detection
  • Object Tracking
Graduation session start date
Abstract
The aim of this study is to design and implement a complete system for moving object detection and classification that works in real time. The system is designed as a pipeline of processing steps applied to the incoming video stream. It is composed of three sub-modules: Object Detection, Object Tracking and Object Classification.

First, object detection relies on MOG2, a Gaussian mixture-based background/foreground segmentation algorithm. Background subtraction is a widely used approach for detecting moving objects in videos recorded in steady conditions.

Second, for the classification task, a Convolutional Neural Network (CNN) is designed and trained. CNNs are neural network architectures belonging to the Deep Learning field. Networks of this type need to be trained with a large amount of data, so an ad-hoc dataset is created for this purpose by combining different datasets. Since the system is intended for traffic monitoring, the identified object classes are car, motorbike, tram, van, bicycle, person, truck and bus. Different networks have been tested in order to achieve the best trade-off between efficiency and accuracy. The framework used for training and testing the networks is Caffe, which is probably the most widely used in computer vision. To simplify the training, the NVIDIA Deep Learning GPU Training System (DIGITS) is employed. The system presented in this work is built on the OpenCV and Caffe libraries and is written entirely in C++.

Third, to reduce the number of objects to classify in each frame, a tracking mechanism is also designed and implemented. This mechanism uses the center of mass computed for each detected object to determine whether the object was already present in the previous frames.

Several parameters are exposed so that the user can tune the system according to the performance requirements and the context in which it is deployed. A statistical study of these parameters has been carried out to show how the system behaves as they change.
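The center-of-mass tracking idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the names `Centroid`, `Track`, `CentroidTracker` and the `max_distance` threshold are hypothetical, the greedy nearest-neighbour matching is an assumed strategy, and a track that is not matched in the current frame is simply dropped. The sketch only shows how comparing centroids across frames can flag which detections are new and would therefore need to be sent to the classifier.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Center of mass of one detected blob, in pixel coordinates.
struct Centroid {
    double x;
    double y;
};

// A tracked object: a stable id plus its last known centroid.
struct Track {
    int id;
    Centroid last;
};

// Hypothetical tracker: matches each detection to the nearest unmatched
// track from the previous frame, if it lies within max_distance pixels.
class CentroidTracker {
public:
    explicit CentroidTracker(double max_distance)
        : max_distance_(max_distance) {
        assert(max_distance >= 0.0);
    }

    // Processes the centroids of one frame. Returns, for each detection,
    // true when it is treated as a new object (one that would need
    // classification) and false when it matches an existing track.
    std::vector<bool> update(const std::vector<Centroid>& detections) {
        std::vector<bool> is_new(detections.size(), false);
        std::vector<bool> taken(tracks_.size(), false);
        std::vector<Track> next;

        for (std::size_t i = 0; i < detections.size(); ++i) {
            std::size_t best = tracks_.size();  // sentinel: no match yet
            double best_dist = max_distance_;
            for (std::size_t j = 0; j < tracks_.size(); ++j) {
                if (taken[j]) continue;
                const double d = std::hypot(detections[i].x - tracks_[j].last.x,
                                            detections[i].y - tracks_[j].last.y);
                if (d <= best_dist) {
                    best = j;
                    best_dist = d;
                }
            }
            if (best < tracks_.size()) {
                // Same object as before: keep its id, update its position.
                taken[best] = true;
                next.push_back(Track{tracks_[best].id, detections[i]});
            } else {
                // No track close enough: register a new object.
                is_new[i] = true;
                next.push_back(Track{next_id_++, detections[i]});
            }
        }
        // Simplification: tracks with no match in this frame are discarded.
        tracks_ = std::move(next);
        return is_new;
    }

    const std::vector<Track>& tracks() const { return tracks_; }

private:
    double max_distance_;
    int next_id_ = 0;
    std::vector<Track> tracks_;
};
```

In such a design, `update` would be called once per frame with the centroids of the foreground blobs, and only detections flagged as new would be passed on to the CNN. The distance threshold is an example of the kind of user-tunable parameter the abstract mentions, although the actual parameters of the system are not listed in this record.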