ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-05112021-200709


Thesis type
Master's thesis
Author
RASULO, ANTONIO
URN
etd-05112021-200709
Title
Implementation of a Convolutional Neural Network on a FPGA-based soft GPU through the OpenCL framework
Department
INFORMATION ENGINEERING
Degree programme
ELECTRONIC ENGINEERING
Supervisors
supervisor Prof. Fanucci, Luca
supervisor Ing. Giuffrida, Gianluca
supervisor Ing. Nannipieri, Pietro
Keywords
  • CNN
  • FPGA
  • OpenCL
  • FGPU
  • soft-GPU
  • GPGPU
Defense session start date
21/06/2021
Availability
Not available for consultation
Release date
21/06/2091
Abstract
The wide adoption of smart algorithms requires hardware capable of running them efficiently while limiting power consumption.
Nowadays, GPUs are the best solution to execute both artificial intelligence and computer vision algorithms.
In fact, their internal structure is designed to exploit the Single Instruction Multiple Thread (SIMT) execution paradigm.
However, GPUs are usually deployed in the Cloud, which introduces problems such as data transfer and security.
To overcome these issues, computing is shifting towards edge solutions, which can reduce the complexity of the system and speed up inference directly where the data are collected.
The aim of the thesis is to implement CloudScout, a Convolutional Neural Network (CNN) dedicated to on-board satellite computation, using a custom edge GPU-like hardware accelerator implemented on an FPGA board.
The accelerator, called soft-GPU and designed in collaboration with the University of Brandenburg, can perform general-purpose computing while reducing both power consumption and inference time.
It is compliant with OpenCL, an open standard for SIMT-style programming.
The proof-of-concept prototype of this heterogeneous design was synthesized on a Xilinx ZC706 development board. The SoC includes an ARM Cortex-A9 CPU, used as the host device, the programmable logic where the soft-GPU was implemented, and 1 GB of off-chip RAM.
The CPU runs the control code and schedules the operation flow, handling the execution of the soft-GPU.
The preliminary results show that performance varies with the type of layer in the CloudScout architecture.
In particular, we obtained an average speedup of 3.7x for convolutional operations and 21x for pooling operations with respect to the host processor with NEON instructions enabled.
A key advantage of the soft-processor-based solution is the ease and speed of development: designing the custom CloudScout solution took about a year, while implementing CloudScout on the soft-GPU took about 3 months.
Finally, the dynamic power consumption of the soft-GPU is only 0.55 W, and it occupies only 27.13% of the Look-Up Tables (LUTs) and 7.41% of the Digital Signal Processors (DSPs) in the Xilinx ZC706 programmable logic.
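
As a rough illustration of the SIMT model mentioned in the abstract, the sketch below emulates in plain Python how a host could launch one work-item per output element for a convolution and a pooling layer. It is not the thesis code and does not use OpenCL or any soft-GPU API: the names launch, conv2d_kernel, maxpool2x2_kernel and run_example, as well as the 5x5 input and 2x2 filter, are assumptions made only for this example.

```python
# Illustrative sketch only: emulates SIMT-style execution in plain Python.
# All names and sizes here are hypothetical and not taken from the thesis.

def launch(kernel, global_size, *args):
    """Host side: call `kernel` once per work-item of a 2D index space."""
    rows, cols = global_size
    for gy in range(rows):        # a real device would run work-items in parallel
        for gx in range(cols):
            kernel(gy, gx, *args)


def conv2d_kernel(gy, gx, image, weights, out):
    """One work-item computes one output pixel.

    No padding, stride 1, and no kernel flip (cross-correlation, as is
    common in CNN layers).
    """
    k = len(weights)
    acc = 0.0
    for i in range(k):
        for j in range(k):
            acc += image[gy + i][gx + j] * weights[i][j]
    out[gy][gx] = acc


def maxpool2x2_kernel(gy, gx, fmap, out):
    """One work-item computes one output of a 2x2, stride-2 max pooling."""
    y, x = 2 * gy, 2 * gx
    out[gy][gx] = max(fmap[y][x], fmap[y][x + 1],
                      fmap[y + 1][x], fmap[y + 1][x + 1])


def run_example():
    # 5x5 ramp image: image[r][c] = 5*r + c
    image = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
    weights = [[1.0, 0.0], [0.0, 1.0]]            # hypothetical 2x2 filter

    conv = [[0.0] * 4 for _ in range(4)]          # (5 - 2 + 1) = 4 per side
    launch(conv2d_kernel, (4, 4), image, weights, conv)

    pooled = [[0.0] * 2 for _ in range(2)]        # 4 / 2 = 2 per side
    launch(maxpool2x2_kernel, (2, 2), conv, pooled)
    return conv, pooled


if __name__ == "__main__":
    conv, pooled = run_example()
    print(pooled)
```

Every work-item executes the same kernel body on a different index pair, which is the property a SIMT device exploits. On real hardware the two nested loops in launch would be replaced by parallel scheduling of work-items.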
File