Thesis etd-05112021-200709

Thesis type

Master's thesis

Author

RASULO, ANTONIO

URN

etd-05112021-200709

Title

Implementation of a Convolutional Neural Network on a FPGA-based soft GPU through the OpenCL framework

Department

INFORMATION ENGINEERING

Degree programme

ELECTRONIC ENGINEERING

Supervisors

supervisor Prof. Fanucci, Luca
supervisor Ing. Giuffrida, Gianluca
supervisor Ing. Nannipieri, Pietro

Keywords

CNN
FGPU
FPGA
GPGPU
OpenCL
soft-GPU

Defense session start date

21/06/2021

Availability

Not available for consultation

Release date

21/06/2091

Abstract

The wide adoption of smart algorithms requires hardware capable of running them efficiently while limiting power consumption.
Nowadays, GPUs are the best solution for executing both artificial intelligence and computer vision algorithms.
In fact, their internal structure is designed to exploit the Single Instruction Multiple Thread (SIMT) execution paradigm.
However, GPUs are usually deployed in the Cloud, which raises issues such as data transfer and security.
To overcome these issues, computing is shifting towards edge solutions, which can reduce system complexity and speed up inference by running it directly where the data are collected.
The aim of the thesis is to implement CloudScout, a Convolutional Neural Network (CNN) dedicated to on-board satellite computation, using a custom edge GPU-like hardware accelerator implemented on an FPGA board.
The accelerator, called soft-GPU and designed in collaboration with the University of Brandenburg, can perform general-purpose computing while reducing both power consumption and inference time.
It is compliant with OpenCL, the open standard for SIMT programming.
The proof-of-concept prototype of this heterogeneous design was synthesized on a Xilinx ZC706 development board. The board's SoC includes an ARM Cortex-A9 CPU, used as the host device, and the programmable logic where the soft-GPU was implemented; the board also provides 1 GB of off-chip RAM.
The CPU runs the control code and schedules the operation flow, handling the execution of the soft-GPU.
Preliminary results show that performance varies across the layer types in the CloudScout architecture.
In particular, we obtained average speedups of 3.7x for convolutional operations and 21x for pooling operations relative to the host processor with Neon instructions enabled.
A key advantage of the soft-processor-based solution is the ease and speed of development: designing the custom CloudScout solution took about a year, whereas implementing CloudScout on the soft-GPU took about 3 months.
Finally, the soft-GPU has a dynamic power consumption of only 0.55 W and occupies only 27.13% of the Look-Up Tables (LUTs) and 7.41% of the Digital Signal Processors (DSPs) in the Xilinx ZC706 programmable logic.
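
As an illustration of the SIMT mapping mentioned in the abstract, the following is a minimal sketch and not the thesis code. It emulates in plain Python how a max-pooling layer could be split into independent work-items, one per output element, in the way an OpenCL kernel would be organized. The function names (`max_pool_work_item`, `run_ndrange`), the 2x2 window, the stride of 2, and the sample data are all illustrative assumptions; the abstract does not state CloudScout's pooling parameters.

```python
# Illustrative sketch only: emulates an OpenCL-style NDRange on the CPU.
# Window size, stride, function names, and data are assumptions,
# not taken from the thesis.

def max_pool_work_item(gid_x, gid_y, src, window=2, stride=2):
    """Body of one work-item: compute a single output element.

    gid_x and gid_y play the role of get_global_id(0) and
    get_global_id(1) in an OpenCL kernel.
    """
    row0 = gid_y * stride
    col0 = gid_x * stride
    best = src[row0][col0]
    for dy in range(window):
        for dx in range(window):
            value = src[row0 + dy][col0 + dx]
            if value > best:
                best = value
    return best


def run_ndrange(src, window=2, stride=2):
    """Launch one work-item per output element.

    On a SIMT device these work-items could run in parallel; here they
    run sequentially, which gives the same result because they are
    independent of each other.
    """
    out_h = (len(src) - window) // stride + 1
    out_w = (len(src[0]) - window) // stride + 1
    return [
        [max_pool_work_item(x, y, src, window, stride) for x in range(out_w)]
        for y in range(out_h)
    ]


# Hypothetical 4x4 feature map, pooled down to 2x2.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 4],
]
pooled = run_ndrange(feature_map)
print(pooled)  # [[4, 5], [6, 7]]
```

Each output element depends only on its own input window, so no work-item needs to wait for another; this independence is what lets a layer of this kind be spread across many threads.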

File

File name	Size
Thesis not available for consultation. Contact the author

ETD

Digital archive of theses defended at the Università di Pisa
