ETD system

Electronic theses and dissertations repository


Thesis etd-04092019-165542

Thesis type
Master's thesis
VLSI Design of a Hardware Accelerator for Convolutional Neural Networks on the edge: the Keyword Spotting case study
Degree programme
Supervisor Prof. Fanucci, Luca
Co-supervisor Dinelli, Gianmarco
Co-supervisor Meoni, Gabriele
Keywords
  • Artificial Intelligence
  • Machine Learning
  • Keyword Spotting
  • Convolutional Neural Network
  • FPGA
  • Hardware accelerator
Exam session start date
Restricted ex officio
Release date
Abstract
In recent years, Convolutional Neural Networks have been used in a wide range of applications, thanks to their ability to carry out tasks with fewer parameters than other Deep Learning approaches. However, the power-consumption and memory-footprint constraints typical of edge and portable applications conflict with the accuracy and latency requirements of these same applications. For this reason, commercial hardware accelerators, whose architectures are designed for the inference of general Convolutional Neural Network models, have become popular.
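To make the parameter saving concrete, the sketch below counts the weights and biases of a fully connected layer, a standard 2-D convolution and a depthwise separable convolution (the factorization that separable CNNs are built on) producing an output of the same size. The 49 x 10 x 64 feature map and the 3 x 3 kernel are hypothetical values chosen for illustration, not the dimensions of the model used in the thesis.

```python
def dense_params(in_features: int, out_features: int) -> int:
    """Weights plus one bias per output neuron of a fully connected layer."""
    return in_features * out_features + out_features


def conv2d_params(k_h: int, k_w: int, c_in: int, c_out: int) -> int:
    """Standard convolution: one k_h x k_w x c_in kernel per output channel, plus biases."""
    return k_h * k_w * c_in * c_out + c_out


def separable_conv2d_params(k_h: int, k_w: int, c_in: int, c_out: int) -> int:
    """Depthwise separable convolution: per-channel spatial filter, then 1x1 pointwise mixing."""
    depthwise = k_h * k_w * c_in + c_in  # one k_h x k_w filter (and bias) per input channel
    pointwise = c_in * c_out + c_out     # 1x1 convolution combining the channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 49 x 10 feature map, 64 input and 64 output channels, 3 x 3 kernel.
    h, w, c = 49, 10, 64
    print("dense:    ", dense_params(h * w * c, h * w * c))
    print("conv2d:   ", conv2d_params(3, 3, c, c))
    print("separable:", separable_conv2d_params(3, 3, c, c))
```

With these hypothetical sizes the script reports 983,480,960 parameters for the dense layer, 36,928 for the standard convolution and 4,800 for the separable one, which illustrates why separable models are attractive under tight memory budgets.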
Nevertheless, Field Programmable Gate Arrays represent an interesting alternative, since they make it possible to implement a hardware architecture tailored to a specific Convolutional Neural Network model, with promising results in terms of power consumption and timing performance.
In this thesis, we propose a Field Programmable Gate Array hardware accelerator for a Separable Convolutional Neural Network designed for a Keyword Spotting application. We started from the model implemented in previous work for the Intel Movidius Neural Compute Stick, chosen for its high accuracy despite its relatively small number of parameters and hidden layers. For our purposes, we quantized this model by means of a bit-true simulation and designed a dedicated architecture.
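A bit-true simulation reproduces in software exactly the integer arithmetic that the hardware will perform, so the accuracy of the quantized model can be measured before implementation. The abstract does not state which number format was used; the sketch below assumes signed fixed-point values with round-half-up and saturation, and the 8-bit word with 6 fractional bits is a hypothetical choice. The helper names `to_fixed`, `to_float` and `fixed_dot` are illustrative only.

```python
import math


def to_fixed(x: float, frac_bits: int, total_bits: int) -> int:
    """Quantize x to a signed fixed-point integer with frac_bits fractional bits.

    Uses round-half-up and saturates to the two's-complement range of total_bits.
    """
    q = math.floor(x * (1 << frac_bits) + 0.5)
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, q))


def to_float(q: int, frac_bits: int) -> float:
    """Interpret a fixed-point integer as a real value."""
    return q / (1 << frac_bits)


def fixed_dot(xs, ws, frac_bits: int, total_bits: int) -> int:
    """Bit-true multiply-accumulate, as an integer MAC unit would compute it.

    Each product carries 2 * frac_bits fractional bits; the sum is kept in an
    unbounded Python int (standing in for a wide accumulator), then rounded back
    to frac_bits fractional bits and saturated to total_bits.
    """
    assert frac_bits >= 1
    acc = 0
    for x, w in zip(xs, ws):
        acc += to_fixed(x, frac_bits, total_bits) * to_fixed(w, frac_bits, total_bits)
    # Round-half-up rescaling: Python's >> on negative ints is an arithmetic (floor) shift.
    acc = (acc + (1 << (frac_bits - 1))) >> frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, acc))


if __name__ == "__main__":
    # Hypothetical 8-bit format with 6 fractional bits: range [-2, 2 - 2**-6].
    FRAC, TOTAL = 6, 8
    xs = [0.5, -0.25, 0.3]
    ws = [0.5, 0.5, -1.0]
    ref = sum(x * w for x, w in zip(xs, ws))
    q = fixed_dot(xs, ws, FRAC, TOTAL)
    err = abs(ref - to_float(q, FRAC))
    print(f"float: {ref:.6f}  fixed: {to_float(q, FRAC):.6f}  error: {err:.6f}")
```

Running a whole layer through such a model and comparing its outputs with the floating-point reference is one way to choose word lengths; sweeping `FRAC` and `TOTAL` over a validation set would show where accuracy starts to degrade.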
We carried out a benchmark comparing implementations on different Xilinx Field Programmable Gate Array families with the implementation on the Neural Compute Stick. The analysis shows that lower latency can be achieved with comparable accuracy and power consumption.