
ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-06102024-220853


Thesis type
Master's degree thesis
Author
LA MANNA, DAVIDE
URN
etd-06102024-220853
Title
Neural Tangent kernel for approximate binarized neural network architecture
Department
MATHEMATICS
Degree programme
MATHEMATICS
Supervisors
Supervisor: Prof. Trevisan, Dario
Keywords
  • asymmetric NTK operator
  • binary neural networks
  • complex models
  • computational challenges
  • computationally costly
  • deep neural networks
  • Dirac deltas
  • discontinuous functions
  • edge computing applications
  • edge platforms
  • embedded platforms
  • energy consumption
  • Gaussian process
  • gradient propagation
  • hardtanh function
  • inference speeds
  • integer arithmetic
  • kernel gradient formulation
  • memory-intensive
  • network architectures
  • neural tangent kernel
  • parameter quantization
  • population loss
  • processing power
  • quantitative estimation
  • quantitative simulations
  • quantized neural networks
  • resource-constrained environments
  • small sensors
  • storage demands
  • straight-through estimator
  • training dynamics
  • wearable technology
Defence session start date
12/07/2024
Availability
Full
Abstract
The rapid adoption of deep neural networks (NNs) has brought significant computational challenges, especially regarding storage demands, energy consumption, and processing power. These challenges are intensified when complex models must be deployed in environments with limited computational resources. Current deep neural network models are both memory-intensive and computationally costly: the ImageNet network described by Krizhevsky et al. has about 650,000 neurons and 60 million floating-point parameters, and the VGG16 model of Simonyan and Zisserman has over 130 million floating-point parameters. Deploying such large models on small devices with constrained resources is difficult.

One solution to this problem is parameter quantization: building neural networks that operate at lower precision by quantizing the parameter space (quantized neural networks, QNNs). This approach takes advantage of the optimized support for integer arithmetic available on embedded and edge platforms. An extreme instance of parameter quantization is the Binary Neural Network (BNN), in which parameters are constrained to the set {-1, 1}. BNNs need far less memory per parameter and offer faster inference, which makes them suitable for devices with limited resources, such as wearable technology and small sensors.
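
As a minimal illustration (the exact convention adopted in the thesis may differ), weight binarization is commonly defined through the sign function,

  Q_b(w) = \mathrm{sign}(w) = \begin{cases} +1, & w \ge 0, \\ -1, & w < 0, \end{cases}

so that each stored parameter occupies a single bit and dot products reduce to additions and subtractions (or XNOR/popcount operations in hardware).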

Given the increasing trend towards practical and lightweight networks, more researchers are focusing on BNNs. Their significance is highlighted by the 2021 workshop on binary networks for computer vision held at the Conference on Computer Vision and Pattern Recognition (CVPR), and BNNs have become a prominent research area within the AI community. However, training quantized neural networks poses a critical problem: how to propagate gradients through the discontinuous functions that model the quantized operations.

The distributional derivative of a piecewise constant function is a linear combination of Dirac deltas, while its classical derivative is zero at every continuity point. Consequently, training a QNN with standard backpropagation is not feasible. The conventional remedy is to apply the Straight-Through Estimator (STE) to each of the QNN's discontinuous functions: two distinct functions are used in the forward and backward phases of a learning iteration, with the backward function chosen to be differentiable. The choice of the replacement function is not unique. Previous studies have indicated that the population loss decreases along the surrogate gradient computed via the STE replacements, so selecting appropriate backward functions is necessary to ensure convergence to a local minimum of the loss landscape.
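
As a concrete sketch of the STE in its standard form (not necessarily the exact formulation used in the thesis): in the forward pass a pre-activation x is binarized as y = \mathrm{sign}(x), while in the backward pass the missing derivative of the sign is replaced by the derivative of a surrogate such as \mathrm{hardtanh}(x) = \max(-1, \min(1, x)), giving

  \frac{\partial \ell}{\partial x} \approx \frac{\partial \ell}{\partial y} \, \mathbf{1}_{\{|x| \le 1\}}.

Different surrogates (clipped identity, scaled tanh, polynomial approximations) produce different surrogate gradients, which is why the choice of backward function affects convergence.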

This thesis addresses the problem of binary neural networks. Chapter 1 introduces a mathematical description of both classical and quantized neural networks. It also proves that, in the infinite-width limit, a neural network with random weights converges to a Gaussian process, and it provides a quantitative estimate based on the work of Basteri et al.
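
To fix ideas, a standard statement of this kind (the normalizations and depth used in the thesis may differ) concerns a one-hidden-layer network

  f(x) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} b_i \, \sigma(\langle a_i, x \rangle), \qquad a_i \sim \mathcal{N}(0, I_d), \; b_i \sim \mathcal{N}(0, 1) \text{ i.i.d.},

whose output converges, as the width n tends to infinity, to a centred Gaussian process with covariance K(x, x') = \mathbb{E}_{a}\left[\sigma(\langle a, x \rangle)\, \sigma(\langle a, x' \rangle)\right]. Quantitative versions of the result bound the distance (for instance in a Wasserstein metric) between the law of the finite-width output and this Gaussian limit.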

Chapter 2 turns to the kernel gradient formulation of neural network training proposed by Jacot et al. It describes the infinite-width limit of the network via the neural tangent kernel for classical neural networks initialized with Gaussian weights. It then formulates the gradient kernel for binarized networks, where the binarization step is approximated by the hardtanh function in both the forward and backward passes. This method generalizes Bengio's STE approach and leads to an asymmetric Neural Tangent Kernel (NTK) operator. The chapter also examines the wide-network limit of the NTK at initialization and during training.
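
One way to see where the asymmetry can come from (a sketch under these assumptions, not necessarily the thesis's precise construction): for a parametric model f(x; \theta), the classical NTK is the symmetric kernel

  \Theta(x, x') = \langle \nabla_\theta f(x; \theta), \nabla_\theta f(x'; \theta) \rangle.

If the backward pass uses a surrogate Jacobian \widetilde{\nabla}_\theta f, obtained for example by differentiating a hardtanh approximation instead of the binarization itself, then gradient-flow training makes the outputs evolve according to

  \frac{d}{dt} f(x; \theta_t) = - \sum_{j} \big\langle \nabla_\theta f(x; \theta_t), \widetilde{\nabla}_\theta f(x_j; \theta_t) \big\rangle \, \frac{\partial \ell}{\partial f(x_j; \theta_t)},

so the kernel governing the dynamics pairs two different Jacobians and need not be symmetric or positive semi-definite.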

Chapter 3 presents quantitative simulations of the neural tangent kernel for the BNN architecture, considering networks with 0, 1, or 2 hidden layers as the network width increases.
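
As a hypothetical illustration of this kind of experiment (not code from the thesis; the one-hidden-layer architecture, the hardtanh surrogate, the explicit 1/\sqrt{n} scaling, and the use of JAX are assumptions made only for this sketch), the empirical NTK can be evaluated at a pair of inputs for increasing widths:

import jax
import jax.numpy as jnp

def init_params(key, d, n):
    # Unit-variance Gaussian weights; the 1/sqrt(fan-in) factors are kept explicit in f.
    k1, k2 = jax.random.split(key)
    return {"W": jax.random.normal(k1, (n, d)), "v": jax.random.normal(k2, (n,))}

def f(params, x):
    # One hidden layer; hardtanh (clip) used as a stand-in for the binarization step.
    h = jnp.clip(params["W"] @ x / jnp.sqrt(x.shape[0]), -1.0, 1.0)
    return params["v"] @ h / jnp.sqrt(h.shape[0])

def empirical_ntk(params, x1, x2):
    # Inner product of the parameter gradients taken at the two inputs.
    g1, g2 = jax.grad(f)(params, x1), jax.grad(f)(params, x2)
    return sum(jnp.vdot(a, b) for a, b in
               zip(jax.tree_util.tree_leaves(g1), jax.tree_util.tree_leaves(g2)))

key = jax.random.PRNGKey(0)
x1, x2 = jnp.ones(4), jnp.arange(4.0) / 4.0
for n in (64, 256, 1024, 4096):
    print(n, float(empirical_ntk(init_params(key, 4, n), x1, x2)))

As the width grows, the printed kernel values should stabilize around a deterministic limit, mirroring the concentration behaviour that the simulations in the thesis investigate.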
