
ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-06102024-220853


Thesis type
Master's degree thesis
Author
LA MANNA, DAVIDE
URN
etd-06102024-220853
Title
Neural Tangent kernel for approximate binarized neural network architecture
Department
MATHEMATICS
Degree programme
MATHEMATICS
Supervisors
Supervisor: Prof. Trevisan, Dario
Keywords
  • asymmetric NTK operator
  • binary neural networks
  • complex models
  • computational challenges
  • computationally costly
  • deep neural networks
  • Dirac deltas
  • discontinuous functions
  • edge computing applications
  • edge platforms
  • embedded platforms
  • energy consumption
  • Gaussian process
  • gradient propagation
  • hardtanh function
  • inference speeds
  • integer arithmetic
  • kernel gradient formulation
  • memory-intensive
  • network architectures
  • neural tangent kernel
  • parameter quantization
  • population loss
  • processing power
  • quantitative estimation
  • quantitative simulations
  • quantized neural networks
  • resource-constrained environments
  • small sensors
  • storage demands
  • straight-through estimator
  • training dynamics
  • wearable technology
Defence session start date
12/07/2024
Availability
Full
Abstract
The rapid adoption of deep neural networks (NNs) has brought significant computational challenges, especially regarding storage demands, energy consumption, and processing power. These challenges are intensified when complex models must be deployed in environments with limited computational resources. Current deep neural network models are both memory-intensive and computationally costly: the ImageNet network described by Krizhevsky et al. has about 650,000 neurons and 60 million floating-point parameters, and the VGG16 model of Simonyan and Zisserman has over 130 million floating-point parameters. Deploying such large models on small devices with constrained resources is difficult.

One solution to this problem is parameter quantization: building neural networks that operate at lower precision by quantizing the parameter space (quantized neural networks, QNNs). This approach takes advantage of the optimized support for integer arithmetic available on embedded and edge platforms. An extreme instance of parameter quantization is the Binary Neural Network (BNN), in which parameters are constrained to the set {-1, 1}. BNNs need far less memory per parameter and offer faster inference, which makes them suitable for devices with limited resources, such as wearable technology and small sensors.
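
As a minimal illustration (the exact convention adopted in the thesis may differ), weight binarization is commonly defined through the sign function,

  Q_b(w) = \mathrm{sign}(w) = \begin{cases} +1, & w \ge 0, \\ -1, & w < 0, \end{cases}

so that each stored parameter occupies a single bit and dot products reduce to additions and subtractions (or XNOR/popcount operations in hardware).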

Given the increasing trend towards practical and lightweight networks, more researchers are focusing on BNNs. Their significance is highlighted by the 2021 workshop on binary networks for computer vision held at the Conference on Computer Vision and Pattern Recognition (CVPR), and BNNs have become a prominent research area within the AI community. However, training quantized neural networks poses a critical problem: how to propagate gradients through the discontinuous functions that model the quantized operations.

The distributional derivative of a piecewise constant function is a linear combination of Dirac deltas, while its classical derivative is zero at every continuity point. Consequently, training a QNN with standard backpropagation is not feasible. The conventional remedy is to apply the Straight-Through Estimator (STE) to each of the QNN's discontinuous functions: two distinct functions are used in the forward and backward phases of a learning iteration, with the backward function chosen to be differentiable. The choice of the replacement function is not unique. Previous studies have indicated that the population loss decreases along the surrogate gradient computed via the STE replacements, so selecting appropriate backward functions is necessary to ensure convergence to a local minimum of the loss landscape.
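
As a concrete sketch of the STE in its standard form (not necessarily the exact formulation used in the thesis): in the forward pass a pre-activation x is binarized as y = \mathrm{sign}(x), while in the backward pass the missing derivative of the sign is replaced by the derivative of a surrogate such as \mathrm{hardtanh}(x) = \max(-1, \min(1, x)), giving

  \frac{\partial \ell}{\partial x} \approx \frac{\partial \ell}{\partial y} \, \mathbf{1}_{\{|x| \le 1\}}.

Different surrogates (clipped identity, scaled tanh, polynomial approximations) produce different surrogate gradients, which is why the choice of backward function affects convergence.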

This thesis addresses the problem of binary neural networks. Chapter 1 introduces a mathematical description of both classical and quantized neural networks. It also proves that, in the infinite-width limit, a neural network with random weights converges to a Gaussian process, and it provides a quantitative estimate based on the work of Basteri et al.
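
To fix ideas, a standard statement of this kind (the normalizations and depth used in the thesis may differ) concerns a one-hidden-layer network

  f(x) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} b_i \, \sigma(\langle a_i, x \rangle), \qquad a_i \sim \mathcal{N}(0, I_d), \; b_i \sim \mathcal{N}(0, 1) \text{ i.i.d.},

whose output converges, as the width n tends to infinity, to a centred Gaussian process with covariance K(x, x') = \mathbb{E}_{a}\left[\sigma(\langle a, x \rangle)\, \sigma(\langle a, x' \rangle)\right]. Quantitative versions of the result bound the distance (for instance in a Wasserstein metric) between the law of the finite-width output and this Gaussian limit.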

Chapter 2 turns to the kernel gradient formulation of neural network training proposed by Jacot et al. It describes the infinite-width limit of the network via the neural tangent kernel for classical neural networks initialized with Gaussian weights. It then formulates the gradient kernel for binarized networks, where the binarization step is approximated by the hardtanh function in both the forward and backward passes. This method generalizes Bengio's STE approach and leads to an asymmetric Neural Tangent Kernel (NTK) operator. The chapter also examines the wide-network limit of the NTK at initialization and during training.
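
One way to see where the asymmetry can come from (a sketch under these assumptions, not necessarily the thesis's precise construction): for a parametric model f(x; \theta), the classical NTK is the symmetric kernel

  \Theta(x, x') = \langle \nabla_\theta f(x; \theta), \nabla_\theta f(x'; \theta) \rangle.

If the backward pass uses a surrogate Jacobian \widetilde{\nabla}_\theta f, obtained for example by differentiating a hardtanh approximation instead of the binarization itself, then gradient-flow training makes the outputs evolve according to

  \frac{d}{dt} f(x; \theta_t) = - \sum_{j} \big\langle \nabla_\theta f(x; \theta_t), \widetilde{\nabla}_\theta f(x_j; \theta_t) \big\rangle \, \frac{\partial \ell}{\partial f(x_j; \theta_t)},

so the kernel governing the dynamics pairs two different Jacobians and need not be symmetric or positive semi-definite.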

Chapter 3 presents quantitative simulations of the neural tangent kernel for the BNN architecture, considering networks with 0, 1, or 2 hidden layers as the network width increases.
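
As a hypothetical illustration of this kind of experiment (not code from the thesis; the one-hidden-layer architecture, the hardtanh surrogate, the explicit 1/\sqrt{n} scaling, and the use of JAX are assumptions made only for this sketch), the empirical NTK can be evaluated at a pair of inputs for increasing widths:

import jax
import jax.numpy as jnp

def init_params(key, d, n):
    # Unit-variance Gaussian weights; the 1/sqrt(fan-in) factors are kept explicit in f.
    k1, k2 = jax.random.split(key)
    return {"W": jax.random.normal(k1, (n, d)), "v": jax.random.normal(k2, (n,))}

def f(params, x):
    # One hidden layer; hardtanh (clip) used as a stand-in for the binarization step.
    h = jnp.clip(params["W"] @ x / jnp.sqrt(x.shape[0]), -1.0, 1.0)
    return params["v"] @ h / jnp.sqrt(h.shape[0])

def empirical_ntk(params, x1, x2):
    # Inner product of the parameter gradients taken at the two inputs.
    g1, g2 = jax.grad(f)(params, x1), jax.grad(f)(params, x2)
    return sum(jnp.vdot(a, b) for a, b in
               zip(jax.tree_util.tree_leaves(g1), jax.tree_util.tree_leaves(g2)))

key = jax.random.PRNGKey(0)
x1, x2 = jnp.ones(4), jnp.arange(4.0) / 4.0
for n in (64, 256, 1024, 4096):
    print(n, float(empirical_ntk(init_params(key, 4, n), x1, x2)))

As the width grows, the printed kernel values should stabilize around a deterministic limit, mirroring the concentration behaviour that the simulations in the thesis investigate.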
