
ETD

Digital archive of theses discussed at the University of Pisa

Thesis etd-01242023-151951


Thesis type
Master's thesis
Author
RAMO, MIRCO
URN
etd-01242023-151951
Title
Represent, Attend and Transform: Frameworks for Online Handwriting Recognition and Language Decoding
Department
INGEGNERIA DELL'INFORMAZIONE (Information Engineering)
Degree programme
ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING
Supervisors
supervisor Prof. Cimino, Mario Giovanni Cosimo Antonio
supervisor Prof. Silvestre, Guénolé C.M.
supervisor Prof. Veale, Tony
Keywords
  • Transformer
  • attention
  • online HTR
  • NLP
  • explainability
Examination session start date
17/02/2023
Availability
Not available for consultation
Release date
17/02/2026
Abstract
In this work, the Transformer is shown to serve as an end-to-end framework for building mathematical expression trees and for solving natural-language recognition tasks, starting from online handwritten gestures corresponding to glyph strokes.
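For a sequence-to-sequence model to "build" an expression tree, the tree has to be emitted as a flat token sequence and rebuilt afterwards. The abstract does not say which serialization the thesis uses; the sketch below assumes prefix (Polish) notation and a toy operator set. `ARITY`, `Node`, `to_prefix` and `from_prefix` are hypothetical names chosen for illustration, not the author's code.

```python
from dataclasses import dataclass, field

# Hypothetical arity table for a toy grammar; any token not listed is a leaf.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "^": 2, "sqrt": 1}

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

def to_prefix(node):
    """Serialize a tree into the flat token sequence a decoder would emit."""
    tokens = [node.label]
    for child in node.children:
        tokens.extend(to_prefix(child))
    return tokens

def from_prefix(tokens):
    """Rebuild a tree from prefix tokens; raise ValueError if ill-formed."""
    def parse(i):
        if i >= len(tokens):
            raise ValueError("truncated sequence")
        node = Node(tokens[i])
        i += 1
        for _ in range(ARITY.get(node.label, 0)):
            child, i = parse(i)
            node.children.append(child)
        return node, i

    root, end = parse(0)
    if end != len(tokens):
        raise ValueError("trailing tokens")
    return root

# x^2 + sqrt(y)
tree = Node("+", [Node("^", [Node("x"), Node("2")]), Node("sqrt", [Node("y")])])
seq = to_prefix(tree)
print(seq)  # ['+', '^', 'x', '2', 'sqrt', 'y']
assert from_prefix(seq) == tree
```

Prefix notation needs no brackets, and a sequence is well-formed exactly when every operator receives its arity's worth of operands, which is the kind of syntactic constraint the decoder has to learn to respect.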
In particular, attention was successfully used to learn and enforce the underlying syntax of the target grammars, independently of the specific natural language or mathematical formalism.
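As a reference point for the attention mechanism mentioned above, here is the standard scaled dot-product attention of the original Transformer, softmax(Q Kᵀ / √d_k) V, written in plain Python for readability. It is a minimal sketch, not the thesis implementation: learned Q/K/V projections and multiple heads are omitted, and the `causal` flag illustrates how an autoregressive decoder conditions each output token only on tokens already emitted.

```python
import math

def softmax(xs):
    """Numerically stable softmax; -inf scores get exactly zero weight."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V, causal=False):
    """Scaled dot-product attention over lists of row vectors.

    With causal=True, query i may only attend to keys 0..i.
    """
    d_k = len(K[0])
    out = []
    for i, q in enumerate(Q):
        scores = []
        for j, k in enumerate(K):
            if causal and j > i:
                scores.append(float("-inf"))  # masked: future position
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k))
        weights = softmax(scores)
        # Each output row is a convex combination of the value rows.
        out.append([sum(w * v[c] for w, v in zip(weights, V)) for c in range(len(V[0]))])
    return out

Q = K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = attention(Q, K, V, causal=True)
print(out[0])  # [1.0, 2.0]: the first token can only see itself
```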
The encoder is fed tokens of spatio-temporal data that form an infinitely large input vocabulary, a technique that could find application beyond recognition.
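Because each input token is a real-valued pen sample rather than a symbol from a finite list, it cannot be mapped through an embedding lookup table. A common way to handle this, assumed here since the abstract gives no details, is to project every sample with a learned linear map and add a positional encoding. In the sketch below the sample layout `(x, y, t, pen_down)` and the helper names are hypothetical, and the "learned" weights are just a seeded random initialisation.

```python
import math
import random

def make_projection(in_dim, d_model, seed=0):
    """Hypothetical stand-in for learned weights: a seeded random init."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(d_model)]
    b = [0.0] * d_model
    return W, b

def sinusoidal_pe(pos, d_model):
    """Standard Transformer sinusoidal positional encoding for position `pos`."""
    return [math.sin(pos / 10000 ** (i / d_model)) if i % 2 == 0
            else math.cos(pos / 10000 ** ((i - 1) / d_model))
            for i in range(d_model)]

def embed_strokes(points, W, b):
    """Turn each (x, y, t, pen_down) sample into a d_model vector.

    Each sample is projected with e = W p + b instead of being looked up in
    a finite embedding table, so any real-valued sample is a valid token.
    """
    d_model = len(b)
    out = []
    for pos, p in enumerate(points):
        e = [sum(w * x for w, x in zip(row, p)) + bias for row, bias in zip(W, b)]
        out.append([ei + pi for ei, pi in zip(e, sinusoidal_pe(pos, d_model))])
    return out

W, b = make_projection(in_dim=4, d_model=8)
stroke = [(0.10, 0.20, 0.00, 1.0), (0.12, 0.21, 0.01, 1.0), (0.15, 0.25, 0.02, 0.0)]
tokens = embed_strokes(stroke, W, b)
print(len(tokens), len(tokens[0]))  # 3 8
```

The design choice is that nearby samples map to nearby vectors, so the encoder never needs to quantize coordinates into a fixed vocabulary.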
New strategies are investigated to optimize the architecture for tasks involving spatio-temporal dependencies, focusing on pre-encoding layers and different input representations. Furthermore, the transfer-learning capabilities of the encoder are analyzed to assess its segmentation and recognition power, even when it is transferred to new domains without fine-tuning or adaptation. The ability to generate representations suitable for several target syntaxes and semantics is also evaluated. Finally, ablation studies test robustness against incorrect inputs, and an explainability analysis is carried out that includes novel techniques introduced here.
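The abstract does not list which input representations were compared. As one illustrative possibility (an assumption, not the thesis's documented choice), the sketch below contrasts absolute pen coordinates with offsets from the previous sample, plus a bounding-box normalization; `to_relative` and `normalize_box` are hypothetical helpers.

```python
def to_relative(points):
    """Replace absolute (x, y, t, pen_down) samples with offsets from the
    previous sample, (dx, dy, dt, pen_down); the first sample gets zero offsets."""
    out = []
    prev = None
    for x, y, t, pen in points:
        if prev is None:
            out.append((0, 0, 0, pen))
        else:
            px, py, pt = prev
            out.append((x - px, y - py, t - pt, pen))
        prev = (x, y, t)
    return out

def normalize_box(points):
    """Shift to the bounding-box origin and divide x and y by the longer box
    side, so the drawing fits in [0, 1] with its aspect ratio preserved."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    scale = max(max(xs) - min_x, max(ys) - min_y) or 1.0  # avoid /0 for a single dot
    return [((x - min_x) / scale, (y - min_y) / scale, t, pen) for x, y, t, pen in points]

stroke = [(0, 0, 0, 1), (3, 4, 1, 1), (5, 4, 2, 0)]
print(to_relative(stroke))    # [(0, 0, 0, 1), (3, 4, 1, 1), (2, 0, 1, 0)]
print(normalize_box(stroke))  # [(0.0, 0.0, 0, 1), (0.6, 0.8, 1, 1), (1.0, 0.8, 2, 0)]
```

Relative offsets make the input invariant to where on the writing surface a glyph is drawn, while bounding-box normalization removes differences in overall size; which of these helps depends on the task, which is presumably why such representations are compared.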
Our model is suitable for real-time edge inference, achieving 94% accuracy on expression tree building and 97% on online handwritten text recognition with only 2M parameters.