Thesis etd-07042025-104732
Thesis type
Doctoral (PhD) thesis
Author
AURIEMMA, SERENA
URN
etd-07042025-104732
Title
Enhancing Public Administration with Computational Linguistics: a Language Model for Italian Bureaucratic Language
Scientific-disciplinary sector
GLOT-01/A - Glottology and Linguistics
Degree programme
LINGUISTIC DISCIPLINES AND FOREIGN LITERATURES
Supervisors
Tutor: Prof. Lenci, Alessandro
Keywords
- administrative data
- BureauBERTo
- encoder
- fine-tuning
- further pre-training
- Italian bureaucratic language
- language model
- prompting
- public administration
- specialized model
Examination session start date
11 July 2025
Availability
Not available for consultation
Release date
11 July 2028
Abstract
This thesis addresses the automatic analysis of texts written in Italian bureaucratic language by developing resources and identifying computational linguistics and NLP approaches applicable to data from the Italian Public Administration (PA), with the goal of supporting its digital transformation. The research focuses on two main areas of intervention: streamlining the processing of administrative documents and improving the readability of PA texts. Sector-specific languages such as bureaucratic Italian often pose challenges for general-purpose language models, which lack the linguistic knowledge needed to perform domain-specific tasks accurately. To address this issue, the thesis describes the stages that led to the development of BureauBERTo, an encoder-based language model and the first model specialized in the Italian bureaucratic domain. BureauBERTo's performance was tested and compared with that of other models using supervised, unsupervised, and prompt-based learning approaches, demonstrating the effectiveness of specialized models on domain-specific tasks, even when annotated data are limited. The research also showed that, for discriminative tasks, specialized encoders offer an efficient and more sustainable solution than current large language models, while ensuring internal data governance for public institutions and fostering AI applications that are accessible even to smaller public-sector entities.
File
File name | Size |
---|---|
The thesis is not available for consultation. |