
ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-01182023-184639


Thesis type
Master's thesis
Author
BUONGIOVANNI, CHIARA
URN
etd-01182023-184639
Title
Natural Language Processing techniques in support of banking compliance. Towards an automatic management of non-compliance risk based on text similarity measures.
Department
PHILOLOGY, LITERATURE AND LINGUISTICS
Degree programme
DIGITAL HUMANITIES
Supervisors
supervisor Prof. Passaro, Lucia C.
co-supervisor Dr. De Mattei, Lorenzo
Keywords
  • text similarity
  • sentence vector representation
  • regulatory technology
  • natural language processing
  • machine learning
  • automated compliance
Examination session start date
02/02/2023
Availability
Not available for consultation
Release date
02/02/2093
Abstract
The increase in supervisory scrutiny following the 2008 financial crisis has significantly raised the complexity and cost of banking compliance functions. In response, financial institutions have been investing in RegTech: technological solutions that leverage Artificial Intelligence to optimize compliance with rules, regulations, laws, and reporting requirements. Natural Language Processing has been particularly important in this regard, as it provides the tools to make regulatory text, which is typically unstructured, machine-readable, actionable, and interpretable, so that various types of relevant information can be extracted from it.
This thesis presents an initial attempt to automate the monitoring of non-compliance risk using text similarity measures. The aim is to identify, within a collection of corporate regulations, all documents affected by the hypothetical promulgation and publication of a set of laws, normative documents, and regulations. Two experiments were conducted: the first used a discrete vector representation of the texts with an unsupervised approach; the second used a continuous, contextualized representation with a supervised approach, relying on a labeled dataset created ad hoc. Finally, the two models were tested on the same real-world use case: achieving GDPR compliance.
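The abstract does not specify which discrete representation or similarity measure the unsupervised experiment used. As a minimal sketch, assuming a TF-IDF bag-of-words representation compared with cosine similarity (both assumptions, not confirmed by the abstract), flagging internal regulations affected by a new law could look like this. The function names `tfidf_vectors`, `cosine`, and `flag_affected`, the `threshold` of 0.2, and the example texts are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical sketch: TF-IDF + cosine similarity is an assumed stand-in for the
# thesis's unspecified "discrete vector representation".

def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphabetic tokens only (a deliberately simple tokenizer).
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    # Sparse TF-IDF vectors: raw term frequency times smoothed inverse document frequency.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: count * idf[t] for t, count in Counter(toks).items()} for toks in tokenized]

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def flag_affected(new_regulation: str, internal_docs: dict[str, str],
                  threshold: float = 0.2) -> list[tuple[str, float]]:
    # Fit IDF on the new regulation plus the internal documents, then return the
    # internal documents whose similarity reaches the (hypothetical) threshold,
    # ranked from most to least similar.
    names = list(internal_docs)
    vectors = tfidf_vectors([new_regulation] + [internal_docs[n] for n in names])
    query, doc_vecs = vectors[0], vectors[1:]
    scored = [(name, cosine(query, vec)) for name, vec in zip(names, doc_vecs)]
    return sorted((s for s in scored if s[1] >= threshold),
                  key=lambda pair: pair[1], reverse=True)

# Hypothetical usage with invented texts.
policies = {
    "privacy_policy": "Personal data of customers is processed with consent and stored securely.",
    "credit_policy": "Loan applications are assessed on income and credit history.",
}
law = "Controllers must obtain consent before processing personal data and store it securely."
print(flag_affected(law, policies))
```

Fitting IDF on the new regulation together with the internal documents down-weights terms shared by every document (such as "and") relative to rarer, more informative terms. The supervised experiment described above would instead rely on continuous, contextualized sentence vectors, which this sketch does not attempt to reproduce.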