ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-06162022-155445


Thesis type
Master's thesis
Author
VARAGNOLO, DAVIDE
URN
etd-06162022-155445
Title
EPISA Project: Semantic Migration from DigitArq to CIDOC CRM
Department
COMPUTER SCIENCE
Degree programme
COMPUTER SCIENCE
Supervisors
supervisor Prof. Bartalesi Lenzi, Valentina
Keywords
  • semantic web
  • CIDOC-CRM
  • knowledge representation
  • natural language processing
Defence session start date
22/07/2022
Availability
Full
Abstract
This thesis presents a strategy for the semantic migration of the Portuguese National Archives' record descriptions from the ISAD(G) standard to the CIDOC-CRM standard, together with a strategy for extracting valuable information from these records. Both research activities were carried out within the EPISA project, part of the ongoing renewal of the existing data infrastructure of the Direção-Geral do Livro, dos Arquivos e das Bibliotecas (DGLAB). The semantic migration is performed using the Migration Mapping Rules, a set of rules that semantically translate the archives' descriptive information into a CIDOC-CRM representation. These rules are implemented with the OWL API, a Java library for creating and manipulating ontologies. Information extraction is performed by applying Natural Language Processing (NLP) techniques, such as Named Entity Recognition (NER), together with a set of pattern-matching rules implemented in JAPE. The process is managed by GATE, a framework and graphical development environment for NLP tools. The analysis varies with the type of record, which is determined at the beginning of the process by a multiclass classifier implemented as a Decision Tree. The resulting Knowledge Base can be explored with the Query Ontology Interface, a Spring application that allows domain experts from DGLAB to evaluate the results of the migration and extraction processes.