Thesis etd-06162022-155445
Thesis type
Master's thesis
Author
VARAGNOLO, DAVIDE
URN
etd-06162022-155445
Thesis title
EPISA Project: Semantic Migration from DigitArq to CIDOC CRM
Department
COMPUTER SCIENCE
Course of study
COMPUTER SCIENCE
Supervisors
supervisor Prof. Bartalesi Lenzi, Valentina
Keywords
- CIDOC-CRM
- knowledge representation
- natural language processing
- semantic web
Graduation session start date
22/07/2022
Availability
Full
Summary
This thesis presents a strategy for the semantic migration of the Portuguese National Archives' record descriptions from the ISAD(G) standard to the CIDOC-CRM standard, together with a strategy for extracting valuable information from these records. Both research activities were carried out within the EPISA project, part of the ongoing renewal of the existing data infrastructure of the Direção-Geral do Livro, dos Arquivos e das Bibliotecas (DGLAB). The semantic migration was performed using the Migration Mapping Rules, a set of rules that semantically translate the archives' descriptive information into a CIDOC-CRM representation. These rules are implemented with the OWL API, a Java library for creating and manipulating ontologies. Information extraction was performed by applying Natural Language Processing (NLP) techniques, such as Named Entity Recognition (NER), together with a set of pattern-matching rules written in JAPE. The process is managed by GATE, a framework and graphical development environment for NLP tools. The analysis differs according to the type of record, which is determined at the start of the process by a multiclass classifier implemented as a Decision Tree. The resulting Knowledge Base can be explored through the Query Ontology Interface, an application developed with Spring that lets domain-expert users at DGLAB evaluate the results of the migration and extraction process.
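The Migration Mapping Rules themselves are not listed in this summary, so the following is only a minimal sketch of what a rule-based ISAD(G)-to-CIDOC-CRM mapping could look like, assuming a record arrives as a map from ISAD(G) element numbers (3.1.1 Reference code(s), 3.1.2 Title, 3.1.3 Date(s)) to text values. The class name `MappingRulesSketch`, the `ex:` namespace, the choice of E22 Human-Made Object as the target class and the property paths are illustrative assumptions, not the thesis's actual rules. Triples are emitted as plain strings; the OWL API is deliberately not used here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of ISAD(G) -> CIDOC-CRM mapping rules. Triples are plain
// strings; the thesis implementation uses the OWL API, which is not used here.
class MappingRulesSketch {

    // Renders one triple in a Turtle-like "subject predicate object" form.
    private static String triple(String s, String p, String o) {
        return s + " " + p + " " + o;
    }

    // Wraps a value as a quoted literal, escaping embedded double quotes.
    private static String literal(String value) {
        return "\"" + value.replace("\"", "\\\"") + "\"";
    }

    // Applies the (assumed) rules to one record, keyed by ISAD(G) element number.
    static List<String> map(String recordId, Map<String, String> isad) {
        List<String> out = new ArrayList<>();
        String rec = "ex:" + recordId;
        out.add(triple(rec, "rdf:type", "crm:E22_Human-Made_Object"));

        // 3.1.1 Reference code(s) -> E42 Identifier via P1 is identified by
        String ref = isad.get("3.1.1");
        if (ref != null) {
            String id = rec + "_identifier";
            out.add(triple(rec, "crm:P1_is_identified_by", id));
            out.add(triple(id, "rdf:type", "crm:E42_Identifier"));
            out.add(triple(id, "rdfs:label", literal(ref)));
        }

        // 3.1.2 Title -> E35 Title via P102 has title
        String title = isad.get("3.1.2");
        if (title != null) {
            String t = rec + "_title";
            out.add(triple(rec, "crm:P102_has_title", t));
            out.add(triple(t, "rdf:type", "crm:E35_Title"));
            out.add(triple(t, "rdfs:label", literal(title)));
        }

        // 3.1.3 Date(s) -> E12 Production (P108i) with an E52 Time-Span (P4)
        String dates = isad.get("3.1.3");
        if (dates != null) {
            String prod = rec + "_production";
            String span = rec + "_timespan";
            out.add(triple(rec, "crm:P108i_was_produced_by", prod));
            out.add(triple(prod, "rdf:type", "crm:E12_Production"));
            out.add(triple(prod, "crm:P4_has_time-span", span));
            out.add(triple(span, "rdf:type", "crm:E52_Time-Span"));
            out.add(triple(span, "crm:P82_at_some_time_within", literal(dates)));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> isad = new LinkedHashMap<>();
        isad.put("3.1.1", "PT/EXAMPLE/0001"); // hypothetical reference code
        isad.put("3.1.2", "Example parish register");
        isad.put("3.1.3", "1750-1760");
        map("rec1", isad).forEach(System.out::println);
    }
}
```

Keeping each rule as an independent block keyed by one ISAD(G) element means that a missing element simply produces no triples, so partially described records still migrate cleanly.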
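The idea of choosing the record type first and then running type-specific analysis can be illustrated with a small sketch. Here a hand-written decision tree with two keyword tests stands in for the trained Decision Tree, and Java regular expressions stand in for the NER and JAPE grammars that GATE would run. The class name `RecordRoutingSketch`, the record types (`BAPTISM`, `MARRIAGE`, `OTHER`), the keywords and the patterns are all hypothetical and are not taken from the thesis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a hand-written decision tree picks the record type, then
// type-specific regex rules (standing in for JAPE grammars run by GATE) annotate the text.
class RecordRoutingSketch {

    enum RecordType { BAPTISM, MARRIAGE, OTHER }

    // Capitalised name tokens; \p{Lu} and \p{Ll} also match accented letters.
    private static final String NAME = "(\\p{Lu}\\p{Ll}+(?: \\p{Lu}\\p{Ll}+)*)";
    private static final Pattern YEAR = Pattern.compile("\\b(1[5-9]\\d{2})\\b");
    private static final Pattern PARENT = Pattern.compile("\\b(?:son|daughter) of " + NAME);
    private static final Pattern SPOUSES = Pattern.compile("\\bbetween " + NAME + " and " + NAME);

    // Decision tree with two internal nodes over keyword features
    // (assumed; a real tree would learn its tests and their order from data).
    static RecordType classify(String text) {
        String t = text.toLowerCase(Locale.ROOT);
        if (t.contains("baptism") || t.contains("batismo")) {    // node 1
            return RecordType.BAPTISM;
        }
        if (t.contains("marriage") || t.contains("casamento")) { // node 2
            return RecordType.MARRIAGE;
        }
        return RecordType.OTHER;
    }

    // Adds one "Label:value" annotation per captured group of every match.
    private static void collect(Pattern p, String text, String label, List<String> out) {
        Matcher m = p.matcher(text);
        while (m.find()) {
            for (int g = 1; g <= m.groupCount(); g++) {
                out.add(label + ":" + m.group(g));
            }
        }
    }

    // Classifies first, then applies the shared rules plus the rules for that type.
    static List<String> extract(String text) {
        List<String> out = new ArrayList<>();
        RecordType type = classify(text);
        out.add("Type:" + type);
        collect(YEAR, text, "Year", out);
        switch (type) {
            case BAPTISM:
                collect(PARENT, text, "Parent", out);
                break;
            case MARRIAGE:
                collect(SPOUSES, text, "Spouse", out);
                break;
            default:
                break; // OTHER: only the shared rules apply
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [Type:MARRIAGE, Year:1762, Spouse:Pedro Lima, Spouse:Maria Costa]
        System.out.println(extract("Marriage between Pedro Lima and Maria Costa, 1762."));
    }
}
```

Routing before extraction keeps each rule set small: a pattern such as "between X and Y" is only trusted as a spouse relation once the record is already believed to be a marriage record.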
File
File name | Size |
---|---|
MasterTh...Final.pdf | 61.00 MB |