ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-06192024-193200


Thesis type
Master's thesis
Author
FRATICELLI, ANDREA
URN
etd-06192024-193200
Title
Web semantico e Linked Data per l'integrazione di conoscenza relativa a manoscritti medievali e rinascimentali [Semantic Web and Linked Data for integrating knowledge about medieval and Renaissance manuscripts]
Department
PHILOLOGY, LITERATURE AND LINGUISTICS
Degree programme
DIGITAL HUMANITIES (INFORMATICA UMANISTICA)
Supervisors
supervisor Prof. Bartalesi Lenzi, Valentina
Keywords
  • Linked Data
  • Manuscript
  • Semantic Web
Defence session start date
05/07/2024
Availability
Not available for consultation
Release date
05/07/2094
Abstract
This thesis is concerned with the scientific domain of formal knowledge representation through the use of Semantic Web technologies and the Linked Data paradigm. It highlights the key role of Semantic Web languages and Linked Data in integrating data from diverse sources. Special attention is given to the application of semantic technologies in Digital Humanities projects, which resort to a variety of heterogeneous digital approaches and tools due to their multidisciplinary nature. This has resulted in several vocabularies and ontologies emerging as standard references and being increasingly adopted to ensure data interoperability, thereby facilitating the sharing, integration, and reuse of knowledge from various sources. In this context, the aim of the study is to define a methodology that allows for the integration of semantically structured knowledge related to medieval and Renaissance manuscripts stored in two distinct knowledge bases, both of which utilize Semantic Web technologies and conform to the Linked Data paradigm.
The knowledge bases selected as case studies are those created within the Mapping Manuscript Migrations (MMM) and the Index Medii Aevi Geographiae Operum (IMAGO) projects: the former is an international project that compiles data on a broad and diverse collection of manuscripts, while the latter confines its scope specifically to Latin manuscripts transmitting geographical and topographical literary works. For their respective ontologies, both projects employ the CIDOC CRM ISO standard, a widely used conceptual model for describing and representing information related to cultural heritage in a standardized manner. Moreover, MMM adheres to the Linked Data paradigm by aggregating publicly available data on medieval and Renaissance manuscripts from multiple sources (namely the databases of the Schoenberg Institute for Manuscript Studies, the Bodleian Libraries, and the Institut de recherche et d'histoire des textes), processing it according to its unified data model, and in turn, making it available for reuse as Linked Open Data. Following a similar pattern, IMAGO draws on resources from various external datasets to populate its ontology, including Wikidata, the MIRABILE digital archive, the Nuovo Soggettario thesaurus, and the Pleiades gazetteer. This approach has enabled the creation of a shared and interconnected information space that allows users to browse knowledge collected across different repositories.
To further expand IMAGO's linked information space, this study explored the potential for linking IMAGO's knowledge base to that of MMM, leveraging the semantic technologies on which both projects are built. This was achieved by identifying manuscripts present in both knowledge bases and then integrating the relevant information provided by MMM into the IMAGO knowledge base, which significantly enriched the latter's descriptions of the matched manuscripts. The thesis outlines the methodology used to link the two knowledge bases by mapping corresponding manuscripts: retrieving the relevant data through SPARQL queries, defining an algorithm to compare the manuscript metadata (implemented in a Python program), and finally creating a web page to retrieve and visualize the newly integrated knowledge.
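The abstract does not describe the comparison algorithm in detail, so the following is only a minimal Python sketch of how manuscript records retrieved from two knowledge bases might be matched on their metadata. It is not the thesis's actual implementation. The field names (`uri`, `city`, `library`, `shelfmark`), the normalization rules, and the sample records are all hypothetical and invented for illustration; in practice the records would come from the results of SPARQL queries against each knowledge base.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9]+", " ", no_accents.lower())
    return cleaned.strip()


def match_key(record: dict) -> tuple:
    """Build a comparison key from the (hypothetical) metadata fields."""
    return (
        normalize(record["city"]),
        normalize(record["library"]),
        normalize(record["shelfmark"]),
    )


def match_manuscripts(imago_records: list, mmm_records: list) -> list:
    """Return (imago_uri, mmm_uri) pairs whose normalized keys are equal."""
    # Index one side by key so each record on the other side is a dict lookup.
    mmm_index = {}
    for rec in mmm_records:
        mmm_index.setdefault(match_key(rec), []).append(rec["uri"])

    pairs = []
    for rec in imago_records:
        for mmm_uri in mmm_index.get(match_key(rec), []):
            pairs.append((rec["uri"], mmm_uri))
    return pairs


# Invented sample records; the URIs, libraries and shelfmarks are placeholders.
imago_sample = [
    {"uri": "urn:example:imago:1", "city": "Oxford",
     "library": "Example Library", "shelfmark": "MS. Example 12"},
    {"uri": "urn:example:imago:2", "city": "Paris",
     "library": "Bibliothèque Exemple", "shelfmark": "lat. 999"},
]
mmm_sample = [
    {"uri": "urn:example:mmm:a", "city": "OXFORD",
     "library": "Example library", "shelfmark": "MS Example 12"},
    {"uri": "urn:example:mmm:b", "city": "Paris",
     "library": "Bibliotheque Exemple", "shelfmark": "Lat. 1000"},
]

print(match_manuscripts(imago_sample, mmm_sample))
```

Exact matching on normalized keys is the simplest possible design choice: it tolerates differences in case, accents and punctuation, but not variant names for the same library or differently structured shelfmarks, which a real comparison would likely need to handle.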
File