Thesis etd-08302021-121926
Thesis type
Master's thesis
Author
MINUTELLA, FILIPPO
URN
etd-08302021-121926
Title
Design and implementation of a deep learning system for knowledge graph analysis
Department
INFORMATION ENGINEERING
Degree programme
ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING
Supervisors
supervisor Prof. Falchi, Fabrizio
supervisor Dr. Manghi, Paolo
supervisor Dr. De Bonis, Michele
supervisor Dr. Messina, Nicola
Keywords
- deep learning
- graph
- graph machine learning
- graph neural network
- knowledge graph
- machine learning
- open science
- scholarly communication
Exam session start date
24/09/2021
Availability
Full
Abstract
Much of today's data takes the form of Knowledge Graphs, i.e. sets of nodes and of relationships between them. To store naturally graph-like data in conventional databases and analyze it with simple, familiar processes, many companies convert it into tabular form, discarding the relationships or not exploiting them to their full potential.
This conversion simplifies the data, but at the cost of a loss of information that cannot always be ignored.
After a review of techniques for performing various tasks on graph data, some of them were applied to the analysis of the data provided by OpenAIRE.
OpenAIRE is a platform supporting Open Science in Europe. It provides the Research Graph, a graph of scientific resources linked to their authors, to the venues where they were published, and to their keywords.
To analyze the Research Graph, a metapath approach was used: the heterogeneous graph is transformed into a series of homogeneous graphs.
Such graphs are simpler to analyze, and they allow the analysis to focus on a single type of element of the graph.
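As an illustration of how a metapath turns a heterogeneous graph into a homogeneous one, the following minimal sketch projects paper-author links onto a paper-paper graph along the metapath Paper-Author-Paper (PAP): two papers become connected when they share an author, and the edge weight counts the shared authors. The function name `metapath_projection` and the `(paper_id, author_id)` edge-list schema are hypothetical assumptions, not the schema of the OpenAIRE Research Graph or the thesis's implementation.

```python
from collections import defaultdict
from itertools import combinations


def metapath_projection(edges):
    """Project paper-author links onto a homogeneous paper-paper graph
    following the metapath Paper-Author-Paper (PAP).

    edges: iterable of (paper_id, author_id) pairs (hypothetical schema).
    Returns a dict mapping each unordered paper pair (p, q), with p < q,
    to the number of PAP instances linking them (i.e. shared authors).
    """
    # Group papers by author; a set ignores duplicate links.
    papers_by_author = defaultdict(set)
    for paper, author in edges:
        papers_by_author[author].add(paper)

    weights = defaultdict(int)
    for papers in papers_by_author.values():
        # Every pair of papers sharing this author is one PAP instance.
        for p, q in combinations(sorted(papers), 2):
            weights[(p, q)] += 1
    return dict(weights)


edges = [("p1", "a1"), ("p2", "a1"), ("p2", "a2"), ("p3", "a2"), ("p1", "a2")]
print(metapath_projection(edges))
```

Other metapaths (e.g. Paper-Keyword-Paper or Paper-Venue-Paper) would be handled the same way by swapping the intermediate node type, yielding one homogeneous graph per metapath.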
A framework was developed to analyze the Research Graph and to highlight the anomalies in the dataset.
The framework combines the metapath approach with a neural network to perform Node Classification and Node Embedding, and its results were compared with those of Graph Neural Network methods from the literature.
The outcome of our work is a method that leverages node attributes and graph metapaths to perform Node Classification or Node Embedding by identifying the most significant information.
The resulting framework is scalable, easy to understand, and fast. Moreover, it performs better than other unsupervised methods available in the literature.
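One common way to let an ordinary neural network exploit both node attributes and several metapath graphs is to precompute, for each node, its own attribute vector concatenated with the mean attributes of its neighbors in each metapath graph (similar in spirit to precomputed-propagation models such as SIGN). The sketch below shows only that feature-building step. It is an illustrative assumption, not necessarily the architecture used in the thesis, and the function name `metapath_features` and its input formats are hypothetical.

```python
def metapath_features(features, metapath_graphs):
    """Build per-node input vectors from attributes and metapath graphs.

    features: dict node -> list of floats (all the same length).
    metapath_graphs: list of homogeneous graphs, each a dict
        node -> set of neighbor nodes (e.g. one graph per metapath).
    Returns dict node -> [own attributes, mean over graph 1, mean over graph 2, ...].
    """
    if not features:
        return {}
    dim = len(next(iter(features.values())))
    out = {}
    for node, x in features.items():
        vec = list(x)
        for graph in metapath_graphs:
            neighbors = graph.get(node, set())
            if neighbors:
                # Mean of the neighbors' attributes along this metapath.
                mean = [sum(features[n][i] for n in neighbors) / len(neighbors)
                        for i in range(dim)]
            else:
                # Nodes with no metapath neighbors get a zero block.
                mean = [0.0] * dim
            vec.extend(mean)
        out[node] = vec
    return out


features = {"a": [1.0, 0.0], "b": [0.0, 2.0], "c": [4.0, 4.0]}
pap = {"a": {"b", "c"}, "b": {"a"}}
print(metapath_features(features, [pap]))
```

The resulting fixed-length vectors could then be fed to any standard classifier (e.g. a small multilayer perceptron); because the aggregation is computed once in advance, training does not need to traverse the graph.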
File
File name | Size |
---|---|
Master_Thesis.pdf | 2.17 Mb |