Thesis etd-02192019-141256

Thesis type

PhD thesis

Author

PONZA, MARCO

URN

etd-02192019-141256

Title

Algorithms for Knowledge and Information Extraction in Text with Wikipedia

Scientific disciplinary sector

INF/01

Degree programme

COMPUTER SCIENCE

Supervisors

tutor Prof. Ferragina, Paolo

Keywords

Knowledge Extraction
Natural Language Understanding
Information Retrieval
Information Extraction
Natural Language Processing
Wikipedia
Knowledge Graph
Machine Learning

Defense session start date

08/03/2019

Availability

Full

Abstract

This thesis focuses on the design of algorithms for extracting knowledge (in terms of entities belonging to a knowledge graph) and information (in terms of open facts) from text, using Wikipedia as the main repository of world knowledge.

The first part of the dissertation addresses research problems that lie specifically in the domain of knowledge and information extraction. In this context, we make three contributions to the scientific literature. First, we study the problem of computing the relatedness between Wikipedia entities: we introduce a new dataset of human judgements, study all entity relatedness measures proposed in recent literature, and propose a new computationally lightweight two-stage framework for relatedness computation. Second, we study the problem of entity salience through the design and implementation of a new system that aims to identify the salient Wikipedia entities occurring in an input text and that improves on the state of the art across different datasets. Third, we introduce a new research problem called fact salience, which addresses the task of detecting salient open facts extracted from an input text, and we propose, design and implement the first system that solves it effectively.

In the second part of the dissertation, we study an application of knowledge extraction tools to the domain of expert finding. We propose a new system that hinges upon a novel profiling technique, which models people (i.e., experts) through a small, labeled graph drawn from Wikipedia. This profiling technique is then used to design a novel suite of ranking algorithms for matching the user query, whose effectiveness is shown by improvements over state-of-the-art solutions.

Files

File name	Size
dissertation.pdf	7.00 MB
report.pdf	30.14 KB

ETD

Digital archive of theses defended at the Università di Pisa
