ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-05302018-111148


Thesis type
Master's degree thesis (Tesi di laurea magistrale)
Author
MASINO, FEDERICA
URN
etd-05302018-111148
Title
Semi-Automatic Knowledge Augmentation: Methods and Tools
Department
INGEGNERIA DELL'ENERGIA, DEI SISTEMI, DEL TERRITORIO E DELLE COSTRUZIONI (Energy, Systems, Territory and Construction Engineering)
Degree programme
INGEGNERIA GESTIONALE (Management Engineering)
Supervisors
supervisor Prof. Fantoni, Gualtiero
co-supervisor Dr. Chiarello, Filippo
Keywords
  • technical dictionary
  • regular expressions
  • word embeddings
  • text mining
  • keywords
  • knowledge base
Exam session start date
20/06/2018
Availability
Not available for consultation
Release date
20/06/2088
Abstract
Text mining techniques are being adopted in many different fields to address the problem of extracting meaningful information hidden in unstructured data. Hybrid (human-machine) knowledge extraction processes are usually the best solution for companies, both to achieve good results and to ensure the conformity of the output of the knowledge extraction process. However, the state-of-the-art literature on Natural Language Processing (NLP) lacks process management studies. In particular, researchers have not yet studied how best to integrate NLP outputs with human activities. To the best of our knowledge, the present thesis is a first step in this direction.
This work aims to investigate the techniques used to develop Knowledge Bases for Text Mining applications and to develop a semi-automatic procedure for Knowledge Augmentation. After an overview of the state of the art, different knowledge extraction techniques are applied to four case studies:
1. A completely human-based approach;
2. An automatic keyword extraction approach based on TF-IDF (term frequency-inverse document frequency), plus a manual review of the results;
3. Keyword extraction based on part-of-speech (POS) tagging, plus a manual review of the results;
4. A hybrid approach that uses regular expressions and an advanced deep-learning method (word embeddings) to extract keywords from documents; statistical filters are then used to select meaningful words.
The amount of human intervention decreases from the first to the last case study.
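As a rough illustration of the TF-IDF step in case study 2, the sketch below scores each term in a document by its term frequency multiplied by its inverse document frequency, and returns the top-ranked terms per document as candidates for a human reviewer. It is a minimal, hypothetical example and not the procedure used in the thesis: it assumes a naive lowercase alphabetic tokenizer, no stop-word list, and the plain log(N/df) IDF variant; the names tokenize and tfidf_keywords are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase and keep runs of letters a-z.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(docs: list[str], top_k: int = 3) -> list[list[tuple[str, float]]]:
    """Return the top_k TF-IDF terms of each document as candidates for manual review."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # TF = count / document length; IDF = log(N / df). Terms present in
        # every document get IDF 0, so words like "the" drop to the bottom.
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append(ranked[:top_k])
    return results


if __name__ == "__main__":
    docs = [
        "the pump drives the hydraulic valve",
        "the valve controls the flow",
        "the sensor measures the flow",
    ]
    for doc, keywords in zip(docs, tfidf_keywords(docs)):
        print(doc, "->", [term for term, _ in keywords])
```

In a human-in-the-loop setting like the one described above, the returned lists would be the material a reviewer accepts or rejects; a library implementation such as scikit-learn's TfidfVectorizer uses a smoothed IDF by default, so its scores would differ from this plain variant.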