Thesis etd-01122017-160211

Thesis type

PhD thesis

URN

etd-01122017-160211

Title

NLP based Information Extraction methods for Patent Analysis

Scientific disciplinary sector

ING-INF/05 - Information Processing Systems

Degree programme

Information Engineering

Supervisors

tutor Prof. Marcelloni, Francesco
tutor Prof. Dell'Orletta, Felice

Keywords

Information Extraction
Marketing
Natural Language Processing
Patent analysis

Exam session start date

21/01/2017

Availability

Full

Abstract (English)

This thesis focuses on the analysis of patents through NLP-based extraction systems. State-of-the-art systems for automatic patent analysis are designed for engineers and attorneys, and they usually overlook the growing variety of patent readers interested in this topic, such as marketers and designers. This new audience is interested in automatic patent analysis because patents contain relevant information that anticipates the availability of products on the market. Managing such information can help these readers identify new market trends and define successful strategies.

The main novelty of this work is that the entire information extraction pipeline has been designed to extract information relevant to this new audience. The work focuses on extracting the users who may benefit from an invention, the advantages an invention brings, and the drawbacks it solves.

The extraction problem is addressed by adapting existing tools originally designed to extract information from general-purpose texts.

The adaptation process introduces important novelties. First, a semi-automatic method is illustrated for developing a domain-specific training set for extracting the relevant entities, which minimizes the human annotation effort.

Second, several learning algorithms and feature configurations were tested to improve the overall accuracy of the information extraction process.

Finally, a method has been tested that combines the information extracted from patents with the analysis of social media text, specifically conceived to extract advantages and drawbacks. This method relies on sentiment analysis of text extracted from social media, under the assumption that terms indicating advantages should generally be perceived positively by people, and terms indicating drawbacks negatively.
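
The sentiment-based assumption in the last paragraph can be illustrated with a minimal sketch. This is not the implementation used in the thesis: the toy polarity lexicon `POLARITY`, the functions `post_sentiment` and `classify_terms`, the `threshold` value, and the example terms and posts are all hypothetical and serve only to show the idea of labelling a candidate term by the average sentiment of the social media posts that mention it.

```python
import re

# Hypothetical toy polarity lexicon (an assumption for illustration only);
# a real system would rely on a trained sentiment analysis model.
POLARITY = {
    "love": 1.0, "great": 1.0, "useful": 0.5,
    "hate": -1.0, "annoying": -1.0, "broken": -0.5,
}


def post_sentiment(post):
    """Average polarity of the lexicon words found in a post (0.0 if none)."""
    words = re.findall(r"[a-z']+", post.lower())
    scores = [POLARITY[w] for w in words if w in POLARITY]
    return sum(scores) / len(scores) if scores else 0.0


def classify_terms(terms, posts, threshold=0.1):
    """Label each candidate term by the mean sentiment of posts mentioning it."""
    labels = {}
    for term in terms:
        scores = [post_sentiment(p) for p in posts if term.lower() in p.lower()]
        if not scores:
            labels[term] = "unknown"  # term never mentioned in the posts
            continue
        mean = sum(scores) / len(scores)
        if mean > threshold:
            labels[term] = "advantage"
        elif mean < -threshold:
            labels[term] = "drawback"
        else:
            labels[term] = "neutral"
    return labels


# Hypothetical candidate terms (as if extracted from patents) and posts.
terms = ["battery life", "overheating", "waterproof"]
posts = [
    "I love the battery life on this phone",
    "Great battery life, really useful",
    "The overheating is annoying",
    "I hate the overheating, feels broken",
]

labels = classify_terms(terms, posts)
print(labels)
```

In this sketch a term with no matching posts is left as "unknown" rather than forced into either class, which keeps the sentiment assumption from being applied where there is no evidence.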

Files

File name	Size
summary_...ments.pdf	74.45 Kb
tesi.pdf	3.14 Mb

ETD

Digital archive of theses defended at the Università di Pisa