ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-01122017-160211


Thesis type
PhD thesis
Author
CIMINO, ANDREA
URN
etd-01122017-160211
Title
NLP based Information Extraction methods for Patent Analysis
Scientific disciplinary sector
ING-INF/05
Degree programme
INFORMATION ENGINEERING
Supervisors
tutor Prof. Marcelloni, Francesco
tutor Prof. Dell'Orletta, Felice
Keywords
  • Natural Language Processing
  • Information Extraction
  • Patent analysis
  • Marketing
Defense session start date
21/01/2017
Availability
Full
Abstract
The focus of this thesis is the analysis of patents through NLP-based extraction systems. State-of-the-art systems for automatic patent analysis are designed for engineers and attorneys, and usually do not take into account the growing variety of patent readers, such as marketers and designers, who are becoming increasingly interested in this topic. This new audience is interested in automatic patent analysis because patents contain information that anticipates the availability of products on the market. Managing this information can help them identify new market trends and define successful strategies.

The main novelty of this work is that the entire information extraction pipeline has been designed to extract information relevant to this new audience. The work focuses on extracting three kinds of entities: the users who may benefit from an invention, the advantages an invention brings, and the drawbacks an innovation solves.

The extraction problem is addressed by adapting existing tools originally designed to extract information from general-purpose texts.

The adaptation process introduces important novelties. First, a semi-automatic method is presented for developing a domain-specific training set for extracting the relevant entities, which minimizes the human annotation effort.

Second, several learning algorithms and feature configurations were tested to improve the overall accuracy of the information extraction process.

Finally, a method was tested that combines the information extracted from patents with the analysis of social media text, and that is specifically conceived to extract advantages and drawbacks. This method relies on sentiment analysis of text extracted from social media, under the assumption that terms indicating advantages are generally perceived positively by people, while terms indicating drawbacks are perceived negatively.
File