Thesis etd-03282017-165324

Thesis type

Master's thesis

Author

FARERI, SILVIA

URN

etd-03282017-165324

Title

Advantages and Drawbacks: automatization of Extractions and Measurements from Patents

Department

INGEGNERIA DELL'ENERGIA, DEI SISTEMI, DEL TERRITORIO E DELLE COSTRUZIONI

Degree programme

INGEGNERIA GESTIONALE

Supervisors

supervisor Prof. Fantoni, Gualtiero
co-supervisor Chiarello, Filippo

Keywords

algorithm
batteries
clue
cluster
data-mining
database
heuristic
named entity recognition
taxonomy
text-mining
tool
trend

Exam session start date

03/05/2017

Availability

Not available for consultation

Release date

03/05/2087

Abstract

Automatic text analysis (Text Mining) is a process for extracting implicit information from unstructured texts by applying mining algorithms with the support of dedicated software. Among the wide range of possibilities it offers, the classification of documents and words is particularly relevant; it is made possible by Named Entity Recognition (NER), which identifies specific classes of words in a text. Given a selected semantic class, NER tools aim to extract all the words that belong to it. A NER system is effective only when state-of-the-art Machine Learning algorithms are combined with technical knowledge of the domain to which the analyzed documents belong.
In this thesis, a database of word extractions from patents was analyzed with a NER system, with the aim of extracting the advantages and drawbacks of the inventions described. The research focused on formulating heuristics intended to make the analysis process less complex and more efficient.
The first part of the work concerned the definition of a taxonomy that makes the extraction process as precise as possible and provides rules for eliminating irrelevant or unusable extractions. Once formalized, the taxonomy was applied in the next phase to a new database of extractions from patents on rechargeable lithium batteries, maximizing the usefulness and usability of that database. In the final case study, an automatable analysis procedure was developed in which Statistical Analysis, Clustering, and Trend Analysis work together, making it possible to identify strategically significant information for both Research and Development and Marketing.

File

File name	Size
Thesis not available for consultation. Contact the author.

ETD

Digital archive of theses defended at the Università di Pisa
