Thesis etd-09182018-155646

Thesis type

Master's thesis

Author

WANG, QIONGGE

URN

etd-09182018-155646

Title

Using Natural Language Processing and Data Mining Techniques for Amazon Reviews Data Analytics: A study

Department

COMPUTER SCIENCE

Degree programme

COMPUTER SCIENCE

Supervisors

supervisor Nanni, Mirco
co-supervisor Prof. Attardi, Giuseppe

Keywords

Visualization
Analysis
Natural Language Processing
Deep Learning
Algorithms
Data Mining

Examination session start date

05/10/2018

Availability

Not available for consultation

Release date

05/10/2088

Abstract

This thesis focuses on data mining (and machine learning) algorithms applied to
non-text and text datasets extracted from Amazon, including the construction of
artificial neural networks. Deep learning algorithms, nowadays used in a wide variety
of domains, were applied to the analysis of Amazon review comments. We reviewed the
popular and useful algorithms for non-text clustering (unsupervised learning) and
classification (supervised learning). The previous chapters gave the theoretical
explanation of the specific algorithms used and also illustrated the performance
measures for the training and validation datasets. The following metrics were selected:
accuracy, classification error, precision, recall, F-measure, false positives, false
negatives, true positives, true negatives, sensitivity, specificity, positive
predictive value, negative predictive value, etc.
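
As an illustration of how the metrics listed above relate to one another, the following minimal sketch derives them from the four confusion-matrix counts for a binary task. It is not taken from the thesis: the function name `binary_metrics`, the sample labels, and the convention of returning 0.0 for an undefined ratio are all assumptions made for this example.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Derive common binary classification metrics from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Assumed convention: an undefined ratio (zero denominator) is 0.0.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)  # same quantity as positive predictive value
    recall = ratio(tp, tp + fn)     # same quantity as sensitivity
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": ratio(tp + tn, len(pairs)),
        "classification_error": ratio(fp + fn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
        "negative_predictive_value": ratio(tn, tn + fn),
    }


# Hypothetical labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Note that precision and positive predictive value are the same quantity, as are recall and sensitivity, so the list contains fewer independent measures than names.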

For the text classification part, the thesis gives a theoretical description of four
architectures: convolutional neural networks (CNN) and recurrent models, namely long
short-term memory (LSTM), GRU, and Bi-LSTM networks. These networks are explained in
terms of their structure, their building blocks (artificial neurons), and their
learning algorithms: backpropagation and backpropagation through time. For the four
architectures (CNN, GRU, LSTM, and Bi-LSTM), many experiments were built, trained,
and tested under different parameter settings. We used the TensorFlow/Keras
frameworks, and a trained network can easily be connected to any Python module.
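
To make the notion of an artificial neuron and its gradient-based training concrete, here is a minimal sketch of a single sigmoid neuron trained by stochastic gradient descent. It is an illustration only, written in plain Python rather than TensorFlow/Keras, and it covers just the single-layer special case of the gradient computation that backpropagation generalizes to deeper networks. The function names, the learning rate, the epoch count, and the toy AND dataset are assumptions, not details from the thesis.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, epochs=2000, lr=0.5, seed=0):
    """Train one sigmoid neuron with per-sample gradient descent."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in samples[0][0]]
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            y = sigmoid(z)
            # For cross-entropy loss with a sigmoid output, dL/dz = y - target.
            delta = y - target
            # Chain rule: dL/dw_i = delta * x_i and dL/db = delta.
            weights = [w - lr * delta * xi for w, xi in zip(weights, x)]
            bias -= lr * delta
    return weights, bias


def predict(weights, bias, x):
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


# Toy dataset (logical AND), chosen because it is linearly separable.
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_neuron(and_data)
predictions = [predict(weights, bias, x) for x, _ in and_data]
```

In a multi-layer or recurrent network the same chain-rule step is applied layer by layer (or time step by time step), which is what the frameworks named above automate.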

To find the best word embedding method and the best RNN model, it is necessary to
select a suitable architecture and to optimize the hyperparameters of the network.
To this end, an experimental procedure for comparing different architectures in
terms of their ability to learn, the effectiveness of the training process, and
their classification performance was proposed and implemented in the previous
chapter of this thesis. The procedure also includes automatic optimization of the
neural network's hyperparameters using scikit-learn's grid search and random
search functions.
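
The difference between the two search strategies mentioned above can be sketched in a few lines of plain Python. This is a conceptual illustration, not the thesis's procedure and not scikit-learn code (in scikit-learn the corresponding classes are `GridSearchCV` and `RandomizedSearchCV`, which additionally perform cross-validation). The helper names, the parameter names `learning_rate` and `units`, and the `toy_score` function standing in for "train the network and return a validation score" are all hypothetical.

```python
import itertools
import random


def grid_search(param_grid, score_fn):
    """Score every combination in the grid and return the best one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def random_search(param_grid, score_fn, n_iter, seed=0):
    """Score n_iter randomly sampled combinations and return the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in param_grid.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical stand-in for a validation score; peaks at (0.01, 64).
    return (1.0
            - abs(params["learning_rate"] - 0.01) * 10
            - abs(params["units"] - 64) / 1000)


grid = {"learning_rate": [0.001, 0.01, 0.1], "units": [32, 64, 128]}
best_grid, best_grid_score = grid_search(grid, toy_score)
best_random, best_random_score = random_search(grid, toy_score, n_iter=5)
```

Grid search evaluates all nine combinations here, while random search evaluates only the five it samples, so it is cheaper but not guaranteed to find the best setting.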

A good understanding of the quality of the data was achieved by applying different
data mining and natural language processing techniques. In addition, multiple
visualizations, such as tables and graphs, were created to support an intuitive and
subjective understanding of each model.

File

File name	Size
Thesis not available for consultation. Contact the author

ETD

Digital archive of the theses defended at the Università di Pisa
