Thesis etd-10202008-150701

Thesis type

PhD thesis

Author

DELL'ORLETTA, FELICE

URN

etd-10202008-150701

Title

Improving the accuracy of Natural Language Dependency Parsing

Scientific disciplinary sector

INF/01

Degree programme

COMPUTER SCIENCE

Supervisors

Supervisor Attardi, Giuseppe

Keywords

natural language parsing
natural language processing
DeSR
dependency tree
chunker
animacy
parser
Shift Reduce

Defense session start date

15/12/2008

Availability

Not available for consultation

Release date

15/12/2048

Abstract

The aim of this thesis is to improve the accuracy of Natural Language Dependency Parsing. We employ a linear-time Shift Reduce dependency parsing algorithm and look for improvements that do not increase its computational cost.

We start by presenting the experimental results we achieved in the multilingual dependency parsing shared task of the Conference on Computational Natural Language Learning (CoNLL) 2007. We then carry out a detailed error analysis of the best parsers presented at the conference to reveal critical aspects of parsing systems.

This leads us to introduce a new parsing method and a new parser combination algorithm, both aimed at improving the accuracy of the deterministic Shift Reduce parser. The new parsing method, called Reverse Revision Parsing, runs a Left-to-Right Shift Reduce parser over the sentence, followed by a second Right-to-Left Shift Reduce parser that scans the sentence in reverse and uses additional features obtained from the predictions of the first parser. The new parser combination algorithm, called Quasi-Linear Parser Combination, exploits the fact that its inputs are trees in order to avoid the quadratic cost of algorithms for computing the maximum spanning tree of a graph.
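The data flow of Reverse Revision Parsing described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the representation of a parse as a list of head indices, the use of -1 for the root, and the two stub parsers are all assumptions made for illustration. The only point shown is that the second parser reads the reversed sentence together with the first parser's predictions, re-expressed in reversed coordinates.

```python
from typing import Callable, List, Sequence

ROOT = -1  # head index marking the root token (a convention of this sketch)


def mirror(heads: Sequence[int]) -> List[int]:
    """Re-express head indices for the same sentence read in the opposite direction."""
    n = len(heads)
    return [ROOT if h == ROOT else n - 1 - h for h in reversed(heads)]


def reverse_revision_parse(
    tokens: List[str],
    parse_l2r: Callable[[List[str]], List[int]],
    parse_r2l: Callable[[List[str], List[int]], List[int]],
) -> List[int]:
    """Run a left-to-right pass, then a right-to-left pass that sees its predictions."""
    first_pass = parse_l2r(tokens)
    # The second parser works on the reversed sentence, so the first-pass
    # heads are mirrored before being handed over as extra features.
    hints = mirror(first_pass)
    second_pass = parse_r2l(list(reversed(tokens)), hints)
    # Map the revised heads back to the original token order.
    return mirror(second_pass)


# Hypothetical stand-ins for trained parsers, used only to exercise the pipeline.
def stub_l2r(tokens: List[str]) -> List[int]:
    """Attach every token to its right neighbour; the last token is the root."""
    return [i + 1 for i in range(len(tokens) - 1)] + [ROOT]


def stub_r2l(tokens: List[str], hints: List[int]) -> List[int]:
    """Accept every first-pass prediction unchanged."""
    return list(hints)
```

In a real system the second parser would be a trained classifier that treats the hints as additional features and may overrule them; the stub above simply returns them.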

We report the results we obtained in the CoNLL-2008 evaluation task. These results were achieved using Reverse Revision Parsing and a new combination algorithm presented in this thesis.

We then present a number of experiments aimed at selecting the set of features that provides the greatest improvement to a Shift Reduce statistical dependency parser. We report the accuracy gains that such a parser can obtain using features from gold chunks, from chunks produced by a statistical chunker, and from approximate chunks obtained by detecting noun phrases through regular expression patterns. A parser exploiting features from approximate chunks is also applied to a chunking task, and its chunking accuracy is compared to that of a specialized statistical chunker.
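As a rough illustration of how approximate chunks might be obtained from regular expression patterns, the following Python sketch detects simple noun phrases over part-of-speech tags and turns them into per-token chunk labels. The pattern, the Penn Treebank style tags, and the function names are assumptions of this sketch, not the patterns used in the thesis.

```python
import re
from typing import List, Tuple


def coarse(tag: str) -> str:
    """Map a POS tag to a one-letter class so token index equals string index."""
    if tag == "DT":
        return "D"  # determiner
    if tag.startswith("JJ"):
        return "J"  # adjective
    if tag.startswith("NN"):
        return "N"  # noun
    return "O"      # anything else


# Hypothetical noun phrase pattern: optional determiner, adjectives, then nouns.
NP_PATTERN = re.compile(r"D?J*N+")


def approximate_np_chunks(tagged: List[Tuple[str, str]]) -> List[Tuple[int, int]]:
    """Return (start, end) token spans, end exclusive, of approximate noun phrases."""
    codes = "".join(coarse(tag) for _, tag in tagged)
    return [(m.start(), m.end()) for m in NP_PATTERN.finditer(codes)]


def chunk_labels(tagged: List[Tuple[str, str]]) -> List[str]:
    """Turn the spans into B-NP / I-NP / O labels, one per token."""
    labels = ["O"] * len(tagged)
    for start, end in approximate_np_chunks(tagged):
        labels[start] = "B-NP"
        for i in range(start + 1, end):
            labels[i] = "I-NP"
    return labels
```

Labels of this kind could then be offered to the parser as one extra feature per token, alongside the word form and POS tag.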

Finally, we investigate the performance of parsers when applied to languages characterized by relatively free word order and rich morphology. To this end, we perform a detailed quantitative analysis of distributional language data, highlighting the relative contribution of a number of distributed grammatical and semantic factors to parsing. We then introduce Animacy, a semantic feature usually not present in available treebanks, and discuss its effect on parsing.

File

File name	Size
The thesis is not available for consultation. Contact the author

ETD

Digital archive of theses defended at the University of Pisa
