ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-10202008-150701


Thesis type
Doctoral thesis
Author
DELL'ORLETTA, FELICE
URN
etd-10202008-150701
Title
Improving the accuracy of Natural Language Dependency Parsing
Scientific-disciplinary sector
INF/01
Degree programme
COMPUTER SCIENCE
Supervisors
Supervisor Attardi, Giuseppe
Keywords
  • natural language parsing
  • natural language processing
  • DeSR
  • dependency tree
  • chunker
  • animacy
  • parser
  • Shift Reduce
Defense session start date
15/12/2008
Availability
Not available for consultation
Release date
15/12/2048
Abstract
The aim of this thesis is to improve the accuracy of Natural Language Dependency Parsing. We employ a linear Shift Reduce dependency parsing algorithm and seek these improvements without increasing its computational cost.
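As a rough illustration of why Shift Reduce dependency parsing is linear in the sentence length, the following minimal sketch runs an arc-standard style transition loop. It is an assumption-laden example, not the parser used in the thesis: the action names (`SHIFT`, `LEFT`, `RIGHT`), the functions `parse` and `gold_oracle`, and the oracle that reads gold heads are all hypothetical, and a real statistical parser would replace the oracle with a trained classifier. The oracle assumes a projective tree.

```python
from collections import deque


def parse(n_tokens, next_action):
    """Run a shift-reduce loop over tokens 1..n_tokens (0 is an artificial root).

    next_action(stack, buffer, heads) must return "SHIFT", "LEFT" or "RIGHT".
    Returns the predicted heads and the number of transitions performed.
    """
    stack = [0]
    buffer = deque(range(1, n_tokens + 1))
    heads = {}
    steps = 0
    while buffer or len(stack) > 1:
        action = next_action(stack, buffer, heads)
        if action == "SHIFT":
            stack.append(buffer.popleft())
        elif action == "LEFT":
            # The second-from-top token becomes a dependent of the top token.
            dependent = stack.pop(-2)
            heads[dependent] = stack[-1]
        elif action == "RIGHT":
            # The top token becomes a dependent of the token below it.
            dependent = stack.pop()
            heads[dependent] = stack[-1]
        else:
            raise ValueError(f"unknown action: {action}")
        steps += 1
    return heads, steps


def gold_oracle(gold):
    """Hypothetical oracle that reads actions off a projective gold tree."""
    def next_action(stack, buffer, heads):
        if len(stack) >= 2:
            top, below = stack[-1], stack[-2]
            if below != 0 and gold[below] == top:
                return "LEFT"
            # Attach top only once all of its own dependents are attached.
            if gold[top] == below and all(
                gold[d] != top or d in heads for d in gold
            ):
                return "RIGHT"
        return "SHIFT"
    return next_action


# "The parser reads sentences": token index -> head index (0 = root).
gold = {1: 2, 2: 3, 3: 0, 4: 3}
heads, steps = parse(4, gold_oracle(gold))
```

Each token is shifted once and attached once, so the loop performs two transitions per token regardless of the tree shape; the cost per transition then depends only on the classifier that picks the action.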

We start by presenting the experimental results achieved during our participation in the multilingual dependency parsing shared task of the Conference on Computational Natural Language Learning (CoNLL) 2007. We perform a careful error analysis of the best parsers presented at the conference to reveal critical aspects of parsing systems.

This leads us to introduce a new parsing method and a new parser combination algorithm, both aimed at improving the accuracy of the deterministic Shift Reduce parser. The new parsing method, called Reverse Revision Parsing, first parses the sentence with a Left-to-Right Shift Reduce parser and then runs a second, Right-to-Left Shift Reduce parser that scans the sentence in reverse, using additional features obtained from the predictions of the first parser. The new parser combination algorithm, called Quasi-Linear Parser Combination, exploits the fact that its inputs are trees in order to avoid the quadratic cost of algorithms for computing the maximum spanning tree of a graph.
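The data flow between the two passes of Reverse Revision Parsing can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which features the second parser receives, so the `pred_head` feature and the function names `reverse_with_hints` and `unflip_heads` are hypothetical. The sketch shows how the first-pass head predictions could be re-indexed for the reversed sentence, and how the heads produced by the second pass could be mapped back to the original token order.

```python
def flip(index, n):
    """Map a 1-based token index to its position in the reversed sentence.

    Index 0 denotes the artificial root and is left unchanged.
    """
    return 0 if index == 0 else n + 1 - index


def reverse_with_hints(forms, first_pass_heads):
    """Build the input of the second (right-to-left) pass.

    forms: token strings in original order (token i is forms[i - 1]).
    first_pass_heads: dict from 1-based token index to predicted head.
    Returns the tokens in reversed order, each carrying the first-pass head
    expressed in reversed coordinates as an extra (hypothetical) feature.
    """
    n = len(forms)
    return [
        {"form": forms[i - 1], "pred_head": flip(first_pass_heads[i], n)}
        for i in range(n, 0, -1)
    ]


def unflip_heads(reversed_heads, n):
    """Map heads predicted on the reversed sentence back to original order."""
    return {flip(dep, n): flip(head, n) for dep, head in reversed_heads.items()}


forms = ["The", "parser", "reads", "sentences"]
first_pass = {1: 2, 2: 3, 3: 0, 4: 3}
second_pass_input = reverse_with_hints(forms, first_pass)
```

The second parser would consume `second_pass_input` token by token, and `unflip_heads` would restore its output to the original indexing so that it can be evaluated or combined with the first parser's tree.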

We report the experimental results obtained during our participation in the CoNLL-2008 evaluation task. These results were achieved using Reverse Revision Parsing and a new combination algorithm presented in this thesis.

We then present a number of experiments meant to select a set of features that provides the greatest improvement to a Shift Reduce statistical dependency parser. We report on the accuracy gains that such a parser can obtain using features from gold chunks, from chunks produced by a statistical chunker, and from approximate chunks obtained by detecting noun phrases through regular expression patterns. A parser exploiting features from approximate chunks is applied to a chunking task, and its chunking accuracy is compared to that of a specialized statistical chunker.
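To make the idea of approximate chunks concrete, here is a minimal sketch of noun phrase detection through a regular expression over part-of-speech tags. Everything specific in it is an assumption: the abstract does not give the patterns or the tag set used in the thesis, so the Penn Treebank style tags, the `TAG_CLASS` mapping, the pattern `D?J*N+` (optional determiner, any adjectives, one or more nouns), and the function names are hypothetical.

```python
import re

# Hypothetical mapping from POS tags to one-letter classes, so that a
# sentence becomes a string with exactly one character per token.
TAG_CLASS = {
    "DT": "D",
    "JJ": "J",
    "NN": "N",
    "NNS": "N",
    "NNP": "N",
    "NNPS": "N",
}

# Optional determiner, any number of adjectives, one or more nouns.
NP_PATTERN = re.compile(r"D?J*N+")


def approximate_np_chunks(pos_tags):
    """Return (start, end) token spans, end exclusive, of approximate NPs."""
    encoded = "".join(TAG_CLASS.get(tag, "O") for tag in pos_tags)
    return [(m.start(), m.end()) for m in NP_PATTERN.finditer(encoded)]


def chunk_labels(pos_tags):
    """Turn the spans into per-token B-NP / I-NP / O labels."""
    labels = ["O"] * len(pos_tags)
    for start, end in approximate_np_chunks(pos_tags):
        labels[start] = "B-NP"
        for i in range(start + 1, end):
            labels[i] = "I-NP"
    return labels


# "The fast parser reads sentences"
tags = ["DT", "JJ", "NN", "VBZ", "NNS"]
labels = chunk_labels(tags)
```

Per-token labels of this kind could then be offered to the parser as additional features alongside the word form and part-of-speech tag; because the match runs over one character per token, the spans map directly to token positions.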

Finally, we investigate the performance achieved by parsers when they are applied to languages that are characterized by a relatively free word order and by a rich morphology. To this end, we perform a detailed quantitative analysis of distributional language data, highlighting the relative contribution of a number of distributed grammatical and semantic factors in parsing. We then introduce Animacy, a semantic feature usually not present in available treebanks, and discuss its effect on parsing.