Digital archive of theses discussed at the University of Pisa


Thesis etd-10202008-150701

Thesis type
PhD thesis
Thesis title
Improving the accuracy of Natural Language Dependency Parsing
Academic discipline
Course of study
Supervisor Attardi, Giuseppe
Keywords
  • animacy
  • chunker
  • dependency tree
  • DeSR
  • natural language processing
  • natural language parsing
  • parser
  • Shift Reduce
Graduation session start date
Release date
The aim of this thesis is to improve Natural Language Dependency Parsing. We employ a
linear Shift Reduce Dependency parsing algorithm avoiding the increase of computational

We start by presenting the experimental results achieved during our participation in the multilingual dependency parsing shared task of the Conference on Computational Natural Language Learning (CoNLL) 2007. We perform a detailed error analysis of the best parsers presented at the conference to reveal critical aspects of parsing systems.

This leads us to introduce a new parsing method and a new parser combination algorithm, both aimed at improving the accuracy of the deterministic Shift Reduce parser. The new parsing method, called Reverse Revision Parsing, first runs a Left-to-Right Shift Reduce parser over the sentence and then runs a second, Right-to-Left Shift Reduce parser that scans the sentence in reverse, using additional features obtained from the predictions of the first parser. The new parser combination algorithm, called Quasi-Linear Parser Combination, exploits the fact that its inputs are trees in order to avoid the quadratic cost of algorithms for computing the maximum spanning tree of a graph.
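The data flow of Reverse Revision Parsing described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used in DeSR: the `Parser` callable type, the function names and the convention of passing the first parser's predicted heads as "hints" are all hypothetical, and the two parsers themselves are left abstract.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical interface: a parser takes the words plus optional hint heads
# (one per token) and returns one head index per token.
# Tokens are numbered from 1; head 0 denotes the root.
Parser = Callable[[Sequence[str], Optional[Sequence[int]]], List[int]]


def mirror_heads(heads: Sequence[int]) -> List[int]:
    """Re-express a head list for the reversed sentence.

    Applying the function twice gives back the original list.
    """
    n = len(heads)
    flipped = [0 if h == 0 else n + 1 - h for h in heads]
    return flipped[::-1]


def reverse_revision(
    words: Sequence[str],
    left_to_right: Parser,
    right_to_left: Parser,
) -> List[int]:
    """Run a first parser, then a second one on the reversed sentence."""
    # First pass: ordinary Left-to-Right parse, no hints available.
    first_pass = left_to_right(words, None)
    # Second pass: reversed sentence, with the first prediction
    # (mirrored to match the new token order) as additional features.
    hints = mirror_heads(first_pass)
    second_pass = right_to_left(list(reversed(words)), hints)
    # Map the result back to the original token order.
    return mirror_heads(second_pass)
```

In this sketch the second parser is free to keep or revise each hinted head; how the hints are turned into features is a design decision of the actual parser and is not specified here.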

We report the experimental results obtained during our participation in the CoNLL-2008 evaluation task. These results were achieved using Reverse Revision Parsing and a new combination algorithm presented in this thesis.

We then present a number of experiments meant to select the set of features that provides the greatest improvement to a Shift Reduce statistical dependency parser. We report on the accuracy gains that such a parser can obtain using features from gold chunks, from chunks produced by a statistical chunker, and from approximate chunks obtained by detecting noun phrases through regular expression patterns. A parser exploiting features from approximate chunks is also applied to a chunking task, and its chunking accuracy is compared to that of a specialized statistical chunker.
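As an illustration of what detecting noun phrases through regular expression patterns can look like, the sketch below matches a pattern against the part-of-speech tag sequence of a sentence. The pattern (optional determiner, any adjectives, one or more nouns), the Penn Treebank style tags and the function names are assumptions made for this example; the patterns actually used in the thesis are not given in the abstract.

```python
import re
from typing import List, Tuple

# Assumed noun phrase shape: optional determiner, adjectives, then nouns.
NP_PATTERN = re.compile(r"D?J*N+")


def tag_class(pos: str) -> str:
    """Collapse a POS tag (Penn Treebank style, assumed) to one letter."""
    if pos == "DT":
        return "D"
    if pos.startswith("JJ"):
        return "J"
    if pos.startswith("NN"):
        return "N"
    return "O"


def approximate_np_chunks(pos_tags: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) token spans, end exclusive, of matched chunks."""
    # One letter per token, so match offsets are token indices.
    encoded = "".join(tag_class(p) for p in pos_tags)
    return [m.span() for m in NP_PATTERN.finditer(encoded)]


def chunk_features(pos_tags: List[str]) -> List[str]:
    """Turn the spans into per-token B-NP / I-NP / O labels."""
    labels = ["O"] * len(pos_tags)
    for start, end in approximate_np_chunks(pos_tags):
        labels[start] = "B-NP"
        for i in range(start + 1, end):
            labels[i] = "I-NP"
    return labels
```

For the tag sequence of "The quick fox jumps over the lazy dog" (`DT JJ NN VBZ IN DT JJ NN`), `approximate_np_chunks` returns the spans `(0, 3)` and `(5, 8)`. Labels of this kind could then be offered to the parser as additional per-token features.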

Finally, we investigate the performance achieved by parsers when they are applied to languages characterized by a relatively free word order and a rich morphology. To this end, we perform a detailed quantitative analysis of distributional language data, highlighting the relative contribution of a number of distributed grammatical and semantic factors in parsing. We then introduce Animacy, a semantic feature usually not present in available treebanks, and discuss its effect on parsing.