ETD system

Electronic theses and dissertations repository

 

Thesis etd-10202008-150701


Thesis type
PhD thesis
Author
DELL'ORLETTA, FELICE
URN
etd-10202008-150701
Title
Improving the accuracy of Natural Language Dependency Parsing
Scientific-disciplinary sector
INF/01
Degree programme
Computer Science
Committee
Supervisor: Attardi, Giuseppe
Keywords
  • natural language parsing
  • natural language processing
  • DeSR
  • dependency tree
  • chunker
  • animacy
  • parser
  • Shift Reduce
Examination session start date
15/12/2008
Availability
partial
Release date
15/12/2048
Abstract
The aim of this thesis is to improve the accuracy of natural language dependency parsing. We employ a linear-time Shift Reduce dependency parsing algorithm, so that accuracy gains do not come at the price of increased computational cost.

We start by presenting the results of our experiments in the multilingual dependency parsing shared task of the Conference on Computational Natural Language Learning (CoNLL) 2007. We perform a detailed error analysis of the best parsers presented at the conference in order to reveal critical aspects of parsing systems.

This analysis leads us to introduce a new parsing method and a new parser combination algorithm, both aimed at improving the accuracy of the deterministic Shift Reduce parser. The new parsing method, called Reverse Revision Parsing, first parses the sentence with a Left-to-Right Shift Reduce parser and then runs a second, Right-to-Left Shift Reduce parser that scans the sentence in reverse, using additional features obtained from the predictions of the first parser. The new parser combination algorithm, called Quasi-Linear Parser Combination, exploits the fact that its inputs are trees in order to avoid the quadratic cost of algorithms that compute the maximum spanning tree of a graph.

We report the results we obtained in the CoNLL-2008 evaluation task. These results were achieved using Reverse Revision Parsing and a new combination algorithm presented in this thesis.

We then present a number of experiments aimed at selecting the set of features that provides the greatest improvement to a Shift Reduce statistical dependency parser. We report the accuracy gains that such a parser can obtain using features from gold chunks, from chunks produced by a statistical chunker, and from approximate chunks obtained by detecting noun phrases through regular expression patterns.
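The abstract does not list the regular expression patterns used to detect approximate noun-phrase chunks. As a rough illustration only, the Python sketch below assumes Penn Treebank-style POS tags and a hypothetical noun-phrase pattern (optional determiner, any number of adjectives, one or more nouns). It encodes each tag as a single character so that regex match offsets coincide with token indices, then emits one BIO label per token that a parser could read as an extra feature. The tag mapping, the pattern and the function name `approximate_np_chunks` are illustrative assumptions, not the thesis's actual implementation.

```python
import re
from typing import List

# Hypothetical mapping from Penn Treebank-style POS tags to one-letter codes.
# The tag set and patterns actually used in the thesis are not given here.
TAG_CODES = {
    "DT": "D", "PRP$": "D",
    "JJ": "A", "JJR": "A", "JJS": "A",
    "NN": "N", "NNS": "N", "NNP": "N", "NNPS": "N",
}

# Illustrative NP pattern: optional determiner, any adjectives, one or more nouns.
NP_PATTERN = re.compile(r"D?A*N+")


def approximate_np_chunks(tags: List[str]) -> List[str]:
    """Return one BIO label (B-NP, I-NP or O) per token, given its POS tags."""
    # One character per token, so string offsets equal token indices;
    # tags outside the mapping become "O", which the pattern never matches.
    encoded = "".join(TAG_CODES.get(tag, "O") for tag in tags)
    labels = ["O"] * len(tags)
    for match in NP_PATTERN.finditer(encoded):
        start, end = match.span()
        labels[start] = "B-NP"
        for i in range(start + 1, end):
            labels[i] = "I-NP"
    return labels


if __name__ == "__main__":
    # "The quick brown fox jumps over the lazy dog"
    tags = ["DT", "JJ", "JJ", "NN", "VBZ", "IN", "DT", "JJ", "NN"]
    print(approximate_np_chunks(tags))
    # ['B-NP', 'I-NP', 'I-NP', 'I-NP', 'O', 'O', 'B-NP', 'I-NP', 'I-NP']
```

Encoding tags as characters keeps the pattern readable and lets the standard `re` engine do the matching in a single left-to-right pass, which fits the linear-time setting described above; a real system would need a richer pattern set tuned to the target treebank.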
A parser exploiting features from approximate chunks is also applied to a chunking task, and its chunking accuracy is compared with that of a specialized statistical chunker.

Finally, we investigate the performance of parsers on languages characterized by relatively free word order and rich morphology. To this end, we perform a detailed quantitative analysis of distributional language data, highlighting the relative contribution of a number of distributed grammatical and semantic factors to parsing. We then introduce animacy, a semantic feature usually not present in available treebanks, and discuss its effect on parsing.