ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-09282006-145216


Thesis type
Master's thesis (laurea specialistica)
Author
Nicotra, Luca
Email address
lnicotra@gmail.com
URN
etd-09282006-145216
Title
Generative Kernel Functions for Structured Data
Department
MATHEMATICAL, PHYSICAL AND NATURAL SCIENCES
Degree programme
COMPUTER SCIENCE
Supervisors
Supervisor Starita, Antonina
Supervisor Micheli, Alessio
Keywords
  • machine learning
  • hidden recursive models
  • Bayesian networks
  • applied statistics
  • probabilistic models
  • support vector machines
  • relative probability kernels
  • kernel functions
  • Fisher kernels
Defence session start date
13/10/2006
Availability
Not available for consultation
Release date
13/10/2046
Abstract
In this thesis we explore ways of combining probabilistic models in the context of a class of machine learning algorithms whose data representation is mediated by special similarity functions called kernels. We present a class of generative kernel functions that define embeddings of the input domain based on probabilistic models of the data-generating process, and then combine these models to define a similarity measure on the domain. These kernel functions not only make it easy to handle structured data, when the data are modeled as stochastic processes, but also allow prior knowledge about the domain, the data distribution, and hidden relationships to be incorporated, providing a powerful and expressive modeling tool grounded in Bayesian theory. Another important feature of generative kernels is their adaptivity, that is, their ability to adapt to specific datasets, in contrast to syntactic kernels, with which they are compared along with other related approaches. Among the presented classes of kernels, some are extensions of previously defined approaches to more structured domains, while others are completely new formulations, in particular the class of relative probability kernels. The performance of generative kernels is tested on various benchmarks, comprising a set of simulated data, a classification problem on biological sequences, and two domains of molecules modeled as trees: a QSPR (Quantitative Structure-Property Relationship) analysis problem on a class of alkanes and a QSAR (Quantitative Structure-Activity Relationship) analysis problem on a class of benzodiazepines. Finally, we describe Structlab, a machine learning and applied statistics software library for structured domains, developed during this thesis work.
This library aims to be an easily extensible framework for learning experiments with structured data, and provides a toolbox of generative and discriminative learning methods, together with tools for loading, preprocessing, cross-validation, and visualization. Structlab is accompanied by a graphical user interface that allows elaborate machine learning experiments to be set up in a visual and intuitive way.
File