logo SBA

ETD

Archivio digitale delle tesi discusse presso l’Università di Pisa

Tesi etd-05212015-181106


Tipo di tesi
Tesi di dottorato di ricerca
Autore
MICALE, GIOVANNI
URN
etd-05212015-181106
Titolo
A Gibbs sampling strategy for mining of protein-protein interaction networks and protein structures
Settore scientifico disciplinare
INF/01
Corso di studi
SCIENZE DI BASE "GALILEO GALILEI"
Relatori
relatore Prof. Ferro, Alfredo
tutor Prof. Ferragina, Paolo
Parole chiave
  • structure
  • network
  • network alignment
  • genes
  • proteins
  • gibbs sampling
Data inizio appello
22/06/2015
Consultabilità
Completa
Riassunto
Complex networks are general and can be used to model phenomena that belongs to different fields of research, from biochemical applications to social networks. However, due to the intrinsic complexity of real networks, their analysis can be computationally demanding. Recently, several statistic and probabilistic analysis approaches have been designed, resulting to be much faster, flexible and effective than deterministic algorithms. Among statistical methods, Gibbs sampling is one of the simplest and most powerful algorithms for solving complex optimization problems and it has been applied in different contexts. It has shown its effectiveness in computational biology but in sequence analysis rather than in network analysis. One approach to analyze complex networks is to compare them, in order to identify similar patterns of interconnections and predict the function or the role of some unknown nodes. Thus, this motivated the main goal of the thesis: designing and implementing novel graph mining techniques based on Gibbs sampling to compare two or more complex networks. The methodology is domain-independent and can work on any complex system of interacting entities with associated attributes. However, in this thesis we focus our attention on protein analysis overcoming the strong current limitations in this area. Proteins can be analyzed from two different points of view: (i) an internal perspective, i.e. the 3D structure of the protein, (ii) an external perspective, i.e. the interactions with other macromolecules. In both cases, a comparative analysis with other proteins of the same or distinct species can reveal important clues for the function of the protein and evolutionary convergences or divergences between different organisms in the way a specific function or process is carried out. First, we present two methods based on Gibbs sampling for the comparative analysis of protein-protein interaction networks: GASOLINE and SPECTRA. GASOLINE is a stochastic and greedy algorithm to find similar groups of interacting proteins in two or more networks. It can align many networks and more quickly than the state-of-the-art methods. SPECTRA is a framework to retrieve and compare networks of proteins that interact with one another in specific healthy or tumor tissues. The aim in this case is to identify changes in protein concentration or protein "behaviour" across different tissues. SPECTRA is an adaptation of GASOLINE for weighted protein-protein interaction networks with gene expressions as node weights. It is the first algorithm proposed for multiple comparison of tissue-specific interaction networks. We also describe a Gibbs sampling based algorithm for 3D protein structure comparison, called PROPOSAL, which finds local structural similarities across two or more protein structures. Experimental results confirm our computational predictions and show that the proposed algorithms are much faster and in most cases more accurate than existing methods.
File