ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-10012013-112229


Thesis type
Master's thesis
Author
SPAMPINATO, GIULIA LIA BEATRICE
URN
etd-10012013-112229
Title
A minimalist model for simulation of structure and dynamics of helical polypeptides
Department
PHYSICS
Degree programme
PHYSICS
Supervisors
supervisor Dr. Tozzini, Valentina
Keywords
  • Protein secondary structure
  • Molecular Dynamics simulation
  • Statistical Potentials
Session start date
24/10/2013
Availability
Full
Abstract
Proteins are molecular machines, the building blocks and working arms of a living cell. They are finely structured biomolecules, highly specialized for their functional roles. A protein is organized in hierarchical levels. The primary structure is a chain of amino acids (of 20 different types) linked by peptide bonds in a specific sequence, forming the polypeptide. The secondary structure describes local recurrent structural motifs adopted by the polypeptide. The tertiary structure is the organization of secondary structures through interactions between residues that are often far apart in the primary sequence. A quaternary structure can sometimes be recognized in how the tertiary structures of different polypeptide chains are arranged. The biological function of a protein depends on its overall 3D fold. The chain folds through a stepwise process that generally mirrors the hierarchical structural organization. Thus, to understand the final shape of a protein, a deep understanding of the secondary structures and of their sequence determinants is mandatory, as the first step of this hierarchy.
Given the rigid geometry of the peptide bond, two internal variables per amino acid are sufficient to describe the backbone conformation of the polypeptide. These are the dihedral angles Φ and Ψ, which describe the rotation around the two single bonds connecting the central carbon of the amino acid (Cα) to its neighboring amino and carboxylic groups along the chain. The distribution of the (Φ,Ψ) pairs represented in a plane is called the Ramachandran plot (RP). The two main classes of secondary structures, namely helices and sheets, occupy well-separated areas of the RP. It is more difficult to separate the different sub-classes of helices (α-helix, 3₁₀-helix and π-helix), which are located in adjacent and partially overlapping areas of the RP. While sheets are stabilized by hydrogen bonds connecting amino acids that belong to different strands and are often far apart in the sequence, helices are stabilized by periodic, local intra-strand patterns of hydrogen bonds.
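As a reminder of how such a dihedral angle is obtained from atomic positions, one common convention (not necessarily the one adopted in the thesis) uses the three bond vectors joining four consecutive atoms at positions r1, r2, r3, r4. For Φ of residue i the four atoms are C(i−1), N(i), Cα(i), C(i); for Ψ they are N(i), Cα(i), C(i), N(i+1). The symbol χ below is a generic placeholder for either angle.

```latex
\begin{aligned}
\mathbf{b}_1 &= \mathbf{r}_2 - \mathbf{r}_1, \qquad
\mathbf{b}_2 = \mathbf{r}_3 - \mathbf{r}_2, \qquad
\mathbf{b}_3 = \mathbf{r}_4 - \mathbf{r}_3, \\
\chi &= \operatorname{atan2}\!\Big(
  |\mathbf{b}_2|\,\mathbf{b}_1 \cdot (\mathbf{b}_2 \times \mathbf{b}_3),\;
  (\mathbf{b}_1 \times \mathbf{b}_2) \cdot (\mathbf{b}_2 \times \mathbf{b}_3)
\Big)
\end{aligned}
```

With this sign convention the angle lies in (−180°, 180°], is 0° for a cis arrangement and ±180° for a trans arrangement.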
Given the relatively low number of internal variables needed to describe the backbone conformation, the literature contains several attempts to build a “minimalist” model for proteins, i.e. the simplest and coarsest possible less-than-atomic-resolution model having the following characteristics: (i) possibility of back-mapping to the full atomistic representation, (ii) capability of describing all the different kinds of secondary structures and their dynamics/thermodynamics and (iii) in general, capability of accurately predicting the structure and dynamics of the global fold of the protein.
A linear chain of interacting centers (“beads”) located on the Cαs is a possibility often considered in the literature. In this class of models, condition (i) is satisfied at least as far as the backbone conformation is concerned. The investigation of condition (ii) is one of the focuses of this thesis. Condition (iii) is not completely satisfied by any of the existing models in this class, which are either accurate but with low generality and predictive power, or rather general but poorly accurate. This thesis work pursues the goal of building an accurate and predictive minimalist model, and takes some important steps toward it.
The first part of this work focuses on analyzing the situations in which (ii) is satisfied. This proceeds from the mapping of the atomistic structures of a protein onto the minimalist representation. In previous models, this mapping was based either on single structures or groups of structures (from X-ray crystallography or NMR) or on data sets from atomistic simulations of a given protein of interest. In this work, with a view to giving the model more generality, the analysis was extended to virtually the whole set of existing experimental data. A software tool was built that can download from the RCSB Protein Data Bank (the worldwide repository of freely available protein structures) a data set with user-defined properties (e.g., maximum and minimum protein size, prevalence or absence of a given secondary structure, a given sequence/structural diversity among proteins, and many others). On request, the software can then “coarse grain” the structures at different levels, including that of the minimalist model, and analyze the distributions of internal variables and their 2D and 3D correlations. The software, called SecStAnT, is made freely available to the scientific community. A paper describing it is currently under revision.
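The coarse-graining step onto the Cα chain can be illustrated with a minimal Python sketch. This is not SecStAnT code: the function name `extract_ca_coords` and its `chain_id` parameter are hypothetical, and the sketch only assumes the standard fixed-column layout of ATOM records in PDB-format files. It keeps the first model and the first alternate location, and it does no downloading or filtering of the data set.

```python
from typing import List, Tuple

Coord = Tuple[float, float, float]


def extract_ca_coords(pdb_text: str, chain_id: str = "A") -> List[Coord]:
    """Return the C-alpha coordinates (angstroms) of one chain, in file order."""
    coords: List[Coord] = []
    for line in pdb_text.splitlines():
        # Multi-model files (e.g. NMR ensembles): stop after the first model.
        if line.startswith("ENDMDL"):
            break
        # Only ATOM records are read; HETATM records (ligands, ions) are skipped.
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        atom_name = line[12:16].strip()  # columns 13-16
        alt_loc = line[16]               # column 17
        chain = line[21]                 # column 22
        if atom_name != "CA" or chain != chain_id:
            continue
        if alt_loc not in (" ", "A"):    # keep a single alternate location
            continue
        x = float(line[30:38])           # columns 31-38
        y = float(line[38:46])           # columns 39-46
        z = float(line[46:54])           # columns 47-54
        coords.append((x, y, z))
    return coords
```

The returned list of one bead per residue is the input needed to compute the internal variables of the minimalist representation.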
A first important result comes from comparing the RP built with the atomistic representation with its counterpart for the minimalist model, which involves the two conformational internal variables of the Cα chain (i.e. the pseudo-bond angle θ and the pseudo-dihedral φ). It was observed that even in the minimalist representation the secondary structures occupy separate areas of the (θ,φ) plane, indicating that the minimalist model can represent the secondary structures and that back-mapping to the atomistic RP is possible.
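To make the two minimalist variables concrete, the following Python sketch computes θ from three consecutive Cα positions and φ from four. All function names are hypothetical, and the helix generator uses textbook α-helix parameters (2.3 Å Cα radius, 1.5 Å rise and 100° turn per residue) as an assumption for illustration, not values taken from the thesis.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def pseudo_bond_angle(p0, p1, p2):
    """Angle theta (degrees) at p1 formed by three consecutive C-alphas."""
    u, v = _sub(p0, p1), _sub(p2, p1)
    c = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))


def pseudo_dihedral(p0, p1, p2, p3):
    """Signed dihedral phi (degrees) defined by four consecutive C-alphas."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def chain_angles(ca):
    """Return the lists of theta and phi values along a C-alpha chain."""
    thetas = [pseudo_bond_angle(*ca[i:i + 3]) for i in range(len(ca) - 2)]
    phis = [pseudo_dihedral(*ca[i:i + 4]) for i in range(len(ca) - 3)]
    return thetas, phis


def ideal_helix(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """C-alphas on a right-handed helix (assumed textbook alpha-helix values)."""
    t = math.radians(turn_deg)
    return [(radius * math.cos(i * t), radius * math.sin(i * t), rise * i)
            for i in range(n)]


thetas, phis = chain_angles(ideal_helix(8))
print(round(thetas[0], 1), round(phis[0], 1))
```

For this idealized helix every θ is close to 90° and every φ is close to 50°, so the whole chain collapses to a single point of the (θ,φ) plane; real structures populate a region around such a point.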
The second part of this work focuses on parameterizing the model so that it reproduces the secondary structures with a high level of accuracy. In this part SecStAnT was used to produce distributions of the internal variables θ and φ and of others involved in the description of the secondary structures (e.g. the distances between each Cα and its third, fourth and fifth neighbors along the chain, which are related to the hydrogen bonds stabilizing the helices), together with their correlations. These data were then used as targets, and the parameterization was optimized to reproduce them in simulations of the minimalist model. The simulations were run with the general-purpose molecular dynamics package DL_POLY, in which the minimalist model was implemented by means of software tools programmed in-house. The parameters were optimized by a physically driven trial-and-error procedure.
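The neighbor distances mentioned here can be sketched in a few lines of Python. The names `neighbor_distances`, `summarize` and `mismatch` are hypothetical, and the squared-difference score is only an illustrative figure of merit: the thesis describes a physically driven trial-and-error optimization, not this particular measure.

```python
import math
from statistics import mean, pstdev


def neighbor_distances(ca, k):
    """Distances between C-alpha i and C-alpha i+k along the chain."""
    return [math.dist(ca[i], ca[i + k]) for i in range(len(ca) - k)]


def summarize(ca, offsets=(3, 4, 5)):
    """Mean and standard deviation of r(i, i+k) for each offset k."""
    stats = {}
    for k in offsets:
        d = neighbor_distances(ca, k)
        if d:  # chains shorter than k+1 beads contribute nothing
            stats[k] = (mean(d), pstdev(d))
    return stats


def mismatch(target, simulated):
    """Hypothetical score: summed squared difference of the mean distances."""
    shared = set(target) & set(simulated)
    return sum((simulated[k][0] - target[k][0]) ** 2 for k in shared)
```

In such a scheme `summarize` would be applied once to the experimental data set and once to each simulated trajectory, and a parameter change would be kept when it lowers the mismatch.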
The overall goal is to produce a general model capable of describing all the secondary structures. The force field (Hamiltonian) of the model contains a set of conformational terms directly related to the internal variables θ and φ, aimed at describing the general conformational flexibility of the backbone even in weakly structured or unstructured proteins; terms mimicking the hydrogen bonds then stabilize the different secondary structures. In the final model, the relative weight of these terms will depend on the primary structure.
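A schematic reading of this description can be written as follows. This is an assumption-laden sketch, not the expression used in the thesis: the functional forms of the terms are left unspecified because the abstract does not give them, the offsets k = 3, 4, 5 are inferred from the neighbor distances mentioned earlier, and the weights w_k are placeholders for the sequence-dependent weights. Further terms (for instance a restraint on the Cα–Cα pseudo-bond length) may be present but are not mentioned in the abstract.

```latex
U \;=\; \sum_{i} U_{\theta}(\theta_i)
  \;+\; \sum_{i} U_{\varphi}(\varphi_i)
  \;+\; \sum_{k \in \{3,4,5\}} w_k \sum_{i} U^{(k)}_{\mathrm{hb}}\big(r_{i,i+k}\big)
```

Here the first two sums are the conformational terms in θ and φ, and the last sum collects the hydrogen-bond-like terms acting on the distance r between bead i and bead i+k.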
This is clearly a long-term project, of which one part was concluded in this work, namely the one regarding the helices. A model was produced that describes the three main types of helices with a high degree of accuracy. Specifically, the structure, dynamics, and distributions and correlations of internal variables of α- and 3₁₀-helices obtained from simulations compare well with the available experimental data. For the π-helix the experimental data are very few and elusive; thus our simulations can be considered a prediction, whose reliability is supported by the tests on the other two kinds of helices, to be compared with forthcoming experimental data. These results are achieved with a minimal number of terms in the Hamiltonian, whose meaning can be directly understood in terms of physical interactions (e.g. hydrogen bonds). This, together with the high accuracy, can be considered the main innovation of this model: a physically based parameterization allows the model to be extended straightforwardly to include other secondary structures, giving it generality and predictive power.
This extension is, in fact, the most immediate possible development of this thesis work, and can proceed by including the hydrogen-bonding patterns typical of the different kinds of sheets. Other, more subtle secondary and super-secondary structures can be included with the same procedure. The subsequent determination of their relative weights based on the sequence, together with the inclusion of accurate long-range interactions, can endow the model with the capability of predicting the folding, besides correctly reproducing the internal dynamics and thermodynamics.
File