Progress in Biophysics and Molecular Biology
Principles of Membrane Protein Assembly and Structure
Gunnar von Heijne
Department of Biochemistry
Stockholm University
S-106 91 Stockholm, Sweden
Fax: Int+46-8-15 36 79
E-mail: gunnar@dbb.su.se
A good test of the current level of knowledge in a field is the ability to make quantitative predictions. For membrane proteins, this means predictions of topology and structure from the amino acid sequence. Due to the rather strict structural constraints imposed by the lipid environment (basically the requirement that all hydrogen bonds in the lipid-embedded part of the molecule must be satisfied internally), it has long been thought that structural predictions will be simpler for membrane proteins than for globular ones, and, as the following discussion shows, this is probably true. A second simplification possible for membrane proteins of the helix-bundle class, but not for β-barrel membrane proteins or for globular proteins, is the assumption that membrane insertion and helix-helix packing are strictly separable events: according to the "two stage model" (Popot and de Vitry, 1990; Popot and Engelman, 1990), all transmembrane segments first insert into the bilayer as individually stable helices and these pre-formed helices then pack together.
Topology predictions start from two basic observations: (i) transmembrane helices are 20-30 residues long and have a high overall hydrophobicity, and (ii) short non-translocated loops contain many positively charged residues whereas short translocated loops contain few such residues (or, more generally, the average amino acid composition is different between translocated and non-translocated loops). Observation (i) is the basis for the identification of transmembrane segments from hydrophobicity plots (Kyte and Doolittle, 1982), and observation (ii) makes it possible to predict the orientation of a protein in the membrane, and even to choose the most likely topology when the identification of the transmembrane segments from the hydrophobicity plot is uncertain (von Heijne, 1992).
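Observation (i) can be illustrated with a sliding-window hydrophobicity profile. The following is a minimal sketch, not the implementation of any of the programs cited here: the hydropathy values are those of the Kyte-Doolittle scale, while the 19-residue window, the 1.6 cutoff, the function names, and the toy sequence are assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window, indexed by window start."""
    values = [KD[aa] for aa in seq]  # raises KeyError on non-standard residues
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_segments(seq, window=19, threshold=1.6):
    """Merge overlapping windows whose mean hydropathy reaches the threshold.

    Returns (start, end) pairs, 0-based with end exclusive. The window
    length and threshold are illustrative assumptions.
    """
    segments = []
    for start, score in enumerate(hydropathy_profile(seq, window)):
        if score < threshold:
            continue
        end = start + window
        if segments and start <= segments[-1][1]:
            # Overlaps or touches the previous candidate: extend it.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


# Hypothetical toy sequence: a hydrophobic stretch between two polar tails.
toy = "MKKRSDE" + "LLIVALLAVFLLGALVLAL" + "RRKDESGNKK"
print(candidate_segments(toy))
```

Because neighbouring windows overlap, the reported candidate is usually somewhat longer than the hydrophobic stretch itself; real methods trim or re-score the ends.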
Topology prediction methods can be more or less sophisticated in terms of how the underlying hydrophobicity scale has been derived (Cornette et al., 1997; Cserzo et al., 1994; Samatey et al., 1995), in terms of the algorithm used (von Heijne, 1992; Jones et al., 1994; Rost et al., 1995; Rost et al., 1996), and in terms of whether or not information from homologous sequences is taken into account (Rost et al., 1995; Persson and Argos, 1996).
The best current methods claim that >90% of all transmembrane segments can be correctly identified and that the full topology is correctly predicted for >80% of all proteins (von Heijne, 1992; Claros and von Heijne, 1994; Persson and Argos, 1996; Rost et al., 1996). It seems that predictions are slightly better for prokaryotic than for eukaryotic proteins, probably in part because better experimental methods to determine topology, such as fusion protein analysis, have made the database of known prokaryotic topologies larger, and in part because prokaryotic membrane proteins tend to have more hydrophobic transmembrane helices and shorter extramembranous loops that conform better to the "positive inside" rule.
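The "positive inside" rule lends itself to a very small calculation: count the lysines and arginines in the short loops on either side of the membrane and place the more positively charged side on the non-translocated face. The sketch below assumes that the transmembrane segments are already known; the function name, the 60-residue cutoff for what counts as a short loop, and the toy sequence are assumptions made for illustration and are not taken from the cited programs.

```python
def predict_orientation(seq, segments, max_loop=60):
    """Guess the location of the N-terminus from K/R counts in short loops.

    segments: sorted (start, end) pairs of the transmembrane helices,
    0-based with end exclusive. Loops alternate sides of the membrane,
    starting with the N-terminal tail (loop 0).
    """
    bounds = [0] + [x for seg in segments for x in seg] + [len(seq)]
    loops = [seq[bounds[i]:bounds[i + 1]] for i in range(0, len(bounds), 2)]
    charge = [0, 0]  # K+R on the N-terminal side, K+R on the opposite side
    for index, loop in enumerate(loops):
        if len(loop) > max_loop:
            continue  # long loops are ignored (assumed cutoff)
        charge[index % 2] += loop.count("K") + loop.count("R")
    if charge[0] == charge[1]:
        return "undetermined", charge
    # "N-in": the N-terminus is on the non-translocated side.
    return ("N-in" if charge[0] > charge[1] else "N-out"), charge


# Hypothetical two-helix protein with positively charged tails.
toy = "MKRK" + "L" * 20 + "SDE" + "A" * 20 + "RRKGK"
print(predict_orientation(toy, [(4, 24), (27, 47)]))
```

The same charge difference can be used to rank alternative topologies when a weakly hydrophobic segment may or may not span the membrane: each candidate set of segments is scored and the one with the largest bias is kept.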
Topology prediction of β-barrel membrane proteins is more difficult since the membrane-spanning β-strands are short and thus hard to detect in the sequence. Reasonable predictions can be made when the protein can be aligned with the sequence of a protein of known structure (Welte et al., 1991). If this is not possible, one can look for short stretches of chain where every second residue is hydrophobic, where the turn potential is low, and that end with aromatic residues (Schirmer and Cowan, 1993), though this only works if one already has a good idea of which part of the protein forms the β-barrel (Nakai and Kanehisa, 1991).
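The pattern search described above can be sketched as a simple window scan. This is only an illustration of two of the three criteria (alternating hydrophobicity and an aromatic residue at the end of the stretch); the turn potential is left out, and the residue classes, the window length of 10, and the requirement of 4 hydrophobic residues out of 5 alternating positions are assumptions rather than published parameters.

```python
HYDROPHOBIC = set("AVLIMFWYC")  # assumed residue class
AROMATIC = set("FWY")


def alternating_stretches(seq, length=10, min_hits=4):
    """Start positions of windows that look like membrane-spanning strands.

    A window qualifies if it ends with an aromatic residue and at least
    min_hits of its alternating positions are hydrophobic.
    """
    hits = []
    for start in range(len(seq) - length + 1):
        window = seq[start:start + length]
        if window[-1] not in AROMATIC:
            continue
        # The lipid-facing residues may occupy the even or the odd positions.
        best = max(sum(aa in HYDROPHOBIC for aa in window[phase::2])
                   for phase in (0, 1))
        if best >= min_hits:
            hits.append(start)
    return hits


# Hypothetical sequence with one alternating stretch flanked by polar turns.
toy = "GSNG" + "TLSVNAGYDF" + "GSNGD"
print(alternating_stretches(toy))
```

Such a scan produces many false positives in soluble domains, which is why it is only useful once the barrel-forming region has been delimited by other means.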
MEMSAT - MEMbrane protein Structure And Topology. MEMSAT is a program which predicts the secondary structure and topology of all-helix integral membrane proteins based on the recognition of topological models. The method employs a set of statistical tables (log likelihood ratios) compiled from well-characterized membrane protein data, and a novel dynamic programming algorithm to recognize membrane topology models by expectation maximization. The statistical tables show definite biases towards certain amino acid species on the inside, middle and outside of a cellular membrane. The method is described in the following reference: Jones, D.T., Taylor, W.R. and Thornton, J.M. (1994) Biochemistry 33:3038-3049.
Transmembrane helices in integral membrane
proteins are composed of stretches of 15-30 predominantly
hydrophobic residues separated by polar connecting loops. A number
of algorithms designed to identify putative transmembrane helices
in the primary amino acid sequence have been developed, and
current methods can identify around 90-95% of all true
transmembrane segments with an over-prediction rate of only a few
percent [1,2]. The best results have so far been obtained when
multiply aligned sequences can be analyzed; however, in many
cases there are no homologs in the database and improvements in
single-sequence prediction performance are thus important.
Recently, the so-called Dense Alignment Surface (DAS) method was
introduced in an attempt to improve sequence alignments in the
G-protein coupled receptor family of transmembrane proteins
[3]. We have now generalized this method to predict
transmembrane segments in any integral membrane protein. DAS is
based on low-stringency dot-plots of the query sequence against
a collection of non-homologous membrane proteins using a
previously derived, special scoring matrix.
We have compared the performance of four different transmembrane
segment prediction methods: a sliding window averaging with
trapezoid window, a method (TOPPRED) based on the "positive
inside" rule, a neural network method (PHDhtm) including
information from multiply aligned sequences, and the new DAS
method. The predictive power of DAS and PHDhtm is essentially
the same while the single-sequence based method performs
slightly worse. Incorporating extra information related to the
"positive inside" rule (TOPPRED) brings the predictive power to
the level of the two other methods. This suggests that the DAS
method, which uses only single sequence information, is as good
as the PHDhtm method (which uses multiple sequence alignments)
and TOPPRED (which uses extra information in the form of the
distribution of positively charged residues) in predicting
transmembrane segments in prokaryotic inner membrane proteins.
Transmembrane helices in integral membrane proteins are predicted by a system of neural networks. A shortcoming of the network system is that the predicted helices are often too long; these are cut by an empirical filter. The final prediction (Rost et al., Protein Science, 1995, 4, 521-533; evaluation of accuracy) has an expected per-residue accuracy of about 95%. The number of false positives, i.e., transmembrane helices predicted in globular proteins, is about 2% (Rost et al., 1996). The neural network prediction of transmembrane helices (PHDhtm) is refined by a dynamic programming-like algorithm. This method resulted in correct predictions of all transmembrane helices for 89% of the 131 proteins used in a cross-validation test; more than 99% of the transmembrane helices were correctly predicted. The output of this method is used to predict topology, i.e., the orientation of the N-terminus with respect to the membrane. The expected accuracy of the topology prediction is >86%. Prediction accuracy is higher than average for eukaryotic proteins and lower than average for prokaryotes. PHDtopology is more accurate than all other methods tested on identical data sets (Rost, Casadio & Fariselli, 1996a and 1996b; evaluation of accuracy).
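The two kinds of accuracy quoted above measure different things: per-residue accuracy counts correctly labelled positions, whereas per-helix accuracy asks whether each observed helix was found at all. The sketch below shows one way to compute both from two-state label strings. The label alphabet ("H" for membrane helix, "-" for everything else), the function names, and the 3-residue overlap criterion are assumptions for illustration; the cited papers may define a correctly predicted helix differently.

```python
from itertools import groupby


def per_residue_accuracy(observed, predicted):
    """Fraction of positions where the two label strings agree."""
    if len(observed) != len(predicted):
        raise ValueError("label strings must have equal length")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


def helices(labels):
    """(start, end) pairs of runs of 'H', 0-based with end exclusive."""
    segments, position = [], 0
    for state, run in groupby(labels):
        length = len(list(run))
        if state == "H":
            segments.append((position, position + length))
        position += length
    return segments


def helices_found(observed, predicted, min_overlap=3):
    """Number of observed helices overlapped by some predicted helix."""
    predicted_segments = helices(predicted)
    found = 0
    for start, end in helices(observed):
        if any(min(end, p_end) - max(start, p_start) >= min_overlap
               for p_start, p_end in predicted_segments):
            found += 1
    return found


# Hypothetical labels: the second observed helix is missed entirely.
observed = "--HHHHHH----HHHHHH--"
predicted = "---HHHHHH-----------"
print(per_residue_accuracy(observed, predicted))
print(helices_found(observed, predicted), "of", len(helices(observed)))
```

A protein counts towards the "all helices correct" figure only if every observed helix is found and no extra helix is predicted, which is why that number is lower than the per-helix figure.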
Recently, improvements in membrane protein prediction have been obtained by using hidden Markov models specially designed for predicting membrane regions.
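The idea behind such models can be shown with a deliberately small example: a two-state hidden Markov model (loop and membrane) decoded with the Viterbi algorithm. This is a toy sketch, not the architecture of any published predictor; the states, the reduction of the sequence to hydrophobic versus other residues, and all probabilities are assumptions invented for illustration.

```python
import math

STATES = ("L", "M")  # L = loop, M = membrane (hypothetical two-state model)
START = {"L": 0.5, "M": 0.5}
TRANS = {"L": {"L": 0.9, "M": 0.1}, "M": {"L": 0.1, "M": 0.9}}
HYDROPHOBIC = set("AVLIMFWC")  # assumed residue class
# Probability of emitting a hydrophobic (True) or other (False) residue.
EMIT = {"L": {True: 0.3, False: 0.7}, "M": {True: 0.8, False: 0.2}}


def viterbi(seq):
    """Most probable state path for a non-empty sequence, as a string."""
    obs = [aa in HYDROPHOBIC for aa in seq]
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]])
             for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for o in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = (prev[best] + math.log(TRANS[best][s])
                        + math.log(EMIT[s][o]))
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


toy = "KDESR" + "LLIVALLAVFLLAALVLAL" + "RKDES"
print(viterbi(toy))
```

Models built for real use have many more states, for example separate inside and outside loop states and explicit helix-length modelling, so that the decoded path gives the topology directly rather than just the membrane regions.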
A more demanding task is to predict helix-helix packing and 3D structure ab initio. Two different approaches have been tried so far: simulated annealing/energy minimization starting from a large number of systematically generated starting conformations (Treutlein et al., 1992; Arkin et al., 1994; Lemmon et al., 1994; Tuffery et al., 1994; Adams et al., 1995), or an initial prediction of the most likely lipid-exposed and buried helix faces (Baldwin, 1993; Suwa et al., 1995; Efremov and Vergoten, 1996) followed by a combinatorial packing algorithm (Taylor et al., 1994).
As the examples of the glycophorin A dimer and the phospholamban pentamer show (Treutlein et al., 1992; Adams et al., 1995), structure prediction ab initio is still not possible even for such relatively simple and highly symmetrical systems, and experimental information from, e.g., mutagenesis data are needed to identify the most likely model among the set of calculated low-energy structures. An interesting procedure for the systematic use of the available experimental data has recently been proposed in conjunction with the modeling of G-protein coupled receptors (Herzyk and Hubbard, 1995).
Where, finally, is the membrane protein field moving? The basic rules of membrane protein topology seem to be in place, although many interesting mechanistic questions relating to the insertion of membrane proteins into lipid bilayers in vivo remain. 3D structure prediction is still not possible unless there is a fair amount of experimental information to guide the search, and there is an acute need for more high-resolution structures to provide a database for detailed theoretical studies of helix-helix packing.
High-resolution structures are of course also indispensable for understanding the functions of membrane proteins - many of which are of central importance both for cell biology and for the pharmaceutical industry - and overexpression, purification, and crystallization of membrane proteins are now perceived as one of the major challenges in structural biology. It seems a fair bet that crystallization of membrane proteins - although still a very risky undertaking - will be on many people's agenda during the next decade.
Arne Elofsson, Stockholm Bioinformatics Center, Department of Biochemistry, Arrheniuslaboratoriet, Stockholms Universitet, 10691 Stockholm, Sweden
Tel: +46-(0)8/161553 Fax: +46-(0)8/158057 Home: +46-(0)8/6413158 Email: arne@sbc.su.se WWW: /~arne/