Secondary structure prediction methods and links
There are many web servers for structure prediction; here is a brief summary:
• PSI-pred (PSI-BLAST profiles used for prediction; David Jones, Warwick)
• JPRED Consensus prediction (includes many of the methods given below; Cuff & Barton, EBI)
• DSC King & Sternberg (this server)
• PREDATOR Frishman & Argos (EMBL)
• PHD home page Rost & Sander, EMBL, Germany
• ZPRED server Zvelebil et al., Ludwig, U.K.
• nnPredict Cohen et al., UCSF, USA.
• BMERC PSA Server Boston University,USA
• SSP (Nearest-neighbor) Solovyev and Salamov,Baylor College, USA.
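With no homologue of known structure from which to make a 3D model, a logical next step is to predict secondary structure. Although the methods differ, the aim of secondary structure prediction is always to locate the alpha helices and beta strands within a protein or protein family.
Methods for single sequences
Secondary structure prediction has been around for about a quarter of a century. The early methods suffered from a lack of data: predictions were made on single sequences rather than on families of homologous sequences, and there were relatively few known 3D structures from which to derive parameters. Probably the most famous early methods are those of Chou & Fasman, Garnier, Osguthorpe & Robson (GOR) and Lim. Although the authors originally claimed quite high accuracies (70-80 %), under careful examination the methods were shown to be only between 56 and 60 % accurate (see Kabsch & Sander, 1983, given below). An early problem in secondary structure prediction had been the inclusion of structures used to derive parameters in the set of structures used to assess the accuracy of the method.
Some good references on the subject:
• Early methods on single sequences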
o Chou, P.Y. & Fasman, G.D. (1974). Biochemistry, 13, 211-222.
o Lim, V.I. (1974). Journal of Molecular Biology, 88, 857-872.
o Garnier, J., Osguthorpe, D. J. & Robson, B. (1978). Journal of Molecular Biology, 120, 97-120.
o Kabsch, W. & Sander, C. (1983). FEBS Letters, 155, 179-182. (An assessment of the above methods)
• Later methods on single sequences
o Deleage, G. & Roux, B. (1987). Protein Engineering , 1, 289-294 (DPM)
o Presnell, S.R., Cohen, B.I. & Cohen, F.E. (1992). Biochemistry, 31, 983-993.
o Holley, H.L. & Karplus, M. (1989). Proceedings of the National Academy of Sciences, 86, 152-156.
o King, R. & Sternberg, M. J.E. (1990). Journal of Molecular Biology, 216, 441-457.
o D. G. Kneller, F. E. Cohen & R. Langridge (1990) Improvements in Protein Secondary Structure Prediction by an Enhanced Neural Network, Journal of Molecular Biology, 214, 171-182. (NNPRED)
Recent improvements
The availability of large families of homologous sequences revolutionised secondary structure prediction. Traditional methods, when applied to a family of proteins rather than a single sequence, proved much more accurate at identifying core secondary structure elements. The combination of sequence data with sophisticated computing techniques such as neural networks has led to accuracies well in excess of 70 %. Though this seems a small percentage increase, these predictions are actually much more useful than those for a single sequence, since they tend to predict the core accurately. Moreover, the limit of 70-80 % may be a function of secondary structure variation within homologous proteins.
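The accuracy figures quoted above are percentages of residues predicted correctly. The text does not say exactly how they were computed, so the sketch below assumes the simplest per-residue, three-state measure (often called Q3), with H for helix, E for strand and C for everything else. The function name and the example strings are made up for illustration.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state (H, E or C)
    matches the observed state. Assumed measure, not taken from the text."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and observation must have the same length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical prediction and observed assignment for a 16-residue stretch.
predicted = "CCHHHHHHCCEEEECC"
observed = "CCCHHHHHHCEEEECC"
print(f"{q3_accuracy(predicted, observed):.1f} %")
```

Note that a per-residue score like this says nothing about whether the core elements were found, which is why a modest rise in the percentage can hide a large gain in usefulness.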
Automated methods
There are numerous automated methods for predicting secondary structure from multiply aligned protein sequences. Some good references on the subject include (the acronyms in parentheses given after each reference refer to the associated WWW servers, given below):
• Zvelebil, M.J.J.M., Barton, G.J., Taylor, W.R. & Sternberg, M.J.E. (1987). Prediction of Protein Secondary Structure and Active Sites Using the Alignment of Homologous Sequences Journal of Molecular Biology, 195, 957-961. (ZPRED)
• Rost, B. & Sander, C. (1993), Prediction of protein secondary structure at better than 70 % accuracy, Journal of Molecular Biology, 232, 584-599. (PHD)
• Salamov A.A. & Solovyev V.V. (1995), Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments. Journal of Molecular Biology, 247, 1 (NNSSP)
• Geourjon, C. & Deleage, G. (1994), SOPM: a self optimised prediction method for protein secondary structure prediction. Protein Engineering, 7, 157-16. (SOPMA)
• Solovyev V.V. & Salamov A.A. (1994) Predicting alpha-helix and beta-strand segments of globular proteins. Computer Applications in the Biosciences, 10, 661-669. (SSP)
• Wako, H. & Blundell, T. L. (1994), Use of amino-acid environment-dependent substitution tables and conformational propensities in structure prediction from aligned sequences of homologous proteins. 2. Secondary Structures, Journal of Molecular Biology, 238, 693-708.
• Mehta, P., Heringa, J. & Argos, P. (1995), A simple and fast approach to prediction of protein secondary structure from multiple aligned sequences with accuracy above 70 %. Protein Science, 4, 2517-2525. (SSPRED)
• King, R.D. & Sternberg, M.J.E. (1996) Identification and application of the concepts important for accurate and reliable protein secondary structure prediction. Protein Science, 5, 2298-2310. (DSC)
Nearly all of these now run via the World Wide Web. For details, see the papers for the individual methods, or follow the acronyms given after most of the references above to the associated WWW sites, where the methods can also be run.
Manual intervention
It has long been recognised that patterns of residue conservation are indicative of particular secondary structure types. Alpha helices have a periodicity of 3.6 residues per turn, which means that residues at positions i, i+3, i+4 and i+7 (where i is a residue in an alpha helix) lie on one face of the helix. Many alpha helices in proteins are amphipathic, meaning that one face points towards the hydrophobic core and the other towards the solvent. Thus patterns of hydrophobic residue conservation showing the i, i+3, i+4, i+7 pattern are highly indicative of an alpha helix.
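The i, i+3, i+4, i+7 rule can be turned into a simple window scan. This is only a minimal sketch: it works on a single sequence rather than on conservation across an alignment, the hydrophobic residue set is my own assumption, and the function name and test sequence are hypothetical.

```python
# Assumed hydrophobic set; other classifications (e.g. including Y) are equally defensible.
HYDROPHOBIC = set("AVLIMFWC")

BURIED_FACE = (0, 3, 4, 7)    # offsets expected to be hydrophobic
EXPOSED_FACE = (1, 2, 5, 6)   # intervening offsets expected to be polar


def amphipathic_helix_starts(seq: str) -> list[int]:
    """Return 0-based positions i where i, i+3, i+4 and i+7 are hydrophobic
    and the intervening positions are not."""
    starts = []
    for i in range(len(seq) - 7):
        window = seq[i:i + 8]
        buried = all(window[k] in HYDROPHOBIC for k in BURIED_FACE)
        exposed = all(window[k] not in HYDROPHOBIC for k in EXPOSED_FACE)
        if buried and exposed:
            starts.append(i)
    return starts


# Made-up sequence containing one ideal eight-residue pattern starting at index 2.
print(amphipathic_helix_starts("GSLKELLKKLGS"))
```

Real helices rarely match the ideal pattern at every position, so in practice one would tolerate a mismatch or two, or score alignment columns by conserved hydrophobicity instead of single residues.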
For example, this helix in myoglobin has this classic pattern of hydrophobic and polar residue conservation (i = 1):
Similarly, the geometry of beta strands means that adjacent residues have their side chains pointing in opposite directions. Beta strands that are half buried in the protein core will tend to have hydrophobic residues at positions i, i+2, i+4, i+6, etc., and polar residues at positions i+1, i+3, i+5, etc.
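The alternating pattern of a half-buried strand can be searched for by looking for stretches in which hydrophobic and polar residues strictly alternate. Again this is a sketch under stated assumptions: the hydrophobic set, the minimum length of five and the function name are all my own choices, and a single sequence stands in for an alignment.

```python
# Assumed hydrophobic set; adjust to taste.
HYDROPHOBIC = set("AVLIMFWC")


def alternating_segments(seq: str, min_len: int = 5) -> list[tuple[int, int]]:
    """Return half-open (start, end) ranges in which hydrophobic and
    non-hydrophobic residues strictly alternate for at least min_len residues."""
    flags = [res in HYDROPHOBIC for res in seq]
    segments = []
    start = 0
    for i in range(1, len(flags) + 1):
        # A run ends at the end of the sequence or where two neighbours share a class.
        if i == len(flags) or flags[i] == flags[i - 1]:
            if i - start >= min_len:
                segments.append((start, i))
            start = i
    return segments


# Made-up sequence with an alternating stretch in the middle.
seq = "GGVTVKVSGG"
for start, end in alternating_segments(seq):
    print(start, end, seq[start:end])
```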
For example, this beta strand in CD8 shows this classic pattern:
Beta strands that are completely buried (as is often the case in proteins containing both alpha helices and beta strands) usually contain a run of hydrophobic residues, since both faces are buried in the protein core.
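A completely buried strand shows up as an unbroken run of hydrophobic residues, which is easy to find with a regular expression. This is a rough sketch: the hydrophobic set, the minimum run length of four and the function name are assumptions of mine rather than values from any of the methods cited.

```python
import re


def hydrophobic_runs(seq: str, min_len: int = 4) -> list[tuple[int, int]]:
    """Return half-open (start, end) ranges of consecutive hydrophobic
    residues at least min_len long. The residue set is an assumption."""
    pattern = re.compile(rf"[AVLIMFWC]{{{min_len},}}")
    return [(m.start(), m.end()) for m in pattern.finditer(seq)]


# Made-up sequence with one five-residue hydrophobic run.
seq = "DKSAVLIVGDE"
for start, end in hydrophobic_runs(seq):
    print(start, end, seq[start:end])
```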
This strand from Chemotaxis protein CheY is a good example:
The principle behind most manual secondary structure predictions is to look for patterns of residue conservation that are indicative of secondary structures like those shown above. It has been shown in numerous successful examples that this strategy often leads to nearly perfect predictions. The work of Barton et al., Niermann & Kirschner, Bazan and Benner & co-workers provides good starting points for doing this sort of work oneself. Some useful references are:
• Recent reviews on the subject (and on secondary structure prediction generally); see also references therein
o Rost, B., Schneider, R. & Sander, C. (1993), Trends in Biochemical Sciences, 18, 120-123.
o Benner, S. A., Gerloff, D. L. & Jenny, T. F. (1994), Science, 265, 1642-1644.
o Barton, G. J. (1995), Protein Secondary Structure Prediction, Current Opinion in Structural Biology,5, 372-376.
o Russell, R. B. & Sternberg, M. J. E. (1995), Protein Structure Prediction: How Good Are We?, Current Biology, 5, 488-490.
• Some guides for predicting structure:
o Benner, S. A. (1989), Patterns of divergence in homologous proteins as indicators of tertiary and quaternary structure, Advances in Enzyme Regulation, 31, 219-236.
o Benner, S. A. (1992), Predicting de novo the folded structure of proteins, Current Opinion in Structural Biology, 2, 402-412.
• Some particular examples of protein secondary structure predictions:
o Crawford, I. P., Niermann, T. & Kirschner, K. (1987), Predictions of secondary structure by evolutionary comparison: Application to the alpha subunit of tryptophan synthase, PROTEINS: Structure, Function and Genetics, 1, 118-129.
o Bazan, J. F. (1990), Structural Design and Molecular Evolution of a Cytokine Receptor Superfamily, Proceedings of the National Academy of Sciences, 87, 6934-6938.
o Benner, S. A. & Gerloff, D. (1990), Patterns of Divergence in Homologous Proteins and tertiary structure. A prediction of the structure of the catalytic domain of protein kinases, Advances in Enzyme Regulation, 31, 121-181.
o Jenny, T. F. & Benner, S. A. (1994) A prediction of the secondary structure of the pleckstrin homology domain, PROTEINS: Structure, Function and Genetics, 20, 1-3.
o Benner, S. A., Badcoe, I., Cohen, M. A. and Gerloff, D. L. (1993) Predicted secondary structure for the src homology 3 domain, Journal of Molecular Biology, 229, 295-305.
o Gerloff, D. L., Jenny, T. F., Knecht, L. J., Gonnet, G.H. & Benner, S. A. (1993), The nitrogenase MoFe protein. A secondary structure prediction. FEBS Letters, 318, 118-124.
o Gerloff, D. L., Chelvanayagam, G. & Benner, S. A. (1995), A predicted consensus structure for the protein-kinase c2 homology (c2h) domain, the repeating unit of synaptotagmin, PROTEINS: Structure, Function and Genetics, 22, 299-310.
o Barton, G. J., Newman, R. H., Freemont, P. F. & Crumpton, M. J. (1991), Amino acid sequence analysis of the annexin super-gene family of proteins, European Journal of Biochemistry, 198, 749-760.
o Russell, R. B., Breed, J. & Barton, G. J., (1992) Conservation analysis and secondary structure prediction of the SH2 family of phosphotyrosine binding domains, FEBS Letters, 304, 15-20.
o Livingstone, C. D. & Barton, G. J. (1994), Secondary structure prediction from multiple sequence data: Blood clotting factor XII and Yersinia protein tyrosine phosphatase, International Journal of Peptide and Protein Research
o Barton, G. J., Barford, D. A. & Cohen, P. T. (1994), European Journal of Biochemistry, 220, 225-237.
o Perkins, S. J., Smith K. F., Williams, S. C., Haris, P. I., Chapman, D. & Sim, R. B. (1994), The secondary structure of the von Willebrand Factor Type A Domain in Factor B of Human Complement by Fourier Transform Infrared Spectroscopy, Journal of Molecular Biology, 238, 104-119.
o Edwards, Y. J. K. & Perkins, S. J., (1995) The protein fold of the von Willebrand factor type A is predicted to be similar to the open twisted beta-sheet flanked by alpha-helices found in human ras-p21, 358, 283-286.
o Lupas, A., Koster, A. J., Walz, J. & Baumeister, W. (1994) Predicted secondary structure of the 20S proteasome and model structure of the putative peptide channel, FEBS Letters, 354, 45-49.
A strategy for secondary structure prediction
In practice, I recommend getting as many state-of-the-art prediction approaches as possible and combining this with some human insight to give a consensus prediction for the family. If you then align all of your predictions (including ideas you have based on residue conservation) with your multiple sequence alignment you can get a consensus picture of the structure. For example, here is part of an alignment of a family of proteins I looked at recently:
In this figure, three automated secondary structure predictions (PHD, SOPMA and SSPRED) appear below the alignment of 12 glutamyl tRNA reductase sequences. Positions within the alignment showing a conservation of hydrophobic side-chain character are shown in yellow, and those showing near total conservation of non-hydrophobic residues (often indicative of active sites) are coloured green.
Predictions of accessibility performed by PHD (PHD Acc. Pred.) are also shown (b = buried, e = exposed), as is a prediction I performed by looking for patterns indicative of the three secondary structure types shown above. For example, positions (within the alignment) 38-45 exhibit the classical amphipathic helix pattern of hydrophobic residue conservation, with positions i, i+3, i+4 and i+7 showing a conservation of hydrophobicity, with intervening positions being mostly polar. Positions 13-16 comprise a short stretch of conserved hydrophobic residues, indicative of a beta-strand, similar to the example from CheY protein shown above.
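The two kinds of conserved position described above (conserved hydrophobic character, and near-total conservation of a non-hydrophobic residue) can be flagged column by column. This is only a sketch of the idea: the hydrophobic set, the 0.9 threshold, the one-letter marks and the toy alignment are assumptions of mine, and gaps simply count against conservation.

```python
from collections import Counter

# Assumed hydrophobic set and threshold; neither comes from the figure.
HYDROPHOBIC = set("AVLIMFWC")


def classify_columns(alignment: list[str], threshold: float = 0.9) -> str:
    """Return one mark per alignment column:
    'h' = hydrophobic character conserved,
    'c' = a single non-hydrophobic residue nearly invariant,
    '.' = neither."""
    if not alignment:
        raise ValueError("empty alignment")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    marks = []
    for column in zip(*alignment):
        n = len(column)
        hydrophobic_fraction = sum(r in HYDROPHOBIC for r in column) / n
        residue, count = Counter(column).most_common(1)[0]
        if hydrophobic_fraction >= threshold:
            marks.append("h")
        elif residue not in HYDROPHOBIC and residue != "-" and count / n >= threshold:
            marks.append("c")
        else:
            marks.append(".")
    return "".join(marks)


# Toy four-sequence alignment, not the glutamyl tRNA reductase family.
alignment = ["LKDGA", "IKDSV", "VKDTL", "LRDNI"]
print(classify_columns(alignment))
```

The resulting string of marks can then be scanned for the helix and strand patterns in the same way as a single sequence.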
By looking for these patterns I built up a prediction of the secondary structure for most regions of the protein. Note that most methods - automated and manual - agree for many regions of the alignment.
Given the results of several methods of predicting secondary structure, one can build up a consensus picture of the secondary structure, such as that shown at the bottom of the alignment above.
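One simple way to combine several predictions is a per-position majority vote. The sketch below assumes each method's output has already been mapped onto the same alignment positions as a string of H, E and C; the function name, the "?" tie marker and the three example strings are hypothetical, and this is not how JPRED or any particular server builds its consensus.

```python
from collections import Counter


def consensus_prediction(predictions: list[str], tie_symbol: str = "?") -> str:
    """Majority vote at each position; positions where the top states
    tie are marked with tie_symbol for manual inspection."""
    if not predictions:
        raise ValueError("no predictions given")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    consensus = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            consensus.append(tie_symbol)
        else:
            consensus.append(ranked[0][0])
    return "".join(consensus)


# Made-up outputs standing in for three automated methods.
method_a = "CHHHHCCEEEC"
method_b = "CCHHHHCEEEC"
method_c = "CHHHCCCCEEC"
print(consensus_prediction([method_a, method_b, method_c]))
```

Positions marked as ties are exactly where human insight from the conservation patterns is most useful.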
Note that you can get predictions like the above (i.e. consensus predictions) from the very useful JPRED server.