cnlics

Muchong member (rank: minor fame)

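[Discussion] [Share] A workflow for protein structure prediction

I will translate this gradually and post it as I go.

The material posted here is something I collected a while ago; it probably comes from EMBL. I have skimmed through it and it is not out of date yet.

The Word document can be downloaded here:
http://ifile.it/dwzy278

The general workflow for protein structure prediction is shown in the figure below:


Contents:

•Relevant experimental data
•Sequence data and preliminary analysis
•Searching sequence databases
•Identifying domains
•Multiple sequence alignment
•Comparative or homology modelling
•Secondary structure prediction
•Fold recognition
•Analysis of folds and alignment of secondary structure elements
•Alignment of sequence to structure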

[ Last edited by cnlics on 2010-9-16 at 08:24 ]

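Experimental data

Many kinds of experimental data can aid the structure prediction process, including:
•Disulphide bonds, which fix the spatial positions of cysteines
•Spectroscopic data, which can give the secondary structure content of the protein
•Site-directed mutagenesis studies, which can identify residues in active or binding sites
•Protease cleavage sites and post-translational modifications such as phosphorylation or glycosylation, which imply that the residues involved must be exposed
•Others
When making a prediction, keep all of these data in mind. Always ask: is the prediction consistent with the experimental results? If not, the approach needs to be revised.

[ Last edited by cnlics on 2010-9-14 at 19:31 ]
Post #2, 2010-09-14 01:41:00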

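Protein sequence data

A preliminary analysis of the protein sequence is worthwhile. For example, if the protein comes directly from a gene prediction, it may contain multiple domains. More seriously, it may contain regions that are unlikely to be globular or soluble. This flowchart assumes that your protein is soluble, probably consists of a single domain, and contains no non-globular regions.

Consider the following:
•Is it a transmembrane protein, or does it contain transmembrane segments? There are many methods for predicting these segments, including:

    o TMAP (EMBL)
    o PredictProtein (EMBL/Columbia)
    o TMHMM (CBS, Denmark)
    o TMpred (Baylor College)
    o DAS (Stockholm)

•Does it contain coiled coils? You can predict coiled coils at the COILS server, or download the COILS program (recently rewritten; note that the GCG package includes a version of COILS).

•Does the protein contain low-complexity regions? Proteins often contain several polyglutamine or polyserine stretches, which are hard to predict. You can check for them with SEG (a version of SEG is included in the GCG package).

If any of the above applies, you should split the sequence into pieces, or ignore particular segments of the sequence, and so on. This problem is closely related to the problem of locating domains.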
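
As a rough illustration of the low-complexity check above, here is a minimal sketch that flags windows whose Shannon entropy falls under a cutoff. Note that this is not the SEG algorithm, only a simplified stand-in; the function name, window size and entropy cutoff are all assumptions made for the example.

```python
import math
from collections import Counter

def low_complexity_windows(seq, window=12, max_entropy=1.5):
    """Return merged (start, end) spans of windows with low Shannon entropy.

    Entropy is measured in bits per residue; end indices are exclusive.
    """
    flagged = []
    for start in range(len(seq) - window + 1):
        counts = Counter(seq[start:start + window])
        entropy = -sum((n / window) * math.log2(n / window) for n in counts.values())
        if entropy <= max_entropy:
            end = start + window
            if flagged and start <= flagged[-1][1]:
                flagged[-1] = (flagged[-1][0], end)  # merge overlapping windows
            else:
                flagged.append((start, end))
    return flagged
```

The spans it returns are only candidates for masking or for splitting the sequence; they still need to be checked by eye.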

[ Last edited by cnlics on 2010-9-16 at 08:25 ]
Post #3, 2010-09-14 01:41:58

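Searching sequence databases

The first step in analysing any new sequence is obviously to search sequence databases for homologous sequences. Such searches can be run anywhere and on any computer. In addition, many web servers offer these searches: you can type or paste a sequence into the server and receive the results interactively.

There are also many methods for sequence searching; the best known at present is BLAST. Versions that run locally are easy to obtain (from NCBI or Washington University), and many web pages allow a protein or DNA sequence to be compared against many gene or protein sequence databases. To name just a few:
•National Center for Biotechnology Information (USA) Searches
•European Bioinformatics Institute (UK) Searches
•BLAST search through SBASE (domain database; ICGEB, Trieste)
•and many more sites

An important recent advance in sequence comparison is the development of gapped BLAST and PSI-BLAST (position-specific iterated BLAST), both of which make BLAST more sensitive. PSI-BLAST takes the results of a search, builds a profile from them, and then uses the profile to search the database again for further homologues (the process can be repeated until no new sequences are found), so it can detect homologues at very large evolutionary distances. Importantly, before using any of the methods in the sections below, you should compare your protein sequence against the databases with PSI-BLAST to check whether a known structure already exists.
Other methods for comparing a single sequence against a database include:
•FASTA package (William Pearson, University of Virginia, USA)
•SCANPS (Geoff Barton, European Bioinformatics Institute, UK)
•BLITZ (Compugen's fast Smith Waterman search)
•Other methods.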

It is also possible to use multiple sequence information to perform more sensitive searches. This involves building a profile from some kind of multiple sequence alignment. A profile gives a score for each type of amino acid at each position in the sequence, and generally makes searches more sensitive. Tools for doing this include:
•PSI-BLAST (NCBI, Washington)
•ProfileScan Server (ISREC, Geneva)
•HMMER, hidden Markov models (Sean Eddy, Washington University)
•Wise package (Ewan Birney, Sanger Centre; for comparing proteins against DNA)
•Other methods.
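
To make the idea of a profile concrete, here is a minimal sketch that turns an ungapped-column view of an alignment into per-position log-odds scores. It is not how PSI-BLAST or HMMER build their models; the uniform background frequencies, the pseudocount and the function names are all assumptions for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies
    profile = []
    for column in zip(*alignment):
        # gaps and non-standard symbols are simply ignored
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def score_segment(profile, segment):
    """Sum the profile scores for an ungapped segment of standard residues."""
    return sum(column[res] for column, res in zip(profile, segment))
```

A segment that resembles the family scores higher than an unrelated one, which is what makes a profile search more sensitive than a search with a single sequence.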

A different approach for incorporating multiple sequence information into a database search is to use a MOTIF. Instead of giving every amino acid some kind of score at every position in an alignment, a motif ignores all but the most invariant positions in an alignment, and just describes the key residues that are conserved and define the family. Sometimes this is called a "signature". For example, "H-[FW]-x-[LIVM]-x-G-x(5)-[LV]-H-x(3)-[DE]" describes a family of DNA binding proteins. It can be translated as "histidine, followed by either a phenylalanine or tryptophan, followed by an amino acid (x), followed by leucine, isoleucine, valine or methionine, followed by any amino acid (x), followed by glycine,... [etc.]".
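
A pattern like the one above maps quite directly onto a regular expression. The sketch below handles only the basic elements (residues, [..] alternatives, {..} exclusions, x and x(n) or x(n,m) repeats); the function name is hypothetical, and anchors and other parts of the PROSITE syntax are deliberately left out.

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P} = anything but P
        element = element.replace("x", ".")                      # x = any residue
        element = element.replace("(", "{").replace(")", "}")    # x(5) -> .{5}
        parts.append(element)
    return "".join(parts)

regex = prosite_to_regex("H-[FW]-x-[LIVM]-x-G-x(5)-[LV]-H-x(3)-[DE]")
match = re.search(regex, "AAHFALAGAAAAALHAAADAA")  # made-up test sequence
```

Here `match` is a match object if the motif occurs in the sequence and `None` otherwise.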

PROSITE (ExPASy Geneva) contains a huge number of such patterns, and several sites allow you to search these data:
•ExPASy
•EBI

It is best to search a few different databases in order to find as many homologues as possible. A very important thing to do, and one which is sometimes overlooked, is to compare any new sequence to a database of sequences for which 3D structure information is available. Whether or not your sequence is homologous to a protein of known 3D structure is not obvious in the output from many searches of large sequence databases. Moreover, if the homology is weak, the similarity may not be apparent at all during the search through a larger database.

One last thing to remember is that one can save a lot of time by making use of pre-prepared protein alignments. Many of these alignments are hand edited by experts on the particular protein families, and thus represent probably the best alignment one can get given the data they contain (i.e. they are not always as up to date as the most recent sequence databases). These databases include:
•SMART (Oxford/EMBL)
•PFAM (Sanger Centre/Wash-U/Karolinska Institutet)
•COGS (NCBI)
•PRINTS (UCL/Manchester)
•BLOCKS (Fred Hutchinson Cancer Research Centre, Seattle)
•SBASE (ICGEB, Trieste)

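In general there are many ways to compare a protein sequence against these databases, and they are very useful for identifying domains.

[ Last edited by cnlics on 2010-9-14 at 19:54 ]
Post #4, 2010-09-14 01:42:52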

Identifying domains

If you have a sequence of more than about 500 amino acids, you can be nearly certain that it will be divided into discrete functional domains. If possible, it is preferable to split such large proteins up and consider each domain separately. You can predict the location of domains in a few different ways. The methods below are given (approximately) from most to least confident.
•        If homology to other sequences occurs only over a portion of the probe sequence and the other sequences are whole (i.e. not partial sequences), then this provides the strongest evidence for domain structure. You can either do database searches yourself or make use of well-curated, pre-defined databases of protein domains. Searches of these databases (see links below) will often assign domains easily.
o        SMART (Oxford/EMBL)
o        PFAM (Sanger Centre/Wash-U/Karolinska Institutet)
o        COGS (NCBI)
o        PRINTS (UCL/Manchester)
o        BLOCKS (Fred Hutchinson Cancer Research Centre, Seattle)
o        SBASE (ICGEB, Trieste)
You can also find domain descriptions in the annotations in SWISSPROT.
•        Regions of low-complexity often separate domains in multidomain proteins. Long stretches of repeated residues, particularly Proline, Glutamine, Serine or Threonine often indicate linker sequences and are usually a good place to split proteins into domains.
Low complexity regions can be defined using the program SEG which is generally available in most BLAST distributions or web servers (a version of SEG is also contained within the GCG suite of programs).
•        Transmembrane segments are also very good dividing points, since they can easily separate extracellular from intracellular domains. There are many methods for predicting these segments, including:
o        TMAP (EMBL)
o        PredictProtein (EMBL/Columbia)
o        TMHMM (CBS, Denmark)
o        TMpred (Baylor College)
o        DAS (Stockholm)
•        Something else to consider is the presence of coiled coils. These unusual structural features sometimes (but not always) indicate where proteins can be divided into domains. You can predict coiled coils at the COILS server or you can download the COILS program (recently rewritten by me, of all people; a version of COILS is also contained within the GCG suite of programs).
•        Secondary structure prediction methods (see below) will often predict regions of proteins to have different protein structural classes. For example, one region of sequence may be predicted to contain only alpha helices and another to contain only beta sheets. These can often, though not always, suggest likely domain structure (e.g. an all alpha domain and an all beta domain).
If you have separated a sequence into domains, then it is very important to repeat all the database searches and alignments using the domains separately. Searches with sequences containing several domains may not find all sub-homologies, particularly if the domains are abundant in the database (e.g. kinases, SH2 domains, etc.). There may also be "hidden" domains. For example, if there is a stretch of 80 amino acids with few homologues nested in between a kinase and an SH2 domain, then you may miss matches found when searching the whole sequence against a database.
Anyway, here is my slide from the talk related to this subject:
Post #5, 2010-09-14 01:44:10

Multiple sequence alignment

Regardless of the outcome of your searches, you will want a multiple sequence alignment containing your sequence and all the homologues you have found above.
Some sites for performing multiple alignment:
•        EBI (UK) Clustalw Server
•        IBCP (France) Multalin Server
•        IBCP (France) Clustalw Server
•        IBCP (France) Combined Multalin/Clustalw
•        MSA (USA) Server
•        BCM Multiple Sequence Alignment ClustalW Server (USA)
If you are going to do a lot of alignments, then it is probably best to get your own copy of one of the many programs. FTP sites for some of these are:
•        HMMer (HMM method, Wash U)
•        SAM (HMM method, Santa Cruz)
•        ClustalW (EBI,UK)
•        ClustalW (USA)
•        MSA (USA)
•        AMPS (UK)
Note that PileUp is contained within the GCG commercial package. Most institutions with people doing this sort of work will have access to this software, so ask around if you want to use it.
Probably the most important advance since these pages first appeared is the use of Hidden Markov Models for sequence alignment. Several such methods are listed above.
Alignments can provide:
•        Information as to protein domain structure
•        The location of residues likely to be involved in protein function
•        Information on residues likely to be buried in the protein core or exposed to solvent
•        More information than a single sequence for applications like homology modelling and secondary structure prediction.
Some tips
•        Don't just take everything found in the searches and feed them directly into the alignment program. Searches will almost always return matches that do not indicate a significant sequence similarity. Look through the output carefully and throw things out if they don't appear to be a member of the sequence family. Inclusion of non-members in your alignment will confuse things and likely lead to errors later.
•        Remember that the programs for aligning sequences aren't perfect, and do not always provide the best alignment. This is particularly so for large families of proteins with low sequence identities. If you can see a better way of aligning the sequences, then by all means edit the alignment manually.
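
To help with the first tip, a quick pairwise identity check can point out sequences that deserve a closer look before they stay in the alignment. This is only a sketch: the function names and the 20 % cutoff are arbitrary assumptions, and a flagged sequence should be inspected by hand rather than thrown out automatically.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def flag_outliers(alignment, query, min_identity=20.0):
    """Names of sequences whose identity to the query falls below min_identity.

    alignment maps sequence names to aligned (gapped) sequences.
    """
    ref = alignment[query]
    return [name for name, seq in alignment.items()
            if name != query and percent_identity(ref, seq) < min_identity]
```

Remember that genuine remote homologues can also have very low identity, so the cutoff is a prompt for human judgement and not a rule.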
Post #6, 2010-09-14 01:45:08

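Comparative or homology modelling

If your protein sequence shows significant similarity to another protein of known three-dimensional structure, then a fairly accurate model of its 3D structure can be obtained via homology modelling. It is also possible to build models if you have found a suitable fold via fold recognition and are happy with the alignment of sequence to structure (note that the accuracy of models constructed in this manner has not been assessed properly, so treat them with caution).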
It is possible now to generate models automatically using the very useful SWISSMODEL server.
Some other sites useful for homology modelling include:
•        WHAT IF (G. Vriend, EMBL, Heidelberg)
•        MODELLER (A. Sali, Rockefeller University)
•        MODELLER Mirror FTP site
Sequence alignments, particularly those involving proteins with low percent sequence identities, can be inaccurate. If this is the case, then a model built using the alignment will obviously be wrong in some places. I would suggest that you look over the alignment carefully before building a model.
Note that when using SWISSMODEL it is possible to send in a protein sequence only. I would only recommend doing this if the degree of sequence homology is high (50% or greater) for the above reasons. It is best, particularly if one has edited an alignment, to send an alignment directly to the server.
Once you have a three-dimensional model, it is useful to look at protein 3D structures. There are numerous free programs for doing this, including:
•        GRASP Anthony Nicholls, Columbia, USA.
•        MolMol Reto Koradi, ETH, Zurich, CH.
•        Prepi Suhail Islam, ICRF, U.K.
•        RasMol Roger Sayle, Glaxo, U.K.
Most places with groups studying structural biology also have commercial packages, such as Quanta, SYBYL or Insight, which contain more features than the visualisation packages described above. Crystallographers also tend to use O and FRODO, though these require a lot of experience to use with ease.
Post #7, 2010-09-14 01:47:00

Secondary structure prediction methods and links

There are many web servers for secondary structure prediction; here is a brief summary:
•        PSI-pred (PSI-BLAST profiles used for prediction; David Jones, Warwick)
•        JPRED Consensus prediction (includes many of the methods given below; Cuff & Barton, EBI)
•        DSC King & Sternberg (this server)
•        PREDATOR Frishman & Argos (EMBL)
•        PHD home page Rost & Sander, EMBL, Germany
•        ZPRED server Zvelebil et al., Ludwig, U.K.
•        nnPredict Cohen et al., UCSF, USA
•        BMERC PSA Server Boston University, USA
•        SSP (Nearest-neighbor) Solovyev and Salamov, Baylor College, USA
With no homologue of known structure from which to make a 3D model, a logical next step is to predict secondary structure. Although they differ in method, the aim of all secondary structure predictions is to provide the location of alpha helices and beta strands within a protein or protein family.
Single sequence methods
Secondary structure prediction has been around for about a quarter of a century. The early methods suffered from a lack of data: predictions were made on single sequences rather than on families of homologous sequences, and there were relatively few known 3D structures from which to derive parameters. The best known early methods are those of Chou & Fasman, Garnier, Osguthorpe & Robson (GOR) and Lim. Although the authors originally claimed quite high accuracies (70-80 %), on careful examination the methods proved to be only 56 to 60 % accurate (Kabsch & Sander, 1983, see below). An early problem in secondary structure prediction had been the inclusion of structures used to derive parameters in the set of structures used to assess the accuracy of the method.
Some good references on the subject:
•        Early methods on single sequences
o        Chou, P.Y. & Fasman, G.D. (1974). Biochemistry, 13, 211-222.
o        Lim, V.I. (1974). Journal of Molecular Biology, 88, 857-872.
o        Garnier, J., Osguthorpe, D. J. & Robson, B. (1978). Journal of Molecular Biology, 120, 97-120.
o        Kabsch, W. & Sander, C. (1983). FEBS Letters, 155, 179-182. (An assessment of the above methods)
•        Later methods on single sequences
o        Deleage, G. & Roux, B. (1987). Protein Engineering , 1, 289-294 (DPM)
o        Presnell, S.R., Cohen, B.I. & Cohen, F.E. (1992). Biochemistry, 31, 983-993.
o        Holley, H.L. & Karplus, M. (1989). Proceedings of the National Academy of Science, 86, 152-156.
o        King, R. & Sternberg, M. J.E. (1990). Journal of Molecular Biology, 216, 441-457.
o        D. G. Kneller, F. E. Cohen & R. Langridge (1990) Improvements in Protein Secondary Structure Prediction by an Enhanced Neural Network, Journal of Molecular Biology, 214, 171-182. (NNPRED)
Recent improvements
The availability of large families of homologous sequences revolutionised secondary structure prediction. Traditional methods, when applied to a family of proteins rather than a single sequence, proved much more accurate at identifying core secondary structure elements. The combination of sequence data with sophisticated computing techniques such as neural networks has led to accuracies well in excess of 70 %. Though this seems a small percentage increase, these predictions are actually much more useful than those for a single sequence, since they tend to predict the core accurately. Moreover, the limit of 70-80 % may be a function of secondary structure variation within homologous proteins.
Automated methods
There are numerous automated methods for predicting secondary structure from multiply aligned protein sequences. Some good references on the subject include (the acronyms in parentheses given after each reference refer to the associated WWW servers, given below):
•        Zvelebil, M.J.J.M., Barton, G.J., Taylor, W.R. & Sternberg, M.J.E. (1987). Prediction of Protein Secondary Structure and Active Sites Using the Alignment of Homologous Sequences Journal of Molecular Biology, 195, 957-961. (ZPRED)
•        Rost, B. & Sander, C. (1993), Prediction of protein secondary structure at better than 70 % Accuracy, Journal of Molecular Biology, 232, 584-599. (PHD)
•        Salamov A.A. & Solovyev V.V. (1995), Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments. Journal of Molecular Biology, 247,1 (NNSSP)
•        Geourjon, C. & Deleage, G. (1994), SOPM : a self optimised prediction method for protein secondary structure prediction. Protein Engineering, 7, 157-16. (SOPMA)
•        Solovyev V.V. & Salamov A.A. (1994) Predicting alpha-helix and beta-strand segments of globular proteins. Computer Applications in the Biosciences, 10, 661-669. (SSP)
•        Wako, H. & Blundell, T. L. (1994), Use of amino-acid environment-dependent substitution tables and conformational propensities in structure prediction from aligned sequences of homologous proteins. 2. Secondary Structures, Journal of Molecular Biology, 238, 693-708.
•        Mehta, P., Heringa, J. & Argos, P. (1995), A simple and fast approach to prediction of protein secondary structure from multiple aligned sequences with accuracy above 70 %. Protein Science, 4, 2517-2525. (SSPRED)
•        King, R.D. & Sternberg, M.J.E. (1996) Identification and application of the concepts important for accurate and reliable protein secondary structure prediction. Protein Sci,5, 2298-2310. (DSC).
Nearly all of these now run via the world wide web. For individual details, see the papers for the individual methods, or follow the acronyms given after most of the references above (note that you can also run the methods by going to the appropriate WWW site).
Manual intervention
It has long been recognised that patterns of residue conservation are indicative of particular secondary structure types. Alpha helices have a periodicity of 3.6 residues per turn, which means that residues at positions i, i+3, i+4 and i+7 (where i is a residue in an alpha helix) lie on one face of the helix. Many alpha helices in proteins are amphipathic, meaning that one face points towards the hydrophobic core and the other towards the solvent. Thus patterns of hydrophobic residue conservation showing the i, i+3, i+4, i+7 pattern are highly indicative of an alpha helix.
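The i, i+3, i+4, i+7 check can be sketched in a few lines for an alignment. This is a minimal illustration, not a published method: the hydrophobic residue set, the 80 % conservation cutoff and the function name are assumptions, and it only tests the hydrophobic positions without checking that the positions in between are polar.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumption: one simple choice of hydrophobic residues

def amphipathic_starts(alignment, offsets=(0, 3, 4, 7), min_fraction=0.8):
    """Alignment positions i where the columns at i+offset are all mostly hydrophobic.

    alignment: list of equal-length aligned sequences, with '-' for gaps.
    """
    def conserved_hydrophobic(column):
        residues = [r for r in column if r != "-"]
        if not residues:
            return False
        hits = sum(r in HYDROPHOBIC for r in residues)
        return hits / len(residues) >= min_fraction

    flags = [conserved_hydrophobic(col) for col in zip(*alignment)]
    span = max(offsets)
    return [i for i in range(len(flags) - span)
            if all(flags[i + k] for k in offsets)]
```

Passing a different `offsets` tuple adapts the same check to other periodic conservation patterns.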
For example, this helix in myoglobin has this classic pattern of hydrophobic and polar residue conservation (i = 1):

Similarly, the geometry of beta strands means that adjacent residues have their side chains pointing in opposite directions. Beta strands that are half buried in the protein core will tend to have hydrophobic residues at positions i, i+2, i+4, i+6, etc., and polar residues at positions i+1, i+3, i+5, etc.
For example, this beta strand in CD8 shows this classic pattern:

Beta strands that are completely buried (as is often the case in proteins containing both alpha helices and beta strands) usually contain a run of hydrophobic residues, since both faces are buried in the protein core.
This strand from Chemotaxis protein CheY is a good example:

The principle behind most manual secondary structure predictions is to look for patterns of residue conservation that are indicative of secondary structures like those shown above. It has been shown in numerous successful examples that this strategy often leads to nearly perfect predictions. The work of Barton et al., Niermann & Kirschner, Bazan and Benner & co-workers provides good starting points for doing this sort of work oneself. Some useful references are:
•        Recent reviews on the subject (and on secondary structure prediction generally) See also references therein
o        Rost, B., Schneider, R. & Sander, C. (1993), Trends in Biochemical Sciences, 18, 120-123.
o        Benner, S. A., Gerloff, D. L. & Jenny, T. F. (1994), Science, 265, 1642-1644.
o        Barton, G. J. (1995), Protein Secondary Structure Prediction, Current Opinion in Structural Biology,5, 372-376.
o        Russell, R. B. & Sternberg, M. J. E. (1995), Protein Structure Prediction: How Good Are We?, Current Biology, 5, 488-490.
•        Some guides for predicting structure:
o        Benner, S. A. (1989), Patterns of divergence in homologous proteins as indicators of tertiary and quaternary structure, Advances in Enzyme Regulation, 31, 219-236.
o        Benner, S. A. (1992), Predicting de novo the folded structure of proteins, Current Opinion in Structural Biology, 2, 402-412.
•        Some particular examples of protein secondary structure predictions:
o        Crawford, I. P., Niermann, T. & Kirschner, K. (1987), Predictions of secondary structure by evolutionary comparison: Application to the alpha subunit of tryptophan synthase, PROTEINS: Structure, Function and Genetics, 1, 118-129.
o        Bazan, J. F. (1990), Structural Design and Molecular Evolution of a Cytokine Receptor Superfamily,Proceedings of the National Academy of Science, 87, 6934-6938.
o        Benner, S. A. & Gerloff, D. (1990), Patterns of Divergence in Homologous Proteins and tertiary structure. A prediction of the structure of the catalytic domain of protein kinases, Advances in Enzyme Regulation, 31, 121-181.
o        Jenny, T. F. & Benner, S. A. (1994) A prediction of the secondary structure of the pleckstrin homology domain, PROTEINS: Structure, Function and Genetics, 20, 1-3.
o        Benner, S. A., Badcoe, I., Cohen, M. A. and Gerloff, D. L. (1993) Predicted secondary structure for the src homology 3 domain, Journal of Molecular Biology, 229, 295-305.
o        Gerloff, D. L., Jenny, T. F., Knecht, L. J., Gonnet, G.H. & Benner, S. A. (1993), The nitrogenase MoFe protein. A secondary structure prediction. FEBS Letters, 318, 118-124.
o        Gerloff, D. L., Chelvanayagam, G. & Benner, S. A. (1995), A predicted consensus structure for the protein-kinase c2 homology (c2h) domain, the repeating unit of synaptotagmin, PROTEINS: Structure, Function and Genetics, 22, 299-310.
o        Barton, G. J., Newman, R. H., Freemont, P. F. & Crumpton, M. J. (1991), Amino acid sequence analysis of the annexin super-gene family of proteins, European Journal of Biochemistry, 198, 749-760.
o        Russell, R. B., Breed, J. & Barton, G. J., (1992) Conservation analysis and secondary structure prediction of the SH2 family of phosphotyrosine binding domains, FEBS Letters, 304, 15-20.
o        Livingstone, C. D. & Barton, G. J. (1994), Secondary structure prediction from multiple sequence data: Blood clotting factor XII and Yersinia protein tyrosine phosphatase, International Journal of Peptide and Protein Research
o        Barton, G. J., Barford, D. A. & Cohen, P. T. (1994), European Journal of Biochemistry, 220, 225-237.
o        Perkins, S. J., Smith K. F., Williams, S. C., Haris, P. I., Chapman, D. & Sim, R. B. (1994), The secondary structure of the von Willebrand Factor Type A Domain in Factor B of Human Complement by Fourier Transform Infrared Spectroscopy, Journal of Molecular Biology, 238, 104-119.
o        Edwards, Y. J. K. & Perkins, S. J., (1995) The protein fold of the von Willebrand factor type A is predicted to be similar to the open twisted beta-sheet flanked by alpha-helices found in human ras-p21, 358, 283-286.
o        Lupas, A., Koster, A. J., Walz, J. & Baumeister, W. (1994) Predicted secondary structure of the 20S proteasome and model structure of the putative peptide channel, FEBS Letters, 354, 45-49.
A strategy for secondary structure prediction
In practice, I recommend getting as many state-of-the-art prediction approaches as possible and combining this with some human insight to give a consensus prediction for the family. If you then align all of your predictions (including ideas you have based on residue conservation) with your multiple sequence alignment you can get a consensus picture of the structure. For example, here is part of an alignment of a family of proteins I looked at recently:

In this figure, three automated secondary structure predictions (PHD, SOPMA and SSPRED) appear below the alignment of 12 glutamyl tRNA reductase sequences. Positions within the alignment showing a conservation of hydrophobic side-chain character are shown in yellow, and those showing near total conservation of non-hydrophobic residues (often indicative of active sites) are coloured green.
Predictions of accessibility performed by PHD (PHD Acc. Pred.) are also shown (b = buried, e = exposed), as is a prediction I performed by looking for patterns indicative of the three secondary structure types shown above. For example, positions (within the alignment) 38-45 exhibit the classical amphipathic helix pattern of hydrophobic residue conservation, with positions i, i+3, i+4 and i+7 showing a conservation of hydrophobicity, with intervening positions being mostly polar. Positions 13-16 comprise a short stretch of conserved hydrophobic residues, indicative of a beta-strand, similar to the example from CheY protein shown above.
By looking for these patterns I built up a prediction of the secondary structure for most regions of the protein. Note that most methods - automated and manual - agree for many regions of the alignment.
Given the results of several methods of predicting secondary structure, one can build up a consensus picture of the secondary structure, such as that shown at the bottom of the alignment above.
Note that you can get predictions like the above (i.e. consensus predictions) from the very useful JPRED server.
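If you want to combine several predictions yourself, a simple per-position majority vote is one way to start. This is a sketch under assumptions: it expects equal-length strings of H (helix), E (strand) and C (coil) that already refer to the same alignment positions, and the function name and the "?" marker for disagreement are made up for the example. It is not what JPRED does internally.

```python
from collections import Counter

def consensus_prediction(predictions, min_agreement=2):
    """Per-position majority vote over equal-length strings of H, E and C."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    consensus = []
    for states in zip(*predictions):
        state, votes = Counter(states).most_common(1)[0]
        # '?' marks positions without enough agreement, left for manual inspection
        consensus.append(state if votes >= min_agreement else "?")
    return "".join(consensus)
```

The positions marked "?" are exactly where human insight about residue conservation is most useful.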
Post #8, 2010-09-14 01:52:29

Fold recognition methods and links
Links to some fold recognition methods (names only):

•Methods that run via the web:
    o 3D-pssm (this site)
    o TOPITS (EMBL)
    o UCLA-DOE Structure Prediction Server (UCLA)
    o 123D
    o UCSC HMM (UCSC)
    o FAS (Burnham Institute)
•Methods with executables or source code available:
    o THREADER (Warwick)
    o ProFIT CAME (Salzburg)
•Other related links:
    o Protein Structure Prediction Centre (US)
    o CASP1
    o CASP2
    o CASP3
    o UCLA-DOE Fold-Recognition Benchmark Home Page
Even if no homologue of known 3D structure exists, it may still be possible to use fold recognition to find, among the known 3D structures, the fold closest to that of your protein.

3D structural similarities:
Ab initio prediction of protein 3D structures is not possible at present, and a general solution to the protein folding problem is not likely to be found in the near future. However, it has long been recognised that proteins often adopt similar folds despite no significant sequence or functional similarity and that nature is apparently restricted to a limited number of protein folds.
There are numerous protein structure classifications now available via the WWW:
•        SCOP (MRC Cambridge)
•        CATH (University College, London)
•        FSSP (EBI, Cambridge)
•        3 Dee (EBI, Cambridge)
•        HOMSTRAD (Biochemistry, Cambridge)
•        VAST (NCBI, USA)
Thus for many proteins (~ 70%) there will be a suitable structure in the database from which to build a 3D model. Unfortunately, the lack of sequence similarity will mean that many of these go undetected until after 3D structure determination.
The goal of fold recognition
Methods of protein fold recognition attempt to detect similarities between protein 3D structures that are not accompanied by any significant sequence similarity. There are many approaches, but the unifying theme is to try to find folds that are compatible with a particular sequence. Unlike sequence-only comparison, these methods take advantage of the extra information made available by 3D structure information. In effect, they turn the protein folding problem on its head: rather than predicting how a sequence will fold, they predict how well a fold will fit a sequence.
Some related papers (omitted)

The structure was correctly predicted to adopt a ras-p21 type fold
The realities of fold recognition
Despite initially promising results, methods of fold recognition are not always accurate. Guides to the accuracy of protein fold recognition can be found in the proceedings of the Critical Assessment of Structure Predictions (CASP) conferences. At the first meeting in 1994 (CASP1) the methods were found to be about 50 % accurate at best with respect to their ability to place a correct fold at the top of a ranked list. Though many methods failed to detect the correct fold at the top of a ranked list, a correct fold was often found in the top 10 scoring folds. Even when the methods were successful, alignments of sequence on to protein 3D structure were usually incorrect, meaning that comparative modelling performed using such models would be inaccurate.
The CASP2 meeting, held in December 1996, showed that many of the methods had improved, though it is difficult to compare the results of the two assessments (i.e. CASP1 & CASP2) since very different criteria were used to assess correct answers. It would be foolish and over-ambitious for me to present a detailed assessment of the results here. However, an important thing to note is that Murzin & Bateman managed to attain near 100% success by the use of careful human insight, a knowledge of known structures, secondary structure predictions and thoughts about the function of the target sequences. Their results strongly support the arguments given below that human insight can be a powerful aid during fold recognition. A summary of the results from this meeting can be found in the PROTEINS issue dedicated to the meeting (PROTEINS, Suppl 1, 1997).
The CASP3 meeting was held in December 1998. It showed some progress in the ability of fold recognition methods to detect correct protein folds and in the quality of alignments obtained. A detailed summary of the results will appear towards the end of 1999 in the PROTEINS supplement.
For my talk, I did a crude assessment of 5 methods of fold recognition. I took 12 proteins of known structure (3 from each folding class) and ran each of the five methods using default parameters. I then asked how often a correct fold (not allowing trivial, sequence-detectable folds) was found in the first rank, or in the top 10 scoring folds. I also asked how often the method found the correct folding class in the first rank. The results are summarised here in a PostScript file.
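The kind of tally described above (how often the correct fold is ranked first, or within the top 10) is easy to reproduce for your own benchmark. The sketch below is only an illustration of that bookkeeping; the function name and the data layout are assumptions, and no results from the original assessment are reproduced here.

```python
def top_k_rate(results, k):
    """Fraction of targets whose correct fold appears in the first k ranked hits.

    results: list of (correct_fold, ranked_hits) pairs, hits ordered best first.
    """
    if not results:
        raise ValueError("no results to assess")
    found = sum(correct in hits[:k] for correct, hits in results)
    return found / len(results)
```

Calling it with k=1 and k=10 for each method gives the two success rates discussed in the text.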
Perhaps the worst result from this study is shown below:

One method suggested that the sequence for the Probe (left) (a four helix bundle) would best fit onto the structure shown on the right (an OB fold, comprising a six stranded barrel).
The results suggest that one should use caution when using these methods. In spite of this, the methods remain very useful.
A practical approach:
Although they are not 100 % accurate, the methods are still very useful. To use the methods I would suggest the following:
•        Run as many methods as you can, and run each method on as many sequences (from your homologous protein family) as you can. The methods almost always give somewhat different answers with the same sequences. I have also found that a single method will often give different results for sets of homologous sequences, so I would also suggest running each method on as many homologoues as possible. After all of these runs, one can build up a consensus picture of the likely fold in a manner similar to that used for secondary structure prediction above.
•        Remember the expected accuracy of the methods, and don't use them as black-boxes. Remember that a correct fold may not be at the top of the list, but that it is likely to be in the top 10 scoring folds.
•        Think about the function of your protein, and look into the function of the proteins that have been found by the various methods. If you see a functional similarity, then you may have detected a weak sequence homologue, or remote homologue. At CASP2, as said above, Murzin & Bateman managed to obtain remarkably accurate predictions by identification of remote homologues. Their paper appeared in the PROTEINS supplement for the CASP2 experiment:
Murzin AG, Bateman A (1997) Distant homology recognition using structural classification of proteins. Proteins, Suppl 1, 105-112.
and provides some key insights into protein fold recognition using humans rather than computers.
•        Don't trust the alignments that are output by the programs. They can be used as a starting point, but the best alignment of sequence on to tertiary structure is still likely to come from careful human intervention. One strategy for doing this is discussed in the next section.
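The consensus picture suggested in the first two points can be built with a simple vote count. The following is a minimal sketch, assuming each run (one method on one homologue) has already been reduced to a ranked list of fold names. The function name `consensus_folds` and the fold labels are hypothetical, and real program output would need to be parsed into this form first.

```python
from collections import Counter


def consensus_folds(runs, top_n=10):
    """Count in how many runs each fold appears among the top N hits.

    runs is a list of ranked fold lists, one per (method, sequence) run,
    best score first. Returns (fold, votes) pairs, most votes first.
    """
    votes = Counter()
    for ranked in runs:
        # A fold gets at most one vote per run, wherever it sits in the top N.
        votes.update(set(ranked[:top_n]))
    # Sort by votes, then by name, so that ties are reported in a stable order.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical ranked lists from four runs, for illustration only.
runs = [
    ["foldA", "foldB", "foldC"],
    ["foldB", "foldA"],
    ["foldC", "foldB", "foldD"],
    ["foldB"],
]

for fold, count in consensus_folds(runs):
    print(fold, count)
```

Counting presence in the top 10 rather than only first-rank hits follows the point above that a correct fold may not be at the top of the list. The vote count is only a guide to which folds deserve a closer look at function and alignment.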

[ Last edited by cnlics on 2010-9-19 at 16:59 ]
9楼2010-09-14 01:55:58

cnlics

木虫 (小有名气)

Analysis of protein folds and alignment of secondary structure elements
________________________________________
If you have predicted that your protein will adopt a particular fold within the database, then an important thing to consider is which fold your protein belongs to, and which other proteins adopt a similar fold. To find out, look at one of the following databases:
•        SCOP (MRC Cambridge)
•        CATH (University College, London)
•        FSSP (EBI, Cambridge)
•        3Dee (EBI, Cambridge)
•        HOMSTRAD (Biochemistry, Cambridge)
•        VAST (NCBI, USA)
(Note that these databases don't always agree as to what constitutes a similar fold, so I would recommend looking at as many of them as possible).
If your predicted fold has many "relatives", then have a look at what they are. Ask:
•        Do any members of the fold show functional similarity to your protein? If so, you may be able to back up your prediction of fold (possibly by the conservation of active site residues, or the approximate location of active site residues, etc.)
•        Is this fold a superfold? If so, does this superfold contain a supersite? Certain folds show a tendency to bind ligands in a common location, even in the absence of any functional or clear evolutionary relationships. For an explanation of this, please see our work on supersites.
•        Are there core secondary structure elements that should really be present in any member of the fold?
•        Are there non-core secondary structure elements that might not be present in all members of the fold?
Core secondary structure elements, such as those comprising a beta-barrel, should really be present in a fold. If your predicted secondary structures can't be made to match up with what you think is the core of the protein fold, then your prediction of fold may be wrong (but be careful, since your secondary structure prediction may contain errors). You can also use your prediction together with the core secondary structure elements to derive an alignment of predicted and observed secondary structures.
For example, we predicted that the glutamyl tRNA reductases (hemA family) would adopt an alpha-beta barrel fold using a combination of fold recognition and secondary structure prediction methods. We aligned the secondary structures of diverse members of the alpha-beta barrel fold using a structural alignment program, and aligned the secondary structures to the core (boxed below) secondary structure elements.

In the alignment above, each alpha and beta character refers to an entire secondary structure element. Those that are boxed are core secondary structure elements found in most members of the fold. The alignment of predicted secondary structures to the core elements appears at the bottom of the figure. Note that I have had to delete several alpha helices and beta strands from our prediction to allow for alignment. This is not surprising, because insertions or deletions of secondary structure elements are common across the diverse set of proteins that adopt this fold.
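The idea of matching predicted secondary structure elements to the core elements, deleting predicted elements where necessary, can be sketched as an in-order matching problem. In this sketch each element is a single character ("A" for a helix, "B" for a strand). The function name `align_to_core` and the two strings are made up for illustration and are not the hemA prediction or the real core of any fold. A careful manual alignment would also weigh element lengths and loop sizes, which this sketch ignores.

```python
def align_to_core(predicted, core):
    """Match core elements, in order, to predicted elements.

    predicted and core are strings with one character per secondary
    structure element. Returns the indices of the predicted elements
    matched to each core element (leftmost possible match), or None if
    the core cannot be matched even after deleting predicted elements.
    """
    matched = []
    pos = 0
    for element in core:
        # Skip (delete) predicted elements until one of the right type appears.
        while pos < len(predicted) and predicted[pos] != element:
            pos += 1
        if pos == len(predicted):
            return None  # ran out of predicted elements: core does not fit
        matched.append(pos)
        pos += 1
    return matched


# Hypothetical element strings, for illustration only.
predicted = "ABAABABBA"
core = "BABAB"

matched = align_to_core(predicted, core)
deleted = [i for i in range(len(predicted)) if i not in matched]
print("matched:", matched)
print("deleted:", deleted)
```

If the function returns None, the predicted elements cannot accommodate the core at all, which is the situation described above where either the fold prediction or the secondary structure prediction should be questioned.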
10楼2010-09-14 01:58:01