cnlics, Muchong (somewhat well known)
[Discussion]
[Share] Protein structure prediction workflow
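I will translate this bit by bit and post it as I go. The material here is something I collected a while ago; it should be from EMBL. I have skimmed through it, and it is not yet out of date.
The Word document can be downloaded here: http://ifile.it/dwzy278
The general workflow for protein structure prediction is shown in the figure below:
[Figure: flowchart of the general protein structure prediction workflow]
Contents:
• Relevant experimental data
• Sequence data and preliminary analysis
• Searching sequence databases
• Identifying domains
• Multiple sequence alignment
• Comparative or homology modelling
• Secondary structure prediction
• Fold recognition
• Fold analysis and secondary structure alignment
• Alignment of sequence to structure
[ Last edited by cnlics on 2010-9-16 at 08:24 ]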
cnlics
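Muchong (somewhat well known)
- Help given: 2 (kindergarten)
- Coins: 3014.2
- Red flowers: 4
- Posts: 270
- Online: 422.4 hours
- User ID: 795158
- Registered: 2009-06-16
- Gender: GG (male)
- Field: contemporary religion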
Fold recognition methods and links

Links to some fold recognition methods (names only):
• Methods that run via the web:
o 3D-pssm (this site)
o TOPITS (EMBL)
o UCLA-DOE Structure Prediction Server (UCLA)
o 123D
o UCSC HMM (UCSC)
o FAS (Burnham Institute)
• Methods with executable programs or code:
o THREADER (Warwick)
o ProFIT CAME (Salzburg)
• Other related links:
o Protein Structure Prediction Centre (US)
o CASP1
o CASP2
o CASP3
o UCLA-DOE Fold-Recognition Benchmark Home Page

Even when no homologue of known 3D structure exists, it may still be possible to use fold recognition methods to find, among the known 3D structures, the fold closest to that of the unknown protein.

3D structural similarities

Ab initio prediction of protein 3D structures is not possible at present, and a general solution to the protein folding problem is not likely to be found in the near future. However, it has long been recognised that proteins often adopt similar folds despite no significant sequence or functional similarity, and that nature is apparently restricted to a limited number of protein folds.

There are numerous protein structure classifications now available via the WWW:
• SCOP (MRC Cambridge)
• CATH (University College, London)
• FSSP (EBI, Cambridge)
• 3Dee (EBI, Cambridge)
• HOMSTRAD (Biochemistry, Cambridge)
• VAST (NCBI, USA)

Thus for many proteins (~70%) there will be a suitable structure in the database from which to build a 3D model. Unfortunately, the lack of sequence similarity will mean that many of these go undetected until after 3D structure determination.

The goal of fold recognition

Methods of protein fold recognition attempt to detect similarities between protein 3D structures that are not accompanied by any significant sequence similarity. There are many approaches, but the unifying theme is to try to find folds that are compatible with a particular sequence. Unlike sequence-only comparison, these methods take advantage of the extra information made available by 3D structure. In effect, they turn the protein folding problem on its head: rather than predicting how a sequence will fold, they predict how well a fold will fit a sequence.
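To make the idea of "how well a fold will fit a sequence" concrete, here is a deliberately tiny sketch. It is not the algorithm of any server listed above: it assumes each fold is reduced to a string of buried (`B`) or exposed (`E`) positions, uses a crude hand-picked set of hydrophobic residues, and compares without gaps. The fold names, environment strings, and test sequence are all made up for illustration.

```python
# Toy illustration only: real fold recognition methods use far richer
# scoring functions and allow gaps in the sequence-to-structure alignment.
HYDROPHOBIC = set("AVLIMFWC")  # assumed, crude hydrophobic residue class


def compatibility(sequence, environments):
    """Score how well `sequence` fits a fold whose positions are labelled
    'B' (buried) or 'E' (exposed). Ungapped, so lengths must be equal."""
    if len(sequence) != len(environments):
        raise ValueError("sequence and fold must have the same length")
    score = 0
    for residue, env in zip(sequence, environments):
        is_hydrophobic = residue in HYDROPHOBIC
        # +1 for hydrophobic-in-buried or polar-in-exposed, otherwise -1.
        score += 1 if is_hydrophobic == (env == "B") else -1
    return score


def rank_folds(sequence, fold_library):
    """Return (fold_name, score) pairs sorted from best to worst fit."""
    scores = [(name, compatibility(sequence, envs))
              for name, envs in fold_library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical fold library: names and environment strings are invented.
library = {
    "fold_A": "BEBEBEBE",
    "fold_B": "EEEEBBBB",
    "fold_C": "BBBBEEEE",
}
ranking = rank_folds("LKVDIEAS", library)
print(ranking)
```

The point of the sketch is the direction of the question: the sequence is fixed and each known fold is scored against it, which yields a ranked list of candidate folds rather than a single predicted structure.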
Some related articles (omitted)

The structure was correctly predicted to adopt a ras-p21 type fold.

The realities of fold recognition

Despite initially promising results, methods of fold recognition are not always accurate. Guides to the accuracy of protein fold recognition can be found in the proceedings of the Critical Assessment of Structure Predictions (CASP) conferences. At the first meeting in 1994 (CASP1) the methods were found to be about 50% accurate at best with respect to their ability to place a correct fold at the top of a ranked list. Though many methods failed to detect the correct fold at the top of a ranked list, a correct fold was often found in the top 10 scoring folds. Even when the methods were successful, alignments of sequence on to protein 3D structure were usually incorrect, meaning that comparative modelling performed using such models would be inaccurate.

The CASP2 meeting held in December 1996 showed that many of the methods had improved, though it is difficult to compare the results of the two assessments (i.e. CASP1 and CASP2) since very different criteria were used to assess correct answers. It would be foolish and over-ambitious for me to present a detailed assessment of the results here. However, an important thing to note is that Murzin & Bateman managed to attain near 100% success by the use of careful human insight, a knowledge of known structures, secondary structure predictions and thoughts about the function of the target sequences. Their results strongly support the arguments given below that human insight can be a powerful aid during fold recognition. A summary of the results from this meeting can be found in the PROTEINS issue dedicated to the meeting (PROTEINS, Suppl 1, 1997).

The CASP3 meeting was held in December 1998. It showed some progress in the ability of fold recognition methods to detect correct protein folds and in the quality of alignments obtained.
A detailed summary of the results will appear towards the end of 1999 in the PROTEINS supplement.

For my talk, I did a crude assessment of 5 methods of fold recognition. I took 12 proteins of known structure (3 from each folding class) and ran each of the five methods using default parameters. I then asked how often a correct fold (not allowing trivial, sequence-detectable folds) was found in the first rank, or in the top 10 scoring folds. I also asked how often the method found the correct folding class in the first rank. The results are summarised here in a PostScript file.

Perhaps the worst result from this study is shown below: one method suggested that the sequence for the probe (left), a four-helix bundle, would best fit onto the structure shown on the right, an OB fold comprising a six-stranded barrel.

The results suggest that one should use caution when using these methods. In spite of this, they remain very useful.

A practical approach

To get the most out of methods that are not 100% accurate, I would suggest the following:
• Run as many methods as you can, and run each method on as many sequences (from your homologous protein family) as you can. The methods almost always give somewhat different answers with the same sequences. I have also found that a single method will often give different results for sets of homologous sequences, so I would also suggest running each method on as many homologues as possible. After all of these runs, one can build up a consensus picture of the likely fold in a manner similar to that used for secondary structure prediction above.
• Remember the expected accuracy of the methods, and don't use them as black boxes. Remember that a correct fold may not be at the top of the list, but that it is likely to be in the top 10 scoring folds.
• Think about the function of your protein, and look into the function of the proteins that have been found by the various methods. If you see a functional similarity, then you may have detected a weak sequence homologue, or remote homologue. At CASP2, as said above, Murzin & Bateman managed to obtain remarkably accurate predictions by identification of remote homologues. Their paper appeared in the PROTEINS supplement for the CASP2 experiment and provides some key insights into protein fold recognition using humans rather than computers: Murzin AG, Bateman A (1997) Distant homology recognition using structural classification of proteins. Proteins, Suppl 1, 105-112.
• Don't trust the alignments that are output by the programs. They can be used as a starting point, but the best alignment of sequence on to tertiary structure is still likely to come from careful human intervention. One strategy for doing this is discussed in the next section.
[ Last edited by cnlics on 2010-9-19 at 16:59 ]
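The consensus step in the first two bullets can be tallied with a few lines of code. The sketch below assumes you have already collected, for every (method, homologue) run, a best-first list of fold names; the run labels and fold names are placeholders, and the function name `consensus_folds` is my own. It counts how many runs place each fold within the top `top_k` ranks rather than looking only at rank 1.

```python
from collections import Counter


def consensus_folds(runs, top_k=10):
    """Count in how many runs each fold appears within the top `top_k` ranks.

    `runs` maps a run label, e.g. (method, homologue), to a list of fold
    names ordered from best to worst score.
    """
    votes = Counter()
    for ranked in runs.values():
        # set() so that a fold listed twice in one run still counts once.
        votes.update(set(ranked[:top_k]))
    # Best-supported folds first; ties are broken alphabetically.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results: three methods, two homologues, invented fold names.
runs = {
    ("method_1", "homologue_a"): ["fold_X", "fold_Y", "fold_Z"],
    ("method_1", "homologue_b"): ["fold_Y", "fold_X", "fold_W"],
    ("method_2", "homologue_a"): ["fold_Z", "fold_Y", "fold_Q"],
    ("method_3", "homologue_a"): ["fold_Q", "fold_W", "fold_Y"],
}
consensus = consensus_folds(runs, top_k=3)
print(consensus)
```

A fold that no single method ranks first can still come out on top of such a tally, which is exactly the situation the second bullet warns about. The tally is only a starting point for the functional reasoning in the third bullet, not a replacement for it.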
Post #9, 2010-09-14 01:55:58
cnlics
Post #2, 2010-09-14 01:41:00
cnlics
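Protein sequence data

Some preliminary analysis of the protein sequence is worthwhile. For example, if the protein comes directly from a gene prediction, it may contain multiple domains. More seriously, it may contain regions that are unlikely to be globular or soluble. This flowchart assumes that your protein is soluble, is probably a single domain, and contains no non-globular regions.

Consider the following:
• Is the protein a transmembrane protein, or does it contain transmembrane segments? There are many methods for predicting these segments, including:
o TMAP (EMBL)
o PredictProtein (EMBL/Columbia)
o TMHMM (CBS, Denmark)
o TMpred (Baylor College)
o DAS (Stockholm)
• Does the protein contain coiled coils? You can predict coiled coils at the COILS server or download the COILS program (it has recently been rewritten; note that the GCG package includes a version of COILS).
• Does the protein contain low-complexity regions? Proteins often contain several polyglutamate or polyserine stretches, which are hard to predict. You can check for them with SEG (the GCG package includes a version of SEG).

If any of the above applies, you should break the sequence into pieces, ignore particular sections of it, and so on. This issue is related to the problem of locating domains.
[ Last edited by cnlics on 2010-9-16 at 08:25 ]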
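As a rough illustration of the low-complexity check, the sketch below flags stretches whose amino acid composition is highly repetitive by computing Shannon entropy in a sliding window. This is a simplified stand-in and not the SEG program itself; the window length of 12 and the 2.2-bit threshold are illustrative assumptions, and the example sequence is invented.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the residue composition of `window`."""
    total = len(window)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(window).values())


def low_complexity_regions(sequence, window=12, threshold=2.2):
    """Return merged half-open (start, end) intervals covered by windows
    whose composition entropy is at or below `threshold` bits."""
    regions = []
    for start in range(len(sequence) - window + 1):
        if shannon_entropy(sequence[start:start + window]) <= threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# Invented sequence with a 15-residue polyserine run starting at index 22.
seq = "MKTAYIAKQRQISFVKSHFSRQ" + "S" * 15 + "LEERLGLIEVQAPILSRVGDGT"
regions = low_complexity_regions(seq)
print(regions)
```

The reported intervals can then be masked or cut out before the sequence is passed to the search and prediction steps, in line with the advice to ignore particular sections of the sequence.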
Post #3, 2010-09-14 01:41:58
cnlics
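Searching sequence databases

The obvious first step in analysing any new sequence is to search sequence databases for homologues. Such searches can be run anywhere and on any computer. In addition, many web servers offer these searches: you type or paste a sequence into the server and receive the results interactively.

There are many methods for sequence searching; at present the best known is the BLAST program. Versions that run locally are easy to obtain (from NCBI or Washington University), and many web pages let you compare a protein or DNA sequence against databases of gene or protein sequences. To name just a few:
• National Center for Biotechnology Information (USA) Searches
• European Bioinformatics Institute (UK) Searches
• BLAST search through SBASE (domain database; ICGEB, Trieste)
• and many more sites

An important recent advance in sequence comparison is the development of gapped BLAST and PSI-BLAST (position-specific iterated BLAST), both of which make BLAST more sensitive. The latter takes the results of a search, builds a profile from them, and then uses the profile to search the database again for further homologues (this process can be repeated until no new sequences are found), which lets it detect homologues at a very large evolutionary distance. It is important to compare your protein sequence against the database with PSI-BLAST, looking for any protein of known structure, before using the methods described in the following sections.

Other methods for comparing a single sequence to a database include:
• FASTA package (William Pearson, University of Virginia, USA)
• SCANPS (Geoff Barton, European Bioinformatics Institute, UK)
• BLITZ (Compugen's fast Smith Waterman search)
• other methods

It is also possible to use multiple sequence information to perform more sensitive searches. This involves building a profile from some kind of multiple sequence alignment. A profile gives a score for each type of amino acid at each position in the sequence, and generally makes searches more sensitive. Tools for doing this include:
• PSI-BLAST (NCBI, Washington)
• ProfileScan Server (ISREC, Geneva)
• HMMER hidden Markov models (Sean Eddy, Washington University)
• Wise package (Ewan Birney, Sanger Centre; for protein-to-DNA comparisons)
• other methods

A different approach for incorporating multiple sequence information into a database search is to use a MOTIF. Instead of giving every amino acid some kind of score at every position in an alignment, a motif ignores all but the most invariant positions in an alignment, and just describes the key residues that are conserved and define the family. Sometimes this is called a "signature". For example, "H-[FW]-x-[LIVM]-x-G-x(5)-[LV]-H-x(3)-[DE]" describes a family of DNA binding proteins.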
It can be translated as "histidine, followed by either a phenylalanine or tryptophan, followed by any amino acid (x), followed by leucine, isoleucine, valine or methionine, followed by any amino acid (x), followed by glycine,... [etc.]".

PROSITE (ExPASy Geneva) contains a huge number of such patterns, and several sites allow you to search these data:
• ExPASy
• EBI

It is best to search a few different databases in order to find as many homologues as possible. A very important thing to do, and one which is sometimes overlooked, is to compare any new sequence to a database of sequences for which 3D structure information is available. Whether or not your sequence is homologous to a protein of known 3D structure is not obvious in the output from many searches of large sequence databases. Moreover, if the homology is weak, the similarity may not be apparent at all during the search through a larger database.

One last thing to remember is that one can save a lot of time by making use of pre-prepared protein alignments. Many of these alignments are hand edited by experts on the particular protein families, and thus represent probably the best alignment one can get given the data they contain (i.e. they are not always as up to date as the most recent sequence databases). These databases include:
• SMART (Oxford/EMBL)
• PFAM (Sanger Centre/Wash-U/Karolinska Institutet)
• COGS (NCBI)
• PRINTS (UCL/Manchester)
• BLOCKS (Fred Hutchinson Cancer Research Centre, Seattle)
• SBASE (ICGEB, Trieste)

There are generally many ways of comparing a protein sequence against these databases, and they are very useful for identifying domains.
[ Last edited by cnlics on 2010-9-14 at 19:54 ]
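The motif notation quoted earlier in this post maps almost directly onto a regular expression, which is a handy way to scan your own sequences for a signature. The sketch below is a minimal converter covering only the elements used in that example plus `{..}` exclusions and `(n,m)` ranges; the function name `prosite_to_regex` is my own, the test sequence is invented, and anchors such as `<` and `>` are not handled.

```python
import re

# One pattern element: x, a residue, [allowed] or {forbidden},
# optionally followed by a repeat count (n) or a range (n,m).
ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                          # any amino acid
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except these
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


regex = prosite_to_regex("H-[FW]-x-[LIVM]-x-G-x(5)-[LV]-H-x(3)-[DE]")
print(regex)

# Invented sequence built piece by piece so that the motif starts at index 2.
sequence = "AA" + "HFALAG" + "AAAAA" + "LH" + "AAA" + "D" + "KK"
hit = re.search(regex, sequence)
print(hit.start(), hit.group())
```

A regular expression match is all-or-nothing, which mirrors how a motif ignores everything except the conserved positions; a profile, by contrast, scores every residue at every position.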
Post #4, 2010-09-14 01:42:52














beimi