cnlics, Woodworm (somewhat well known)
[Discussion]
[Share] Protein structure prediction workflow (23 participants so far)
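I will translate this bit by bit and post as I go. The material posted here was collected some time ago and probably comes from EMBL; I skimmed through it and it is not out of date. The Word document can be downloaded here: http://ifile.it/dwzy278

The general workflow for protein structure prediction is shown in the figure below: [figure]

Contents:
• Relevant experimental data
• Sequence data and preliminary analysis
• Searching sequence databases
• Locating domains
• Multiple sequence alignment
• Comparative or homology modelling
• Secondary structure prediction
• Fold recognition
• Analysis of folds and alignment of secondary structures
• Alignment of sequence to structure

[ Last edited by cnlics on 2010-9-16 at 08:24 ]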
cnlics
Woodworm (somewhat well known)
- Assists: 2 (Kindergarten)
- Coins: 3014.2
- Flowers: 4
- Posts: 270
- Online: 422.4 hours
- User ID: 795158
- Registered: 2009-06-16
- Gender: male
- Field: Contemporary religion
Alignment of sequence to tertiary structure
________________________________________
Remember that the alignments of sequence onto tertiary structure that one gets from fold recognition methods may be inaccurate. Where one has identified a remote homologue, the fold recognition methods can sometimes give a very accurate alignment, though it is still often worthwhile to edit the alignment around variable regions (see the Multiple Sequence Alignment section for ways of doing this). In other cases, it may be wise to create your own alignment, starting from the alignment given by the fold recognition method and considering the alignment of secondary structures.

There is no generally accepted way of doing this, though one method (i.e. mine) involves:
• Ensuring that residues predicted to be buried/exposed align with those known to be buried/exposed in the template structure. Note that conserved hydrophobic/polar residues are more likely to be buried/exposed than non-conserved residues, which could simply be anomalies. One can predict residue accessibility manually, or with an automated server such as PHD.
• Ensuring that critical hydrogen-bonding patterns are not disrupted in beta-sheet structures.
• Conserving residue properties (i.e. size, polarity, hydrophobicity) as far as possible between the known and unknown structures.

For example, in trying to align the prediction for the glutamyl tRNA reductases (hemA) with one alpha/beta barrel structure (2acs):

[Sec. = known secondary structure from PDB code 2ACS (E = extended, H = alpha helix, G = 3-10 helix, B = beta-bridge); Bur. = known residue exposure for 2ACS (b = buried, h = half-buried, e = exposed); in/out = positioning of residues in the beta-barrel (i = pointing inwards, o = pointing outwards); Res. cons = conservation of residues (totally conserved = UPPER CASE, h = hydrophobic, p = polar, c = charged, a = aromatic, s = small, - = negative, + = positive); Pred = predicted burial and secondary structure for the glutamyl tRNA reductase family. Boxed positions are those with the same known/predicted burial. Shaded positions show conservation of hydrophobic character in BOTH families of proteins, and positions in inverse text show conservation of polar character in BOTH families.]

In the construction of this alignment, several things were considered:
• The observed residue burial or exposure
• The predicted residue burial or exposure
• The conservation of residue properties in the known and unknown structures
• Whether the side chains on the core beta-strands pointed in towards the barrel or out towards the helices
• The hydrogen-bonding pattern of the beta-strands comprising the core beta-barrel

Using an initial alignment from one of the fold recognition methods as a guide, the alignment above was created by trying to optimise the match of the features described above.

Remember that proteins with similar three-dimensional structures but little or no sequence similarity can differ substantially in the finer details of their structures (i.e. loops, precise orientation of side chains, orientation of secondary structures, etc.). See here for some work I did with Geoff Barton on this subject.
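The first criterion in the list (predicted burial should line up with observed burial) is easy to check mechanically when comparing candidate alignments. Below is a minimal sketch of such a check, not the author's actual procedure. It assumes the two burial rows use the legend's codes (b = buried, h = half-buried, e = exposed) with '-' for gaps; the function name `burial_agreement` and the example strings are made up for illustration.

```python
def burial_agreement(aligned_pred: str, aligned_obs: str) -> float:
    """Fraction of ungapped alignment columns whose burial states agree.

    aligned_pred: predicted burial for the unknown structure, one code per column
    aligned_obs:  observed burial for the template, one code per column
    Codes follow the legend: 'b' buried, 'h' half-buried, 'e' exposed, '-' gap.
    """
    if len(aligned_pred) != len(aligned_obs):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for pred, obs in zip(aligned_pred, aligned_obs):
        if pred == "-" or obs == "-":
            continue  # gap columns carry no burial information
        compared += 1
        if pred == obs:
            matches += 1
    return matches / compared if compared else 0.0


# Hypothetical candidates: two ways of aligning the same prediction
# against the same template burial row.
template = "bbeebb-e"
candidates = {"shifted": "ebbe-bhe", "edited": "bbee-bhe"}
best = max(candidates, key=lambda name: burial_agreement(candidates[name], template))
print(best, burial_agreement(candidates[best], template))
```

A score like this only covers one of the five considerations; hydrogen-bonding patterns and in/out orientation of strand residues still need to be inspected by hand.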
Post #11, 2010-09-14 01:59:28
cnlics
Post #2, 2010-09-14 01:41:00
cnlics
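Protein sequence data

There is some value in doing a preliminary analysis of the protein sequence. For example, if the protein has come straight from a gene prediction, it may contain multiple domains. More seriously, it may contain regions that are unlikely to be globular or soluble. This flowchart assumes that your protein is soluble, probably comprises a single domain, and contains no non-globular regions.

Things to consider:
• Is the protein a transmembrane protein, or does it contain transmembrane segments? There are many methods for predicting these segments, including:
o TMAP (EMBL)
o PredictProtein (EMBL/Columbia)
o TMHMM (CBS, Denmark)
o TMpred (Baylor College)
o DAS (Stockholm)
• Does the protein contain coiled coils? You can predict coiled coils at the COILS server, or download the COILS program (it has recently been rewritten; note that the GCG package includes a version of COILS).
• Does the protein contain low-complexity regions? Proteins often contain runs of poly-glutamic acid or poly-serine, which do not predict well. You can check for these with SEG (the GCG package includes a version of SEG).

If any of these apply, you should break the sequence into pieces, or ignore particular segments of the sequence, and so on. This issue is closely related to the problem of locating domains.

[ Last edited by cnlics on 2010-9-16 at 08:25 ]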
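To give a feel for what a low-complexity check does, here is a rough sketch that flags windows with low Shannon entropy of residue composition. It is a simplified stand-in and not the SEG algorithm; the window length (12), the threshold (2.2 bits), the function names, and the example sequence are all arbitrary assumptions chosen for illustration.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window."""
    n = len(window)
    counts = Counter(window)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_regions(seq: str, window: int = 12, threshold: float = 2.2):
    """Return merged (start, end) intervals (0-based, end exclusive) covered
    by at least one window whose entropy is at or below the threshold."""
    regions = []
    for i in range(len(seq) - window + 1):
        if window_entropy(seq[i:i + window]) <= threshold:
            start, end = i, i + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


# Made-up sequence with a poly-Q run in the middle.
example = "MKTAYIAKQR" + "Q" * 15 + "LEDFGHIKNP"
for start, end in low_complexity_regions(example):
    print(start, end, example[start:end])
```

Regions flagged this way are candidates for masking or for cutting the sequence into pieces before running structure prediction; for real work the dedicated tools listed above are the better choice.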
Post #3, 2010-09-14 01:41:58
cnlics
Searching sequence databases

The obvious first step in analysing any new sequence is to search the sequence databases for homologues. Such searches can be run anywhere and on any computer. In addition, many web servers perform such searches: you type or paste your sequence into the server and receive the results interactively.

There are also many methods for sequence searching. By far the best known is the BLAST program. A version that runs locally is easy to obtain (from NCBI or Washington University), and many web pages let you compare a protein or DNA sequence against a range of gene or protein sequence databases. To name just a few:
• National Center for Biotechnology Information (USA) Searches
• European Bioinformatics Institute (UK) Searches
• BLAST search through SBASE (domain database; ICGEB, Trieste)
• Many more sites

An important recent advance in sequence comparison is the development of gapped BLAST and PSI-BLAST (position-specific iterated BLAST). Both make BLAST more sensitive. The latter takes the results of one search, builds a profile from them, and then uses it to search the database again for further homologues (this process can be repeated until no new sequences are found), which allows it to detect very distant homologues. It is important, before moving on to the methods in the following sections, to compare your protein sequence against the databases with PSI-BLAST to see whether a known structure already exists.

Other methods for comparing a single sequence to a database include:
• The FASTA package (William Pearson, University of Virginia, USA)
• SCANPS (Geoff Barton, European Bioinformatics Institute, UK)
• BLITZ (Compugen's fast Smith Waterman search)
• Other methods

It is also possible to use multiple sequence information to perform more sensitive searches. Essentially this involves building a profile from some kind of multiple sequence alignment. A profile gives a score for each type of amino acid at each position in the sequence, and generally makes searches more sensitive. Tools for doing this include:
• PSI-BLAST (NCBI, Washington)
• ProfileScan Server (ISREC, Geneva)
• HMMER hidden Markov models (Sean Eddy, Washington University)
• Wise package (Ewan Birney, Sanger Centre; for comparing proteins against DNA)
• Other methods

A different approach for incorporating multiple sequence information into a database search is to use a MOTIF. Instead of giving every amino acid some kind of score at every position in an alignment, a motif ignores all but the most invariant positions in an alignment, and just describes the key residues that are conserved and define the family. Sometimes this is called a "signature". For example, "H-[FW]-x-[LIVM]-x-G-x(5)-[LV]-H-x(3)-[DE]" describes a family of DNA binding proteins. It can be translated as "histidine, followed by either a phenylalanine or tryptophan, followed by any amino acid (x), followed by leucine, isoleucine, valine or methionine, followed by any amino acid (x), followed by glycine,... [etc.]".

PROSITE (ExPASy Geneva) contains a huge number of such patterns, and several sites allow you to search these data:
• ExPASy
• EBI

It is best to search a few different databases in order to find as many homologues as possible. A very important thing to do, and one which is sometimes overlooked, is to compare any new sequence to a database of sequences for which 3D structure information is available. Whether or not your sequence is homologous to a protein of known 3D structure is not obvious in the output from many searches of large sequence databases. Moreover, if the homology is weak, the similarity may not be apparent at all during the search through a larger database.

One last thing to remember is that one can save a lot of time by making use of pre-prepared protein alignments. Many of these alignments are hand edited by experts on the particular protein families, and thus probably represent the best alignment one can get given the data they contain (i.e. they are not always as up to date as the most recent sequence databases). These databases include:
• SMART (Oxford/EMBL)
• PFAM (Sanger Centre/Wash-U/Karolinska Institutet)
• COGS (NCBI)
• PRINTS (UCL/Manchester)
• BLOCKS (Fred Hutchinson Cancer Research Centre, Seattle)
• SBASE (ICGEB, Trieste)

There are generally many ways to compare a protein sequence against these databases, and they are very useful for identifying domains.

[ Last edited by cnlics on 2010-9-14 at 19:54 ]
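The way the motif reads in words maps almost directly onto a regular expression, which can make the notation easier to follow. The sketch below converts the example pattern and scans a sequence with it. It handles only a subset of the pattern notation (single residues, "x", bracketed sets, braces for excluded residues, and repeat counts in parentheses); the function name `prosite_to_regex` and the test sequence are invented for illustration, and the public PROSITE scanning servers remain the right tool for real searches.

```python
import re

# One pattern element: x, a single residue, [allowed], or {forbidden},
# optionally followed by a repeat count such as (5) or (2,4).
_ELEMENT = re.compile(r"^(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?$")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.match(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            token = "."                          # any amino acid
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"   # any residue except these
        else:
            token = residue                      # single letter or [..] set
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


regex = prosite_to_regex("H-[FW]-x-[LIVM]-x-G-x(5)-[LV]-H-x(3)-[DE]")
print(regex)

# Made-up sequence that contains one occurrence of the motif.
sequence = "AA" + "HFALAGAAAAALHAAAD" + "KK"
hit = re.search(regex, sequence)
if hit:
    print(hit.start(), hit.group())
```

Because a motif keeps only the invariant positions, a match says nothing about the residues at the "x" positions, which is why profile methods are usually more sensitive for distant relatives.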
Post #4, 2010-09-14 01:42:52














beimi