[Answer] Help reply

Post #1 2013-06-28 00:05:32

凌波丽

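Expert Consultant (Renowned Writer)

Expert experience: +218
BioEPI: 11
Help given: 397 (Master's)
VIP: 0.044
Gold coins: 2118.9
Coins gifted: 10
Red flowers: 208
Posts: 5633
Online: 571.6 hours
Member no.: 1766465
Registered: 2012-04-19
Gender: female
Field: Structure and function of biological macromolecules
Moderates: General Biological Sciences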

[Answer] Help reply

★ ★ ★
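Thanks for participating; help index +1
wizardfan: gold +2, thanks for participating. However, your first paragraph describes something closer to the BLAST algorithm than to PSM (peptide-spectrum matching). 2013-06-28 05:39:19
梦在农大: gold +1, ★ helpful, thank you, you really know your stuff 2013-06-28 23:20:35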

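I can't see your figure clearly, but I can tell it shows a search of the Arabidopsis expressed-proteome database for the highest-scoring match to the target protein.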

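From your written description I roughly understand what is going on. MALDI-TOF-MS is used to measure the peptides. Sequence alignment then converts the matched positions (identical or similar residues) and mismatched positions (dissimilar residues) between homologous protein or gene sequences into a numerical measure of similarity or difference according to a scoring scheme. The alignment with the greatest similarity has the most matched positions and is, mathematically, the optimal alignment. However, how well a mathematically optimal result reflects the similarity between the sequences, and the relationship between their biological features, depends on how the biological problem was reduced to a mathematical one, and that reduction is the hardest problem in biological information processing.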
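To make "converting matches and mismatches into a score" concrete, here is a minimal sketch of a Needleman-Wunsch global alignment score. The +1/-1/-1 match/mismatch/gap values are assumptions chosen for illustration; real protein alignment uses substitution matrices such as BLOSUM62, and BLAST uses a heuristic local search rather than this full dynamic program. As the later posts in this thread point out, this kind of sequence alignment is not how MS data are searched against a database.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch: score of the best global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # dp[i][j] = best score for aligning a[:i] with b[:j]
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap          # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        dp[0][j] = j * gap          # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            same = a[i - 1] == b[j - 1]
            diag = dp[i - 1][j - 1] + (match if same else mismatch)
            dp[i][j] = max(diag,
                           dp[i - 1][j] + gap,   # gap inserted in b
                           dp[i][j - 1] + gap)   # gap inserted in a
    return dp[-1][-1]


print(global_alignment_score("ACGT", "AGT"))  # prints 2 (ACGT / A-GT)
```

The score only says which alignment is best under the chosen numbers; changing the scoring scheme can change the "optimal" alignment, which is exactly the modelling caveat raised above.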

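The authors of the paper you cited used mass spectrometry to measure, for a series of unknown proteins (34 to 53 of them), the masses of certain peptides from limited enzymatic digestion, or of the intact proteins. (Your material does not say what was recovered from the spots for MS or how it was measured; I am guessing from the usual way proteins are identified by MS.) Most likely these were peptide fragments prepared by limited enzymatic hydrolysis. Identifying proteins by searching a protein database this way involves no full-length protein sequencing at all, unless the protein is entirely new and absent from the database! Without context I cannot tell exactly what "The 53 variable spots were analyzed by MALDI-TOF-MS." means. It may mean that, of the protein spots on the two-dimensional electrophoresis gel run before MS (the recovered spot samples were used for MALDI-TOF-MS), 34 found very high-scoring matches in the Arabidopsis expressed-protein database, i.e., roughly 34 known proteins were identified.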

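The principle by which the paper identifies genes is this: using the characteristic data that MALDI-TOF-MS measured for a set of proteins (from your paper, 53 spots recovered from the gel were analyzed by MALDI-TOF-MS), the authors searched the Arabidopsis expressed-protein database for matches in order to assign each MALDI-TOF-MS dataset to a protein. Once the target protein is identified, its sequence can be back-translated into a nucleotide sequence and aligned against a gene database, which makes identifying the gene easy. Alternatively, each protein in the Arabidopsis proteome database may already correspond to a specific gene, in which case knowing the protein means knowing the gene.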
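A minimal sketch of how such a peptide mass fingerprinting (PMF) search could work: digest every database protein in silico with trypsin, compute the monoisotopic [M+H]+ mass of each peptide, and count how many observed MALDI masses fall within a tolerance. The protein names and sequences, the 0.2 Da tolerance and the "no missed cleavages, no modifications" rule are illustrative assumptions; this is the simple match-counting idea, not the scoring used by Mascot or any other specific tool.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # MALDI peptides are usually observed as [M+H]+


def tryptic_digest(seq: str) -> list[str]:
    """In-silico trypsin: cut after K/R unless the next residue is P.
    No missed cleavages or modifications (simplifying assumption)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def mh_plus(peptide: str) -> float:
    """Monoisotopic [M+H]+ m/z of a singly protonated peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed: list[float], protein: str, tol: float = 0.2) -> int:
    """Number of observed m/z values within +/- tol of any theoretical peptide."""
    theoretical = [mh_plus(p) for p in tryptic_digest(protein)]
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed)


def rank_proteins(observed: list[float], database: dict[str, str],
                  tol: float = 0.2) -> list[tuple[str, int]]:
    """Rank database entries by match count, best first."""
    scores = {name: count_matches(observed, seq, tol)
              for name, seq in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy database and a gel spot whose peaks come from PROT_A.
toy_db = {"PROT_A": "GASPVKTLLNDRQEKMHFR", "PROT_B": "WYWYWKCCCCR"}
spot = [mh_plus(p) for p in tryptic_digest(toy_db["PROT_A"])]
print(rank_proteins(spot, toy_db))  # prints [('PROT_A', 4), ('PROT_B', 0)]
```

Once the top-ranked entry is accepted, going from protein to gene is just a lookup of that entry's identifier, which is the second possibility described above.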

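Did you understand what I said? (I think I've been quite detailed. You didn't provide the paper, so I had to guess its context, and where you might be getting stuck, from general proteomics practice.) If not, post the paper and I'll explain it, if I'm not busy over the next few days.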

Post #2 2013-06-28 02:19:34

凌波丽

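This is actually the first question on applied proteomics that I've answered since registering. Frankly, I don't think it's very difficult, but it is rather interesting.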

Post #3 2013-06-28 02:21:32

wizardfan

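Supreme Woodworm (Famous Writer)

BioEPI: 18
Help given: 599 (PhD)
VIP: 1.818
Gold coins: 24632.2
Coins gifted: 197
Red flowers: 48
First replies: 2
Posts: 2254
Online: 400.7 hours
Member no.: 1879241
Registered: 2012-07-05
Gender: male
Field: Bioinformatics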

[Answer] Help reply

★ ★
Thanks for participating; help index +1
137167741: gold +1, Xiaomuchong encourages discussion~~ 2013-06-28 08:49:35
梦在农大: gold +1, ★★★ very helpful, thank you very much 2013-06-28 23:15:53

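Arabidopsis has a very thoroughly studied genome, so when TAIR is used as the target protein database, the corresponding gene information is easy to obtain. I can't see where the score comes from; my usual guess is that it is the score Mascot reports, where a higher score means a more reliable identification (i.e., a higher likelihood that the identified protein is the true protein). The fold change is likewise unclear. Quantitative proteomics can estimate protein abundance, so the fold change probably refers to the change in protein concentration between treated and control samples (treated/control).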
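A small sketch of the three quantities guessed at above, under stated assumptions: that the paper's score is Mascot's probability-based score, which Mascot defines as S = -10·log10(P) with P the probability that the match is random; that the fold change is a treated/control abundance ratio; and that hits carry TAIR gene-model IDs. The ID "AT1G01010.1" and the spot volumes are hypothetical inputs used only to show the format.

```python
import math


def mascot_score_to_p(score: float) -> float:
    """Invert S = -10*log10(P): probability that the match is random."""
    return 10 ** (-score / 10)


def tair_locus(model_id: str) -> str:
    """TAIR gene-model IDs such as 'AT1G01010.1' carry the locus before the dot."""
    return model_id.split(".")[0]


def fold_change(treated: float, control: float) -> tuple[float, float]:
    """Abundance ratio treated/control and its log2 (assumed definition)."""
    ratio = treated / control
    return ratio, math.log2(ratio)


# Hypothetical spot: score 50, matched to model AT1G01010.1,
# normalized spot volume 4.0 in treated vs 2.0 in control.
print(mascot_score_to_p(50))        # about 1e-05
print(tair_locus("AT1G01010.1"))    # AT1G01010
print(fold_change(4.0, 2.0))        # (2.0, 1.0)
```

On this scale every 10 score points is a tenfold drop in the chance of a random match, which is why "higher score, more reliable" holds.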

Post #4 2013-06-28 05:37:36

凌波丽

[Answer] Help reply

★
137167741: gold +1, Xiaomuchong encourages discussion~~ 2013-06-28 20:07:11

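Correction: the principle of searching proteomic MS data against a database differs from that for genomes. The principle I gave in the first paragraph of post #2 was wrong: when I first looked at the figure I thought it was a gene search, and after typing it all out I never corrected it. Sorry!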
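There are many algorithms for searching proteomic MS data against databases; I'll describe just one, which is fairly representative. Generally, the first step is to compare the measured peptide masses with the theoretical peptide map of every protein in the database. When a calculated value falls within the set error tolerance, it counts as a match. Rather than simply counting matched peptides, MOlecular Weight SEarch (MOWSE) uses empirical factors to assign a "weight" to each peptide match. This weight-factor matrix is generated when the database is built, as follows:
First, a frequency-factor matrix F is built, in which each row spans a 100 Da interval of peptide mass and each column spans a 100 kDa interval of protein mass. As each sequence is analyzed, the appropriate matrix element f(i,j) is incremented, so that the distribution of peptide masses is tabulated as a function of protein mass. Each column of F is then divided by the largest value in that column, which normalizes F and yields the MOWSE factor matrix. After the measured peptide masses have been searched against the database of theoretical peptide masses, the score for each hit is calculated as: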
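$$\text{score} = \frac{50000}{M_{\text{protein}} \times \prod m_{i,j}}$$
where $M_{\text{protein}}$ is the molecular weight of the protein and $\prod m_{i,j}$ is the product of the MOWSE factor-matrix elements for the matched peptides.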
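A minimal sketch of the MOWSE procedure described above: tabulate peptide masses by (peptide-mass bin, protein-mass bin), normalize each column by its maximum, then score a hit with 50000 / (M_protein × product of factors). The bin widths follow the post (100 Da, 100 kDa) and are meant as adjustable parameters; the two-protein database and its masses are hypothetical, and a sparse dict stands in for the dense matrix. This illustrates the formula only; it is not the original MOWSE program.

```python
from collections import defaultdict

PEPTIDE_BIN = 100.0       # Da per row, as described in the post
PROTEIN_BIN = 100_000.0   # Da per column (100 kDa), as described in the post


def cell(peptide_mass: float, protein_mass: float) -> tuple[int, int]:
    """Row i = peptide-mass bin, column j = protein-mass bin."""
    return int(peptide_mass // PEPTIDE_BIN), int(protein_mass // PROTEIN_BIN)


def build_mowse_factors(database):
    """database: iterable of (protein_mass, [theoretical peptide masses]).
    Returns the column-normalized frequency matrix as a sparse dict."""
    freq = defaultdict(float)
    for protein_mass, peptide_masses in database:
        for pm in peptide_masses:
            freq[cell(pm, protein_mass)] += 1      # increment f(i, j)
    col_max = defaultdict(float)
    for (i, j), f in freq.items():
        col_max[j] = max(col_max[j], f)
    return {(i, j): f / col_max[j] for (i, j), f in freq.items()}


def mowse_score(protein_mass, matched_peptide_masses, factors):
    """score = 50000 / (M_protein * product of m(i, j) over matched peptides).
    Matched masses are the protein's own theoretical peptides, so each
    cell already exists in the matrix."""
    product = 1.0
    for pm in matched_peptide_masses:
        product *= factors[cell(pm, protein_mass)]
    return 50000 / (protein_mass * product)


# Hypothetical two-protein database (masses in Da).
db = [(20000.0, [500.0, 550.0, 1200.0]), (30000.0, [520.0, 1800.0])]
factors = build_mowse_factors(db)
print(mowse_score(20000.0, [500.0], factors))   # common bin -> 2.5
print(mowse_score(20000.0, [1200.0], factors))  # rare bin -> about 7.5
```

Because common peptide-mass bins get factors near 1 and rare bins get small factors, a match to a rare peptide mass raises the score more than a match to a common one, which is the point of weighting instead of counting.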

Post #6 2013-06-28 11:01:08

凌波丽

Note: the first paragraph of my answer in post #2 was simply wrong!

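I explained it in terms of the BLAST algorithm. There are many sequence-alignment methods, and proteomic database searching also has many different algorithms; I can't possibly cover them all. The figure was unclear, and at first I thought it was a direct comparison of nucleic-acid sequences. I forgot to fix it before posting.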

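wizardfan: "gold +2, thanks for participating. However, your first paragraph describes something closer to the BLAST algorithm than to PSM." wizardfan's hint was right; otherwise I would never have caught the mistake!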

Searching proteomic MS databases for matches with a BLAST-style algorithm would be far too laborious.

Post #9 2013-06-28 23:23:12

wizardfan

[Answer] Help reply

★
梦在农大: gold +1, ★ helpful, thanks a lot 2013-06-29 22:10:25

Content deleted.

Post #13 2013-06-29 05:43:12

Regular replies

凌波丽

[Answer] Help reply

★
137167741: gold +1, Xiaomuchong encourages discussion~~ 2013-06-28 20:06:57

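The first paragraph of my answer was explained in terms of the BLAST algorithm. There are many sequence-alignment methods, and proteomic database searching also has many different algorithms; I can't possibly cover them all. I couldn't make out the figure, and at first I thought it was a direct comparison of nucleic-acid sequences.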

Post #5 2013-06-28 10:29:16

凌波丽

[Answer] Help reply

★ ★ ★
137167741: gold +1, Xiaomuchong encourages discussion~~ 2013-06-28 20:07:27
梦在农大: gold +2, ★★★ very helpful, thanks, and sorry for taking up your time 2013-06-28 23:17:22

Another example of proteomic MS database searching:
1.“Algorithms and Software Tools for Identifying Proteins from ESI Tandem MS Data: Sequest
The first algorithm/program to identify proteins by matching MS-MS data to database sequences is Sequest, which was introduced by John Yates and Jimmy Eng in 1995. Several similar software tools have been introduced and these will be discussed below. However, Sequest will be described in greatest detail as representative of this class of tools. The value of programs such as Sequest is that they provide a relatively rapid assignment of MS-MS spectra to specific peptide sequences in databases. This allows fast reduction of large volumes of LC-MS-MS data in proteomics analyses. However, it is important to emphasize that Sequest and similar programs do not actually perform de novo interpretation of the spectra per se. Consequently, the output of these programs depends on the quality of the MS-MS data obtained and the completeness and accuracy of the database used.
Here’s how Sequest works. When the MS instrument obtains an MS-MS scan, it not only records the MS-MS scan itself, but also the m/z value of the precursor ion. This information is stored together with the scan data. After the analysis is complete, the user sits at the computer and opens the Sequest program. The user then selects the datafile containing the MS-MS scans to be analyzed. The user can tell Sequest what enzyme (e.g., trypsin) was used to digest the protein sample and also specifies whether singly or doubly charged ions were subjected to MS-MS. Finally, the user selects a database against which the MS-MS data are to be compared.
Once the program starts, all of the proteins in the database are subjected to a virtual digestion with the enzyme specified by the user (e.g., trypsin). This generates a master list of possible peptides for comparison to the MS-MS scans. Then each MS-MS scan is analyzed as follows:
• The precursor m/z for each MS-MS scan is used to select peptides from the database with the same mass (within a defined mass tolerance). If no digestion enzyme was specified, the program simply selects all possible peptide sequences that correspond to the mass of the peptide ion analyzed in that MS-MS scan.
• Theoretical MS-MS spectra are generated from each of the selected peptides.
• The MS-MS spectrum being analyzed is compared with each of the theoretical MS-MS spectra generated from the database.
• A correlation score is calculated for each match between the MS-MS scan and the theoretical MS-MS spectra.”
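The four steps in the quoted Sequest description can be sketched as follows (this sketch is not part of the quoted book). It filters candidates by precursor mass, generates singly charged b and y fragment ions as the "theoretical spectrum", and scores each candidate. One plain substitution: Sequest's actual score is a cross-correlation (XCorr) of processed spectra, which this sketch does not implement; it simply counts shared fragment ions. Monoisotopic masses, no modifications, the tolerances and the candidate list are all assumptions (in Sequest the candidates come from the in-silico digest of the whole database).

```python
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def neutral_mass(peptide: str) -> float:
    """Monoisotopic neutral peptide mass."""
    return sum(MONO[aa] for aa in peptide) + WATER


def by_ions(peptide: str) -> list[float]:
    """Theoretical spectrum: singly charged b (N-terminal) and y (C-terminal) ions."""
    ions = []
    for k in range(1, len(peptide)):
        ions.append(sum(MONO[aa] for aa in peptide[:k]) + PROTON)          # b_k
        ions.append(sum(MONO[aa] for aa in peptide[k:]) + WATER + PROTON)  # y ion
    return ions


def shared_ions(observed, theoretical, tol=0.5):
    """Count theoretical ions with an observed peak within +/- tol."""
    return sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)


def search_spectrum(precursor_mz, charge, observed, candidates,
                    prec_tol=1.0, frag_tol=0.5):
    """Step 1: precursor filter; steps 2-4: build, compare, score."""
    precursor_mass = (precursor_mz - PROTON) * charge   # m/z -> neutral mass
    hits = [(pep, shared_ions(observed, by_ions(pep), frag_tol))
            for pep in candidates
            if abs(neutral_mass(pep) - precursor_mass) <= prec_tol]
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Hypothetical scan: peaks simulated from TLLNDR, observed as [M+2H]2+.
candidates = ["TLLNDR", "LTLNDR", "GASPVK"]
peaks = by_ions("TLLNDR")
precursor = (neutral_mass("TLLNDR") + 2 * PROTON) / 2
print(search_spectrum(precursor, 2, peaks, candidates))
# prints [('TLLNDR', 10), ('LTLNDR', 8)]
```

The isomer LTLNDR passes the precursor filter because it has exactly the same mass; only the fragment ions separate it from TLLNDR, which is why MS-MS adds information that a single peptide mass cannot.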

2."Software Tools for Peptide Mass Fingerprinting: Scoring the Results
In MALDI-TOF spectra from real samples, there are typically dozens of m/z signals. Peptide mass fingerprinting software can usually match just about all of these to some entry in a database. However, given errors in m/z measurement, frequent sample contamination, and the presence of unanticipated posttranslational modifications, not all of the matches will point to the same proteins. So how do we score the hits to determine which protein best matches the data?
The simplest approach is to assign the highest score to proteins whose predicted tryptic peptides match the greatest number of m/z signals in the MS data. If we search only one m/z value, then several proteins could be equally good matches. However, as we search a greater number of m/z values, more matches correspond to a particular protein and lead to a greater score for that protein vs others. This fairly simple approach works reasonably well with very good MS data. However, it tends to assign higher scores to larger proteins. As noted earlier, larger proteins yield more tryptic peptides, so the chances of a match to one of these is greater for larger proteins than for smaller proteins.
To address these problems, several of the available peptide mass fingerprinting programs use more sophisticated scoring algorithms. These algorithms correct for scoring bias due to protein size, in which larger proteins give rise to greater numbers of peptides. They also correct for the tendency of smaller peptides in databases to have a greater number of matches with searched m/z values. Finally, some of these algorithms also apply probability-based statistics to better define the significance of protein identifications. At the time of this writing, the principal tools available for peptide mass fingerprinting can be grouped into three categories:
• First-generation freeware and subscription software tools that assign scores based on the number of m/z values in a spectrum that match database values within a given mass tolerance. These programs include PepSea (http://www.protana.com) and PeptIdent/MultIdent (http://www.expasy.ch/tools/peptident.html).
• Second-generation freeware and subscription software tools that employ scoring algorithms that take into account the effects of protein size and peptide length on the probabilities of matching. These include MOWSE (http://srs.hgmp.mrc.ac.uk/cgi-bin/mowse) and MS-Fit (http://prospector.ucsf.edu/).
• Third-generation software that employs more extensive probability-based scoring to provide a statistical basis for scores and also to estimate the probabilities that matches may reflect random events, rather than true identities. These programs include ProFound (http://prowl.rockefeller.edu/cgi-bin/ProFound) and Mascot (http://www.matrixscience.com/)."

-------From

Daniel C. Liebler, Introduction to Proteomics, Humana Press Inc., 2002.
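The size bias described in the second quotation, and the idea of "probability-based" scoring, can be illustrated with a generic binomial model (a sketch, not from the cited book, and not the actual algorithm of ProFound, Mascot or any other listed tool). Assuming random peak masses are spread uniformly over a mass range, a protein with more tryptic peptides is more likely to collect matches by chance, so the same match count is less significant. The uniform model, the 0.2 Da tolerance, the 3000 Da range and the peptide counts are assumptions; the result is expressed as -10·log10(P), in the style of Mascot's scores.

```python
import math


def random_match_prob(n_peptides: int, tol: float, mass_range: float) -> float:
    """Chance that one random m/z falls within +/- tol of any of a protein's
    n_peptides theoretical masses, assuming masses spread uniformly."""
    return min(1.0, n_peptides * 2 * tol / mass_range)


def p_at_least(k: int, n: int, p: float) -> float:
    """Binomial tail: probability of k or more random matches among n peaks."""
    return sum(math.comb(n, x) * p ** x * (1 - p) ** (n - x)
               for x in range(k, n + 1))


def prob_score(k: int, n: int, n_peptides: int,
               tol: float = 0.2, mass_range: float = 3000.0) -> float:
    """-10*log10(P) for k of n peaks matching: larger = less likely random."""
    p = random_match_prob(n_peptides, tol, mass_range)
    return -10 * math.log10(p_at_least(k, n, p))


# Same 8 matches out of 20 peaks: small (15 peptides) vs large (150) protein.
small = prob_score(8, 20, n_peptides=15)
large = prob_score(8, 20, n_peptides=150)
print(round(small, 1), round(large, 1))
```

A first-generation count would rate both proteins equally (8 matches each); the probability-based score ranks the small protein higher, which is the correction the second- and third-generation tools aim for.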

Post #7 2013-06-28 11:15:41

凌波丽

In post #7, I gave three new examples of proteomic MS database-search algorithms, not just one.

Post #8 2013-06-28 11:17:18

梦在农大

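Silver Bug (Regular Writer)

Help given: 4 (Kindergarten)
Gold coins: 1586.6
Coins gifted: 25
Red flowers: 1
Posts: 345
Online: 78.8 hours
Member no.: 2416586
Registered: 2013-04-13
Field: General Pedagogy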

Quoted reply:

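Post #4: Originally posted by wizardfan at 2013-06-28 05:37:36
Arabidopsis has a very thoroughly studied genome, so when TAIR is used as the target protein database, the corresponding gene information is easy to obtain. I can't see where the score comes from; my usual guess is that it is the score Mascot reports, where a higher score means a more reliable identification (i.e., the identified ...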

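Hello moderator, here's the situation. In our bioinformatics class, the teacher asked us to present a paper and said it would be best to use an example to explain how the paper's data were obtained. But I don't know the amino-acid sequences the authors originally measured. Can I work them back from the genes listed in their table? Thanks.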