
大爷给跪了

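Newbie (minor reputation)

[Help] After a local BLAST annotation, I want to get the Nr annotation from the Subject Seq-id in the results. Guidance from the experts, please.

As the title says, the annotation results are ready. I put the Subject Seq-id values (the gi identifiers of the nr database hits) into a txt file, as shown in Figure 1.
Now, using this txt file, I want to look up in the nr database (Figure 2) the names of the genes these gi identifiers stand for, and write them one-to-one into an Excel sheet.
For example, gi|67472372|sp|P0A7T7.2|RS18_ECOLI in Figure 2 is an entry in the nr database, and it is also in my txt file. I want to extract the text that follows it, RecName: Full=30S ribosomal protein S18 [Escherichia coli K-12] (the description and name of this gene).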

[Figure 1: 图1.jpg]

[Figure 2: 图2.jpg]

大爷给跪了

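Newbie (minor reputation)

Quoted reply:
#10: Originally posted by peterrjp at 2014-10-21 21:25:14
The script splits each line on the | character. For your file 2, it takes the 2nd field (the gi number) and the 5th field (the protein name) after splitting and writes them to the output.
For it to run correctly, that rule has to hold for every line. Please paste more of file 2 so I can take a look ...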
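
To make the quoted rule concrete, here is a minimal sketch of that split applied to one header line. The variable names are only illustrative, and the sample line is the first entry from file 2.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One header line in the "gi|number|db|accession|name description" layout
my $line = ">gi|67472372|sp|P0A7T7.2|RS18_ECOLI RecName: Full=30S ribosomal protein S18 [Escherichia coli K-12]";

# Split on a literal "|" (the pipe has to be escaped inside the regex)
my @f = split /\|/, $line;

print "field 2 (gi number):    $f[1]\n";
print "field 5 (protein name): $f[4]\n";
```

On a header that carries several entries on one line, the same split only looks at the first gi number, and field 5 runs on into the start of the next entry.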

>gi|67472372|sp|P0A7T7.2|RS18_ECOLI RecName: Full=30S ribosomal protein S18 [Escherichia coli K-12]gi|67472373|sp|P0A7T8.2|RS18_ECOL6 RecName: Full=30S ribosomal protein S18 [Escherichia coli CFT073]gi|67472374|sp|P0A7T9.2|RS18_ECO57 RecName: Full=30S ribosomal protein S18 [Escherichia coli O157:H7]gi|67472375|sp|P0A7U0.2|RS18_PHOLL RecName: Full=30S ribosomal protein S18 [Photorhabdus luminescens subsp. laumondii TTO1]gi|67472376|sp|P0A7U1.2|RS18_SALTI RecName: Full=30S ribosomal protein S18 [Salmonella enterica subsp. enterica serovar Typhi]gi|67472377|sp|P0A7U2.2|RS18_SHIFL RecName: Full=30S ribosomal protein S18 [Shigella flexneri]gi|75505387|sp|Q57GI9.1|RS18_SALCH RecName: Full=30S ribosomal protein S18 [Salmonella enterica subsp. enterica serovar Choleraesuis str. SC-B67]gi|81677980|sp|Q5PJ56.1|RS18_SALPA RecName: Full=30S ribosomal protein S18 [Salmonella enterica subsp. enterica serovar Paratyphi A str. ATCC 9150]gi|123047658|sp|Q0SX84.1|RS18_SHIF8 RecName: Full=30S ribosomal protein S18 [Shigella flexneri 5 str. 8401]gi|123084302|sp|Q1R358.1|RS18_ECOUT RecName: Full=30S ribosomal protein S18 [Escherichia coli UTI89]gi|123343273|sp|Q0T9J1.1|RS18_ECOL5 RecName: Full=30S ribosomal protein S18 [Escherichia coli 536]gi|123728279|sp|Q31TD3.1|RS18_SHIBS RecName: Full=30S ribosomal protein S18 [Shigella boydii Sb227]gi|123728458|sp|Q328J7.1|RS18_SHIDS RecName: Full=30S ribosomal protein S18 [Shigella dysenteriae Sd197]gi|123759489|sp|Q3YUE5.1|RS18_SHISS RecName: Full=30S ribosomal protein S18 [Shigella sonnei Ss046]gi|166220942|sp|A8AMJ6.1|RS18_CITK8 RecName: Full=30S ribosomal protein S18 [Citrobacter koseri ATCC BAA-895]gi|166220951|sp|A7MM78.1|RS18_CROS8 RecName: Full=30S ribosomal protein S18 [Cronobacter sakazakii ATCC BAA-894]gi|166220955|sp|A6THB3.1|RS18_KLEP7 RecName: Full=30S ribosomal protein S18 [Klebsiella pneumoniae subsp. 
pneumoniae MGH 78578]gi|167011188|sp|A7ZV73.1|RS18_ECO24 RecName: Full=30S ribosomal protein S18 [Escherichia coli E24377A]gi|167011189|sp|A8A7U8.1|RS18_ECOHS RecName: Full=30S ribosomal protein S18 [Escherichia coli HS]gi|167011190|sp|A4W5T1.1|RS18_ENT38 RecName: Full=30S ribosomal protein S18 [Enterobacter sp. 638]gi|189029651|sp|A9N520.1|RS18_SALPB RecName: Full=30S ribosomal protein S18 [Salmonella enterica subsp. enterica serovar Paratyphi B str. SPB7]gi|189029677|sp|B1IT04.1|RS18_ECOLC RecName: Full=30S ribosomal protein S18 [Escherichia coli ATCC 8739]gi|189029686|sp|A9MFK9.1|RS18_SALAR RecName: Full=30S ribosomal protein S18 [Salmonella enterica subsp. arizonae serovar 62:z4,z23:-]gi|226735113|sp|B7MLK7.1|RS18_ECO45 RecName: Full=30S ribosomal protein S18 [Escherichia coli S88]gi|226735114|sp|B7NTQ7.1|RS18_ECO7I RecName: Full=30S ribosomal protein S18 [Escherichia coli IAI39]gi|226735115|sp|B7M9G4.1|RS18_ECO8A RecName: Full=30S ribosomal protein S18 [Escherichia coli IAI1]gi|226735116|sp|B7NGD6.1|RS18_ECOLU RecName: Full=30S ribosomal protein S18 [Escherichia coli UMN026]gi|226735117|sp|B1LQM1.1|RS18_ECOSM RecName: Full=30S ribosomal protein S18 [Escherichia coli SMS-3-5]gi|226735119|sp|B7LLY4.1|RS18_ESCF3 RecName: Full=30S ribosomal protein S18 [Escherichia fergusonii ATCC 35469]gi|229559983|sp|B6I2A8.1|RS18_ECOSE RecName: Full=30S ribosomal protein S18 [Escherichia coli SE11]gi|229559984|sp|B2VCV9.1|RS18_ERWT9 RecName: Full=30S ribosomal protein S18 [Erwinia tasmaniensis]gi|229559986|sp|B5Z2K7.1|RS18_ECO5E RecName: Full=30S ribosomal protein S18 [Escherichia coli O157:H7 str. EC4115]gi|229559987|sp|B1XDV2.1|RS18_ECODH RecName: Full=30S ribosomal protein S18 [Escherichia coli str. K-12 substr. 
DH10B]gi|229560040|sp|B5Y306.1|RS18_KLEP3 RecName: Full=30S ribosomal protein S18 [Klebsiella pneumoniae 342]gi|229560053|sp|B4F277.1|RS18_PROMH RecName: Full=30S ribosomal protein S18 [Proteus mirabilis HI4320]gi|229560058|sp|B5F3B9.1|RS18_SALA4 RecName: Full=30S ribosomal protein S18 [Salmonella enterica subsp. enterica serovar Agona str. SL483]gi|229560059|sp|B5FSA3.1|RS18_SALDC RecName: Full=30S ribosomal protein S18 [Salmonella enterica subsp. enterica serovar Dublin str. CT_02021853]gi|229560060|sp|B5R0S0.1|RS18_SALEP RecName: Full=30S ribosomal protein S18 [Salmonella enterica subsp. enterica serovar Enteritidis str. P125109]gi|229560061|sp|B5R9F0.1|RS18_SALG2 RecName: Full=30S ribosomal protein S18 [Salmonella enterica subsp. enterica serovar Gallinarum str. 287/91]gi|229560062|sp|B4TFD7.1|RS18_SALHS RecName: Full=30S ribosomal protein S18 [Salmonella enterica subsp. enterica serovar Heidelberg str. SL476]gi|229560063|sp|B4T3F3.1|RS18_SALNS RecName: Full=30S ribosomal protein S18 [Salmonella enterica subsp. enterica serovar Newport str. SL254]gi|229560064|sp|B5BKL1.1|RS18_SALPK RecName: Full=30S ribosomal protein S18 [Salmonella enterica subsp. enterica serovar Paratyphi A str. AKU_12601]gi|229560065|sp|B4TT35.1|RS18_SALSV RecName: Full=30S ribosomal protein S18 [Salmonella enterica subsp. enterica serovar Schwarzengrund str. CVM19633]gi|229560074|sp|B2TY75.1|RS18_SHIB3 RecName: Full=30S ribosomal protein S18 [Shigella boydii CDC 3083-94]gi|254765429|sp|B7UQL1.1|RS18_ECO27 RecName: Full=30S ribosomal protein S18 [Escherichia coli O127:H6 str. E2348/69]gi|254765430|sp|B7LCR3.1|RS18_ECO55 RecName: Full=30S ribosomal protein S18 [Escherichia coli 55989]gi|254765431|sp|B7MST2.1|RS18_ECO81 RecName: Full=30S ribosomal protein S18 [Escherichia coli ED1a]gi|254812953|sp|C0Q6G0.1|RS18_SALPC RecName: Full=30S ribosomal protein S18 [Salmonella enterica subsp. enterica serovar Paratyphi C str. 
RKS4594]gi|259494808|sp|C4ZR79.1|RS18_ECOBW RecName: Full=30S ribosomal protein S18 [Escherichia coli BW2952]gi|259494809|sp|C5BF78.1|RS18_EDWI9 RecName: Full=30S ribosomal protein S18 [Edwardsiella ictaluri 93-146]
MARYFRRRKFCRFTAEGVQEIDYKDIATLKNYITESGKIVPSRITGTRAKYQRQLARAIKRARYLSLLPYTDRHQ

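This is one of the proteins. It corresponds to many gi numbers, because these gi entries come from different genes or different organisms. For example, file 1 contains the entry gi|67472372|sp|P0A7T7.2|RS18_ECOLI, and for it I want to get RecName: Full=30S ribosomal protein S18 [Escherichia coli K-12].
Your .pl script as a whole works fine; the only issue is that removing the duplicate entries affects the result.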
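
One way to cope with headers like the one pasted above, where many gi entries share a single line, is to cut the line into entries first and only then split each entry on |. The following is only a sketch under assumptions, not the script from this thread: the helper name parse_defline and the sample %wanted list are made up for illustration, every entry is assumed to follow the gi|number|db|accession|text layout, and the next entry is assumed to start right after a closing ] (or after a Ctrl-A character, which raw nr FASTA files commonly use as a separator and which would not be visible in pasted text).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one merged header line into [gi, description] pairs.
# Assumption: each entry is gi|NUMBER|db|accession|text, and a new entry
# begins either after a Ctrl-A or directly after a closing "]".
sub parse_defline {
    my ($line) = @_;
    $line =~ s/^>//;                       # drop the FASTA ">" marker
    my @pairs;
    for my $entry (split /\x01|(?<=\])(?=gi\|\d+\|)/, $line) {
        # Limit of 5 keeps any later "|" inside the description field
        my @f = split /\|/, $entry, 5;
        next unless @f == 5 && $f[0] eq 'gi';
        push @pairs, [ $f[1], $f[4] ];
    }
    return @pairs;
}

# Hypothetical list of wanted gi numbers (would come from file 1)
my %wanted = map { $_ => 1 } qw(67472372 67472374);

my $header = ">gi|67472372|sp|P0A7T7.2|RS18_ECOLI RecName: Full=30S ribosomal protein S18 [Escherichia coli K-12]"
           . "gi|67472373|sp|P0A7T8.2|RS18_ECOL6 RecName: Full=30S ribosomal protein S18 [Escherichia coli CFT073]"
           . "gi|67472374|sp|P0A7T9.2|RS18_ECO57 RecName: Full=30S ribosomal protein S18 [Escherichia coli O157:H7]";

for my $pair (parse_defline($header)) {
    my ($gi, $desc) = @$pair;
    print "$gi\t$desc\n" if $wanted{$gi};  # tab-separated, one gi per row
}
```

With this approach each gi gets its own row and its own organism name, so identical protein names from different strains are kept apart instead of being collapsed.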
#12  2014-10-21 22:01:44

peterrjp

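Veteran member (renowned writer)

[Answer] Helpful reply

This is easy to do with a Perl script. I have one ready-made, but it has a lot of features and parameters and I'm afraid you wouldn't know how to use it. I'll put together a simplified version for you tonight.

#2  2014-10-21 18:05:07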

大爷给跪了

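Newbie (minor reputation)

Quoted reply:
#2: Originally posted by peterrjp at 2014-10-21 18:05:07
This is easy to do with a Perl script. I have one ready-made, but it has a lot of features and parameters and I'm afraid you wouldn't know how to use it. I'll put together a simplified version for you tonight.

Thank you so much!

#3  2014-10-21 19:22:07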

peterrjp

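Veteran member (renowned writer)

[Answer] Helpful reply

大爷给跪了: +20 coins, ★★★★★ Best answer, 2014-10-21 20:05:45
The code is below. Paste it into an empty document and save it as a .pl file.
Usage: assuming your first file is 11111111.txt (your Figure 1) and your second file is nr.txt (your Figure 2), double-click the .pl script. The include.txt it produces is the result you want. Open it in Excel and it will split into two columns automatically: the first column is the gi number, the second is the name of the encoded protein.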

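#!/usr/bin/perl
use strict;
use warnings;

my $list_file = "11111111.txt"; # input file 1: one Subject Seq-id per line
my $tb_file   = "nr.txt";       # input file 2: nr header lines
my (%gi1, @f);

# "or" binds loosely enough that die runs when open fails
open(my $out,  '>', "include.txt") or die "Can't open include.txt: $!";
open(my $list, '<', $list_file)    or die "Can't open $list_file: $!";
while (my $ll = <$list>) {
    chomp $ll;
    @f = split /\|/, $ll;
    next unless defined $f[1];  # skip blank or malformed lines
    $gi1{$f[1]} = 1;            # remember each wanted gi number
}
close $list;

open(my $tb, '<', $tb_file) or die "Can't open $tb_file: $!";
while (my $ll = <$tb>) {
    chomp $ll;
    @f = split /\|/, $ll;
    # sequence lines have no "|" and are skipped here
    next unless defined $f[1] && defined $f[4];
    # field 2 is the gi number, field 5 the protein name
    if ($gi1{$f[1]}) {
        print {$out} "$f[1]\t$f[4]\n";
    }
}
close $tb;
close $out;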

#4  2014-10-21 19:40:34