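Do you already have a mapping between gi numbers and descriptions? If so, a regular-expression replacement of the gi part is all you need.
If you have to look the descriptions up online, the results that come back will contain both the gi number and the description, so process them first and then write the file.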
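For the first case (a gi-to-description mapping is already at hand), the replacement could look roughly like the sketch below. The mapping `gi_to_desc`, the helper `replace_gi`, and the tab-separated sample line are all made up for illustration, and the pattern assumes identifiers of the form `gi|number|db|accession|`; adjust it to whatever your file actually contains.

```python
import re

# Hypothetical mapping from gi number to description; in practice this
# would be loaded from whatever table you already have.
gi_to_desc = {
    "224094601": "Populus trichocarpa predicted protein, mRNA",
    "349709091": "Vitis vinifera clone SS0AEB13YG07",
}

# Matches identifiers such as gi|224094601|ref|XM_002310151.1|
# (assumed format: gi|<number>|<db>|<accession>|)
GI_PATTERN = re.compile(r"gi\|(\d+)\|[^|\s]*\|[^|\s]*\|")


def replace_gi(text, mapping):
    """Replace each gi identifier with its description.

    Identifiers whose gi number is not in the mapping are left untouched.
    """
    def substitute(match):
        return mapping.get(match.group(1), match.group(0))

    return GI_PATTERN.sub(substitute, text)


# Made-up input line: hit name, gi identifier, e-value
line = "hit1\tgi|224094601|ref|XM_002310151.1|\t1e-50"
print(replace_gi(line, gi_to_desc))
```

Leaving unknown identifiers unchanged makes it easy to spot afterwards which gi numbers still need to be looked up.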
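I don't know biology, but there is BioPerl, and a quick search shows there is also Biopython. Adapting the tutorial example lets you print each gi number together with its description directly, so have a look.
When I was getting ready to learn a scripting language, I looked at the syntax of both Perl and Python and decisively picked Python, so I can't help with Perl.
Biopython tutorial: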
http://biopython.org/DIST/docs/tutorial/Tutorial.html
Example code, tested:
CODE:
# Import the modules for interfacing with BLAST and parsing the output
from Bio.Blast import NCBIWWW, NCBIXML

# BLAST the sequence of interest (in this case identified by its GI number)
result_handle = NCBIWWW.qblast("blastn", "nr", "8332116")

# Parse the resulting output
blast_record = NCBIXML.read(result_handle)

# Loop over the alignments, printing some output of interest
E_VALUE_THRESH = 0.004  # only used by the commented-out HSP loop below
for alignment in blast_record.alignments:
    # The title starts with the gi identifier, followed by the description
    result = alignment.title
    print('gi no.: ' + result.split()[0])
    print('gi-desc: ' + ' '.join(result.split()[1:]))
    print()
##    for hsp in alignment.hsps:
##        if hsp.expect < E_VALUE_THRESH:
##            print()
##            print('****Alignment****')
##            print('sequence:', alignment.title)
##            print('length:', alignment.length)
##            print('e value:', hsp.expect)
##            print(hsp.query[0:75] + '...')
##            print(hsp.match[0:75] + '...')
##            print(hsp.sbjct[0:75] + '...')
Result: the gi number and the description are extracted and printed separately:
CODE:
gi no.: gi|224094601|ref|XM_002310151.1|
gi-desc: Populus trichocarpa predicted protein, mRNA
gi no.: gi|359495761|ref|XM_002274845.2|
gi-desc: PREDICTED: Vitis vinifera uncharacterized LOC100267774 (LOC100267774), mRNA
gi no.: gi|349709091|emb|FQ378501.1|
gi-desc: Vitis vinifera clone SS0AEB13YG07
gi no.: gi|255562758|ref|XM_002522339.1|
gi-desc: Ricinus communis COR413-PM2, putative, mRNA
gi no.: gi|358346403|ref|XM_003637210.1|
gi-desc: Medicago truncatula Cold acclimation protein-like protein (MTR_079s1009) mRNA, complete cds
gi no.: gi|358344000|ref|XM_003636035.1|
gi-desc: Medicago truncatula Cold acclimation protein-like protein (MTR_026s0005) mRNA, complete cds
gi no.: gi|356561272|ref|XM_003548859.1|
gi-desc: PREDICTED: Glycine max uncharacterized protein LOC100817084 (LOC100817084), mRNA
gi no.: gi|356502211|ref|XM_003519866.1|
gi-desc: PREDICTED: Glycine max uncharacterized protein LOC100810337 (LOC100810337), mRNA
gi no.: gi|225311746|dbj|AK326681.1|
gi-desc: Solanum lycopersicum cDNA, clone: LEFL2011M15, HTC in fruit
gi no.: gi|255762732|gb|GQ370517.1|
gi-desc: Salvia miltiorrhiza cold acclimation protein (COR) mRNA, complete cds
gi no.: gi|225428595|ref|XM_002284686.1|
gi-desc: PREDICTED: Vitis vinifera uncharacterized LOC100248690 (LOC100248690), mRNA
gi no.: gi|297819785|ref|XM_002877730.1|
gi-desc: Arabidopsis lyrata subsp. lyrata COR413-PM2, mRNA
gi no.: gi|86755971|gb|DQ359747.1|
gi-desc: Chimonanthus praecox cold acclimation protein COR413-PM1 mRNA, complete cds
gi no.: gi|145339339|ref|NM_114943.4|
gi-desc: Arabidopsis thaliana cold-regulated 413-plasma membrane 2 (COR413-PM2) mRNA, complete cds
gi no.: gi|15810634|gb|AY056356.1|
gi-desc: Arabidopsis thaliana putative cold acclimation protein (At3g50830) mRNA, complete cds
gi no.: gi|10121842|gb|AF283005.1|
gi-desc: Arabidopsis thaliana cold acclimation protein WCOR413-like protein beta form mRNA, complete cds
gi no.: gi|13430785|gb|AF360305.1|
gi-desc: Arabidopsis thaliana putative cold acclimation protein (At3g50830) mRNA, complete cds
gi no.: gi|60317457|gb|AY761065.1|
gi-desc: Gossypium barbadense cold-related protein Cor413 (Cor413) mRNA, complete cds
gi no.: gi|255556172|ref|XM_002519075.1|
gi-desc: Ricinus communis COR413-PM2, putative, mRNA
gi no.: gi|156567558|gb|EU077497.1|
gi-desc: Poncirus trifoliata cold acclimation WCOR413-like protein mRNA, complete cds
gi no.: gi|46577795|gb|AY587773.1|
gi-desc: Tamarix androssowii putative stress-responsive protein mRNA, complete cds
gi no.: gi|305690597|gb|HQ010041.1|
gi-desc: Corylus heterophylla COR413-PM1 mRNA, complete cds
gi no.: gi|224105476|ref|XM_002313788.1|
gi-desc: Populus trichocarpa predicted protein, mRNA
gi no.: gi|242389633|emb|FP100664.1|
gi-desc: Phyllostachys edulis cDNA clone: bphylf036p06, full insert sequence
gi no.: gi|242382816|emb|FP092058.1|
gi-desc: Phyllostachys edulis cDNA clone: bphyem114p22, full insert sequence
gi no.: gi|242382391|emb|FP097178.1|
gi-desc: Phyllostachys edulis cDNA clone: bphylf028m11, full insert sequence
gi no.: gi|242381728|emb|FP091375.1|
gi-desc: Phyllostachys edulis cDNA clone: bphyst020e14, full insert sequence
gi no.: gi|238007351|gb|BT084358.1|
gi-desc: Zea mays full-length cDNA clone ZM_BFb0105L06 mRNA, complete cds
gi no.: gi|195636267|gb|EU965484.1|
gi-desc: Zea mays clone 286348 cold acclimation protein COR413-PM1 mRNA, complete cds
gi no.: gi|54652523|gb|BT017742.1|
gi-desc: Zea mays clone EL01N0449E04.c mRNA sequence
gi no.: gi|162459269|ref|NM_001111732.1|
gi-desc: Zea mays LOC542099 (gpm455), mRNA >gi|27902672|gb|AY181208.1| Zea mays cold acclimation protein COR413-PM1 mRNA, complete cds
gi no.: gi|21209119|gb|AY106041.1|
gi-desc: Zea mays PCO103483 mRNA sequence
gi no.: gi|242037992|ref|XM_002466346.1|
gi-desc: Sorghum bicolor hypothetical protein, mRNA
gi no.: gi|255617390|ref|XM_002539789.1|
gi-desc: Ricinus communis COR413-PM2, putative, mRNA
gi no.: gi|30690903|ref|NM_119885.2|
gi-desc: Arabidopsis thaliana cold acclimation protein WCOR413 (AT4G37220) mRNA, complete cds
gi no.: gi|26449888|dbj|AK117399.1|
gi-desc: Arabidopsis thaliana At4g37220 mRNA for putative ap2 cold acclimation protein, complete cds, clone: RAFL16-98-J01
gi no.: gi|226504237|ref|NM_001155133.1|
gi-desc: Zea mays cold acclimation protein COR413-PM1 (LOC100282221), mRNA >gi|195620729|gb|EU960077.1| Zea mays clone 221611 cold acclimation protein COR413-PM1 mRNA, complete cds
gi no.: gi|166359605|gb|EU365626.1|
gi-desc: Thellungiella halophila stress responsive protein (COR) mRNA, complete cds
gi no.: gi|150172175|emb|CU406592.1|
gi-desc: Oryza rufipogon (W1943) cDNA clone: ORW1943C102K01, full insert sequence
gi no.: gi|115455578|ref|NM_001057925.1|
gi-desc: Oryza sativa Japonica Group Os03g0767800 (Os03g0767800) mRNA, complete cds
gi no.: gi|10121844|gb|AF283006.1|
gi-desc: Oryza sativa (japonica cultivar-group) cold acclimation protein WCOR413-like protein mRNA, complete cds
gi no.: gi|32976054|dbj|AK066036.1|
gi-desc: Oryza sativa Japonica Group cDNA clone:J013049B03, full insert sequence
gi no.: gi|32970924|dbj|AK060906.1|
gi-desc: Oryza sativa Japonica Group cDNA clone:001-035-F05, full insert sequence
gi no.: gi|32970018|dbj|AK060000.1|
gi-desc: Oryza sativa Japonica Group cDNA clone:006-301-G09, full insert sequence
gi no.: gi|28973358|gb|BT005584.1|
gi-desc: Arabidopsis thaliana clone U50435 putative cold acclimation protein homolog (At4g37220) mRNA, complete cds
gi no.: gi|326534181|dbj|AK358227.1|
gi-desc: Hordeum vulgare subsp. vulgare mRNA for predicted protein, complete cds, clone: NIASHv1071H11
gi no.: gi|160954667|emb|CU225096.1|
gi-desc: Populus EST from leave
gi no.: gi|160950966|emb|CU229055.1|
gi-desc: Populus EST from severe drought-stressed leaves
gi no.: gi|357114154|ref|XR_137736.1|
gi-desc: PREDICTED: Brachypodium distachyon uncharacterized LOC100844112 (LOC100844112), miscRNA
gi no.: gi|224035946|gb|BT070152.1|
gi-desc: Zea mays full-length cDNA clone ZM_BFc0138N11 mRNA, complete cds
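A few titles in the output above (the Zea mays NM_001111732.1 and NM_001155133.1 hits) carry a second record after a ` >` separator, so splitting on the first space alone leaves that second gi identifier inside the description. The sketch below is one possible way to split such titles into separate pairs and write them out tab-separated, as a stand-in for the "process, then write the file" step. The helper names `split_title` and `write_pairs` and the file name in the comment are hypothetical, and an in-memory buffer is used instead of a real file so the example runs without a BLAST query.

```python
import csv
import io


def split_title(title):
    """Split a BLAST hit title into (identifier, description) pairs.

    Merged records are assumed to be separated by ' >' inside the title.
    """
    pairs = []
    for entry in title.split(" >"):
        identifier, _, description = entry.strip().partition(" ")
        pairs.append((identifier, description))
    return pairs


def write_pairs(titles, handle):
    """Write one tab-separated line per (identifier, description) pair."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for title in titles:
        writer.writerows(split_title(title))


# Two titles copied from the output above; the second one is a merged entry.
titles = [
    "gi|349709091|emb|FQ378501.1| Vitis vinifera clone SS0AEB13YG07",
    "gi|162459269|ref|NM_001111732.1| Zea mays LOC542099 (gpm455), mRNA "
    ">gi|27902672|gb|AY181208.1| Zea mays cold acclimation protein "
    "COR413-PM1 mRNA, complete cds",
]

# Stands in for a real file, e.g. open("gi_desc.tsv", "w", newline="")
buffer = io.StringIO()
write_pairs(titles, buffer)
print(buffer.getvalue(), end="")
```

With a real search, the list of titles would come from `alignment.title` for each alignment in `blast_record.alignments`.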