
ldy2140


[Discussion] How can I batch-retrieve species definitions from GI numbers?

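Most people working in bioinformatics cannot avoid running BLAST. Even when a search returns an intimidating number of hits, the results still have to be summarized in an Excel table.
I recently hit a problem that gave me a real headache. I ran BLAST for many transporter proteins against the complete microbial database, but the resulting table contains only the GI numbers of the matched organisms. When summarizing the results, I want to replace each GI number with organism information, for example the DEFINITION string of a GBFF (GenBank flat file) record, which describes the genetic background of the organism.
So I decided to use Perl regular-expression substitution and wrote the following program: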
CODE:
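#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GenBank;

my $gb = Bio::DB::GenBank->new;
my %def_for;     # cache: GI number => definition, so each GI is fetched only once
$^I = ".bak";    # edit the input files in place, keeping a .bak backup

while (<>) {
  if ( /gi\|(\d+)\|/ ) {
    my $gi = $1;
    if ( !exists $def_for{$gi} ) {
      # get_Seq_by_gi downloads the whole record; guard against failed lookups
      my $seq_obj = eval { $gb->get_Seq_by_gi($gi) };
      $def_for{$gi} = $seq_obj ? $seq_obj->desc : undef;
    }
    # replace the whole tab-delimited subject field that holds this GI;
    # lines without a GI, or with a failed lookup, are printed unchanged
    s#\t[^\t]*gi\|\Q$gi\E\|[^\t]*\t#\t$def_for{$gi}\t# if defined $def_for{$gi};
  }
  print;
}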

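However, it runs very slowly and wastes a lot of bandwidth, because the module downloads the entire sequence record for each GI number and only then extracts the definition, so it is very inefficient. I wrote this quick-and-dirty program after spending only a short time learning Perl and BioPerl. Criticism from experts is welcome.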
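
One inexpensive way to reduce the traffic described above is to look each GI number up only once, since the same subject often appears in many BLAST rows. The following is a minimal Python sketch of that idea, not the poster's code: `unique_gis` and `lookup_once` are hypothetical helper names, and `fetch` stands in for whatever slow lookup is actually used (here it is replaced by a fake function so the example runs offline).

```python
import re

GI_PATTERN = re.compile(r"gi\|(\d+)\|")

def unique_gis(lines):
    """Return the GI numbers found in BLAST tabular lines, in first-seen order, without repeats."""
    seen = {}
    for line in lines:
        match = GI_PATTERN.search(line)
        if match:
            seen.setdefault(match.group(1), None)
    return list(seen)

def lookup_once(gis, fetch):
    """Call the (slow) fetch function at most once per GI and keep the results in a dict."""
    cache = {}
    for gi in gis:
        if gi not in cache:
            cache[gi] = fetch(gi)
    return cache

# Rows abridged from the thread; the second subject is repeated on purpose.
rows = [
    "sp|P23936|LACY_STRTR\tgi|169822596|gb|ABJK02000022.1|\t61.65",
    "sp|P23936|LACY_STRTR\tgi|223555729|gb|ACGH01000016.1|\t57.01",
    "sp|P23936|LACY_STRTR\tgi|223555729|gb|ACGH01000016.1|\t57.01",
]

calls = []

def fake_fetch(gi):
    """Stand-in for a network lookup (assumption, for illustration only)."""
    calls.append(gi)
    return "definition of " + gi

definitions = lookup_once(unique_gis(rows), fake_fetch)
```

With three rows but only two distinct subjects, `fake_fetch` is called twice rather than three times; on a real table with many repeated subjects the saving would be correspondingly larger.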

[ Last edited by ldy2140 on 2012-8-28 at 21:55 ]

wizardfan

You know my advice on high-throughput data analysis: download the GenBank flat file and parse the local file, which can improve efficiency dramatically.
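
As a rough illustration of parsing a local flat file, the Python sketch below builds a GI-to-DEFINITION dictionary in a single pass. It assumes the file carries a `GI:` token on the `VERSION` line, as GenBank flat files did when this thread was written; files without that token would need the accession.version as the key instead. `parse_definitions` is a hypothetical helper name, and the sample records are abridged to the lines the parser reads, with the definitions taken from the example rows in this thread.

```python
import io

def parse_definitions(handle):
    """Map GI number -> DEFINITION text for every record in a GenBank flat file."""
    definitions = {}
    parts, gi, in_definition = [], None, False
    for raw in handle:
        line = raw.rstrip("\n")
        if line.startswith("DEFINITION"):
            parts = [line[len("DEFINITION"):].strip()]
            in_definition = True
        elif in_definition and line.startswith(" "):
            parts.append(line.strip())          # wrapped continuation line
        else:
            in_definition = False
            if line.startswith("VERSION"):
                for token in line.split():
                    if token.startswith("GI:"):
                        gi = token[3:]
            elif line.startswith("//"):          # end of record
                if gi is not None:
                    definitions[gi] = " ".join(parts)
                parts, gi = [], None
    return definitions

# Abridged records (only the DEFINITION, VERSION and terminator lines are shown).
sample = "\n".join([
    "DEFINITION  Streptococcus infantarius subsp. infantarius ATCC BAA-102",
    "            S_infantarius-2.0.1_Cont245, whole genome shotgun sequence.",
    "VERSION     ABJK02000022.1  GI:169822596",
    "//",
    "DEFINITION  Lactobacillus buchneri ATCC 11577 contig00018, whole genome",
    "            shotgun sequence.",
    "VERSION     ACGH01000016.1  GI:223555729",
    "//",
]) + "\n"

gi_to_definition = parse_definitions(io.StringIO(sample))
```

For a real file, the same function can be given an open file handle, so the records are streamed and never held in memory all at once.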

About your code:
1. Always "use strict;"
2. The regular expression is fine, but I would use $` and $' (the special variables holding the text before and after the match) instead of another s/// statement
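
For readers following the Python examples elsewhere in this thread, the same idea can be sketched with a match object: the slices before `match.start()` and after `match.end()` play roughly the role of Perl's $` and $'. This is only an illustration of the technique, and `splice_field` is a hypothetical name.

```python
import re

# The whole tab-delimited field that contains a GI identifier.
GI_FIELD = re.compile(r"[^\t]*gi\|\d+\|[^\t]*")

def splice_field(line, replacement):
    """Replace the field containing a GI with `replacement`, using one match and no second substitution."""
    match = GI_FIELD.search(line)
    if match is None:
        return line                      # nothing to replace
    before = line[:match.start()]        # text before the match (like $`)
    after = line[match.end():]           # text after the match (like $')
    return before + replacement + after

row = "sp|P23936|LACY_STRTR\tgi|169822596|gb|ABJK02000022.1|\t61.65\t631"
new_row = splice_field(row, "some definition")
```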
#5  2012-08-29 22:54:33

libralibra

Post an example so we can see.

Post one raw BLAST result (the unprocessed string).
Post one result you want (the target string).
#2  2012-08-28 22:41:10

ldy2140


Quoted reply:
#2: Originally posted by libralibra at 2012-08-28 22:41:10
Post an example so we can see.

Post one raw BLAST result (the unprocessed string).
Post one result you want (the target string).

sp|P23936|LACY_STRTR        gi|169822596|gb|ABJK02000022.1|        61.65        631        241        1        5        634        344705        342813        0.0         714
sp|P23936|LACY_STRTR        gi|223555729|gb|ACGH01000016.1|        57.01        628        260        1        2        619        65439        67322        0.0         692
After replacement:
sp|P23936|LACY_STRTR        Streptococcus infantarius subsp. infantarius ATCC BAA-102 S_infantarius-2.0.1_Cont245, whole genome shotgun sequence.        61.65        631        241        1        5        634        344705        342813        0.0         714
sp|P23936|LACY_STRTR        Lactobacillus buchneri ATCC 11577 contig00018, whole genome shotgun sequence.        57.01        628        260        1        2        619        65439        67322        0.0         692
#3  2012-08-29 09:30:29

libralibra

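Do you already have a mapping between GI numbers and descriptions? If so, just use a regular expression to replace the GI part directly.
If you have to look them up online, whatever comes back will certainly contain both the GI number and the description, so finish the processing first and then write the file.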
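
The first case (a mapping is already available) can be sketched in Python as follows, with the output written only after every row has been processed. This is an illustration under stated assumptions: `annotate_lines` is a hypothetical name, the subject identifier is assumed to sit in the second tab-delimited column in the form `gi|<number>|...`, and the rows are abridged to four columns.

```python
import io

def annotate_lines(lines, gi_to_description, subject_column=1):
    """Return BLAST tabular lines with the subject id replaced by its description when known."""
    annotated = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > subject_column:
            parts = fields[subject_column].split("|")
            # a subject id is assumed to look like gi|<number>|<db>|<accession>|
            if len(parts) > 1 and parts[0] == "gi" and parts[1] in gi_to_description:
                fields[subject_column] = gi_to_description[parts[1]]
        annotated.append("\t".join(fields))
    return annotated

blast_lines = [
    "sp|P23936|LACY_STRTR\tgi|169822596|gb|ABJK02000022.1|\t61.65\t631\n",
    "sp|P23936|LACY_STRTR\tgi|223555729|gb|ACGH01000016.1|\t57.01\t628\n",
]
# Only the first GI is in the mapping, so the second row is left untouched.
mapping = {
    "169822596": "Streptococcus infantarius subsp. infantarius ATCC BAA-102 "
                 "S_infantarius-2.0.1_Cont245, whole genome shotgun sequence.",
}

result = annotate_lines(blast_lines, mapping)

# Write everything in one go once processing has finished.
output = io.StringIO()          # stands in for an open output file
output.write("\n".join(result) + "\n")
```

Splitting on tabs instead of matching with a regular expression keeps commas and spaces inside the description from interfering with the column layout.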
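I don't know biology, but there is BioPerl, and a quick search shows there is also Biopython. By adapting the tutorial example you can print each GI number and its description directly; have a look.
When I was choosing a scripting language to learn, I looked at the syntax of both Perl and Python and firmly picked Python, so I don't know Perl.
Biopython tutorial: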
http://biopython.org/DIST/docs/tutorial/Tutorial.html

Example code (tested):
CODE:
# Import the modules for interfacing with BLAST and parsing the output
from Bio.Blast import NCBIWWW, NCBIXML

# BLAST the sequence of interest (here identified by its GI number).
# "nr" is what was tested in this post; "nt" is the usual nucleotide database name for blastn.
result_handle = NCBIWWW.qblast("blastn", "nr", "8332116")

# Parse the resulting output
blast_record = NCBIXML.read(result_handle)

# Loop over the alignments, printing some output of interest
E_VALUE_THRESH = 0.004  # only used by the optional HSP report below
for alignment in blast_record.alignments:
    # alignment.title has the form "<identifier> <description>"
    fields = alignment.title.split()
    print('gi no.: ' + fields[0])
    print('gi-desc: ' + ' '.join(fields[1:]))
    print()
##    for hsp in alignment.hsps:
##        if hsp.expect < E_VALUE_THRESH:
##            print()
##            print('****Alignment****')
##            print('sequence:', alignment.title)
##            print('length:', alignment.length)
##            print('e value:', hsp.expect)
##            print(hsp.query[0:75] + '...')
##            print(hsp.match[0:75] + '...')
##            print(hsp.sbjct[0:75] + '...')

Result: the GI number and the description can be extracted and printed separately:
CODE:
gi no.: gi|224094601|ref|XM_002310151.1|
gi-desc: Populus trichocarpa predicted protein, mRNA

gi no.: gi|359495761|ref|XM_002274845.2|
gi-desc: PREDICTED: Vitis vinifera uncharacterized LOC100267774 (LOC100267774), mRNA

gi no.: gi|349709091|emb|FQ378501.1|
gi-desc: Vitis vinifera clone SS0AEB13YG07

gi no.: gi|255562758|ref|XM_002522339.1|
gi-desc: Ricinus communis COR413-PM2, putative, mRNA

gi no.: gi|358346403|ref|XM_003637210.1|
gi-desc: Medicago truncatula Cold acclimation protein-like protein (MTR_079s1009) mRNA, complete cds

gi no.: gi|358344000|ref|XM_003636035.1|
gi-desc: Medicago truncatula Cold acclimation protein-like protein (MTR_026s0005) mRNA, complete cds

gi no.: gi|356561272|ref|XM_003548859.1|
gi-desc: PREDICTED: Glycine max uncharacterized protein LOC100817084 (LOC100817084), mRNA

gi no.: gi|356502211|ref|XM_003519866.1|
gi-desc: PREDICTED: Glycine max uncharacterized protein LOC100810337 (LOC100810337), mRNA

gi no.: gi|225311746|dbj|AK326681.1|
gi-desc: Solanum lycopersicum cDNA, clone: LEFL2011M15, HTC in fruit

gi no.: gi|255762732|gb|GQ370517.1|
gi-desc: Salvia miltiorrhiza cold acclimation protein (COR) mRNA, complete cds

gi no.: gi|225428595|ref|XM_002284686.1|
gi-desc: PREDICTED: Vitis vinifera uncharacterized LOC100248690 (LOC100248690), mRNA

gi no.: gi|297819785|ref|XM_002877730.1|
gi-desc: Arabidopsis lyrata subsp. lyrata COR413-PM2, mRNA

gi no.: gi|86755971|gb|DQ359747.1|
gi-desc: Chimonanthus praecox cold acclimation protein COR413-PM1 mRNA, complete cds

gi no.: gi|145339339|ref|NM_114943.4|
gi-desc: Arabidopsis thaliana cold-regulated 413-plasma membrane 2 (COR413-PM2) mRNA, complete cds

gi no.: gi|15810634|gb|AY056356.1|
gi-desc: Arabidopsis thaliana putative cold acclimation protein (At3g50830) mRNA, complete cds

gi no.: gi|10121842|gb|AF283005.1|
gi-desc: Arabidopsis thaliana cold acclimation protein WCOR413-like protein beta form mRNA, complete cds

gi no.: gi|13430785|gb|AF360305.1|
gi-desc: Arabidopsis thaliana putative cold acclimation protein (At3g50830) mRNA, complete cds

gi no.: gi|60317457|gb|AY761065.1|
gi-desc: Gossypium barbadense cold-related protein Cor413 (Cor413) mRNA, complete cds

gi no.: gi|255556172|ref|XM_002519075.1|
gi-desc: Ricinus communis COR413-PM2, putative, mRNA

gi no.: gi|156567558|gb|EU077497.1|
gi-desc: Poncirus trifoliata cold acclimation WCOR413-like protein mRNA, complete cds

gi no.: gi|46577795|gb|AY587773.1|
gi-desc: Tamarix androssowii putative stress-responsive protein mRNA, complete cds

gi no.: gi|305690597|gb|HQ010041.1|
gi-desc: Corylus heterophylla COR413-PM1 mRNA, complete cds

gi no.: gi|224105476|ref|XM_002313788.1|
gi-desc: Populus trichocarpa predicted protein, mRNA

gi no.: gi|242389633|emb|FP100664.1|
gi-desc: Phyllostachys edulis cDNA clone: bphylf036p06, full insert sequence

gi no.: gi|242382816|emb|FP092058.1|
gi-desc: Phyllostachys edulis cDNA clone: bphyem114p22, full insert sequence

gi no.: gi|242382391|emb|FP097178.1|
gi-desc: Phyllostachys edulis cDNA clone: bphylf028m11, full insert sequence

gi no.: gi|242381728|emb|FP091375.1|
gi-desc: Phyllostachys edulis cDNA clone: bphyst020e14, full insert sequence

gi no.: gi|238007351|gb|BT084358.1|
gi-desc: Zea mays full-length cDNA clone ZM_BFb0105L06 mRNA, complete cds

gi no.: gi|195636267|gb|EU965484.1|
gi-desc: Zea mays clone 286348 cold acclimation protein COR413-PM1 mRNA, complete cds

gi no.: gi|54652523|gb|BT017742.1|
gi-desc: Zea mays clone EL01N0449E04.c mRNA sequence

gi no.: gi|162459269|ref|NM_001111732.1|
gi-desc: Zea mays LOC542099 (gpm455), mRNA >gi|27902672|gb|AY181208.1| Zea mays cold acclimation protein COR413-PM1 mRNA, complete cds

gi no.: gi|21209119|gb|AY106041.1|
gi-desc: Zea mays PCO103483 mRNA sequence

gi no.: gi|242037992|ref|XM_002466346.1|
gi-desc: Sorghum bicolor hypothetical protein, mRNA

gi no.: gi|255617390|ref|XM_002539789.1|
gi-desc: Ricinus communis COR413-PM2, putative, mRNA

gi no.: gi|30690903|ref|NM_119885.2|
gi-desc: Arabidopsis thaliana cold acclimation protein WCOR413 (AT4G37220) mRNA, complete cds

gi no.: gi|26449888|dbj|AK117399.1|
gi-desc: Arabidopsis thaliana At4g37220 mRNA for putative ap2 cold acclimation protein, complete cds, clone: RAFL16-98-J01

gi no.: gi|226504237|ref|NM_001155133.1|
gi-desc: Zea mays cold acclimation protein COR413-PM1 (LOC100282221), mRNA >gi|195620729|gb|EU960077.1| Zea mays clone 221611 cold acclimation protein COR413-PM1 mRNA, complete cds

gi no.: gi|166359605|gb|EU365626.1|
gi-desc: Thellungiella halophila stress responsive protein (COR) mRNA, complete cds

gi no.: gi|150172175|emb|CU406592.1|
gi-desc: Oryza rufipogon (W1943) cDNA clone: ORW1943C102K01, full insert sequence

gi no.: gi|115455578|ref|NM_001057925.1|
gi-desc: Oryza sativa Japonica Group Os03g0767800 (Os03g0767800) mRNA, complete cds

gi no.: gi|10121844|gb|AF283006.1|
gi-desc: Oryza sativa (japonica cultivar-group) cold acclimation protein WCOR413-like protein mRNA, complete cds

gi no.: gi|32976054|dbj|AK066036.1|
gi-desc: Oryza sativa Japonica Group cDNA clone:J013049B03, full insert sequence

gi no.: gi|32970924|dbj|AK060906.1|
gi-desc: Oryza sativa Japonica Group cDNA clone:001-035-F05, full insert sequence

gi no.: gi|32970018|dbj|AK060000.1|
gi-desc: Oryza sativa Japonica Group cDNA clone:006-301-G09, full insert sequence

gi no.: gi|28973358|gb|BT005584.1|
gi-desc: Arabidopsis thaliana clone U50435 putative cold acclimation protein homolog (At4g37220) mRNA, complete cds

gi no.: gi|326534181|dbj|AK358227.1|
gi-desc: Hordeum vulgare subsp. vulgare mRNA for predicted protein, complete cds, clone: NIASHv1071H11

gi no.: gi|160954667|emb|CU225096.1|
gi-desc: Populus EST from leave

gi no.: gi|160950966|emb|CU229055.1|
gi-desc: Populus EST from severe drought-stressed leaves

gi no.: gi|357114154|ref|XR_137736.1|
gi-desc: PREDICTED: Brachypodium distachyon uncharacterized LOC100844112 (LOC100844112), miscRNA

gi no.: gi|224035946|gb|BT070152.1|
gi-desc: Zea mays full-length cDNA clone ZM_BFc0138N11 mRNA, complete cds
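
A few descriptions in the output above (for example the NM_001111732.1 hit) contain a second `>gi|...` entry, apparently because the hit title merges more than one database entry. A minimal Python sketch for separating them is shown below; `split_title` is a hypothetical helper, and it assumes that entries are joined by " >" and that the identifier never contains a space.

```python
def split_title(title):
    """Split a BLAST hit title into a list of (identifier, description) pairs."""
    pairs = []
    for entry in title.split(" >"):
        # the identifier is everything up to the first space
        identifier, _, description = entry.partition(" ")
        pairs.append((identifier, description))
    return pairs

title = ("gi|162459269|ref|NM_001111732.1| Zea mays LOC542099 (gpm455), mRNA "
         ">gi|27902672|gb|AY181208.1| Zea mays cold acclimation protein "
         "COR413-PM1 mRNA, complete cds")
pairs = split_title(title)
```

Titles that hold a single entry come back as a one-element list, so the same loop can handle both cases.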

#4  2012-08-29 17:25:33