Views: 2176  |  Replies: 7

ldy2140

[Discussion] A question about NCBI online BLAST

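My project is related to microorganisms, and I will soon need to do a large amount of sequence alignment. The strategy is to run tblastn with specific protein sequences against a nucleotide database of microbial genomes, use the BLAST results to get a first idea of which microbes have homologous proteins, and verify the candidates experimentally later.
This strategy needs a database covering all microbial genomes that is de-duplicated but still complete. Earlier I considered Nucleotide collection (nr/nt), but the result sets are too large to process efficiently and they contain redundancy. Later I found that NCBI offers BLAST against all microbial genomes. The link is: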
CODE:
http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=microb

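The list there lets you choose different microbial genera and species, and it also includes WGS fragments that have not been assembled yet, so in that respect it is quite ideal. The drawback is that the sequences in these lists are not merged into a single file stored on the NCBI server, so the computation is very slow. In other words, the BLAST program has to access GenBank entry by entry, then collect the alignment results one by one, and finally present them on the web page, which is inefficient.
So I am wondering whether I should build a local database containing every entry in the list. But the list is updated at irregular intervals, which is a real headache: every day I would have to check whether new whole-genome sequences have been released, or whether entries that used to be contigs have been assembled into a complete genome, and then add or replace the corresponding entries in the local database.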
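
For what it is worth, building a local database like the one considered above could look roughly like the sketch below. It assumes the NCBI BLAST+ command-line tools (`makeblastdb`, `tblastn`) are installed and that the genomes have already been downloaded and concatenated into one FASTA file. The file names (`all_microbes.fna`, `query.faa`, `hits.tsv`), the database name and the E-value cutoff are placeholders, not values from this thread. The functions only assemble the command lines, so nothing is executed until a command is passed to `subprocess.run`.

```python
import shlex
from typing import List


def makeblastdb_cmd(fasta_path: str, db_name: str, title: str) -> List[str]:
    """Command line that builds one nucleotide BLAST database from a FASTA file."""
    return [
        "makeblastdb",
        "-in", fasta_path,
        "-dbtype", "nucl",   # tblastn searches a nucleotide database
        "-title", title,
        "-out", db_name,
    ]


def tblastn_cmd(query_faa: str, db_name: str, out_path: str,
                evalue: float = 1e-5) -> List[str]:
    """Command line for a protein query against a translated nucleotide database."""
    return [
        "tblastn",
        "-query", query_faa,
        "-db", db_name,
        "-evalue", str(evalue),
        "-outfmt", "6",      # tab-separated hit table, one hit per line
        "-out", out_path,
    ]


# Placeholder names, shown only to illustrate the resulting commands.
build = makeblastdb_cmd("all_microbes.fna", "all_microbes", "All microbial genomes")
search = tblastn_cmd("query.faa", "all_microbes", "hits.tsv")
print(shlex.join(build))
print(shlex.join(search))
# To actually run them: subprocess.run(build, check=True), then the same for search.
```

Keeping everything in one database means the result header names a single database, and rebuilding after an update is a single `makeblastdb` call on the refreshed FASTA file.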
I don't know whether there is a better solution. Looking forward to replies from the experts.
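
One way to take the pain out of the daily check described above is to save a snapshot of the genome list each time and compare it with the previous one. The sketch below is only an illustration: it assumes each snapshot has been saved as plain text with one `name<TAB>status` line per entry, which is a hypothetical format chosen here, not something the NCBI page provides directly. The status words `complete` and `contigs` are placeholders as well.

```python
from typing import Dict, List, NamedTuple


class ListingDiff(NamedTuple):
    added: List[str]           # entries present only in the new snapshot
    removed: List[str]         # entries present only in the old snapshot
    status_changed: List[str]  # entries whose status differs between snapshots


def parse_listing(text: str) -> Dict[str, str]:
    """Map entry name -> status from 'name<TAB>status' lines (assumed format)."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, status = line.partition("\t")
        if not sep:
            continue  # skip lines without a tab instead of guessing
        entries[name.strip()] = status.strip()
    return entries


def diff_listings(old: Dict[str, str], new: Dict[str, str]) -> ListingDiff:
    """Report what has to be added, dropped, or replaced in the local database."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(n for n in set(old) & set(new) if old[n] != new[n])
    return ListingDiff(added, removed, changed)


yesterday = parse_listing(
    "Aeropyrum pernix K1\tcomplete\n"
    "Halobacterium sp. DL1\tcontigs\n"
)
today = parse_listing(
    "Aeropyrum pernix K1\tcomplete\n"
    "Halobacterium sp. DL1\tcomplete\n"
    "Pyrococcus sp. ST04\tcomplete\n"
)
print(diff_listings(yesterday, today))
```

Only the entries reported as added or changed would then need to be downloaded again before rebuilding the local database.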

[ Last edited by ldy2140 on 2012-9-13 at 16:30 ]
Pinned reply (1 in total), pinned by ldy2140 on 2012-09-13 13:58:37
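I think nr is a protein database, and it does not contain identical sequences.
Have you looked at RefSeq? It is non-redundant.

I don't understand where this statement of yours comes from: "The drawback is that the sequences in these lists are not merged into a single file stored on the NCBI server, so the computation is very slow. In other words, the BLAST program has to access GenBank entry by entry, then collect the alignment results one by one, and finally present them on the web page, which is inefficient."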
Post #3, 2012-09-13 05:30:57

ldy2140

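Quoted reply:
#3: Originally posted by wizardfan at 2012-09-13 05:30:57
I think nr is a protein database, and it does not contain identical sequences.
Have you looked at RefSeq? It is non-redundant.

I don't understand where this statement of yours comes from: "The drawback is that the sequences in these lists are not merged into a single file stored on the NCBI server, so the computation is very slow. In other words, the blas ...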

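I am running tblastn, so the database is a nucleotide one. Here nr refers to Nucleotide collection (nr/nt).
nr: All GenBank + EMBL + DDBJ + PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences). No longer "non-redundant" due to computational cost.
I have tried RefSeq, but it did not seem complete enough.
As for that statement: in the BLAST results I get, "Database:" is followed by a very long string, whereas with a single file there would be just one database name.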
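
Since the search in question is tblastn against a nucleotide database, deciding which genomes carry a homolog mostly comes down to reading the hit table. The sketch below assumes the search was run with BLAST+ tabular output (`-outfmt 6`) and its default twelve columns. The E-value cutoff and the sample subject IDs (`contigA`, `contigB`) are made up for illustration and do not come from this thread.

```python
from typing import Dict, Iterable, Tuple

# Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore
SSEQID, EVALUE, BITSCORE = 1, 10, 11


def best_hits_by_subject(lines: Iterable[str],
                         max_evalue: float = 1e-5) -> Dict[str, Tuple[float, float]]:
    """Keep, for each subject sequence, the (evalue, bitscore) of its best hit."""
    best: Dict[str, Tuple[float, float]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a default-format hit line
        evalue = float(fields[EVALUE])
        bitscore = float(fields[BITSCORE])
        if evalue > max_evalue:
            continue  # too weak to count as a candidate homolog
        subject = fields[SSEQID]
        if subject not in best or bitscore > best[subject][1]:
            best[subject] = (evalue, bitscore)
    return best


# Hypothetical hit lines, only to show the input shape.
sample = [
    "query1\tcontigA\t45.2\t200\t100\t3\t1\t200\t3000\t3600\t1e-30\t250",
    "query1\tcontigA\t30.0\t80\t50\t2\t10\t90\t100\t340\t1e-8\t60",
    "query1\tcontigB\t25.0\t60\t40\t1\t5\t65\t10\t190\t0.5\t30",
]
for subject, (evalue, bitscore) in best_hits_by_subject(sample).items():
    print(subject, evalue, bitscore)
```

Mapping subject IDs back to organisms still needs a separate lookup, which depends on how the database was built.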
Post #5, 2012-09-13 16:44:30
Quoted reply:
#5: Originally posted by ldy2140 at 2012-09-13 16:44:30
I am running tblastn, so the database is a nucleotide one. Here nr refers to Nucleotide collection (nr/nt).
nr: All GenBank + EMBL + DDBJ + PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences). No longer "non- ...

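As I recall, nt is the DNA database.
"In the BLAST results I get, 'Database:' is followed by a very long string, whereas with a single file there would be just one database name." Do you have an example?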
Post #7, 2012-09-14 06:42:20

ldy2140

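Quoted reply:
#7: Originally posted by wizardfan at 2012-09-14 06:42:20
As I recall, nt is the DNA database.
"In the BLAST results I get, 'Database:' is followed by a very long string, whereas with a single file there would be just one database name." Do you have an example?...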

CODE:
Database: Completed Aeropyrum pernix K1; Completed Desulfurococcus
fermentans DSM 16532; Completed Desulfurococcus kamchatkensis 1221n;
Completed Desulfurococcus mucosus DSM 2162; Completed Hyperthermus
butylicus DSM 5456; Completed Ignicoccus hospitalis KIN4/I; Completed
Ignisphaera aggregans DSM 17230; Completed Pyrolobus fumarii 1A;
Completed Staphylothermus hellenicus DSM 12710; Completed
Staphylothermus marinus F1; Completed Thermogladius cellulolyticus
1633; Completed Thermosphaera aggregans DSM 11486; Completed Acidianus
hospitalis W1; Completed Metallosphaera cuprina Ar-4; Completed
Metallosphaera sedula DSM 5348; Metallosphaera yellowstonensis MK1
genomic sequences; Completed Sulfolobus acidocaldarius DSM 639;
Sulfolobus islandicus genomic sequences; Completed Sulfolobus
solfataricus; Sulfolobus solfataricus 98/2 genomic sequences;
Completed Sulfolobus tokodaii str. 7; Completed Caldivirga
maquilingensis IC-167; Completed Pyrobaculum aerophilum str. IM2;
Completed Pyrobaculum arsenaticum DSM 13514; Completed Pyrobaculum
calidifontis JCM 11548; Completed Pyrobaculum islandicum DSM 4184;
Completed Pyrobaculum neutrophilum V24Sta; Completed Pyrobaculum
oguniense TE7; Completed Pyrobaculum sp. 1860; Completed Thermofilum
pendens Hrk 5; Completed Thermoproteus uzoniensis 768-20; Completed
Vulcanisaeta distributa DSM 14429; Completed Vulcanisaeta moutnovskia
768-28; Completed Archaeoglobus fulgidus DSM 4304; Completed
Archaeoglobus profundus DSM 5631; Completed Archaeoglobus veneficus
SNP6; Completed Ferroglobus placidus DSM 10642; Haladaptatus
paucihalophilus DX253 genomic sequences; Completed Halalkalicoccus
jeotgali B3; Completed Haloarcula hispanica ATCC 33960; Completed
Haloarcula marismortui ATCC 43049; Completed Halobacterium salinarum;
Halobacterium sp. DL1 genomic sequences; Halobiforma lacisalsi AJ5
genomic sequences; Halococcus hamelinensis 100A6 genomic sequences;
Completed Haloferax mediterranei ATCC 33500; Completed Haloferax
volcanii DS2; Completed Halogeometricum borinquense DSM 11551;
Halogranum salarium B-1 genomic sequences; Completed Halomicrobium
mukohataei DSM 12286; Completed Halopiger xanaduensis SH-6; Completed
Haloquadratum walsbyi; Halorhabdus tiamatea SARL4B genomic sequences;
Completed Halorhabdus utahensis DSM 12940; Completed Halorubrum
lacusprofundi ATCC 49239; Completed Haloterrigena turkmenica DSM 5511;
Completed Natrialba magadii ATCC 43099; Natrinema pellirubrum DSM
15624 genomic sequences; Completed Natrinema sp. J7-2;
Natronobacterium gregoryi SP2 genomic sequences; Completed
Natronomonas pharaonis DSM 2160; Completed Methanobacterium sp. AL-21;
Completed Methanobacterium sp. SWAN-1; Completed Methanobrevibacter
ruminantium M1; Methanobrevibacter smithii genomic sequences;
Completed Methanosphaera stadtmanae DSM 3091; Completed
Methanothermobacter marburgensis str. Marburg; Completed
Methanothermobacter thermautotrophicus str. Delta H; Completed
Methanothermus fervidus DSM 2088; Completed Methanocaldococcus fervens
AG86; Completed Methanocaldococcus infernus ME; Completed
Methanocaldococcus jannaschii DSM 2661; Completed Methanocaldococcus
sp. FS406-22; Completed Methanocaldococcus vulcanius M7; Completed
Methanococcus aeolicus Nankai-3; Completed Methanococcus maripaludis;
Completed Methanococcus vannielii SB; Completed Methanococcus voltae
A3; Completed Methanothermococcus okinawensis IH1; Methanotorris
formicicus Mc-S-70 genomic sequences; Completed Methanotorris igneus
Kol 5; Completed Methanocorpusculum labreanum Z; Completed
Methanoculleus bourgensis MS2; Completed Methanoculleus marisnigri
JR1; Methanofollis liminatans DSM 4140 genomic sequences; Methanolinea
tarda NOBI-1 genomic sequences; Methanoplanus limicola DSM 2279
genomic sequences; Completed Methanoplanus petrolearius DSM 11571;
Completed Methanoregula boonei 6A8; Completed Methanosphaerula
palustris E1-9c; Completed Methanospirillum hungatei JF-1; Completed
Methanopyrus kandleri AV19; Completed Methanococcoides burtonii DSM
6242; Completed Methanohalobium evestigatum Z-7303; Completed
Methanohalophilus mahii DSM 5219; Completed Methanosaeta concilii GP6;
Completed Methanosaeta harundinacea 6Ac; Completed Methanosaeta
thermophila PT; Completed Methanosalsum zhilinae DSM 4017; Completed
Methanosarcina acetivorans C2A; Completed Methanosarcina barkeri str.
Fusaro; Completed Methanosarcina mazei Go1; Completed Pyrococcus
abyssi GE5; Completed Pyrococcus furiosus DSM 3638; Completed
Pyrococcus horikoshii OT3; Completed Pyrococcus sp. NA2; Completed
Pyrococcus sp. ST04; Completed Pyrococcus yayanosii CH1; Completed
Thermococcus barophilus MP; Completed Thermococcus gammatolerans EJ3;
Completed Thermococcus kodakarensis KOD1; Thermococcus litoralis DSM
5473 genomic sequences; Completed Thermococcus onnurineus NA1;
Completed Thermococcus sibiricus MM 739; Completed Thermococcus sp.
4557; Completed Thermococcus sp. AM4; Completed Thermococcus sp. CL1;
Thermococcus zilligii AN1 genomic sequences; Ferroplasma acidarmanus
fer1 genomic sequences; Completed Picrophilus torridus DSM 9790;
Completed Thermoplasma acidophilum DSM 1728; Completed Thermoplasma
volcanium GSS1; Completed Nanoarchaeum equitans Kin4-M; Completed
Acidimicrobium ferrooxidans DSM 10331; Completed Acidothermus
cellulolyticus 11B; Actinoalloteichus spitiensis RMV-1378 genomic
sequences; Actinomyces coleocanis DSM 15436 genomic sequences;
Actinomyces georgiae F0490 genomic sequences; Actinomyces graevenitzii
C83 genomic sequences; Actinomyces massiliensis genomic sequences;
Actinomyces naeslundii str. Howell 279 genomic sequences; Actinomyces
odontolyticus genomic sequences; Actinomyces oris K20 genomic
sequences; Actinomyces sp. ICM39 genomic sequences; Actinomyces sp.
ICM47 genomic sequences; Actinomyces sp. oral taxon 170 str. F0386
genomic sequences; Actinomyces sp. oral taxon 171 str. F0337 genomic
sequences; Actinomyces sp. oral taxon 175 str. F0384 genomic
sequences; Actinomyces sp. oral taxon 178 str. F0338 genomic
sequences; Actinomyces sp. oral taxon 180 str. F0310 genomic
sequences; Actinomyces sp. oral taxon 448 str. F0400 genomic
sequences; Actinomyces sp. oral taxon 848 str. F0332 genomic
sequences; Actinomyces sp. oral taxon 849 str. F0330 genomic
sequences; Actinomyces sp. ph3 genomic sequences; Actinomyces
urogenitalis DSM 15434 genomic sequences; Actinomyces viscosus C505
genomic sequences; Completed Actinoplanes missouriensis 431; Completed
Actinoplanes sp. SE50/110; Completed Actinosynnema mirum DSM 43827;
Aeromicrobium marinum DSM 15272 genomic sequences; Aeromicrobium sp.
JC14 genomic sequences; Completed Amycolatopsis mediterranei;
Amycolatopsis sp. ATCC 39116 genomic sequences; Completed
Amycolicicoccus subflavus DQS3-9A1; Completed Arcanobacterium
haemolyticum DSM 20595; Completed Arthrobacter arilaitensis Re117;
Completed Arthrobacter aurescens TC1; Completed Arthrobacter
chlorophenolicus A6; Arthrobacter globiformis NBRC 12137 genomic
sequences; Completed Arthrobacter phenanthrenivorans Sphe3; Completed
Arthrobacter sp. FB24; Arthrobacter sp. M2012083 genomic sequences;
Completed Atopobium parvulum DSM 20469; Atopobium rimae ATCC 49626
genomic sequences; Atopobium sp. ICM58 genomic sequences; Atopobium
vaginae genomic sequences; Completed Beutenbergia cavernae DSM 12333;
Completed Bifidobacterium adolescentis; Bifidobacterium adolescentis
L2-32 genomic sequences; Bifidobacterium angulatum DSM 20098 = JCM
7096 genomic sequences; Bifidobacterium animalis genomic sequences;
Bifidobacterium bifidum genomic sequences; Bifidobacterium breve
genomic sequences; Bifidobacterium catenulatum DSM 16992 = JCM 1194
genomic sequences; Bifidobacterium dentium genomic sequences;
Bifidobacterium gallicum DSM 20093 genomic sequences; Bifidobacterium
longum genomic sequences; Bifidobacterium pseudocatenulatum DSM 20438
= JCM 1200 genomic sequences; Bifidobacterium sp. 12_1_47BFAA genomic
sequences; Completed Blastococcus saxobsidens DD2; Completed
Brachybacterium faecium DSM 4810; Brachybacterium paraconglomeratum
LC44 genomic sequences; Brachybacterium squillarum M-6-3 genomic
sequences; Brevibacterium linens BL2 genomic sequences; Brevibacterium
massiliense 5401308 genomic sequences; Brevibacterium mcbrellneri ATCC
49030 genomic sequences; Brevibacterium sp. JC43 genomic sequences;
Candidatus Aquiluna sp. IMCC13023 genomic sequences; Completed
Catenulispora acidiphila DSM 44928; Completed Cellulomonas fimi ATCC
484; Completed Cellulomonas flavigena DSM 20109; Cellulomonas sp.
JC225 genomic sequences; Citricoccus sp. CH26A genomic sequences;
Completed Clavibacter michiganensis; Collinsella aerofaciens ATCC
25986 genomic sequences; Collinsella intestinalis DSM 13280 genomic
sequences; Collinsella stercoris DSM 13279 genomic sequences;
Collinsella tanakaei YIT 12063 genomic sequences; Completed
Conexibacter woesei DSM 14684; Coriobacteriaceae bacterium JC110
genomic sequences; Coriobacteriaceae bacterium phI genomic sequences;
Completed Coriobacterium glomerans PW2; Corynebacterium accolens
genomic sequences; Corynebacterium ammoniagenes DSM 20306 genomic
sequences; Corynebacterium amycolatum SK46 genomic sequences;
Corynebacterium aurimucosum ATCC 700975 genomic sequences;
Corynebacterium bovis DSM 20582 genomic sequences; Corynebacterium
casei UCMA 3821 genomic sequences; Corynebacterium diphtheriae genomic
sequences; Corynebacterium efficiens YS-314 genomic sequences;
Corynebacterium genitalium ATCC 33030 genomic sequences;
Corynebacterium glucuronolyticum genomic sequences; Corynebacterium
glutamicum genomic sequences; Completed Corynebacterium jeikeium;
Corynebacterium jeikeium ATCC 43734 genomic sequences; Completed
Corynebacterium kroppenstedtii DSM 44385; Corynebacterium
lipophiloflavum DSM 44291 genomic sequences……

There is a lot more after this; I can't paste all of it here.
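
A header like the one pasted above is easier to inspect once it is split into individual entries. The sketch below is a rough illustration based only on the pattern visible in that paste: entries are separated by semicolons, some start with "Completed" and others end with "genomic sequences". How those two labels map onto finished versus draft assemblies is an assumption here, not something the header itself states.

```python
from typing import List, Tuple

COMPLETED_PREFIX = "Completed "
GENOMIC_SUFFIX = " genomic sequences"


def parse_database_header(header: str) -> List[Tuple[str, str]]:
    """Split a wrapped 'Database: a; b; c' header into (name, label) pairs."""
    text = " ".join(header.split())  # undo the line wrapping
    if text.startswith("Database:"):
        text = text[len("Database:"):]
    entries: List[Tuple[str, str]] = []
    for raw in text.split(";"):
        name = raw.strip()
        if not name:
            continue
        if name.startswith(COMPLETED_PREFIX):
            entries.append((name[len(COMPLETED_PREFIX):], "completed"))
        elif name.endswith(GENOMIC_SUFFIX):
            entries.append((name[:-len(GENOMIC_SUFFIX)], "genomic sequences"))
        else:
            entries.append((name, "unknown"))  # e.g. a truncated last entry
    return entries


sample = """Database: Completed Aeropyrum pernix K1; Completed Desulfurococcus
fermentans DSM 16532; Metallosphaera yellowstonensis MK1
genomic sequences; Completed Sulfolobus
solfataricus"""
for name, label in parse_database_header(sample):
    print(f"{label}\t{name}")
```

Counting the pairs gives the number of component databases behind one search, and the entry names can serve as a checklist when assembling a local copy.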

[ Last edited by ldy2140 on 2012-9-14 at 19:55 ]
Post #8, 2012-09-14 19:54:22