
Views: 791  |  Replies: 0

emanlee

木虫 member (rank: somewhat well known)

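[Help] Finding non-duplicate sequences across two different FASTA files

Problem: find the non-duplicate sequences across two different FASTA files.

The first FASTA file, aaa.fa, contains 40,000 nucleotide or amino-acid sequences: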
>gi|118600994|ref|NM_001079530.1| Homo sapiens cripto, FRL-1, cryptic family 1B (CFC1B), mRNA
ATGCCAAATACAGCCATGAAGAAAAAGGTGCTGCTGATGGGGAAGAGCGGGTCGGGGAAGACCAGCATGAGGTCGATAATCTT
>gi|57863286|ref|NM_006570.4| Homo sapiens Ras-related GTP binding A (RRAGA), mRNA
ACGCTCTACAAAGCCTGGTCCAGCATCGTCTACCAGCTGATTCCCAACGTTCAGCAGCTGGAGATGAACCTCAGGAATTTTG
>gi|254587897|ref|NM_178495.5| Homo sapiens inositol 1,4,5-trisphosphate receptor
CGCCAATTACATTGCTCGCGACACCCGGCGCCTGGGGGCCACCATTGACGTGGAACACTCCCACGTCCGATTCCTAGGGAACC
>gi|191252813|ref|NM_001128635.1| Homo sapiens RIMS binding protein 3B (RIMBP3B), mRNA
TGGTGCTGAACCTGTGGGACTGTGGCGGTCAGGACACCTTCATGGAAAATTACTTCACCAGCCAGCGAGACAATATCTTCCGTA
>gi|61656209|ref|NM_001013355.1| Homo sapiens olfactory receptor, family 2, subfamily G, member 6 (OR2G6), mRNA
ACGTGGAAGTTTTGATTTACGTGTTTGACGTGGAGAGCCGCGAACTGGAAAAGGACATGCATTATTACCAGTCGTGTCTGGAGG
The second FASTA file, bbb.fa, also contains 40,000 nucleotide or amino-acid sequences:
>gi|83267870|ref|NM_080431.4| Homo sapiens actin-related protein T2 (ACTRT2), mRNA
CCATCCTCCAGAACTCTCCTGACGCCAAAATCTTCTGCCTGGTGCACAAAATGGATCTGGTTCAGGAGGATCAGCGTGACCTGA
>gi|53828675|ref|NM_001001923.1| Homo sapiens olfactory receptor, family 5, subfamily C, member 1 (OR5C1), mRNA
TTTTTAAAGAGCGAGAGGAAGACCTGAGGCGTCTGTCTCGCCCGCTGGAGTGTGCTTGTTTTCGAACGTCCATCTGGGATGAG
>gi|52627150|ref|NM_001005276.1| Homo sapiens olfactory receptor, family 2, subfamily AE, member 1 (OR2AE1), mRNA
TTTTTAAAGAGCGAGAGGAAGACCTGAGGCGTCTGTCTCGCCCGCTGGAGTGTGCTTGTTTTCGAACGTCCATCTGGGATGAG
>gi|61656211|ref|NM_001013357.1| Homo sapiens olfactory receptor, family 8, subfamily U, member 9 (OR8U9), mRNA
ACGCTCTACAAAGCCTGGTCCAGCATCGTCTACCAGCTGATTCCCAACGTTCAGCAGCTGGAGATGAACCTCAGGAATTTTG
>gi|51871366|ref|NM_001004124.1| Homo sapiens olfactory receptor, family 4, subfamily P, member 4 (OR4P4), mRNA
ATGCCAAATACAGCCATGAAGAAAAAGGTGCTGCTGATGGGGAAGAGCGGGTCGGGGAAGACCAGCATGAGGTCGATAATCTT
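
For reference, records like the ones listed above can be read and compared for exact duplicates with a few lines of Python. This is only a minimal sketch: the function names `read_fasta` and `split_by_exact_match` are made up for illustration, and it only detects sequences that are identical character for character (after upper-casing), not partial overlaps.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Sequences that are wrapped over several lines are joined together.
    """
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def split_by_exact_match(
    aaa_lines: Iterable[str], bbb_lines: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split the bbb headers into (shared, unique).

    shared: the sequence also occurs somewhere in aaa.
    unique: the sequence does not occur in aaa.
    """
    aaa_seqs = {seq for _, seq in read_fasta(aaa_lines)}
    shared: List[str] = []
    unique: List[str] = []
    for header, seq in read_fasta(bbb_lines):
        (shared if seq in aaa_seqs else unique).append(header)
    return shared, unique
```

With the real files this could be called as `split_by_exact_match(open("aaa.fa"), open("bbb.fa"))`. Holding 40,000 sequences in a set should fit comfortably in memory on an ordinary machine.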

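We want to find the sequences in the second file, bbb.fa, that overlap with sequences in aaa.fa (overlap-ratio<0.8). How can this comparison be done with BLAST?
Is there ready-made Perl, Python, or C code that can be used directly?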
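
One possible approach, sketched here under several assumptions: with BLAST+ installed, build a database from aaa.fa using `makeblastdb -in aaa.fa -dbtype nucl -out aaa_db`, then run `blastn -query bbb.fa -db aaa_db -outfmt "6 qseqid sseqid pident length qlen slen" -out hits.tsv` (for amino-acid sequences, `-dbtype prot` and `blastp` would be used instead). The database name `aaa_db` and the output file `hits.tsv` are placeholders. The post does not define overlap-ratio, so the code below assumes it means alignment length divided by the length of the shorter of the two sequences; the function names are hypothetical as well.

```python
from typing import Dict, Iterable, List


def best_overlap_per_query(rows: Iterable[str]) -> Dict[str, float]:
    """Return the largest overlap ratio seen for each query sequence.

    Each row is one tab-separated BLAST hit with the six columns
    qseqid, sseqid, pident, length, qlen, slen.
    ASSUMPTION: overlap ratio = alignment length / min(qlen, slen).
    Alignment length counts gap columns, so the ratio can slightly exceed 1.
    """
    best: Dict[str, float] = {}
    for row in rows:
        row = row.strip()
        if not row or row.startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, _sseqid, _pident, length, qlen, slen = row.split("\t")
        ratio = int(length) / min(int(qlen), int(slen))
        if ratio > best.get(qseqid, 0.0):
            best[qseqid] = ratio
    return best


def queries_below(best: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """List the query IDs whose best overlap ratio is below the threshold."""
    return sorted(q for q, r in best.items() if r < threshold)
```

For example, `queries_below(best_overlap_per_query(open("hits.tsv")))` would list the bbb.fa sequences whose best hit in aaa.fa stays under 0.8. Sequences from bbb.fa with no hit at all do not appear in the BLAST output, so they would have to be added separately by comparing against the full list of IDs in bbb.fa.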