Searching sequence databases
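The obvious first step in analysing any new sequence is to search sequence databases for homologous sequences. Such searches can be run from anywhere, on almost any computer. Many web servers offer this service: you type or paste your sequence into the server and receive the results interactively.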
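There are many methods for sequence searching; the best known at present is the BLAST program. Versions that run locally are easy to obtain (from NCBI or Washington University), and many web pages let you compare a protein or DNA sequence against databases of gene or protein sequences. To name just a few: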
•National Center for Biotechnology Information (USA) Searches
•European Bioinformatics Institute (UK) Searches
•BLAST search through SBASE (domain database; ICGEB, Trieste)
•And many more sites
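Important recent advances in sequence comparison are gapped BLAST and PSI-BLAST (position-specific iterated BLAST), both of which make BLAST more sensitive. PSI-BLAST takes the significant hits from a search, builds a profile from them, and then uses that profile to search the database again for further homologues; this can be repeated until no new sequences are found. In this way it can detect homologues that are very distant in evolutionary terms. It is very important, before using the methods in the following sections, to compare the protein sequence with the database using PSI-BLAST to find out whether a related structure is already known.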
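The iterate-until-convergence idea behind PSI-BLAST can be illustrated with a toy sketch. It assumes equal-length, ungapped sequences, a uniform background of 1/20 per residue, a fixed score threshold in place of E-values, and hypothetical helper names (`build_profile`, `score`, `iterative_search`). The real PSI-BLAST uses gapped alignments, BLOSUM62 for the first round, sequence weighting and more careful pseudocounts, so treat this only as an illustration of the loop.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(seqs, pseudocount=1.0):
    """Log-odds score (bits) for every residue at every position.

    Toy assumptions: sequences are ungapped and of equal length, and the
    background frequency is uniform (1/20 per amino acid).
    """
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for column in zip(*seqs):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for residue in column:
            counts[residue] += 1
        total = sum(counts.values())
        profile.append({aa: math.log2(counts[aa] / total / background)
                        for aa in AMINO_ACIDS})
    return profile


def score(profile, seq):
    """Sum the profile scores of the residues in seq, position by position."""
    return sum(column[aa] for column, aa in zip(profile, seq))


def iterative_search(query, database, threshold, max_rounds=10):
    """Rebuild the profile from all current hits and search again until the
    hit set stops changing (convergence) or max_rounds is reached.

    Round 1 uses a profile built from the query alone (a stand-in for the
    substitution-matrix search of real PSI-BLAST). Returns (hits, rounds);
    the query is always kept in the hit set.
    """
    hits = {query}
    for rounds in range(1, max_rounds + 1):
        profile = build_profile(sorted(hits))
        found = {query} | {s for s in database if score(profile, s) >= threshold}
        if found == hits:  # no new sequences found: converged
            return hits, rounds
        hits = found
    return hits, max_rounds
```

With the query `ACDEFG`, a database of `ACDKLM`, `AWYKLM` and `WWWWWW`, and a threshold of 2.5, `AWYKLM` (which shares only one residue with the query) is missed in the first round but picked up in the second, once `ACDKLM` has been folded into the profile; the third round finds nothing new and the search stops.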
Other methods for comparing a single sequence with a database include:
•FASTA package (William Pearson, University of Virginia, USA)
•SCANPS (Geoff Barton, European Bioinformatics Institute, UK)
•BLITZ (Compugen's fast Smith-Waterman search)
•Other methods.
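BLITZ is described as a fast Smith-Waterman search, and the FASTA package also offers full Smith-Waterman searching through its SSEARCH program. A minimal Smith-Waterman scorer (score only, no traceback) and a naive database-ranking loop are sketched below. The match/mismatch values and the linear gap penalty are illustrative assumptions; real tools use substitution matrices such as BLOSUM62, affine gap penalties and significance statistics. The function names are hypothetical.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Keeps only two rows of the dynamic-programming matrix. Every cell is
    floored at zero, so an alignment may start and end anywhere.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Options: restart (0), extend diagonally, or open/extend a gap
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query, database, **scoring):
    """Score the query against every (name, sequence) entry, best first."""
    scored = [(smith_waterman(query, seq, **scoring), name)
              for name, seq in database.items()]
    return sorted(scored, reverse=True)
```

Because every cell is floored at zero, a short query can match inside a much longer database sequence; this local behaviour is what distinguishes Smith-Waterman from global (Needleman-Wunsch) alignment.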
It is also possible to use information from multiple sequences to perform more sensitive searches. This involves building a profile from some kind of multiple sequence alignment. A profile gives a score for each type of amino acid at each position in the sequence, and generally makes searches more sensitive. Tools for doing this include:
•PSI-BLAST (NCBI, Washington)
•ProfileScan Server (ISREC, Geneva)
•HMMER hidden Markov models (Sean Eddy, Washington University)
•Wise package (Ewan Birney, Sanger Centre; for protein-to-DNA comparison)
•Other methods.
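A profile of this kind can be sketched as a position-specific scoring matrix (PSSM): a log-odds score, in bits, for each amino acid in each alignment column, which is then slid along a longer sequence to find the best-scoring window. The sketch assumes a block of equal-length aligned rows, gap characters ('-') that are simply left uncounted, a uniform background unless one is supplied, and a flat pseudocount. PSI-BLAST, HMMER and the other tools listed use more elaborate weighting and pseudocounts (and HMMER adds insert/delete states), and the function names here are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_pssm(alignment, pseudocount=0.5, background=None):
    """Build a PSSM (one dict of log-odds bit scores per column).

    alignment: equal-length aligned rows; '-' characters are not counted.
    background: residue frequencies; defaults to uniform (1/20 each).
    """
    if background is None:
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    pssm = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((residues.count(aa) + pseudocount) / total / background[aa])
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window(pssm, sequence):
    """Slide the PSSM along sequence; return (best score, start index).

    Returns (-inf, -1) if the sequence is shorter than the profile.
    """
    width = len(pssm)
    best = (-math.inf, -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        window_score = sum(col[aa] for col, aa in zip(pssm, window))
        best = max(best, (window_score, start))
    return best
```

For the three aligned rows `HFKL`, `HWKV` and `HFRL`, the fully conserved H in the first column gets the highest score, F (2 of 3) scores above W (1 of 3), and an unseen residue such as A scores below zero; scanning `AAAHFKLAAA` then locates the motif at index 3.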
A different approach for incorporating multiple sequence information into a database search is to use a MOTIF. Instead of giving every amino acid a score at every position in an alignment, a motif ignores all but the most invariant positions and describes only the key residues that are conserved and define the family. This is sometimes called a "signature". For example, "H-[FW]-x-[LIVM]-x-G-x(5)-[LV]-H-x(3)-[DE]" describes a family of DNA-binding proteins. It can be read as "histidine, followed by either phenylalanine or tryptophan, followed by any amino acid (x), followed by leucine, isoleucine, valine or methionine, followed by any amino acid (x), followed by glycine, followed by any five amino acids (x(5)),... [etc.]".
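PROSITE-style patterns such as the one above map directly onto regular expressions: `x` becomes `.`, `[..]` stays a character class, `{..}` (excluded residues) becomes `[^..]`, `(n)` or `(n,m)` becomes a repeat count, and `<`/`>` anchor the N- or C-terminus. The converter below is a minimal sketch covering only those elements (it does not handle `>` inside brackets, for example); `prosite_to_regex` is a hypothetical helper, not part of any PROSITE software.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a Python regular expression.

    Handles x, [..], {..}, repeat counts (n) / (n,m) and the < / > anchors.
    Does not handle '>' inside brackets (e.g. [G>]).
    """
    pattern = pattern.rstrip(".")  # PROSITE patterns end with a period
    parts = []
    for element in pattern.split("-"):
        prefix, suffix = "", ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as x(5) or x(2,4)
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, count = m.group(1), m.group(2)
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        repeat = "{" + count + "}" if count else ""
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("H-[FW]-x-[LIVM]-x-G-x(5)-[LV]-H-x(3)-[DE]")
    match = re.search(regex, "MKKHFALAGAAAAALHAAADGG")
    print(regex, match.start() if match else None)
```

For the DNA-binding signature quoted above, the converter produces `H[FW].[LIVM].G.{5}[LV]H.{3}[DE]`, which `re.search` can then look for in any protein sequence.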
PROSITE (ExPASy Geneva) contains a huge number of such patterns, and several sites allow you to search these data:
•ExPASy
•EBI
It is best to search a few different databases in order to find as many homologues as possible. A very important thing to do, and one which is sometimes overlooked, is to compare any new sequence to a database of sequences for which 3D structure information is available. Whether or not your sequence is homologous to a protein of known 3D structure is not obvious in the output from many searches of large sequence databases. Moreover, if the homology is weak, the similarity may not be apparent at all during the search through a larger database.
One last thing to remember is that you can save a lot of time by making use of pre-prepared protein alignments. Many of these alignments are hand-edited by experts on the particular protein families, and thus probably represent the best alignment obtainable from the sequences they contain (although they are not always as up to date as the most recent sequence databases). These databases include:
•SMART (Oxford/EMBL)
•PFAM (Sanger Centre/Wash-U/Karolinska Institutet)
•COGs (NCBI)
•PRINTS (UCL/Manchester)
•BLOCKS (Fred Hutchinson Cancer Research Centre, Seattle)
•SBASE (ICGEB, Trieste)
In general, there are many ways to compare a protein sequence with databases, and they are very useful for identifying domains.
[ Last edited by cnlics on 2010-9-14 at 19:54 ]