

anquren

Gold Bug (Regular Writer)

[Discussion] [Repost] Computational methods for identifying non-coding RNA in genomic sequences

Sequence similarities
As mentioned previously, several ncRNA genes are already known. Searching other genomes for sequences similar to these can reveal more members of the same families. Several such projects have already located tRNA genes (Lowe and Eddy, 1997), tmRNA genes (Zwieb et al., 1999) and snoRNA genes (Lowe and Eddy, 1999). This strategy lacks generality, however: it can only find relatives of known genes, so new families of ncRNA cannot be discovered. It remains useful in two situations. If new ncRNA genes are found by other means, it can determine whether they belong to a known family or form a new one. And when working with a newly sequenced genome, it shows which of the known ncRNA genes that genome contains.
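To illustrate the basic idea of a similarity search, here is a minimal Python sketch that scores a known ncRNA sequence against windows of a genome using Smith-Waterman local alignment with a linear gap penalty. The scoring values, the window size and the names `smith_waterman_score` and `find_homologs` are illustrative assumptions; this plain sequence-only scan is not the method used in the cited projects, and it ignores RNA secondary structure.

```python
def smith_waterman_score(query: str, target: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between query and target
    (Smith-Waterman, linear gap penalty, score only)."""
    prev = [0] * (len(target) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            diag = prev[j - 1] + (match if query[i - 1] == target[j - 1] else mismatch)
            up = prev[j] + gap        # gap in target
            left = curr[j - 1] + gap  # gap in query
            curr[j] = max(0, diag, up, left)  # 0 lets a local alignment restart
            best = max(best, curr[j])
        prev = curr
    return best


def find_homologs(known_genes: dict, genome: str, window: int, threshold: int):
    """Hypothetical helper: slide a window along the genome and report
    (gene_name, start, score) for every window scoring at least threshold."""
    hits = []
    for name, seq in known_genes.items():
        for start in range(0, len(genome) - window + 1):
            score = smith_waterman_score(seq, genome[start:start + window])
            if score >= threshold:
                hits.append((name, start, score))
    return hits


print(find_homologs({"tRNA-like": "ACGTACGT"}, "TTTTACGTACGTTTTT", window=8, threshold=16))
```

Because every window is realigned from scratch, this scan is only practical for toy inputs; genome-scale searches rely on indexed or heuristic methods.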
Comparative genomics
Sequences that code for the same protein in closely related organisms are conserved, and the same should hold for regions that code for ncRNA genes, since mutations that disrupt a functional RNA are also selected against. Such genes could therefore be found by comparing the intergenic regions of closely related organisms and looking for stretches that are more conserved than the sequence surrounding them. Several comparative genomics projects of this kind have already proved successful (Rivas et al., 2001; Wassarman et al., 1999; Rivas and Eddy, 2001). The approach becomes especially attractive as the number of sequenced genomes grows.
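A minimal sketch of the conservation scan described above, assuming the intergenic regions of the two organisms have already been aligned (equal-length strings, with `-` marking gaps). The window size, flank size and identity margin are hypothetical parameters chosen for illustration, not values from the source.

```python
def identity(a: str, b: str) -> float:
    """Fraction of aligned columns where both sequences carry the same base
    (gap columns count as mismatches)."""
    if not a:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return same / len(a)


def conserved_windows(seq_a: str, seq_b: str, window: int = 50,
                      flank: int = 100, margin: float = 0.2) -> list:
    """Start positions of windows whose identity exceeds the identity of
    the surrounding flanks by at least `margin`."""
    assert len(seq_a) == len(seq_b), "input must be a pairwise alignment"
    n = len(seq_a)
    hits = []
    for start in range(0, n - window + 1):
        end = start + window
        left = slice(max(0, start - flank), start)
        right = slice(end, min(n, end + flank))
        flank_a = seq_a[left] + seq_a[right]
        flank_b = seq_b[left] + seq_b[right]
        if not flank_a:
            continue  # no surrounding sequence to compare against
        inner = identity(seq_a[start:end], seq_b[start:end])
        if inner - identity(flank_a, flank_b) >= margin:
            hits.append(start)
    return hits
```

Comparing each window with its own flanks, rather than with a genome-wide average, matches the idea in the text of looking for regions more conserved than the area surrounding them.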
Transcription signals
ncRNA genes must be transcribed to produce functional RNA, so they are flanked by sequences that regulate transcription, such as promoters and terminators. Other specific sequences regulate translation of RNA into protein; since ncRNA is not translated, these signals should not be present in ncRNA gene sequences. New candidate ncRNA genes could therefore be found by searching for sequences that are transcribed but not translated. Because transcription and translation signals vary between organisms, the search has to be flexible about which signal sequences it looks for. Such methods have previously been used successfully to locate ncRNA genes in E. coli (Argaman et al., 2001) and yeast (Olivas et al., 1997).
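The "transcribed but not translated" criterion can be sketched as an open reading frame (ORF) check: a transcribed region that contains no long ORF on either strand is kept as an ncRNA candidate. This sketch assumes the standard genetic code (start ATG; stops TAA, TAG, TGA) and a hypothetical cutoff of 50 codons, and it does not search for the promoter or terminator signals themselves.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq: str) -> int:
    """Length in codons of the longest ATG...stop ORF on either strand
    (stop codon excluded); 0 if there is none."""
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, (i - start) // 3)
                    start = None
    return best


def looks_untranslated(transcript: str, max_codons: int = 50) -> bool:
    """Hypothetical filter: a transcribed region is an ncRNA candidate
    if it has no ORF longer than max_codons."""
    return longest_orf(transcript) <= max_codons
```

In practice the cutoff would need to vary with the organism, echoing the point above that translation signals differ between species.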
Statistical analysis
Statistical analysis of the non-coding regions of a genome can reveal systematic variations, which can then be used to separate ncRNA genes from "junk" DNA. One such variation, the variation in usage of the nucleotide pair CG in M. jannaschii (Schattner, 2002), has already helped find many more ncRNA genes in that organism. Other variations of this kind could possibly be used to locate ncRNA genes as well. One candidate measure is Shannon entropy (Shannon, 1948), which indicates the amount of information in a region of sequence. Since ncRNA genes are more ordered than other regions, they would be expected to show lower entropy than "junk" DNA.
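The two statistics mentioned above can be computed over sliding windows as in this sketch. Shannon entropy is taken over single-nucleotide composition, H = -Σ p_i log2 p_i, so it ranges from 0 to 2 bits per base; `cg_frequency` is the fraction of adjacent nucleotide pairs that are CG. The window and step sizes and the function names are assumptions, and the threshold that would separate ncRNA from "junk" DNA would have to be established on real data.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per base) of the nucleotide composition of seq."""
    if not seq:
        return 0.0
    n = len(seq)
    # sum of p * log2(1/p), written with n/c to keep the result non-negative
    return sum((c / n) * math.log2(n / c) for c in Counter(seq).values())


def cg_frequency(seq: str) -> float:
    """Fraction of adjacent nucleotide pairs that are 'CG'."""
    pairs = len(seq) - 1
    if pairs <= 0:
        return 0.0
    return sum(1 for i in range(pairs) if seq[i:i + 2] == "CG") / pairs


def window_profile(seq: str, window: int = 100, step: int = 50):
    """(start, entropy, cg_frequency) for each window along seq."""
    return [
        (start,
         shannon_entropy(seq[start:start + window]),
         cg_frequency(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

Note that simple repeats also have low entropy, so a low score on its own would not indicate an ncRNA gene.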
Combining criteria
If several of the methods above indicate that a genomic sequence contains a ncRNA gene, this strengthens the belief that it really does. It should therefore be possible to combine and weigh the results of the different methods as they are developed.
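One simple way to combine the methods is sketched below, under the assumption that each method's output has been normalised to a score between 0 and 1: a weighted average, plus a count of how many methods pass a cutoff. The `Evidence` fields, the weights and the cutoff are hypothetical; choosing them well would itself require training data such as known ncRNA genes.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Normalised scores in [0, 1] from each method (hypothetical scale)."""
    similarity: float = 0.0
    conservation: float = 0.0
    transcription: float = 0.0
    composition: float = 0.0


def combined_score(ev: Evidence, weights: dict) -> float:
    """Weighted average of the individual method scores."""
    total = sum(weights.values())
    return sum(getattr(ev, name) * w for name, w in weights.items()) / total


def votes(ev: Evidence, cutoff: float = 0.5) -> int:
    """Number of methods whose score reaches the cutoff."""
    return sum(1 for v in vars(ev).values() if v >= cutoff)


ev = Evidence(similarity=0.9, conservation=0.8, transcription=0.2, composition=0.6)
equal = {"similarity": 1, "conservation": 1, "transcription": 1, "composition": 1}
print(votes(ev), combined_score(ev, equal))
```

The vote count keeps the idea in the text that agreement between independent methods is itself evidence, while the weighted average lets a more reliable method count for more.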
Verification of predicted ncRNA genes
The accuracy of the methods proposed above can be assessed by testing their ability to find already known ncRNA genes. Newly discovered ncRNA genes can be verified by RNA hybridisation experiments (Northern blots), which detect whether the predicted RNA is actually expressed. Any ncRNA genes that seem especially interesting could also be studied further in the laboratory with the help of different groups within the institute.
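Assessing the methods against known ncRNA genes amounts to comparing predicted intervals with annotated ones. This sketch assumes both are given as half-open (start, end) coordinate pairs on the same sequence and counts a match whenever two intervals overlap; the function names are illustrative.

```python
def overlaps(a, b) -> bool:
    """True if half-open intervals a = (start, end) and b overlap."""
    return a[0] < b[1] and b[0] < a[1]


def evaluate(predicted, known):
    """(sensitivity, precision) of predicted intervals against known
    ncRNA gene intervals, using overlap as the match criterion."""
    found = sum(1 for k in known if any(overlaps(k, p) for p in predicted))
    correct = sum(1 for p in predicted if any(overlaps(p, k) for k in known))
    sensitivity = found / len(known) if known else 0.0
    precision = correct / len(predicted) if predicted else 0.0
    return sensitivity, precision


print(evaluate(predicted=[(150, 250), (800, 900)], known=[(100, 200), (500, 600)]))
```

Precision measured this way tends to be pessimistic: a prediction that overlaps no known gene may still be a genuine, previously unknown ncRNA gene, which is exactly the case the hybridisation experiments are meant to settle.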