
zhq025

[Resource] [Repost] Perl Programming for Biologists (searched, no duplicates found)

Introduction

Molecular biology is a study in accelerated expectations.
In 1973, the first paper reporting a nucleotide sequence derived directly
from DNA was published. During the late 1970s, a graduate student could
earn a Ph.D. and publish multiple papers in Science, Cell, or any number
of respected journals by performing the astonishing task of sequencing a
gene, any gene. By 1982, DNA sequencing had become straightforward enough
that any well-equipped laboratory could clone and sequence a gene, provided
it had a copy of Molecular Cloning: A Laboratory Manual. By 1990, simply
sequencing a gene was considered sufficient for only a master’s degree, and
most journals considered the sequence of a gene to be only the starting point
for a scientific paper. The last sequencing-only paper published was the full
genomic sequence of an organism. By 1995, the majority of journals had
stopped publishing sequence data completely. In 1999, midway through the
Human Genome Sequencing Project, approximately 1.5 megabases of human
genomic sequence were being deposited in GenBank monthly, and by the end
of 2001 there were almost 15 billion bases of sequence information in the
databases, representing over 13 million sequences.
Bioinformatics, by necessity, is following the same growth curve.
Once a rarefied realm, computers in biology have become commonplace.
Almost every biology lab has some type of computer, and the uses of the
computer range from manuscript preparation to Internet access, from data
collection to data crunching. And for each of these activities, some form of
bioinformatics is involved.
The field of bioinformatics can be split into two broad areas: computational
biology and analytical bioinformatics. Computational biology encompasses the
formal algorithms and testable hypotheses of biology, encoded into various
programs. Computational biologists often have more in common with people
in the campus computer science department than with those in the biology
department, and usually spend their time thinking about the mathematics
of biology. Computational biology is the source of bioinformatic tools
like BLAST or FASTA, which are commonly used to analyze the results of
experiments.
If computational biology is about building the tools, analytical bioinformatics
is about using those tools. From sequence retrieval from GenBank to performing
an analysis of variance regression using local statistical software, nearly every
biological researcher does some form of analytical bioinformatics. And just as
DNA sequencing has turned into a Red Queen pursuit, every biology researcher
has to perform more and more analytical bioinformatics to keep up.
Fortunately, keeping up is not as hard as it used to be. The explosion of the
Internet and the use of the World Wide Web (WWW) as a means of accessing
data and tools means that most researchers can keep up simply by updating the
bookmarks file of their favorite browser. In itself, this is no mean feat: Internet
research skills can be tricky to acquire and even trickier to apply
properly. Still, there is a way to go further: one can begin to manipulate the
data returned from conventional programs.
Data manipulation can usually be done in spreadsheets and databases. Indeed,
these two types of programs are indispensable in any laboratory, especially
those quite sophisticated in analytical bioinformatics. But to take the final step
to truly exploit data analysis tools, a researcher needs to understand and be
able to use a scripting language.
A scripting language is similar in most ways to a programming language.
The user writes computer code according to the syntactic conventions of the
language, and then executes the result. However, a scripting language is typically
much easier to learn and use than a traditional programming language,
because many of the common functions people use have already been created
and stored. Additionally, most scripting languages are interpreted (turned into
binary computer instructions on the fly, each time the script runs) rather than
compiled (turned into binary computer instructions once, ahead of time), so that
script development is generally quicker and the scripts themselves are more portable.
Of course, there is always a price to pay for things being easier, and in the case
of scripting languages, the major price is speed. Scripting languages typically
take longer to execute than compiled code. But, except for the most extreme
cases, the trade-off for ease of use over speed is quite acceptable, and might
not even be noticeable on the faster computers available today.
The Perl programming language is probably the most widely used scripting
language in bioinformatics. A large percentage of programs are written in Perl,
and many bioinformatists cut their programming teeth using Perl. In fact, the
most common advice heard by aspiring bioinformatists is "go learn Perl."
In part, Perl is a popular language because it is less structured than traditional
programming languages. With fewer rules and multiple ways to perform a task,
Perl is a language that allows for fast and easy coding. For the same reasons,
it is an easier language to learn as a first programming language. But the very
ease of using Perl is a bit of a trap: it is quite easy to make simple mistakes that
are difficult to catch.
But there are strong reasons to learn and use Perl. The language was originally
created for parsing files and quickly creating formatted reports. Larry
Wall, the author of Perl, claims the name stands for "Practical Extraction and
Report Language" (but he acknowledges that the name could just as easily
stand for "Pathologically Eclectic Rubbish Lister"), and the language is perfect
for rummaging through files looking for a particular pattern of characters, or
for reformatting data tables. The language has a very powerful regular expression
capability for pattern matching, as well as built-in file manipulation and
input/output (I/O) piping mechanisms. These abilities have proven invaluable
for bioinformatics, where we are often looking for motifs within sequences
(pattern matching) or rearranging one database format into another.
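To make the motif-searching use case concrete, here is a minimal sketch of regular-expression matching on a protein sequence. It is written in Python rather than Perl purely for illustration and is not taken from the book; the function name `find_motif` and the example sequence are hypothetical. The pattern is the N-glycosylation motif as PROSITE writes it, N-{P}-[ST]-{P}.

```python
import re

# N-glycosylation motif: N, then anything but P, then S or T, then anything but P.
# Wrapping the motif in a lookahead lets overlapping occurrences be reported too.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(sequence, pattern=N_GLYCOSYLATION):
    """Return (1-based position, matched text) for every occurrence of the motif."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    protein = "MKNLSAANPSTNGTQ"  # made-up sequence, for illustration only
    for position, hit in find_motif(protein):
        print(position, hit)
```

The candidate at position 8 (NPST) is skipped because the residue after N is a proline, which is exactly the kind of rule a character class expresses compactly.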
The biggest use of Perl is the quick and dirty creation of small analysis programs.
Nearly every bioinformatist has written a program to convert a nucleotide
sequence into its reverse complement. Similarly, a great many people
use small Perl scripts to read disparate data files and parse the relevant data
into a new format. This usage is so prevalent that the term "glutility" was
coined by Sam Cartinhour for scripts that take the output of one program (like
BLAST, for example) and change it into a form suitable for import into another
program (like ClustalW). Finally, with the advent of the WWW, Perl has become
the language of choice to create Common Gateway Interface (CGI) scripts to
handle form submissions and create compute servers on the WWW.
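The reverse-complement program mentioned above is short enough to sketch in full. In Perl it is commonly written with `reverse` and `tr///`; the version below uses the Python equivalents purely for illustration, and the name `reverse_complement` is a hypothetical choice. It assumes plain A/C/G/T/N input and leaves any other character unchanged.

```python
# Translation table mapping each base to its complement, in upper and lower case.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Complement every base, then reverse the whole string."""
    return sequence.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGCGT"))
    # GAATTC (the EcoRI site) is palindromic: it equals its own reverse complement.
    print(reverse_complement("GAATTC") == "GAATTC")
```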
The purpose of this book is to teach you Perl programming. What sets this
book apart from most Perl language books is that 1) it assumes you’ve
never had any formal training in programming, and 2) its examples are geared
toward real problems biologists face, so you don’t have to either learn an
entirely new concept to understand the example or wrestle with an example
that is generic and difficult to extrapolate into the real world of the laboratory.
At the conclusion of the book, you should be able to write a script to fix the
clone library prefix that your summer student mistyped on every line of the
spreadsheet, or to scan a FASTA sequence file for every occurrence of an EcoRI
site. Moreover, you’ll be able to write reusable and maintainable scripts so you
don’t have to rewrite the same piece of code over and over. Additionally, you’ll
be able to look at other people’s scripts and adapt them to your own purposes.
After all, to quote Larry Wall, the creator of Perl, "For programmers, laziness is
a virtue."
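As a rough idea of what the EcoRI task mentioned above involves, the sketch below reads FASTA records and reports every position of the EcoRI recognition site, GAATTC. It is in Python rather than Perl, is not the book's own code, and the names `read_fasta`, `find_sites`, and `scan_fasta` are hypothetical. It assumes a well-formed FASTA file and searches one strand only, which is enough here because the site is palindromic.

```python
import io

ECORI_SITE = "GAATTC"  # EcoRI recognition sequence


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_sites(sequence, site=ECORI_SITE):
    """Return the 1-based start position of every occurrence of site."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + 1)
        start = sequence.find(site, start + 1)
    return positions


def scan_fasta(handle, site=ECORI_SITE):
    """Map each record header to the list of site positions in that record."""
    return {header: find_sites(seq, site) for header, seq in read_fasta(handle)}


if __name__ == "__main__":
    # Made-up records; the first site in clone1 spans a line break on purpose.
    demo = io.StringIO(">clone1\nAAGAA\nTTCGGTTGAATTC\n>clone2\nACGTACGT\n")
    print(scan_fasta(demo))
```

Joining the sequence lines before searching matters: a site that straddles a line break would be missed by a line-by-line scan.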

Download link: http://www.isload.com.cn/store/u4qnppx23wxoh

[ Last edited by 2007骑猪逛街 on 2008-1-16 at 16:30 ]


#1 2008-01-08 10:58:58


agri521


★★★★★ Five stars, highly recommended

I think this should be posted in the Agriculture and Forestry board. Thanks.


#2 2008-01-13 09:58:58


诸葛曹操


★★★★★ Five stars, highly recommended

This is bioinformatics, so posting it here seems appropriate.


#3 2008-01-16 09:11:12
