
Views: 20643  |  Replies: 2

QQ894064647

New member (novice writer)

[Discussion] Downloading and processing TCGA (The Cancer Genome Atlas) data with TCGA-Assembler (2 participants so far)

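The Cancer Genome Atlas (TCGA) project, launched by the US government, applies genome analysis technologies, especially large-scale genome sequencing, to map the genomic alterations of all human cancers (the near-term goal is 50 tumor types, including subtypes) and to analyze them systematically. It aims to identify the small variations in all oncogenes and tumor suppressor genes, to understand the mechanisms by which cancer cells arise and progress, to develop new diagnostic and therapeutic methods on that basis, and ultimately to outline a new overall "strategy for cancer prevention".
TCGA mission: to improve scientific understanding of the molecular basis of cancer and to improve our ability to diagnose, treat, and prevent cancer.
TCGA goal: to complete a comprehensive "atlas" of the genomic alterations associated with all cancers.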

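    Most TCGA data sources are public, but collecting and preprocessing them efficiently is a headache. Today we explain how to turn TCGA data into a two-dimensional data matrix for a given cancer type (for example, genes as rows and samples as columns). Once we have this matrix, the rest is straightforward: we can run differential expression, co-expression network, survival analysis, and so on. Today we focus on how to download the TCGA data. If you are interested in the downstream analysis, you can join the "bioinformatics training + video" group, or search Taobao for "bioinformatics video" to contact us.
    Let's get started. We can use the TCGA-Assembler software to download TCGA data: http://www.compgenome.org/TCGA-Assembler/. TCGA-Assembler not only makes downloading the data easy, it can also do initial processing of the data, which is very convenient. After downloading TCGA-Assembler, we first need to install a few dependency packages with the following command: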
install.packages(c("HGNChelper", "RCurl", "httr", "stringr", "digest", "bitops"), dependencies = TRUE)
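
If some of these packages are already installed, it may be quicker to install only the missing ones. The sketch below is one way to do that; the helper names `missing_packages` and `install_missing` are made up for illustration and are not part of TCGA-Assembler.

```r
# Packages named in the install command above.
required <- c("HGNChelper", "RCurl", "httr", "stringr", "digest", "bitops")

# Hypothetical helper: return the entries of `pkgs` that are not in `installed`.
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

# Hypothetical helper: install only what is missing and
# invisibly return the names it tried to install.
install_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install, dependencies = TRUE)
  }
  invisible(to_install)
}

# Report what is still missing without installing anything.
print(missing_packages(required))
```

Calling `install_missing(required)` would then perform the actual installation, and running it a second time should have nothing left to install.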

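     Once the dependency packages are installed, we go to the TCGA-Assembler directory we just downloaded and make it the working directory with setwd("C:/Users/cloud/Desktop/TCGA-Assembler"). After that we can start downloading data. We pick the script that matches the data we need. The scripts are as follows: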

# Load module A functions.
source("Module_A.r")

# Download level-3 miRNA-seq data of six rectum adenocarcinoma (READ) samples
miRNASeqRawData = DownloadmiRNASeqData(traverseResultFile = "./DirectoryTraverseResult_Jul-08-2014.rda",
  saveFolderName = "./QuickStartGuide_Results/RawData/", cancerType = "READ",
  assayPlatform = "miRNASeq", inputPatientIDs = c("TCGA-EI-6884-01",
  "TCGA-DC-5869-01", "TCGA-G5-6572-01", "TCGA-F5-6812-01", "TCGA-AF-2689-11", "TCGA-AF-2691-11"))

# Download level-3 DNA copy number data of six READ samples
CNARawData = DownloadCNAData(traverseResultFile = "./DirectoryTraverseResult_Jul-08-2014.rda",
  saveFolderName = "./QuickStartGuide_Results/RawData/", cancerType = "READ",
  assayPlatform = "genome_wide_snp_6", inputPatientIDs = c("TCGA-EI-6884-01",
  "TCGA-DC-5869-01", "TCGA-G5-6572-01", "TCGA-F5-6812-01", "TCGA-AF-2692-10", "TCGA-AG-4021-10"))

# Download level-3 RNASeqV2 gene expression and exon expression data of six READ samples
RNASeqRawData = DownloadRNASeqData(traverseResultFile = "./DirectoryTraverseResult_Jul-08-2014.rda", saveFolderName =
  "./QuickStartGuide_Results/RawData/", cancerType = "READ", assayPlatform = "RNASeqV2",
  dataType = c("rsem.genes.normalized_results", "exon_quantification"), inputPatientIDs =
  c("TCGA-EI-6884-01", "TCGA-DC-5869-01", "TCGA-G5-6572-01", "TCGA-F5-6812-01", "TCGA-AG-3732-11",
  "TCGA-AG-3742-11"))

# Download level-3 HumanMethylation27 data of six READ samples
Methylation27RawData = DownloadMethylationData(traverseResultFile = "./DirectoryTraverseResult_Jul-08-2014.rda", saveFolderName =
  "./QuickStartGuide_Results/RawData/", cancerType = "READ", assayPlatform = "humanmethylation27",
  inputPatientIDs = c("TCGA-AG-3583-01", "TCGA-AG-A032-01", "TCGA-AF-2692-11", "TCGA-AG-4001-01",
  "TCGA-AG-3608-01", "TCGA-AG-3574-01"))

# Download level-3 HumanMethylation450 data of six READ samples
Methylation450RawData = DownloadMethylationData(traverseResultFile = "./DirectoryTraverseResult_Jul-08-2014.rda", saveFolderName =
  "./QuickStartGuide_Results/RawData", cancerType = "READ", assayPlatform = "humanmethylation450",
  inputPatientIDs = c("TCGA-EI-6884-01", "TCGA-DC-5869-01", "TCGA-G5-6572-01", "TCGA-F5-6812-01",
  "TCGA-AG-A01W-11", "TCGA-AG-3731-11"))

# Download level-3 RPPA protein expression data of six READ samples
RPPARawData = DownloadRPPAData(traverseResultFile = "./DirectoryTraverseResult_Jul-08-2014.rda", saveFolderName =
  "./QuickStartGuide_Results/RawData", cancerType = "READ", assayPlatform = "mda_rppa_core",
  inputPatientIDs = c("TCGA-EI-6884-01", "TCGA-DC-5869-01", "TCGA-G5-6572-01", "TCGA-F5-6812-01",
  "TCGA-AG-3582-01", "TCGA-AG-4001-01"))

# Download de-identified clinical information of READ patients
DownloadClinicalData(traverseResultFile = "./DirectoryTraverseResult_Jul-08-2014.rda", saveFolderName =
  "./QuickStartGuide_Results/RawData", cancerType = "READ", clinicalDataType = c("patient", "drug", "follow_up"))
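
After the download calls finish, a quick way to confirm that something was written is to list the contents of the save folder. The sketch below uses only base R; `list_downloads` is a hypothetical helper name, and the default folder is simply the `saveFolderName` used in the calls above.

```r
# Hypothetical helper: list the files under the download folder,
# or return character(0) if the folder does not exist yet.
list_downloads <- function(folder = "./QuickStartGuide_Results/RawData") {
  if (!dir.exists(folder)) {
    return(character(0))
  }
  list.files(folder, full.names = TRUE)
}

downloaded <- list_downloads()
cat("Found", length(downloaded), "file(s)\n")
```

An empty result usually means either the download has not run yet or the working directory is not the TCGA-Assembler folder.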


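Running the script above gives us the data we want. For example, to download the miRNA data for rectum adenocarcinoma (READ), we can use the script that follows. Once the download finishes, we have the adenocarcinoma matrix (genes as rows, samples as columns).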

setwd("C:/Users/cloud/Desktop/TCGA-Assembler")
source("Module_A.r")
miRNASeqRawData = DownloadmiRNASeqData(traverseResultFile = "./DirectoryTraverseResult_Jul-08-2014.rda",
  saveFolderName = "./QuickStartGuide_Results/RawData/", cancerType = "READ",
  assayPlatform = "miRNASeq")
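
Once there is a rows-by-samples matrix, a common first step is to separate tumor from normal columns. In TCGA sample IDs such as "TCGA-EI-6884-01", the fourth field is the sample-type code (01 to 09 tumor, 10 to 19 normal, 20 to 29 control). The sketch below shows the idea on a toy matrix with made-up values; `sample_group` is a hypothetical helper, and the column layout of the real TCGA-Assembler output is not assumed here.

```r
# Hypothetical helper: classify TCGA sample IDs by the two-digit
# sample-type code in the fourth dash-separated field.
sample_group <- function(ids) {
  fields <- strsplit(ids, "-", fixed = TRUE)
  code <- as.integer(substr(sapply(fields, `[`, 4), 1, 2))
  ifelse(is.na(code), NA_character_,
         ifelse(code <= 9, "tumor",
                ifelse(code <= 19, "normal", "control")))
}

ids <- c("TCGA-EI-6884-01", "TCGA-DC-5869-01", "TCGA-AF-2689-11", "TCGA-AF-2692-10")
groups <- sample_group(ids)

# Toy matrix with made-up values: 2 features (rows) x 4 samples (columns).
toy <- matrix(c(5, 6, 1, 2,
                3, 3, 3, 3),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("featureA", "featureB"), ids))

# Per-feature mean in tumor samples minus the mean in normal samples.
diff_means <- rowMeans(toy[, groups == "tumor", drop = FALSE]) -
  rowMeans(toy[, groups == "normal", drop = FALSE])
print(diff_means)
```

The same grouping vector can feed a differential expression or survival analysis later on, which is why it is worth deriving from the IDs rather than typing it by hand.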

xingzhou823

Member (regular writer)

Quoted reply:
Post #2: Originally posted by htt12119 at 2015-10-17 09:29:57
How do I view the raw data? Please help~

http://wenku.baidu.com/link?url= ... O7J9NHzBL_xnc1QCBRC
There is an introduction here.
Post #3, 2015-11-12 11:48:08

htt12119

New member (novice writer)


How do I view the raw data? Please help~
Post #2, 2015-10-17 09:29:57