lyk0323, Gold Bug (regular writer)
[Help]
Translation request
K-mer distribution analysis

To investigate whether RAD sequencing provided a representative and unbiased sample of the C. cardunculus genome, we compared its K-mer spectrum with those of other fully sequenced genomes. We further investigated how CpG content correlates with the repetitive content of the genome, as suggested by Chor et al. The frequency and distribution of 10-mers in the raw sequence and in the assembled wild cardoon contigs were comparable to one another (Figure 5A). K-mers lacking CpG dinucleotides were over-represented in the more repetitive portion of the spectra (i.e. their distribution was right-skewed), while those bearing at least one CpG produced a more left-shifted distribution (Figure 5A). These results were confirmed by negative controls using random dinucleotides, which showed no preferential distribution of K-mers (Additional file 4). This outcome is consistent with the known correlation of CpG methylation with the repression of transposable elements.

A comparative study of other plant genomes showed that the V. vinifera genome has a higher frequency of zero-CpG K-mers (Figure 5C) than that of A. thaliana (Figure 5B), whereas the Fragaria vesca K-mer distribution (Figure 5D) was rather similar to that obtained in C. cardunculus (Figure 5A). To further investigate these trends, CpG rates across the four dicot species were compared. The CpG rate was 0.53 in the C. cardunculus RAD dataset, 0.72 in the A. thaliana genome, 0.43 in V. vinifera and 0.61 in F. vesca. Furthermore, repetitive elements make up 14% of the A. thaliana genome, 41% of the V. vinifera genome and 22% of the F. vesca genome. Variations in CpG rate were congruent with the K-mer spectra analysis, since genomes with higher CpG rates showed less repetitive K-mer populations.
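For readers trying to follow the K-mer and CpG terminology, the two quantities discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the excerpt does not define "CpG rate" (it is assumed here to be the usual observed/expected CpG ratio), and the paper's handling of reverse complements is not given, so none is applied.

```python
from collections import Counter


def kmer_counts(seq, k=10):
    """Count every overlapping k-mer that consists only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            counts[kmer] += 1
    return counts


def spectra_by_cpg(counts):
    """Build two spectra mapping multiplicity -> number of distinct k-mers.

    The first covers k-mers with at least one CG dinucleotide,
    the second covers k-mers with none.
    """
    with_cpg, without_cpg = Counter(), Counter()
    for kmer, n in counts.items():
        target = with_cpg if "CG" in kmer else without_cpg
        target[n] += 1
    return with_cpg, without_cpg


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G).

    Assumption: this is what the excerpt calls the "CpG rate".
    """
    seq = seq.upper()
    c, g, cg = seq.count("C"), seq.count("G"), seq.count("CG")
    if c == 0 or g == 0:
        return 0.0
    return cg * len(seq) / (c * g)


if __name__ == "__main__":
    toy = "ACGTACGTAA"  # toy sequence, not real data
    with_cpg, without_cpg = spectra_by_cpg(kmer_counts(toy, k=4))
    print(dict(with_cpg), dict(without_cpg), cpg_obs_exp(toy))
```

A right-skewed spectrum for the zero-CpG class would show up here as more distinct K-mers at high multiplicities in the second spectrum than in the first.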
The congruence between CpG rates and K-mer spectra suggests a key contribution of DNA methylation to inhibiting genome expansion driven by repetitive element proliferation. Altogether, our data suggest that the RAD procedure, despite its use of GC-rich recognition sites, produced a random representation of the C. cardunculus genome, and that it is a reliable means of assessing genome complexity.

SNP calling and classification

The paired-end reads generated for each mapping parent were aligned against the reference contig set. This alignment detected 33,784 sequence variants, including 1,520 short indels, scattered over 12,068 contigs (‘CcRAD1’ dataset, Additional file 5). The overall SNP frequency was estimated to be 5.6 per 1,000 nucleotides, a level almost identical to that found in the non-coding regions of the V. vinifera genome (5.5 per 1,000 nucleotides) and very similar to that uncovered among Citrus spp. ESTs (6.1 per 1,000 nucleotides). The estimation of SNP frequency from such high-throughput sequencing data is, however, heavily dependent both on the number of genomes sampled and on the extent (if any) of targeting and of genome coverage. The efficiency of SNP discovery was correlated with the length of the RAD tags (Figure 2). Contigs longer than 400 bp were associated with a 74% probability of finding at least one SNP, while this probability fell to 62% for contigs shorter than 400 bp. Requiring SNPs to be informative for both mapping populations reduced the dataset to 17,450 sequence polymorphisms distributed over 7,478 contigs (‘CcRAD2’ dataset, Additional file 6); of these, 16,727 were SNPs and 723 were 1 or 2 nt indels. Some 57% of the contigs contained more than one polymorphic site, and non-bi-allelic variants occurred at 959 sites. The number of heterozygous SNP loci was 1,235 in the globe artichoke parent, 2,868 in the cultivated cardoon and 5,069 in the wild cardoon.
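The two summary statistics in this passage (variants per 1,000 nucleotides, and the share of contigs carrying at least one SNP in each length class) are simple ratios. A minimal sketch follows, assuming each contig is available as a hypothetical (length in bp, SNP count) pair; the function names and toy numbers are illustrative only and are not taken from the paper, which does not state the total aligned length.

```python
def snps_per_kb(n_variants, total_bp):
    """Variant density expressed per 1,000 nucleotides."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    return 1000.0 * n_variants / total_bp


def fraction_with_snp(contigs, threshold_bp=400):
    """Share of contigs with at least one SNP, split by contig length.

    contigs: iterable of (length_bp, n_snps) pairs.
    Contigs strictly longer than threshold_bp count as "long"; a contig of
    exactly threshold_bp is put in "short" (an arbitrary choice, since the
    excerpt does not say where that boundary case falls).
    """
    tallies = {"long": [0, 0], "short": [0, 0]}  # [contigs with a SNP, all contigs]
    for length_bp, n_snps in contigs:
        key = "long" if length_bp > threshold_bp else "short"
        tallies[key][1] += 1
        if n_snps > 0:
            tallies[key][0] += 1
    return {
        key: (hits / total if total else None)
        for key, (hits, total) in tallies.items()
    }


if __name__ == "__main__":
    # Toy contigs, not real data.
    toy = [(500, 2), (450, 0), (600, 1), (300, 1), (200, 0), (350, 0)]
    print(snps_per_kb(56, 10_000))
    print(fraction_with_snp(toy))
```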
The loci were classified into those expected to segregate in a 1:1 ratio (“testcross markers”) and those expected to segregate in a 1:2:1 ratio (“intercross markers”) (Table 1, Additional file 6). The lower number of reads generated from the globe artichoke template led to an under-representation of testcross markers compared with the levels of informativeness observed previously for other marker types. Moreover, genetic diversity across the three taxa might give rise to taxon-specific RAD tags through the absence of PstI restriction sites. In the final dataset (“fully informative” SNP sites, Additional file 6), 26% of contigs included more than one informative marker.
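The testcross/intercross distinction above follows from the parental genotypes at each locus. The sketch below shows one common way to express it, assuming bi-allelic SNPs and diploid parents written as two-allele tuples; the function names are hypothetical, and the excerpt does not describe how the authors implemented the classification or tested segregation.

```python
def classify_marker(parent1, parent2):
    """Classify a bi-allelic locus from two diploid parental genotypes.

    Heterozygous in one parent only  -> "testcross"  (expected 1:1)
    Heterozygous in both parents     -> "intercross" (expected 1:2:1)
    Homozygous in both parents       -> "non-informative"
    Assumption: both parents carry the same two alleles when heterozygous.
    """
    het1 = parent1[0] != parent1[1]
    het2 = parent2[0] != parent2[1]
    if het1 and het2:
        return "intercross"
    if het1 or het2:
        return "testcross"
    return "non-informative"


def chi_square(observed, ratio):
    """Goodness-of-fit statistic of observed counts against an expected ratio."""
    if len(observed) != len(ratio):
        raise ValueError("observed and ratio must have the same length")
    total = sum(observed)
    if total == 0:
        raise ValueError("no observations")
    expected = [total * r / sum(ratio) for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    print(classify_marker(("A", "G"), ("A", "A")))  # one heterozygous parent
    print(classify_marker(("A", "G"), ("A", "G")))  # both parents heterozygous
    print(chi_square([48, 52], (1, 1)))             # toy progeny counts
```

The statistic returned by `chi_square` would still need to be compared against a chi-square distribution with the appropriate degrees of freedom, which is left out here.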