
lyk0323

Gold Bug (Regular Writer)

[Help] Request for translation help

K-mer distribution analysis
With the aim of investigating whether RAD sequencing provided a representative and unbiased sample of the C. cardunculus genome, we compared its k-mer spectrum with those of other fully sequenced genomes. We further investigated how CpG content correlates with the repetitive content of the genome, as suggested by Chor et al. The frequency and distribution of 10-mers in the raw sequence and in the assembled wild cardoon contigs were comparable (Figure 5A). K-mers lacking CpG dinucleotides were over-represented in the more repetitive portion of the spectrum (i.e. their distribution was right-skewed), while those bearing at least one CpG produced a more left-shifted distribution (Figure 5A). These results were confirmed by negative controls using random dinucleotides, which showed no preferential k-mer distribution (Additional file 4). This outcome is consistent with the known correlation between CpG methylation and the repression of transposable elements.

A comparison with other plant genomes showed that the V. vinifera genome has a higher frequency of zero-CpG k-mers (Figure 5C) than A. thaliana (Figure 5B), whereas the Fragaria vesca k-mer distribution (Figure 5D) was rather similar to that of C. cardunculus (Figure 5A). To further investigate these trends, CpG rates were compared across the four dicot species: 0.53 in the C. cardunculus RAD dataset, 0.72 in A. thaliana, 0.43 in V. vinifera and 0.61 in F. vesca. Repetitive elements account for 14% of the A. thaliana genome, 41% of the V. vinifera genome and 22% of the F. vesca genome. The variation in CpG rates was congruent with the k-mer spectrum analysis, since genomes with higher CpG rates had less repetitive k-mer populations. This suggests a key contribution of DNA methylation to restraining genome expansion driven by the proliferation of repetitive elements.
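
The k-mer comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: all function names are hypothetical, k = 10 follows the text, reverse-complement k-mers are not merged, and the negative control is assumed to be a randomly chosen non-CG dinucleotide used in place of CG. Genome-scale data would normally be counted with a dedicated k-mer counter such as Jellyfish.

```python
import random
from collections import Counter

VALID_BASES = set("ACGT")


def kmer_counts(seqs, k=10):
    """Count overlapping k-mers across DNA sequences (hypothetical helper).

    K-mers containing any character other than A/C/G/T (e.g. N) are skipped.
    Reverse complements are NOT merged in this simplified sketch.
    """
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= VALID_BASES:
                counts[kmer] += 1
    return counts


def spectrum_by_dinucleotide(counts, dinuc="CG"):
    """Split a k-mer spectrum by presence of a dinucleotide.

    Returns two Counters mapping multiplicity -> number of distinct k-mers
    observed that many times: (k-mers containing dinuc, k-mers lacking it).
    """
    with_d, without_d = Counter(), Counter()
    for kmer, mult in counts.items():
        target = with_d if dinuc in kmer else without_d
        target[mult] += 1
    return with_d, without_d


def mean_multiplicity(spectrum):
    """Average multiplicity over the distinct k-mers of a spectrum."""
    n_kmers = sum(spectrum.values())
    if n_kmers == 0:
        return 0.0
    return sum(m * n for m, n in spectrum.items()) / n_kmers


def random_control_dinucleotide(rng):
    """Assumed negative control: a random dinucleotide other than CG."""
    options = [a + b for a in "ACGT" for b in "ACGT" if a + b != "CG"]
    return rng.choice(options)


if __name__ == "__main__":
    # Toy data: a CpG-free "repeat" copied five times plus one CpG-rich read.
    repeat = "ATTATAATTTATAAT"
    reads = [repeat] * 5 + ["ACGCGTTCGAACGGCG"]
    counts = kmer_counts(reads, k=10)
    with_cg, without_cg = spectrum_by_dinucleotide(counts, "CG")
    print("CpG-containing mean multiplicity:", mean_multiplicity(with_cg))
    print("CpG-free mean multiplicity:", mean_multiplicity(without_cg))
    control = random_control_dinucleotide(random.Random(0))
    print("control", control, spectrum_by_dinucleotide(counts, control))
```

In this sketch, a higher `mean_multiplicity` for the CpG-free spectrum than for the CpG-containing one corresponds to the right-skew described above; repeating the split with the control dinucleotide shows whether the effect is specific to CG.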


Altogether, our data suggest that the RAD procedure, despite its use of GC-rich recognition sites, produced a random representation of the C. cardunculus genome, and that it is a reliable means of assessing genome complexity.
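
The section does not define "CpG rate". A common measure is the observed/expected CpG ratio, and the sketch below assumes that definition, which may differ from the authors' calculation. It also checks, using only the values reported in this section, that ranking the three reference genomes by CpG rate gives the reverse of their ranking by repetitive content. Function and constant names are hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio of a DNA sequence.

    Assumed definition: (CpG count * sequence length) / (C count * G count).
    "CG" cannot overlap itself, so str.count gives the exact CpG count.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def inversely_ordered(pairs):
    """True if ranking by CpG rate (high to low) ranks repeats low to high.

    `pairs` holds (cpg_rate, repeat_percent) tuples. This is an ordering
    check only, not a statistical test.
    """
    ranked = sorted(pairs, key=lambda p: p[0], reverse=True)
    repeats = [rep for _, rep in ranked]
    return repeats == sorted(repeats)


# CpG rates and repetitive fractions (%) as reported in the text;
# C. cardunculus is omitted because its repeat content is not given.
REPORTED = {
    "A. thaliana": (0.72, 14),
    "F. vesca": (0.61, 22),
    "V. vinifera": (0.43, 41),
}

if __name__ == "__main__":
    print(cpg_obs_exp("ACGTTGCAACGT"))
    print(inversely_ordered(REPORTED.values()))
```

With three genomes this only confirms that the reported numbers are ordered consistently with the stated trend; it says nothing about the strength of the relationship.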


SNP calling and classification
The paired-end reads generated for each mapping parent were aligned against the reference contig set. This alignment detected 33,784 sequence variants, including 1,520 short indels, scattered over 12,068 contigs ('CcRAD1' dataset, Additional file 5). The overall SNP frequency was estimated at 5.6 per 1,000 nucleotides, almost identical to that found in the non-coding regions of the V. vinifera genome (5.5 per 1,000 nucleotides) and very similar to that uncovered among Citrus spp. ESTs (6.1 per 1,000 nucleotides). Estimates of SNP frequency from such high-throughput sequencing data are, however, heavily dependent both on the number of genomes sampled and on the extent (if any) of targeting and of genome coverage. The efficiency of SNP discovery was correlated with the length of the RAD tags (Figure 2): contigs longer than 400 bp had a 74% probability of carrying at least one SNP, whereas this probability fell to 62% for contigs shorter than 400 bp. Requiring SNPs to be informative in both mapping populations reduced the dataset to 17,450 sequence polymorphisms distributed over 7,478 contigs ('CcRAD2' dataset, Additional file 6); of these, 16,727 were SNPs and 723 were 1 or 2 nt indels. Some 57% of the contigs contained more than one polymorphic site, and non-biallelic variants occurred at 959 sites. The number of heterozygous SNP loci was 1,235 in the globe artichoke parent, 2,868 in the cultivated cardoon and 5,069 in the wild cardoon. The loci were classified into those expected to segregate in a 1:1 ratio ("testcross markers") and those expected to segregate in a 1:2:1 ratio ("intercross markers") (Table 1, Additional file 6). The lower number of reads generated from the globe artichoke template led to an under-representation of testcross markers, compared with the levels of informativeness previously observed for other marker types. Moreover, genetic diversity among the three taxa may have produced taxon-specific RAD tags, owing to the absence of PstI restriction sites in some taxa. In the final dataset ("fully informative" SNP sites, Additional file 6), 26% of the contigs included more than one informative marker.
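
Two calculations in this paragraph can be sketched as follows, assuming diploid genotypes written as two-letter strings such as "AG". Variant density is the number of variants per 1,000 surveyed nucleotides; the sequence length used as the denominator is not stated in the text, so `n_bases` is an input. The classification assumes the usual biparental-cross logic: a locus heterozygous in only one parent segregates 1:1 among the progeny, and a locus heterozygous for the same two alleles in both parents segregates 1:2:1. All names are hypothetical, and loci with more than two alleles (such as the non-biallelic sites) are left unclassified here.

```python
def variants_per_kb(n_variants, n_bases):
    """Variant density per 1,000 nucleotides of sequence surveyed."""
    if n_bases <= 0:
        raise ValueError("n_bases must be positive")
    return 1000 * n_variants / n_bases


def is_het(genotype):
    """A diploid genotype is a 2-character string, e.g. 'AG' or 'AA'."""
    return genotype[0] != genotype[1]


def segregation_type(parent1, parent2):
    """Classify a locus in a biparental cross from parental genotypes.

    - one parent heterozygous, the other homozygous -> 'testcross' (1:1)
    - both heterozygous for the same two alleles     -> 'intercross' (1:2:1)
    - anything else (both homozygous, or more than two alleles) -> None
    """
    h1, h2 = is_het(parent1), is_het(parent2)
    if h1 and h2:
        return "intercross" if set(parent1) == set(parent2) else None
    if h1 or h2:
        return "testcross"
    return None


EXPECTED_RATIO = {"testcross": (1, 1), "intercross": (1, 2, 1)}

if __name__ == "__main__":
    for p1, p2 in [("AG", "AA"), ("CT", "CT"), ("GG", "GG")]:
        kind = segregation_type(p1, p2)
        print(p1, "x", p2, "->", kind, EXPECTED_RATIO.get(kind))
```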

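The contig-level statistics reported above (the share of contigs carrying at least one SNP above and below 400 bp, and the share of contigs carrying more than one marker) reduce to simple counts. The sketch below is a hypothetical illustration with toy input, not the authors' code; the text does not say how contigs of exactly 400 bp were binned, so placing them in the short bin is an assumption.

```python
def snp_hit_rate_by_length(contigs, threshold=400):
    """Fraction of contigs carrying at least one SNP, split by contig length.

    `contigs` is an iterable of (length_bp, n_snps) pairs. Contigs longer
    than `threshold` form the "long" bin; all others (including exactly
    `threshold` bp, an assumption) form the "short" bin.
    """
    bins = {"long": [0, 0], "short": [0, 0]}  # [contigs with >= 1 SNP, total]
    for length, n_snps in contigs:
        key = "long" if length > threshold else "short"
        bins[key][1] += 1
        if n_snps > 0:
            bins[key][0] += 1
    return {key: (hits / total if total else 0.0)
            for key, (hits, total) in bins.items()}


def fraction_multi_marker(markers_per_contig):
    """Share of contigs carrying more than one marker."""
    counts = list(markers_per_contig)
    if not counts:
        return 0.0
    return sum(1 for n in counts if n > 1) / len(counts)


if __name__ == "__main__":
    toy = [(520, 2), (610, 0), (450, 1), (300, 0), (380, 1), (250, 0)]
    print(snp_hit_rate_by_length(toy))
    print(fraction_multi_marker(n for _, n in toy))
```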