December 14, 2021
With the per-base cost of sequencing in rapid decline, draft genome sequencing is becoming increasingly feasible in complex eukaryotes. A high-quality, sequence-based reconstruction of an organism's genome is an invaluable tool for deciphering the underlying biology and for performing genome comparisons at the population level or between species. As of this writing, 41 plant genera have a reference sequence deposited on the Phytozome genome browser (www.phytozome.net) (Goodstein et al. 2012, Table 1), with only two reported attempts at polyploid genomes: bread wheat (Brenchley et al. 2012) and switchgrass (unpublished).
A preliminary release (version 0.0) of the Panicum virgatum genome, genotype AP13, is available via Phytozome (www.phytozome.net/panicumvirgatum.php). The draft dataset represents 15-fold coverage of the estimated 1.6 Gbp genome and is a contig-only assembly (summarized in Table 2) of approximately 1,358 Mbp arranged in ~410k contigs (N50 of 4.2 kb; 83,229 contigs). A total of 65,878 protein-coding loci were identified, 4,193 of them with suspected splice variation. A subset of contigs that aligned to Setaria italica (foxtail millet) coding sequence was aligned to the foxtail millet genome and is referenced as such on the Phytozome site. These data represent the first de novo assembly of the switchgrass genome and provide a conduit for gene discovery and analysis of the effective gene space, yet they underscore the need for new technologies and approaches for deciphering large and complex genomes.
Recent reports of hybrid approaches that combine 2nd and 3rd generation sequencing technologies, such as Illumina's HiSeq and the Pacific Biosciences RS single-molecule sequencer (PacBio correction and assembly) (Koren et al. 2012), suggest that longer, accurate reads are becoming possible and that scaffolding and super-scaffolding efforts can be augmented by this approach.
However, in a polyploid, short reads from the Illumina sequencer may not accurately correct a long read (5-10 kb) from a PacBio single-molecule sequencer with the correct subgenome placement in regions of the genome that share high sequence identity. The approach may not be sensitive enough to distinguish sequencing errors from subgenome-specific SNPs. Another emerging technology, named Moleculo (http://www.illumina.com/technology/moleculo-technology.ilmn), creates long, synthetic reads: genomic DNA is fragmented into 10 kb segments, clonally amplified, sheared, marked with a unique barcode, sequenced with Illumina technology, and assembled with proprietary bioinformatics. This approach holds promise for accurate reconstruction of longer reads and a better chance of proper subgenome placement.
A more costly but traditional alternative is physical mapping and minimal tiling path sequencing using the hierarchical BAC-by-BAC approach, supplemented with a mix of 2nd generation sequencing. With this approach, it would be prudent to first assess the ability to readily separate homeologous genomic segments. The future of the reference genome sequence for switchgrass is uncertain, but as sequencing and advanced capture technologies evolve, we will be better positioned to unravel and understand the composition and arrangement of the switchgrass genome.
Table 1. Plant genomes sequenced to date
Table 2. Current status of the switchgrass genome initiative
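The draft assembly described above is characterized by its contig N50, so a short illustration of that statistic may help. The sketch below is a minimal Python example, not the method used to produce the reported figures. The function name `n50_l50` and the toy contig lengths are hypothetical. Reading the "83,229 contigs" figure as the contig count at which N50 is reached (often called L50) is an assumption about the source's notation. The arithmetic at the end uses only the approximate values quoted in the text (15-fold coverage, 1.6 Gbp genome, 1,358 Mbp assembled, ~410k contigs).

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50 is the length of the contig at which the running total of
    lengths, taken from longest to shortest, first reaches half of
    the assembly size. L50 is the number of contigs needed to get there.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Hypothetical toy assembly (lengths in kb), not switchgrass data.
toy_contigs_kb = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
n50, l50 = n50_l50(toy_contigs_kb)
print(n50, l50)  # N50 = 7 kb, L50 = 4 contigs

# Back-of-the-envelope figures from the values quoted in the text.
genome_size_mbp = 1600        # estimated genome size (1.6 Gbp)
assembled_mbp = 1358          # approximate total contig length
contig_count = 410_000        # "~410k contigs", approximate
coverage = 15                 # fold coverage

raw_sequence_gbp = coverage * genome_size_mbp / 1000      # about 24 Gbp of reads
assembled_fraction = assembled_mbp / genome_size_mbp      # about 0.85 of the estimate
mean_contig_kb = assembled_mbp * 1000 / contig_count      # about 3.3 kb per contig
print(round(raw_sequence_gbp, 1), round(assembled_fraction, 2), round(mean_contig_kb, 1))
```

The mean contig length of roughly 3.3 kb, alongside an N50 of 4.2 kb, shows how fragmented a contig-only assembly of this size is, which is the motivation the text gives for longer reads and scaffolding.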