SNP Markers and Genomic Selection

Over the last decade, molecular markers have transformed how genetic diversity is viewed and measured at the DNA level. Historically, the DNA marker of choice was the microsatellite, or simple sequence repeat (SSR), because of its simple PCR-based assay and large number of alleles per locus. As reference genome sequencing became routine, a radical shift in polymorphism detection was imminent. The new marker of choice, the single nucleotide polymorphism (SNP), has taken polymorphism discovery and genotyping to a new level. In a typical grass species such as maize, genetic diversity is high (~1 substitution per 100 bp) (Tenaillon et al. 2001), genome complexity is largely a result of DNA rearrangements, and the reference genome captures ~70% or less of the species-wide genome space (Gore et al. 2009). As second-generation sequencing costs have declined and multiplexing options have increased, a new strategy for assessing genetic diversity and developing SNP markers has transformed genotype-phenotype association (trait mapping), germplasm characterization, and molecular breeding (Elshire et al. 2011). The approach, termed Genotyping-by-Sequencing (GBS) and well suited to supplying the dense markers needed for genomic selection, reduces the complexity of the genome by digesting genomic DNA (gDNA) with one or two methylation-sensitive restriction enzymes chosen to maximize the amount of fragmented gDNA in the ~300 bp range. The fragments are indexed with Illumina barcodes and sequenced in multiplex on the Illumina HiSeq. The resulting sequences are assembled bioinformatically into consensus sequences flanking the restriction sites, which can either be used de novo or mapped to a reference genome for SNP discovery (Baird et al. 2008). Given the promise of switchgrass as a bioenergy feedstock and the urgent need for genome enablement, a GBS approach to exploring genetic diversity has the potential to immediately benefit switchgrass breeding programs.
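The multiplexing step can be illustrated with a minimal sketch of how pooled GBS reads might be assigned back to samples. It assumes each read begins with a sample barcode followed by the remnant of the restriction cut site. The barcode-to-sample mapping, the function name demultiplex, and the default remnants (CAGC and CTGC, as would be expected for an ApeKI digest) are illustrative assumptions, not details taken from the studies cited above.

```python
def demultiplex(reads, barcodes, cut_remnants=("CAGC", "CTGC")):
    """Assign pooled reads to samples by their leading barcode.

    reads        -- iterable of read sequences (strings)
    barcodes     -- dict mapping barcode sequence -> sample name (hypothetical)
    cut_remnants -- sequences expected right after the barcode (assumed)
    """
    by_sample = {sample: [] for sample in barcodes.values()}
    # Check longer barcodes first so a short barcode that happens to be
    # a prefix of a longer one does not claim the read.
    ordered = sorted(barcodes, key=len, reverse=True)
    for read in reads:
        for bc in ordered:
            insert = read[len(bc):]
            # Keep the read only if the barcode is followed by a cut-site remnant.
            if read.startswith(bc) and insert.startswith(cut_remnants):
                by_sample[barcodes[bc]].append(insert)
                break
    return by_sample


# Toy example with made-up barcodes and reads.
example_barcodes = {"ACGT": "sample_1", "TTGCA": "sample_2"}
example_reads = [
    "ACGTCAGCAAAT",   # sample_1 barcode + remnant
    "TTGCACTGCGGGT",  # sample_2 barcode + remnant
    "ACGTGGGGAAAA",   # barcode present, remnant missing: discarded
    "GGGGCAGCAAAA",   # no known barcode: discarded
]
demultiplexed = demultiplex(example_reads, example_barcodes)
```

Reads that lack a recognized barcode or cut-site remnant are simply dropped here; a production pipeline would typically also tolerate sequencing errors in the barcode and apply base-quality filtering.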
A recent study by Lu et al. (2013) applied GBS to 840 individuals, generating a total of 350 GB of DNA sequence. Of particular importance is the authors' development of a pipeline called the Universal Network Enabled Analysis Kit (UNEAK), tailored to enable dense SNP discovery and genomic selection in genomes without reference assemblies. UNEAK trims low-quality bases from the ends of reads, collapses identical reads into tags, and uses pairwise alignment to identify tag pairs with a single-base mismatch as candidate SNPs (Lu et al. 2013). For large, complex genomes such as switchgrass, an additional filter removes tags that pair as a result of repeats, paralogs, and errors (Lu et al. 2013). In switchgrass, the authors genotyped a full-sib linkage population of 130 individuals, a half-sib linkage population of 168 individuals, and an association panel of 540 individuals from 66 diverse populations, and identified ~1.2 million putative SNPs (Lu et al. 2013). An important finding from this deep genotyping effort was that tetraploid switchgrass resembles a diploid in genomic composition (Lu et al. 2013), although further genome analysis and a more comprehensive dataset from genome resequencing and reference mapping are necessary for corroboration. The authors also constructed a high-quality linkage map using 3,000 of the highest-quality SNPs and placed them in the context of the 18 chromosomes, guided by synteny with foxtail millet. This resource will be invaluable in advancing the genome reconstruction efforts described above.
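The reference-free logic described for UNEAK (collapse reads into tags, pair tags that differ at a single base, then filter out ambiguous pairings) can be sketched in a few lines. This is a toy illustration and not the UNEAK implementation: trimming to a fixed tag length stands in for quality trimming, the all-against-all comparison would not scale to real data, and the rule of keeping only tags that pair with exactly one partner is a simplified assumption about how a filter for repeats, paralogs, and errors might work. All function names and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def collapse_reads(reads, tag_length=64):
    """Trim reads to a fixed length and count identical sequences as tags."""
    counts = Counter()
    for read in reads:
        tag = read[:tag_length]
        # Skip reads that are too short or contain uncalled bases.
        if len(tag) == tag_length and "N" not in tag:
            counts[tag] += 1
    return counts


def single_mismatch_pairs(tag_counts, min_count=2):
    """Return (tag_a, tag_b, position) for tags differing at exactly one base."""
    # Rare tags are more likely to be errors, so require a minimum count.
    tags = sorted(t for t, c in tag_counts.items() if c >= min_count)
    pairs = []
    for a, b in combinations(tags, 2):
        diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
        if len(diffs) == 1:
            pairs.append((a, b, diffs[0]))
    return pairs


def keep_reciprocal_pairs(pairs):
    """Keep only pairs whose tags each have exactly one partner.

    Tags that pair with several others form larger networks, which this
    sketch treats as likely repeats, paralogs, or errors and discards.
    """
    degree = Counter()
    for a, b, _ in pairs:
        degree[a] += 1
        degree[b] += 1
    return [(a, b, pos) for a, b, pos in pairs
            if degree[a] == 1 and degree[b] == 1]


# Toy example using 8-base tags for readability.
toy_reads = ["ACGTACGTAA", "ACGTACGTCC", "ACGTACCTAA", "ACGTACCTGG", "TTTTTTTTAA"]
toy_tags = collapse_reads(toy_reads, tag_length=8)
toy_snps = keep_reciprocal_pairs(single_mismatch_pairs(toy_tags))
```

In the toy example the two tags that each occur twice differ only at position 6 (C versus G) and are reported as one candidate SNP, while the singleton tag is ignored. Adding a third well-supported tag that differs from both at the same position would connect all three, and the filter would then discard them.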