Microalgal Genome Projects

The nuclear genome of C. reinhardtii was published in 2007 (Merchant et al. 2007), after the first wave of next-generation sequencing (NGS) platforms became commercially available in 2005 (Margulies et al. 2005; Shendure et al. 2005). Nevertheless, the Chlamydomonas genome was sequenced with a conventional shotgun sequencing and assembly pipeline at 13X coverage (Merchant et al. 2007). Following the completion of Chlamydomonas genome sequencing, Thalassiosira pseudonana, a diatom, was the first eukaryotic marine alga to be sequenced (Bowler et al. 2008). A draft genome sequence of Nannochloropsis gaditana was made available in 2012 (Radakovits et al. 2012).
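To make the "13X coverage" figure concrete, the following minimal sketch applies the standard relation for average sequencing depth, C = N × L / G (number of reads times read length, divided by genome size). The genome size and read lengths used here are illustrative assumptions, not values taken from the text, and the function names are hypothetical.

```python
import math


def coverage(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average depth of coverage: C = N * L / G."""
    return num_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


# Illustrative assumptions (not from the text):
GENOME_SIZE_BP = 120_000_000  # assumed genome size, roughly that of a green microalga
LONG_READ_BP = 700            # assumed length of a conventional shotgun (Sanger-type) read
SHORT_READ_BP = 100           # assumed length of a short NGS read

# Reads required for 13X average coverage under each assumed read length.
long_reads = reads_needed(13, LONG_READ_BP, GENOME_SIZE_BP)
short_reads = reads_needed(13, SHORT_READ_BP, GENOME_SIZE_BP)

print(f"Reads of {LONG_READ_BP} bp needed for 13X: {long_reads:,}")
print(f"Reads of {SHORT_READ_BP} bp needed for 13X: {short_reads:,}")
```

Under these assumptions, the same average depth requires several times more short reads than long reads. Average depth also says nothing about how evenly the reads cover the genome or how easily they can be assembled.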

The continued development of NGS platforms, with Illumina and Ion Torrent semiconductor sequencing among the mainstream technologies, has brought down the time, effort, and cost of genome sequencing faster than the exponential drop predicted by Moore's law (Moore 1998). This has enabled the sequencing of many new algal genomes (Fig. 10.2). The main limitation of NGS has been that the relatively short read length (50-500 bp) introduces inaccuracy in the assembly of sequences (Zhang et al. 2011). Furthermore, the data volume has increased by several orders of magnitude (Morey et al. 2013), placing a high demand on bioinformatics analysis. This makes NGS challenging to use, particularly when investigators do not have access to high-performance computing infrastructure and appropriate bioinformatics support. Third-generation sequencing (TGS) technologies are being developed to address these problems. For instance, single molecule real-time (SMRT) sequencing makes whole-genome sequencing of single cells from uncultivable organisms possible (Schadt et al. 2010). More TGS technologies are expected to follow. Moreover, user-friendly software such as the CLC Genomics Workbench (CLC bio, a QIAGEN Company, Denmark) is enabling investigators to carry out genome assembly without the need for high-performance computers or dedicated informatics specialists.

Fig. 10.2 Phylogenetic tree representing algal species with available genome sequences or ongoing genome sequencing projects. Data presented are available at the NCBI genome database (http://www.ncbi.nlm.nih.gov/) and the AlgaeBASE website (http://www.algaebase.org/)

While advances in technology will ultimately lead to the generation and in-depth analysis of sequenced genomes, this would still be an initial step to be complemented by transcriptomic, proteomic, and metabolic analysis in order to reach a better understanding of the system per se. The integration of all of these levels of analyses, compiling them into a predictive model, and describing the interactions between their respective components, is in fact the main feat of systems biology. In such endeavors, metabolic network models occupy a central and key position in advancing bioproduct optimization.