Genomics

Advances in genomics and metagenomics have dramatically revised our view of microbial biodiversity and the potential for biotechnological applications. In the last decade, the revolution in computer processor speeds, memory storage capability, and expanding networks has made possible the large-scale sequencing of genomes and the management of large integrated databases over the Internet. Since the first microorganism, Haemophilus influenzae, was sequenced in 1995, genome sequencing initiatives have resulted in over 300 sequenced organisms, including 27 archaeal, 337 bacterial, and 41 eukaryotic genomes. As of July 2006, more than 1500 prokaryotic and eukaryotic genome sequencing projects were underway [70,71]. The genome sequences of Escherichia coli and Saccharomyces cerevisiae were not only among the first to be published, but were also the first of wide relevance for the production of industrial biochemicals, including bioethanol. Because an annotated genome provides the theoretical set of enzymatic reactions available to a microorganism, it serves as a preferred starting point for engineering metabolic pathways toward significantly improved titer, yield, productivity, and overall performance [62].

Annotated genomes certainly complement experimental designs; however, the design space that can be considered by visual inspection or classical hypothesis-driven experimentation is limited, given the high degree of connectivity of the metabolic network. Modifying a given enzyme or metabolite pool is likely to elicit a multilayered regulatory response that not only mitigates the original perturbation but also shifts the equilibrium of other enzymes, metabolite pools, or signaling pathways. To a large extent, this is why random mutagenesis approaches were, until recently, favored over targeted approaches. The first genome-scale in silico metabolic network model for E. coli was made available in 2000 and was among the first to demonstrate consistency between modeling predictions and in vivo physiology [72,73]. Specifically, the model was used to explore the relationship between acetate, succinate, and oxygen uptake rates when maximizing growth rate, and to confirm the hypothesis that E. coli under acetate or succinate carbon limitation regulates its metabolic network to maximize growth rate. For industrial biotechnology process development, however, it is often desirable to shift carbon flux from biomass to product formation, thereby maximizing the yield of product on substrate.
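The trade-off between maximizing growth and maximizing product formation can be illustrated with a deliberately small linear program. The sketch below is not the E. coli model of [72,73]; it is a hypothetical two-flux toy in which glucose is either respired (`v_resp`) or fermented to ethanol (`v_ferm`). The respiratory ATP yield, the use of ATP production as a stand-in for growth, the uptake limits, and the helper names `solve_lp_2d` and `toy_fluxes` are all assumptions made for illustration. Only the stoichiometries of 6 O2 per glucose respired and 2 ethanol plus 2 ATP per glucose fermented are textbook values. Because there are just two variables, the optimum is found by enumerating the corner points of the feasible region rather than by calling an LP library.

```python
from itertools import combinations

# Illustrative parameters (assumptions, not taken from any published model).
ATP_PER_GLC_RESP = 16.0   # assumed ATP yield per glucose respired
ATP_PER_GLC_FERM = 2.0    # ATP yield per glucose fermented to ethanol
O2_PER_GLC_RESP = 6.0     # O2 consumed per glucose fully respired
ETOH_PER_GLC_FERM = 2.0   # ethanol formed per glucose fermented


def solve_lp_2d(c, A, b, eps=1e-9):
    """Maximize c.x subject to A x <= b for a bounded two-variable problem.

    A bounded LP attains its optimum at a vertex, so every intersection of
    two constraint lines is tested for feasibility and the best one is kept.
    """
    best_x, best_val = None, float("-inf")
    for (a1, b1), (a2, b2) in combinations(list(zip(A, b)), 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < eps:
            continue  # parallel constraint lines never intersect
        # Cramer's rule for the intersection point of the two lines.
        x = ((b1 * a2[1] - a1[1] * b2) / det,
             (a1[0] * b2 - b1 * a2[0]) / det)
        feasible = all(r[0] * x[0] + r[1] * x[1] <= bi + eps
                       for r, bi in zip(A, b))
        if feasible:
            val = c[0] * x[0] + c[1] * x[1]
            if val > best_val + eps:
                best_x, best_val = x, val
    return best_x, best_val


def toy_fluxes(glc_max, o2_max, objective):
    """Return the optimal toy flux distribution for 'growth' or 'ethanol'."""
    if glc_max < 0 or o2_max < 0:
        raise ValueError("uptake limits must be non-negative")
    A = [[1.0, 1.0],               # v_resp + v_ferm <= glucose uptake limit
         [O2_PER_GLC_RESP, 0.0],   # O2 demand of respiration <= O2 limit
         [-1.0, 0.0],              # v_resp >= 0
         [0.0, -1.0]]              # v_ferm >= 0
    b = [glc_max, o2_max, 0.0, 0.0]
    if objective == "growth":
        c = [ATP_PER_GLC_RESP, ATP_PER_GLC_FERM]   # ATP as a growth proxy
    elif objective == "ethanol":
        c = [0.0, ETOH_PER_GLC_FERM]
    else:
        raise ValueError("objective must be 'growth' or 'ethanol'")
    (v_resp, v_ferm), _ = solve_lp_2d(c, A, b)
    return {
        "v_resp": v_resp,
        "v_ferm": v_ferm,
        "atp": ATP_PER_GLC_RESP * v_resp + ATP_PER_GLC_FERM * v_ferm,
        "ethanol": ETOH_PER_GLC_FERM * v_ferm,
    }


if __name__ == "__main__":
    for goal in ("growth", "ethanol"):
        print(goal, toy_fluxes(glc_max=10.0, o2_max=30.0, objective=goal))
```

With a glucose limit of 10 and an oxygen limit of 30 (arbitrary units), maximizing the growth proxy splits glucose evenly between respiration and fermentation, giving an ethanol flux of 10. Maximizing ethanol instead sends all glucose to fermentation, doubling the ethanol flux to 20 while the ATP proxy falls from 90 to 20. Genome-scale models pose the same kind of problem with hundreds to thousands of fluxes and a full stoichiometric matrix, for which a dedicated LP solver would be used.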

The first eukaryotic genome-scale metabolic model was reported for S. cerevisiae in 2003, based on its annotated genome sequence and a thorough examination of online pathway databases, biochemistry textbooks, and journal publications [74]. Using a relatively simple synthetic medium, this genome-scale in silico model predicted 88% of the growth phenotypes correctly, indicating that it can predict cellular behavior in many cases. Going one step further, Duarte et al. (2004) [74] used the S. cerevisiae genome-scale metabolic network constructed by Forster et al. (2003) [75] to perform a phenotypic phase plane (PhPP) analysis that describes yeast's metabolic states at various levels of glucose and oxygen availability. Examination of the S. cerevisiae PhPP led to the identification of two lines of optimality: LOgrowth, which represents optimal biomass production during aerobic, glucose-limited growth, and LOethanol, which corresponds to both maximal ethanol production and optimal growth under microaerobic conditions. The predictions of the S. cerevisiae PhPP and genome-scale model were compared with independent experimental data, and the results showed strong agreement between the computed and measured specific growth rates, uptake rates, and secretion rates. Thus, genome-scale in silico models can be used to systematically reconcile existing data available for S. cerevisiae, particularly now that yeast resources, databases, and tools for global analysis of genomic data, such as the Saccharomyces Genome Database, have been expanded and made publicly available [70,71].
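The idea behind a phenotypic phase plane can be sketched by scanning two uptake limits and recording which metabolic regime is optimal at each point. The example below is a toy illustration only and does not reproduce the S. cerevisiae PhPP discussed above: it assumes a hypothetical two-pathway model (respiration and ethanol fermentation), uses ATP production as a stand-in for growth, and takes the respiratory ATP yield as an assumed round number. For a model this small the LP optimum has a closed form (respire as much glucose as the oxygen limit allows and ferment the rest), so no solver is needed. The function names `optimal_state`, `phase_plane`, and `boundary_o2` are hypothetical.

```python
# Illustrative parameters (assumptions, not taken from any published model).
ATP_PER_GLC_RESP = 16.0   # assumed ATP yield per glucose respired
ATP_PER_GLC_FERM = 2.0    # ATP yield per glucose fermented to ethanol
O2_PER_GLC_RESP = 6.0     # O2 consumed per glucose fully respired
ETOH_PER_GLC_FERM = 2.0   # ethanol formed per glucose fermented


def optimal_state(glc_uptake, o2_limit):
    """Growth-optimal toy state for a glucose uptake and an O2 uptake limit."""
    if glc_uptake < 0 or o2_limit < 0:
        raise ValueError("uptake rates must be non-negative")
    # Respiration yields more ATP per glucose, so it is used first,
    # up to whatever the oxygen limit permits; the rest is fermented.
    v_resp = min(glc_uptake, o2_limit / O2_PER_GLC_RESP)
    v_ferm = glc_uptake - v_resp
    if glc_uptake == 0:
        phase = "no growth"
    elif v_ferm == 0:
        phase = "respiratory"            # glucose-limited, no ethanol
    elif v_resp == 0:
        phase = "fermentative"           # anaerobic, ethanol only
    else:
        phase = "respiro-fermentative"   # oxygen-limited, ethanol secreted
    return {
        "phase": phase,
        "atp": ATP_PER_GLC_RESP * v_resp + ATP_PER_GLC_FERM * v_ferm,
        "ethanol": ETOH_PER_GLC_FERM * v_ferm,
    }


def phase_plane(glc_values, o2_values):
    """Grid of phase labels; rows follow o2_values, columns glc_values."""
    return [[optimal_state(g, o)["phase"] for g in glc_values]
            for o in o2_values]


def boundary_o2(glc_uptake):
    """O2 uptake at which respiration just consumes all of the glucose."""
    return O2_PER_GLC_RESP * glc_uptake


if __name__ == "__main__":
    glc_values = [2.0, 4.0, 6.0, 8.0, 10.0]
    o2_values = [60.0, 45.0, 30.0, 15.0, 0.0]
    symbols = {"respiratory": "R", "respiro-fermentative": "M",
               "fermentative": "F", "no growth": "."}
    grid = phase_plane(glc_values, o2_values)
    for o2, row in zip(o2_values, grid):
        print(f"O2 {o2:5.1f} | " + " ".join(symbols[p] for p in row))
```

In this toy model the line returned by `boundary_o2` separates fully respiratory states from states that secrete ethanol, which is loosely analogous to a line of optimality. A genome-scale PhPP is obtained differently, by solving the full linear program at each point and delineating regions from its solution, so the sketch should be read only as an aid to intuition.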

Another major challenge of current biotechnology, especially in the lignocellulose-to-ethanol process, is to identify novel biocatalysts and enzymes for enzymatic hydrolysis from the genomes of organisms and from metagenomic libraries. A large number of protein sequences deduced from the genomes of industrial microorganisms have no assigned function, as they exhibit low (or no) homology to the enzymes and proteins functionally characterized thus far [76]. The demand for novel biomass-degrading enzymes, as well as for heterologous protein production at higher efficiencies and reduced costs, has catalyzed an interest in elucidating the genomic sequence of Trichoderma reesei, the most prolific producer of biomass-degrading enzymes. Diener et al. (2004) [77] have described the creation of a large-insert BAC (bacterial artificial chromosome) library of T. reesei strain QM6A and its subsequent analysis, which was successfully used to identify genes related to both biomass degradation and secretion. These data revealed the utility of a BAC library for assembling the T. reesei genome and for isolating genomic sequences of industrial interest.

Although the above study represents a direct application of sequencing technology to the identification of novel biomass-degrading enzymes, such high-throughput experimental techniques are also often used to build a mechanistic understanding of enzymes derived from random, natural selective pressures. The research of Foreman et al. (2003) [78] using T. reesei RL-P37, a strain that has been selected for improved production of cellulolytic enzymes [79], is one such example. The mutation(s) that improved cellulase production concurrently improved the inducible expression of ancillary genes that do not have a known function in cellulose degradation. These results suggest significant regulatory points of convergence across the spectrum of cellular processes involved in carbon sensing, signal transduction, and transcriptional regulation. These findings will likely have significant implications for the design of industrial processes for commercial production of biomass-degrading enzymes.

In conclusion, vastly improved computational capability, integrated with the large-scale miniaturization of biochemical techniques such as BAC libraries, PCR, and microarray chips, has delivered significant amounts of genomic data to researchers all over the world [80]. This availability of data has led to an explosion of genome analysis, yielding many new discoveries and tools that would not be possible with wet-lab experiments alone.

It is evident from the above applications of genomics coupled to in silico modeling that industrial biotechnology, and especially bioethanol production, can benefit from this technology platform both in the identification of metabolic engineering target genes to improve yield, titer, and productivity, and in the discovery of novel enzymatic catalysts. This is further reinforced by the various case studies to be presented in subsequent chapters, including the role genomics has played in the identification of thermostable cellulases, metabolic engineering for pentose and xylose utilization in S. cerevisiae and Pichia stipitis, development of ethanologenic bacteria, and development of Zymomonas mobilis for bioethanol production.

3.2