Genomic Approaches for Identification of Novel Glycoside Hydrolases

Lignocellulose-degrading microbes are found all over our planet, as free-living organisms and in the microbiomes of invertebrates and vertebrates. Genome sequencing of lignocellulose-degrading microbes is being used to reveal the molecular components relevant for optimal cellulose degradation in these microorganisms via sequence similarity to CAZyme components from other organisms. To date, there are at least 25 different microbes for which genome projects are either in progress or completed (www.genomesonline.org/). Interestingly, based on the genome projects, there appear to be many different paradigms for lignocellulose degradation, and each of these bacteria seems to have evolved organism-specific modes of plant cell wall deconstruction, which, while very efficient, are distinctly different.

This has led to comparative genome efforts, one of which is expression profiling. The idea is to monitor changes in gene expression in response to exposure to different plant cell wall substrates. This combined genomic and proteomic approach is needed to understand the regulation and assembly of this remarkable cadre of CAZyme and cellulosome components, and to identify those CAZymes that have maximal degradative capacity against a given substrate. To date, this approach has been used for two cellulosome-containing organisms, Clostridium thermocellum (Brown et al. 2007) and Ruminococcus flavefaciens (Berg et al. 2006). These functional and proteomic approaches can identify candidate enzymes that are necessary for maximal lignocellulose degradation. Such functional and comparative genomics approaches are essential for defining how lignocellulose sources affect microbial-borne gene families and the rate and extent of lignocellulose degradation.
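
The core comparison behind expression profiling can be illustrated with a minimal sketch: given abundance values for each gene under a reference condition and under growth on a plant cell wall substrate, rank the genes by log2 fold change. The gene names, the counts, the pseudocount, and the two-fold threshold below are all hypothetical and chosen only for illustration. They are not taken from the cited studies, and real analyses would also require normalization, replicates, and statistical testing.

```python
import math


def log2_fold_change(treatment, control, pseudocount=1.0):
    """Log2 ratio of two abundance values; the pseudocount avoids division by zero."""
    return math.log2((treatment + pseudocount) / (control + pseudocount))


def rank_induced_genes(control_counts, treatment_counts, min_log2_fc=1.0):
    """Return (gene, log2 fold change) pairs at or above the threshold, strongest first."""
    induced = []
    for gene, control in control_counts.items():
        # Genes missing from the treatment measurement are treated as zero abundance.
        treatment = treatment_counts.get(gene, 0.0)
        lfc = log2_fold_change(treatment, control)
        if lfc >= min_log2_fc:
            induced.append((gene, lfc))
    return sorted(induced, key=lambda pair: pair[1], reverse=True)


# Hypothetical abundance values for three placeholder glycoside hydrolase genes.
reference_counts = {"gh_a": 99, "gh_b": 49, "gh_c": 199}
substrate_counts = {"gh_a": 799, "gh_b": 49, "gh_c": 99}

induced = rank_induced_genes(reference_counts, substrate_counts)
for gene, lfc in induced:
    print(f"{gene}: log2 fold change = {lfc:.2f}")
```

In this toy data set only the placeholder gene `gh_a` passes the threshold, which is the kind of candidate such a screen would flag for follow-up characterization.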

In the next few years, the powerful approach of comparative genomics will enable rapid advances. This is primarily due to the recent development of next-generation sequencing technologies, which have dramatically reduced the time, cost, and labor required for genome sequencing projects. Pyrosequencing (also called 454 sequencing) was originally developed in the mid-1990s (Ronaghi et al. 1996, 1998), has been refined continuously since then, and has become widely used in genome sequencing projects. A major advantage of this system is the elimination of cloning vectors and their associated biases in terms of the clonability of certain DNA fragments (Hyman 1988; Ronaghi et al. 1996, 1998; Margulies et al. 2005). This sequencing technology also readily reads through secondary structures and has the capacity to produce very large amounts of sequence. Current estimates for the latest version, called “Titanium”, suggest an average read length of 400 bp and a five-fold increase in throughput, to 400-600 million bp per run, for approximately $12,000.
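
To put the quoted figures in perspective, the short calculation below derives the approximate number of reads per run, the cost per megabase, and the fold coverage of a single genome. It uses only the numbers given above, except for the 4 Mb genome size, which is an assumed value chosen for illustration and does not refer to any particular organism.

```python
# Figures quoted in the text for the "Titanium" version of pyrosequencing.
AVG_READ_LENGTH_BP = 400
THROUGHPUT_RANGE_BP = (400_000_000, 600_000_000)
RUN_COST_USD = 12_000

# Assumption for illustration only: a hypothetical 4 Mb bacterial genome.
HYPOTHETICAL_GENOME_BP = 4_000_000


def reads_per_run(throughput_bp, read_length_bp):
    """Approximate number of reads needed to produce the given throughput."""
    return throughput_bp / read_length_bp


def cost_per_megabase(run_cost_usd, throughput_bp):
    """Run cost divided by the number of megabases sequenced."""
    return run_cost_usd / (throughput_bp / 1e6)


def fold_coverage(throughput_bp, genome_size_bp):
    """Average number of times each base of the genome would be sequenced."""
    return throughput_bp / genome_size_bp


for throughput in THROUGHPUT_RANGE_BP:
    print(
        f"{throughput / 1e6:.0f} Mb per run: "
        f"{reads_per_run(throughput, AVG_READ_LENGTH_BP):,.0f} reads, "
        f"${cost_per_megabase(RUN_COST_USD, throughput):.2f} per Mb, "
        f"{fold_coverage(throughput, HYPOTHETICAL_GENOME_BP):.0f}x coverage "
        f"of the hypothetical genome"
    )
```

Under these figures a single run yields roughly 1 to 1.5 million reads at about $20 to $30 per megabase, which is why sequencing several isolates of one species becomes practical.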

Other next-generation sequencing technologies include the Solexa/Illumina 1G Genome Analysis System and Applied Biosystems SOLiD Sequencing. While average read lengths are presently much shorter than those obtained from the traditional methods, these technologies can produce a far higher number of sequence reads in a single day or on a single run. It is not unreasonable to predict that these next-generation technologies will eventually generate read lengths as long as, or even longer than, those of some of the traditional methods. Furthermore, there are additional “next-generation” technologies that will be released in the near term, including those from Helicos (www.helicosbio.com) and Complete Genomics (www.completegenomics.com).

These cost-effective next-generation sequencing technologies will allow the generation of huge reference genome databases in which up to 10 isolates of a microbial species are sequenced for use in comparative studies. This approach was recently pioneered for microorganisms from the human gastrointestinal tract and their glycoside hydrolase components (Lozupone et al. 2008). Analysis of 67 microbial genomes from the human gastrointestinal tract revealed that the CAZyme repertoires found in these microbes had converged due to horizontal gene transfer, with limited evolution of the gene families. This implies that the environment can drive the adaptation of gene families. In this case, the plant cell wall material in the biome was the environment, and therefore the genes and enzymes needed for maximal degradation were the targets for optimization.