Metagenome analysis

Collectively, the genomes of the total microbiota found in nature, referred to as the metagenome (37), contain vastly more genetic information than is contained in the cultivable subset. However, the genetic complexity of a microbial community at a specific site is influenced by many environmental factors. Re-association of total community DNA extracted from environmentally distinct sites has revealed that the community genome size can equal that of 6000-10 000 Escherichia coli genomes in unperturbed organic soil, but only 350-1500 genomes in arable or heavy metal-polluted soils (83, 94). These estimates are conservative, since genomes representing rare and unrecovered microorganisms were probably not included in the analysis. As expected, Torsvik and Ovreas (2) could recover fewer than 40 genomes by culturing methods, which emphasizes the need to develop novel methods and approaches that provide new insight into the relationship between the phylogenetic and functional diversity of these communities as ecosystems.
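To give a sense of scale, the genome-equivalent ranges quoted above can be converted into approximate base-pair totals. The short Python sketch below does this under one assumption that is not part of the cited studies: an E. coli genome size of roughly 4.6 Mbp (the approximate size of the K-12 genome). The function and variable names are illustrative only.

```python
# Assumption (not from the cited studies): one E. coli genome is ~4.6 Mbp,
# the approximate size of the E. coli K-12 genome.
ECOLI_GENOME_BP = 4.6e6


def community_genome_bp(ecoli_equivalents, genome_bp=ECOLI_GENOME_BP):
    """Convert a community genome size in E. coli genome equivalents to base pairs."""
    return ecoli_equivalents * genome_bp


# Ranges quoted in the text, in E. coli genome equivalents (low, high).
SITE_RANGES = {
    "unperturbed organic soil": (6000, 10000),
    "arable or heavy metal-polluted soil": (350, 1500),
}


def summarize_gbp(site_ranges):
    """Return {site: (low_Gbp, high_Gbp)} for each range of genome equivalents."""
    return {
        site: tuple(community_genome_bp(x) / 1e9 for x in bounds)
        for site, bounds in site_ranges.items()
    }


if __name__ == "__main__":
    for site, (low, high) in summarize_gbp(SITE_RANGES).items():
        print(f"{site}: roughly {low:.1f}-{high:.1f} Gbp")
```

Under this assumption, the organic soil estimate corresponds to tens of gigabase pairs of unique sequence, whereas the fewer than 40 genomes recovered by culturing would amount to well under 0.2 Gbp.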

DNA sequencing continues to be one of the most important platforms for the study of biological systems. With the development of improved sequencing technologies that enhance speed, sensitivity and throughput, it has become feasible to sequence the entire metagenome of an environmental sample (95). Culture-independent genomic analysis of microbial communities using metagenomics is revealing that soil and ocean environments are genetically, and potentially biochemically, more diverse than previously thought (96). The approach involves the cloning and analysis of large genomic DNA fragments isolated from a mixed community. The resulting metagenomic library can then be screened for functional or taxonomic genes of interest, or sequenced by shotgun sequencing. Most environments contain communities far too complex for a complete metagenome to be sequenced, and even simple communities contain micro-heterogeneity that makes most genome reconstructions simplified versions of reality. Reconstruction of community metagenomes was initially pursued for viral communities in the ocean and in human feces (97-99) and has since been attempted for an acid mine drainage (AMD) biofilm (100) and the Sargasso Sea (101). The AMD biofilm community was ideal for complete metagenome sequencing because 16S rRNA gene sequencing indicated that the biofilm contained only three bacterial and three archaeal species. Marine communities have far greater species richness, on the order of 100-200 species per milliliter of water (102, 103), which makes the sequencing and assembly effort considerably more difficult. Further out on the continuum of biological complexity is soil, with an estimated species richness on the order of 4000 species per gram (35, 102, 103). Sequencing the soil metagenome will require faster and less expensive sequencing technology than is currently available.
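The way sequencing effort grows with species richness can be illustrated with a rough back-of-the-envelope calculation. The Python sketch below is not taken from the cited studies. It assumes that all species are equally abundant, that the mean genome size is 4 Mbp, and that 8-fold coverage is the target; all three values are hypothetical and chosen only for illustration. The expected fraction of a genome covered at a given depth uses the standard Lander-Waterman approximation, 1 - exp(-c). Real communities are far from evenly distributed, so the effort needed to recover rare members would be much higher than these figures suggest.

```python
import math


def bases_required(n_species, mean_genome_bp=4.0e6, coverage=8.0):
    """Total sequence (bp) needed to reach `coverage`-fold depth on every genome.

    Assumes equal abundance of all species and a hypothetical mean genome size.
    """
    return n_species * mean_genome_bp * coverage


def expected_fraction_covered(coverage):
    """Lander-Waterman estimate of the genome fraction covered at depth `coverage`."""
    return 1.0 - math.exp(-coverage)


# Species richness values quoted in the text.
COMMUNITIES = {
    "AMD biofilm": 6,
    "seawater, per mL (upper estimate)": 200,
    "soil, per g": 4000,
}

if __name__ == "__main__":
    for name, n_species in COMMUNITIES.items():
        gbp = bases_required(n_species) / 1e9
        print(f"{name}: roughly {gbp:.2f} Gbp of raw sequence")
    print(f"fraction of each genome covered at 8x: {expected_fraction_covered(8.0):.4f}")
```

Because the estimate is linear in the number of species, the soil community in this simplified model needs several hundred times more sequence than the six-member AMD biofilm.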

Recently, in collaboration with the JGI, we initiated the sequencing of the metagenome of a microbial community actively decaying poplar biomass under anaerobic conditions. Most of the microorganisms enumerated in the biomass pile reside in its large anaerobic core zone. In addition to some cellulolytic fungi, bacteria of the order Clostridiales, many of which have strong cellulolytic activities, were found to dominate this specific microbial community. The estimated composition and distribution of the bacterial members of this community were determined by 16S rRNA gene sequencing (S. Taghavi and D. van der Lelie, unpublished). It should be noted that we were able to cultivate several members of this community and characterize their cellulolytic activities. Interestingly, none of these cultivable species represented the dominant community members, which stresses the importance of using a cultivation-independent approach to characterize the composition and metabolic potential of this complex microbial community.