Transcriptomics

Following the release and annotation of a genome, the next logical step is to evaluate messenger RNA expression levels on a whole-genome scale, referred to as transcriptome analysis. Targeted metabolic engineering relies heavily on the assumption that a genetic perturbation (gene deletion, constitutive overexpression, regulated induction, or modulation) will confer a metabolic flux response. This stems from the central dogma of biology: DNA is transcribed to RNA, which is subsequently translated to polypeptides that give rise to phenotype. Prior to transcriptome analysis, genes were assumed to be expressed and then subject to post-translational regulation, with little understanding of interactions across gene loci [81]. Transcriptome profiling of reference strains has provided a first approximation as to which pathways are active and, equally important, which are inactive, assuming that up-regulated gene expression leads to up-regulated pathway activity. It has since been shown that this assumption does not always hold: elevated mRNA levels do not always translate to elevated protein levels or activity. Transcriptome profiling has also provided significant insight into alternative modes of regulation, such as transcription factor-mediated as opposed to post-translational regulation. This has narrowed the experimental space that metabolic engineers need to consider and made new strategies available. Additionally, transcriptome profiling provides a quantitative in vivo assessment of several key metrics following a genetic perturbation relative to a reference case: (1) what is the net change in mRNA expression levels of the targeted gene(s), (2) what is the net change in mRNA expression levels of non-targeted gene(s), and (3) what is the net change in mRNA expression levels of either reference or constructed strains under specific environmental conditions. These questions aim to isolate which genes and pathways may serve as targets and/or explanations for observed or induced phenotypes. Measurement of the transcriptome, via readily available microarray technology, has become routine for many industrially relevant organisms, including E. coli and S. cerevisiae, and is playing a central role in both forward and reverse metabolic engineering [63,82,83].

Among the first applications of transcriptome measurements with industrial relevance to bioethanol production was establishing the baseline response of S. cerevisiae to diverse carbon substrates and medium compositions, which is essential for optimizing strains to given feedstocks and vice versa. Steady-state chemostat cultures were used to measure transcriptome responses under glucose, ethanol, ammonium, phosphate, and sulfate limitations [84]. Results suggested that genes related to high-affinity glucose uptake, the TCA cycle, and oxidative phosphorylation were up-regulated in glucose-limited conditions, while genes involved in gluconeogenesis and nitrogen catabolite repression were up-regulated in ethanol-grown cells [84]. In a similar but earlier study, transcriptome measurements were performed on S. cerevisiae grown in glucose-limited chemostats coupled with nitrogen, phosphorus, and sulfur limitations [85]. In total, 1881 transcripts (31% of the total 6084 different open reading frames probed) were significantly up- or down-regulated between at least two conditions, and a total of 51 genes demonstrated a >tenfold higher or lower expression within a given condition [85]. The transcriptome profiles under each condition have provided genetic motifs that may be recognized by transcription factors. These may be used in metabolic engineering strategies tailored to a specific growth medium composition.
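The tenfold screen described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the gene names, condition labels, and signal values are hypothetical, and a simple ratio test stands in for the statistical criteria that the cited study [85] would have applied.

```python
def condition_specific_genes(expression, fold=10.0):
    """Flag genes whose signal in one condition is more than `fold` times
    higher (or lower) than in every other condition.

    expression: dict mapping gene -> {condition: positive signal}.
    Returns a list of (gene, condition, direction) tuples.
    """
    hits = []
    for gene, by_condition in expression.items():
        for condition, value in by_condition.items():
            others = [v for c, v in by_condition.items() if c != condition]
            if value > fold * max(others):
                hits.append((gene, condition, "higher"))
            elif value * fold < min(others):
                hits.append((gene, condition, "lower"))
    return hits


# Hypothetical chemostat signals (arbitrary units), not data from [85].
expression = {
    "GENE_A": {"C_lim": 120.0, "N_lim": 1500.0, "P_lim": 110.0, "S_lim": 130.0},
    "GENE_B": {"C_lim": 400.0, "N_lim": 420.0, "P_lim": 390.0, "S_lim": 410.0},
    "GENE_C": {"C_lim": 50.0, "N_lim": 45.0, "P_lim": 48.0, "S_lim": 900.0},
    "GENE_D": {"C_lim": 5.0, "N_lim": 80.0, "P_lim": 90.0, "S_lim": 100.0},
}

hits = condition_specific_genes(expression)

# Fraction of probed open reading frames reported as changed in the text.
percent_changed = round(100 * 1881 / 6084)
```

With real microarray data the ratio test would normally be combined with replicate-based significance testing; the sketch shows only the fold-change logic.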

With the experimental mechanics of collecting transcriptome data becoming commonplace, attention is now focused on data analysis methods and integration with other x-ome data sets. It has become abundantly clear that transcriptome data alone, unless used for the purposes of environmental screening or quality control (i.e., confirming that an engineered genotype is producing the corresponding transcription profile), provide limited biological insight. Several efforts have emerged coupling transcriptome with metabolome and fluxome data [86-89]. For example, elementary flux modes for three carbon substrates (glucose, ethanol, and galactose) were determined using the catabolic reactions from the genome-scale metabolic model of S. cerevisiae, and then used for gene deletion phenotype analysis. Control-effective fluxes were used to predict transcript ratios of metabolic genes for growth under each substrate, resulting in a high correlation between the theoretical and experimental expression levels of 38 genes when ethanol and glucose media were considered [90]. This example demonstrates that incorporating transcriptional functionality and regulation into metabolic networks for in silico predictions provides both more biologically representative models and a means of bridging transcriptome and fluxome data.
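The comparison of theoretical and experimental transcript ratios mentioned above can be sketched as a correlation on log-transformed ratios. This is a minimal illustration, assuming a Pearson correlation on log2 ratios; the text does not state which correlation measure was used in [90], and the gene names and ratio values below are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Hypothetical ethanol/glucose transcript ratios (assumed values):
# model-predicted ratios versus microarray-measured ratios.
predicted_ratio = {"GENE_1": 4.0, "GENE_2": 0.5, "GENE_3": 2.0, "GENE_4": 0.25}
measured_ratio = {"GENE_1": 3.2, "GENE_2": 0.6, "GENE_3": 2.5, "GENE_4": 0.3}

genes = sorted(predicted_ratio)
# Log-transform so that up- and down-regulation are treated symmetrically.
log_predicted = [math.log2(predicted_ratio[g]) for g in genes]
log_measured = [math.log2(measured_ratio[g]) for g in genes]

r = pearson(log_predicted, log_measured)
```

A value of `r` close to 1 would indicate that the model-derived ratios track the measured ones; how the predicted ratios themselves are derived from control-effective fluxes is outside the scope of this sketch.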

In another example, the topology of the genome-scale metabolic model constructed for S. cerevisiae was examined by correlating transcriptional data with metabolism. Specifically, an algorithm was developed enabling the identification of metabolites around which the most significant transcriptional changes occur (referred to as reporter metabolites) [91]. Due to the highly connected and integrated nature of metabolism, genetic or environmental perturbations introduced at a given genetic locus will affect specific metabolites and then propagate throughout the metabolic network. Using transcriptome experimental data, a priori predictions of which metabolites are likely to be affected can be made, and these metabolites can serve as rational targets for additional inspection and metabolic engineering [91]. This algorithm has recently been extended to include reporter reactions, whereby transcriptional data are correlated with the metabolic reactions of the reconstructed S. cerevisiae genome-scale metabolic network model to identify those reactions around which the transcriptional changes conferred by a genetic or environmental perturbation cluster [92].
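The idea of scoring metabolites by the transcriptional changes of their neighboring genes can be sketched as follows. This is a minimal sketch, not the published implementation: it assumes a Z-score aggregation scheme of the kind commonly described for this type of analysis (gene p-values converted to Z-scores, summed over the genes adjacent to a metabolite, then corrected against random gene sets of the same size). The scoring details are not given in the text and should be checked against [91]; all gene names, metabolite names, and p-values are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def gene_z_scores(p_values):
    """Convert per-gene p-values (strictly between 0 and 1) to Z-scores."""
    normal = NormalDist()
    return {gene: normal.inv_cdf(1.0 - p) for gene, p in p_values.items()}


def aggregate(z_values):
    """Combine k Z-scores into one score: sum divided by sqrt(k)."""
    return sum(z_values) / math.sqrt(len(z_values))


def reporter_scores(metabolite_genes, p_values, n_random=1000, seed=0):
    """Score each metabolite by the aggregated Z-score of its neighboring
    genes, corrected by the mean and standard deviation of the aggregated
    scores of random gene sets of the same size (assumed scheme)."""
    rng = random.Random(seed)
    z = gene_z_scores(p_values)
    all_z = list(z.values())
    scores = {}
    for metabolite, genes in metabolite_genes.items():
        k = len(genes)
        raw = aggregate([z[g] for g in genes])
        background = [aggregate(rng.sample(all_z, k)) for _ in range(n_random)]
        scores[metabolite] = (raw - mean(background)) / stdev(background)
    return scores


# Hypothetical differential-expression p-values and network neighborhoods.
p_values = {"g1": 0.001, "g2": 0.004, "g3": 0.01, "g4": 0.4,
            "g5": 0.6, "g6": 0.7, "g7": 0.5, "g8": 0.8}
metabolite_genes = {"met_X": ["g1", "g2", "g3"],
                    "met_Y": ["g5", "g6", "g8"]}

scores = reporter_scores(metabolite_genes, p_values)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy network `met_X`, whose neighboring genes all change strongly, ranks above `met_Y`. In practice the metabolite-to-gene map would be derived from the genome-scale metabolic model rather than written by hand.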

As more genomes become available, and microarray technology becomes more accessible with cost-effective customizable DNA microarrays now available, the volume of transcriptome data will continue to increase. Bioinformatics for data handling, integration of transcriptome with other x-ome data, and the development of various network models that rely on transcriptome data for biological interpretation will continue to develop. From an industrial biotechnology perspective, transcriptome measurements and analysis have played a large role in reverse metabolic engineering, that is, transcriptional surveying of a strain constructed either via random mutagenesis or directed evolution [63,82,83,93]. For example, lysine production via C. glutamicum has undergone transcriptome and fluxome measurements to elucidate more than 50 years of traditional metabolic engineering (random mutagenesis), providing new targets for improved strategies [94-96]. This effort, applied to other industrial biotechnology processes, is likely to intensify.

3.3