Proteomics

Proteomics is the quantitative study of all proteins expressed in a cell under defined conditions. It is one of the more challenging x-omes, because analytical methods that measure all proteins with the required sensitivity, accuracy, and precision have only recently been developed [62,72]. Rapid advances in protein analytical technologies, fueled by the addition of mass spectrometry (MS), liquid chromatography (LC), sequence databases, and data handling methods, made it possible for protein chemists to identify and examine the expression of many proteins resolvable by two-dimensional gel electrophoresis (2DE), and large-scale protein studies began to seem attainable [97]. It was in this context that the term “proteome” was coined in 1994, at the first 2DE meeting in Siena, Italy [98]. Methods employed in proteomics have since expanded to include two-dimensional differential gel electrophoresis (DiGE), multidimensional protein identification technology (MudPIT), isotope-coded affinity tag technology (ICAT), and quantitative proteome analysis based on MS-MS spectra and a multiplexed set of chemical reagents referred to as iTRAQ [99]. Although the field is still emerging slowly, there are clear examples in which proteome analysis has led to strain improvement and successful metabolic engineering strategies [62,100].

In line with industrial biotechnology applications, results of 2DE analysis can identify targets for strain improvement, such as target gene deletions [101] or co-expression for product enhancement [102]. Proteome analysis may also improve the design and control of industrial fermentation processes. In one such study, the dynamics of the E. coli proteome were recorded during an industrial fermentation process with and without induction of recombinant antibody synthesis [103]. The recombinant antibody fragment CD18 F(ab’)2 was developed as a biopharmaceutical for the treatment of acute myocardial infarction. Proteomic analysis of this fermentation process suggested that co-expression of phage shock protein A (PspA) with a recombinant antibody fragment in E. coli resulted in improved yields. Further investigation is required to understand why PspA co-expression improved the yield [104]. Another example, more relevant to bulk chemical manufacturing, is the metabolic engineering of E. coli to produce the biodegradable and biocompatible thermoplastic polymer poly(3-hydroxybutyrate), often referred to as PHB, which has numerous applications, including use as a primary feedstock for the synthesis of enantiomerically pure chemicals. Specifically, the proteome of E. coli XL1-Blue metabolically engineered for intracellular PHB accumulation was compared with that of the reference strain, in which PHB accumulation is not observed. The comparison revealed that 2-keto-3-deoxy-6-phosphogluconate aldolase (Eda) plays a pivotal role in supplying glyceraldehyde-3-phosphate and pyruvate to further increase the flux to acetyl-CoA. A larger acetyl-CoA and NADPH demand is consistent with cells that produce a large amount of PHB. These conclusions were based on identification of protein spots on 2DE gels using matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrometry [105].

Among the most recent examples of proteomics applied to industrial biotechnology process development is the reporting of the complete proteome of Mannheimia succiniciproducens [100]. M. succiniciproducens MBEL55E is a capnophilic Gram-negative bacterium isolated from bovine rumen that produces large amounts of succinic acid under anaerobic conditions (0.68 g succinic acid/g glucose); it was first reported in 2002 [106]. Succinic acid is a C4 organic acid, traditionally produced via petrochemical conversion of maleic anhydride. It promises to be a strategic building block chemical for production by industrial biotechnology, owing to its use as the primary feedstock in the synthesis of key products including butanediol, tetrahydrofuran, γ-butyrolactone, and polyamides [107,108]. Numerous groups are exploring production of succinic acid in different host organisms, including E. coli [109], Anaerobiospirillum succiniciproducens [110], Actinobacillus succinogenes [110,111], Aspergillus niger, and Saccharomyces cerevisiae. In M. succiniciproducens, 2DE coupled with MS-MS identification and characterization led to the identification of 200 proteins, with 129 proteins from the whole cell proteome, 48 proteins from the membrane proteome, and 30 proteins from the secreted proteome. Cell growth and metabolite levels were characterized in conjunction with proteome measurements during the transition from exponential to stationary growth.
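The mass yield quoted above can be put on a molar basis, which is often easier to compare with pathway stoichiometry. The following is only a worked conversion; the molar masses of glucose (180.16 g/mol) and succinic acid (118.09 g/mol) are standard values assumed here, not figures taken from [100] or [106].

```latex
\[
Y_{\text{molar}}
  = Y_{\text{mass}} \cdot \frac{M_{\text{glucose}}}{M_{\text{succinic acid}}}
  = 0.68\ \frac{\text{g}}{\text{g}} \cdot \frac{180.16\ \text{g/mol}}{118.09\ \text{g/mol}}
  \approx 1.04\ \frac{\text{mol succinic acid}}{\text{mol glucose}}
\]
```

On this basis the reported yield corresponds to roughly one mole of succinic acid per mole of glucose consumed.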

Two interesting conclusions could be drawn from this analysis that were not possible a priori. First, a gene locus previously annotated as the succinate dehydrogenase subunit A (sdhA) is likely to be the fumarate reductase subunit A (frdA), based on comparative proteome analysis supported by physiological data. Second, two novel enzymes were identified as likely metabolic engineering targets for future improvements in succinic acid production. PutA and OadA are enzymes responsible for acetate formation and conversion of oxaloacetate to pyruvate, respectively, and their deletion is likely to induce higher flux towards succinic acid by minimizing byproduct formation [100]. This is a clear example in which proteome measurement and analysis not only provided novel information for future metabolic engineering strategies, but also served as a quality-control check on two critical assumptions: (i) that genome annotation is error-free, and (ii) that mRNA expression directly correlates with protein expression and activity.

As discussed previously, acquisition of large bodies of genomic sequences has prompted development and application of tools such as cDNA/oligonucleotide microarrays, which in turn have made possible global analysis of cellular processes. As powerful as this approach is proving to be, much of the regulation of physiological processes occurs post-transcriptionally. Thus, measurement of mRNA levels provides an incomplete picture of cellular activity and of the regulatory control points that may present themselves as preferred metabolic engineering targets. Methods and techniques developed to measure the global expression, localization, and interaction of proteins fall within the domain of proteomics. By integrating various data sources with the known biological function of individual genes and proteins, one can begin to uncover underlying mechanisms, leading to the creation and analysis of static and dynamic models of regulatory networks and pathways.
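The point that mRNA levels give only an incomplete picture can be illustrated by comparing matched transcript and protein measurements. The following is a minimal sketch under stated assumptions: the log2 ratios are hypothetical values invented for illustration, and `pearson` and the 0.75 log2-unit discordance threshold are illustrative choices, not methods taken from the studies cited here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical log2 ratios (perturbed vs. reference condition) for five genes.
mrna_log2 = [2.0, 1.0, 0.0, -1.0, -2.0]
protein_log2 = [1.0, 1.0, 0.0, 0.0, -2.0]

r = pearson(mrna_log2, protein_log2)

# Indices of genes whose protein change differs from the mRNA change by more
# than 0.75 log2 units: candidates for post-transcriptional regulation.
discordant = [
    i for i, (m, p) in enumerate(zip(mrna_log2, protein_log2))
    if abs(m - p) > 0.75
]

print(f"Pearson r = {r:.3f}; discordant gene indices: {discordant}")
```

Even when the overall correlation is high, individual genes can show protein changes that the transcript data alone would not predict, and these are the cases where proteome measurements add information.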

A recent study has shown the value of this union of data as an experimental strategy to gain insights into cellular physiology [87]. In this study, all of the known components of the galactose induction pathway in S. cerevisiae were systematically perturbed, and both transcriptional and proteomic data were collected. The different data were integrated into a mathematical model that included enzymatic reactions, membrane transport, transcriptional activation, protein activation, and protein inhibition. The model predicted previously unknown interactions within the galactose induction pathway, as well as interactions between this pathway and other cellular processes. Several of these predictions were then verified experimentally [87]. The galactose signaling pathway is of particular industrial relevance as one of the classical and best-understood promoter and induction systems used for protein expression. This example further highlights that even such an extensively studied pathway can reveal new mechanisms for control and manipulation when examined with x-omic approaches.

Related directly to bioethanol process development, several groups are evaluating proteomes of production organisms under defined environments that are of immediate industrial relevance. For example, Salusjärvi et al. (2003) performed a proteome analysis of metabolically engineered S. cerevisiae strains cultured on xylose, as compared to glucose, in aerobic and anaerobic carbon-limited chemostats [113]. Lignocellulosic feedstocks are abundant and renewable; however, they also contain xylose, the most abundant pentose sugar in hemicellulose, hardwoods, and crop residues, and the second most abundant monosaccharide after glucose [114]. S. cerevisiae fails to consume pentose sugars efficiently, compared to glucose, and significant research has therefore been devoted to metabolically engineering such strains (see Sect. 3.5 for further discussion). Proteome analysis of xylose fermentations revealed 22 proteins that were found in significantly higher concentrations relative to glucose fermentations. These proteins included alcohol dehydrogenase 2 (Adh2p), acetaldehyde dehydrogenases 4 and 6 (Ald4p and Ald6p), and DL-glycerol-3-phosphatase (Gpp1p) [113]. As will be revealed in the fluxome discussion, this protein expression profile is indicative of the redirection of metabolic fluxes believed to occur under xylose fermentation. Proteome analysis bridges the gap between genetic engineering, transcription profiles, and observed metabolism by identifying that over- or underexpression of specific proteins (i.e., enzymes) is pushing targeted (or untargeted) metabolic fluxes in desired (or undesired) directions.
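The comparison described above, finding proteins present at higher levels on xylose than on glucose, can be sketched as a simple fold-change screen. This is only an illustration under stated assumptions: the spot intensities are hypothetical and are not data from [113], the function names and the two-fold threshold are illustrative choices, and Pgk1p is included purely as an unchanged placeholder. A real analysis would also test statistical significance across replicates.

```python
import math
from statistics import mean

def log2_fold_changes(xylose, glucose):
    """Return {protein: log2(mean xylose / mean glucose)} for shared proteins."""
    return {
        protein: math.log2(mean(xylose[protein]) / mean(glucose[protein]))
        for protein in xylose
        if protein in glucose
    }

def higher_on_xylose(xylose, glucose, min_log2_fc=1.0):
    """Proteins whose mean abundance is at least 2**min_log2_fc times higher on xylose."""
    fold_changes = log2_fold_changes(xylose, glucose)
    return sorted(p for p, fc in fold_changes.items() if fc >= min_log2_fc)

# Hypothetical normalized 2DE spot intensities, three replicates per condition.
xylose = {
    "Adh2p": [8.0, 9.0, 7.0],
    "Ald6p": [4.0, 4.4, 3.6],
    "Pgk1p": [10.0, 11.0, 9.0],
}
glucose = {
    "Adh2p": [2.0, 2.2, 1.8],
    "Ald6p": [1.0, 1.1, 0.9],
    "Pgk1p": [10.0, 10.5, 9.5],
}

print(higher_on_xylose(xylose, glucose))
```

With these made-up intensities, the two proteins given a four-fold higher mean on xylose pass the screen and the unchanged placeholder does not.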

Proteomics is a rapidly developing area of research in which new technologies are often developed and validated in model systems such as S. cerevisiae. Compared with genomics, however, proteomics is still limited because it is strongly biased towards highly abundant proteins and, therefore, does not yet provide the genome-wide coverage obtained by other x-ome technologies. Additionally, the proteome is possibly the most complex of all x-omes because of its highly dynamic nature and the complexity resulting from splice variants, isoforms, and protein post-translational modifications. For some proteins, in excess of 1000 variants have been described [104]. It is evident that there is an ongoing need for improvement in (quantitative) proteomics technologies, and yeast will likely again play its role as the benchmark model system. Proteomics, largely absent from bioethanol development, is only beginning to find key roles in the development of other industrial products. Those products are likely to be targeted as co-products for bioethanol-based biorefineries. Succinic acid has already been considered as a potential value-added co-product that could diversify the product portfolio of a biorefinery in which the high-volume, low-value product will be bioethanol [115,116].

3.4