
SOLID SOFTWARE

After sequencing with SOLiD, the raw reads are collected as color-coded sequences. Using the two-base coding matrix, a color sequence can be decoded into a base sequence provided the base at any one position in the sequence is known. Because each color corresponds to four possible base pairs, the color call at one position directly determines how every following base is decoded; a single wrong color call therefore causes a chain of decoding errors. BioScope is the SOLiD data analysis package, which provides a validated, single framework for resequencing, ChIP-Seq, and whole transcriptome analysis. It depends on a reference sequence for downstream analysis. First, the software converts the base sequence of the reference into a color-coded sequence. Second, the color-coded reference is compared with the original color-coded reads to obtain mapping information, using the newly developed mapping algorithm MaxMapper.

DISCUSSION

Oleaginous microalgae can accumulate large quantities of lipid under stress-inducing growth conditions, making them a target organism for sustainable liquid biofuel production. In the present study, we induced TAG production and accumulation in N. oleoabundans through nitrogen deprivation, and investigated the expression of genes involved in TAG production at the transcriptome level. Mapping reads to the assembled and annotated transcriptome provided significantly more information than mapping reads to other microalgae for which the genome has been sequenced and annotated (Figure 3). While transcriptomic analysis is not a substitute for detailed gene and pathway studies, it does provide a broad overview of the important metabolic processes from which to efficiently build hypotheses that can guide future detailed studies on improving lipid accumulation.

Our results suggest that under -N conditions, the altered expression of coordinated metabolic processes, many of which occur in the plastid, redirects the flow of fixed carbon toward biosynthesis and storage of lipids. These processes include up-regulation of de novo fatty acid and TAG synthesis, and concomitant repression of β-oxidation and TAG lipases. To supply precursors for lipid production, genes associated with the pyruvate dehydrogenase complex for converting pyruvate to acetyl CoA, and lipases involved in the release of free fatty acids from cell wall glycerophospholipids, were overexpressed in the -N scenario. To power fatty acid production, strong overexpression under -N was observed in the pentose-phosphate pathway, which is primarily involved in supplying reducing equivalents for anabolic metabolism, including the production of fatty acids and assimilation of inorganic nitrogen [34].

INTEGRATION OF MULTIPLE ANNOTATION DATABASES

The Algal Functional Annotation Tool integrates annotation data from the biological knowledge bases listed in Table 1. Publicly available flat files containing annotation data were downloaded and parsed for each individual resource. Chlamydomonas reinhardtii proteins were assigned KEGG pathway annotations by means of sequence similarity to proteins within the KEGG genes database [1]. MetaCyc [2], Reactome [28], and Panther [30] pathway annotations were assigned to C. reinhardtii proteins by sequence similarity to subsets of UniProt IDs annotated in each corresponding database. In all cases, sequence similarity was determined by BLAST. BLAST results were filtered to contain only best hits with an E-value < 1e-05.
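The filtering step can be sketched as follows. This is only an illustration under stated assumptions: the input is taken to be 12-column tab-separated BLAST output (query, subject, then percent identity through bit score, with E-value in column 11 and bit score in column 12), "best hit" is taken to mean the highest bit score per query, and all identifiers in the example are made up.

```python
def best_hits(blast_lines, max_evalue=1e-5):
    """Keep, for each query, the highest-bit-score hit with E-value below the cutoff."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue  # fails the E-value < 1e-05 filter
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical hits: two for cre_001 (one passing, one weaker), one failing for cre_002.
example = [
    "cre_001\tkegg_A\t80.0\t200\t40\t0\t1\t200\t1\t200\t1e-50\t190.0",
    "cre_001\tkegg_B\t60.0\t180\t72\t0\t1\t180\t1\t180\t1e-20\t95.0",
    "cre_002\tkegg_C\t35.0\t50\t32\t1\t1\t50\t1\t50\t0.5\t22.0",
]
hits = best_hits(example)
```

Here `hits` retains only the `kegg_A` hit for `cre_001`; `cre_002` receives no annotation because its only hit does not pass the E-value cutoff.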

TABLE 1: List of annotation resources integrated into the Algal Functional Annotation Tool

Resource          URL                                                   Reference
KEGG              http://www.genome.jp/kegg/                            [1]
MetaCyc           http://www.metacyc.org/                               [2]
Pfam              http://pfam.sanger.ac.uk/                             [3]
Reactome          http://www.reactome.org/                              [28]
Panther           http://www.pantherdb.org/pathway                      [30]
Gene Ontology     http://www.geneontology.org/                          [31]
InterPro          http://www.ebi.ac.uk/interpro                         [32]
MapMan Ontology   http://mapman.gabipd.org/                             [33]
KOG               http://www.ncbi.nlm.nih.gov/COG/grace/shokog.cgi      [35]

Primary databases used to functionally annotate gene models and integrated into the Algal Functional Annotation Tool.

Gene Ontology (GO) [31] terms were downloaded from the Chlamydomonas reinhardtii annotation provided by JGI. These GO terms were associated with their respective ancestors in the hierarchical ontology structure to include broader functional terms and provide a complete annotation set. Pfam domain annotations were assigned by direct search against protein domain signatures provided by Pfam. InterPro [32] and user-submitted manual annotations are based on those contained within JGI’s annotation of the C. reinhardtii genome [11]. These methods were applied to four types of gene identifiers commonly used for C. reinhardtii proteins: JGI protein identifiers (versions 3 and 4) and Augustus gene models (versions 5 and 10.2). In total, over 12,600 unique functional annotation terms were assigned to 65,494 C. reinhardtii gene models spanning four different gene identifier types by these methods (Table 2). These assigned annotations may be explored for single genes using a built-in keyword search tool as well as an integrated annotation lookup tool which displays all annotations for a particular identifier.
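Expanding terms to include their ancestors amounts to a traversal of the ontology graph. The sketch below shows one way this could be done, assuming the ontology is available as a simple term-to-parents mapping; the three-term hierarchy and the gene name are toy placeholders, not real GO identifiers or the tool's actual data structures.

```python
def ancestors(term, parents):
    """Return every ancestor of a term, given a term -> list-of-parents mapping."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def expand_annotations(gene_terms, parents):
    """Add all ancestor terms to each gene's directly assigned terms."""
    expanded = {}
    for gene, terms in gene_terms.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents)
        expanded[gene] = full
    return expanded


# Toy hierarchy (hypothetical term names): child -> parents.
toy_parents = {
    "fatty acid metabolism": ["lipid metabolism"],
    "lipid metabolism": ["metabolism"],
}
toy_genes = {"gene_1": {"fatty acid metabolism"}}
expanded = expand_annotations(toy_genes, toy_parents)
```

After expansion, `gene_1` carries the broader terms as well as the specific one, so an analysis over "lipid metabolism" will count it even though only the more specific term was assigned directly.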

FUTURE DIRECTIONS

As with all tools that integrate data from multiple external sources, the power of analysis using the Algal Functional Annotation Tool is ultimately limited by the quality of the annotations within the primary databases. With the steady growth of knowledge in these annotation databases, the utility of the analyses provided is expected to increase in the future as more biological associations are assigned to genes. Additionally, as Chlamydomonas reinhardtii genes continue to be experimentally characterized, the assignment of manual annotations will also fill in the gaps left by automated annotation assignment and thus expand the annotation coverage throughout the genome, further improving the results generated by our portal. Lastly, the extensible nature of the Algal Functional Annotation Tool will allow us to add other algal organisms in the future using the same platform, so that genomic data from other algal model organisms may be analyzed in the same way as is currently possible for Chlamydomonas reinhardtii.

11.3 CONCLUSIONS

The Algal Functional Annotation Tool is intended as a comprehensive analysis tool to elucidate biological meaning from gene lists derived from high-throughput experimental techniques. Annotation sets from a number of biological databases have been pre-processed and assigned to gene identifiers of the green alga Chlamydomonas reinhardtii, and this annotation data may be explored in multiple ways, including the use of enrichment tests designed for large gene lists. Furthermore, the site enables the visualization of proteins within pathway maps. By using several methods, such as inferring annotations from orthologous proteins of other organisms, the initially sparse annotation coverage of C. reinhardtii is improved, allowing for a more effective functional term enrichment analysis. Other functions of the tool include a batch gene identifier conversion tool and a manual annotation search tool. Lastly, genes with similar expression across several conditions may be explored using the gene similarity search tool.
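This section does not name the statistic behind the enrichment tests. As an illustration only, the sketch below assumes a one-sided hypergeometric test, a common choice for asking whether a gene list contains more genes with a given annotation than expected by chance. The function name and the numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, annotated_genes, list_size, list_hits):
    """P(X >= list_hits) when list_size genes are drawn without replacement
    from total_genes, of which annotated_genes carry the term."""
    upper = min(annotated_genes, list_size)
    tail = sum(
        comb(annotated_genes, k) * comb(total_genes - annotated_genes, list_size - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / comb(total_genes, list_size)


# Toy case: 10 genes in total, 3 carry the term, a list of 2 genes contains 1 of them.
p_toy = hypergeom_enrichment_p(10, 3, 2, 1)  # 24/45, about 0.533
```

A small p-value would indicate that the term is over-represented in the list; in practice the test is repeated for every term, so a multiple-testing correction would normally follow.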

11.4 AVAILABILITY AND REQUIREMENTS

• Project name: Algal Functional Annotation Tool

• Public web service: http://pathways.mcdb.ucla.edu; free, with no registration required.

• Programming language: Perl/CGI

• Database: MySQL

• Software License: GNU General Public License

ILLUMINA GA/HISEQ SYSTEM

In 2006, Solexa released the Genome Analyzer (GA), and in 2007 the company was purchased by Illumina. The sequencer adopts the technology of sequencing by synthesis (SBS). The library with fixed adaptors is denatured to single strands and grafted to the flowcell, followed by bridge amplification to form clusters which contain clonal DNA fragments. Before sequencing, the library is linearized into single strands with the help of a linearization enzyme [10]. Four kinds of nucleotides (A, G, C, and T analogs), each carrying a different cleavable fluorescent dye and a removable blocking group, then complement the template one base at a time, and the signal is captured by a charge-coupled device (CCD).

At first, the Solexa GA output was 1 G/run. Through improvements in polymerase, buffer, flowcell, and software, the output of the GA increased in 2009 to 20 G/run in August (75PE), 30 G/run in October (100PE), and 50 G/run in December (TruSeq V3, 150PE), and the latest GAIIx series can attain 85 G/run. In early 2010, Illumina launched the HiSeq 2000, which adopts the same sequencing strategy as the GA, and BGI was among the first globally to adopt the HiSeq system. Its output was initially 200 G per run and has since improved to 600 G per run, which can be finished in 8 days. In the foreseeable future, it could reach 1 T/run, at which point the cost of a personal genome could drop below $1 K. The error rate of 100PE could be below 2% on average after filtering (BGI’s data). Compared with 454 and SOLiD, the HiSeq 2000 is the cheapest in sequencing, at $0.02/million bases (reagent cost only, as counted by BGI). With multiplexing incorporated in P5/P7 primers and adapters, it could handle thousands of samples simultaneously. The HiSeq 2000 needs HiSeq control software (HCS) for program control, real-time analyzer software (RTA) to do on-instrument base-calling, and CASAVA for secondary analysis. There is a 3 TB hard disk in the HiSeq 2000. With the aid of TruSeq v3 reagents and associated software, the HiSeq 2000 has improved much on high-GC sequencing. MiSeq, a benchtop sequencer launched in 2011 which shares most technologies with HiSeq, is especially convenient for amplicon and bacterial sample sequencing. It can sequence 150PE and generate 1.5 G/run in about 10 hrs, including sample and library preparation time. Library preparation and concentration measurement can both be automated with compatible systems like Agilent Bravo, Hamilton Banadu, Tecan, and Apricot Designs.
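The throughput and cost figures above can be combined with a little arithmetic. The sketch below assumes "G" means gigabases and uses the quoted reagent price of $0.02 per million bases; the 3-gigabase genome at 30-fold coverage is a hypothetical example, not a figure from the text.

```python
def gigabases_per_day(gigabases_per_run, days_per_run):
    """Average daily output of one instrument run."""
    return gigabases_per_run / days_per_run


def reagent_cost_usd(gigabases, usd_per_million_bases=0.02):
    """Reagent-only cost; one gigabase is 1000 million bases."""
    return gigabases * 1000 * usd_per_million_bases


hiseq_daily = gigabases_per_day(600, 8)   # 75 gigabases per day
run_cost = reagent_cost_usd(600)          # about 12,000 USD of reagents per run
# Hypothetical 3-gigabase genome sequenced at 30-fold coverage: 90 gigabases.
genome_cost = reagent_cost_usd(3 * 30)    # about 1,800 USD
```

On these assumptions a 30-fold genome still costs more than the $1 K target in reagents alone, which is consistent with the text placing that target in the future.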

TRANSCRIPTOME RESPONSE OF N. OLEOABUNDANS TO NITROGEN LIMITATION

A primary physiological response to nitrogen limitation is a decrease in cell growth, as observed in the threefold reduction in N. oleoabundans growth rate. The transcript profile of nitrogen-starved N. oleoabundans clearly reflects the decrease in cell proliferation and the stressed physiological status of the cells. Gene ontology terms related to cellular growth, photosynthesis, and protein machinery were significantly suppressed under -N conditions, and autophagy genes were up-regulated. The 5’ AMP-activated protein kinase (SnRK1 gene in plants) was slightly overexpressed in the -N scenario. SnRK1 is activated under starvation conditions, including nitrogen depletion [31], and is a global regulator of starch and TAG production in plants [30]. Overexpression of SnRK1 in the transgenic potato Solanum tuberosum cv. Prairie [35] and Arabidopsis thaliana [36] has resulted in changes in starch and carbohydrate levels, thus confirming this gene’s central role in carbon partitioning and suggesting that SnRK1 may be an important target for metabolic engineering efforts in oleaginous microalgae. We note also that genes encoding the components of nitrogen assimilation are identified as the most significantly up-regulated genes in the transcriptome of nitrogen-limited N. oleoabundans. Overexpression of nitrogen assimilation pathways under nitrogen-limiting conditions has been previously reported in the transcriptome of other, non-oleaginous microalgae species [10,33].

ASSIGNMENT OF ANNOTATION FROM ARABIDOPSIS THALIANA

To extend the terms associated with C. reinhardtii genes, functional terms were inferred by homology to the annotation set of the plant Arabidopsis thaliana (thale cress). Identification of orthologous proteins was based on sequence similarity and subsequent filtering of the results by retaining only mutual best hits between the two sets of protein sequences. The corresponding Arabidopsis thaliana annotation was used to supplement GO terms and was similarly expanded to contain term ancestry. The A. thaliana annotations of the MapMan Ontology [33] and MetaCyc Pathway database [2] were also used to provide more complete annotation coverage of the C. reinhardtii genome.
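Retaining only mutual best hits can be expressed compactly once the best hit in each search direction is known. The following is a minimal sketch, assuming each direction has already been reduced to a query-to-best-subject mapping; the identifiers are hypothetical.

```python
def mutual_best_hits(a_to_b, b_to_a):
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    return {a: b for a, b in a_to_b.items() if b_to_a.get(b) == a}


# Hypothetical best hits in each direction.
cre_to_ath = {"cre_1": "ath_1", "cre_2": "ath_2", "cre_3": "ath_2"}
ath_to_cre = {"ath_1": "cre_1", "ath_2": "cre_3"}
orthologs = mutual_best_hits(cre_to_ath, ath_to_cre)
```

Here `cre_2` is dropped: its best hit is `ath_2`, but the best hit of `ath_2` is `cre_3`, so only the `cre_1`/`ath_1` and `cre_3`/`ath_2` pairs are treated as orthologs and would inherit annotations.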

TABLE 2: Number of gene identifiers associated with annotation databases

Identifier Type   Total Gene IDs   KEGG   Reactome   Panther   Gene Ontology   MapMan   KOG    Pfam   InterPro
JGI v3.0          14598            5348   2740       1147      6563            5214     9139   7166   7532
JGI v4.0          16706            4232   1949       1085      7568            3171     9973   7305   8151
Augustus v5.0     16888            4686   2983       1673      4334            3160     5123   8202   5202
Augustus v10.2    17302            4583   3326       1913      6956            3892     8977   8691   7464

Number of Chlamydomonas reinhardtii identifiers with at least one functional annotation for each primary database, shown per identifier type.

TRANSCRIPTOMIC ANALYSIS OF THE OLEAGINOUS MICROALGA NEOCHLORIS OLEOABUNDANS REVEALS METABOLIC INSIGHTS INTO TRIACYLGLYCERIDE ACCUMULATION

HAMID RISMANI-YAZDI, BERAT Z. HAZNEDAROGLU, CAROL HSIN, and JORDAN PECCIA

12.1 BACKGROUND

Important advantages of microalgae-based biofuels over first generation biofuels include algae’s greater solar energy conversion efficiency compared to land plants [1], the ability of oleaginous microalgae to utilize non-arable land and saline or waste-water, and their high content of energy-dense neutral lipids that can be readily transesterified to produce biodiesel [2,3]. Under stress conditions such as nutrient deprivation or high light intensity, several species of oleaginous microalgae can alter lipid biosynthetic pathways to produce intracellular total lipid contents between 30 and 60% of dry cell weight (DCW) [4]. Triacylglycerides (TAGs) are the dominant form of lipids produced under these conditions. The excess production of TAGs in microalgae is thought to play a role in carbon and energy storage and to function as part of the cell’s stress response [5].

Due to the limited understanding of microalgae genetics and physiology, lipid metabolism from higher plants and bacteria has been the basis from which the accumulation of TAGs in microalgae has been modeled [5]. TAGs and polar membrane lipids are synthesized from fatty acids that are primarily produced in the chloroplast [6]. The committed step in fatty acid biosynthesis starts with the conversion of acetyl CoA to malonyl CoA through the enzyme acetyl CoA carboxylase (ACCase). In some plants, there is evidence that both photosynthesis- and glycolysis-derived pyruvate could be endogenous sources of the acetyl CoA pool for fatty acid biosynthesis [5]. Fatty acid production in E. coli is regulated through feedback inhibition by long chain fatty acyl carrier proteins (ACP) [7,8], and a recent study in the microalga Phaeodactylum tricornutum demonstrated that overexpression of genes that encode the thioesterases that hydrolyze the thioester bond of long chain fatty acyl ACPs resulted in a significant increase in fatty acid production [9]. Recent nitrogen deprivation studies in the model, nonoleaginous microalga Chlamydomonas reinhardtii have also suggested an important role for lipases in restructuring the cell membrane under nitrogen limitation in order to supply fatty acids for TAG biosynthesis [10].

The stress-induced production of TAGs provides an opportunity to observe differential gene expression between high and low TAG accumulating phenotypes. Because multiple pathways are associated with the enhanced production of neutral lipids in microalgae, transcriptomic studies are an appropriate tool to provide an initial, broad view of carbon partitioning [11] and regulation of TAG biosynthesis during microalgae stress responses. However, the most promising strains thus far identified by growth experiments and lipid content screening [4,12] do not have sequenced, fully annotated genomes [13-15]. In microalgae, transcriptomic studies have instead focused on model organisms that are not oleaginous but have sequenced genomes [10,16]. There is a growing number of oleaginous microalgae from which de novo transcriptomes have been assembled and annotated, but comprehensive quantitative gene expression analysis in these microalgae has not yet been performed [14,17-19]. Recently, a de novo assembled transcriptome was used as a search model to enable a proteomic analysis of the oleaginous microalga Chlorella vulgaris that demonstrated up-regulation of fatty acid and TAG biosynthetic pathways in response to nitrogen limitation [13].

In the present study, we quantitatively analyzed the transcriptome of the oleaginous microalga Neochloris oleoabundans to elucidate the metabolic pathway interactions and regulatory mechanisms involved in the accumulation of TAG. N. oleoabundans (a taxonomic synonym of Ettlia oleoabundans [20]) is a unicellular green microalga belonging to the Chlorophyta phylum (class Chlorophyceae). It is known to produce large quantities of lipids (35 to 55% dry cell weight total lipids and greater than 10% TAGs) [4,12,21] in response to physiological stresses caused by nitrogen deprivation. To produce differences in lipid enrichment, N. oleoabundans was cultured under nitrogen-replete and nitrogen-limited conditions, and major biomolecules including total lipids, TAGs, starch, protein, and chlorophyll were measured. The transcriptome was sequenced and assembled de novo, gene expression was quantified, and comparative analysis of genes, pathways, and broader gene ontology categories was conducted. The results provide new insight into the regulation of lipid metabolism in oleaginous microalgae at the transcriptomic level, and suggest several potential strategies to improve lipid production in microalgae based on a rational genetic engineering approach.

12.2 RESULTS

HISEQ SOFTWARE

HiSeq control software (HCS) and real-time analyzer (RTA) are used by the HiSeq 2000. These two programs calculate the number and position of clusters based on their first 20 bases, so the first 20 bases of each read determine the output and quality of each sequencing run. The HiSeq 2000 uses two lasers and four filters to detect the four types of nucleotide (A, T, G, and C). The emission spectra of these four kinds of nucleotides have cross-talk, so the images of the four nucleotides are not independent and the distribution of bases affects the quality of sequencing. The standard sequencing output files of the HiSeq 2000 consist of *.bcl files, which contain the base calls and quality scores in each cycle. These are then converted into *_qseq.txt files by BCL Converter. The ELAND program of CASAVA (offline software provided by Illumina) is used to match a large number of reads against a genome.

In conclusion, of the three NGS systems described above, the Illumina HiSeq 2000 features the biggest output and lowest reagent cost, the SOLiD system has the highest accuracy [11], and the Roche 454 system has the longest read length. Details of the three sequencing systems are listed in Tables 1(a), 1(b), and 1(c).

THE REGULATION OF FATTY ACID AND TAG BIOSYNTHESIS AND SUPPLY OF PRECURSORS

Under nitrogen deprivation, there has been considerable uncertainty as to whether the increase in TAG content is due to a reduction in the mass of the cell rather than an increase in TAG production [2]. Both the measured increase in TAG content per cell dry weight reported here (which accounted for the loss of cell mass during nitrogen limitation) and the observed changes in the FAME profile unequivocally demonstrate the overproduction and accumulation of TAG in N. oleoabundans under nitrogen stress. Quantitative gene expression results also support these TAG production observations. In our study, most of the genes involved in the fatty acid biosynthetic pathway were up-regulated under -N conditions. The gene encoding ACCase, the first enzyme in the pathway, was reported as down-regulated under -N. However, the biotin-containing subunit of ACCase, biotin carboxylase (BC), was significantly overexpressed. In photosynthetic organisms, two different forms of ACCase have been identified, one located in the plastid and the other located in the cytosol. The plastidal ACCase is a heteromeric multi-subunit enzyme that contains BC, whereas the cytosolic ACCase is a homomeric multifunctional protein that does not contain BC [27]. In our transcriptome analysis, we identified genes encoding both forms of ACCase. In the plastid, the primary site of lipid biosynthesis in microalgae, we observed a significant increase in expression of the BC subunit of the heteromeric isoform that catalyzes the very first step of carboxylation. On the other hand, the expression of homomeric ACCase, predominantly located in the cytosol where lipid biosynthesis does not typically occur, was repressed.

Although the overexpression of BC points to a key step in the pathway as a potential target to genetically engineer an improved oleaginous strain, mixed results for improving fatty acid synthesis in microalgae have been observed when ACCase is overexpressed [2]. Recent research has suggested that fatty acid synthesis may also be regulated by inhibition from the buildup of long chain fatty acyl ACPs [9]. Overexpressing genes whose products cleave ACP residues from long chain fatty acyl ACPs has been observed in bacteria, and recently in the microalga P. tricornutum, to result in increased production of fatty acids [9]. In our study, genes encoding these enzymes were highly overexpressed under -N conditions. Therefore, a potential target for metabolic engineering in N. oleoabundans is the overexpression of the thioesterases FatA and OAH that cleave off ACP residues.

Genes encoding enzymes involved in the steps downstream of fatty acid biosynthesis, including elongation and desaturation, also displayed significant changes in transcription levels in response to nitrogen starvation. In particular, the genes encoding AAD and delta-15 desaturase, which catalyze the formation of a double bond between the 9th and 10th carbons and between the 14th and 15th carbons, respectively, were up-regulated under -N conditions. A similar observation has been reported by Morin et al. [37], where the gene encoding delta-9 fatty acid desaturase is up-regulated in the oleaginous yeast Y. lipolytica cultured under nitrogen limitation. As observed here, and supported by gene expression levels, nitrogen limitation alters the lipid profile towards higher saturation (increase in C18:1, and decrease in C18:2 and C18:3). The increased proportion of saturated fatty acids in TAG has been demonstrated to improve the cetane number and stability of the resulting biodiesel [38].

Based on the lipid metabolism genes discovered from our transcriptome assembly, the acyl-CoA dependent mechanism is the major contributor to TAG biosynthesis in N. oleoabundans. In our study, two genes associated with biosynthesis of TAG show significant changes in their expression under -N conditions: one encoding GPAT and the other encoding AGPAT. These enzymes catalyze the acyl-CoA-dependent acylation of positions 1 and 2 of glycerol-3-phosphate, respectively. The acylation of glycerol-3-phosphate represents the first and committed step in glycerolipid biosynthesis, and likely the rate-limiting step in the pathway, as GPAT exhibits the lowest specific activity among all enzymes involved in the glycerol-3-phosphate pathway [39]. A recent proteomics study also reported significant up-regulation of TAG-related acyltransferases in parallel with accumulation of large quantities of lipid in C. vulgaris cultured under nitrogen limitation [13]. The overexpression of GPAT and AGPAT has been reported to increase seed oil accumulation in Arabidopsis and Brassica napus [40-42]. The up-regulation of these two genes also indicates an increase in the flow of acyl-CoA toward TAG biosynthesis. The final step of the TAG biosynthesis pathway is catalyzed by DGAT, the third acyltransferase. In our study, the gene encoding DGAT displays essentially no change in its expression under nitrogen limitation. This observation, coupled with the significant increase in TAG production in the -N case and previous proteomics studies that showed overexpression of DGAT in C. vulgaris due to nitrogen limitation [13], provides evidence that DGAT expression in N. oleoabundans may be regulated post-transcriptionally. The post-transcriptional regulation of DGAT has previously been documented in the oilseed rape Brassica napus [43].

Finally, the enrichment of intracellular starch increased during the -N case. Although starch synthase and AGPase encoding genes were repressed in -N, the gene encoding α-amylase, responsible for the hydrolysis of starch to glucose monomers, was also repressed. The concomitant accumulation of starch and lipids under nitrogen limitation has been reported in the nonoleaginous C. reinhardtii [44,45] and recently reported for N. oleoabundans [46]. This contrasts with recent reports in Micractinium pusillum, where carbohydrate content was reduced and TAG production was increased under nitrogen limitation [19]. Genetic manipulations (sta6 mutant) that block starch synthesis in C. reinhardtii have resulted in a significant increase in TAG accumulation [47]. Under nitrogen limitation, the increased TAG content in N. oleoabundans and concomitant repression of starch synthase are analogous to the C. reinhardtii sta6 mutant. These results extend the idea of blocking starch synthesis for improvement of TAG production to the oleaginous microalga N. oleoabundans.