
FUNCTIONAL TERM ENRICHMENT TESTING

The hypergeometric distribution is commonly used to determine the significance of functional term enrichment within a list of genes. In this test, the occurrence of a functional term within a gene list is compared to the background level of occurrence across all genes in the genome to determine the degree of enrichment. A p-value based on this test can be calculated from four parameters: (1) the number of genes within the list, (2) the frequency of a term within the gene list, (3) the total number of genes within the genome, and (4) the frequency of a term across all genes in the genome. This test effectively distinguishes truly overrepresented terms from those that occur at a high frequency across all genes in the genome and therefore within the gene list as well. The cumulative hypergeometric test assigns a p-value to each functional term associated with genes within a given list, and all functional terms are ranked by ascending p-value (i.e., by descending level of enrichment). Huang et al. review the use of the hypergeometric test for functional term enrichment [34]. The Algal Functional Annotation Tool computes hypergeometric p-values using a Perl wrapper for the cumulative hypergeometric function of the GNU Scientific Library, which is written in C, to provide a quick and accurate implementation of this statistical test.
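The four parameters listed above are enough to reproduce the test in a few lines. The following is a minimal Python sketch of the upper-tail cumulative hypergeometric p-value and the ranking by ascending p-value; it is not the tool's Perl/GSL implementation, and the function and argument names are illustrative assumptions.

```python
from math import comb


def enrichment_p_value(list_size, term_in_list, genome_size, term_in_genome):
    """Upper-tail hypergeometric p-value: P(X >= term_in_list).

    list_size       (1) number of genes in the list
    term_in_list    (2) genes in the list annotated with the term
    genome_size     (3) total number of genes in the genome
    term_in_genome  (4) genes in the genome annotated with the term
    """
    if not 0 <= term_in_list <= list_size <= genome_size:
        raise ValueError("need 0 <= term_in_list <= list_size <= genome_size")
    if not 0 <= term_in_genome <= genome_size:
        raise ValueError("need 0 <= term_in_genome <= genome_size")

    all_lists = comb(genome_size, list_size)
    max_hits = min(list_size, term_in_genome)
    # Sum the probability of every outcome at least as extreme as observed.
    tail = sum(
        comb(term_in_genome, k) * comb(genome_size - term_in_genome, list_size - k)
        for k in range(term_in_list, max_hits + 1)
    )
    return tail / all_lists


def rank_terms(term_counts, list_size, genome_size):
    """Rank terms by ascending p-value.

    term_counts maps a term name to (term_in_list, term_in_genome).
    """
    scored = [
        (term, enrichment_p_value(list_size, in_list, genome_size, in_genome))
        for term, (in_list, in_genome) in term_counts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])
```

A term carried by nearly every gene in the genome receives a p-value close to 1 even when it is frequent in the list, which is the behaviour the paragraph above describes.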

MAJOR BIOMOLECULE CONTENT AND COMPOSITION DIFFER BETWEEN THE NITROGEN REPLETE (+N) AND NITROGEN-LIMITED (-N) GROWTH ENVIRONMENTS

To track gene transcription in the oleaginous microalga N. oleoabundans, cells were first grown under +N and -N conditions as a method to produce differential cellular enrichments of TAGs. Cells were harvested after 11 days. This sampling time corresponded to NO₃⁻ concentrations below the detection level and a reduction in growth rate in the -N reactors (Figure 1A, B). The maximum growth rate for the -N cultures was 113 ± 4 (std. err.) mg l⁻¹ day⁻¹ and decreased to 34 ± 0.7 mg l⁻¹ day⁻¹ once nitrogen became limited in the reactor. Total lipids extracted under the +N and -N scenarios revealed a statistically significant increase (p < 0.05) from 22% DCW in +N to 36% in the -N condition (Figure 1C). Extracted lipids were transesterified, and fatty acid methyl esters (FAMEs, assumed to be equivalent to TAG content [22]) were quantified. Compared to the +N condition, the FAME or TAG content per cell mass increased fivefold in the -N case (p < 0.05), demonstrating that the additional lipids produced during N limitation were mostly TAGs (Figure 1C). Estimates of total cell mass based on direct microscopic counts and DCW determinations revealed that the average mass of a cell in -N was 81% of that in +N, confirming that the change in TAG was independent of changes in DCW. FAME profiles are presented in Figure 1D and show a 50% decrease in the proportion of unsaturated fatty acids (i.e., C16:2, C16:3, C16:4, C18:2, and C18:3) under nitrogen limitation. The most significant change was in the amount of oleic acid (C18:1), which increased more than fivefold, while the quantity of α-linolenic acid (C18:3) decreased 4.8-fold under -N conditions. This trend toward a greater proportion of C18:1 is consistent with prior investigations of the FAME contents of the oleaginous microalgae N. oleoabundans and Chlorella vulgaris under nitrogen limitation [13,22].

FIGURE 1: N. oleoabundans growth and lipid characteristics. (A) Growth curves under +N and -N conditions. Inset image represents the difference in culture appearance between the two growth conditions; (B) Nitrate as N concentrations in the bioreactors during growth; (C) Cell weight enrichment of total lipids and fatty acid methyl esters (FAME, representative of TAGs) from cells harvested on day 11; and (D) Percentage distribution of FAME from cells harvested on day 11. All error bars represent one standard deviation.

To aid in interpreting how photosynthetically fixed carbon was directed into major metabolic pathways, the chlorophyll, protein, and starch content of N. oleoabundans were also measured under the -N and +N scenarios (Table 1). Nitrogen deprivation led to a reduction in nitrogen-containing chlorophyll content. This loss of chlorophyll is consistent with the light green color of chlorosis observed in the cultures under nitrogen limitation (Figure 1A inset). Also under nitrogen limitation, a decrease in cellular protein content and an increase in cellular starch content were observed. The observed changes in metabolite and biomolecule contents suggest that metabolism in N. oleoabundans is redirected during nitrogen limitation to reduce nitrogen-containing compounds (protein and chlorophyll) and to favor the accumulation of the nitrogen-free storage molecules TAGs and starch.

TABLE 1: Culture density and cellular composition of major biomolecules of N. oleoabundans cells determined after 11 days of growth under nitrogen replete (+N) and nitrogen limited (-N) conditions

Measurement                  +N                       -N
Culture density (cells/mL)   (6.1 ± 0.2) × 10⁷        (3.8 ± 0.2) × 10⁷
Chlorophyll a (µg/mg)        (119.3 ± 12.6) × 10⁻³    (5.9 ± 0.4) × 10⁻³
Chlorophyll b (µg/mg)        (42.6 ± 5.5) × 10⁻³      (5.5 ± 0.5) × 10⁻³
Starch content (% DCW)       0.2 ± 0.1                4.0 ± 0.5
Protein content (% DCW)      37.9 ± 4.0               19.4 ± 17.1

COMPACT PGM SEQUENCERS

The Ion Personal Genome Machine (PGM) and MiSeq were launched by Ion Torrent and Illumina, respectively. Both are small in size and feature fast turnaround but limited data throughput. They are targeted at clinical applications and small labs.

10.5.1 ION PGM FROM ION TORRENT

Ion PGM was released by Ion Torrent at the end of 2010. PGM uses semiconductor sequencing technology. When a nucleotide is incorporated into the DNA molecule by the polymerase, a proton is released. By detecting the resulting change in pH, PGM recognizes whether the nucleotide has been added or not. The chip is flooded with one nucleotide after another: if the nucleotide is not the correct one, no voltage is detected; if two nucleotides are added, double the voltage is detected [15]. PGM is the first commercial sequencing machine that does not require fluorescence and camera scanning, resulting in higher speed, lower cost, and smaller instrument size. Currently, it enables 200 bp reads in 2 hours, and the sample preparation time is less than 6 hours for 8 samples in parallel.
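The flow-by-flow readout described above can be illustrated with a small decoder. This is a simplified Python sketch, not Ion Torrent's base caller: it assumes that each flow's signal has already been normalized so that one incorporated base gives a value near 1.0, and the flow order "TACG" is only a placeholder chosen for the example.

```python
def decode_flows(signals, flow_order="TACG"):
    """Turn normalized per-flow signals into a base sequence.

    signals     one value per flow; ~0 means no incorporation,
                ~1 one base, ~2 two identical bases in a row, and so on
    flow_order  nucleotides flooded over the chip in turn (hypothetical here)
    """
    bases = []
    for index, signal in enumerate(signals):
        nucleotide = flow_order[index % len(flow_order)]
        # The signal scales with the number of bases incorporated in this
        # flow, so rounding gives the length of the homopolymer run.
        run_length = max(0, round(signal))
        bases.append(nucleotide * run_length)
    return "".join(bases)


# Flows T, A, C, G, T, A with signals close to 1, 0, 2, 0, 0, 1.
read = decode_flows([1.02, 0.05, 2.10, 0.00, 0.00, 0.97])
```

Because run length is inferred from signal magnitude, a small error on a long run of identical bases changes the called length, which would surface as an insertion or deletion rather than a mismatch.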

An exemplary application of the Ion Torrent PGM sequencer is the identification of microbial pathogens. In May and June of 2011, an outbreak of exceptionally virulent Shiga-toxin (Stx)-producing Escherichia coli O104:H4, centered in Germany, infected more than 3000 people [16, 17]. Whole genome sequencing on the Ion Torrent PGM sequencer and HiSeq 2000 helped scientists identify the type of E. coli, which in turn gave direct clues to its antibiotic resistance. The strain appeared to be a hybrid of two E. coli strains, enteroaggregative E. coli and enterohemorrhagic E. coli, which may help explain why it has been particularly pathogenic. The sequencing of E. coli TY2482 [18] shows the potential of a fast, if limited-throughput, sequencer such as PGM when there is an outbreak of a new disease.

In order to study the sequencing quality, mapping rate, and GC depth distribution of Ion Torrent and to compare them with HiSeq 2000, a Rhodobacter sample with high GC content (66%) and a 4.2 Mb genome was sequenced on both sequencers (Table 2). In another experiment, E. coli K12 DH10B (NC_010473.1), with a GC content of 50.78%, was sequenced by Ion Torrent for analysis of quality value, read length, position accuracies, and GC distribution (Figure 1).

[Figure: read length distributions. Panels: all read length distribution (total read number 577537); mapped read length distribution; mapped read map length distribution.]

TABLE 2: Comparison in alignment between Ion Torrent and HiSeq 2000.

Metric                  Ion Torrent (a)   HiSeq 2000 (b)
Total reads num         165518            205683
Total bases num         18574086          18511470
Max read length         201               90
Min read length         15                90
Map reads num           157258            157511
Map rate                95%               76.57%
Covered rate            96.50%            93.11%
Total map length        15800258          14176420
Total mismatch base     53475             142425
Total insertion base    109550            1397
Total insertion num     95740             1332
Total deletion base     152495            431
Total deletion num      139264            238
Ave mismatch rate       0.338%            1.004%
Ave insertion rate      0.693%            0.009%
Ave deletion rate       0.965%            0.003%

a: aligned with TMAP; b: aligned with SOAP2.
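The percentage rows of Table 2 can be rechecked from its count rows. The Python sketch below assumes that the map rate is mapped reads over total reads and that each error rate is the number of affected bases over the total map length; the table does not state these definitions, but the recomputed values agree with the printed ones to within rounding or truncation.

```python
# Counts copied from Table 2.
TABLE_2 = {
    "Ion Torrent": {
        "total_reads": 165518,
        "mapped_reads": 157258,
        "total_map_length": 15800258,
        "mismatch_bases": 53475,
        "insertion_bases": 109550,
        "deletion_bases": 152495,
    },
    "HiSeq 2000": {
        "total_reads": 205683,
        "mapped_reads": 157511,
        "total_map_length": 14176420,
        "mismatch_bases": 142425,
        "insertion_bases": 1397,
        "deletion_bases": 431,
    },
}


def percent(part, whole):
    """Express part/whole as a percentage."""
    return 100.0 * part / whole


def alignment_rates(stats):
    """Recompute the rate rows under the assumed definitions."""
    aligned = stats["total_map_length"]
    return {
        "map_rate": percent(stats["mapped_reads"], stats["total_reads"]),
        "mismatch_rate": percent(stats["mismatch_bases"], aligned),
        "insertion_rate": percent(stats["insertion_bases"], aligned),
        "deletion_rate": percent(stats["deletion_bases"], aligned),
    }


RATES = {platform: alignment_rates(stats) for platform, stats in TABLE_2.items()}
```

Read this way, the table shows Ion Torrent with the lower mismatch rate and HiSeq 2000 with insertion and deletion rates roughly two orders of magnitude lower.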

LIPID TURNOVER

In our study, several genes encoding enzymes involved in the intracellular breakdown of fatty acids and lipids are significantly repressed under -N (Table 3). Repressing β-oxidation is a clear strategy for maintaining a higher concentration of fatty acids within a cell. In contrast, most of the identified lipases (with the exception of triacylglycerol lipases) are overexpressed during nitrogen limitation. Upon closer examination, the up-regulated lipases are mostly phospholipases associated with hydrolyzing cell wall glycerophospholipids and phospholipids into free fatty acids, potentially for incorporation into TAGs. A known result of nitrogen-limitation-induced autophagy in C. reinhardtii is the degradation of the chloroplast phospholipid membrane [47,48]. Moreover, the overexpression of lipases during nitrogen limitation in C. reinhardtii has previously been hypothesized to be associated with the reconstruction of cell membranes [10]. In addition to phospholipases, we have identified an enriched number of transcripts for phospholipid metabolic processes and lipid transport in the -N case (Figure 4B). The up-regulation of genes encoding enzymes that produce free fatty acids is also consistent with the fact that the PDAT enzyme associated with the acyl-CoA-independent mechanism of TAG synthesis (which utilizes phospholipids, rather than free fatty acids, as acyl donors) was not recovered in our assembled transcriptome.

12.3 CONCLUSIONS

Assembling the transcriptome and quantifying gene expression responses of Neochloris oleoabundans under nitrogen replete and nitrogen limited conditions enabled the exploration of a broad diversity of genes and pathways, many of which comprise the metabolic responses associated with lipid production and carbon partitioning. The high coverage of genes encoding full central metabolic pathways demonstrates the completeness of the transcriptome assembly and the repeatability of the gene expression data. Furthermore, the concordance of metabolite measurements and observed physiological responses with gene expression results lends strength to the quality of the assembly and our quantitative assessment. Our findings point to several molecular mechanisms that potentially drive the overproduction of TAG during nitrogen limitation. These include up-regulation of genes associated with fatty acid and TAG biosynthesis, shuttling of excess acetyl-CoA to lipid production through the pyruvate dehydrogenase complex, the role of autophagy and lipases in supplying an additional pool of fatty acids for TAG synthesis, and up-regulation of the pentose phosphate pathway to produce NADPH to power lipid biosynthesis. These identified gene sequences and measured metabolic responses during excess TAG production can be leveraged in future metabolic engineering studies to improve TAG content and character in microalgae and ultimately contribute to the production of a sustainable liquid fuel.

12.4 METHODS

DYNAMIC VISUALIZATION OF KEGG PATHWAY MAPS

Individual pathway maps from KEGG provide information on protein localization within the cell, on compartmentalization into different cellular components, or on reactions within a larger metabolic process. Visualizing proteins from gene lists on pathway maps is useful for their interpretation. The Algal Functional Annotation Tool utilizes the publicly available KEGG application programming interface (API) for pathway highlighting. The information linking C. reinhardtii proteins to identifiers within the KEGG database is used to determine the subset of KEGG IDs within the supplied gene list associated with a particular pathway. The Algal Functional Annotation Tool also deduces which proteins within the pathway are located within the genome of C. reinhardtii but not found in the gene list and sends the corresponding identifiers to the KEGG API to be highlighted in a different background color. This interface is implemented using the SOAP architecture for web applications.
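The two highlighting groups described above amount to simple set operations. The Python sketch below shows only that partition step and makes no KEGG API call; the function name and the identifiers in the usage example are placeholders, not values taken from the tool.

```python
def partition_pathway(pathway_ids, gene_list_ids, genome_ids):
    """Split a pathway's KEGG IDs into the two highlighted groups.

    Returns (in_gene_list, in_genome_only):
      in_gene_list    pathway members present in the supplied gene list
      in_genome_only  pathway members encoded in the genome but absent
                      from the gene list (shown in a different color)
    """
    pathway = set(pathway_ids)
    in_gene_list = pathway & set(gene_list_ids)
    in_genome_only = (pathway & set(genome_ids)) - in_gene_list
    return sorted(in_gene_list), sorted(in_genome_only)


# Placeholder identifiers, for illustration only.
listed, genome_only = partition_pathway(
    pathway_ids=["id1", "id2", "id3", "id4"],
    gene_list_ids=["id2", "id9"],
    genome_ids=["id1", "id2", "id3"],
)
```

Pathway members that fall in neither group (here "id4") would be left without highlighting.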

DE NOVO TRANSCRIPTOME ASSEMBLY, ANNOTATION, AND EXPRESSION

In order to produce statistically reliable and comparable RNA-Seq data, cDNA library construction and sequencing were performed for each of the duplicate +N reactors and each of the duplicate -N reactors. Over 88 million raw sequencing reads were generated and subjected to quality score and length based trimming, resulting in a high quality (HQ) read data set of 87.09 million sequences (average phred score of 35) with an average read length of 77 bp. By incorporating a multiple k-mer based de novo transcriptome assembly strategy (k-mers 23, 33, 63, and 83) [23], HQ reads were assembled into 56,550 transcripts with an average length of 1,459 bp and a read coverage of 1,444× (Figure 2C). Generated transcripts were subjected to searches against the National Center for Biotechnology Information's (NCBI) nonredundant and plant refseq databases [24], and the majority of transcripts showed significant matches to other closely related green microalgae species (Figure 2A, B), including C. variabilis (~85% of all transcripts), C. reinhardtii (~2.6%), and V. carteri (~3.4%) (Figure 2A). With additional annotation using KEGG services and Gene Ontology (GO), a total of 23,520 transcripts were associated with at least one GO term, and 4,667 transcripts were assigned enzyme commission (EC) numbers. Overall, 14,957 transcripts had KO identifiers and were annotated as putative genes and protein families. This assembly provided a reliable, well-annotated transcriptome for downstream RNA-Seq data analysis.

Following the transcriptome assembly and annotation, HQ reads obtained from each experimental condition were individually mapped to the generated assembly in order to determine the transcript abundances as RPKM values. To determine fold change differences between +N and -N transcripts, non-normalized read counts were fed into the DESeq package (v1.5.1), and variance and mean dependencies were accounted for [25]. Based on the negative binomial distribution model used in the DESeq package, 25,896 transcripts out of the total 56,550 non-redundant transcripts were up-regulated under the -N condition. Plotting transcript fold change levels shows a high correlation among the biologically replicated sequencing runs, as indicated by Euclidean distances (Figure 2D). Overall, 15,987 transcripts had significant differential regulation (q < 0.05; Figure 3A). A complete table of fold changes with significance levels for all genes assessed is presented in Additional file 3.
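For readers unfamiliar with RPKM (reads per kilobase of transcript per million mapped reads), the following Python sketch shows the standard calculation together with a simple log2 fold change. It does not reproduce DESeq's negative binomial model or its normalization; the pseudocount used to avoid division by zero is an assumption of this example.

```python
from math import log2


def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def log2_fold_change(numerator, denominator, pseudocount=1.0):
    """log2 ratio of two abundances.

    The pseudocount (an assumption here) keeps the ratio finite when
    one of the abundances is zero.
    """
    return log2((numerator + pseudocount) / (denominator + pseudocount))


# 1,000 reads on a 2,000 bp transcript in a library of 10 million mapped reads.
example_rpkm = rpkm(1000, 2000, 10_000_000)
```

Dividing by transcript length and library size makes abundances comparable across transcripts and samples, whereas count-based tools such as DESeq start from the raw counts, as the text notes.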

We further investigated the alignment of HQ reads to the reference genomes of C. reinhardtii and V. carteri in order to improve and extend our transcriptomic analysis to the detection of splicing events and alternative isoform formation (Figure 3B, C). Although the majority of annotated orthologs were identified from these closely related microalgae species, very poor mappings (i.e., <5% of reads) were observed between the RNA-Seq data of N. oleoabundans and the genomes of C. reinhardtii and V. carteri. As a result, the number of transcripts annotated and evaluated for differential expression was suboptimal, and genomes from these most closely related organisms were not used for gene expression analysis.

FIGURE 2: De novo assembly and mapping results. (A, B) Top-hit species distribution for BLASTX matches for the N. oleoabundans transcriptome; (C) Cumulative transcript length frequency distribution of the N. oleoabundans transcriptome assembly; (D) Heat map demonstrating the top 100 most differentially expressed genes in the biological replicates of +N and -N conditions.

FIGURE 3: (A) MvA plot contrasting gene expression levels between the -N and +N scenarios based on reads mapped to the N. oleoabundans transcriptome. The x-axis represents the mean expression level at the gene scale, and the y-axis represents the log2 fold change from -N to +N. Negative fold changes indicate up-regulation of -N genes. Lighter gray dots are genes that are significant at a false discovery rate of 5%; (B) MvA plot for reads mapped to the C. reinhardtii genome; and (C) MvA plot for reads mapped to the V. carteri genome.

SEQUENCING QUALITY

The quality of Ion Torrent is more stable, while the quality of HiSeq 2000 decreases noticeably after 50 cycles, which may be caused by the decay of the fluorescent signal with increasing read length (shown in Figure 1).

10.5.1.1 MAPPING

The insert size of the Rhodobacter library was 350 bp, and 0.5 Gb of data was obtained from HiSeq. The sequencing depth was over 100×, and the contig and scaffold N50 were 39530 bp and 194344 bp, respectively. Based on the assembly result, we used 33 Mb of data obtained from Ion Torrent with a 314 chip to analyze the map rate. The alignment comparison is shown in Table 2.
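Two of the quantities above follow from short calculations. The Python sketch below computes sequencing depth as total bases over genome size and N50 under its usual definition (the length of the sequence at which the running total, taken from longest to shortest, reaches half of the assembly); the function names are illustrative, and the depth example assumes the 4.2 Mb genome size quoted earlier for the Rhodobacter sample.

```python
def sequencing_depth(total_bases, genome_size):
    """Average number of times each genome position is covered."""
    return total_bases / genome_size


def n50(lengths):
    """Length L such that sequences of length >= L hold half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50 requires at least one sequence length")


# 0.5 Gb of HiSeq data over a 4.2 Mb genome.
depth = sequencing_depth(0.5e9, 4.2e6)
```

The resulting depth of roughly 119× is consistent with the "over 100×" stated in the text.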

The map rate of Ion Torrent is higher than that of HiSeq 2000, but the two values are not directly comparable because different alignment methods were used for the two sequencers. Beyond the significant differences in mismatch rate, insertion rate, and deletion rate, HiSeq 2000 and Ion Torrent are also difficult to compare because of their different sequencing principles. For example, polynucleotide sites cannot be identified easily by Ion Torrent. Nevertheless, Ion Torrent shows stable quality along sequencing reads and good performance on mismatch accuracy, but a bias in the detection of indels. Different types of accuracy are analyzed and shown in Figure 1.

BIOREACTOR EXPERIMENTS

N. oleoabundans (UTEX # 1185) was obtained from the Culture Collection of Algae at the University of Texas (UTEX, Austin, TX, USA). Batch cultures were started by inoculation with 10⁶ log-growth-phase cells into 1 liter glass flasks filled with 750 ml of Modified Bold 3N medium [49] without soil extract. The concentration of nitrogen in the medium was adjusted to 50 mg l⁻¹ as N (nitrogen replete; denoted as +N) and 10 mg l⁻¹ as N (nitrogen limited; denoted as -N) using potassium nitrate (KNO₃) as the sole source of nitrogen. These concentrations were chosen based on preliminary experiments that identified the incubation times and nitrogen concentrations necessary to induce nitrogen depletion in the mid log phase of the -N cultures and to ensure that the nitrogen-replete cultures never encountered nitrogen limitation during the course of the experiment. For each nitrogen condition, cells were cultured in duplicate reactors. Reactors were operated at room temperature (25°C ± 2°C) with a 14:10 h light:dark cycle of exposure to fluorescent light (32 Watt Ecolux, General Electric, Fairfield, CT, USA) at a photosynthetic photon flux density of 110 µmol photons m⁻² s⁻¹. Cultures were mixed by an orbital shaker at 200 rpm and continuously aerated with sterile, activated carbon filtered air at a flow rate of 200 ml min⁻¹ using a mass flow controller (Cole-Parmer Instrument Company, IL, USA).

INTEGRATION OF EXPRESSION DATA

The expression levels of C. reinhardtii genes have been experimentally characterized under numerous conditions using high-throughput methods such as RNA-seq [[26,27], unpublished data (Castruita M., et al.)]. These expression data were compiled and analyzed to determine which genes are over- and under-expressed in each experimental condition. The expression data were preprocessed to normalize the counts for uniquely mappable reads in any experiment. Genes exhibiting greater than a two-fold change in expression compared to average expression across all conditions, with a Poisson cumulative p-value of less than 0.05, were considered differentially expressed. Using these data, C. reinhardtii genes were associated with the conditions in which they were over- and under-expressed.
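The two-part criterion above (fold change plus Poisson cumulative p-value) can be sketched as follows. This is a minimal Python illustration, not the tool's code: it assumes integer counts, takes the across-condition average as the Poisson mean, and uses a one-sided tail in the direction of the change, none of which is spelled out in the text.

```python
from math import exp, lgamma, log


def poisson_pmf(k, mean):
    """P(X = k) for a Poisson variable with the given mean."""
    if mean == 0:
        return 1.0 if k == 0 else 0.0
    return exp(k * log(mean) - mean - lgamma(k + 1))


def poisson_cdf(k, mean):
    """P(X <= k), capped at 1.0 to absorb rounding error."""
    return min(1.0, sum(poisson_pmf(i, mean) for i in range(k + 1)))


def classify(observed, mean_count, fold=2.0, alpha=0.05):
    """Label a gene 'over', 'under', or 'unchanged' in one condition."""
    if mean_count <= 0:
        return "unchanged"
    if observed > fold * mean_count:
        p_value = 1.0 - poisson_cdf(observed - 1, mean_count)  # P(X >= observed)
        return "over" if p_value < alpha else "unchanged"
    if observed < mean_count / fold:
        p_value = poisson_cdf(observed, mean_count)  # P(X <= observed)
        return "under" if p_value < alpha else "unchanged"
    return "unchanged"
```

A count of 3 against an average of 1 exceeds the two-fold threshold yet remains "unchanged", because such a deviation is not unusual at low counts (p ≈ 0.08); this is the situation the p-value filter is there to exclude.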

The compiled expression data were also analyzed to find functionally related genes based on their expression levels across the different experimental conditions [[26,27], unpublished data (Castruita M., et al.)]. Genes demonstrating low variance of expression across all samples were not considered. This analysis was performed for three representations of the expression data: absolute counts, log counts, and log ratios of expression. By this method, each C. reinhardtii gene is associated with the 100 genes with the most similar expression patterns to determine potentially functionally related genes.
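The neighbor search described above can be sketched in Python as follows. The text does not name the similarity measure or the variance cutoff, so Pearson correlation and the `min_variance` threshold below are assumptions made for illustration only.

```python
from math import sqrt


def variance(values):
    """Population variance of an expression profile."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def most_similar(profiles, gene, top_n=100, min_variance=1e-9):
    """Return up to top_n genes whose profiles best match `gene`.

    profiles maps gene name -> list of expression values, one per condition.
    Genes with variance at or below min_variance are dropped first.
    """
    kept = {g: p for g, p in profiles.items() if variance(p) > min_variance}
    if gene not in kept:
        return []
    target = kept[gene]
    scored = [(other, pearson(target, p)) for other, p in kept.items() if other != gene]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:top_n]]
```

The same function can be applied to each of the three representations (absolute counts, log counts, log ratios) by transforming the profiles before calling it.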

LIPID CONTENT IN MICROALGAE

Many microalgae are capable of accumulating a large amount of lipids in their cells [10]. On average, lipid contents typically range from 10 to 30% of dry weight (Table 3). Depending on the specific algae species and their cultivation conditions, however, microalgal lipid production may range widely from 2 to 75% [2]. In some extreme cases, it can reach 70%-90% of dry weight [4,5]. For instance, the freshwater green alga Botryococcus braunii can produce oil (including hydrocarbons) up to 86% of its dry cell weight [44]. This species is being considered as a possible source for biodiesel production in the near future [4], but it has the major disadvantages of slow growth rates and a low tolerance for contamination. As a result, lipid productivities (lipid production per area or volume) of other microalgae, such as Nannochloropsis, Chlorella, Tetraselmis, and Pavlova sp., are typically much higher [39,45]. Lipid productivity can be dramatically increased by the external application of stress factors, and lipid accumulation is considered a survival strategy for microalgae under adverse conditions. Most notably, these stress factors include nutrient deprivation, exposure to chemicals, and changes in salinity, temperature, pH, and/or irradiation [4,39,46]. The composition of fatty-acid-containing lipids differs widely among species but, as mentioned above, generally includes structural unsaturated polar lipids as well as neutral storage lipids, mostly in the form of TAG. Significant fatty acids used for biodiesel include saturated fatty acids and polyunsaturated fatty acids (PUFAs) containing 14-18 carbon atoms, such as C14:0, C16:0, C16:1, C18:0, C18:1, C18:2, and C18:3 fatty acids [41]. According to European requirements for biodiesel standards, some fatty acids should be excluded because of undesirable properties. For instance, methyl linolenate and fatty acid methyl esters with more than four double bonds are limited to 12% due to oxidation properties [47].

Table 3. Examples of lipid contents in some microalgae species [4,48].

Species                      Total lipids (% dry weight)   PUFA (% total lipids)   PUFA (% dry weight)
Isochrysis galbana           25.6                          17                      4.3
Nannochloropsis sp.          5.6                           2.8                     0.2
Chaetoceros calcitrans       11.8                          8.7                     0.9
Tetraselmis suecica          2.5                           20.9                    0.2
Skeletonema costatum         9.7                           5.1                     0.5
Phaeodactylum tricornutum    30                            -                       -
Porphyridium cruentum        1.5                           17.1                    0.3
Crypthecodinium cohnii       20                            -                       -
Botryococcus braunii         25.0-75.0                     -                       -
Chlorella sp.                10.0-48.0                     -                       -

It is expected that microalgae that offer a multiple-product portfolio as part of a biorefinery will be most applicable to large-scale commercial cultivation. In a microalgae screening process, besides fatty acids with properties relevant for biodiesel production, high-value products such as protein-rich biomass, omega-3 fatty acids, sterols, antioxidants, vitamins, and pigments should also be taken into account. In particular, omega-3 fatty acids from microalgae have received significant attention as a high-value-added product, as the current sources of fish oil are unsustainable due to depleting global fish stocks. A comparison of the omega-3 fatty acid contents of different microalgae shows that these differ considerably between species (Table 4).