
NITROGEN LIMITATION AND THE REGULATION OF GENES ASSOCIATED WITH TAG BIOSYNTHESIS

TAG is the major storage lipid in oleaginous microalgae, and in this study nitrogen limitation induced a five-fold increase in its intracellular content. Several genes involved in TAG biosynthesis displayed changes in their expression in response to nitrogen limitation. Biosynthesis of TAG in the chloroplast begins with two consecutive acyl transfers from acyl-CoA to positions 1 and 2 of glycerol-3-phosphate to form phosphatidic acid (PA), which is subsequently dephosphorylated to form 1,2-diacylglycerol (DAG) (Figure 5B). These reactions are catalyzed by the enzymes glycerol-3-phosphate acyltransferase (GPAT), acyl-glycerol-3-phosphate acyltransferase (AGPAT), and phosphatidate phosphatase (PP), respectively. The last step in the pathway, catalyzed by diacylglycerol acyltransferase (DGAT), involves the transfer of a third acyl group to position 3 of DAG. This final reaction is the only dedicated step in TAG synthesis, since the preceding intermediates (i.e., PA and DAG) are also substrates for the synthesis of membrane lipids. Our results indicated that the expression of genes encoding GPAT and AGPAT was up-regulated in response to nitrogen starvation. However, the expression of the genes encoding PP and DGAT remained relatively unchanged.

Though TAG biosynthesis in microalgae is believed to occur mainly through the glycerol pathway as described above, an alternative route known as the acyl-CoA-independent mechanism has also been reported in some plants and yeast [29]. In this mechanism, a phospholipid is utilized as the acyl donor in the last step of TAG formation, and the reaction is catalyzed by phospholipid:diacylglycerol acyltransferase (PDAT). We have recently found homologues of the gene encoding PDAT in the D. tertiolecta transcriptome, suggesting that the PDAT route could also play a role in microalgal TAG biosynthesis [14]. We did not, however, identify such homologues in the transcriptome of N. oleoabundans, making it unclear whether PDAT contributes to TAG biosynthesis in this organism.

COMPLETE GENOMICS

Complete Genomics has its own sequencer based on the Polonator G.007, which is a ligation-based sequencer. The owner of the Polonator G.007, Dover, collaborated with the Church Laboratory of Harvard Medical School, the same team behind the SOLiD system, to introduce this inexpensive open system. The Polonator combines a high-performance instrument at a very low price with freely downloadable, open-source software and protocols. The Polonator G.007 uses ligation detection sequencing, which decodes each base with a single-base probe in nonanucleotides (nonamers) rather than by dual-base coding [19]. The fluorophore-tagged degenerate nonamers are selectively ligated, with the help of T4 DNA ligase, onto a series of anchor primers; each of the four nonamer components is labeled with one of four fluorophores, which corresponds to the base type at the query position. During ligation, T4 DNA ligase is particularly sensitive to mismatches on the 3′ side of the gap, which helps improve sequencing accuracy. After imaging, the Polonator chemically strips the annealed primer-fluorescent probe complex from the array; the anchor primer is replaced, and a new mixture of fluorescently tagged nonamers is introduced to sequence the adjacent base [20]. The Complete Genomics platform has two updates compared with the Polonator G.007: DNA nanoball (DNB) arrays and combinatorial probe-anchor ligation (cPAL). Compared with DNA clusters or microspheres, DNA nanoball arrays achieve a higher density of DNA clusters on the surface of a silicon chip. Because the seven 5-base segments are discontinuous, the hybridization-ligation-detection cycle has a higher fault tolerance than SOLiD. Complete Genomics claims 99.999% accuracy at 40x depth and can analyze SNPs, indels, and CNVs at a price of $5,500-$9,500. Illumina, however, reported better performance for the HiSeq 2000 using only 30x data (Illumina Genome Network). Recently, some researchers compared CG’s human genome sequencing data with data from the Illumina system [21] and found notable differences in the detection of SNVs and indels, as well as system-specific variant detections.

RNA-SEQ DATA ANALYSES

For quality control, raw sequencing reads were analyzed with the FastQC tool (v0.10.0) [57], and low-quality reads with a Phred score of less than 13 were removed using the SolexaQA package (v1.1) [58]. De novo transcriptome assembly was conducted using the Velvet (v1.2.03) [23] and Oases (v0.2.06) [59] assembly algorithms with a multi-k hash length strategy (i.e., 23, 33, 63, and 83 bp) to capture the most diverse assembly with improved specificity and sensitivity [59,60]. Final clustering of transcripts was performed using the CD-HIT-EST package (v4.0-210-04-20) [61], and a non-redundant contig set was generated.
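
For context on the quality threshold above: a Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so a cut-off of 13 is roughly a 5% error rate. The sketch below only illustrates that relationship. The function names are hypothetical, and the mean-score filter is a simplifying assumption that does not reproduce SolexaQA's actual trimming logic.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to an estimated base-call error probability."""
    return 10 ** (-q / 10)


def keep_read(qualities, min_q: float = 13) -> bool:
    """Hypothetical filter: keep a read if its mean Phred score is >= min_q.

    This is an assumption made for illustration; real tools typically
    trim or filter on per-base or windowed quality instead.
    """
    return sum(qualities) / len(qualities) >= min_q


if __name__ == "__main__":
    print(f"Q13 error probability: {phred_to_error_prob(13):.4f}")
    print("keep [30, 30, 2]:", keep_read([30, 30, 2]))
    print("keep [10, 12, 14]:", keep_read([10, 12, 14]))
```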

For transcriptome annotation, the final set of contigs was searched against NCBI’s non-redundant (nr) protein and plant RefSeq [24] databases using the BLASTX algorithm [62] with a cut-off E-value < 10^-6. Contigs with significant matches were annotated using the Blast2GO platform [63]. Additional annotations were obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG) gene and protein families database through the KEGG Automatic Annotation Server (KAAS) (v1.6a) [64]. Associated Gene Ontology (GO) terms as well as enzyme commission (EC) numbers were retrieved, and KEGG metabolic pathways were assigned [65].

To determine transcript abundances and differential expression, high-quality reads from each experimental condition were individually mapped to the assembled transcriptome using Bowtie software (v0.12.7) [66]. Reads mapping to each contig were counted using SAMtools (v0.1.16) [67], and transcript abundances were calculated as reads per kilobase of exon model per million mapped reads (RPKM) [68]. All differential expression analyses (fold changes) and related statistical computations were conducted by feeding non-normalized read counts into the DESeq package (v1.5.1) [25]. Separate sequence read datasets were used as inputs to DESeq, where size factors for each dataset were calculated and overall means and variances were determined based on a negative binomial distribution model. Fold change differences were considered significant when a q-value < 0.05 was achieved based on Benjamini and Hochberg’s false discovery rate (FDR) procedure [69], and only statistically significant fold changes were used in the results analysis. In addition to individual enzyme-encoding transcripts, contigs were pooled for each experimental condition and tested against the combined dataset to determine the enriched GO terms using the Gossip package [70] integrated in the Blast2GO platform. Significantly enriched GO terms (q-value < 0.05) were determined for both +N and -N conditions.
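
The two calculations named above, RPKM and the Benjamini-Hochberg adjustment, can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up input values. It does not reproduce DESeq's size-factor normalization or negative binomial test, which is where the p-values come from in the actual analysis.

```python
def rpkm(read_count: int, contig_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of contig per million mapped reads."""
    return read_count * 1e9 / (contig_length_bp * total_mapped_reads)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    print(rpkm(read_count=500, contig_length_bp=2000, total_mapped_reads=10_000_000))
    p = [0.01, 0.04, 0.03, 0.20]
    q = benjamini_hochberg(p)
    significant = [i for i, value in enumerate(q) if value < 0.05]
    print(q, significant)
```

With the illustrative p-values above, only the first test passes the q < 0.05 threshold, even though three of the raw p-values are below 0.05.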

Finally, reference-guided mapping and differential expression were also explored as a quantitation method. In this case, the TopHat package (v1.3.3) [71] was used to map high-quality reads from each experimental condition against the genomes of the closely related green algal species Chlamydomonas reinhardtii (version 169) and Volvox carteri (version 150), available through Phytozome (v7.0) [72]. Differential gene expression was quantified using the Cufflinks package (v1.2.1) [73].

UTILITY AND DISCUSSION

11.3.1 COMPREHENSIVE, INTEGRATED DATA-MINING ENVIRONMENT

The Algal Functional Annotation Tool is composed of three main components: functional term enrichment tests (which are separated by type), a batch gene identifier conversion tool, and a gene similarity search tool. A ‘Quick Start’ analysis is provided from the front page, featuring enrichment analysis using a sample set of databases containing the richest set of annotations (Figure 1). From any page, the sidebar provides access to the ‘Quick Start’ function of the tool.

FIGURE 1: Algal Functional Annotation Tool. The front page of the Algal Functional Annotation Tool. A ‘Quick Start’ analysis is available to test for enrichment using the richest annotation databases included in the tool. Other features accessible from the sidebar include more specific enrichment tests (based on biological pathways, ontology terms, or protein families), a gene identifier conversion tool, a manual annotation search tool, and an expression similarity search tool.

Numerous other enrichment analyses, including enrichment using pathway, ontology, protein family, or differential expression data, are available within the Algal Functional Annotation Tool. Enrichment results are always sorted by hypergeometric p-value and, whenever possible, contain links to the primary database’s entry for that annotation or to the protein page of the gene identifier. The number of hits to a given annotation term is also displayed alongside the p-value, and results may always be expanded to show additional details, such as the specific gene IDs within the list matching a certain annotation (Figure 2). These results are downloadable as tab-delimited text files, which may then be further analyzed or used in conjunction with other databases.

FIGURE 2: Annotation Enrichment Results. Annotation enrichment results, sorted by ascending hypergeometric p-values, are shown in expandable/collapsible HTML tables such as the one shown. When expanded, the genes within the user-submitted list containing the expanded annotation are shown alongside additional statistical information. All results are downloadable as tab-delimited text files.
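
As a rough illustration of the hypergeometric p-value used to rank enrichment results, the sketch below computes the probability of seeing at least the observed number of annotated genes in a submitted list. The function name, argument names, and example numbers are hypothetical, and the exact parametrization used by the tool is an assumption.

```python
from math import comb


def hypergeom_enrichment_p(pop_size: int, pop_hits: int,
                           sample_size: int, sample_hits: int) -> float:
    """Upper-tail hypergeometric p-value, P(X >= sample_hits).

    pop_size    : genes in the background (e.g., all annotated gene models)
    pop_hits    : background genes carrying the annotation term
    sample_size : genes in the user-submitted list
    sample_hits : listed genes carrying the annotation term
    """
    upper = min(pop_hits, sample_size)
    total = comb(pop_size, sample_size)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, sample_size - k)
        for k in range(sample_hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Illustrative numbers only: 10 of 50 listed genes hit a term that
    # annotates 40 of 15,000 background genes.
    print(hypergeom_enrichment_p(15000, 40, 50, 10))
```

Sorting terms by this value in ascending order puts the annotations least likely to arise by chance at the top of the table.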

Dynamic visualization of KEGG pathway maps may be accessed from the results table for KEGG pathway enrichment by clicking on any pathway name. The proteins in the list that are members of the particular biological pathway will appear in red, while those proteins existing in Chlamydomonas reinhardtii but not in the list appear in green (Figure 3). Alternatively, by expanding the pathway results and following the link at the bottom, the user may select a custom color scheme for visualizing the proteins on pathway maps. These custom color schemes may be designed on a gene-by-gene basis (choosing colors individually for genes) or in a group-by-group fashion (such as choosing a color for those proteins found within the organism but not in the gene list).

A list of genes may also be converted into a list of gene identifiers of another type. This feature allows easy transformation of gene IDs into corresponding models for use in other databases that may have additional annotation information. Additionally, the resulting list of gene identifiers may be used as a new starting point for enrichment analysis. Because of the different annotations associated with other gene identifier types (albeit of the same proteins), enrichment results using a converted set of gene IDs may yield new biological information.

The gene similarity search tool, the third component of the Algal Functional Annotation Tool, accepts single genes and returns functionally related genes (based on gene expression across different experimental conditions) using user-specified distance metrics and thresholds. Presently, functionally related genes may be determined using correlation distance based on absolute counts, log counts, or log ratios of expression. The results page shows the original query gene at the top in gray, and any resulting genes, sorted by similarity, are shown below the query gene (Figure 4). A colormap based on gene expression is generated for the different genes across the conditions, and this colormap may be changed to display absolute expression, log expression, or log ratios of expression. The distance between any gene and the original query gene is displayed by hovering the mouse over the gene identifier of interest. Quantitative expression data (e.g., absolute counts) are provided for each experiment by hovering over the colormap. Whenever a description of a gene is available, this is displayed when hovering over the gene identifier as well. Links to external databases (e.g., JGI, KEGG) providing more information about the genes are provided with the results.
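
The similarity search described above can be sketched as a correlation distance (1 minus the Pearson correlation) between expression profiles, optionally computed on log-transformed counts. All names, the pseudocount, and the example profiles are hypothetical. The log-ratio option is left out because the reference condition the tool uses is not stated.

```python
from math import log2, sqrt


def pearson(x, y) -> float:
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def correlation_distance(x, y, transform: str = "absolute",
                         pseudocount: float = 1.0) -> float:
    """Distance = 1 - Pearson r, on absolute or log2(count + pseudocount) values."""
    if transform == "log":
        x = [log2(v + pseudocount) for v in x]
        y = [log2(v + pseudocount) for v in y]
    elif transform != "absolute":
        raise ValueError("transform must be 'absolute' or 'log'")
    return 1.0 - pearson(x, y)


def rank_similar(query, profiles, threshold: float, transform: str = "absolute"):
    """Return (gene, distance) pairs within the threshold, most similar first."""
    hits = [(gene, correlation_distance(query, profile, transform))
            for gene, profile in profiles.items()]
    return sorted((h for h in hits if h[1] <= threshold), key=lambda h: h[1])


if __name__ == "__main__":
    # Illustrative counts across three conditions, not data from the tool.
    profiles = {"g1": [2, 4, 6], "g2": [3, 2, 1], "g3": [1, 2, 4]}
    print(rank_similar([1, 2, 3], profiles, threshold=0.5))
```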

ROCHE 454 SYSTEM

Roche 454 was the first commercially successful next-generation system. This sequencer uses pyrosequencing technology [5]. Instead of using dideoxynucleotides to terminate chain amplification, pyrosequencing relies on the detection of pyrophosphate released during nucleotide incorporation. The library DNAs with 454-specific adaptors are denatured into single strands and captured by amplification beads, followed by emulsion PCR [6]. Then, on a picotiter plate, one dNTP at a time (dATP, dGTP, dCTP, or dTTP) is added and, where it complements the bases of the template strand, is incorporated with the help of ATP sulfurylase, luciferase, luciferin, DNA polymerase, and adenosine 5′ phosphosulfate (APS), releasing pyrophosphate (PPi) in an amount equal to the amount of incorporated nucleotide. The ATP generated from PPi drives the conversion of luciferin into oxyluciferin and generates visible light [7]. At the same time, the unmatched bases are degraded by apyrase [8]. Then another dNTP is added to the reaction system and the pyrosequencing reaction is repeated.
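
The cyclic dNTP addition described above can be illustrated with an idealized flowgram: each flow offers one nucleotide, and the signal is proportional to how many bases are incorporated in that flow. This is a noise-free sketch with a hypothetical function name. The `TACG` flow order is an assumption made for illustration, and the input is the strand being synthesized (the complement of the template).

```python
def flowgram(synthesized_seq: str, flow_order: str = "TACG", max_flows: int = 400):
    """Return (nucleotide, incorporated_count) for each flow until the read ends.

    A count of 0 means the offered dNTP did not match the next base and was
    removed; a count above 1 corresponds to a homopolymer run.
    """
    signals = []
    position = 0
    flow = 0
    while position < len(synthesized_seq) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        run_length = 0
        # Extend through every consecutive base matching the offered dNTP.
        while position < len(synthesized_seq) and synthesized_seq[position] == base:
            run_length += 1
            position += 1
        signals.append((base, run_length))
        flow += 1
    return signals


if __name__ == "__main__":
    for base, count in flowgram("TTAGC"):
        print(base, count)
```

Because a homopolymer is read as a single, proportionally brighter signal rather than as separate events, long runs are harder to quantify exactly, which is consistent with the higher error rate for long poly-base stretches noted for this platform.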

The read length of Roche 454 was initially 100-150 bp in 2005, with 200,000+ reads and an output of 20 Mb per run [9, 10]. In 2008 the 454 GS FLX Titanium system was launched; through upgrading, its read length could reach 700 bp with 99.9% accuracy after filtering, and it could output 0.7 G of data per run within 24 hours. In late 2009 Roche incorporated the GS Junior, a benchtop system, into the 454 sequencing system, which simplified library preparation and data processing, and output was also upgraded to 14 G per run [11, 12]. The most outstanding advantage of Roche is its speed: it takes only 10 hours from the start of sequencing to completion. The read length is also a distinguishing characteristic compared with other NGS systems (described in the later part of this paper). However, the high cost of reagents remains a challenge for Roche 454: it is about $12.56 × 10^-6 per base (counting reagent use only). Another shortcoming is a relatively high error rate for poly-bases longer than 6 bp. On the other hand, its library construction can be automated and the emulsion PCR can be semiautomated, which could greatly reduce manpower. Other informatics infrastructure and sequencing advantages are listed and compared with the HiSeq 2000 and SOLiD systems in Tables 1(a), 1(b), and 1(c).

DURING NITROGEN LIMITATION GENES ASSOCIATED WITH LIPASES AND REGULATING AUTOPHAGY ARE UP-REGULATED

All three phospholipase-encoding genes identified were overexpressed in -N, while only one of the two TAG lipase genes found, acylglycerol lipase, was overexpressed (Table 3). The overexpression of lipase genes during nitrogen deprivation in C. reinhardtii has been thought to be associated with the reconstruction of the cellular membrane for the purpose of channeling fatty acids to triacylglyceride production [10]. Triacylglyceride lipase, which is active in triacylglyceride hydrolysis, was moderately repressed (log2 fold change = 0.33) under the -N scenario, providing some support for the hypothesis that while membrane reconstruction was active, TAG degradation was reduced under nitrogen limitation (Table 3). Finally, genes associated with regulating autophagy and the 5’ AMP-activated protein kinase gene (SnRK1 gene in plants) were overexpressed in the -N scenario (Table 3). SnRK1 is a global regulator of carbon metabolism in plants [30,31], and its up-regulation, along with that of autophagy-associated genes, further demonstrates the cells’ efforts to maintain homeostasis under -N conditions.

THE BEGINNING FOR ALGAL BIOFUELS

With predictions of an ever-increasing global population in the 1940s and 1950s, many researchers were considering how feeding such a vast number of people would be possible. Traditionally, livestock is fed using arable crops; however, researchers believed that algae could play a large part in providing a high-protein food source for livestock and thus a method for feeding the global population [5]. At the University of California at Berkeley, Professor Oswald began designing pond systems to cultivate freshwater algae on a large scale. The idea was to design a low-impact system (i.e., low energy requirements and environmental impact) which provided conditions allowing high productivity of the cultivated algae. Oswald’s work also focussed upon combining algal cultivation with wastewater treatment, providing a co-benefit [4]. The algae could therefore provide a means of improving the water quality of raw or partially treated effluent as well as providing livestock feed.

The biomass produced from the cultivation process was not restricted to livestock feed, and studies were performed assessing the amount of biogas the algal biomass was capable of providing [2]. Algae were deemed a potentially valuable substrate for biogas production, and various strains have been tested for their suitability up to the present day [6-8]. Further investigations led to algal biomass being assessed for alternative fuel types. Due to the high oil content of many algae species [9-14], biodiesel was considered a valuable fuel which could be extracted and processed from algal biomass. The concept of producing biodiesel from microalgae was developed considerably by the US Department of Energy’s Aquatic Species Program: Biodiesel from Algae [15]. The program ran from 1978 to 1996 and was focused upon producing biodiesel from microalgae fed with CO2 from flue gases. The program was born out of a requirement for energy security: the US relied heavily upon gasoline for transport fuel, and disruption to supplies could have significant repercussions for the economy. The program provided excellent contributions to the area of algal cultivation for biofuel, but when funds were diverted to alternative fuel research the program was phased out in 1996 [15].

Recently the interest in biofuels from algae has dramatically increased as a result of increased fossil fuel prices and the need to find an alternative energy source due to the threat of climate change. Areas of study include optimising biofuel yields, methods of reducing energy consumption, investigating alternative products, and assessing environmental impacts.

SUSTAINABILITY OF BIOENERGY: AGRICULTURAL AND PROCESSING INPUTS

To guarantee sustainable agriculture and processing of energy crops, we included a sustainability framework for the inputs they require.

The yield projections for energy crops in this study are based on rain-fed agricultural systems where nutrients are added to the land.

3.3.2.1 AGRICULTURAL WATER USE

Regarding agricultural water use, this means that no irrigation is used for the energy crops in this study. Energy crop yields are scaled in accordance with the land’s suitability for rain-fed agriculture to reflect this. This means that most of the yields in this work are at around 50-70% of the maximum yield currently obtained in high-input agricultural systems.

Table 3 gives ranges for these yields, as they differ per region. The number given in Table 3 is in primary biomass yield of the main product.

TABLE 3: Yields of energy crops used in this study in primary biomass yield of the main product.

| Crop type | Range of yields across the regions (GJ ha^-1) | Comments |
| --- | --- | --- |
| Oils + fats | 25-35 (~0.5-1 tonne ha^-1 of oil) | Equates to ~22-31 GJ ha^-1 of transport fuel. Number includes only primary oil yields; agricultural and fuel processing residues are included elsewhere. Marker crops: rapeseed, soybeans and oil palm. |
| Sugar + starch | 62-121 (~4-7 tonne ha^-1 of starch or sugar) | Equates to ~49-95 GJ ha^-1 of transport fuel. Number includes only primary starch/sugar yields; agricultural and fuel processing residues are included elsewhere. Marker crops: sugar cane and maize. Highest yields in South America due to suitability for sugar cane. |
| (Ligno)cellulosic crops | 160-230 (~8-12 tonne ha^-1 of dry matter) | Equates to ~61-88 GJ ha^-1 of transport fuel. Number includes all primary biomass yields; fuel processing residues are included elsewhere. |
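
As a quick arithmetic check on Table 3, the ratio of transport-fuel energy to primary biomass energy can be computed per crop type. The sketch below uses only the ranges in the table, read on the assumption that the leading marks before the tonne and fuel figures are approximation signs. The variable and function names are hypothetical, and pairing low with low and high with high values is a simplification.

```python
# (primary biomass yield range, transport fuel range), both in GJ per hectare,
# taken from Table 3.
TABLE_3 = {
    "oils_fats": ((25, 35), (22, 31)),
    "sugar_starch": ((62, 121), (49, 95)),
    "lignocellulosic": ((160, 230), (61, 88)),
}


def implied_conversion(primary_range, fuel_range):
    """Fuel energy divided by primary energy at the low and high ends of the range."""
    low = fuel_range[0] / primary_range[0]
    high = fuel_range[1] / primary_range[1]
    return low, high


if __name__ == "__main__":
    for crop, (primary, fuel) in TABLE_3.items():
        low, high = implied_conversion(primary, fuel)
        print(f"{crop}: {low:.2f} to {high:.2f}")
```

On these figures, roughly 88% of the primary energy in oil crops ends up as transport fuel, about 79% for sugar and starch crops, and about 38% for lignocellulosic crops.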

ELECTROMAGNETIC EXPERIMENTS

Three primary classes of experiments on electromagnetic influence (Figure 1) can be distinguished, together with a fourth class derived from them:

I. Predominantly magnetic fields: Near-field regime (permanent, slowly changing, and pulsed fields from magnetic coils)

II. Predominantly electric fields: Near-field regime (permanent or slowly changing)

III. Fields with both electric and magnetic components, with ratios between 0.1 and 10: Far-field regime (typical EMF oscillation frequency is 100 kHz or more)

IV. Fields from (I, II, or III) with unique spatial and/or temporal topology

Group I is the most heavily represented, mostly because of the simplicity of the experimental setup and the extended penetration depth of magnetic fields inside water-containing systems (Figure 4). The generated fields are either static or oscillating magnetic fields created by permanent magnets or by electromagnets such as Helmholtz and solenoid coils. The biological experiments generally use a standard bipolar configuration with an N/S magnetic or +/- electric field for stimulation.

Group II is most often used in electroporation, where strong pulsed electric fields (PEFs) are used for reversible membrane permeabilization to induce the uptake or release of some cell ingredients or foreign molecules. Group III is electromagnetic energy that propagates as a wave at higher frequencies, generated via an antenna, magnetron, or klystron, and is considered the far-field regime. This classification encompasses non-ionizing radio waves and microwaves, as well as optical and ionizing radiation such as IR, visible, UV, X-ray, and gamma radiation.

The following section on the effects of electromagnetic fields has been organized by the type of the EM treatment and further categorized on the basis of growth and physiological processes that have been studied within each treatment group.

RESINS WITH HIGHER ALGAL BINDING CAPACITY FOR DIRECT TRANSESTERIFICATION

While these studies were initiated and largely carried through using Amberlite, it became apparent through efforts to develop better resins that the binding capacity of Amberlite for algae is generally quite low [25]. It is particularly low for KAS603, which gave the highest yields of FAME relative to its dried weight. A comparison of Amberlite with other functionalized resins we have generated shows that the EGDMA-IM-DEG [Figure 4(a)] and DVB-DMA [Figure 4(b)] resins showed equilibrium binding of 2.6 and 3.4 times more KAS603 than Amberlite, respectively [Figure 4(c)]. To date we have obtained binding capacities of up to 150 mg algae per g resin. While these resins can be eluted cleanly with the sulfuric acid/methanol reagent so as to generate FAME [Figure 4(d)], they were not designed for this purpose and are potentially susceptible to attack by the strong acid. Future resin designs will need to address the need for chemical stability under harsh conditions.