How to Choose a Hotel for Cats
December 14, 2021
The hypergeometric distribution is commonly used to determine the significance of functional term enrichment within a list of genes. In this test, the occurrence of a functional term within a gene list is compared to its background level of occurrence across all genes in the genome to determine the degree of enrichment. A p-value based on this test can be calculated from four parameters: (1) the number of genes in the list, (2) the number of genes in the list annotated with the term, (3) the total number of genes in the genome, and (4) the number of genes in the genome annotated with the term. This test effectively distinguishes truly overrepresented terms from terms that occur frequently across all genes in the genome, and therefore also within the gene list. The cumulative hypergeometric test assigns a p-value to each functional term associated with genes in a given list, and all functional terms are ranked by ascending p-value (i.e., by descending level of enrichment). Huang et al. review the use of the hypergeometric test for functional term enrichment [34]. The Algal Functional Annotation Tool computes hypergeometric p-values using a Perl wrapper around the GNU Scientific Library's cumulative hypergeometric function, which is written in C, to provide a fast and accurate implementation of this statistical test.
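The four-parameter test described above can be sketched with the Python standard library. This is not the tool's Perl/GSL implementation but an illustrative stand-in: it sums the upper tail P(X >= k) of the hypergeometric distribution exactly with `math.comb` (Python 3.8+) and then ranks terms by ascending p-value. The function names, parameter names and example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(list_size, term_in_list, genome_size, term_in_genome):
    """Upper-tail hypergeometric p-value, P(X >= term_in_list).

    X is the number of term-annotated genes in a random list of `list_size`
    genes drawn without replacement from `genome_size` genes, of which
    `term_in_genome` carry the term.
    """
    if not 0 <= term_in_genome <= genome_size:
        raise ValueError("term_in_genome must lie in [0, genome_size]")
    if not 0 <= list_size <= genome_size:
        raise ValueError("list_size must lie in [0, genome_size]")
    if not 0 <= term_in_list <= min(list_size, term_in_genome):
        raise ValueError("term_in_list is inconsistent with the other counts")
    # Exact integer arithmetic for the tail; a single division at the end.
    tail = sum(
        comb(term_in_genome, i) * comb(genome_size - term_in_genome, list_size - i)
        for i in range(term_in_list, min(list_size, term_in_genome) + 1)
    )
    return tail / comb(genome_size, list_size)


def rank_terms(term_counts, list_size, genome_size):
    """Rank terms by ascending p-value (i.e., descending enrichment).

    `term_counts` maps term -> (count in list, count in genome).
    """
    scored = [
        (enrichment_p_value(list_size, k, genome_size, K), term)
        for term, (k, K) in term_counts.items()
    ]
    return sorted(scored)


if __name__ == "__main__":
    # Hypothetical counts: a 200-gene list drawn from a 17,000-gene genome.
    counts = {"term_A": (12, 150), "term_B": (3, 900), "term_C": (5, 40)}
    for p, term in rank_terms(counts, list_size=200, genome_size=17000):
        print(f"{term}\t{p:.3g}")
```

Exact integer sums avoid the underflow that can affect naive floating-point products of binomial coefficients; for genome-scale inputs a library routine such as the GSL function mentioned above is the faster choice.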
To track gene transcription in the oleaginous microalga N. oleoabundans, cells were first grown under +N and -N conditions to produce differential cellular enrichments of TAGs. Cells were harvested after 11 days, a sampling time that corresponded to NO3- concentrations below the detection limit and a reduced growth rate in the -N reactors (Figure 1A, B). The maximum growth rate of the -N cultures was 113 ± 4 (std. err.) mg l-1 day-1, and it decreased to 34 ± 0.7 mg l-1 day-1 once nitrogen became limiting in the reactor. Total lipids extracted under the +N and -N scenarios showed a statistically significant increase (p < 0.05), from 22% DCW in +N to 36% in -N (Figure 1C). Extracted lipids were transesterified, and the fatty acid methyl esters (FAMEs, assumed to be equivalent to TAG content [22]) were quantified. Compared to the +N condition, the FAME (TAG) content per cell mass increased fivefold in the -N case (p < 0.05), demonstrating that the additional lipids produced during N limitation were mostly TAGs (Figure 1C). Estimates of total cell mass based on direct microscopic counts and DCW determinations revealed that the average mass of a cell in -N was 81% of that in +N, confirming that the change in TAG was independent of changes in DCW. FAME profiles are presented in Figure 1D and show a 50% decrease in the proportion of polyunsaturated fatty acids (i.e., C16:2, C16:3, C16:4, C18:2, and C18:3) under nitrogen limitation. The most significant change was in the amount of oleic acid (C18:1), which increased more than fivefold, while the quantity of α-linolenic acid (C18:3) decreased 4.8-fold under -N conditions. This trend toward a greater proportion of C18:1 is consistent with prior investigations of the FAME contents of the oleaginous microalgae N. oleoabundans and Chlorella vulgaris under nitrogen limitation [13,22].
FIGURE 1: N. oleoabundans growth and lipid characteristics. (A) Growth curves under +N and -N conditions; the inset image shows the difference in culture appearance between the two growth conditions; (B) nitrate (as N) concentrations in the bioreactors during growth; (C) cell weight enrichment of total lipids and fatty acid methyl esters (FAME, representative of TAGs) from cells harvested on day 11; and (D) percentage distribution of FAME from cells harvested on day 11. All error bars represent one standard deviation.
To aid in interpreting how photosynthetically fixed carbon was directed into major metabolic pathways, the chlorophyll, protein, and starch contents of N. oleoabundans were also measured under the -N and +N scenarios (Table 1). Nitrogen deprivation led to a reduction in the content of nitrogen-containing chlorophyll. This loss of chlorophyll is consistent with the light green color of chlorosis observed in the cultures under nitrogen limitation (Figure 1A inset). Under nitrogen limitation, a decrease in cellular protein content and an increase in cellular starch content were also observed. Together, these changes in metabolite and biomolecule contents suggest that N. oleoabundans redirects its metabolism during nitrogen limitation away from nitrogen-containing compounds (protein and chlorophyll) and toward the accumulation of the nitrogen-free storage molecules TAG and starch.
TABLE 1: Culture density and cellular composition of major biomolecules of N. oleoabundans cells determined after 11 days of growth under nitrogen-replete (+N) and nitrogen-limited (-N) conditions
The Ion Personal Genome Machine (PGM) and the MiSeq were launched by Ion Torrent and Illumina, respectively. Both instruments are small and offer fast turnaround times but limited data throughput, and they are targeted at clinical applications and small laboratories.
10.5.1 ION PGM FROM ION TORRENT
Ion PGM was released by Ion Torrent at the end of 2010. The PGM uses semiconductor sequencing technology: when the polymerase incorporates a nucleotide into the DNA molecule, a proton is released, and by detecting the resulting change in pH, the PGM determines whether a nucleotide was added. The chip is flooded with one nucleotide species after another. If the flowed nucleotide is not complementary to the next template base, no voltage change is detected; if two nucleotides are incorporated, a doubled voltage is detected [15]. The PGM is the first commercial sequencing machine that does not require fluorescence and camera scanning, which results in higher speed, lower cost, and a smaller instrument. Currently, it produces 200 bp reads in 2 hours, and sample preparation takes less than 6 hours for 8 samples in parallel.
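The per-flow signal logic just described can be illustrated with an idealized, noise-free model of "flow space". This is a sketch under stated assumptions: the repeating flow order `TACG` and the function names are hypothetical choices for illustration, and real PGM signals are noisy, so homopolymer lengths have to be estimated from the measured voltage rather than read off exactly.

```python
from itertools import cycle


def ideal_flowgram(read, flow_order="TACG"):
    """Noise-free flowgram: (nucleotide flowed, bases incorporated) per flow.

    Each flow extends the strand by the whole run of identical bases that
    matches the flowed nucleotide, so the signal is 0 for a non-matching
    flow, 1 for a single base, 2 for a two-base homopolymer, and so on.
    """
    read = read.upper()
    if not set(read) <= set(flow_order):
        raise ValueError("read contains bases that are never flowed")
    flows = []
    pos = 0
    for base in cycle(flow_order):
        if pos == len(read):
            break
        run = 0
        # Incorporate every consecutive base matching this flow.
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        flows.append((base, run))
    return flows


def decode(flows):
    """Invert an ideal flowgram back to the base sequence."""
    return "".join(base * count for base, count in flows)


if __name__ == "__main__":
    for base, signal in ideal_flowgram("TTAGGGC"):
        print(base, signal)
```

For `TTAGGGC` the flows are T:2, A:1, C:0, G:3, T:0, A:0, C:1. A three-base run produces one signal of height 3 rather than three separate events, which is why homopolymer length errors show up as insertions and deletions.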
An exemplary application of the Ion Torrent PGM sequencer is the identification of microbial pathogens. In May and June of 2011, an outbreak of exceptionally virulent Shiga toxin (Stx)-producing Escherichia coli O104:H4, centered in Germany, infected more than 3000 people [16, 17]. Whole-genome sequencing on the Ion Torrent PGM and HiSeq 2000 sequencers helped scientists identify the E. coli strain and provided clues to its antibiotic resistance. The strain appeared to be a hybrid of two E. coli strains (enteroaggregative E. coli and enterohemorrhagic E. coli), which may help explain why it was particularly pathogenic. The sequencing of E. coli TY2482 [18] shows the value of a fast, albeit limited-throughput, sequencer during an outbreak of a new disease.
To study the sequencing quality, mapping rate, and GC depth distribution of Ion Torrent in comparison with HiSeq 2000, a Rhodobacter sample with high GC content (66%) and a 4.2 Mb genome was sequenced on both sequencers (Table 2). In another experiment, E. coli K12 DH10B (NC_010473.1), with a GC content of 50.78%, was sequenced on Ion Torrent to analyze quality values, read length, positional accuracy, and GC distribution (Figure 1).
[Figure panels: all read length distribution (total reads: 577,537); mapped read length distribution; mapped read map length distribution]
TABLE 2: Comparison of alignment results between Ion Torrent and HiSeq 2000.
a: aligned with TMAP; b: aligned with SOAP2.
In our study, several genes encoding enzymes involved in the intracellular breakdown of fatty acids and lipids are significantly repressed under -N (Table 3). Repressing β-oxidation is a clear strategy for maintaining a higher concentration of fatty acids within a cell. In contrast, most of the identified lipases (with the exception of triacylglycerol lipases) are overexpressed during nitrogen limitation. On closer examination, the up-regulated lipases are mostly phospholipases associated with hydrolyzing cell wall glycerophospholipids and phospholipids into free fatty acids, potentially for incorporation into TAGs. A known result of nitrogen-limitation-induced autophagy in C. reinhardtii is the degradation of the chloroplast phospholipid membrane [47,48]. Moreover, the overexpression of lipases during nitrogen limitation in C. reinhardtii has previously been hypothesized to be associated with the reconstruction of cell membranes [10]. In addition to phospholipases, we identified an enriched number of transcripts for phospholipid metabolic processes and lipid transport in the -N case (Figure 4B). The up-regulation of genes encoding enzymes that produce free fatty acids is also consistent with the fact that the PDAT enzyme associated with the acyl-CoA-independent mechanism of TAG synthesis (which uses phospholipids, rather than free fatty acids, as acyl donors) was not recovered in our assembled transcriptome.
12.3 CONCLUSIONS
Assembling the transcriptome and quantifying gene expression responses of Neochloris oleoabundans under nitrogen-replete and nitrogen-limited conditions enabled the exploration of a broad diversity of genes and pathways, many of which comprise the metabolic responses associated with lipid production and carbon partitioning. The high coverage of genes encoding complete central metabolic pathways demonstrates the completeness of the transcriptome assembly and the repeatability of the gene expression data. Furthermore, the concordance of metabolite measurements and observed physiological responses with gene expression results lends strength to the quality of the assembly and our quantitative assessment. Our findings point to several molecular mechanisms that potentially drive the overproduction of TAG during nitrogen limitation. These include up-regulation of genes associated with fatty acid and TAG biosynthesis, shuttling of excess acetyl-CoA to lipid production through the pyruvate dehydrogenase complex, the role of autophagy and lipases in supplying an additional pool of fatty acids for TAG synthesis, and up-regulation of the pentose phosphate pathway to produce NADPH to power lipid biosynthesis. These identified gene sequences and measured metabolic responses during excess TAG production can be leveraged in future metabolic engineering studies to improve TAG content and character in microalgae and ultimately contribute to the production of a sustainable liquid fuel.
12.4 METHODS
Individual pathway maps from KEGG provide information on the localization of proteins within the cell, their compartmentalization into different cellular components, and the position of reactions within a larger metabolic process. Visualizing the proteins from a gene list on pathway maps aids their interpretation. The Algal Functional Annotation Tool uses the publicly available KEGG application programming interface (API) for pathway highlighting. Information linking C. reinhardtii proteins to identifiers in the KEGG database is used to determine which KEGG IDs from the supplied gene list are associated with a particular pathway. The tool also deduces which proteins in the pathway are encoded in the C. reinhardtii genome but absent from the gene list, and sends the corresponding identifiers to the KEGG API to be highlighted in a different background color. This interface is implemented using the SOAP architecture for web applications.
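The two highlight groups described above reduce to simple set operations. The sketch below assumes a hypothetical one-to-one mapping from C. reinhardtii protein IDs to KEGG IDs and uses made-up identifiers; it does not reproduce the tool's code or any call to the KEGG API, and only computes which IDs would go into each color group.

```python
def pathway_highlight_groups(gene_list, genome_proteins, protein_to_kegg, pathway_ids):
    """Split a pathway's KEGG IDs into two highlight groups.

    Returns (in_list, genome_only):
      in_list     -- pathway IDs mapped from proteins in the supplied gene list
      genome_only -- pathway IDs present in the genome but absent from the list
    """
    def to_kegg(proteins):
        # Proteins without a KEGG mapping are skipped.
        return {protein_to_kegg[p] for p in proteins if p in protein_to_kegg}

    pathway = set(pathway_ids)
    in_list = to_kegg(gene_list) & pathway
    genome_only = (to_kegg(genome_proteins) & pathway) - in_list
    return in_list, genome_only


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    mapping = {"prot1": "K0001", "prot2": "K0002", "prot3": "K0003", "prot4": "K0009"}
    genome = ["prot1", "prot2", "prot3", "prot4"]
    pathway = ["K0001", "K0002", "K0003"]
    in_list, genome_only = pathway_highlight_groups(["prot1", "prot4"], genome, mapping, pathway)
    print(sorted(in_list), sorted(genome_only))
```

Subtracting `in_list` from the genome set keeps the two color groups disjoint, so no pathway node is assigned two background colors.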
In order to produce statistically reliable and comparable RNA-Seq data, cDNA library construction and sequencing were performed for each of the duplicate +N and duplicate -N reactors. Over 88 million raw sequencing reads were generated and subjected to quality-score- and length-based trimming, resulting in a high-quality (HQ) read data set of 87.09 million sequences (average Phred score of 35) with an average read length of 77 bp. Using a multiple k-mer de novo transcriptome assembly strategy (k-mers 23, 33, 63, and 83) [23], HQ reads were assembled into 56,550 transcripts with an average length of 1,459 bp and a read coverage of 1,444× (Figure 2C). The generated transcripts were searched against the National Center for Biotechnology Information's (NCBI) nonredundant and plant RefSeq databases [24], and the majority of transcripts showed significant matches to closely related green microalgae species (Figure 2A, B), including C. variabilis (~85% of all transcripts), C. reinhardtii (~2.6%), and V. carteri (~3.4%) (Figure 2A). With additional annotation using KEGG services and Gene Ontology (GO), a total of 23,520 transcripts were associated with at least one GO term, and 4,667 transcripts were assigned Enzyme Commission (EC) numbers. Overall, 14,957 transcripts had KO identifiers and were annotated as putative genes and protein families. This assembly provided a reliable, well-annotated transcriptome for downstream RNA-Seq data analysis.
Following transcriptome assembly and annotation, HQ reads from each experimental condition were individually mapped to the assembly to determine transcript abundances as RPKM values. To determine fold-change differences between +N and -N transcripts, non-normalized read counts were fed into the DESeq package (v1.5.1), which accounts for the dependence between variance and mean [25]. Based on the negative binomial distribution model used in DESeq, 25,896 of the 56,550 non-redundant transcripts were up-regulated under the -N condition. Plotting transcript fold-change levels shows a high correlation among the biologically replicated sequencing runs, as indicated by Euclidean distances (Figure 2D). Overall, 15,987 transcripts showed significant differential regulation (q < 0.05; Figure 3A). A complete table of fold changes with significance levels for all genes assessed is presented in Additional file 3.
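For reference, a minimal sketch of the conventional RPKM normalization (reads per kilobase of transcript per million mapped reads). It assumes the library size is the sum of the reads assigned to transcripts; pipelines differ on this point, and the function names are hypothetical. The DESeq negative binomial test is not reproduced here; as stated above, DESeq is given raw counts, not RPKM values.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # count / (length / 1e3) / (library / 1e6) == count * 1e9 / (length * library)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def rpkm_table(counts, lengths):
    """RPKM for every transcript in one library.

    counts  -- transcript ID -> reads mapped to that transcript
    lengths -- transcript ID -> transcript length in bp
    The library size is taken as the sum of `counts` (an assumption).
    """
    total = sum(counts.values())
    return {t: rpkm(c, lengths[t], total) for t, c in counts.items()}


if __name__ == "__main__":
    print(rpkm(100, 1000, 1_000_000))  # 100 reads on a 1 kb transcript, 1 M reads
```

Dividing by length corrects for longer transcripts collecting more reads at the same expression level, and dividing by library size makes values comparable between the +N and -N runs.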
We further investigated the alignment of HQ reads to the reference genomes of C. reinhardtii and V. carteri in order to improve and extend our transcriptomic analysis to the detection of splicing events and alternative isoform formation (Figure 3B, C). Although the majority of annotated orthologs were identified from these closely related microalgae species, very poor mapping rates (i.e., <5% of reads) were observed between the RNA-Seq data of N. oleoabundans and the genomes of C. reinhardtii and Volvox carteri. As a result, the number of transcripts annotated and evaluated for differential expression was suboptimal, and the genomes of these closely related organisms were not used for gene expression analysis.
FIGURE 2: De novo assembly and mapping results. (A, B) Top-hit species distribution of BLASTX matches for the N. oleoabundans transcriptome; (C) cumulative transcript length (bp) frequency distribution of the N. oleoabundans transcriptome assembly; (D) heat map of the top 100 most differentially expressed genes in the biological replicates of the +N and -N conditions.
FIGURE 3: (A) MvA plot contrasting gene expression levels between the -N and +N scenarios based on reads mapped to the N. oleoabundans transcriptome. The x-axis represents the mean expression level at the gene scale, and the y-axis represents the log2 fold change from -N to +N. Negative fold changes indicate up-regulation of -N genes. Lighter gray dots are genes that are significant at a false discovery rate of 5%; (B) MvA plot for reads mapped to the C. reinhardtii genome; and (C) MvA plot for reads mapped to the V. carteri genome.
Ion Torrent read quality is more stable along the read, whereas HiSeq 2000 quality decreases noticeably after 50 cycles, possibly because the fluorescent signal decays as read length increases (Figure 1).
10.5.1.1 MAPPING
The Rhodobacter library had an insert size of 350 bp, and 0.5 Gb of data was obtained from the HiSeq 2000. The sequencing depth was over 100×, and the contig and scaffold N50 values were 39,530 bp and 194,344 bp, respectively. Using this assembly as the reference, we analyzed the mapping rate of 33 Mb of data obtained from Ion Torrent with a 314 chip. The alignment comparison is shown in Table 2.
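The depth and N50 figures quoted above follow from two short calculations, sketched here with hypothetical function names. Mean depth is total sequenced bases divided by genome size; N50 is the length L such that contigs (or scaffolds) of length at least L together cover at least half of the assembled length.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half
    of the total assembled length."""
    if not lengths:
        raise ValueError("no sequences supplied")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def mean_depth(total_bases, genome_size_bp):
    """Average sequencing depth: bases sequenced divided by genome size."""
    return total_bases / genome_size_bp


if __name__ == "__main__":
    print(f"HiSeq 2000: {mean_depth(0.5e9, 4.2e6):.0f}x")   # 0.5 Gb, 4.2 Mb genome
    print(f"Ion Torrent: {mean_depth(33e6, 4.2e6):.1f}x")   # 33 Mb, same genome
```

With the numbers in the text, 0.5 Gb over a 4.2 Mb genome gives about 119×, consistent with "over 100×", while the 33 Mb Ion Torrent run corresponds to roughly 7.9×.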
The mapping rate of Ion Torrent is higher than that of HiSeq 2000, but the two are not directly comparable because different alignment tools were used for each sequencer (TMAP and SOAP2; Table 2). Beyond the marked differences in mismatch, insertion, and deletion rates, HiSeq 2000 and Ion Torrent remain difficult to compare because they rely on different sequencing principles. For example, homopolymer (polynucleotide) sites cannot be identified easily on Ion Torrent. Nevertheless, Ion Torrent shows stable quality along the reads and good mismatch accuracy, but a bias in the detection of indels. The different types of accuracy are analyzed and shown in Figure 1.
N. oleoabundans (UTEX #1185) was obtained from the Culture Collection of Algae at the University of Texas (UTEX, Austin, TX, USA). Batch cultures were started by inoculating 10^6 log-phase cells into 1 liter glass flasks filled with 750 ml of Modified Bold 3N medium [49] without soil extract. The nitrogen concentration in the medium was adjusted to 50 mg as N l-1 (nitrogen replete; denoted +N) and 10 mg as N l-1 (nitrogen limited; denoted -N) using potassium nitrate (KNO3) as the sole nitrogen source. These concentrations were chosen based on preliminary experiments that identified the incubation times and nitrogen concentrations necessary to induce nitrogen depletion in the mid-log phase of the -N cultures and to ensure that the nitrogen-replete cultures never encountered nitrogen limitation during the experiment. For each nitrogen condition, cells were cultured in duplicate reactors. Reactors were operated at room temperature (25°C ± 2°C) with a 14:10 h light:dark cycle under fluorescent light (32 Watt Ecolux, General Electric, Fairfield, CT, USA) at a photosynthetic photon flux density of 110 µmol photons m-2 s-1. Cultures were mixed on an orbital shaker at 200 rpm and continuously aerated with sterile, activated-carbon-filtered air at a flow rate of 200 ml min-1 using a mass flow controller (Cole-Parmer Instrument Company, IL, USA).
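Because the nitrogen levels are given "as N", the mass of KNO3 to weigh out follows from the ratio of molar masses. The sketch below assumes, as stated, that KNO3 is the only nitrogen source, uses rounded standard atomic weights, and ignores reagent purity; the function name is hypothetical.

```python
# Standard atomic weights (g/mol), rounded.
ATOMIC_WEIGHT = {"K": 39.098, "N": 14.007, "O": 15.999}
KNO3_MOLAR_MASS = ATOMIC_WEIGHT["K"] + ATOMIC_WEIGHT["N"] + 3 * ATOMIC_WEIGHT["O"]


def kno3_mg_per_litre(nitrogen_mg_per_litre):
    """mg of KNO3 per litre that supplies the given mg of N per litre.

    KNO3 contains one N atom per formula unit, so the mass scales by
    M(KNO3) / M(N), roughly 7.22.
    """
    return nitrogen_mg_per_litre * KNO3_MOLAR_MASS / ATOMIC_WEIGHT["N"]


if __name__ == "__main__":
    volume_l = 0.75  # 750 ml of medium per flask
    for label, n_mg_l in (("+N", 50), ("-N", 10)):
        per_l = kno3_mg_per_litre(n_mg_l)
        print(f"{label}: {per_l:.1f} mg KNO3 per litre, {per_l * volume_l:.1f} mg per flask")
```

This gives about 361 mg and 72 mg of KNO3 per litre for the +N and -N media, or roughly 271 mg and 54 mg per 750 ml flask.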
The expression levels of C. reinhardtii genes have been experimentally characterized under numerous conditions using high-throughput methods such as RNA-seq [[26,27], unpublished data (Castruita M., et al.)]. These expression data were compiled and analyzed to determine which genes are over- and under-expressed in each experimental condition. The expression data were preprocessed to normalize the counts of uniquely mappable reads in each experiment. Genes exhibiting a greater than two-fold change in expression relative to their average expression across all conditions, with a Poisson cumulative p-value below 0.05, were considered differentially expressed. Using these data, C. reinhardtii genes were associated with the conditions in which they were over- and under-expressed.
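The two-part criterion above (a greater than two-fold change and a Poisson cumulative p-value below 0.05) can be sketched as follows. The details are assumptions, not the authors' code: the gene's mean normalized count across all conditions is used as the Poisson rate, the condition's normalized count is rounded to an integer, and the upper tail is tested for over-expression and the lower tail for under-expression. Function names are hypothetical.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), with terms computed in log space."""
    if k < 0:
        return 0.0
    if lam == 0:
        return 1.0
    total = sum(
        math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1)) for i in range(k + 1)
    )
    return min(total, 1.0)


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - poisson_cdf(k - 1, lam)


def classify(count, mean_count, fold=2.0, alpha=0.05):
    """Label one gene in one condition as 'over', 'under' or 'unchanged'.

    count      -- normalized read count in this condition (an integer)
    mean_count -- the gene's average normalized count across all conditions
    """
    if mean_count <= 0:
        raise ValueError("mean_count must be positive")
    ratio = count / mean_count
    if ratio > fold and poisson_sf(count, mean_count) < alpha:
        return "over"
    if ratio < 1 / fold and poisson_cdf(count, mean_count) < alpha:
        return "under"
    return "unchanged"


if __name__ == "__main__":
    for count, mean in ((50, 10), (1, 20), (3, 1)):
        print(count, mean, classify(count, mean))
```

Requiring both tests matters for lowly expressed genes: a count of 3 against a mean of 1 is a threefold change, but P(X >= 3) is about 0.08, so it is not called over-expressed.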
The compiled expression data were also analyzed to find functionally related genes based on their expression levels across the different experimental conditions [[26,27], unpublished data (Castruita M., et al.)]. Genes with low expression variance across all samples were not considered. This analysis was performed for three representations of the expression data: absolute counts, log counts, and log ratios of expression. In this way, each C. reinhardtii gene is associated with the 100 genes with the most similar expression patterns, which identifies potentially functionally related genes.
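A minimal sketch of the neighbour search described above. The text does not name a similarity measure or a variance cutoff, so Pearson correlation, the `min_variance` parameter, the log2 pseudocount, and all function names are assumptions for illustration. The same function would be run once per representation (absolute counts, log counts, log ratios).

```python
import math
from statistics import pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpression_neighbours(profiles, min_variance, top_n=100):
    """Map each gene to the `top_n` genes with the most similar profiles.

    profiles     -- gene ID -> list of expression values, one per sample
    min_variance -- genes whose population variance is below this are dropped
    Ties in correlation are broken by gene ID.
    """
    if min_variance <= 0:
        raise ValueError("min_variance must be positive to exclude flat profiles")
    kept = {g: p for g, p in profiles.items() if pvariance(p) >= min_variance}
    neighbours = {}
    for gene, profile in kept.items():
        # Sort by descending correlation (negated), then by gene ID.
        scored = sorted(
            (-pearson(profile, other_profile), other)
            for other, other_profile in kept.items()
            if other != gene
        )
        neighbours[gene] = [other for _, other in scored[:top_n]]
    return neighbours


def log_counts(profile, pseudocount=1.0):
    """Alternative representation: log2(count + pseudocount)."""
    return [math.log2(c + pseudocount) for c in profile]
```

Dropping low-variance genes first is what keeps the correlation well defined: a perfectly flat profile has zero variance, and its correlation with anything would divide by zero.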
Many microalgae are capable of accumulating large amounts of lipids in their cells [10]. On average, lipid contents typically range from 10 to 30% of dry weight (Table 3). Depending on the algal species and cultivation conditions, however, microalgal lipid content may range widely, from 2 to 75% [2]. In some extreme cases, it can reach 70-90% of dry weight [4,5]. For instance, the freshwater green alga Botryococcus braunii can produce oil (including hydrocarbons) of up to 86% of its dry cell weight [44]. This species is being considered as a possible source for biodiesel production in the near future [4], but it has the major disadvantages of slow growth and low tolerance for contamination. As a result, the lipid productivities (lipid production per unit area or volume per unit time) of other microalgae, such as Nannochloropsis, Chlorella, Tetraselmis, and Pavlova sp., are typically much higher [39,45]. Lipid accumulation can be dramatically increased by the external application of stress factors and is considered a survival strategy for microalgae under adverse conditions. Most notably, these stress factors include nutrient deprivation, exposure to chemicals, and changes in salinity, temperature, pH, and/or irradiation [4,39,46]. The composition of fatty-acid-containing lipids differs widely among species but, as mentioned above, generally includes structural unsaturated polar lipids as well as neutral storage lipids, mostly in the form of TAG. Fatty acids significant for biodiesel include saturated and unsaturated fatty acids with 14 to 18 carbon atoms, such as C14:0, C16:0, C16:1, C18:0, C18:1, C18:2, and C18:3 [41]. According to European biodiesel standards, some fatty acids should be restricted because of undesirable properties. For instance, methyl linolenate and fatty acid methyl esters with more than four double bonds are limited to 12% due to their oxidation properties [47].
Table 3. Examples of lipid contents in some microalgae species [4,48].
Microalgae that offer a multiple-product portfolio as part of a biorefinery are expected to be the most applicable to large-scale commercial cultivation. In a microalgae screening process, besides fatty acids with properties relevant to biodiesel production, high-value products such as protein-rich biomass, omega-3 fatty acids, sterols, antioxidants, vitamins, and pigments should also be taken into account. In particular, omega-3 fatty acids from microalgae have received significant attention as a high-value-added product, as current sources of fish oil are unsustainable due to depleting global fish stocks. A comparison of the omega-3 fatty acid contents of different microalgae shows that they differ considerably between species (Table 4).