How to choose a hotel for cats
December 14, 2021
The transcripts annotated in the +N and -N transcriptomes were first classified based on Gene Ontology (GO) terms. In the +N and -N datasets, respectively, 6,846 and 7,473 transcripts were classified into 306 and 218 broader GO term categories in accordance with the Gene Ontology Consortium [26]. An enrichment analysis of the broader GO terms was performed using the modified Fisher’s Exact test in Blast2GO to quantitatively compare the distribution of differentially enriched GO terms between the +N case and the entire data set (Figure 4A), and between the -N case and the entire data set (Figure 4B). The functional categories enriched under +N were distinctly different from those enriched under the -N condition. In the +N case (Figure 4A), functional categories linked to carbon fixation, photosynthesis, protein machinery, and cellular growth were highly enriched compared to the -N condition, reflecting the higher growth rate, higher cell mass, and increased chlorophyll content observed in +N. Under -N conditions, genes associated with carboxylic acid and lipid biosynthetic processes, NADPH regeneration, the pentose-phosphate pathway, phospholipid metabolic process, and lipid transport showed a greater enrichment of transcripts than the overall dataset (Figure 4B). These enriched GO terms directly correlated with the observed increase of lipid accumulation in -N cells. Other major categories identified as significantly expressed under the -N condition included the synthesis of value-added products such as terpenoids, pigments, and vitamins, as well as cellular response to nitrogen starvation, nitrate metabolic process, and nitrate assimilation (Figure 4B). Genes involved in the latter three functional categories were exclusively expressed in the nitrogen-limited cells.
FIGURE 4: Over-representation analysis of selected significant GO terms. (A) contains results for +N versus the full dataset and (B) contains results for -N versus the full dataset.
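The enrichment test described above can be illustrated with a minimal sketch. It implements a plain one-sided Fisher's exact test through the hypergeometric tail, not the modified variant that Blast2GO applies; the function name and all counts are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value for over-representation.

    k: transcripts in the test set annotated with the GO term
    n: size of the test set
    K: transcripts in the reference set annotated with the GO term
    N: size of the reference set (the test set is a subset of it)

    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of seeing k or more annotated transcripts.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 3 of 5 test transcripts carry the term,
# against 4 of 10 transcripts in the reference set.
p_value = fisher_enrichment_p(k=3, n=5, K=4, N=10)
print(f"p = {p_value:.4f}")
```

A small p-value suggests that the GO term occurs more often in the test set than random sampling from the reference set would explain. In practice a multiple-testing correction would be applied across all GO terms.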
As Figure 1 shows, the GC depth distribution is more even in Ion Torrent. In Ion Torrent, the sequencing depth stays similar as the GC content ranges from 63% to 73%. In HiSeq 2000, however, the average sequencing depth is 4x at 60% GC content but falls to 3x at 70% GC content.
Ion Torrent has already released the Ion 314 and 316 chips and planned to launch the Ion 318 chip in late 2011. The chips differ in the number of wells, so the larger chips produce more data within the same sequencing time. The Ion 318 chip enables the production of >1 Gb data in 2 hours. Read length is expected to increase to >400 bp in 2012.
The nitrate concentration of culture media was determined daily by passage through a 0.2 µm pore-size filter and analysis on an ion chromatograph equipped with conductivity detection [50]. Microalgae growth was monitored daily by measuring the optical density of the cultures at 730 nm (OD730) using a spectrophotometer (HP 8453, Hewlett Packard, Palo Alto, CA, USA). Biomass samples for analysis of cellular constituents (starch, proteins, chlorophyll, and lipids) and extraction of total RNA were harvested on day 11 by centrifugation at 10,000 g for 5 min at 4°C. Cell pellets were snap-frozen in liquid nitrogen and immediately transferred to -80°C until further analysis. The dry cell weight (DCW) of cultures was determined by filtering an aliquot of cultures on pre-weighed 0.45 µm pore-size filters and drying the filters at 90°C until constant weight was reached. For analysis of starch content, 10^9 cells ml^-1 were suspended in deionized water in 2 ml screw-cap tubes containing 0.3 g of 0.5 mm glass beads, and disrupted by two cycles of bead-beating at 4800 oscillations per minute for 2 min, followed by three freeze/thaw cycles. The suspension was then incubated in a boiling water bath for 3 min and autoclaved for 1 hour at 121°C to convert starch granules into a colloidal solution. After samples were cooled to 60°C, cell debris was removed by centrifugation at 4,000 g for 5 min. The concentration of starch in the supernatant was measured enzymatically using the Sigma Starch Assay Kit (amylase/amyloglucosidase method, Sigma-Aldrich, Saint Louis, MO, USA) according to the manufacturer’s instructions. Chlorophyll a and b were measured by the N,N’-dimethylformamide method and calculated from spectrophotometric absorption measurements at 603, 647, and 664 nm, as previously reported [51,52]. The total protein content of cells was determined with minor modifications to the original Bradford method [53] as described in [54].
Starch, chlorophyll, and protein measurements were performed at least in triplicate, and averages and standard deviations are reported as a percentage of DCW.
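The reporting convention above (mean and standard deviation of replicate measurements, expressed as a percentage of DCW) can be sketched as follows. The masses are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption; the text does not say which form was used.

```python
from statistics import mean, stdev


def percent_of_dcw(component_mg, dcw_mg):
    """Convert paired component and dry-cell-weight masses to % of DCW."""
    return [100.0 * c / d for c, d in zip(component_mg, dcw_mg)]


def summarize(values):
    """Return (mean, sample standard deviation) of replicate values."""
    return mean(values), stdev(values)


# Hypothetical triplicate: starch mass and DCW of each aliquot, in mg.
starch_mg = [1.2, 1.5, 1.8]
dcw_mg = [10.0, 10.0, 10.0]

avg, sd = summarize(percent_of_dcw(starch_mg, dcw_mg))
print(f"starch: {avg:.1f} +/- {sd:.1f} % of DCW")
```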
The total lipid content of the cells was determined using a modified Bligh and Dyer method utilizing 2:1 chloroform:methanol [55]. To determine the profile of fatty acids, lipid samples were transesterified [56] and the resulting fatty acid methyl esters (FAME) were analyzed using a liquid chromatography-mass spectrometer (Varian 500-MS, 212-LC pumps, Agilent Technologies, Santa Clara, CA, USA) equipped with a Waters normal phase, Atlantis® HILIC silica column (2.1 x 150 mm, 3 µm pore size) (Waters, Milford, MA, USA), and atmospheric pressure chemical ionization [56]. Identification was based upon the retention time and the mass-to-charge ratio of standard FAME mixtures. The sum of FAME was used as a proxy for TAG content [22].
Due to the existence of several protein identifier types (FM3.1, FM4, Au5, Au10.2), different identifiers are associated with an individual protein within the Chlamydomonas genome. In order to extend annotations from one identifier type to another, matching protein identifiers are deduced by sequence similarity filtering for mutual best hits between identifiers using BLAST. Matching identifiers with 100% sequence coverage are kept, and the rest of the mutual best hits are filtered to include only those proteins with matches with at least 75% coverage. Potential ambiguities involving proteins similar to multiple other proteins are resolved by considering only the reciprocal best hit from the BLAST query in the opposite direction. The information derived by this analysis is used to convert gene identifiers between different types, which allows the Algal Annotation Tool to work with multiple protein identifier types.
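The identifier-matching procedure above can be sketched as a reciprocal-best-hit filter. This is a minimal illustration, not the tool's own code: the tuple layout, the identifiers, the scores, and the choice to apply the 75% coverage cutoff to the lower of the two directions are all assumptions.

```python
def best_hits(hits):
    """Keep the top-scoring subject for each query.

    hits: iterable of (query, subject, bit_score, coverage_percent).
    Returns {query: (subject, bit_score, coverage_percent)}.
    """
    best = {}
    for query, subject, score, coverage in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score, coverage)
    return best


def reciprocal_best_hits(forward, reverse, min_coverage=75.0):
    """Pair identifiers that are each other's best hit in both directions."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    pairs = {}
    for a, (b, _score, cov_ab) in fwd.items():
        # Accept the pair only if b's best hit points back to a.
        if b in rev and rev[b][0] == a:
            # Assumption: the cutoff applies to the weaker direction.
            if min(cov_ab, rev[b][2]) >= min_coverage:
                pairs[a] = b
    return pairs


# Hypothetical BLAST results between two identifier types.
forward = [("fm_1", "au_A", 500, 100.0), ("fm_1", "au_B", 300, 90.0),
           ("fm_2", "au_B", 400, 80.0), ("fm_3", "au_C", 200, 60.0)]
reverse = [("au_A", "fm_1", 500, 100.0), ("au_B", "fm_1", 300, 90.0),
           ("au_B", "fm_2", 400, 80.0), ("au_C", "fm_3", 200, 60.0)]

print(reciprocal_best_hits(forward, reverse))
```

Here fm_3 and au_C are mutual best hits but fall below the coverage cutoff, so they are dropped, while au_B is resolved to fm_2 because that is its best hit in the reverse direction.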
High lipid productivity is not the only factor that should be considered early during strain selection. Outdoor cultivation should determine whether the selected microalgae are robust enough to withstand variable local climatic conditions and whether they can dominate a culture. This is particularly important for open pond systems where other algae strains, grazers or viruses may easily contaminate the culture. For this purpose, many phycologists recommend the use of local dominant species, even if their lipid productivity may not be as high as other species [43].
TABLE 4: Examples of potential microalgae species for omega-3 production [48].
Harvesting capability is another important feature of microalgae with biodiesel potential. Harvesting or dewatering can be best achieved through settling, flocculation, or froth flotation [49,50]. For example, many microalgae settle under adverse conditions, and this can be tested at small scale [51]. Lipid extraction efficiency from microalgae depends on the residual water content after drying and, in particular, on the structure of the cell wall. For example, Nannochloropsis sp. is regarded as a highly productive microalga with strong potential for large-scale biodiesel production [43], but ideally requires pretreatment to open up the highly rigid cell walls for higher lipid extraction efficiency.
9.3 CONCLUSIONS
Development of biodiesel production from microalgae presents an important move to address the limitations posed by current first-generation biodiesel crops. Microalgae, once developed for commercial biodiesel production, may offer many economic and environmental advantages. Current biodiesel production from microalgae is in the research phase, but is being developed to commercial scale in many countries. Finding promising microalgae for commercial cultivation is multifaceted and challenging because particular microalgae strains have different requirements in terms of nutrient intake, environmental and culturing conditions, and lipid extraction technology. However, the diversity of lipid-producing microalgae species is one of the major advantages of this group of organisms and is likely to lead to the selection of suitable algae crops for algal biodiesel production in different regions. A combination of conventional and modern techniques is likely the most efficient route from isolation to large-scale cultivation (Figure 2). Careful initial analyses and far-sighted selection of microalgae with a view towards downstream processing and large-scale production with potential value-added products is an important prerequisite to domesticate and develop algae crops for biodiesel production.
The majority of genes governing fatty acid biosynthesis were identified as being overexpressed in nitrogen-limited cells, as shown at the global metabolic pathway level and in the fatty acid biosynthesis module. The fold-change and abundances of identified transcripts for the components of fatty acid biosynthesis at the gene level are presented in Figure 5A. The first step in fatty acid biosynthesis is the conversion of acetyl-CoA into malonyl-CoA by addition of carbon dioxide. This reaction is the first committed step in the pathway and is catalyzed by acetyl-CoA carboxylase (ACCase). While the gene encoding ACCase was repressed under the -N condition, the biotin-containing subunit of ACCase, biotin carboxylase (BC), was significantly up-regulated in response to nitrogen starvation. BC catalyzes the ATP-dependent carboxylation of the biotin subunit and is part of the heteromeric ACCase that is present in the plastid, the site of de novo fatty acid biosynthesis [27]. To proceed with fatty acid biosynthesis, malonyl-CoA is transferred to an acyl-carrier protein (ACP) by the action of malonyl-CoA ACP transacylase (MAT). This step is followed by a round of condensation, reduction, dehydration, and again reduction reactions catalyzed by beta-ketoacyl-ACP synthase (KAS), beta-ketoacyl-ACP reductase (KAR), beta-hydroxyacyl-ACP dehydrase (HAD), and enoyl-ACP reductase (EAR), respectively. The genes coding for MAT, KAS, HAD, and EAR were up-regulated, whereas the KAR-encoding gene was repressed in -N cells. The synthesis ceases after six or seven cycles when the number of carbon atoms reaches sixteen (C16:0-[ACP]) or eighteen (C18:0-[ACP]). ACP residues are then cleaved off by the thioesterases oleoyl-ACP hydrolase (OAH) and acyl-ACP thioesterase A (FatA), generating the end products of fatty acid synthesis (i.e., palmitic (C16:0) and stearic (C18:0) acids). Genes coding for these thioesterases, i.e., FatA and OAH, were overexpressed in -N cells.
The up-regulation of these thioesterase-encoding genes, as previously reported in E. coli and the microalga P. tricornutum, is associated with reducing the feedback inhibition that partially controls the production rate of fatty acid biosynthesis [7,8], and results in the overproduction of fatty acids [9]. It has also been suggested that an increase in FatA gene expression and the associated acyl-ACP hydrolysis may aid in increased fatty acid transport from the chloroplast to the endoplasmic reticulum, the site where TAG assembly occurs [10,28]. Finally, to supply reducing equivalents via NADPH to power fatty acid biosynthesis, genes encoding pentose phosphate pathway enzymes were strongly up-regulated in the -N condition (Table 2).
FIGURE 5: Differential expression of genes involved in (A) fatty acid biosynthesis; (B) triacylglycerol biosynthesis; (C) β-oxidation; and (D) starch biosynthesis. Pathways were reconstructed based on the de novo assembly and quantitative annotation of the N. oleoabundans transcriptome. (A) Enzymes include: ACC, acetyl-CoA carboxylase (EC: 6.4.1.2); MAT, malonyl-CoA ACP transacylase (EC: 2.3.1.39); KAS, beta-ketoacyl-ACP synthase (KAS I, EC: 2.3.1.41; KAS II, EC: 2.3.1.179; KAS III, EC: 2.3.1.180); KAR, beta-ketoacyl-ACP reductase (EC: 1.1.1.100); HAD, beta-hydroxyacyl-ACP dehydrase (EC: 4.2.1.-); EAR, enoyl-ACP reductase (EC: 1.3.1.9); AAD, acyl-ACP desaturase (EC: 1.14.19.2); OAH, oleoyl-ACP hydrolase (EC: 3.1.2.14); FatA, acyl-ACP thioesterase A (EC: 3.1.2.-); Δ12D, Δ12(ω6)-desaturase (EC: 1.4.19.6); Δ15D, Δ15(ω3)-desaturase (EC: 1.4.19.-). (B) Enzymes include: GK, glycerol kinase (EC: 2.7.1.30); GPAT, glycerol-3-phosphate O-acyltransferase (EC: 2.3.1.15); AGPAT, 1-acyl-sn-glycerol-3-phosphate O-acyltransferase (EC: 2.3.1.51); PP, phosphatidate phosphatase (EC: 3.1.3.4); DGAT, diacylglycerol O-acyltransferase (EC: 2.3.1.20); and PDAT, phospholipid:diacylglycerol acyltransferase (EC: 2.3.1.158). (C) Enzymes include: ACS, acyl-CoA synthetase (EC: 6.2.1.3); ACOX1, acyl-CoA oxidase (EC: 1.3.3.6); ECH, enoyl-CoA hydratase (EC: 4.2.1.17); HADH, 3-hydroxyacyl-CoA dehydrogenase (EC: 1.1.1.35); ACAT, acetyl-CoA C-acetyltransferase (EC: 2.3.1.16, 2.3.1.9). (D) Enzymes include: PGM, phosphoglucomutase (EC: 5.4.2.2); AGPase, ADP-glucose pyrophosphorylase (EC: 2.7.7.27); SS, starch synthase (EC: 2.4.1.21); BE, α-1,4-glucan branching enzyme (EC: 2.4.1.18); and HXK, hexokinase (EC: 2.7.1.1). Starch catabolism enzymes include: α-AMY, α-amylase (EC: 3.2.1.1); O1,6G, oligo-1,6-glucosidase (EC: 3.2.1.10); β-AMY, β-amylase (EC: 3.2.1.2); and SPase, starch phosphorylase (EC: 2.4.1.1). Enzymes of ethanol fermentation via pyruvate include: PDC, pyruvate decarboxylase (EC: 4.1.1.1); and ADH, alcohol dehydrogenase (EC: 1.1.1.1). The enzymes aceE, pyruvate dehydrogenase E1 component (EC: 1.2.4.1); aceF, pyruvate dehydrogenase E2 component (EC: 2.3.1.12); and pdhD, dihydrolipoamide dehydrogenase (EC: 1.8.1.4), transform pyruvate into acetyl-CoA. Key enzymes are shown with an asterisk (*) next to the enzyme abbreviations, and dashed arrows denote reaction(s) for which the enzymes are not shown. All presented fold changes are statistically significant, q value < 0.05.
The altered expression of genes associated with the generation of double bonds in fatty acids reflects the observed increase in the proportion of unsaturated fatty acids (Figure 1D) and the enrichment of C18:1 during nitrogen limitation. The acyl-ACP desaturase (AAD), which introduces one double bond into C16:0/C18:0, and delta-15 desaturase, which converts C18:2 to C18:3, were significantly up-regulated in the -N case, whereas the delta-12 desaturase catalyzing the formation of C18:2 from C18:1 was repressed during nitrogen limitation.
Under nitrogen limitation, 10 of the 13 genes associated with fatty acid degradation (α- and β-oxidation pathways for saturated and unsaturated acids) were significantly repressed. Figure 5C shows the typical β-oxidation pathway for saturated fatty acids, while Table 3 displays expression levels for additional peroxisomal genes associated with fatty acid oxidation that are not shown in Figure 5C. Before undergoing oxidative degradation, fatty acids are activated through esterification to Coenzyme A. The activation reaction is catalyzed by acyl-CoA synthetase (ACSL), which was up-regulated in -N cells. The acyl-CoA enters the β-oxidation pathway and undergoes four enzymatic reactions in multiple rounds. The first three steps of the pathway (oxidation, hydration, and again oxidation of acyl-CoA) are catalyzed by acyl-CoA oxidase (ACOX1), enoyl-CoA hydratase (ECH), and hydroxyacyl-CoA dehydrogenase (HADH), respectively. In the last step of the pathway, acetyl-CoA acetyltransferase (ACAT) catalyzes the cleavage of one acetyl-CoA, yielding a fatty acyl-CoA that is 2 carbons shorter than the original acyl-CoA. The cycle continues until all the carbons are released as acetyl-CoA. The expression levels of ECH and HADH were unchanged, and the genes encoding ACOX1 and ACAT, which catalyze the first and last reactions in the cycle, were identified as significantly repressed in -N cells.
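The cycle arithmetic described above (each round releases one acetyl-CoA and shortens the chain by two carbons) can be made concrete with a short sketch. It covers only saturated, even-chain acyl-CoA; the function name is hypothetical.

```python
def beta_oxidation_yield(carbons):
    """Return (cycles, acetyl_coa) for a saturated even-chain acyl-CoA.

    Each cycle cleaves one acetyl-CoA (2 carbons) from the chain.
    The final cycle splits a 4-carbon acyl-CoA into two acetyl-CoA,
    so the number of cycles is one less than the acetyl-CoA count.
    """
    if carbons < 4 or carbons % 2 != 0:
        raise ValueError("expects an even chain length of at least 4")
    acetyl_coa = carbons // 2
    cycles = acetyl_coa - 1
    return cycles, acetyl_coa


for name, n in (("palmitoyl-CoA (C16:0)", 16), ("stearoyl-CoA (C18:0)", 18)):
    cycles, acetyl = beta_oxidation_yield(n)
    print(f"{name}: {cycles} cycles, {acetyl} acetyl-CoA")
```

Unsaturated and odd-chain fatty acids need auxiliary enzymes, such as the isomerases and reductases listed in Table 3, and are outside this sketch.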
TABLE 2: N. oleoabundans genes involved in the pentose phosphate pathway
Pentose phosphate pathway Log2FC
Phosphogluconate dehydrogenase (decarboxylating) (PGD, EC: 1.1.1.44) -1.13
Glucose-6-phosphate dehydrogenase (G6PD, EC: 1.1.1.49) -1.41
Transketolase (tktA, EC: 2.2.1.1) 2.55
Transaldolase (talA, EC: 2.2.1.2) -0.66
6-phosphofructokinase (PFK, EC: 2.7.1.11) -0.45
Gluconokinase (gntK, EC: 2.7.1.12) 0.10
Ribokinase (rbsK, EC: 2.7.1.15) 0.11
Ribose-phosphate diphosphokinase (PRPS, EC: 2.7.6.1) -0.10
Gluconolactonase (GNL, EC: 3.1.1.17) -0.67
6-phosphogluconolactonase (PGLS, EC: 3.1.1.31) 0.07
Fructose-bisphosphatase (FBP, EC: 3.1.3.11) -0.24
Fructose-bisphosphate aldolase (fbaB, EC: 4.1.2.13) 0.17
Ribulose-phosphate 3-epimerase (RPE, EC: 5.1.3.1) -0.11
Ribose-5-phosphate isomerase (rpiA, EC: 5.3.1.6) -0.34
Glucose-6-phosphate isomerase (GPI, EC: 5.3.1.9) -1.21
Phosphoglucomutase (pgm, EC: 5.4.2.2) -0.83
Negative Log2FC values represent up-regulation under nitrogen limitation. All presented fold changes are statistically significant, q value < 0.05.
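Because the sign convention in this table is easy to misread, a minimal sketch of how a Log2FC value maps to a direction and a linear fold change may help. The function name is hypothetical; the two example values are taken from the table above.

```python
def interpret_log2fc(log2fc):
    """Map a Log2FC value to (direction under -N, linear fold change).

    Follows the table's convention: negative values mean up-regulation
    under nitrogen limitation, positive values mean down-regulation.
    """
    if log2fc < 0:
        direction = "up"
    elif log2fc > 0:
        direction = "down"
    else:
        direction = "unchanged"
    # The magnitude of the change is 2 raised to the absolute Log2FC.
    fold = 2.0 ** abs(log2fc)
    return direction, fold


for gene, value in (("G6PD", -1.41), ("tktA", 2.55)):
    direction, fold = interpret_log2fc(value)
    print(f"{gene}: {direction}-regulated under -N, {fold:.2f}-fold")
```

Under this convention the G6PD value of -1.41 corresponds to roughly a 2.7-fold increase in nitrogen-limited cells.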
TABLE 3: N. oleoabundans genes involved in catabolic pathways related to peroxisomal fatty acid oxidation, lysosomal lipases, and the regulation of autophagy
Enzyme encoding gene Log2FC
Peroxisome
α-oxidation
2-hydroxyacyl-CoA lyase 1 (HACL1, EC: 4.1.-.-) 0.35
Unsaturated fatty acid β-oxidation
Peroxisomal 2,4-dienoyl-CoA reductase (DECR2, EC: 1.3.1.34) 0.21
Δ(3,5)-Δ(2,4)-dienoyl-CoA isomerase (ECH1, EC: 5.3.3.-) -0.27
ATP-binding cassette, subfamily D (ALD), member 1 (ABCD1) 0.25
Long-chain acyl-CoA synthetase (ACSL, EC: 6.2.1.3) 0.25
Other oxidation
Peroxisomal 3,2-trans-enoyl-CoA isomerase (PECI, EC: 5.3.3.8) 0.59
Carnitine O-acetyltransferase (CRAT, EC: 2.3.1.7) 0.30
NAD+ diphosphatase (NUDT12, EC: 3.6.1.22) 0.47
Glycerolipid metabolism
Triacylglycerol lipase (EC: 3.1.1.3) 0.33
Acylglycerol lipase (MGLL, EC: 3.1.1.23) -0.13
Glycerophospholipid metabolism
Phospholipase A1 (plda, EC: 3.1.1.32) -1.26
Phospholipase A2 (PLA2G, EC: 3.1.1.4) -0.31
Phospholipase C (plcc, EC: 3.1.4.3) -0.10
Lysosome Lipases
Lysosomal acid lipase (LIPA, EC: 3.1.1.13) -0.48
Lysophospholipase III (LYPLA3, EC: 3.1.1.5) 0.20
Regulation of autophagy
Unc51-like kinase (ATG1, EC: 2.7.11.1) -0.53
5′-AMP-activated protein kinase, catalytic alpha subunit (snrk1, PRKAA) -0.05
Vacuolar protein 8 (VAC8) 0.13
Beclin 1 (BECN1) -0.59
Phosphatidylinositol 3-kinase (VPS34, EC: 2.7.1.137) -1.26
Phosphoinositide-3-kinase, regulatory subunit 4, p150 (VPS15, EC: 2.7.11.1) 0.11
Autophagy-related protein 3 (ATG3) 0.11
Autophagy-related protein 4 (ATG4) -0.16
Autophagy-related protein 5 (ATG5) -0.27
Autophagy-related protein 7 (ATG7) 0.17
Autophagy-related protein 8 (ATG8) -0.50
Autophagy-related protein 12 (ATG12) -0.58
Negative log2 fold change (Log2FC) values represent up-regulation under nitrogen limitation. All presented fold changes are statistically significant, q value < 0.05.
MiSeq, which still uses SBS technology, was launched by Illumina. It integrates the functions of cluster generation, SBS, and data analysis in a single instrument and can go from sample to answer (analyzed data) within a single day (in as few as 8 hours). The Nextera, TruSeq, and Illumina’s reversible terminator-based sequencing by synthesis chemistries were used in this innovative engineering. The outstanding features of MiSeq are the highest-integrity data and a broader range of applications, including amplicon sequencing, clone checking, ChIP-Seq, and small genome sequencing. It is also flexible, performing anything from single 36 bp reads (120 MB output) up to 2 × 150 paired-end reads (1-1.5 GB output). Due to its significant improvement in read length, the resulting data performs better in contig assembly compared with HiSeq (data not shown). The related sequencing result of MiSeq is shown in Table 3. We also compared PGM with MiSeq in Table 4.
TABLE 3: MiSeq 150PE data.
To control for cell synchronization, cells for the +N and -N conditions were harvested at the same time of day. Total RNA was extracted and purified separately from each of the two nitrogen-replete and the two nitrogen-limited cultures using the RNeasy Lipid Tissue Mini Kit (Qiagen, Valencia, CA, USA). The quality of purified RNA was determined on an Agilent 2100 bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Isolation of mRNA from total RNA was carried out using two rounds of hybridization to Dynal oligo(dT) magnetic beads (Invitrogen, Carlsbad, CA, USA). Aliquots from mRNA samples were used for construction of the cDNA libraries using the mRNA-Seq Kit supplied by Illumina (Illumina, Inc., San Diego, CA, USA). Briefly, the mRNA was fragmented in the presence of divalent cations at 94°C, and subsequently converted into double-stranded cDNA following first- and second-strand cDNA synthesis using random hexamer primers. After polishing the ends of the cDNA using T4 DNA polymerase and Klenow DNA polymerase for 30 min at 20°C, a single adenine base was added to the 3’ ends of cDNA molecules. Illumina mRNA-Seq Kit specific adaptors were then ligated to cDNA 3’ ends. Next, the cDNA was PCR-amplified for 15 cycles, amplicons were purified (QIAquick PCR purification kit, Qiagen Inc., Valencia, CA, USA), and the size and concentration of the cDNA libraries were determined on an Agilent 2100 bioanalyzer. Each of the four cDNA libraries (two nitrogen-deplete and two nitrogen-replete) was layered on a separate Illumina flow cell and sequenced at the Yale University Center for Genome Analysis using Illumina HiSeq 100 bp single-end sequencing. An additional lane was dedicated to sequencing PhiX control libraries to provide internal calibration and to optimize base calling. The sequence data produced in this study can be accessed at NCBI’s Sequence Read Archive with the accession number SRA048723.
The web interface of the Algal Functional Annotation Tool consists of a set of portals that give access to the different types of analyses available. Results are shown within expandable/collapsible HTML tables that display annotation information along with the statistical results of the analysis. When expanded, the results table shows which gene identifiers contain a specific annotation along with further information regarding matching gene identifiers and BLAST E-values. Updates to the Algal Functional Annotation Tool are semi-automated using a set of Perl scripts that parse and process updated flat files from the various integrated annotation databases at regular intervals. Currently, functional data from the primary annotation databases is set to be updated every 4 months.
LIN LIU, YINHU LI, SILIANG LI, NI HU, YIMIN HE, RAY PONG, DANNI LIN, LIHUA LU, and MAGGIE LAW
10.1 INTRODUCTION
DNA (deoxyribonucleic acid) was demonstrated to be the genetic material by Oswald Theodore Avery in 1944. Its double-helical structure, composed of four bases, was determined by James D. Watson and Francis Crick in 1953, leading to the central dogma of molecular biology. In most cases, genomic DNA defines the species and the individual, which makes the DNA sequence fundamental to research on the structures and functions of cells and to decoding the mysteries of life [1]. DNA sequencing technologies can help biologists and health care providers in a broad range of applications such as molecular cloning, breeding, finding pathogenic genes, and comparative and evolutionary studies. Ideally, DNA sequencing technologies should be fast, accurate, easy to operate, and cheap. In the past thirty years, DNA sequencing technologies and applications have undergone tremendous development and act as the engine of the genome era, which is characterized by a vast amount of genome data and, consequently, a broad range of research areas and applications. It is necessary to look back on the history of sequencing technology development to review the NGS systems (454, GA/HiSeq, and SOLiD), to compare their advantages and disadvantages, to discuss the various applications, and to evaluate the recently introduced PGM (personal genome machines) and third-generation sequencing technologies and applications. All of these aspects will be described in this paper. Most data and conclusions are from independent users who have extensive first-hand experience with these typical NGS systems at BGI (Beijing Genomics Institute).
Before discussing the NGS systems, we would like to review the history of DNA sequencing briefly. In 1977, Frederick Sanger developed a DNA sequencing technology based on the chain-termination method (also known as Sanger sequencing), and Walter Gilbert developed another sequencing technology based on chemical modification of DNA and subsequent cleavage at specific bases. Because of its high efficiency and low radioactivity, Sanger sequencing was adopted as the primary technology in the “first generation” of laboratory and commercial sequencing applications [2]. At that time, DNA sequencing was laborious and radioactive materials were required. After years of improvement, Applied Biosystems introduced the first automatic sequencing machine (namely the AB370) in 1987, adopting capillary electrophoresis, which made sequencing faster and more accurate. The AB370 could detect 96 bases at a time and 500 K bases a day, and the read length could reach 600 bases. The current model, the AB3730xl, has been able to output 2.88 M bases per day, with read lengths reaching 900 bases, since 1995. Having emerged in 1998, the automatic sequencing instruments and associated software using capillary sequencing machines and Sanger sequencing technology became the main tools for the completion of the Human Genome Project in 2001 [3]. This project greatly stimulated the development of powerful novel sequencing instruments to increase speed and accuracy while simultaneously reducing cost and manpower. In addition, the X Prize accelerated the development of next-generation sequencing (NGS) [4]. The NGS technologies differ from the Sanger method in their massively parallel analysis, high throughput, and reduced cost. Although NGS makes genome sequences readily available, the subsequent data analysis and biological interpretation are still the bottleneck in understanding genomes.
Following the Human Genome Project, the 454 system was launched by 454 in 2005, and Solexa released the Genome Analyzer the next year, followed by SOLiD (Sequencing by Oligo Ligation Detection) from Agencourt. These are the three most typical massively parallel sequencing systems in next-generation sequencing (NGS), and they share good performance in throughput, accuracy, and cost compared with Sanger sequencing (shown in Table 1(a)). These founder companies were then purchased by other companies: in 2006 Agencourt was purchased by Applied Biosystems, and in 2007 454 was purchased by Roche, while Solexa was purchased by Illumina. After years of evolution, these three systems exhibit better performance and their own advantages in terms of read length, accuracy, applications, consumables, manpower requirements, informatics infrastructure, and so forth. The comparison of these three systems is the focus of the later part of this paper (also see Tables 1(a), 1(b), and 1(c)).
TABLE 1: (a) Advantage and mechanism of sequencers. (b) Components and cost of sequencers. (c) Application of sequencers.
(1) All the data are taken from daily average performance runs at BGI. The average daily sequence data output at BGI is about 8 Tb when about 80% of the sequencers (mainly HiSeq 2000) are running.
(2) The reagent cost of 454 GS FLX Titanium is calculated based on the sequencing of 400 bp; the reagent cost of HiSeq 2000 is calculated based on the sequencing of 200 bp; the reagent cost of SOLiDv4 is calculated based on the sequencing of 85 bp.
(3) HiSeq 2000 is more flexible in sequencing types like 50SE, 50PE, or 101PE.
(4) SOLiD has high accuracy especially when coverage is more than 30x, so it is widely used in detecting variations in resequencing, targeted resequencing, and transcriptome sequencing. Lanes can be independently run to reduce cost.