RNA-SEQ DATA ANALYSES

For quality control, raw sequencing reads were analyzed with the FastQC tool (v0.10.0) [57], and low-quality reads with a Phred score of less than 13 were removed using the SolexaQA package (v1.1) [58]. De novo transcriptome assembly was conducted using the Velvet (v1.2.03) [23] and Oases (v0.2.06) [59] assembly algorithms with a multi-k hash length strategy (i.e., 23, 33, 63, and 83 bp) to capture the most diverse assembly with improved specificity and sensitivity [59,60]. Final clustering of transcripts was performed using the CD-HIT-EST package (v4.0-210-04-20) [61], and a non-redundant set of contigs was generated.
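The Phred threshold can be illustrated with a minimal sketch. A Phred score Q corresponds to a base-call error probability of 10^(-Q/10), so Q = 13 is roughly a 5% error rate. The function names below are hypothetical, the Phred+33 quality encoding is an assumption, and filtering on the mean score of a read is only one possible criterion; it is not necessarily how SolexaQA applies the cutoff.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def passes_quality(quality_string, min_phred=13):
    """Keep a read if its mean Phred score is at least min_phred.

    Assumption: mean-based filtering is used here for illustration only.
    """
    scores = decode_phred33(quality_string)
    return bool(scores) and sum(scores) / len(scores) >= min_phred
```

Under these assumptions, a read whose quality string is all `I` (Phred 40) would be kept, while one that is all `#` (Phred 2) would be discarded.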

For transcriptome annotation, the final set of contigs was searched against NCBI’s non-redundant (nr) protein and plant RefSeq [24] databases using the BLASTX algorithm [62] with an E-value cutoff of < 10⁻⁶. Contigs with significant matches were annotated using the Blast2GO platform [63]. Additional annotations were obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG) gene and protein families database via the KEGG Automatic Annotation Server (KAAS) (v1.6a) [64]. Associated Gene Ontology (GO) terms as well as enzyme commission (EC) numbers were retrieved, and KEGG metabolic pathways were assigned [65].
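As an illustration of the E-value cutoff, the following sketch filters BLAST hits in the standard 12-column tabular layout, in which the E-value is the 11th column. The function name is hypothetical, and the tabular format is an assumption; the output format actually used in this study is not stated.

```python
def filter_hits_by_evalue(tabular_lines, max_evalue=1e-6):
    """Return (query, subject, evalue) for hits with E-value below max_evalue.

    Assumes tab-separated BLAST tabular output with the standard 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore.
    """
    kept = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue < max_evalue:
            kept.append((fields[0], fields[1], evalue))
    return kept
```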

To determine transcript abundances and differential expression, high-quality reads from each experimental condition were individually mapped to the assembled transcriptome using Bowtie software (v0.12.7) [66]. Reads mapping to each contig were counted using SAMtools (v0.1.16) [67], and transcript abundances were calculated as reads per kilobase of exon model per million mapped reads (RPKM) [68]. All differential expression analyses (fold changes) and related statistical computations were conducted by feeding non-normalized read counts into the DESeq package (v1.5.1) [25]. Separate sequence read datasets were used as inputs to the DESeq package, where size factors for each dataset were calculated and overall means and variances were determined based on a negative binomial distribution model. Fold change differences were considered significant when a q-value < 0.05 was achieved based on Benjamini and Hochberg’s false discovery rate (FDR) procedure [69], and only statistically significant fold changes were used in the results analysis. In addition to individual enzyme-encoding transcripts, contigs were pooled for each experimental condition and tested against the combined dataset to determine the enriched GO terms using the Gossip package [70] integrated in the Blast2GO platform. Significantly enriched GO terms (q-value < 0.05) were determined for both +N and -N conditions.
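The quantities named above can be sketched in a few lines. The code below is an illustrative sketch, not the DESeq implementation: it shows the RPKM formula, median-of-ratios size factors of the kind DESeq computes from raw counts, and Benjamini-Hochberg adjusted p-values (q-values). All function names are hypothetical, and contigs with a zero count in any sample are skipped when computing size factors, which is an assumption of this sketch.

```python
import math
from statistics import median


def rpkm(read_count, contig_length_bp, total_mapped_reads):
    """Reads per kilobase of contig per million mapped reads."""
    return read_count * 1e9 / (contig_length_bp * total_mapped_reads)


def size_factors(counts):
    """Median-of-ratios size factors.

    counts[i][j] is the raw read count for contig i in sample j.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c == 0 for c in row):
            continue  # geometric mean would be zero; skip this contig
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Walk from the largest p-value down, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    q_values = [0.0] * m
    running_min = 1.0
    for steps_from_top, i in enumerate(order):
        rank = m - steps_from_top
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values
```

For example, a 2,000 bp contig with 500 mapped reads in a library of 10 million mapped reads has an RPKM of 25. A contig would then be called differentially expressed when its q-value from `benjamini_hochberg` is below 0.05.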

Finally, reference-guided mapping and differential expression analysis were also explored as a quantitation method. In this case, the TopHat package (v1.3.3) [71] was used to map high-quality reads from each experimental condition against the genomes of the closely related green algae species Chlamydomonas reinhardtii (version 169) and Volvox carteri (version 150), available through Phytozome (v7.0) [72]. Differential gene expression was quantified using the Cufflinks package (v1.2.1) [73].