INTEGRATION OF EXPRESSION DATA

The expression levels of C. reinhardtii genes have been experimentally characterized under numerous conditions using high-throughput methods such as RNA-seq [[26,27], unpublished data (Castruita M., et al.)]. These expression data were compiled and analyzed to determine which genes are over- and under-expressed in each experimental condition. The expression data were preprocessed to normalize the counts of uniquely mappable reads in each experiment. Genes exhibiting a greater than two-fold change in expression relative to their average expression across all conditions, with a Poisson cumulative p-value of less than 0.05, were considered differentially expressed. Using these data, C. reinhardtii genes were associated with the conditions in which they were over- and under-expressed.
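
The differential-expression criterion described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that the null model for a gene is a Poisson distribution whose mean is that gene's average normalized count across conditions, that normalized counts are rounded to integers, and that the one-sided tail probability is the "cumulative p-value". The function names (poisson_cdf, call_expression) are hypothetical.

```python
import math


def poisson_cdf(k, mu):
    """Return P(X <= k) for X ~ Poisson(mu), with mu > 0.

    Terms are evaluated in log space so that large means do not underflow.
    """
    if k < 0:
        return 0.0
    log_mu = math.log(mu)
    terms = (math.exp(i * log_mu - mu - math.lgamma(i + 1)) for i in range(k + 1))
    return min(math.fsum(terms), 1.0)


def call_expression(counts, fold=2.0, alpha=0.05):
    """Label each condition of one gene as 'over', 'under', or 'unchanged'.

    counts: normalized read counts for a single gene, one per condition.
    Assumption: the reference level is the gene's mean count across all
    conditions, and significance is a one-sided Poisson tail probability.
    """
    mean = sum(counts) / len(counts)
    calls = []
    for count in counts:
        k = int(round(count))
        if mean <= 0:
            calls.append("unchanged")
        elif count > fold * mean and 1.0 - poisson_cdf(k - 1, mean) < alpha:
            # Upper tail: P(X >= k) under the null mean.
            calls.append("over")
        elif count < mean / fold and poisson_cdf(k, mean) < alpha:
            # Lower tail: P(X <= k) under the null mean.
            calls.append("under")
        else:
            calls.append("unchanged")
    return calls


if __name__ == "__main__":
    # Illustrative counts only, not data from the study.
    print(call_expression([5, 5, 5, 5, 80]))
```

Requiring both the fold-change and the p-value threshold means that lowly expressed genes, for which a two-fold change can arise from sampling noise alone, are not called differentially expressed.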

The compiled expression data were also analyzed to find functionally related genes based on their expression levels across the different experimental conditions [[26,27], unpublished data (Castruita M., et al.)]. Genes demonstrating low variance of expression across all samples were not considered. This analysis was performed for three representations of the expression data: absolute counts, log counts, and log ratios of expression. By this method, each C. reinhardtii gene is associated with the 100 genes with the most similar expression patterns, which identifies potentially functionally related genes.
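
The neighbor search described above can be sketched as follows. The text does not state the similarity measure, the variance cutoff, or the reference used for the log ratios, so this sketch makes three labeled assumptions: similarity is Euclidean distance between expression profiles, the variance threshold (min_variance) is a free parameter, and log ratios are taken relative to each gene's mean count. All function and parameter names are hypothetical.

```python
import math


def transform(counts, mode="absolute", pseudocount=1.0):
    """Convert one gene's counts into one of the three representations."""
    if mode == "absolute":
        return [float(c) for c in counts]
    if mode == "log":
        return [math.log2(c + pseudocount) for c in counts]
    if mode == "log_ratio":
        # Assumption: the ratio is relative to the gene's mean count.
        mean = sum(counts) / len(counts)
        return [math.log2((c + pseudocount) / (mean + pseudocount)) for c in counts]
    raise ValueError(f"unknown mode: {mode}")


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def nearest_genes(expression, mode="log_ratio", top_n=100, min_variance=1e-9):
    """Map each gene to the top_n genes with the most similar profiles.

    expression: dict of gene id -> list of counts, one per condition.
    Genes whose transformed profile has variance <= min_variance are dropped.
    Assumption: similarity is Euclidean distance (smaller is more similar).
    """
    profiles = {gene: transform(counts, mode) for gene, counts in expression.items()}
    kept = {gene: p for gene, p in profiles.items() if variance(p) > min_variance}
    neighbors = {}
    for gene, profile in kept.items():
        ranked = sorted(
            (math.dist(profile, other_profile), other)
            for other, other_profile in kept.items()
            if other != gene
        )
        neighbors[gene] = [other for _, other in ranked[:top_n]]
    return neighbors


if __name__ == "__main__":
    # Illustrative counts only, not data from the study.
    demo = {
        "a": [1, 10, 100],
        "b": [2, 20, 200],
        "c": [100, 10, 1],
        "flat": [5, 5, 5],
    }
    print(nearest_genes(demo, mode="log_ratio", top_n=1))
```

Running the search once per representation (mode set to "absolute", "log", and "log_ratio") gives three neighbor lists per gene. With a distance-based measure, absolute counts tend to group genes by expression magnitude, whereas log ratios group them by the shape of the profile across conditions.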