INTEGRATION OF MULTIPLE ANNOTATION DATABASES

The Algal Functional Annotation Tool integrates annotation data from the biological knowledge bases listed in Table 1. Publicly available flat files containing annotation data were downloaded and parsed for each individual resource. Chlamydomonas reinhardtii proteins were assigned KEGG pathway annotations by means of sequence similarity to proteins within the KEGG genes database [1]. MetaCyc [2], Reactome [28], and Panther [30] pathway annotations were assigned to C. reinhardtii proteins by sequence similarity to subsets of UniProt IDs annotated in each corresponding database. In all cases, sequence similarity was determined by BLAST. BLAST results were filtered to contain only best hits with an E-value < 1e-05.
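The best-hit filtering step can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes the BLAST results are in the standard 12-column tabular format (query ID in column 1, subject ID in column 2, E-value in column 11, bit score in column 12), and it assumes that "best hit" means the lowest E-value per query, with ties broken by the higher bit score. The function name `best_hits` is hypothetical.

```python
import csv

EVALUE_CUTOFF = 1e-5  # threshold stated in the text


def best_hits(tabular_lines, cutoff=EVALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue, bitscore)} with one hit per query.

    Assumes 12-column tabular BLAST output; 'best' is taken to mean the
    lowest E-value, with ties broken by the higher bit score (an assumption).
    """
    best = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        # Discard hits that do not pass the E-value threshold.
        if evalue >= cutoff:
            continue
        current = best.get(query)
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[query] = (subject, evalue, bitscore)
    return best
```

Because the function accepts any iterable of lines, it can be given an open file handle directly. The annotations of each retained subject would then be transferred to the corresponding query protein.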

TABLE 1: List of annotation resources integrated into the Algal Functional Annotation Tool

Resource          URL                                                 Reference
KEGG              http://www.genome.jp/kegg/                          [1]
MetaCyc           http://www.metacyc.org/                             [2]
Pfam              http://pfam.sanger.ac.uk/                           [3]
Reactome          http://www.reactome.org/                            [28]
Panther           http://www.pantherdb.org/pathway                    [30]
Gene Ontology     http://www.geneontology.org/                        [31]
InterPro          http://www.ebi.ac.uk/interpro                       [32]
MapMan Ontology   http://mapman.gabipd.org/                           [33]
KOG               http://www.ncbi.nlm.nih.gov/COG/grace/shokog.cgi    [35]

Primary databases used to functionally annotate gene models and integrated into the Algal Functional Annotation Tool.

Gene Ontology (GO) [31] terms were downloaded from the Chlamydomonas reinhardtii annotation provided by JGI. These GO terms were associated with their respective ancestors in the hierarchical ontology structure to include broader functional terms and provide a complete annotation set. Pfam domain annotations were assigned by direct search against protein domain signatures provided by Pfam. InterPro [32] and user-submitted manual annotations are based on those contained within JGI’s annotation of the C. reinhardtii genome [11]. These methods were applied to four types of gene identifiers commonly used for C. reinhardtii proteins: JGI protein identifiers (versions 3 and 4) and Augustus gene models (versions 5 and 10.2). In total, over 12,600 unique functional annotation terms were assigned to 65,494 C. reinhardtii gene models spanning four different gene identifier types by these methods (Table 2). These assigned annotations may be explored for single genes using a built-in keyword search tool as well as an integrated annotation lookup tool which displays all annotations for a particular identifier.
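The association of GO terms with their ancestors can be sketched as a traversal of the ontology graph. The example below is illustrative only and does not reproduce the tool's implementation. It assumes the ontology has already been parsed into a mapping from each term to its direct parents; the function names `with_ancestors` and `propagate`, and the term and gene identifiers in the usage note, are hypothetical placeholders.

```python
def with_ancestors(terms, parents):
    """Return the given terms plus every ancestor reachable through `parents`.

    `parents` maps a term ID to an iterable of its direct parent term IDs
    (assumed to have been parsed from the ontology beforehand).
    """
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term in seen:
            continue  # a term may be reachable along more than one path
        seen.add(term)
        stack.extend(parents.get(term, ()))
    return seen


def propagate(gene_terms, parents):
    """Expand each gene's directly assigned terms with all ancestor terms."""
    return {gene: with_ancestors(terms, parents)
            for gene, terms in gene_terms.items()}
```

For instance, with `parents = {"T:leaf": {"T:mid"}, "T:mid": {"T:root"}}`, a gene annotated only with `T:leaf` would also receive `T:mid` and `T:root`. An iterative traversal is used so that deep hierarchies do not run into recursion limits, and the `seen` set ensures that terms with multiple parents are visited once.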