UTILITY AND DISCUSSION

11.3.1 COMPREHENSIVE, INTEGRATED DATA-MINING ENVIRONMENT

The Algal Functional Annotation Tool is composed of three main components: functional term enrichment tests (which are separated by type), a batch gene identifier conversion tool, and a gene similarity search tool. A ‘Quick Start’ analysis is provided from the front page, featuring enrichment analysis using a sample set of databases containing the richest set of annotations (Figure 1). From any page, the sidebar provides access to the ‘Quick Start’ function of the tool.

FIGURE 1: Algal Functional Annotation Tool. The front page of the Algal Functional Annotation Tool. A ‘Quick Start’ analysis is available to test for enrichment using the richest annotation databases included in the tool. Other features accessible from the sidebar include more specific enrichment tests (based on biological pathways, ontology terms, or protein families), a gene identifier conversion tool, a manual annotation search tool, and an expression similarity search tool.

Numerous other enrichment analyses, including enrichment using pathway, ontology, protein family, or differential expression data, are available within the Algal Functional Annotation Tool. Enrichment results are always sorted by hypergeometric p-value and, whenever possible, contain links to the primary database’s entry for that annotation or to the protein page of the gene identifier. The number of hits to a given annotation term is also displayed alongside the p-value, and results may always be expanded to show additional details, such as the specific gene IDs within the list matching a certain annotation (Figure 2). These results are downloadable as tab-delimited text files, which may then be further analyzed or used in conjunction with other databases.

Pathway results: KEGG pathways [20]

KEGG Pathway                          Hits    Score
Sulfur metabolism                     10      2.1335e-17
Cysteine and methionine metabolism    12      3.2806e-17
Selenoamino acid metabolism           9       6.4241e-16
Metabolic pathways                    22      4.2704e-06
Thiamine metabolism                   3       0.00010125

(In the screenshot, the "Sulfur metabolism" row is expanded to list its ten matching proteins by JGI v3.0 protein ID, KEGG ID, and BLAST E-value, followed by links to represent the pathway using custom colors and to re-run the functional enrichment analysis using only the subset of proteins in this pathway.)

FIGURE 2: Annotation Enrichment Results. Annotation enrichment results, sorted by ascending hypergeometric p-values, are shown in expandable/collapsible HTML tables such as the one shown. When expanded, the genes within the user-submitted list containing the expanded annotation are shown alongside additional statistical information. All results are downloadable as tab-delimited text files.
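The ranking by hypergeometric p-value can be illustrated with a minimal Python sketch. It assumes the standard one-sided (upper-tail) hypergeometric test; the function names, the `term_to_genes` mapping, and the `background` gene set are hypothetical and are not taken from the tool itself.

```python
from math import comb

def hypergeom_pvalue(population, annotated, sample, hits):
    """Upper-tail probability P(X >= hits) when `sample` genes are drawn
    without replacement from `population` genes, of which `annotated`
    carry the annotation term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

def enrich(gene_list, term_to_genes, background):
    """Return (term, hits, p_value, matching_genes) rows sorted by ascending
    p-value. `term_to_genes` maps each term to a set of gene IDs and
    `background` is the set of all annotated genes (both hypothetical inputs)."""
    genes = set(gene_list) & background
    rows = []
    for term, members in term_to_genes.items():
        members = members & background
        matched = genes & members
        if matched:
            p = hypergeom_pvalue(len(background), len(members),
                                 len(genes), len(matched))
            rows.append((term, len(matched), p, sorted(matched)))
    rows.sort(key=lambda row: row[2])
    return rows
```

Each returned row carries the hit count and the matching gene IDs alongside the p-value, mirroring the expanded rows described for the results table.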

Dynamic visualization of KEGG pathway maps may be accessed from the results table for KEGG pathway enrichment by clicking on any pathway name. The proteins in the list that are members of the particular biological pathway will appear in red, while those proteins existing in Chlamydomonas reinhardtii but not in the list appear in green (Figure 3). Alternatively, by expanding the pathway results and following the link at the bottom, the user may select a custom color scheme for visualizing the proteins on pathway maps. These custom color schemes may be designed on a gene-by-gene basis (choosing colors individually for genes) or in a group-by-group fashion (such as choosing a color for those proteins found within the organism but not in the gene list).
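The coloring rule described above can be sketched in a few lines of Python. This is only an illustration of the logic: the function name, its parameters, and the protein IDs in the usage note are hypothetical, and the sketch does not call KEGG or reproduce how the tool renders its maps.

```python
def color_pathway(pathway_members, gene_list,
                  in_list="red", organism_only="green", per_gene=None):
    """Assign a display color to every protein in a pathway.

    Group-by-group: proteins found in the submitted list get `in_list`,
    and all other pathway members present in the organism get
    `organism_only`. Gene-by-gene: entries in `per_gene` override the
    group color for individual proteins that belong to the pathway.
    """
    selected = set(gene_list)
    colors = {}
    for protein in pathway_members:
        colors[protein] = in_list if protein in selected else organism_only
    for protein, color in (per_gene or {}).items():
        if protein in colors:  # ignore overrides for proteins outside the pathway
            colors[protein] = color
    return colors
```

For example, `color_pathway(["p1", "p2", "p3"], ["p1"], per_gene={"p2": "blue"})` marks `p1` red, `p2` blue, and `p3` green.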

A list of genes may also be converted into a list of gene identifiers of another type. This feature allows easy transformation of gene IDs into corresponding models for use in other databases that may have additional annotation information. Additionally, the resulting list of gene identifiers may be used as a new starting point for enrichment analysis. Because of the different annotations associated with other gene identifier types (albeit of the same proteins), enrichment results using a converted set of gene IDs may yield new biological information.
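Batch conversion of this kind amounts to a lookup against a correspondence table between identifier types. The sketch below assumes such a table is already available as a Python dictionary; the `mapping` structure, the function name, and the example IDs are hypothetical, and how the tool builds its own correspondences is outside the scope of the sketch.

```python
def convert_ids(gene_list, mapping):
    """Convert gene IDs to another identifier type.

    `mapping` maps each source ID to a list of corresponding target IDs
    (hypothetical, precomputed table). Returns the converted IDs in input
    order without duplicates, plus the source IDs that had no match.
    """
    converted, unmapped = [], []
    seen = set()
    for gene in gene_list:
        targets = mapping.get(gene)
        if not targets:
            unmapped.append(gene)
            continue
        for target in targets:
            if target not in seen:
                seen.add(target)
                converted.append(target)
    return converted, unmapped
```

The converted list can then be submitted as a new starting point for enrichment analysis, while the unmapped IDs show which genes have no counterpart among the target identifiers.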

The gene similarity search tool, the third component of the Algal Functional Annotation Tool, accepts single genes and returns functionally related genes (based on gene expression across different experimental conditions) using user-specified distance metrics and thresholds. Presently, functionally related genes may be determined using correlation distance based on absolute counts, log counts, or log ratios of expression. The results page shows the original query gene at the top in gray, with any resulting genes shown below it, sorted by similarity (Figure 4). A colormap based on gene expression is generated for the different genes across the conditions, and this colormap may be changed to display absolute expression, log expression, or log ratios of expression. The distance between any gene and the original query gene is displayed by hovering the mouse over the gene identifier of interest. Quantitative expression data (e.g. absolute counts) are provided for each experiment by hovering over the colormap. Whenever a description of a gene is available, this is displayed when hovering over the gene identifier as well. Links to external databases (e.g. JGI, KEGG) providing more information about the genes are provided with the results.
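A correlation-distance search of this kind can be sketched as follows. The sketch assumes that correlation distance means one minus the Pearson correlation, and it covers only absolute and log counts; log ratios are left out because the section does not state which reference they are taken against. The function names, the pseudocount, and the default threshold are illustrative assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return 0.0  # flat profile: treated as uncorrelated (assumption)
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)

def similar_genes(query, expression, threshold=0.2,
                  use_log=False, pseudocount=1.0):
    """Return (gene, distance) pairs within `threshold` of the query gene,
    sorted by ascending distance. `expression` maps gene IDs to lists of
    counts, one value per experimental condition."""
    def prepare(counts):
        if use_log:
            return [math.log2(c + pseudocount) for c in counts]
        return list(counts)

    query_profile = prepare(expression[query])
    hits = []
    for gene, counts in expression.items():
        if gene == query:
            continue
        distance = 1.0 - pearson(query_profile, prepare(counts))
        if distance <= threshold:
            hits.append((gene, distance))
    hits.sort(key=lambda hit: hit[1])
    return hits
```

Because the distance depends on the shape of the profile rather than its magnitude, a gene expressed at twice the level of the query under every condition has a distance of zero on absolute counts.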