FUNCTIONAL TERM ENRICHMENT TESTING

The hypergeometric distribution is commonly used to determine the significance of functional term enrichment within a list of genes. In this test, the occurrence of a functional term within a gene list is compared to its background occurrence across all genes in the genome to determine the degree of enrichment. A p-value based on this test can be calculated from four parameters: (1) the number of genes within the list, (2) the number of occurrences of the term within the gene list, (3) the total number of genes within the genome, and (4) the number of occurrences of the term across all genes in the genome. This test distinguishes truly overrepresented terms from those that appear often in the gene list only because they occur at a high frequency across all genes in the genome. The cumulative hypergeometric test assigns a p-value to each functional term associated with genes within a given list, and all functional terms are ranked by ascending p-value (i.e., by descending level of enrichment). Huang et al. review the use of the hypergeometric test for functional term enrichment [34]. The Algal Functional Annotation Tool computes hypergeometric p-values using a Perl wrapper for the cumulative hypergeometric function of the GNU Scientific Library, which is written in C, to provide a quick and accurate implementation of this statistical test.
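As an illustration of the test described above, the following is a minimal Python sketch of a cumulative hypergeometric p-value computed from the four parameters, followed by ranking of terms by ascending p-value. It is not the tool's Perl and GNU Scientific Library implementation. The function names, parameter names, and the layout of the input dictionary are hypothetical, and the sketch assumes the upper-tail form of the test, i.e. the probability of observing at least as many occurrences of the term as were found in the list.

```python
from math import comb


def enrichment_p_value(list_size, term_in_list, genome_size, term_in_genome):
    """Upper-tail cumulative hypergeometric p-value, P(X >= term_in_list).

    Assumption: enrichment is scored as the probability of drawing at least
    `term_in_list` annotated genes when `list_size` genes are sampled without
    replacement from a genome of `genome_size` genes, of which
    `term_in_genome` carry the term.
    """
    if not 0 <= list_size <= genome_size:
        raise ValueError("list_size must lie between 0 and genome_size")
    if not 0 <= term_in_genome <= genome_size:
        raise ValueError("term_in_genome must lie between 0 and genome_size")
    upper = min(list_size, term_in_genome)
    if not 0 <= term_in_list <= upper:
        raise ValueError("term_in_list is inconsistent with the other counts")

    # Exact integer arithmetic; math.comb returns 0 for impossible draws.
    total = comb(genome_size, list_size)
    tail = sum(
        comb(term_in_genome, k) * comb(genome_size - term_in_genome, list_size - k)
        for k in range(term_in_list, upper + 1)
    )
    return tail / total


def rank_terms(list_size, genome_size, term_counts):
    """Rank terms by ascending p-value (most enriched first).

    `term_counts` is a hypothetical mapping of
    term -> (occurrences in gene list, occurrences in genome).
    """
    scored = [
        (term, enrichment_p_value(list_size, in_list, genome_size, in_genome))
        for term, (in_list, in_genome) in term_counts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])
```

For example, `rank_terms(3, 10, {"term_a": (2, 4), "term_b": (1, 8)})` places `term_a` first: `term_b` appears in the list, but it is so common across the toy genome that its presence carries no evidence of enrichment. Exact integer binomials are adequate for a sketch; for genome-scale use, a numerical library routine would typically be faster.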