Classification Based on Sequence Features

For cellulases with well-characterized enzymatic properties, a classification scheme could be built around the reactivities described above. However, many cellulase sequences identified by DNA sequencing have not been characterized biochemically. To include these sequences in a classification scheme, and to represent the evolutionary relationships that connect them, a system that groups similar sequences into families was developed [26] and later refined [27, 28]. This information is available in a searchable database (CAZy, the carbohydrate-active enzymes database; http://www.cazy.org/) that includes structural, functional, and phylogenetic information [7]. At the time of this writing, CAZy subdivides the glycoside hydrolase sequences (EC 3.2.1.x) into 122 distinct sequence families, designated GH1-GH122, plus 956 additional nonclassified sequences.
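As a rough illustration of how family-level bookkeeping works once sequences carry a family assignment, the Python sketch below groups (sequence ID, family label) pairs by GH family and collects anything without a valid label under "nonclassified". It does not reproduce how CAZy assigns sequences to families (that relies on sequence similarity analysis); the sequence IDs, the function name `group_by_family`, and the upper bound of 122 families (taken from the count quoted above) are illustrative assumptions.

```python
import re
from collections import defaultdict

# Matches glycoside hydrolase family labels such as "GH5" or "GH122".
GH_LABEL = re.compile(r"^GH(\d+)$")


def group_by_family(records, max_family=122):
    """Group (sequence_id, family_label) pairs by GH family.

    Hypothetical helper: labels are normalized (e.g. "GH05" -> "GH5").
    Records with a missing label, or a label outside GH1..GH{max_family},
    are collected under "nonclassified".
    """
    groups = defaultdict(list)
    for seq_id, label in records:
        match = GH_LABEL.match(label or "")
        if match and 1 <= int(match.group(1)) <= max_family:
            groups[f"GH{int(match.group(1))}"].append(seq_id)
        else:
            groups["nonclassified"].append(seq_id)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical records; seqC has no family assignment.
    records = [("seqA", "GH7"), ("seqB", "GH5"), ("seqC", None), ("seqD", "GH7")]
    print(group_by_family(records))
```

With the sample records, seqA and seqD fall under GH7, seqB under GH5, and seqC under "nonclassified", mirroring the split between numbered families and nonclassified sequences described above.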

The GH families that are tagged with EC numbers relevant to cellulose degradation (EC 3.2.1.4, EC 3.2.1.21, and EC 3.2.1.91) are shown in Table 1. Although 21 families contain one or more members from at least one of these three EC groups, the bulk of the cellulases fall in GH families 5, 6, 7, 8, 9, 12, 44, and 45. GH families 5, 8, 9, 12, 44, and 45 are composed largely of endocellulases (to the exclusion of exocellulases), whereas GH families 6 and 7 include both endocellulases and exocellulases (Table 1); exocellulases in GH6 act on the nonreducing end, whereas exocellulases in GH7 act on the reducing end. There do not appear to be any GH families composed exclusively of exocellulases. Nearly all β-glucosidases are in GH families 1 and 3.
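The family-level trends in the paragraph above can be encoded as a small lookup table, which makes queries such as "which families contain reducing-end exocellulases?" explicit. This is a simplified sketch, not a copy of Table 1: each family is mapped only to the dominant activities named in the text, so qualifiers such as "largely" and "nearly all" are flattened, and the names `MAJOR_FAMILIES`, `families_with`, and `is_exclusively_exo` are hypothetical.

```python
# Activity categories named in the text.
ENDO = "endocellulase"
EXO_REDUCING = "exocellulase (reducing end)"
EXO_NONREDUCING = "exocellulase (nonreducing end)"
BETA_GLUCOSIDASE = "beta-glucosidase"

# Simplified summary of the text: dominant activities per major family only.
MAJOR_FAMILIES = {
    "GH1": {BETA_GLUCOSIDASE},
    "GH3": {BETA_GLUCOSIDASE},
    "GH5": {ENDO},
    "GH6": {ENDO, EXO_NONREDUCING},
    "GH7": {ENDO, EXO_REDUCING},
    "GH8": {ENDO},
    "GH9": {ENDO},
    "GH12": {ENDO},
    "GH44": {ENDO},
    "GH45": {ENDO},
}


def families_with(activity):
    """Return families listing `activity`, in numeric family order."""
    hits = [fam for fam, acts in MAJOR_FAMILIES.items() if activity in acts]
    return sorted(hits, key=lambda fam: int(fam[2:]))


def is_exclusively_exo(family):
    """True if every activity recorded for `family` is an exocellulase."""
    acts = MAJOR_FAMILIES.get(family, set())
    return bool(acts) and all(a.startswith("exocellulase") for a in acts)


if __name__ == "__main__":
    print(families_with(EXO_REDUCING))
    print(families_with(ENDO))
    print(any(is_exclusively_exo(fam) for fam in MAJOR_FAMILIES))
```

Querying `families_with(EXO_REDUCING)` returns only GH7, and no family in the table is exclusively exocellulolytic, consistent with the observation that no GH family consists solely of exocellulases.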