8.5 Analysis methods

The previous section gave a very basic introduction to molecular dynamics, but it covered only what is required to run a simulation and obtain the atomic positions as a function of time. The sobering fact is that even with computers many orders of magnitude faster than today’s machines, simulations will not reach biological time scales for any but the very fastest enzymes. There is therefore little to be gained scientifically from simply running an MD simulation and waiting to see the system react. A classical force field prevents covalent reactions from occurring, and even when a method that supports bond breaking is used, such as QM/MM, the accessible simulation time scales are too short for direct observation. It is thus necessary to draw on a range of techniques and analysis methods to uncover scientifically meaningful information. A great many analysis techniques can be used with MD methods, and new ones continue to be devised. Here we describe some of the more common and useful techniques for studying cellulose and cellulose hydrolysis; they represent a very small subset of what is possible.

Clustering. Clustering is a method of postprocessing the sampling from an MD simulation to identify and quantify the more highly populated states the system visits; it can also be used to reprocess a trajectory and map the essential transitions between those states. In particular, when multiple structural states are observed in an MD simulation, a cluster analysis establishes the identity of the states and their populations. Once the states are known, the sequence of transitions between them can be reconstructed from the same trajectory, revealing the nature and frequency of the transitions and the paths through intermediate states, if there are any. Cluster analysis can be applied to any trajectory, whether or not it was constrained or biased in any way, so care must be taken to interpret the results within the framework of any biasing potentials. In cellulose modeling, this kind of analysis is essential for characterizing crystalline and amorphous states and the transitions between them. For example, cluster analysis can be used to test the hypothesis, suggested by either experiment or simulation, that a system goes directly from state A to state B. The analysis may show instead that there is a highly populated state C through which the simulated system always passes, so that it never goes directly from A to B or from B to A.
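The A, B, C example can be illustrated with a minimal sketch. It assumes that each frame has already been reduced to a single, hypothetical order parameter, and it uses a simple leader-style clustering with a distance cutoff. The toy trajectory, the cutoff and the function names are illustrative assumptions only; a production analysis would cluster on a structural metric such as RMSD and use a more robust algorithm.

```python
from collections import Counter


def leader_cluster(frames, cutoff):
    """Assign each frame to the first cluster whose representative lies
    within `cutoff`; otherwise the frame founds a new cluster."""
    representatives = []
    labels = []
    for x in frames:
        for k, rep in enumerate(representatives):
            if abs(x - rep) <= cutoff:
                labels.append(k)
                break
        else:
            representatives.append(x)
            labels.append(len(representatives) - 1)
    return representatives, labels


def count_transitions(labels):
    """Count how often consecutive frames belong to different clusters."""
    return Counter((a, b) for a, b in zip(labels, labels[1:]) if a != b)


# Toy trajectory (invented data): one order-parameter value per frame.
traj = [0.0, 0.1, -0.1, 0.05, 1.0, 1.1, 0.9,
        2.0, 2.1, 1.9, 2.05, 1.0, 0.95, 0.0, 0.1]

representatives, labels = leader_cluster(traj, cutoff=0.3)
populations = Counter(labels)          # frames per state
transitions = count_transitions(labels)

for (a, b), n in sorted(transitions.items()):
    print(f"state {a} -> state {b}: {n} transition(s)")
```

In this toy data the clusters found near 0, 1 and 2 play the roles of states A, C and B. The transition counts contain A to C, C to B and the reverse steps, but no direct A to B entry, which is the kind of evidence that would contradict a direct-transition hypothesis. As noted above, populations obtained from a biased trajectory would still need to be interpreted in light of the biasing potential.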

Normal mode analysis. Normal mode analysis is a powerful tool for extracting the characteristic motions of a macromolecule or complex such as a cellulose fiber, a cellulose/lignin complex or a protein/cellulose system. With this method, the high-frequency modes can be separated out, allowing the low-frequency, larger-displacement motions to be identified. Often, those are the motions that define the behavior and the biological function of macromolecules. The method involves finding a structure of the complex that is at an energy minimum and then diagonalizing the mass-weighted Hessian matrix, from which the frequencies of motion and the normal mode vectors (eigenvectors) can be extracted. For larger molecules, however, the diagonalization of the matrix can quickly become too large a problem for most computers. Other methods treat larger molecules and systems of molecules by relying on simplifying assumptions. One approach is to minimize the structure to a minimum energy state using the all-atom model and then switch to a more approximate description before applying the normal mode method. Examples are the elastic network model, in which selected atoms such as alpha carbons are connected by a series of harmonic springs, and the Rotations Translations of Blocks (RTB) model, which makes the diagonalization approximate by combining multiple residues into rigid blocks (25). Quasiharmonic analysis is a variation of normal mode analysis in which the effective modes of vibration are calculated from the fluctuations observed in an MD simulation. Since these fluctuations contain anharmonic contributions, the quasiharmonic vibrational modes may differ from the normal modes calculated at the energy minimum. The normal mode method has also been extended to follow troughs in the potential energy surface and to give information about the essential modes at extrema other than minima, such as saddle points (transition states) and maxima.
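The core step of normal mode analysis, diagonalizing the mass-weighted Hessian, can be sketched on a toy system. The example below assumes a one-dimensional chain of three masses joined by two identical harmonic springs, so the Hessian is known analytically and no energy minimization is needed. The masses, the force constant and the small Jacobi eigensolver are illustrative assumptions; real systems would use a force field Hessian and a library eigensolver.

```python
import math


def jacobi_eigen(matrix, tol=1e-18, max_sweeps=50):
    """Diagonalize a real symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors), where eigenvectors[k] is the
    vector belonging to eigenvalues[k].
    """
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) element.
                theta = (math.pi / 4 if diff == 0.0
                         else 0.5 * math.atan(2.0 * a[p][q] / diff))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    eigenvalues = [a[i][i] for i in range(n)]
    eigenvectors = [[v[i][k] for i in range(n)] for k in range(n)]
    return eigenvalues, eigenvectors


# Toy system (assumed values, arbitrary units): masses m, M, m in a line.
k = 1.0
masses = [16.0, 12.0, 16.0]
hessian = [[k, -k, 0.0],
           [-k, 2.0 * k, -k],
           [0.0, -k, k]]

# Mass-weight the Hessian: H_ij / sqrt(m_i * m_j).
n = len(masses)
mw_hessian = [[hessian[i][j] / math.sqrt(masses[i] * masses[j])
               for j in range(n)] for i in range(n)]

evals, evecs = jacobi_eigen(mw_hessian)
modes = sorted(zip(evals, evecs))  # lowest frequency first

# Eigenvalues are squared angular frequencies; clamp tiny negative noise.
frequencies = [math.sqrt(max(lam, 0.0)) for lam, _ in modes]
# Convert mass-weighted eigenvectors back to Cartesian displacements.
displacements = [[q_i / math.sqrt(m) for q_i, m in zip(vec, masses)]
                 for _, vec in modes]
```

For this chain the lowest mode has zero frequency and corresponds to uniform translation, while the two remaining modes are the symmetric and asymmetric stretches. Sorting by frequency is what allows the low-frequency, large-displacement motions to be separated from the high-frequency ones in larger systems.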

Local water density. The local density of water is an important consideration in many biological models, especially where complex structures compartmentalize regions of water away from the bulk. Because these situations are complex, readers are encouraged to consult specific examples in the literature.
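One simple way to quantify a local water density, offered here only as a hedged sketch, is to count the water oxygens within a sphere around a site of interest, average over frames, and compare the result with the bulk value. The function name, the toy coordinates and the neglect of periodic boundary conditions are assumptions made for illustration; the bulk number density of about 0.0334 molecules per cubic angstrom is the approximate value for liquid water near ambient conditions.

```python
import math

# Approximate bulk number density of liquid water near ambient conditions.
BULK_WATER_DENSITY = 0.0334  # molecules per cubic angstrom


def local_water_density(frames, site, radius):
    """Average number density of water oxygens within `radius` of `site`.

    `frames` is a list of frames, each a list of (x, y, z) oxygen positions
    in angstroms. Periodic boundary conditions are ignored in this sketch.
    """
    volume = 4.0 / 3.0 * math.pi * radius ** 3
    total = 0
    for oxygens in frames:
        total += sum(1 for pos in oxygens if math.dist(pos, site) <= radius)
    return total / (len(frames) * volume)


# Toy data (invented): two frames of water-oxygen coordinates.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(0.0, 2.0, 0.0), (9.0, 9.0, 9.0)],
]
rho = local_water_density(frames, site=(0.0, 0.0, 0.0), radius=3.0)
relative_density = rho / BULK_WATER_DENSITY
```

A relative density well below one would indicate a region that is depleted of water compared with the bulk, and a value above one an enriched region. The choice of site and radius depends on the system, so the specific examples in the literature remain the better guide.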