Model Evaluation and Model Error

Statistical models are usually evaluated with reference to their accuracy, defined as the deviation of an estimated quantity from a true value. Accordingly, accuracy is a compound function of precision (the repeatability of estimates, i.e. their variation around their own mean) and bias (the directed deviation from a true value). A widely accepted measure of accuracy is the mean squared error (MSE), as used in Eqs. 3.9, 3.10, 3.11 and 3.12 (Hellman and Fowler 1999).

$$\mathrm{MSE}(x) = \mathrm{var}(x) + \mathrm{bias}(x)^2 \qquad (3.9)$$

$$\mathrm{MSE}(x) = \mathrm{var}(x) + (\bar{x} - \mu)^2 \qquad (3.10)$$

Here MSE(x) is the mean squared error of a model (accuracy), calculated according to Eq. 3.10, where

$x_i$ = model estimate of biomass

$\mu$ = true mean value

$n$ = sample size

and Eqs. 3.11 and 3.12 apply:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (3.11)$$

$$\mathrm{var}(x) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad (3.12)$$
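The decomposition of Eq. 3.9 can be checked numerically. The Python sketch below assumes that the variance uses a 1/n divisor, for which MSE = var + bias² holds exactly; the function name `mse_decomposition` and the biomass values are hypothetical and only illustrate the arithmetic.

```python
from statistics import fmean

def mse_decomposition(estimates, true_mean):
    """Return (mse, var, bias) for model estimates x_i against a true mean mu.

    var uses the 1/n divisor, so mse == var + bias**2 holds exactly.
    """
    n = len(estimates)
    x_bar = fmean(estimates)                                # sample mean of estimates
    var = sum((x - x_bar) ** 2 for x in estimates) / n      # spread around x_bar (precision)
    bias = x_bar - true_mean                                # directed deviation from mu
    mse = sum((x - true_mean) ** 2 for x in estimates) / n  # direct definition of MSE
    return mse, var, bias

# Hypothetical biomass estimates (t/ha) and a hypothetical true mean.
estimates = [102.0, 98.0, 110.0, 95.0, 105.0]
mse, var, bias = mse_decomposition(estimates, true_mean=100.0)
print(mse, var, bias, var + bias ** 2)
```

Here the estimates average 102 t/ha, so the bias is 2 t/ha; the MSE computed directly (31.6) equals the variance (27.6) plus the squared bias (4).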

In addition to model evaluation, the error propagation of the full upscaling process should be addressed to provide error budgets for the biomass estimation. Error budgeting is complicated by the fact that biomass modelling typically involves a combination of different sampling and modelling steps. Detailed background information on error budgeting in biomass estimation can be found in Cunia (1987, 1990), Wharton and Cunia (1987), Yang and Cunia (1989), and van Laar and Akça (2007). Cunia (1990, p. 169) points to three general sources of error: “There is first the sampling error: the same sampling procedure applied repeatedly to the same forest population leads generally to selection of different sample units and, thus, to different estimates. And then there is the measurement error when the same sample units (trees or plots) measured by different people lead to different recorded values and, thus, to different estimates. Finally, the third error component is that of the statistical model used in deriving estimates; same inventory data analyzed and interpreted by different statisticians may lead to different estimates”. Cunia (1990) adds the application error as a fourth source of error. It arises because biomass models are usually parameterised from data of a population different from the one in which they are applied for estimation. In practice, the sampling and measurement errors are usually assessed in combination (Cunia 1990).

An assumption often made in error budgeting of biomass models is that the models are unbiased. Strictly speaking, this is not always true, but it reduces the problem to the variance as the only source of error (Cunia 1990). The following section is based on this assumption as well. A frequent mistake is to exclude the upscaling from the sample to the tree from the error budget. This is only justified if the trees were fully harvested and their dry weight was determined without harvesting losses and without any sampling or ratio-modelling steps, which is rarely the case in bioenergy studies. In all other cases, error budgets have to be determined for both upscaling steps (‘sample to tree’ and ‘tree to stand’).

Error propagation equations can be found, for example, in Ku (1966) or Bevington and Robinson (1992). Van Laar and Akça (2007, p. 264) also provide variance functions for different forms of equations and identify the additive and the multiplicative combination as the forms most frequently used in biomass upscaling (Eqs. 3.13 and 3.14).

$$z = x + y: \qquad s_z^2 = s_x^2 + s_y^2 + 2\,s_{xy} \qquad (3.13)$$

$$z = x \cdot y: \qquad s_z^2 = y^2 s_x^2 + x^2 s_y^2 + 2\,x\,y\,s_{xy} \qquad (3.14)$$

Here $s_{xy}$ is the covariance of $x$ and $y$, and $s_x^2$ and $s_y^2$ are their variances.

Equation 3.13 is used in biomass estimation when, for example, crown and stem biomass are added, while Eq. 3.14 is applied in allometric biomass models or when plot biomass estimates are multiplied by plot areas (van Laar and Akça 2007).
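The two combination rules can be written as small functions. The Python sketch below is illustrative only: the function names and all numbers are hypothetical, the errors are assumed independent (covariance zero), and Eq. 3.14 is treated as the usual first-order approximation for a product.

```python
import math

def var_sum(s2_x, s2_y, s_xy=0.0):
    """Variance of z = x + y (Eq. 3.13); s_xy is the covariance of x and y."""
    return s2_x + s2_y + 2.0 * s_xy

def var_product(x, y, s2_x, s2_y, s_xy=0.0):
    """First-order variance of z = x * y (Eq. 3.14)."""
    return y**2 * s2_x + x**2 * s2_y + 2.0 * x * y * s_xy

# Hypothetical example 1: tree biomass = stem + crown (kg), independent errors.
stem, s_stem = 150.0, 12.0
crown, s_crown = 40.0, 6.0
s_tree = math.sqrt(var_sum(s_stem**2, s_crown**2))

# Hypothetical example 2: stand total = biomass per ha (t/ha) * area (ha), independent errors.
b_ha, s_b_ha = 120.0, 10.0
area, s_area = 25.0, 0.5
s_total = math.sqrt(var_product(b_ha, area, s_b_ha**2, s_area**2))

print(f"tree:  {stem + crown:.0f} kg +/- {s_tree:.1f} kg")
print(f"stand: {b_ha * area:.0f} t +/- {s_total:.1f} t")
```

If the components are correlated, passing a non-zero `s_xy` adds the covariance term; a positive covariance increases the combined variance in both rules (for positive x and y in the product rule).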

Equation 3.14 can also be rewritten in terms of relative errors (Chave et al. 2004), as indicated in Eq. 3.15 for a general function $z = f(x, y)$. The last (covariance) term may be omitted if $x$ and $y$ are independent (van Laar and Akça 2007, p. 264).

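$$\frac{s_z^2}{z^2} = \frac{s_x^2}{x^2}\left(\frac{\partial \ln f}{\partial \ln x}\right)^2 + \frac{s_y^2}{y^2}\left(\frac{\partial \ln f}{\partial \ln y}\right)^2 + 2\,\frac{s_{xy}}{x\,y}\,\frac{\partial \ln f}{\partial \ln x}\,\frac{\partial \ln f}{\partial \ln y} \qquad (3.15)$$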

The term $\partial \ln(f)/\partial \ln(x)$ is the partial derivative of $\ln(f)$ with respect to $\ln(x)$; it enters the expression through a first-order Taylor series approximation of $f$ (Chave et al. 2004). The error $s_{agb}^2$ for a typical allometric model that predicts aboveground tree biomass from diameter at breast height and tree height, $f(D, H) = a D^{\alpha} H^{\beta}$, is calculated as indicated in Eq. 3.16; here it is noted without the partial derivatives to simplify the notation (see Chave et al. 2004).
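For the power-law model the log-derivatives are constant: ∂ln f/∂ln D = α and ∂ln f/∂ln H = β, so Eq. 3.15 reduces to (s_z/z)² = α²(s_D/D)² + β²(s_H/H)² + 2αβ·s_DH/(D·H). The Python sketch below evaluates Eq. 3.15 with numerically estimated log-derivatives, which also works for models where they are not constant. The coefficients, tree dimensions and function names are hypothetical, and the sketch covers only measurement error in D and H, not uncertainty in the model parameters a, α and β.

```python
import math

def allometric_biomass(D, H, a, alpha, beta):
    """Power-law model f(D, H) = a * D**alpha * H**beta."""
    return a * D**alpha * H**beta

def log_derivative(func, x, step=1e-6):
    """Central-difference estimate of d ln(func) / d ln(x) at x."""
    lx = math.log(x)
    up = math.log(func(math.exp(lx + step)))
    down = math.log(func(math.exp(lx - step)))
    return (up - down) / (2.0 * step)

def relative_variance(D, H, s2_D, s2_H, s_DH, a, alpha, beta):
    """Squared relative error (s_z / z)**2 from Eq. 3.15 with x = D and y = H."""
    g_D = log_derivative(lambda d: allometric_biomass(d, H, a, alpha, beta), D)
    g_H = log_derivative(lambda ht: allometric_biomass(D, ht, a, alpha, beta), H)
    return (s2_D / D**2 * g_D**2
            + s2_H / H**2 * g_H**2
            + 2.0 * s_DH / (D * H) * g_D * g_H)

# Hypothetical tree: D = 30 cm (s_D = 0.5 cm), H = 25 m (s_H = 2 m), independent errors.
a, alpha, beta = 0.05, 2.0, 1.0  # hypothetical coefficients, not taken from the source
rel_var = relative_variance(30.0, 25.0, 0.5**2, 2.0**2, 0.0, a, alpha, beta)
agb = allometric_biomass(30.0, 25.0, a, alpha, beta)
print(f"AGB = {agb:.1f}, relative error = {100 * math.sqrt(rel_var):.1f}%")
```

With these made-up values the height term dominates: a 2 m error on a 25 m tree contributes more to the relative error than a 0.5 cm diameter error, even though D enters with the larger exponent.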

Finally, the errors of the different upscaling steps have to be combined. This yields a series of variance terms that are combined according to the additive and multiplicative rules of Eqs. 3.13 and 3.14. It is essential that all upscaling steps involved are taken into account. For this example, the combination of errors involves the upscaling from the sample to the tree and the upscaling from the tree to the stand. The latter is essentially the standard procedure for error quantification in volume-based forest inventories and is expressed by Eq. 3.16. The major error sources and their combination rules are illustrated in Fig. 3.7.

The final combination of the upscaling steps follows from the fact that biomass estimation should be viewed as a two-stage sampling process (Cunia 1990). Further upscaling steps might be added, for example from stands to strata, following the same combination rules and including forest area information such as that obtained from remote sensing (Chap. 2). It should be emphasised once again that it is critical to include the upscaling from the samples to the tree in this context. Ignoring this upscaling step, as is frequently done in the literature, is only warranted if the complete dry mass of the tree was measured, as noted above. A good example supporting this statement is the regression of foliage dry mass on branch diameter, an essential part of the upscaling. Its coefficient of determination of only R² = 0.68 (Fig. 3.5) indicates a considerable error potential, and this regression is only one of several error sources in the first upscaling step. Ignoring the first upscaling step would therefore lead to a gross underestimation of the error.
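The effect of dropping the first upscaling step can be illustrated with a short calculation. The Python sketch below assumes that the sample-to-tree and tree-to-stand errors are independent and are both already expressed as variances of the stand mean, so Eq. 3.13 applies with zero covariance; a formal two-stage estimator would weight the components according to the sampling design. All numbers and function names are hypothetical.

```python
import math

def var_sum_independent(*variances):
    """Eq. 3.13 with all covariances set to zero: independent variances add."""
    return sum(variances)

def var_product(x, y, s2_x, s2_y, s_xy=0.0):
    """First-order variance of z = x * y (Eq. 3.14)."""
    return y**2 * s2_x + x**2 * s2_y + 2.0 * x * y * s_xy

# Hypothetical stand-level inputs (not taken from the source).
mean_b = 120.0               # estimated mean biomass, t/ha
var_stage1 = 18.0            # sample-to-tree error, aggregated to the stand mean, (t/ha)^2
var_stage2 = 25.0            # tree-to-stand sampling error of the mean, (t/ha)^2
area, var_area = 25.0, 0.25  # stand area (ha) and its variance, e.g. from remote sensing

var_full = var_sum_independent(var_stage1, var_stage2)
var_naive = var_stage2       # first upscaling step ignored

for label, v in (("both steps", var_full), ("tree to stand only", var_naive)):
    s = math.sqrt(v)
    print(f"{label:>18}: s = {s:.2f} t/ha ({100 * s / mean_b:.1f}% of the mean)")

# Further upscaling to the stand total by multiplying with the area (Eq. 3.14).
var_total = var_product(mean_b, area, var_full, var_area)
print(f"stand total: {mean_b * area:.0f} t +/- {math.sqrt(var_total):.0f} t")
```

In this made-up case, omitting the sample-to-tree component lowers the reported standard error of the mean from about 6.6 to 5.0 t/ha, i.e. the error budget looks roughly a quarter smaller than it actually is.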