Modelling Example

In this section an example of tree biomass modelling is presented, introducing several key techniques. All models were parameterised using the freely available statistical software package R (R Development Core Team 2012).

The starting point for this modelling section is a set of data obtained from 99 Eucalyptus grandis trees from the Karkloof experiment at different ages, ranging from 0 to 11 years. The data are described in du Toit (2008). The biomass has been scaled up to tree level and separated into four biomass fractions to be modelled (stemwood, stem bark, branches including bark, and foliage). The objective is to create models for the different biomass fractions as well as for the total biomass under the constraint of additivity.

Fig. 3.6 The effect of back-transformation bias correction on the estimation of total biomass. After correction a nearly unbiased model was achieved (full line), while the uncorrected model showed a clear bias (dotted line)

A method for forcing additivity is introduced, based on a separate estimation of the total aboveground biomass of the tree and a compositional estimation of the proportions of the biomass fractions. The total biomass was estimated with a traditional ln-transformed linear model to account for heteroskedasticity. DBH, height and the compound variable D²H were tested as independent variables. The best model was selected based on the Akaike Information Criterion (AIC), which has its theoretical foundations in information theory (Burnham and Anderson 2004). The AIC penalises the inclusion of additional variables and thus helps to keep the models parsimonious. The best model fit in this case was achieved by Eq. 3.6.
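The AIC-based selection among candidate predictors can be sketched as follows. This is a minimal illustration in Python, not the R code used for the chapter: the helper names `ols` and `aic` are hypothetical, the data are invented rather than taken from the Karkloof experiment, and the AIC is computed for a Gaussian linear model with additive constants dropped, so only differences between models are meaningful.

```python
import math

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [u - f * v for u, v in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

def aic(X, y):
    """AIC of a Gaussian linear model: n*ln(RSS/n) + 2k (constants dropped)."""
    n = len(y)
    beta = ols(X, y)
    rss = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    k = len(beta) + 1  # regression coefficients plus the residual variance
    return n * math.log(rss / n) + 2 * k

# Invented illustration data (NOT the Karkloof measurements).
dbh = [5.0, 8.0, 11.0, 14.0, 17.0, 20.0, 23.0, 26.0]       # cm
height = [7.0, 12.0, 13.0, 19.0, 20.0, 26.0, 25.0, 31.0]   # m
noise = [0.05, -0.08, 0.03, 0.06, -0.04, -0.02, 0.07, -0.07]
ln_abm = [-3.0 + 0.9 * math.log(d * d * h) + 0.3 * math.log(h) + e
          for d, h, e in zip(dbh, height, noise)]

# Candidate design matrices, each with an intercept column.
candidates = {
    "ln(D)": [[1.0, math.log(d)] for d in dbh],
    "ln(D2H)": [[1.0, math.log(d * d * h)] for d, h in zip(dbh, height)],
    "ln(D2H) + ln(H)": [[1.0, math.log(d * d * h), math.log(h)]
                        for d, h in zip(dbh, height)],
}
scores = {name: aic(X, ln_abm) for name, X in candidates.items()}
best = min(scores, key=scores.get)  # lowest AIC is preferred
print(scores, best)
```

Because every additional coefficient adds 2 to the score, a richer model is preferred only if it reduces the residual sum of squares enough to offset that penalty.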

$$\ln(\mathrm{ABM}_{\mathrm{total}}) = a + b \ln(D^2 H) + c \ln(H) \tag{3.6}$$

Here ABMtotal is the total aboveground biomass (kg) of the tree, D is the DBH (cm) and H is the tree height (m). For back-transformation bias correction, the variance of the residuals σ² was calculated, multiplied by 0.5 and added to the right-hand side of Eq. 3.6 before exponentiation (Eq. 3.7), as proposed by Baskerville (1972).

$$\mathrm{ABM}_{\mathrm{total}} = e^{\,a + b \ln(D^2 H) + c \ln(H) + 0.5\,\sigma^2} \tag{3.7}$$

An almost unbiased model was achieved (Fig. 3.6).
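The effect of the correction term in Eq. 3.7 can be illustrated with a short Python sketch. The coefficients and residuals below are invented for illustration and are not the fitted values of the Karkloof model; the function names are hypothetical, and the use of n minus the number of parameters as the divisor for the residual variance is an assumption, since the text only speaks of the variance of the residuals.

```python
import math

def predict_total_biomass(d, h, a, b, c, sigma2=0.0):
    """Back-transform Eq. 3.6; a positive sigma2 adds the Eq. 3.7 correction."""
    ln_abm = a + b * math.log(d * d * h) + c * math.log(h)
    return math.exp(ln_abm + 0.5 * sigma2)

def residual_variance(residuals, n_params):
    """Residual variance on the ln scale (assumed divisor: n - n_params)."""
    return sum(r * r for r in residuals) / (len(residuals) - n_params)

# Hypothetical coefficients and ln-scale residuals, for illustration only.
a, b, c = -3.0, 0.9, 0.3
residuals = [0.05, -0.08, 0.03, 0.06, -0.04, -0.02, 0.07, -0.07]

sigma2 = residual_variance(residuals, n_params=3)
uncorrected = predict_total_biomass(15.0, 20.0, a, b, c)
corrected = predict_total_biomass(15.0, 20.0, a, b, c, sigma2)
print(uncorrected, corrected, corrected / uncorrected)
```

The corrected prediction exceeds the uncorrected one by the constant factor exp(0.5 σ²), which is why the uncorrected model underestimates biomass systematically on the original scale.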

The next step was to model the proportional distribution of the biomass fractions in a simultaneous approach based on the Aitchison simplex and an isometric log-ratio (ILR) transformed model (Aitchison 1982; van den Boogaart and Tolosana-Delgado 2008). The foundations for this approach were laid down by Aitchison (1982, 1986), and the method has been applied successfully to the estimation of compositions in the geosciences (van den Boogaart and Tolosana-Delgado 2008). The data are modelled as a closed composition with a relative geometry. This means that the individual components are forced to add up to 1 and that their relative proportions are of interest rather than their absolute values. Similar to the logarithmic transformation, the ILR transformation takes care of the effect of heteroskedasticity. In simple terms, the relative proportions of the composition are transformed into a Euclidean orthogonal coordinate system. The dimensionality (D) is reduced by one in the process, because the components have to add up to 1. This means that the last component is not predicted directly in the Euclidean space but is derived automatically in the back-transformation step. Classical multivariate analysis is applied in the Euclidean space, and the results are then back-transformed into the original coordinates to provide meaningful composition parts.

The ILR-transformation is based on a centred log-ratio (CLR) transformation followed by a multiplication with a triangular Helmert matrix (van den Boogaart and Tolosana-Delgado 2008). The matrix multiplication performs the dimensional reduction from D to D-1. The triangular (D, D-1)-Helmert matrix can be derived from a normalised Helmert contrast matrix as shown in van den Boogaart et al. (2013). Further details on the theoretical framework are provided by Aitchison et al. (2002) and van den Boogaart and Tolosana-Delgado (2008). One convenient side-effect of this method is that the solution in the transformed space can be found by applying well-known linear statistics. The “compositions” package of R was used (van den Boogaart and Tolosana-Delgado 2007; R Development Core Team 2012). The code for modelling compositions in R is available from the web page of this book.[2]
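The two steps described above (CLR-transformation, then multiplication with a (D, D-1)-Helmert matrix) and the back-transformation can be sketched in plain Python. This is an illustrative re-implementation, not the code of the “compositions” package: the function names are hypothetical, the biomass shares are invented, and the sign and ordering of the basis columns are an assumption that may differ from the convention used in the package.

```python
import math

def closure(x):
    """Rescale the parts so that they add up to 1."""
    total = sum(x)
    return [xi / total for xi in x]

def helmert_basis(D):
    """D x (D-1) matrix with orthonormal columns, each summing to zero."""
    V = [[0.0] * (D - 1) for _ in range(D)]
    for j in range(1, D):
        norm = math.sqrt(j * (j + 1))
        for i in range(j):
            V[i][j - 1] = 1.0 / norm
        V[j][j - 1] = -j / norm
    return V

def ilr(x):
    """CLR-transform the composition, then project onto the Helmert basis."""
    x = closure(x)
    logs = [math.log(xi) for xi in x]
    mean_log = sum(logs) / len(logs)
    clr = [value - mean_log for value in logs]
    V = helmert_basis(len(x))
    return [sum(clr[i] * V[i][j] for i in range(len(x)))
            for j in range(len(x) - 1)]

def ilr_inv(z):
    """Map D-1 ILR coordinates back to a D-part composition summing to 1."""
    D = len(z) + 1
    V = helmert_basis(D)
    clr = [sum(z[j] * V[i][j] for j in range(D - 1)) for i in range(D)]
    return closure([math.exp(c) for c in clr])

# Invented shares: stemwood, stem bark, branches, foliage.
shares = [0.60, 0.10, 0.20, 0.10]
z = ilr(shares)      # three unconstrained coordinates
back = ilr_inv(z)    # four parts again, adding up to 1
print(z, back)
```

Four parts yield three coordinates, and the fourth part reappears only in the back-transformation, which mirrors the reduction from D to D-1 described in the text.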

After ILR-transformation of the data, combinations of the following independent variables were tested: DBH, H, D²H and ABMtotal (see Eq. 3.8).

The best model in the transformed space was achieved by a linear combination of DBH, height and total aboveground biomass (Eq. 3.8).

$$\mathrm{ilr}(\mathrm{composition}_{\mathrm{ABM}}) = b\,\mathrm{DBH} + c\,H + d\,\mathrm{ABM}_{\mathrm{total}} \tag{3.8}$$

The regression parameter output is provided in Table 3.2.

Table 3.2 Regression output for the fit of Eq. 3.8 for the Eucalypt data set in R

              DF   Pillai    Approx. F   num Df   den Df   Pr(>F)
(Intercept)    1   0.98300    1,792.48        3       93   <2.2e-16 ***
DBH            1   0.13955        5.03        3       93   0.002835 **
H              1   0.65681       59.33        3       93   <2.2e-16 ***
Total          1   0.33864       15.87        3       93   2.059e-08 ***
Residuals     95

** significant at the 0.01 level, *** significant at the 0.001 level

These are the parameters in the ILR-transformed space. Back-transformation is conveniently available in the R package compositions. The model can be evaluated separately for each of its two parts and as a whole, where the predicted compositions are multiplied by the estimated total biomass.
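The final combination step can be sketched as follows, assuming a predicted total biomass and a back-transformed composition are already available. The values and the function name are hypothetical; the point is only to show why the fraction estimates are additive by construction.

```python
def fraction_biomass(total_kg, shares):
    """Distribute the total biomass over the fractions in proportion to shares."""
    names = ("stemwood", "stem bark", "branches", "foliage")
    share_sum = sum(shares)  # re-close the composition to guard against rounding
    return {name: total_kg * share / share_sum
            for name, share in zip(names, shares)}

# Hypothetical predictions for a single tree, for illustration only.
total_kg = 120.0
shares = [0.60, 0.10, 0.20, 0.10]

fractions = fraction_biomass(total_kg, shares)
print(fractions, sum(fractions.values()))
```

Since the shares add up to 1, the fraction estimates always add up to the estimated total, so no additional constraint is needed to enforce additivity.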