

ELSEVIER — Int. J. Production Economics 46-47 (1996) 373-385

International Journal of Production Economics

The marking for bucking under uncertainty in automatic harvesting of forests

Erkki P. Liski*, Tapio Nummi

University of Tampere, P.O. Box 607, SF-33101 Tampere, Finland

Abstract

The marking for bucking is the problem of converting a single tree stem into logs in such a way that the total stem value according to a given price list for logs is maximized. Proper marking for bucking is of crucial importance in the harvesting of forests. The annual production of saw timber in Scandinavia is tens of millions of m³. Since, furthermore, sawn timber is by far the most valuable forest product in Scandinavia, any improvement in the marking for bucking procedure will yield a large profit. To solve the marking for bucking problem optimally one has to know the whole tree stem. However, it is not economically feasible to run the whole stem through the measuring device before cutting. Therefore, it is a normal situation under computer-based marking for bucking that the first cutting decisions are made before the whole stem is known.

In this paper the forest harvesting process will be considered under a general growth curve model useful especially for repeated-measures data. The main objective is to predict the unknown portion y_{n(2)} of the current stem, which will be estimated from the stem data y_1, y_2, …, y_{n−1} on the previously processed n − 1 stems and from the known diameter values y_{n(1)} on the current stem. Then a predictor of y_{n(2)}, say ŷ_{n(2)}, jointly with the known part y_{n(1)} is used in marking for bucking. It turns out that this technique can improve radically the efficiency of harvesting, so that our results provide important knowledge for developing automatic bucking systems of modern harvesters.

Keywords: EM algorithm; Maximum likelihood; Parsimonious covariance structures; Root mean square error

1. Introduction

* Corresponding author.

We introduce now the prediction problem in the context of automatic harvesting of forests. Let y_{ij} denote a diameter measurement taken on the ith stem at the point t_j, where t_j is the distance from the lower part of the trunk. Suppose that we have previously recorded data on n − 1 stems and measurements y_{n1}, y_{n2}, …, y_{n,q−r} on the nth stem. Prediction is now required for the diameter values of the nth stem at the stem points t_{q−r+1}, t_{q−r+2}, …, t_q. Predicting the future diameter values at these points yields an estimate for the stem curve d(t), which describes the decrease in stem diameter with the increasing height of the stem.

Table 1 displays the situation where data on n − 1 previous stems are recorded and measurements y_{n1}, y_{n2}, …, y_{n,q−r} on the current stem are

0925-5273/96/$15.00 © Copyright 1996 Elsevier Science B.V. All rights reserved. SSDI 0925-5273(95)00085-2


Table 1
Diameter measurements of tree stems

Stems      Distance from the thick end of a stem
           t_1        t_2        …  t_{q−r}      t_{q−r+1}      …  t_q

Past
1          y_{11}     y_{12}     …  y_{1,q−r}    y_{1,q−r+1}    …  y_{1q}
2          y_{21}     y_{22}     …  y_{2,q−r}    y_{2,q−r+1}    …  y_{2q}
⋮
n − 1      y_{n−1,1}  y_{n−1,2}  …  y_{n−1,q−r}  y_{n−1,q−r+1}  …  y_{n−1,q}

Current    y_{n1}     y_{n2}     …  y_{n,q−r}    y_{n,q−r+1}    …  y_{nq}

observed. Prediction is required for future observations on the current stem at the points t_{q−r+1}, t_{q−r+2}, …, t_q. Let us denote the measurements to be predicted as y′_{n(2)} = (y_{n,q−r+1}, y_{n,q−r+2}, …, y_{nq}).

According to prevailing harvesting technology in Scandinavia, the tree stems are cut into smaller logs on the spot. High-class equipment for measuring tree stems in harvesters has recently been developed, making computer-based marking for bucking possible. In modern computerized harvesters tree stems are run in sequence through the measuring and cutting devices root end first, and at the same time the stem measurements are stored in the computer (see Fig. 1). Since the cutting device is also under computer control, automatic marking for bucking and cutting of stems into logs is technically possible. Now t_j denotes the distance from the thick end of the stem to the current position of the measuring equipment. We may therefore assume that complete information is available about the part from the stump up to the position the measuring equipment has reached. Thus, if the measuring equipment is at the point t_j, then the cutting device is at the point t_j − c.

By log specification rules the log length must lie in a closed interval [l_min, l_max] and the top diameter must exceed a given limit d_0, where 0 < l_min ≤ l_max and d_0 > 0. Usually the acceptable log length l ∈ [l_min, l_max] must also be a multiple of a certain minimum increment Δ (typical values for spruce, for example, would be l_min = 30 dm, l_max = 60 dm, d_0 = 16 cm and Δ = 3 dm). Note that the stem curve d(t) is a non-increasing function. Therefore, the maximum number of logs from a tree is

Fig. 1. Marking for bucking under uncertainty (stem curve d(t), the values to be predicted, and the position of the cutting equipment along the stem height).

N = ⌊t_d/l_min⌋, where t_d is the greatest t such that d(t) ≥ d_0. Clearly, the maximum length of the known part before the first cutting decision will be c + l_min. In our numerical experiments the value of c is 6 dm.
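The log specification rules can be collected into a single admissibility test. A minimal sketch of ours (the function name and the default values, taken from the spruce figures quoted above, are illustrative, not part of the paper):

```python
def admissible(length, top_diameter, l_min=30, l_max=60, d0=16, delta=3):
    """Check a candidate log against the log specification rules:
    the length (dm) must lie in [l_min, l_max] and be a multiple of the
    increment delta; the top (small-end) diameter (cm) must exceed d0."""
    return (l_min <= length <= l_max
            and length % delta == 0
            and top_diameter > d0)
```

With the spruce values above, a 45 dm log with an 18 cm top diameter passes, while a 44 dm log fails the increment rule.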

The aim of optimal bucking of tree stems into sawlogs is to maximize the price of a tree stem, which is the sum of the single log prices. The price of a log is a function of the log length and the small-end diameter. In practice the price is given as a price table. There are, of course, other ways to quantify the contents of logs. Direct volumetric measurement expresses log contents as the volume of the maximal cylinder that can be put into the log. In general, we wish to maximize a non-negative bounded utility function

    H(x_0, x_1, …, x_N) = Σ_{k=1}^{N} h(x_k − x_{k−1}, d(x_k))

under the constraints of the log specifications. The function h(x_k − x_{k−1}, d(x_k)) can be considered as the price of the kth log from a given stem, where x_k is the kth cutting point of a stem and x_k − x_{k−1} is the length of the kth log. The maximization of the stem price H(x_0, x_1, …, x_N) can be solved by dynamic programming, for example (see [1]). In practical situations the maximization depends on many side conditions. The actual price list for logs does not usually produce the length and diameter combinations demanded by the local sawmill. Further, the operator should steer the bucking according to grade limits, crooked and damaged sections, etc.
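As an illustration of the dynamic-programming approach, the recursion over candidate cutting points can be sketched as follows. This is a simplified sketch of ours, not the authors' implementation: it assumes the stem is discretized at multiples of the increment Δ, log lengths are counted in the same units, and the price function is supplied by the caller.

```python
def optimal_bucking(d, l_min, l_max, d0, price):
    """Maximize H(x_0, ..., x_N) = sum_k h(x_k - x_{k-1}, d(x_k)).
    d[i]         -- stem diameter at distance i * Delta from the butt
    l_min, l_max -- admissible log lengths, in units of Delta
    d0           -- minimum small-end diameter
    price        -- function (length, small_end_diameter) -> log value
    Returns the maximum total value over all admissible cutting patterns."""
    neg = float("-inf")
    value = [neg] * len(d)   # value[i]: best total value if a log ends at i
    value[0] = 0.0           # the butt end is the zeroth cutting point x_0
    for i in range(1, len(d)):
        if d[i] < d0:        # small-end diameter below the log limit
            continue
        for l in range(l_min, l_max + 1):
            j = i - l        # previous cutting point for a log of length l
            if j >= 0 and value[j] > neg:
                value[i] = max(value[i], value[j] + price(l, d[i]))
    return max(v for v in value if v > neg)
```

For a toy stem d = [20, 19, 18, 17, 16, 9] with lengths 2-3, d_0 = 10 and price l·d, the best pattern is two logs of length 2 ending at indices 2 and 4, worth 36 + 32 = 68.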


To solve the maximization problem under incomplete stem information, we need a prediction for the unknown part of the stem under process. The prediction is based on the known part of the stem, and on the knowledge of trees already processed. The maximum length of the known part before the first cutting decision will be c + l_min.

After the first cut, the measuring equipment can be moved up the stem to the point whose distance from the first cutting point is again c + l_min. If x_1 is the first cutting point, then in the next cutting we can put t_1 = x_1, t_2 = x_1 + Δ, t_3 = x_1 + 2Δ, etc.

Thus from the statistical point of view every cutting is the first cutting, since the measurements before the cutting are ignored. The stem curve estimate is determined utilizing these new measurements, and the next cutting point is decided on the basis of this new estimate.

The problem of predicting future observations on a statistical unit given past measurements on the same and other similar units is often encountered in practical applications. In most typical applications t_1, t_2, …, t_q are equally spaced time points and y_{ij} is the size of the ith statistical unit at the point t_j, for example, the weight of an animal or the size of a firm at a given time. Past measurements at time points t_1, t_2, …, t_q are available on, say, n_1 units, and for n_2 units one or more measurements are to be simultaneously predicted (n = n_1 + n_2). Such prediction problems are typical in various fields of statistical application such as biometrics, econometrics, medicine, agriculture and engineering. In beef-cattle raising, for instance, decisions on the economically best possible culling age of an animal (or a set of animals) can be based on growth curve predictions. Although in the present harvesting application n_1 = n − 1 and n_2 = 1, our method of prediction can be straightforwardly extended to the case when n_2 is greater than 1.

2. The prediction model

Assume that

    y_{ij} = d(t_j) + e_{ij},

where E(e_{ij}) = 0 and the mean stem curve d(t) is a smooth decreasing function of t. There have been

many attempts to model the stem curves in forest sciences (cf. e.g. [2]). The stem curve models are often formulated for relative diameters and distances to eliminate the absolute variation caused by differences in tree size. However, models based on relative measurements cannot be directly adapted to the tree harvesting situation. The implementation should also be simple and fast, because it is not considered economically feasible to stop the feeding apparatus while carrying out the computations. In this paper we consider only polynomial models, which are simple and describe stem curves adequately for the tree harvesting application. Then for the ith stem at the height point t_j

    E(y_{ij}) = β_0 + β_1 t_j + ⋯ + β_{p−1} t_j^{p−1}.
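When only the first-degree polynomial is used, its two coefficients can be estimated by ordinary least squares from the diameters recorded so far. A self-contained sketch of ours on a single stem, with synthetic numbers rather than data from the paper:

```python
def fit_line(t, y):
    """Ordinary least-squares fit of the first-degree model y = b0 + b1 * t."""
    n = len(t)
    t_bar, y_bar = sum(t) / n, sum(y) / n
    b1 = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
          / sum((ti - t_bar) ** 2 for ti in t))
    return y_bar - b1 * t_bar, b1   # intercept b0, slope b1
```

A stem tapering linearly from 250 mm by 1.5 mm per 30 cm step is recovered exactly: intercept 250, slope −0.05 mm/cm.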

We consider prediction under the general framework of the Potthoff and Roy model [3]:

    Y = XBT′ + E,    (2.1)

where the rows of Y are assumed to be independent and normally distributed with covariance matrix Σ, X is an n × m design matrix, B is an m × p parameter matrix and T is a q × p regression matrix. We assume that both X and T are of full column rank. If y′_i and x′_i denote the ith rows of Y and X, respectively, then Y′ = (y_1, y_2, …, y_n) and X′ = (x_1, x_2, …, x_n). In fact, the observation matrix Y is given in Table 1. Let y_n be so partitioned that y′_n = (y′_{n(1)}, y′_{n(2)}), where y_{n(1)} is the observed part of y_n and y_{n(2)} is the part to be predicted. We can think that Y consists of the measurements obtained in a situation like that in Fig. 2. The data set consists of a sample of observed

Fig. 2. A sample of 40 stem curves for spruce (in Finland); stem diameter (mm) plotted against stem height (cm).


stem curves of spruces, which were processed by a forest harvester in Finland. Predicting diameter values of a stem is considered here as a missing data problem. The estimates of the 'missing' values y_{n,q−r+1}, y_{n,q−r+2}, …, y_{nq} yield a prediction for d(t) at the points t_{q−r+1}, t_{q−r+2}, …, t_q.

In fact, we are interested in predicting the values y_{n,q−r+j}, j = 1, 2, …, up to the point t* where the diameter values go under the log limit d_0 (d_0 = 16 cm, for example). Therefore we should choose the point t_q so far from the root end that stem diameters at t_q are below d_0. On the other hand, t_q should not exceed the length of any tree. If necessary, trees are classified into different size classes according to breast height diameter, for example. Then parameters are estimated for different classes separately. The data considered in Section 5 consist of trees whose diameter at breast height lies between 21.8 and 25.7 cm. If we choose t_q = 1320 cm in this size class (see Fig. 2), then all diameters y_{iq} at t_q are below d_0 = 16 cm. Although the length of trees varies considerably, of course, the estimation is based on measurements in the fixed stem section [t_1, t_q]. The aim of this approach is to obtain a balanced design so that closed-form expressions for parameters can be obtained. Thus heavy iterative algorithms can be avoided and the technique becomes realistic for practical purposes. Note also that predictions under parsimonious covariance structures like (3.1)-(3.3) could easily be extended beyond the interval [t_1, t_q]. However, we try to avoid such extensions by choosing t_q properly.

For a polynomial of (p − 1)th degree the regression matrix in (2.1) is

    T′ = [ 1          1          …  1
           t_1        t_2        …  t_q
           ⋮          ⋮             ⋮
           t_1^{p−1}  t_2^{p−1}  …  t_q^{p−1} ].

If we have only one species under consideration (spruce, in our application), then X′ = (1, 1, …, 1). But the corresponding model for many species can be estimated as well. Suppose, for simplicity, that we have now only one species under consideration and the first-degree polynomial as a predictor.

Then the elements E(y_{nj}) (j > q − r) of μ_{n(2)} have the form

    μ_{nj} = β_0 + β_1 t_j,

where β_1 denotes the average decrease in stem diameter with the increasing height of the stem. We also considered an alternative predictor model, where the first measurement of the butt log or the small-end diameter of the previous log was used as a covariate. Let z_i denote the 'size index' of the ith stem. For such a predictor

    μ_{nj} = β*_0 + β*_1 t_j,

where the constant term β*_0 = β_{10} + β_{20} z_n and the slope β*_1 = β_{11} + β_{21} z_n now depend also on the 'size index' z. This is a useful form for the stem curve since it models the variation in stem form with stem size.

Let us now consider two sets of random vectors y_{i(1)} = (y_{i1}, …, y_{i,q−r})′ and y_{i(2)} = (y_{i,q−r+1}, …, y_{iq})′, i = 1, 2, …, n. Thus the observation vector y′_i = (y′_{i(1)}, y′_{i(2)}) consists of the subvectors with

    E y_{i(1)} = μ_{i(1)} = T_1 B′x_i   and   E y_{i(2)} = μ_{i(2)} = T_2 B′x_i,

where

    T_1′ = [ 1          1          …  1
             t_1        t_2        …  t_{q−r}
             ⋮          ⋮             ⋮
             t_1^{p−1}  t_2^{p−1}  …  t_{q−r}^{p−1} ]

and

    T_2′ = [ 1                1                …  1
             t_{q−r+1}        t_{q−r+2}        …  t_q
             ⋮                ⋮                   ⋮
             t_{q−r+1}^{p−1}  t_{q−r+2}^{p−1}  …  t_q^{p−1} ].

The covariance matrix

    Σ = [ Σ_{11}  Σ_{12}
          Σ_{21}  Σ_{22} ]    (2.2)

has been partitioned into submatrices such that cov(y_{i(j)}, y_{i(k)}) = Σ_{jk}, j, k = 1, 2.


Assume first that B and the covariance (2.2) are known. If the observation vectors are normally distributed, then

    ŷ_{n(2)} = E(y_{n(2)} | y_{n(1)}) = μ_{n(2)} + Σ_{21} Σ_{11}^{−1} (y_{n(1)} − μ_{n(1)})    (2.3)

is the best predictor of y_{n(2)} with respect to the minimum mean square error criterion. The estimation problem still remains in the result (2.3). However, the ML estimates of B and Σ can be calculated from Y′_{(−n)} = (y_1, y_2, …, y_{n−1}) under the model (2.1). Substituting these estimates into (2.3) yields a practical predictor, which can be further improved by using the EM (expectation and maximization) algorithm. Thus the ML estimates from Y_{(−n)} can be used as initial values for B and Σ. This technique yields an estimate for the whole stem curve, although the data are incomplete. The EM algorithm has been thoroughly examined by Dempster et al. [4]. As we will see, the EM approach can be very well applied for prediction purposes. Liski and Nummi [5] considered prediction of individual measurements with the EM algorithm and compared their results with those obtained by Rao [6]. The EM algorithm for the Potthoff and Roy model is given in Liski [7]. We will present in Section 5 the essential features of the estimation and prediction process.
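For a small numerical illustration of (2.3), take two observed coordinates and one missing one, so that Σ_{11}^{−1} is a closed-form 2 × 2 inverse; all numbers below are made up for the example:

```python
def predict_missing(y1, mu1, mu2, s11, s21):
    """Best predictor (2.3): mu2 + S21 S11^{-1} (y1 - mu1), written out
    for the case of two observed coordinates (2x2 inverse in closed form)."""
    (a, b), (c, d) = s11
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dev = [y1[0] - mu1[0], y1[1] - mu1[1]]
    w = [inv[0][0] * dev[0] + inv[0][1] * dev[1],
         inv[1][0] * dev[0] + inv[1][1] * dev[1]]
    return [m + row[0] * w[0] + row[1] * w[1] for m, row in zip(mu2, s21)]
```

With mean (10, 9, 8), both observed values one unit above their means, variances 1.5 and all covariances 0.5, the predicted third coordinate is pulled up from its mean 8 to 8.5.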

3. Covariance structure

At the beginning of the harvesting process the number of stems already processed is small relative to the number of measuring points q. This may yield poor estimates of the within-individual covariance matrix. Since q may be quite large (q = 45, for example), the unstructured covariance matrix contains an excessive number of parameters to be estimated. To improve the prediction and to lessen the computational burden it is important to consider parsimonious models for Σ. The importance of parsimonious covariance structures was demonstrated by Lee and Geisser [8], Geisser [9] and Lee [10] when they predicted the unobserved portion of partially observed vectors for growth curve data. Also Fearn [11], Rao [6, 12] and Reinsel [13],

among others, have considered the prediction problem in the growth curve model. Especially the results of Lee [10] support the view that identifying an appropriate covariance structure is the most important aspect in growth curve prediction once the correct growth function is more or less ascertained.

The covariance structures considered here are more specialized than the general structures considered in time-series analysis. These specialized structures are usually well motivated in a growth curves model. In addition to the unstructured covariance, we studied the serial covariance structure, the uniform covariance structure and Rao's simple structure. The serial covariance structure takes the form

    Σ = σ² {ρ^{|a−b|}},    (3.1)

where σ² > 0 and |ρ| < 1 are unknown and a, b = 1, 2, …, q. This covariance structure is usual for repeated measurements forming short time series. The uniform structure can be written as

    Σ = ρ ee′ + σ² I_q,    (3.2)

where σ² > 0 and ρ > −σ²/q are unknown, e′ = (1, 1, …, 1) and I_q is the identity matrix of order q. The third special structure studied here, known as Rao's simple structure, is defined as

    Σ = TΛT′ + σ² I_q,    (3.3)

where σ² > 0 and the p × p matrix Λ are unknown. This structure was initially studied, in a slightly more general form, by Rao [14]. Estimation and model selection under these special structures have been considered by Jennrich and Schluchter [15], Lee [10] and Nummi [16].
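One practical payoff of a parsimonious structure is that the predictor (2.3) simplifies. Under the uniform structure (3.2), the correction Σ_{21}Σ_{11}^{−1}(y_{n(1)} − μ_{n(1)}) collapses, via the Sherman-Morrison identity, to a single scalar shift applied to every predicted coordinate; a sketch of ours with illustrative numbers:

```python
def uniform_predictor(y1, mu1, mu2, rho, sigma2):
    """Predictor (2.3) when Sigma = rho * ee' + sigma2 * I (uniform structure).
    Sherman-Morrison gives e' Sigma11^{-1} v = e'v / (sigma2 + q1 * rho),
    so the whole correction reduces to one scalar shared by all coordinates."""
    q1 = len(y1)
    shift = rho * sum(a - b for a, b in zip(y1, mu1)) / (sigma2 + q1 * rho)
    return [m + shift for m in mu2]
```

No matrix inversion is needed at all, which matters when the predictor must be re-estimated before every crosscut.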

4. K-step predictor and EM predictor

We have studied the performance of the 1-step predictor

    ŷ^{(1)}_{n(2)} = μ^{(0)}_{n(2)} + Σ^{(0)}_{21} (Σ^{(0)}_{11})^{−1} (y_{n(1)} − μ^{(0)}_{n(1)}),    (4.1)

where Σ^{(0)}_{21} and Σ^{(0)}_{11} are the submatrices of Σ^{(0)} corresponding to a partitioning similar to (2.2). When applying such a one-step estimator, it is crucial that good initial estimates M^{(0)} = XB^{(0)}T′


and Σ^{(0)} for M = XBT′ and Σ, respectively, are available. Substituting these estimates into (2.3) yields (4.1). In the present harvester application good initial values can be obtained from the previously recorded observations Y′_{(−n)} = (y_1, y_2, …, y_{n−1}). On the other hand, the predictor (4.1) can be regarded as the first step of the EM algorithm. Shih and Weisberg [17] used a one-step approximation of the EM algorithm when detecting influential observations in multiple regression. It should also be mentioned that this type of one-step approximation appears frequently in other problems (see the discussion in Shih and Weisberg [17]).

Since the rows of Y are independently normally distributed with covariance Σ and the mean of Y is XBT′, the log likelihood function of B and Σ can be written as

    L(B, Σ; Y) = −(1/2) tr (Y − XBT′)′(Y − XBT′)Σ^{−1} − (n/2) log |Σ|,    (4.2)

where the constant (−nq/2) log(2π) is omitted. Denoting XBT′ = M and M′ = (μ_1, μ_2, …, μ_n), the formula takes the form

    L(B, Σ; Y) = −(1/2) Σ_{i=1}^{n} (y_i − μ_i)′ Σ^{−1} (y_i − μ_i) − (n/2) log |Σ|.

The EM algorithm is an iterative technique consisting of two main steps:

(1) Compute first the expected log likelihood E[L(B, Σ; Y) | M^{(k)}, Σ^{(k)}, O], where the conditional expectation is taken under the conditions XB^{(k)}T′ = M^{(k)} and Σ = Σ^{(k)}. Note that the vector y′_{n(2)} = (y_{n,q−r+1}, y_{n,q−r+2}, …, y_{nq}) to be predicted is a random variable (it belongs to the missing part of the data), while the observed part of the data, denoted by O, is given.

(2) Choose M^{(k+1)} and Σ^{(k+1)} to be the values of M and Σ which maximize the expected log likelihood given by step 1.

To start the computation we need the initial values M = M^{(0)} and Σ = Σ^{(0)}. Iteration continues until convergence to the maximum likelihood is obtained. It

is known that the algorithm converges under fairly general conditions. These conditions are discussed, for example, in Dempster et al. [4].

As an iterative method the EM predictor may be computationally rather laborious, which can be regarded as a serious drawback in practice. To minimize iteration we suggest using the first k EM steps as an approximation to the true EM predictor, where k is a small integer. This procedure works well when the percentage of missing data is small and good initial values for the parameters are available. In fact, in our harvester application a one-step predictor did practically as well as the EM predictor (cf. also [5]). The EM algorithm for model (2.1) is given in Liski [7], for example. The main features of the expectation step are summarized in the Appendix. By formula (A.3) the k-step predictor for y_{n(2)} is

    ŷ^{(k)}_{n(2)} = μ^{(k−1)}_{n(2)} + Σ^{(k−1)}_{21} (Σ^{(k−1)}_{11})^{−1} (y_{n(1)} − μ^{(k−1)}_{n(1)}),    k = 1, 2, …    (4.3)

The case r = 1 has been discussed in Liski and Nummi [5]. Predictor (4.3) is formally the regression function of y_{n(2)} on y_{n(1)}, where also the past observations y_1, …, y_{n−1} are used in the estimation of the mean μ_n and the covariance matrices Σ_{22}, Σ_{21} and Σ_{11}. The predictor is estimated using only the measurements after the last cutting point, say t_f. The stem measurements y_{ij}, t_j ≤ t_f, for stems already processed, i = 1, …, n − 1, and for the current stem are ignored, since the stem history does not seem to improve the performance of the predictor (for a general discussion see Rao [6]).

For the sake of simplicity we drop the indices from the expected log likelihood function, which will be denoted as (see (4.2) and the Appendix)

    2 E[L(B, Σ; Y)] = −tr Q(B) Σ^{−1} − n log |Σ|,    (4.4)

where

    Q(B) = (Y − XBT′)′(Y − XBT′) + V

and

    V = Σ_{i=1}^{n} cov(y_{i(m)} | Σ, y_{i(o)}),


where (m) refers to the missing and (o) to the observed values. The maximum likelihood estimates for B and Σ are those values that maximize the expected log likelihood function (4.4).

If Σ is treated as unpatterned, the likelihood equation can be solved explicitly in each iteration. This is done via the multivariate analysis of covariance model (considered, for example, in [7]). The maximum likelihood estimators for B and Σ are

    B̂ = (X′X)^{−1} X′Y S^{−1} T(T′S^{−1}T)^{−1}

and

    Σ̂ = (1/n) [(Y − XB̂T′)′(Y − XB̂T′) + V],

respectively, where

    S = Y′[I − X(X′X)^{−1}X′]Y + V.

Non-iterative solutions for (4.4) cannot be found, in general, when imposing a special structure on the covariance matrix. However, for Rao's simple structure and for the uniform structure non-iterative solutions exist. Azzalini [18] studied an iterative procedure for the general case Σ = TΛT′ + σ²Ω(α), where Ω(α) is some parsimonious covariance matrix. Estimators for Rao's simple structure can be obtained as a special case of Azzalini's considerations (see also [19]). The parameter Λ cannot be directly obtained from Eq. (4.4), but the likelihood equation can be solved by introducing a reparametrization. Under Rao's simple structure the likelihood function (4.4) can be written as follows (see the Appendix for details):

    l(B, Ψ, σ²) = −tr Q{σ^{−2} I − σ^{−2} T(T′T)^{−1}T′ + T(T′T)^{−1/2} Ψ^{−1} (T′T)^{−1/2} T′}
                  − n log |Ψ| − n(q − p) log σ²,    (4.5)

where

    Ψ = σ² I + (T′T)^{1/2} Λ (T′T)^{1/2}.

The maximum likelihood estimates for B, Ψ and σ², respectively, are

    B̂ = (X′X)^{−1} X′Y T(T′T)^{−1},    (4.6)

    Ψ̂ = (1/n) (T′T)^{−1/2} T′(S + V) T (T′T)^{−1/2},

    σ̂² = tr[(Y′Y + V)(I − T(T′T)^{−1}T′)] / {n(q − p)}.

Thereafter, the estimator for Λ can be solved from

    Λ̂ = (1/n) (T′T)^{−1} T′(S + V) T (T′T)^{−1} − σ̂² (T′T)^{−1}.

For the uniform structure the estimator for the parameter B is (4.6). The estimators for σ² and ρ take the form

    σ̂² = [tr Q(B̂) − e′Q(B̂)e/q] / {n(q − 1)}    (4.7)

and

    ρ̂ = (1/q) {e′Q(B̂)e/(nq) − σ̂²},    (4.8)

where Q(B̂) = (Y − XB̂T′)′(Y − XB̂T′) + V. These expressions are computationally very simple. Estimators (4.7) and (4.8) are also more general than the estimators derived by Lee [10], whose technique assumes the inclusion of the constant term in the growth function. Since the EM algorithm is an iterative technique, non-iterative solutions of the maximization step are very appealing.
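Closed forms of this kind are easy to implement directly from the matrix Q(B̂). A sketch of ours for the uniform-structure estimators (4.7)-(4.8), with a sanity check that they recover σ² and ρ exactly when Q equals n times a uniform matrix:

```python
def uniform_mle(Q, n):
    """Estimators (4.7)-(4.8) for the uniform structure, given Q = Q(B_hat)
    as a q x q list of lists and the number of individuals n."""
    q = len(Q)
    e_Qe = sum(sum(row) for row in Q)        # e'Q(B)e
    tr_Q = sum(Q[i][i] for i in range(q))    # tr Q(B)
    sigma2 = (tr_Q - e_Qe / q) / (n * (q - 1))
    rho = (e_Qe / (n * q) - sigma2) / q
    return sigma2, rho
```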

5. Assessing the performance of the 1-step predictor

We assessed the performance of the 1-step predictors by analysing a sample of spruce stems harvested by a modern harvester in Finland. The data set consists of diameter values d(t) of 1325 spruce stems obtained at t = 0, Δ, 2Δ, 3Δ, 4Δ, …, where Δ = 30 (cm). However, in this study we only used stems whose diameter d(t) at breast height (t = 120 cm) is greater than 21.8 cm and less than or equal to 25.7 cm. The minimum height of the tree was chosen as 1320 cm. This yields a subset of 254 spruces more homogeneous than the data of all 1325 spruces. It is an advisable strategy also in a real harvesting situation to compare the current stem with the past stems that are 'close' to the current one. The minimum log length was set at


l_min = 30 dm and the maximum log length at l_max = 60 dm. The distance between the measuring equipment and the cutting device was c = 6 dm (a so-called one-grip harvester). Hence the maximum length of the known part of the stem before the first cutting decision was 36 dm. To keep the design balanced we utilize the measurements of all past stems and the current stem up to the same point t_q. In our application t_q = 1320 cm. Although the estimation is based on the section [t_1, t_q], the prediction can easily be extended beyond the point t_q for the parsimonious covariance structures described in Section 3. We could also use more general models, where the number of measurements taken may vary from stem to stem. However, this would require the implementation of heavy iterative algorithms, which are often too slow in practice.

The stem diameters y_{ij} were measured in millimetres. As utility functions we used price functions of the form

    h(l, d, s) = (π/4)(s + c(d)) l d²,    (5.1)

where l is the log length, d is the small-end diameter, s is the average cubic content price per m³ and c(d) is a correction parameter depending on the small-end diameter of the log. In our experiments we used s = 225.50 and c(d) varies between −3 and 4. In practice the log prices depend also on log quality, and therefore the price parameters for (5.1) vary as a function of the quality class. The difference in price between different qualities normally increases with diameter.
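The price function (5.1) is straightforward to code. In the sketch below (ours), the correction c(d) defaults to zero as a placeholder, because the paper's table of corrections (values between −3 and 4) is not reproduced here:

```python
import math

def log_price(l, d, s=225.50, c=lambda d: 0.0):
    """Price function (5.1): (pi/4) * (s + c(d)) * l * d**2, i.e. the volume
    of the maximal cylinder fitting in the log times an adjusted unit price.
    c(d) is the small-end-diameter correction (placeholder default 0)."""
    return math.pi / 4.0 * (s + c(d)) * l * d ** 2
```

With s + c(d) = 1, length 1 and diameter 2, the price is exactly π, the volume of the maximal cylinder.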

We applied a cross-validatory technique for comparing predictors. The stems are chosen successively to be under process (the current stem) and all other stems of the sample are assumed to be already processed (the past stems) and therefore known. We calculated the root mean square error (RMSE) of a predictor for all stems i = 1, 2, …, n before every crosscut at the point c + l_min. The RMSE for the ith stem is defined as

    RMSE_i = [(y_{i(2)} − ŷ_{i(2)})′(y_{i(2)} − ŷ_{i(2)})/r]^{1/2},    (5.2)

where r is the number of measurements to be predicted. Note that the predictor is estimated using only the measurements after the last cutting point t_f. Suppose, for example, that the measuring equipment is at the point t_13 = 36 dm (t_k = (k − 1)·3 dm, k = 1, 2, …, 45). Since diameter values are considered only up to the height t_45 = 132 dm (q = 45), the number of diameter values r to be predicted at t_13 is 32. If the first cutting point were t_16 = 45 dm, then after cutting the last cutting point t_f is t_16. In the next phase the measurements at the points t_1, …, t_16 are ignored and the prediction model is re-estimated from the diameter values at the points t_16, …, t_45 as if t_16 were the first point. The RMSE can be viewed as an estimate of the prediction error. On the basis of this measure one can compare the precision of different predictors. We consider here only low-degree polynomial growth curve models, which are simple and describe the true stem function adequately at least over a short range of stem values. The RMSE values for the first, second and third degree polynomials were calculated. We used sample sizes 254, 100 and 40. Here the smaller sample is a subset of the larger one.
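The criterion (5.2) itself is essentially one line; a small sketch of ours for the per-stem RMSE over the r predicted diameters:

```python
import math

def rmse(y_obs, y_pred):
    """RMSE (5.2): root mean square difference between the r observed
    and predicted diameter values of one stem."""
    r = len(y_obs)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_obs, y_pred)) / r)
```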

The RMSE takes the prediction errors directly into account. However, the main aim is to convert tree stems into logs in such a way that the total stem value, according to a given price function for logs, is maximized. If the whole stem is known, the optimal cutting points can be calculated. There may, however, be a number of cutting patterns which give the maximum price, or almost the maximum price, for a particular stem. When the optimization is done on the basis of predicted stem values, the cutting pattern may differ greatly from the optimum pattern, and yet the stem price may be near the optimum price. Thus the true relationship between prediction errors, cutting patterns and stem prices can be very complicated.
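When the whole stem is known, the optimal cutting points can be found, for example, by a dynamic program over admissible log lengths. The sketch below assumes a generic price function and a set of permitted lengths counted in measuring intervals; neither is the paper's exact specification.

```python
def optimal_bucking(diam, lengths, price):
    """Dynamic program over cutting points on a fully known stem.

    diam    : stem diameters at measuring points 1..q (e.g. 3 dm apart)
    lengths : permitted log lengths, counted in measuring intervals
    price   : price(length, small_end_diameter) -> log value
    Returns (maximum total value, cutting points as interval indices).
    """
    q = len(diam)
    best = [0.0] * (q + 1)        # best[k]: max value of the stem up to point k
    choice = [None] * (q + 1)     # (previous point, was a log placed?)
    for k in range(1, q + 1):
        best[k], choice[k] = best[k - 1], (k - 1, False)  # leave interval unused
        for L in lengths:
            if L <= k:
                # A log spanning points k-L..k has its small end at point k.
                v = best[k - L] + price(L, diam[k - 1])
                if v > best[k]:
                    best[k], choice[k] = v, (k - L, True)
    cuts, k = [], q               # recover the optimal cutting pattern
    while k > 0:
        prev, placed = choice[k]
        if placed:
            cuts.append(k)
        k = prev
    return best[q], cuts[::-1]
```

For instance, with diameters [10, 9, 8, 7, 6, 5], permitted lengths {2, 3} and a hypothetical price L·d, the program cuts three logs of length 2 for a total value of 42.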

Some numerical results of our investigations are summarized in Tables 2 and 3. No systematic improvement was attained with the iterative estimators, which are also computationally too heavy for practical use; therefore we display the results of the 1-step predictors only. We concentrate here on comparing the first two crosscuts, because a very high proportion of the price of a tree stem is determined by the success of these first two cuts. Polynomial predictors performed surprisingly well in our data set. The second degree polynomial


E.P. Liski, T. Nummi / Int. J. Production Economics 46-47 (1996) 373-385

Table 2. Mean RMSE values and total price of stems for the polynomial 1-step predictor of degree 1, 2 and 3 before the 1st and 2nd crosscut

Σ       Degree  Mean RMSE, 1st cut        Mean RMSE, 2nd cut        Percentage of total price
                n=254   n=100   n=40      n=254   n=100   n=40      n=254   n=100   n=40

Indep.  1       15.837  18.150  15.525    15.837  18.351  15.759    92.16   92.51   94.40
        2       15.250  17.316  14.841    15.105  17.891  15.587    92.22   92.08   95.11
        3       15.207  17.352  14.883    15.103  18.007  15.510    91.89   92.36   94.90

Unif.   1       13.997  15.852  12.516    10.982  10.970   8.301    93.18   93.26   93.32
        2       12.744  14.152  11.325    10.284  10.240   7.601    93.16   93.84   94.76
        3       12.715  14.193  11.359    10.342  10.298   7.582    93.41   94.00   94.86

AR(1)   1       15.835  18.147  15.520    15.649  18.347  15.757    92.16   92.65   94.40
        2       15.251  17.316  14.840    15.105  17.891  15.586    92.22   92.08   95.11
        3       15.207  17.353  14.884    15.103  18.007  15.514    91.89   92.36   94.94

Rao's   1       15.270  15.776  12.873    12.181  12.583   9.119    94.34   94.93   95.50
        2       14.923  15.427  12.708     8.009   8.529   7.048    93.78   94.00   96.06
        3       11.774  12.874   9.726     8.382   8.364   7.172    94.68   95.08   96.27

Unstr.  1       12.644  15.423  12.482     8.686   8.741   9.115    94.32   94.03   95.30
        2       11.756  14.819   9.938     8.264   8.148   8.681    93.70   94.40   94.35
        3       11.522  15.021   9.920     8.588   8.025   8.900    94.07   94.63   94.05

Maximum price                                                       16251.7 6650.18 2400.08

Table 3. Mean RMSE values and total price of logs for the polynomial 1-step predictor of degree 1, 2 and 3 with covariate before the 1st and 2nd crosscut

Σ       Degree  Mean RMSE, 1st cut        Mean RMSE, 2nd cut        Percentage of total price
                n=254   n=100   n=40      n=254   n=100   n=40      n=254   n=100   n=40

Indep.  1        8.116   8.258   7.387     6.249   6.256   6.274    95.09   94.73   95.21
        2        7.221   6.709   6.836     5.528   5.375   4.832    95.66   95.62   95.95
        3        7.217   6.789   6.774     5.542   5.477   4.840    95.51   95.66   96.05

Unif.   1       10.222  10.385   9.101     8.798   8.119   6.578    93.98   94.50   95.58
        2        8.744   7.928   7.056     8.038   7.405   5.930    93.71   93.76   95.70
        3        8.777   7.945   7.053     8.163   7.584   5.951    93.49   94.14   96.21

AR(1)   1        8.126   8.258   7.319     6.252   6.259   5.525    95.02   94.73   95.33
        2        7.222   6.710   6.837     5.528   5.375   4.832    95.66   95.62   95.95
        3        7.218   6.790   6.775     5.542   5.477   4.840    95.51   95.66   96.05

Rao's   1        8.466   8.719   7.829     7.562   7.477   7.182    95.77   96.01   96.35
        2        7.287   6.586   6.317     5.518   5.654   5.568    95.61   95.85   96.79
        3        6.407   6.195   5.747     5.214   5.233   4.935    95.99   96.08   96.62

Unstr.  1        7.391   8.703   8.081     6.194   6.281   8.308    95.93   95.95   96.52
        2        5.862   6.690   7.814     5.564   6.469   7.359    95.57   95.30   96.75
        3        5.855   6.613   8.224     5.535   6.161   7.447    95.81   95.67   97.51

Maximum price                                                       16251.7 6650.18 2400.08


model proved to be a good choice in most cases, especially if we look at the RMSE values only. When prices are considered, the first degree polynomial model usually yields a good price. We tried all the covariance structures described in Section 3. The unstructured covariance matrix provides a basis of comparison; in a real harvesting situation, however, the unstructured pattern yields computationally burdensome predictors owing to the inversion of large covariance matrices. In Table 2 (no covariates) the 2nd and 3rd degree predictors with an unstructured covariance matrix produced the lowest mean RMSE values. A 3rd degree predictor with Rao's simple structure performed approximately equally well, and it also produced the best prices for all stem samples. Among the 1st and 2nd degree predictors the uniform structure was superior to the other parsimonious structures under consideration. Predictors with the independence structure and the AR(1) structure performed approximately equally.

Introducing the covariate improves the prices and decreases the mean RMSE values systematically (Table 3). In particular, the predictors with the independence and the AR(1) structure improved drastically when a covariate was utilized. An interesting result of Tables 2 and 3 is that a smaller mean RMSE does not necessarily guarantee a higher price. The 3rd degree predictor with Rao's simple structure (with covariate) turned out to be superior to the other predictors with a parsimonious covariance structure. The independence structure with a covariate provides a useful alternative in practice because of its computational efficiency.

Although the predictor with the unstructured covariance does well, it has serious drawbacks in practice. In the first prediction (1st cut) the number of measuring points q is large (q = 45, for example). If n = 40, the resulting estimate of Σ is a singular matrix. To make the estimate invertible, a small positive constant was added to its diagonal elements, which is an ad hoc procedure. As mentioned before, the predictor is also computationally burdensome and therefore not fast enough for practical purposes.

The success of any of the predictors relies on the supposed similarity between the previously recorded n − 1 stems and the current stem, for which the prediction is needed. Here our training sample, from which the predictor is estimated, is relatively homogeneous, coming from a single stand. In practice, the stem distribution varies according to the stand under harvesting, but the changes are typically smooth owing to spatial correlation. Therefore, the training sample must be updated, and its size must be kept small enough for the estimator to adapt to changes in the stem distribution. Rigorous optimization of the sample size is not possible, however, without specific knowledge of the change process. Note that the results here are computed for the so-called one-grip harvester (common in Finland), where the known part of the stem before the cutting decision is relatively short, for example 360 cm. According to our simulations, more accurate results can be obtained with a two-grip harvester, where the known part of the stem is much longer, say 500-600 cm.

It is computationally appealing that the 1-step predictor did as well as the corresponding iterative estimator. In general, the values produced by the 1-step predictor were very close to those produced by an iterative EM predictor. This phenomenon is mainly due to the fact that in our application the portion of the data to be predicted was small compared with the previously recorded data, so very good initial values could be obtained. Furthermore, stem curves are smooth functions and, in our sample, relatively similar to each other.
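The 1-step predictor is essentially the conditional-mean rule of (A.3), computed once from estimates based on the past stems. A minimal sketch, with the mean vector and covariance matrix taken as given:

```python
import numpy as np

def one_step_predict(mu, Sigma, y_obs):
    """Conditional-mean prediction of the unseen stem part, cf. (A.3):

        yhat_(2) = mu_(2) + Sigma_21 Sigma_11^{-1} (y_(1) - mu_(1)).

    mu and Sigma are estimates from the past stems; y_obs is the
    observed lower part (length m) of the current stem.
    """
    m = len(y_obs)
    mu1, mu2 = mu[:m], mu[m:]
    S11, S21 = Sigma[:m, :m], Sigma[m:, :m]
    return mu2 + S21 @ np.linalg.solve(S11, np.asarray(y_obs, float) - mu1)
```

For example, with a zero mean, unit variances and correlation 0.5 between two points, observing y1 = 2 gives the prediction 1.0 for the second point.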

6. Concluding remarks

The most striking finding of our investigations is that the specification of a parsimonious covariance structure is a crucial step when seeking a good predictor. In our application the 3rd degree predictor with Rao's simple structure and a covariate was superior to predictors with the other parsimonious covariance structures. However, predictors with the independence or AR(1) structure and a covariate proved to be reasonable alternatives, and the predictor with the independence structure provides the most promising starting point for the implementation of stem curve predictors in real forest harvesters. It is remarkable that a decreasing mean RMSE value does not necessarily guarantee


a higher price. This is because underestimation and overestimation of the same size may yield drastically different cutting patterns. Analysing the consequences of this asymmetry is an important topic for further study. More research is also needed on the selection and updating of the training sample for a predictor.

Appendix

Expected log likelihood

The expectation step of the EM algorithm can be computed as follows [7]:

E[L(B, Σ; Y) | M^(k), Σ^(k), Y_(o)]
  = −½ tr{[(Y^(k) − XBT′)′(Y^(k) − XBT′) + V^(k)]Σ^{−1}} − (n/2) log|Σ|,   (A.1)

where

V^(k) = Σ_{i=1}^{n} V_i^(k)  and  V_i^(k) = Cov(Y_{i(m)} | Σ^(k), Y_{i(o)}).   (A.2)

The ith row y_i^(k) = (Y_{i(o)}, ŷ_{i(m)}^(k)) of Y^(k) is partitioned into the observed values and the estimate of the missing values:

ŷ_{i(m)}^(k) = E(Y_{i(m)} | B^(k), Σ^(k), Y_{i(o)})   (A.3)
            = μ_{i(m)}^(k) + Σ_{21}^(k)(Σ_{11}^(k))^{−1}(Y_{i(o)} − μ_{i(o)}^(k)),

where

M^(k) = XB^(k)T′ = (μ_1^(k), …, μ_n^(k))′,

and Σ_{21}^(k) and Σ_{11}^(k) are appropriate submatrices of Σ^(k).

Maximum likelihood estimates

Rao's simple structure

We expand Σ^{−1} by applying a theorem for the inverse of the sum of two matrices [20, p. 519]. This yields

Σ^{−1} = (TΔT′ + σ²I_q)^{−1}
       = σ^{−2}I_q − σ^{−2}T(σ²I_p + ΔT′T)^{−1}ΔT′.   (A.4)

Denote Ψ = σ²I_p + (T′T)^{1/2}Δ(T′T)^{1/2}, where (T′T)^{1/2} is such that [(T′T)^{1/2}]² = T′T. If we now apply the inversion theorem to Ψ, we obtain

Ψ^{−1} = σ^{−2}I_p − σ^{−2}(T′T)^{1/2}(σ²I_p + ΔT′T)^{−1}Δ(T′T)^{1/2}.   (A.5)

Solving σ^{−2}(σ²I_p + ΔT′T)^{−1}Δ from (A.5) and substituting the result into (A.4) yields

Σ^{−1} = σ^{−2}I_q − σ^{−2}T(T′T)^{−1}T′ + T(T′T)^{−1/2}Ψ^{−1}(T′T)^{−1/2}T′.

Since

|Σ| = (σ²)^q |σ^{−2}TΔT′ + I_q| = (σ²)^{q−p} |T′TΔ + σ²I_p| = (σ²)^{q−p} |Ψ|,

the expected log likelihood function (4.4) can be reparametrized as

l(B, Ψ, σ²) = −tr Q(B){σ^{−2}I_q − σ^{−2}T(T′T)^{−1}T′ + T(T′T)^{−1/2}Ψ^{−1}(T′T)^{−1/2}T′}
              − n log|Ψ| − n(q − p) log σ².   (A.6)

Differentiating (A.6) with respect to B and setting the derivative to zero yields

B̂ = (X′X)^{−1}X′YT(T′T)^{−1}.   (A.7)

Since Σ^{−1} is positive definite, tr Q(B)Σ^{−1} ≥ 0. It can be shown by standard techniques (cf. [20, p. 406]) that tr Q(B)Σ^{−1} ≥ tr Q(B̂)Σ^{−1} ≥ 0 for every Σ^{−1}. Thus the expected log likelihood is maximized at B̂ for every given Σ^{−1}, and hence for every Ψ^{−1}. Therefore, the maximum likelihood estimates for Ψ and σ² can be found by maximizing l(B̂, Ψ, σ²). But l(B̂, Ψ, σ²) depends on Ψ only through the function

f(Ψ) = −tr[Ψ^{−1}(T′T)^{−1/2}T′(S + V)T(T′T)^{−1/2}] − n log|Ψ|,

which attains its maximum ([20, A7.1, p. 523]) at

Ψ̂ = (T′T)^{−1/2}T′(S + V)T(T′T)^{−1/2}/n.
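The reparametrization above rests on the matrix inversion identity (A.4), which can be verified numerically; the dimensions and random matrices below are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(1)
q, p = 6, 2                                  # illustrative dimensions
T = rng.normal(size=(q, p))
A = rng.normal(size=(p, p))
Delta = A @ A.T + 0.1 * np.eye(p)            # positive definite Delta
s2 = 0.7
Sigma = T @ Delta @ T.T + s2 * np.eye(q)     # Rao's simple structure

# Right-hand side of (A.4):
inv_A4 = (np.eye(q) / s2
          - (T @ np.linalg.inv(s2 * np.eye(p) + Delta @ T.T @ T) @ Delta @ T.T) / s2)

assert np.allclose(inv_A4, np.linalg.inv(Sigma))
```

The identity reduces the q × q inversion to a p × p one, which is the computational point of the parsimonious structure.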



Finally, maximizing l(B̂, Ψ̂, σ²) is equivalent to minimizing σ^{−2} tr[Q(B̂)(I − T(T′T)^{−1}T′)] + n(q − p) log σ², which yields the maximum likelihood estimate

σ̂² = tr[(Y′Y + V)(I − T(T′T)^{−1}T′)]/{n(q − p)}.

Note that σ̂² > 0 with probability one. The estimator for Δ can be solved from Ψ̂:

Δ̂ = (1/n)(T′T)^{−1}T′(S + V)T(T′T)^{−1} − σ̂²(T′T)^{−1}.

The uniform structure

Following the arguments of the previous section yields

|Σ| = |ρee′ + σ²I| = (σ²)^{q−1}ψ

and

Σ^{−1} = (ρee′ + σ²I)^{−1} = σ^{−2}I − (σ^{−2} − ψ^{−1})ee′/q,

where we have denoted ψ = σ² + qρ. The expected log likelihood function (4.4) now takes the form

l(B, ψ, σ²) = −tr Q(B){σ^{−2}I − (σ^{−2} − ψ^{−1})ee′/q} − n log ψ − n(q − 1) log σ².

The maximum likelihood estimate B̂ of B is identical with (A.7). Since l(B̂, ψ, σ²) depends on ψ only through the function

f(ψ) = −e′Q(B̂)e/(ψq) − n log ψ,

the maximum of l(B̂, ψ, σ²) is attained at

ψ̂ = e′Q(B̂)e/(qn).

Differentiating l(B̂, ψ̂, σ²) with respect to σ² gives

∂l/∂σ² = σ^{−4} tr[Q(B̂)(I − ee′/q)] − n(q − 1)σ^{−2}.

Setting this derivative to zero yields the estimators for σ² and ρ, respectively:

σ̂² = [tr Q(B̂) − e′Q(B̂)e/q]/{n(q − 1)}

and

ρ̂ = (1/q){e′Q(B̂)e/(qn) − σ̂²}.
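The closed-form inverse of the uniform structure, Σ^{−1} = σ^{−2}I − (σ^{−2} − ψ^{−1})ee′/q with ψ = σ² + qρ, can likewise be checked numerically; the parameter values below are illustrative.

```python
import numpy as np

q, rho, s2 = 5, 0.3, 1.2                     # illustrative parameter values
e = np.ones((q, 1))                          # vector of ones
Sigma = rho * (e @ e.T) + s2 * np.eye(q)     # the uniform structure
psi = s2 + q * rho
inv_unif = np.eye(q) / s2 - (1 / s2 - 1 / psi) * (e @ e.T) / q

assert np.allclose(inv_unif, np.linalg.inv(Sigma))
```

As with Rao's simple structure, the point is that no numerical q × q inversion is needed at harvesting time.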

References

[1] Liski, E.P. and Saario, V., 1986. The feasible admission to the optimization of exploitation of tree stems, Eng. Costs Prod. Econom., 12: 21-28.

[2] Laasasenaho, J., 1982. Taper curve and volume functions for pine, spruce and birch, Communicationes Instituti Forestalis Fenniae 108, Helsinki, Finland.

[3] Potthoff, R.F. and Roy, S.N., 1964. A generalized multivariate analysis of variance model useful especially for growth curve problems, Biometrika, 51: 313-326.

[4] Dempster, A.P., Laird, N.M. and Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Statist. Soc., B 39: 1-22.

[5] Liski, E.P. and Nummi, T., 1990. Prediction in growth curve models using the EM algorithm, Comput. Statist. Data Anal., 2: 99-108.

[6] Rao, C.R., 1987. Prediction of future observations in growth curve models, Statist. Sci., 2: 434-471.

[7] Liski, E.P., 1985. Estimation from incomplete data in growth curves, Commun. Statist. - Simulation Comput., 14: 13-27.

[8] Lee, J.C. and Geisser, S., 1975. Applications of growth curve prediction, Sankhya, A 37: 239-256.

[9] Geisser, S., 1981. Sample reuse procedures for prediction of the unobserved portion of a partially observed vector, Biometrika, 68: 243-250.

[10] Lee, J.C., 1988. Prediction and estimation of growth curves with special covariance structures, J. Amer. Statist. Assoc., 83: 432-439.

[11] Fearn, T., 1975. A Bayesian approach to growth curves, Biometrika, 62: 89-100.

[12] Rao, C.R., 1975. Simultaneous estimation of parameters in different linear models and applications to biometric problems, Biometrics, 31: 545-554.

[13] Reinsel, G., 1984. A note on conditional prediction in the multivariate linear model, J. Roy. Statist. Soc., B 46: 109-117.

[14] Rao, C.R., 1967. Least squares theory using an estimated dispersion matrix and its application to measurement of signals, in: L.M. LeCam, J. Neyman and E.L. Scott (Eds.), Proc. 5th Berkeley Symp. on Mathematical Statistics and Probability, Vol. I, University of California Press, Berkeley, pp. 355-372.


[15] Jennrich, R.I. and Schluchter, M.D., 1986. Unbalanced repeated-measures models with structured covariance matrices, Biometrics, 42: 805-820.

[16] Nummi, T., 1992. On model selection under the GMANOVA model, in: P.G.M. van der Heijden, W. Jansen, B. Francis and G.U.H. Seeber (Eds.), Statistical Modelling, Elsevier, Amsterdam, pp. 283-292.

[17] Shih, W.J. and Weisberg, S., 1986. Assessing influence in multiple linear regression with incomplete data, Technometrics, 23(3): 231-239.

[18] Azzalini, A., 1987. Growth curves analysis for patterned covariance matrices, in: M. Puri, J.P. Vilaplana and W. Wertz (Eds.), New Perspectives in Theoretical and Applied Statistics, Wiley, New York, pp. 63-73.

[19] Laird, N.M., Lange, N. and Stram, D., 1987. Maximum likelihood computations with repeated measures: Application of the EM algorithm, J. Amer. Statist. Assoc., 82: 97-105.

[20] Seber, G.A.F., 1984. Multivariate Observations, Wiley, New York.