Technical Report, IDE0806, January 2008
Functional Analysis of Real World Truck Fuel Consumption Data
Master’s Thesis in Computer Systems Engineering
Georg Vogetseder
School of Information Science, Computer and Electrical Engineering, Halmstad University
Box 823, S-301 18 Halmstad, Sweden
January 2008
Acknowledgement
If it looks like a duck, and quacks like a duck, we have at least to consider the possibility that we have a small aquatic bird of the family Anatidae on our hands.
Douglas Adams (1952-2001)
Thanks to my family, especially my mother Eva and friends.
Abstract
This thesis covers the analysis of sparse and irregular fuel consumption data of long distance haulage articulated trucks. It is shown that this kind of data is hard to analyse with multivariate as well as with functional methods. To be able to analyse the data, Principal Components Analysis through Conditional Expectation (PACE) is used, which enables the use of observations from many trucks to compensate for the sparsity of observations in order to get continuous results.
The principal component scores generated by PACE can then be used to get rough estimates of the trajectories for single trucks as well as to detect outliers.
The data centric approach of PACE is very useful for enabling functional analysis of sparse and irregular data. Functional analysis is desirable for this data because it sidesteps feature extraction and enables a more natural view of the data.
Contents

Acknowledgement
Abstract
List of Figures
List of Tables

1 Introduction
  1.1 Background
  1.2 Motivation and Novelty
  1.3 Related Work
  1.4 Limitations
  1.5 Outline

2 Methods
  2.1 General Statistical Methods
    2.1.1 PCA
    2.1.2 Hierarchical Clustering
    2.1.3 Validation Methods
    2.1.4 Diagrams
  2.2 Functional Data Analysis
  2.3 Principal Components Analysis through Conditional Expectation

3 The Vehicle Application and Data Description
  3.1 Data
    3.1.1 Impurities in the Truck Data
    3.1.2 Data structure
  3.2 Approach

4 Results
  4.1 Basic Data Analysis
    4.1.1 Data Binning
    4.1.2 Feature Extraction
    4.1.3 Function Fitting
  4.2 Application of PACE
    4.2.1 Baseline PACE Results
    4.2.2 Number of Principal Components
    4.2.3 Error Assumptions in PACE
    4.2.4 Different Kernel Functions
    4.2.5 Variances
      4.2.5.1 Model Variance
      4.2.5.2 Data Variance
  4.3 Prediction
  4.4 Outlier Detection
  4.5 Expansion

5 Discussion
6 Conclusion
Bibliography
List of Abbreviations
List of Figures

3.1 Fuel Consumption between Observations
3.2 Fuel consumption plot generated from the raw data
3.3 Histograms of the original and the cleaned data
3.4 Fuel consumption plot generated from the clean data
3.5 Scatter plot and histograms
3.6 Histogram of the distance between observations
4.1 Distribution and mean/variance of binned data
4.2 Boxplots of binned data
4.3 Outlier detection based on feature extraction
4.4 Straight line fitting
4.5 Plot of mean function and principal components
4.6 Scree Plot
4.7 Smoothed covariance matrix
4.8 Reconstructed curves versus mean function and raw observations of selected trucks
4.9 Reconstructed curves and raw measurements for all trucks
4.10 Reconstructed traces of misfitted trucks
4.11 Comparison of reconstructed trajectories with differing number of PCs
4.12 Reconstructed trajectories without measurement error assumed
4.13 A comparison of µ with different smoothing kernels
4.14 A comparison of 3 PCs with different smoothing kernels
4.15 Distribution of all mean curves
4.16 Graph of all mean curves
4.17 Trucks with a high influence on the results of PACE
4.18 Data variance
4.19 Normal Distribution Plots of the PC scores
4.20 Histograms of the probability of trucks
4.21 Samples of truck probability
4.22 PACE Results of Speed Data
4.23 PACE Results on Seasonal Fuel Consumption
4.24 Selected trucks from the Seasonal Fuel Consumption Data
List of Tables

4.1 MSE of PACE with 8 principal components
4.2 MSE of PACE with 3 PCs
4.3 MSE of PACE with 4 PCs
4.4 MSE of PACE with 29 PCs
4.5 MSE of PACE with 8 PCs and error cut-off
1 Introduction
1.1 Background
The original idea for analyzing this data came from Volvo Parts AB, one of the main
business units of Volvo Group AB. The role of Volvo Parts is to provide solutions and
tools to the after-market, which includes vehicle electronics diagnostic tools. When a
truck is in the workshop, the vehicle electronics data is read out from the truck using
diagnostics tools from Volvo Parts and transmitted to a central database.
This data, which is collected from sensors within the vehicle's electronic systems, is called logged vehicle data (LVD). Several electronic subsystems supply information for LVD, which can include data from the electronic suspension, the transmission and, most importantly, the Engine Electronic Control Unit. The current main use of LVD is seemingly just basic analysis, e.g. remote diagnostics of faulty components and simple statistics.
One of the problems with analysing LVD is the relative lack of observations. The source of this lack is the data retrieval process: the procedure is time consuming, making it a cost factor for the workshops. This cost negatively affects the adoption rate of the procedure in the field, which leads to the data composition detailed in Section 3.1.
The basic idea behind the problems detailed in this thesis is to expand the usefulness of the data for Volvo Parts, retrieving additional information from it and providing means to access this information. This is done by using recent advanced statistical techniques. As a starting point for the application of these techniques, the analysis of the fuel consumption data contained in LVD was suggested.
Fuel consumption data is very interesting from a statistical point of view. This interest stems from fuel being a major cost factor, as well as from fuel consumption being influenced by a high number of other factors, such as:
• Usage patterns of the operator, i.e. the driving style and habits
• Maintenance of the truck
• Gross Combination Weight usage, i.e. the cargo of the truck
• Environment, i.e. hilliness, road condition, etc.
The influence of these and other factors makes this data a good indicator, but the mass of influences also makes exact determination of the underlying cause impossible.
Additionally, some of these influences might cancel each other out, thus removing
information. If it is possible to extract information from fuel consumption data, then
it should work for the rest of the data too.
1.2 Motivation and Novelty
From LVD, it should be possible to extract information on hidden trends, i.e. the principal components (see Section 2.1.1) that are common to all similar trucks. Based on these components, it should be possible to determine whether a truck is unrelated to other trucks, i.e. an outlier, and to predict future developments in fuel consumption when the truck's behavior is similar to that of other vehicles.
It is very easy to take the last observation of each truck in a group of similar trucks to
determine abnormal fuel consumption, but it is hardly possible to calculate underlying
trends or other information from these facts.
To discover information like trends or outliers from LVD, the data of a truck has to include not only the last available observation, but also past ones. These requirements – multiple observations per truck and a set of similar trucks – lead to the irregular and sparse structure of the data used in this thesis. The data is described in more detail
in Section 3.1.
The analysis of this data can be done in at least two ways. The most obvious choice
in methodology would be the use of multivariate statistics, but for several reasons de-
tailed below, the central methodology for this thesis is functional statistics. Functional
statistics focuses on analysing the data as functions, rather than a set of discrete values.¹
Multivariate statistics are a set of methods which work on more than one variable at a
time. Some examples for these methods are regression analysis, principal components
analysis and artificial neural networks. In principle, functional statistics are also part
of this set, as both have multiple variables as input. However, the focus on handling
the input variables as continuous functions rather than arbitrary variables separates
those two fields.
As the observation of trucks in the workshop does not happen regularly, i.e. the observations cannot be fitted to a grid, it is difficult to incorporate all information
from the input into variables for use in multivariate statistics. Therefore, features
like mean, variance, duration of all observations, date of first observation, odometer
count at the last observation, etc. have to be extracted from the data to be able to do
analysis. Inevitably, the extraction of this knowledge leads to information loss, which
is problematic on this already sparse data. The process of discovery and selection of
important features for multivariate analysis is very difficult and time consuming. It is
crucial to extract and select the best and most important features from the data to
minimize the data loss and maximize the information content of the features for the
success of all further steps in analysis. Feature extraction creates an additional layer
of data processing and introduces a large number of tunable knobs.
Functional Data Analysis (FDA), on the other hand, preserves the information present in the data and does not need feature extraction at all. Furthermore, it facilitates a
more natural handling of the data, describing not only more or less abstract features
of the data, but a function which resembles the data. The choice of using functional
over multivariate data analysis is also motivated by the ability to analyze the func-
tional properties of the data, e.g. derivatives of the data. Additionally, FDA does not
introduce a high number of additional parameters, unlike multivariate analysis.
¹ A more detailed description of this collection of methods can be found in Section 2.2.
However, multivariate analysis has an advantage over FDA when a high number of
different functions have to be analysed at the same time. FDA has problems visualizing this higher dimensional data and requires a high amount of data for each dimension (curse of dimensionality).
The most important step in FDA is the transformation of the discrete data to a func-
tional basis. Again, the irregular and sparse nature of the data makes this transforma-
tion difficult. To be able to perform FDA on this data, a method called Principal Components Analysis through Conditional Expectation (PACE) is applied. The foundation of PACE is the assumption that a smooth function underlies the sparse data. Under this assumption, it is possible to use even irregular data for the discovery
of principal components.
The main novel aspect of this thesis is the application of FDA and PACE to automotive
data. Previously it has successfully been applied to biological data, economic processes and bidding in online auction houses, but not to automotive data. PACE itself is highly interesting for the data at hand, because it is able to work on it without the need for feature extraction or regular observations.
The methods used in this work can be used to describe the actual fuel consumption of
the observed trucks in customer hands. This means the methods applied to LVD are
driven by data and not by a model.
1.3 Related Work
General sources of information on data analysis – related to this work – are The El-
ements of Statistical Learning [1], Functional Data Analysis [2] and Nonparametric
Functional Data Analysis [3].
The single most important paper related to this work is Functional Data Analysis for
Sparse Longitudinal Data [4], which proposed the method PACE and applied it to
yeast cell cycle gene expression data and to longitudinal CD4 cell percentages. The
percentage is used as a marker for the progress of AIDS in adults.
Functional Data Analysis for Sparse Auction Data [5] combines the PACE approach
with linear regression to predict closing prices of online auctions.
The most related of the few public papers on fuel consumption in heavy trucks is Heavy
Truck Modeling for Fuel Consumption Simulations and Measurements [6]. This work
deals with building a simulation model of fuel consumption. Another paper, which dis-
cusses methods to reduce idle fuel consumption in North American long distance trucks
and highlights typical driver behavior, is Analysis of Technology Options to Reduce the Fuel Consumption of Idling Trucks [7].
Additional information on doing PCA on sparse and irregular data can be found in
Principal component models for sparse functional data [8] and Sparse Principal Compo-
nent Analysis [9]. More related to PACE is Properties of principal component methods
for functional and longitudinal data analysis [10]. Another paper which is related to
the estimation of Functional Principal Component Scores is [11]. Knowledge relating
to linear regression analysis for longitudinal data can be found in [12].
1.4 Limitations
The scope of this thesis is to research the possibilities for the application of FDA meth-
ods to the sparse and irregular automotive data from LVD. It is outside of the scope
of this thesis to establish a conclusive theory about a true long term fuel consumption
model of all truck engines.
Such a conclusive, globally valid model is impossible because of the relatively low number of
individuals in the data, as well as a limited observation duration and possible differences
in usage patterns of the trucks, i.e. vehicles with a high mileage in a limited time span
do not necessarily exhibit a similar fuel consumption to low mileage trucks in the same
time span.
1.5 Outline
The next chapter, ”Methods”, describes the crucial methods used. This includes underlying basic methods as well as the foundations of FDA and PACE. Chapter 3, ”The Vehicle Application and Data Description”, provides a description of the data used in this thesis and includes information on the interplay of the proposed methods and the data. Chapter 4 provides comprehensive information on the results. The last two chapters, ”Discussion” and ”Conclusion”, wrap up the results from this thesis and provide an outlook on possible continuations
of the research.
2 Methods
This chapter is divided into three parts. General Statistical Methods describes non-
functional methods which are fundamental to this work. Functional Data Analysis
provides an introduction into this field. The final part, Principal Components Analysis
through Conditional Expectation gives an overview of this crucial method.
2.1 General Statistical Methods
This section introduces general statistical concepts used in this thesis and a number of
tools to visualize data and test results.
2.1.1 Principal Component Analysis
One of the constitutional methods for analysing LVD is the Karhunen-Loève transformation, universally known as Principal Component Analysis (PCA). PCA is also the foundation of Functional Principal Component Analysis (FPCA) [1, 13].
Basically, PCA is a method to explore data by finding the most important ways the
variables in the data differ from one another. It can compress the data by discovering a
low number of linear combinations of input variables which contribute most to the
variability of the input. These linear combinations are found by constructing a linear
basis for the data where the retained variability is maximal.
Mathematically speaking, the goal is to reduce or compress high dimensional data X
to lower dimensional data Y .
To do this reduction, a number of algorithms are available; here, a method involving the calculation of the covariance matrix is described.
The first step is to calculate the mean vector µ, with one entry µ_i per variable:

µ_i = (1/K_i) Σ_{j=1}^{K_i} x_ij,   i = 1, …, N

where N denotes the number of variables and K_i the number of observations of one variable.

Subsequently, µ is removed from every observation in X; the centered data is denoted as X − X̄.
In the next step the covariance matrix cov(X − X̄) has to be calculated. Covariance is a measure of how two variables vary together. If the two variables vary in the same way (i.e. with the same sign), the covariance will be positive. If, on the other hand, the two variables vary with opposite signs, the covariance will be negative. A covariance matrix is the result of calculating the covariance for all members of two vectors. The resulting matrix gives the degree of correlation between the input vectors.
To find a mapping M that is able to transform the high dimensional data into low dimensional data, the M that maximizes Mᵀ cov(X − X̄)M has to be found. It can be shown that the best (variance maximizing) mapping is formed by the eigenvectors of the covariance matrix. Hence, PCA has to solve the eigenproblem to get the transformation matrix:

cov(X − X̄)M = λM

The eigenproblem has to be solved d times with different principal eigenvalues λ to get the d principal eigenvectors (or principal components). The low dimensional representation Y can then be computed by simple multiplication:

Y = (X − X̄)M
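The steps above translate directly into a few lines of linear algebra. The following is a minimal, illustrative sketch in Python/NumPy (not code from the thesis; the function names and example data are ours):

```python
import numpy as np

def pca(X, d):
    """Project the rows of X (observations x variables) onto d principal components."""
    X_c = X - X.mean(axis=0)                # subtract the mean vector mu
    C = np.cov(X_c, rowvar=False)           # covariance matrix cov(X - Xbar)
    eigvals, eigvecs = np.linalg.eigh(C)    # solve the eigenproblem cov(X - Xbar) M = lambda M
    order = np.argsort(eigvals)[::-1]       # sort eigenvectors by descending eigenvalue
    M = eigvecs[:, order[:d]]               # mapping formed by the top d eigenvectors
    return X_c @ M                          # low dimensional representation Y = (X - Xbar) M

# Example: compress 100 observations of 5 variables to 2 principal components
Y = pca(np.random.default_rng(0).normal(size=(100, 5)), d=2)
```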
2.1.2 Hierarchical Clustering
Hierarchical clustering is a relatively simple method [1] to segment data into related
groups. Clustering is used within this thesis for testing if differing clusters of trucks can
be found from extracted features. Hierarchical clustering needs a dissimilarity measure
between the elements. The standard measure of dissimilarity is the Euclidean distance, which is also used in this thesis.
When the distance between all possible pairs of elements is calculated, the clusters can
be built. For building these clusters, there are two different approaches: the agglomerative approach, which starts with as many clusters as there are individuals and merges them step by step, and the divisive method, which starts with one big cluster that is then split into smaller clusters. Agglomerative methods are guaranteed to have a monotonically increasing level of dissimilarity between merged clusters, growing with the level of merging. This property is not guaranteed for divisive approaches.
The second choice in building the clusters is the measure of distance between two clusters:
• Single Linkage – The link between the clusters is defined by the smallest dis-
tance between elements in the two clusters.
• Complete Linkage – The link is defined by the largest distance between ele-
ments in the two clusters, the opposite of the first method.
• Average Linkage – Uses the average distance between all pairs of elements in
both clusters.
2.1.3 Validation Methods
A number of methods to validate the results and to estimate variation were used in
the scope of this thesis. These include brief usage of bootstrap, jackknife and various
cross validation methods, such as k-fold and leave-one-out [1].
Bootstrapping is the process of randomly picking samples from given observations, where a single observation can be chosen multiple times. The goal of a bootstrap is to approximate the distribution from these samples.
Jackknifing can be used to estimate the bias and standard error. Jackknife is very
similar to k-fold and leave-one-out cross validation, as it systematically removes one
or more observations from a sample and then recalculates the results as often as there
are possible readouts.
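As an illustration of the bootstrap described above, the sampling distribution of the mean can be approximated as follows (a sketch under the same resampling scheme; Section 4.1.1 uses 10000 bootstrap samples per bin):

```python
import numpy as np

def bootstrap_means(x, n_boot=10000, seed=0):
    """Approximate the sampling distribution of the mean by resampling with replacement."""
    rng = np.random.default_rng(seed)
    # each bootstrap sample draws len(x) observations; repeats are allowed
    samples = rng.choice(x, size=(n_boot, len(x)), replace=True)
    return samples.mean(axis=1)             # one mean per bootstrap sample

means = bootstrap_means(np.array([2.3, 2.5, 2.4, 2.6, 2.2]))
print(means.std())                          # bootstrap estimate of the standard error
```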
2.1.4 Diagrams
A number of special diagrams were used to illustrate some results of this thesis. Those
diagrams are dendrograms, boxplots and scree plots [1, 2].
• Dendrograms are tree diagrams which are used to illustrate the result of a clus-
tering algorithm. An example for such a diagram is Figure 4.3. On the vertical
axis the distance between clusters is plotted. A horizontal line denotes a split
between classes at this specific distance measure. This implies that a split at
a higher distance value has a higher dissimilarity between the split classes, as
opposed to a lower distance value split.
• Boxplots describe groups of data – such as binned data – through five statistical
properties. A boxplot example can be seen in Figure 4.2. The box represents
the lower and the upper quartile, showing where half of the data is contained.
The line in this box illustrates the median of data in this group. The whiskers
attached to this box extend to the furthest data point, up to a maximum of 1.5 times the distance between the quartiles. Data points outside of this boundary are usually marked with a cross, indicating a possible outlier.
• Scree plots give an indication of the relevance of a principal component (eigen-
function) by indicating the accumulated eigenvalue up to the n-th principal com-
ponent. This plot can be used to select a suitable number of eigenfunctions. An
example for a scree plot is Figure 4.6.
2.2 Functional Data Analysis
Functional data analysis (FDA) [2, 3] is a collection of methods which enable the
investigation of data in a functional form. Functional data is the idea of looking at a
set of observations not as a vector in discrete time, but as a continuous function. The
analysis of functions rather than discrete samples brings advantages over multivariate
analysis.
An advantage of this property is that the rate of change or derivatives of these functions
can easily be calculated and analysed. FDA also includes variants of multivariate
methods like PCA. Functional PCA, like normal PCA, not only provides a method for
dimensionality reduction, but also characterizes the main modes of variation from a
mean function.
To perform FDA on discretely sampled data, the data has to be converted to a contin-
uous, functional format. This means a function has to be fitted to the sampled data
points. It is not feasible to convert every dataset to a functional form. Especially in
the case of sparse and irregular observations, this task is very difficult, but central to
the success of functional data analysis.
Usually, the methods used to convert data into a functional format are interpolation
and smoothing, or more generally function fitting. A very simple method to do this
conversion would be a least squares fit of a first order polynomial (a straight line).
Usually, a more flexible method is used for this step, namely spline interpolation.
Depending on the underlying data, other fits like Fourier functions are possible.
FDA is easily applicable if the measurements were done with a regular spacing and the data is complete over the observation duration. In the opposite case, it is very difficult to estimate the complete trajectory when only a single subject is taken into the calculation.
2.3 Principal Components Analysis through Conditional Expectation
Principal Components Analysis through Conditional Expectation (PACE) is a deriva-
tive of functional principal components analysis for sparse longitudinal data, proposed
in the paper Functional Data Analysis for Sparse Longitudinal Data by Yao, Müller and Wang [4].
PACE is an algorithm for extracting the principal components from irregular and
sparse data. It also provides an estimation of individual smooth trajectories of the data.
PACE assumes that the data is randomly located, with a random number of observations per subject. Furthermore, it assumes that the data is determined by an underlying smooth trajectory.
The first step in PACE is the estimation of the smooth mean function µ, by using a
local linear line smoother on all measurements combined into one pool of data. The
choice of the smoothing parameter, or bandwidth, is done automatically [14] or by hand in this step.
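To illustrate the idea of a local linear smoother, the following sketch fits a weighted straight line around every point of an output grid, using the Epanechnikov kernel. This is an illustrative reimplementation, not the PACE code; the bandwidth h is a free parameter here, whereas PACE chooses it automatically or by hand, as noted above.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: compact support on [-1, 1]."""
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def local_linear_smoother(t, y, grid, h):
    """Local linear fit of the pooled points (t, y), evaluated at each grid point."""
    out = np.empty(len(grid))
    for i, t0 in enumerate(grid):
        w = epanechnikov((t - t0) / h)                # kernel weights around t0
        A = np.column_stack([np.ones_like(t), t - t0])
        sw = np.sqrt(w)
        # weighted least squares; the intercept is the smoothed value at t0
        beta, *_ = np.linalg.lstsq(sw[:, None] * A, sw * y, rcond=None)
        out[i] = beta[0]
    return out   # assumes at least two observations fall inside every window
```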
The covariance surface can then be calculated like a regular covariance matrix. This raw covariance surface is stripped of the variance (the main diagonal). The raw matrix is then smoothed utilizing a local linear surface smoother. The bandwidth is chosen by leave-one-curve-out cross-validation. The smoothing step is necessary to fill in for missing observations. The estimations of these two model components share the same smoothing kernel. The choice of a smoothing kernel is discussed in Chapter 4.
From these model components, it is possible to calculate the estimates of the eigenvalues
and eigenfunctions, i.e. the functional principal components of sparse and irregular
data.
The last step is the calculation of the functional principal component scores. Those
scores describe how much of a principal component is retained in a single subject.
However, the conventional method of using numerical integration to recover the Prin-
cipal Component (PC) scores leads to biased results because of the sparse and irregular
data. In this step, the conditional expectation comes into play. It provides the best
prediction of the PC scores if the measurement error is Gaussian, or the best linear
prediction otherwise. PACE is discussed in detail by Yao, Müller and Wang [4].
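For reference, the conditional expectation estimate of the k-th score of subject i, as given by Yao, Müller and Wang [4], takes the form

ξ̂_ik = λ̂_k φ̂_ik^T Σ̂_{Y_i}^{-1} (Y_i − µ̂_i),   with (Σ̂_{Y_i})_{jl} = Ĝ(t_ij, t_il) + σ̂² δ_jl,

where φ̂_ik is the k-th eigenfunction evaluated at the observation times of subject i, Ĝ is the smoothed covariance surface, σ̂² the estimated error variance and δ_jl the Kronecker delta.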
3 The Vehicle Application and Data Description
The purpose of this chapter is to outline the connection between the methods proposed
in Chapter 2 and the application of those methods on the Volvo data.
3.1 Volvo Truck Data
The original data received from Volvo Parts AB consists of 2027 observations of 267
trucks. It was collected between June 2004 and May 2007 in North America.
All trucks have the same engine and are configured as articulated trucks for long distance transports on smooth roads. The gross combination weight (GCW), which includes the weight of the towed trailer and the truck itself, is 36 tons, the US federal GCW limit.
Data is retrieved when a truck is in a workshop that is equipped to read out the onboard
electronics and performs this procedure. It is then sent to the Volvo headquarters in Gothenburg for storage and analysis.
The data from each observation contains only information from one of the truck's onboard electronic systems, the Engine Control Unit (ECU). From this data, two variables are mainly relevant for this thesis:
• Total distance driven
• Total amount of fuel consumed
Figure 3.1: This figure shows the distribution of the fuel consumption (incremental fuel mileage [km/l] over distance driven [km]) when the fuel mileage is calculated only between two observations. The outliers visible in this figure can be explained by a high amount of idling between two close observations. When the fuel mileage is calculated accumulatively, those outliers do not occur.
These variables are not reset when the ECU is read out in the workshop and therefore behave accumulatively. Using these variables as a basis to calculate the fuel consumption per distance or time has an averaging effect, as it includes all former mileage data. This is necessary because of the unevenly distributed data. If a truck is read out twice within a very short span of time, the fuel consumption in this interval is possibly vastly different from the normal fuel consumption behavior of the truck, possibly because the truck was not moved very far within this time span, but was idling for some time. The outliers caused by this effect can be seen in Figure 3.1. These outliers are the reason for not using the difference in fuel amounts between two observations as a calculation basis in this thesis. The accumulative approach allows those observations to remain in the dataset.
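The difference between the two calculation bases can be sketched in a few lines (the counter values below are made up for illustration):

```python
import numpy as np

# accumulative ECU counters of one truck, sorted by read-out time
odometer = np.array([50_000.0, 120_000.0, 125_000.0, 300_000.0])  # km
fuel     = np.array([20_000.0,  48_000.0,  50_500.0, 120_000.0])  # litres

# accumulative basis (used in this thesis): averages over the whole
# history, so short intervals with much idling cannot dominate
accumulative_mileage = odometer / fuel                 # km/l per observation

# incremental basis: ratios of differences between consecutive observations;
# the short middle interval produces an outlier (2.0 km/l here)
incremental_mileage = np.diff(odometer) / np.diff(fuel)
```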
3.1.1 Impurities in the Truck Data
The raw data retrieved from the trucks contains irregular observations or changes in
the truck data which result – in some cases – in a removal of specific observations or
the whole truck from the data set. See Figure 3.2 for a plot of the raw fuel consumption data.
Figure 3.2: Fuel consumption plot (fuel mileage [km/l] over distance driven [km]) generated from the raw data. The lines are linear interpolations between the observations.
• Incomplete Observations – A truck is missing one or more variables that would be required for analysis. The observations from such an individual can not be used for the calculations.
• Physically impossible changes in accumulative variables – Between two
observations of a single truck, accumulative variables changed to a smaller value.
This means that a later observation in time has a smaller number of total driving
distance than an earlier measurement for example. This is physically impossible,
but observable if the ECU has been replaced or the contents of the ECU were
erased during a software update. This criterion applies to 44 trucks. Although it is possible to use a subset of the observations from each of these trucks, this was not done, because the quality of the measurements might have been compromised and the manual effort of cleaning the data is a time consuming task for very few usable measurements.
• Empty and Duplicated Observations – Some observations do not contain any
new information, but only seem to be resubmits of earlier or empty observations
with a different time stamp. These particular observations are removed from the
3. The Vehicle Application and Data Description 16
final data, but the remaining observations of the truck are used. Phenomena like these might occur when the data acquisition process in the workshop was interrupted, or a transmission error occurred.
• Early Observations – These observations are too early in the life of the truck to give meaningful information. The removal of these observations is motivated by the unusual fuel consumption of a truck in this state. The unusual fuel consumption is caused by the high number of short trips the truck has to travel before it can be put into regular service. Examples are drives to paint shops or truck customizers as well as transfers to the customer. The number of observations purged when this criterion is set to remove all measurements below 10000 km is 150; when all measurements before 1000 km are deleted, the number of observations drops by 100. See Figure 3.3.
From the 269 initial individual trucks, 56 trucks are removed. In terms of observations, from the original 2027¹ observations, 1320 remained in the data set when the lower border for observations is set to 1000 km. See Figure 3.4 for a plot of the cleaned fuel consumption data. The most visible change compared to Figure 3.2 is the lower number of outliers at roughly 0 kilometers, which is mostly an effect of the removal of very early observations.
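A condensed sketch of these cleaning rules, assuming a pandas DataFrame with the hypothetical columns truck_id, date, distance (accumulative km) and fuel (accumulative litres):

```python
import pandas as pd

def clean_lvd(df, min_distance_km=1000):
    # incomplete observations: required variables are missing
    df = df.dropna(subset=["distance", "fuel"])
    # empty/duplicated observations: resubmits differing only in time stamp
    df = df.drop_duplicates(subset=["truck_id", "distance", "fuel"])
    # early observations: paint shops, customizers, transfers to the customer
    df = df[df["distance"] >= min_distance_km]
    # physically impossible changes: drop whole trucks whose accumulative
    # counters ever decrease (e.g. ECU replaced or contents erased)
    ok = (df.sort_values("date")
            .groupby("truck_id")["distance"]
            .apply(lambda s: s.is_monotonic_increasing))
    return df[df["truck_id"].isin(ok[ok].index)]
```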
3.1.2 Data structure
Some properties of the data make the task of analysing it inherently difficult. Most of
these properties stem from the sparsity of the data. Sparseness in this case means that
every truck has been observed on average just 7.405 times with a standard deviation
of 2.4083 observations. The sparseness of the data is visualized in Figure 3.5.
• The data is not fully observed. The observations of a single truck often are not
scattered over a very long distance in time or driven distance, but measured only
within a short span. The average distance between the first observation of a truck and the last one is 317841 kilometers, with a standard deviation of 114208 kilometers. The mean focus of the observations
¹ Excluding incomplete observations, as they are not usable at all.
Figure 3.3: This comparison shows the number of observations (over distance driven [km]) in the raw data versus the cleaned data. The overall reduction in the number of observations, as well as the lower number of observations at the beginning, is noticeable.
Figure 3.4: Fuel consumption plot (fuel mileage [km/l] over driven distance [km]) generated from the clean data. Note the lack of outliers at the beginning of the data.
Figure 3.5: The scatter plot in this figure (fuel mileage [km/l] over driven distance [km]) highlights the sparse and irregular distribution of the data. The histograms describe the distribution of the observations along the axes.
is at 303232 kilometers, deviating by 133609 kilometers, which means that most of the trucks are not observed from the beginning, but are observed later on in their life-cycle.
• The density of measurements varies. This implies that the placement of measure-
ments is irregular throughout the duration of their observation. As the trucks are
independent of each other, the times when observations happen are not correlated
with each other. For a visual representation of the irregular duration between the
measurements, see Figure 3.6. This figure indicates a non-normal distribution.
The average distance between observations is 52020 kilometers with a standard
deviation of 61858 kilometers.
• Unsupported curvature. The irregular placement and the sparsity of the observations cause this property to occur. A part of a curve may have a high curvature, which can be approximated by ‖d²y/dx²‖ or (d²y/dx²)². When this is the case, the relative resolution of the data at the point of the high curvature should also be high to enable a good estimation of the underlying function [2].
Figure 3.6: This figure shows the distribution of distances between two observations of the same truck (number of observations over distance driven between observations [km]).
3.2 Approach
The first part in analysing the truck data, which is described in Section 4.1, is to establish results with basic multivariate analysis as a baseline to which the results of functional analysis can be compared. This part shows pitfalls and difficulties when applying standard multivariate methods to the data.
The first possible way for multivariate analysis is feature extraction. It is a difficult
task to find relevant features to extract. A simple statistical feature will be extracted
from the data to give an idea of how feature extraction works. The second
possibility for multivariate analysis is to put the observations into bins. This is done
in order to be able to align the data onto a vertical grid.
The second way is necessary because it is very hard to visualize the extracted features or to convert them back to the original data format. However, binning cannot easily be
used for outlier detection. Usually, some of the bins are likely to have only a low
number of observations which makes outlier determination in this bin very difficult. If
the bins are made larger, multiple – or even all – observations of a single truck might
be put into a single bin. This leads to increased difficulty in differentiating between
normal and outlying observations.
3. The Vehicle Application and Data Description 20
These steps should lead to two results: a simple outlier detection, based on a clustering of the extracted features, and a variance and mean estimation for the data, based on the binned data.
The task of estimating the fuel consumption behavior of a single truck outside of its observation duration using the extracted features is very hard. This is because the
mapping between the values of the features and a function is not available. Addition-
ally, information from other, similar trucks is not taken into consideration.
The last step in Basic Analysis (Sect. 4.1) is a demonstration of the main problem
of applying FDA on the data at hand: the difficulty of fitting a function to a single
truck.
The main task of this thesis is to apply the PACE algorithm to the data (Sect. 4.2) and to try out the various options within the PACE algorithm. In that section, the results of PACE in general will be assessed, as well as the differences between PACE runs with different options, both in regard to the PACE generated functions and to general statistical properties, such as the mean function.
The first advantage in using the PACE algorithm in comparison to the basic methods
is the lack of need to pre-process data, i.e. to extract features or otherwise process the
data. This non-parametric input of the data is complemented by a number of options to tune the algorithm itself for various needs (amount of information retained, whether the input data has measurement errors, etc.).
The next step is to try out a number of methods which can be applied to the results of PACE, for example to calculate the probability of the fuel consumption of a particular truck, given all the other trucks.
PACE enables the user to analyse the sparse and irregular data at hand, opening up the
use of additional techniques from FDA, whereas using only multivariate data analysis
or normal FDA on the same data is very difficult to do and does not incorporate the
information gathered from the other trucks.
PACE makes outlier detection, estimation of the function outside the observation du-
ration and the gathering of common statistical properties, like mean and variance in
functional form, from sparse and irregular data a lot easier or even possible.
4 Results
4.1 Basic Data Analysis
The aim of this section is to provide an overview of basic multivariate analysis possi-
bilities with the available data. Functional methods are applied from Section 4.2 onwards.
4.1.1 Data Binning
One approach, as described in the previous chapter, is the creation of a vertical grid
for the data domain followed by binning the data into a limited number of “buckets”
along the time or distance axis, similar to creating a histogram. If there is more than
one observation of a truck in one of these bins, an average of these measurements is
put into the bin. This has to be done to avoid biasing in case of dense observations of
a truck within a short timespan.
The size and the number of the bins are crucial for binning. With the data at hand, 25 bins were used, which results in a size of 36087 kilometers per bin.
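A sketch of the binning step for a single truck; the span of 25 × 36087 km follows from the numbers above, and the array names are illustrative:

```python
import numpy as np

def bin_truck(distance, mileage, n_bins=25, span=25 * 36_087):
    """Average one truck's observations into n_bins buckets along the distance axis."""
    edges = np.linspace(0.0, span, n_bins + 1)
    idx = np.clip(np.digitize(distance, edges) - 1, 0, n_bins - 1)
    binned = np.full(n_bins, np.nan)          # NaN marks empty bins
    for b in np.unique(idx):
        # several observations of the truck in one bin are averaged to avoid biasing
        binned[b] = mileage[idx == b].mean()
    return binned
```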
In Figure 4.1 the number of observations per bin, as well as an estimation of the mean
function and the variance of the data can be seen.
In Figure 4.2 a boxplot of the binned data and one of the results of bootstrapping [1]
the mean value per bin (10000 bootstrap samples) are illustrated.
Figure 4.1: The histogram (left) depicts the number of observations per bin. Especially the first and the last few bins have a very small number of observations, which leads to the abnormal results in these bins in the mean and standard deviation figure on the right. The right figure shows the mean as well as the standard deviation (fuel mileage [km/l] over distance driven [km]) estimated from the binned data.
Figure 4.2: The figures show boxplots for the binned data (left) and bootstrapped mean values (right). The left boxplot is a simple plot of the raw binned data, providing an easy visualization. The right boxplot is generated by bootstrapping the mean of each bin 10000 times. Bootstrapping should give an idea of how much the mean can vary if new data has the same distribution as the data at hand.
4.1.2 Feature Extraction
The features which are retrieved from all observations of a single truck are used to construct a simple outlier detector with hierarchical clustering.

The goal of this simple outlier detector is to find trucks whose mean deviates significantly from the mean of the entire data. A single extracted feature was used in this case:

∆_Truck = (µ_Truck − µ_All)²

The data was then clustered with a hierarchical algorithm, using average linkage. The outlying classes were subjectively selected by looking at the resulting dendrogram. For the results, see Figure 4.3.
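A sketch of this detector using SciPy's hierarchical clustering; the number of classes (seven in Figure 4.3) is read off the dendrogram subjectively, so passing it in explicitly is an assumption of this sketch:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def outlier_classes(truck_means, overall_mean, n_classes=7):
    """Cluster trucks on the single feature (mu_truck - mu_all)^2."""
    delta = (truck_means - overall_mean) ** 2
    Z = linkage(delta.reshape(-1, 1), method="average", metric="euclidean")
    return fcluster(Z, t=n_classes, criterion="maxclust")   # class label per truck

# Example with synthetic per-truck means around 2.5 km/l
means = np.random.default_rng(0).normal(2.5, 0.1, size=213)
labels = outlier_classes(means, means.mean())
```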
Figure 4.3: Results of outlier detection based on feature extraction. The left figure shows the dendrogram of the clustering algorithm; class 6 is an extreme outlier, whereas classes 3 and 7 are also quite different from the main part of the data. The basis for these classes being outliers is a vastly different mean from the rest of the data. In the other figure (fuel mileage [km/l] over distance driven [km]), the outlying clusters are highlighted: the extreme outlier is marked red, the normal outliers are marked green and the normal data is colored blue. Class 3 has 5 members, whereas the other outlier classes have just 1 member. Classes 1 and 5 have 114 respectively 52 members. Class 2 has 27 members, whereas class 4 has 13 members.
4.1.3 Function Fitting
Finding a plausible function that fits the data of the trucks well is difficult because of the open-ended nature of the measurements. If a set of observations has a defined start and end of its measurements, i.e. the data is fully observed, it is easy to interpolate the data in between, even if the data within this span is sparse. This
property of the data at hand is also discussed in Section 3.1.
If the set of data is not fully observed, it is almost impossible to get a reliable fit outside
the observation span of a single entity. This reliable fit outside of this span is necessary
for performing FDA on this data, as FDA needs the same set of basis functions, or in
the case of spline interpolation, the same knots for all functions to work.
It was not possible to get a good fit on this data with splines, where all of the knots are
distributed the same for all truck entities. Also, polynomial fits, i.e. the approximation of the data with polynomials of low (< 5) order, did not result in a stable fit for the available data. The most reliable fits under these conditions were generated by fitting a linear function to the fuel consumption observations. These results from fitting the sparse and irregular data motivate the idea of combining the observations by means of PACE, to be able to get better fits from the reconstructed trajectories.
The results of fitting a straight line to the data can be seen in Figure 4.4.
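A sketch of the per-truck straight line fit; np.polyfit performs the least squares fit, and trucks is a hypothetical mapping from a truck id to its observation arrays:

```python
import numpy as np

def fit_lines(trucks):
    """Least squares straight line per truck: mileage ~ slope * distance + offset."""
    fits = {}
    for truck_id, (distance, mileage) in trucks.items():
        slope, offset = np.polyfit(distance, mileage, deg=1)
        # fits with a high gradient are not valid outside the observation span
        fits[truck_id] = (slope, offset)
    return fits
```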
Figure 4.4: On the left, all fitted straight lines are shown (fuel mileage [km/l] over distance driven [km]). The right figure shows the mean straight line along with the standard deviation of the slope and the offset (blue) and the standard deviation of just the offset (dashed). The main problem with this straight line fit is a number of fits with high gradients, which are not valid outside their observation span. However, the mean line shows a slight increase in fuel economy, just like the mean curve from PACE (Figure 4.5).
4.2 Application of PACE
The goal of this section is to elaborate on the application of the PACE method on the
truck data, focusing only on fuel consumption per kilometer over the distance axis.
Along with the results of this first application, some options available for a fine-tuning
of the method will be presented and a general estimate of variability will be given.
4.2.1 Baseline PACE Results
The data in use for this initial run of the PACE method is the cleaned set, with all trucks removed which have fewer than 2 observations. Additionally, every observation that happened before a threshold of 10000 km has been removed. The PACE method
has some interchangeable sub-methods. For the baseline results, mostly the same
parts as in the original method described in [4] were used. Thus, the kernel used for
smoothing the mean function is the Epanechnikov kernel [4] and the input data is
assumed to contain measurement errors.
A small discrepancy to the original method is the choice of using Fraction of Variance Explained¹ (FVE) instead of the Akaike Information Criterion (AIC) [1] to select the number of PCs. The FVE threshold is set at 95 % of variance explained.
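The FVE rule described in the footnote amounts to a cumulative sum over the eigenvalues; a minimal sketch:

```python
import numpy as np

def n_components_fve(eigenvalues, threshold=0.95):
    """First number of PCs whose cumulative eigenvalue share exceeds the threshold."""
    fve = np.cumsum(eigenvalues) / np.sum(eigenvalues)
    return int(np.searchsorted(fve, threshold) + 1)

# with the eigenvalues from this analysis, a threshold of 0.95 selects
# 8 PCs, accounting for 96.57 % of the total variation (see below)
```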
Regarding Figure 4.5, the smoothed mean curve should be taken with a grain of salt; especially the variance plots and the measurement density plot in Figure 3.3 should be considered. The number of PCs selected by FVE is 8, which accounts for 96.57 %
of the total variation. The scree plot (Section 2.1.4) of the principal components from
this analysis can be seen in Figure 4.6. The first, strong principal component is almost
a straight line, which is basically shifting the mean from its starting point closer to
the position of the measurements. The second and the fourth principal component
seem to serve partially as corrective for trucks with a higher initial fuel economy than
the average truck. The smoothed covariance matrix generated and used by PACE is
visualized in Figure 4.7 by a color-matrix.
¹ The sum of the eigenvalues of a certain number of eigenfunctions divided by the sum of all eigenvalues has to exceed a certain threshold. The first number of PCs which exceeds this threshold is subsequently used.
Figure 4.5: The smooth mean function generated by PACE (left; fuel mileage [km/l] over distance driven [km]) is the basis for all other results. The four most significant PCs (right) are the strongest ways in which the individual trucks vary. The legend quantifies the strength of the PCs (55.72 %, 11.88 %, 8.65 % and 4.03 %).
Figure 4.6: The scree plot, which highlights the trade-off between the number of PCs used versus the variance retained. The use of more than 10 PCs makes little sense, as the Fraction of Variance Explained (FVE) is not improving much.
Figure 4.7: The smoothed covariance matrix generated by PACE. (The diagonal, which is the variance, has been removed prior to smoothing.) The main part of the matrix shows a small positive covariance (green).
Figure 4.8: These plots exhibit the mean curve (red), the corresponding original observations (green) and the reconstructed curve (blue) for vehicles 14, 106, 92, 72 and 4. Vehicles 14 and 106 have high values on all major PC scores, with opposite signs. Number 92 has the lowest PC scores overall; trucks 72 and 4 have average PC scores. High PC scores lead to extreme values, especially on the strong first PC.
From the estimated PC scores, the mean function µ and the principal component functions, the individual traces of the trucks can be reconstructed, which should give a rough estimate of the behavior of each truck. A number of selected reconstructions can
be viewed in Figure 4.8 and a collection of all traces and the original measurements
can be seen in Figure 4.9.
As a next step, for an analysis of the results, the goodness-of-fit of the original mea-
surements versus the reconstructed traces is assessed. To estimate the goodness-of-fit,
the mean squared error [1] between the discrete observation and the estimated re-
construction is considered. However, the irregular measurement intervals make the assessment of the results difficult.
In Figure 4.10 some examples of bad fits are explained. Just taking the mean of the
mean square error (MSE) of all observations of one truck is prone to skewing, as well
as just summing up the MSE for each single truck. A more sensible approach to
Figure 4.9: This graph shows all reconstructed traces (gray) and original measurements (blue; fuel mileage [km/l] over distance driven [km]). Note how the traces tend to follow the observations, especially when the relative occurrence of observations is low.
Figure 4.10: As described in the text, these figures depict misfitted trucks. Vehicles #73 and #106 are trucks which provide bad fits, whereas #102 is a truck which is only identifiable as a misfit when the median mean square error (MSE) is applied. Truck #202 is a counter-example, where the misfit is more noticeable when the mean MSE is used.
Method                    Max. MSE   Mean MSE   Median MSE   Std. MSE
Mean MSE per Truck        0.189%     0.0343%    0.0209%      0.0383%
Median MSE per Truck      0.238%     0.0215%    0.0096%      0.0331%
All Observations Pooled   0.679%     0.0310%    0.0089%      0.0629%

Table 4.1: MSE of the reconstructed traces by PACE versus the original observations with 8 PCs. In the last column, the standard deviation of the MSE is given.
get reliable error measurements is to use the median of the individual MSE as error
measure. A good example of a bad fit is truck #102 (Figure 4.10), which is, when the
median MSE is used, the third worst fitting truck, in contrast to mean MSE, where
the truck is ranked 63rd.
A counter-example is provided by vehicle #202 which is ranked 3rd using the median
and 19th with mean MSE. In this example, one of the observations is a strong outlier,
which is influencing the median MSE, because of the low number of observations on
this truck.
Because both measurement methods have their respective merits, both are used for
judging the fit of the individual trucks. In addition to these two methods, which view
the trucks as separate entities, all truck observations will be pooled and the overall
MSE is given. The results can be seen in Table 4.1.
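A sketch of the three error measures reported in the MSE tables (per-truck mean, per-truck median, and all observations pooled):

```python
import numpy as np

def mse_summaries(squared_errors_per_truck):
    """squared_errors_per_truck: one array of squared residuals per truck."""
    per_truck_mean   = np.array([e.mean() for e in squared_errors_per_truck])
    per_truck_median = np.array([np.median(e) for e in squared_errors_per_truck])
    pooled = np.concatenate(squared_errors_per_truck)   # ignores truck boundaries
    # each table row reports max/mean/median/std over one of these collections
    return per_truck_mean, per_truck_median, pooled
```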
After the establishment of these baseline results, various parts of the PACE method
can be changed to see their influence on the results.
4.2.2 Number of Principal Components
As a first variation, the number of PCs will be varied and the resulting MSE table will
be compared. Also, a visual comparison will be offered. In addition to the baseline
threshold of FVE – 0.95, thresholds of 0.75, 0.85 and 0.9999 will be subject to this
experiment. The difference between the baseline and the variants is just the number of
PCs. With a lower number of PCs, the MSE in the data should be higher. Using a higher number of PCs results in a lower MSE, but probably causes a worse performance in generalisation, i.e. the principal components might be over-fitted to the existing data.
Figure 4.11: These plots show how much the reconstructed traces of vehicles 14, 105, 92, 72 and 4 vary with different numbers of PCs involved (8, 3, 4 and 29 PCs).
Method                    Max. MSE   Mean MSE   Median MSE   Std. MSE
Mean MSE per Truck        0.2653%    0.0451%    0.0264%      0.0514%
Median MSE per Truck      0.255%     0.0296%    0.0139%      0.0430%
All Observations Pooled   0.729%     0.0421%    0.0121%      0.0818%

Table 4.2: MSE of the reconstructed traces with 3 PCs (76.69% variance retained).
As the only difference to the baseline result is the number of PCs, graphs of the mean function and the PCs themselves will be omitted. Only the MSE tables and the reconstructed trajectories of selected trucks will be shown. For the baseline table, see Table 4.1, and for a comparative visualization of reconstructed trajectories see Figure 4.11.
As expected, the MSE results from the variations (Tables 4.2, 4.3, 4.4) behave analogously to the scree plot visible in Figure 4.6. When using a lower number of PCs the
Method                    Max. MSE   Mean MSE   Median MSE   Std. MSE
Mean MSE per Truck        0.233%     0.0405%    0.0247%      0.0453%
Median MSE per Truck      0.263%     0.0260%    0.0110%      0.0401%
All Observations Pooled   0.800%     0.0374%    0.0102%      0.0756%

Table 4.3: MSE of the reconstructed traces with 4 PCs (85.34% variance retained).
Method                    Max. MSE   Mean MSE   Median MSE   Std. MSE
Mean MSE per Truck        0.179%     0.0319%    0.0196%      0.0358%
Median MSE per Truck      0.220%     0.0200%    0.0084%      0.0309%
All Observations Pooled   0.635%     0.0286%    0.0081%      0.0578%

Table 4.4: MSE of the reconstructed traces with 29 PCs (99.99% variance retained).
error increases, whereas a high number of principal components does not necessarily improve the error performance much. This means the scree plot and the fraction of variance retained are indicative of the size of the MSE.
4.2.3 Error Assumptions in PACE
There are two possibilities to tune the behavior of PACE regarding “measurement
errors”:
• The assumption that the observations contain no ”measurement errors”.
• In addition to the presence of ”measurement errors”, the estimated errors are cut
off at the quartiles for the estimation of the error variance σ.
The notion of “measurement errors” in this context is a bit misleading, as PACE
assumes an underlying smooth function. The accumulative fuel consumption data itself
is precise enough, but the variation of the observations around this smooth function
can be considered noise. The assumptions on the measurement error mostly influence
the calculation of the PC scores.
The previous results were obtained with PACE under the assumption of measurement
errors, without cut-off. This section therefore covers the two remaining modes of
operation.
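A minimal sketch of the quartile cut-off as described above, assuming the residuals between the raw observations and the smooth fit are available (the actual PACE implementation may differ in detail):

    import numpy as np

    def error_variance(residuals, cut_at_quartiles=True):
        """Estimate the 'measurement error' variance from the residuals.
        With the cut-off enabled, only residuals between the first and the
        third quartile are used, making the estimate robust to outliers."""
        r = np.asarray(residuals)
        if cut_at_quartiles:
            q1, q3 = np.percentile(r, [25, 75])
            r = r[(r >= q1) & (r <= q3)]
        return np.mean(r ** 2)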
Method                     Max. MSE   Mean MSE   Median MSE   Std. MSE
Mean MSE per Truck         0.173%     0.0316%    0.0199%      0.0350%
Median MSE per Truck       0.214%     0.0195%    0.0087%      0.0298%
All Observations Pooled    0.628%     0.0286%    0.0081%      0.0584%

Table 4.5: MSE of the reconstructed traces with 8 PCs. For the estimation of the error variance, all data outside the quartiles was cut off.
[Figure: five panels of fuel consumption (km/l) over driven distance for Vehicles #14, #105, #92, #72 and #4.]

Figure 4.12: Reconstructed traces of selected trucks with no measurement error assumed. The influence of this assumption can be seen clearly in Vehicle #105, where the PC scores are maximized to fit at the observation points.
Baseline PACE with error cut-off:

Table 4.5 shows that the MSE with error cut-off is almost as small as the MSE with
29 PCs, which is a clear improvement over the baseline. Basically, the additional
cut-off reduces the influence of outliers, which improves the performance in
comparison with the baseline results.
Using PACE under the assumption of zero measurement error:

As there is no assumed measurement error, the reconstruction MSE is very small. The
problem with this tight fit, however, is that the reconstruction is accurate only at
the original measurement points. This is shown in Figure 4.12.
[Figure: three smoothed mean curves of fuel consumption (km/l) over driven distance (km), one each for the Epanechnikov, rectangular and Gaussian kernels.]

Figure 4.13: This figure shows the effects of using different kernels for smoothing the mean curve µ. The Gaussian kernel produces a very smooth mean curve, whereas the rectangular kernel picks up noise from the measurements. The Epanechnikov kernel produces a compromise between these two variants.
4.2.4 Different Kernel Functions
Usually, the Epanechnikov kernel [4] is the standard choice for the smoothing steps in
the PACE method. This kernel function has compact support, i.e. it is zero outside a
finite interval. Alternative choices are the rectangular and the Gaussian kernel [4].
Whereas the rectangular kernel also has compact support, the Gaussian kernel extends
to infinity. For the smooth mean curve and the principal components, the rectangular
kernel has the effect of adding some noise to the curves, whereas the Gaussian kernel
has stronger smoothing properties. Figure 4.13 shows all three mean curves and
Figure 4.14 the three most significant PC curves.
In comparison, the overall MSE of the pooled data is slightly higher for PACE with
a rectangular kernel than with an Epanechnikov kernel (0.0351% mean, 0.0113%
median in the rectangular case versus 0.031% mean, 0.0089% median with the Epanech-
nikov kernel). In the Gaussian case, the fit is worse than with the other two kernels
(0.0468% mean, 0.0162% median)2.
2 These results were achieved with 8 principal components.
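The three kernels differ only in how they weight an observation by its scaled distance u from the evaluation point. The sketch below illustrates this effect with a simple Nadaraya-Watson smoother on the pooled observations; note that PACE itself uses local linear smoothing (cf. [14]), so this is only an illustration of the kernel choice, not of the actual smoothing step:

    import numpy as np

    def epanechnikov(u):
        return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)

    def rectangular(u):
        return np.where(np.abs(u) <= 1, 0.5, 0.0)

    def gaussian(u):
        return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

    def smooth(t_grid, t_obs, y_obs, kernel, bandwidth):
        """Kernel-weighted average of all pooled observations on a grid."""
        w = kernel((t_grid[:, None] - t_obs[None, :]) / bandwidth)
        return (w @ y_obs) / np.maximum(w.sum(axis=1), 1e-12)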
[Figure: the three most significant PCs over distance (km) for the rectangular, Epanechnikov and Gaussian kernels.]

Figure 4.14: This figure shows the effect of using different kernels for smoothing on the PCs. The order of the PCs is visualized by the thickness of the lines, i.e. the thickest line depicts the first principal component. Generally, the same observations as in Figure 4.13 apply.
4.2.5 Variances
There are two different variances in the results: model variance and data variance.
As the names indicate, these variances come from different sources and must therefore
be handled differently.
The model variance addresses the question of how sure we are of a model. One way to
assess this variance is to perform leave-one-curve-out cross-validation on the smooth
mean curve. This enables a visualization of how much influence a single curve has on
the overall result of the mean curve or the principal components.
The data variance represents the density of measurements in a certain part of the
curve. To calculate the variance and the confidence interval of a certain part of the
curve, the number of trucks that influence that part of the curve has to be known.
There are two different approaches to this:

One approach is to bin the data, calculate the variances of the bins as shown in
Section 4.1, and use them to get approximate results for the variance.
Another approach is to use the reconstructed curves as a basis for calculating the
variances. There are two different implementations of this approach: either the
reconstructed curves are taken into account only within the interval of their real
observations, i.e. only observations which are relevant for a particular interval are
incorporated, or the complete reconstructed curves are used, which ignores the number
of real observations in a part of the curve.
4.2.5.1 Model Variance
The data used for this experiment is generated by PACE with 8 principal components,
after every observation made before the truck had run 1000 kilometers was removed.
The data needed to analyse the model variance is generated by leave-one-curve-out
cross-validation: this validation method produces one PACE result for each truck that
is excluded from the data, so there are as many PACE results as there are trucks.
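A minimal sketch of this procedure, with run_pace standing in for the actual PACE call (the names and the data layout are illustrative assumptions):

    def leave_one_curve_out(trucks, run_pace):
        """Run PACE once per truck, with that truck's curve excluded.
        trucks: list of (t_obs, y_obs) pairs; run_pace returns e.g. the
        mean curve evaluated on a fixed grid."""
        results = []
        for i in range(len(trucks)):
            subset = trucks[:i] + trucks[i + 1:]  # all curves except truck i
            results.append(run_pace(subset))
        return results  # one PACE result per excluded truck

    # The influence of truck i can then be judged by comparing results[i]
    # with the PACE result on the full data set.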
Model variance gives us two results. The first is a ranking of the trucks most
influential on the mean curve or on a PC. It is obtained by comparing the PACE result
that excludes a particular truck with the overall PACE result.

Additionally, model variance gives the distribution and variation of a particular
PACE result. Figure 4.15 shows the very peaky distribution of all the mean curves at
different points. Figure 4.16 shows all leave-one-out mean curves, the overall mean
curve µ and the standard deviation σ curves. The average deviation of σ from µ is
0.0016 km/l, the maximal deviation 0.0039 km/l. Figure 4.17 is a fuel consumption
plot of vehicles that are interesting with regard to their influence on the mean
curve or on the PCs.
4.2.5.2 Data Variance
Analysing the data variance gives an idea of the variation of the data around the
mean function. As mentioned before, there are two methods to accomplish this: the
first, simpler method is to use the data from binning. The other method, outlined in
this part of the thesis, is to consider only the segments of the reconstructed data
where real data support exists.
[Figure: five histograms (number of curves over fuel consumption in km/l) at 50000, 225000, 400000, 575000 and 750000 km.]

Figure 4.15: The distribution of all mean curves generated with the leave-one-curve-out method at various points. Two properties of the mean curves are visible, namely the peakiness of the distribution and the higher deviation from the mean at 50000 km and at 750000 km.
[Figure: all leave-one-out mean curves of fuel consumption (km/l) over distance (km).]

Figure 4.16: All mean curves generated by leave-one-curve-out cross-validation and the original µ (blue) and σ (red) curves. The small distance between the σ curves and the µ curve highlights the density of curves around the mean.
[Figure: five panels of fuel consumption (km/l) over distance (km) for Vehicles #43, #15, #88, #23 and #186.]

Figure 4.17: Plot of trucks with a high influence on the results of PACE. Trucks #43 and #15 have a strong influence on the µ curve since they provide data at the end of all observations, where data is very sparse. Vehicle #88 is the truck with the smallest influence on µ; it has both a short observation duration and average measurements. Truck #23 has the highest influence on the first PC and truck #186 has the smallest influence on the first PC.
Both results can be seen in Figure 4.18, and both methods deliver a similar result.
The main difference is the resolution of the result based on PACE, which is much
higher. However, unlike with the binning results, the estimated data between the
observations is also incorporated into the variance results, which means that regions
with low data support are also represented in the variance.
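A minimal sketch of the support-restricted variant, assuming the reconstructed trajectories have been evaluated on a common distance grid (all names are illustrative):

    import numpy as np

    def data_variance(grid, mean_curve, trajectories, obs_intervals=None):
        """Pointwise variance of the reconstructed curves around the mean.
        trajectories: array of shape (n_trucks, len(grid)). If obs_intervals
        (one (first, last) observed distance per truck) is given, a curve
        only contributes where it has real data support; grid points without
        any support then stay NaN."""
        var = np.full(len(grid), np.nan)
        for k, t in enumerate(grid):
            if obs_intervals is None:
                values = trajectories[:, k]  # complete reconstructed curves
            else:
                idx = [i for i, (lo, hi) in enumerate(obs_intervals)
                       if lo <= t <= hi]
                values = trajectories[idx, k]
            if len(values) > 0:
                var[k] = np.mean((values - mean_curve[k]) ** 2)
        return var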
4.3 Prediction of Fuel Consumption with PACE
Prediction in this case essentially means using the reconstructed trajectories from
the PC scores to estimate the fuel consumption of a truck at a certain point3.

As the baseline to measure the effectiveness of the prediction, the value of the last
available measurement is used as the predicted value. This simple assumption works
well on the accumulative data because the fuel consumption usually develops along an
almost straight line.

3 If the data, unlike the available truck data, is not open-ended, an alternative to the direct use of the trajectories would be the use of regression methods [5].
[Figure: two panels of standard deviation bands of fuel mileage (km/l) over distance driven (km).]

Figure 4.18: The standard deviation extracted from the binned data is visible on the left. The right graph shows the standard deviation of the data reconstructed from the observation durations and the trajectories regenerated from the PC scores of PACE.
For testing the prediction of new observations, the last observation of the truck to
be predicted is removed from the data, and the PACE results are calculated without
it. The prediction at the position of the removed observation is then taken from
these results. This procedure is repeated for each available truck.
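A minimal sketch of this test, again with run_pace as a stand-in for the actual PACE call; it is assumed here that run_pace returns a reconstructed trajectory for the reduced truck that can be evaluated at an arbitrary distance, and that each truck has at least two observations:

    import numpy as np

    def prediction_errors(trucks, run_pace):
        """Hold out the last observation of each truck in turn and compare
        the straight-line baseline (last value carried forward) with the
        PACE trajectory at the held-out point. Returns relative errors."""
        err_straight, err_pace = [], []
        for i, (t_obs, y_obs) in enumerate(trucks):
            held_t, held_y = t_obs[-1], y_obs[-1]
            reduced = [(t[:-1], y[:-1]) if j == i else (t, y)
                       for j, (t, y) in enumerate(trucks)]
            trajectory = run_pace(reduced, truck=i)
            err_straight.append(abs(y_obs[-2] - held_y) / held_y)
            err_pace.append(abs(trajectory(held_t) - held_y) / held_y)
        return np.array(err_straight), np.array(err_pace)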
In this general prediction test, straight-line prediction produces a maximum error of
5.04% and a mean error of 0.58% with a standard deviation of 0.81%, whereas using
the reconstructed trajectories to predict produces a maximum error of 5.49% and a
mean error of 1.25% with a standard deviation of 1.06%.

These results emphasize the straight nature of the data. In general, they show that
it is better to assume a steadily continuing fuel consumption behavior for forward
prediction.
If the trajectory is used as the true reference instead of the real observations,
straight-line prediction has a maximum error of 2.81% and a mean error of 0.48% with
a standard deviation of 0.56%, while prediction using PACE produces a maximum error
of 2.31% and a mean error of 0.38% with a standard deviation of 0.40%. These results
are not necessarily an indicator of prediction quality, but of result stability in
the case of a single missing observation.
Using the reconstructed trajectories directly for prediction is affected by the
assumption of the presence of a measurement error (i.e. a basic underlying deviation
even at the points with known observations) and by the bad fit which usually occurs
when dealing with outliers. However, given the relatively constant measurements of
individual trucks and the preexisting error between the actual observations and the
trajectories, the prediction works and is quite stable with regard to the removal of
observations.
4.4 Detection of Outliers with PACE
The main idea behind outlier detection with PACE, in particular with the PC scores,
is to quantify how normal and likely the fuel consumption behavior of a single
truck is.

As the first step in quantifying this probability, the distribution of the PC scores
has to be found. In this case the scores are normally distributed, as can be seen in
Figure 4.19. This makes the calculation of probabilities for a single PC score
straightforward.
By using the probabilities from just the first principal component, the same outliers
as with simple feature extraction (Section 4.1.2) can be found; in fact, the same
outliers can already be found by just using the raw PC scores.

However, if the probabilities of several PCs are combined, it is possible to
calculate the “normality” of a truck. The distribution of these probabilities, for a
varying number of PCs used, can be seen in Figure 4.20. Figure 4.21 shows a few
example fuel consumption plots of trucks along with their normality. For these
samples, the first four PCs, weighted by their eigenvalues, were used.
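A sketch of this computation for a single truck. The scores of each PC are treated as draws from a zero-mean normal distribution whose variance is the corresponding eigenvalue, and each score is turned into a two-sided tail probability; how exactly the eigenvalue weighting enters is not spelled out here, so the weighted geometric product below is only one plausible reading:

    import numpy as np
    from scipy import stats

    def normality(scores, eigenvalues, n_pcs=4, weighted=True):
        """Combine the tail probabilities of the first n_pcs PC scores of
        one truck into a single 'normality' value in [0, 1]."""
        weights = eigenvalues[:n_pcs] / np.sum(eigenvalues[:n_pcs])
        p = 1.0
        for k in range(n_pcs):
            sd = np.sqrt(eigenvalues[k])
            # probability of a score at least this far from the mean
            tail = 2 * stats.norm.sf(abs(scores[k]), loc=0.0, scale=sd)
            p *= tail ** weights[k] if weighted else tail
        return p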
In comparison to clustering the data with extracted features, as in Section 4.1, this
approach delivers a probability value rather than a grouping of the data, and the
probability value provides finer increments than the clusters. The normality method
is coupled tightly to the PCs calculated with the help of PACE, whereas clustering
works on arbitrary extracted features. However, for finding outliers in the data,
calculating the normality is far more non-parametric, as no features have to be
chosen by hand.
[Figure: four normal probability plots (probability over data) for PC 1 to PC 4.]

Figure 4.19: These are the normal probability plots for the first four PCs. The dashed red line represents the ideal normal distribution, whereas the blue crosses are the actual observations.
[Figure: four histograms (number of observations over probability) for 1 PC, 3 PCs, 4 PCs and 4 PCs (weighted).]

Figure 4.20: These histograms show the likelihoods for the occurrence of a single truck with different counts of PCs used for the calculation. When multiple PCs are used, the result is given by the product of the probabilities of all principal components. In the rightmost histogram the likelihoods are weighted by the eigenvalues of the PCs.
[Figure: eight panels of mileage (km/l) over distance (km): Vehicle #88 (prob. 78.84%), #92 (75.60%), #4 (50.74%), #72 (53.48%), #6 (25.93%), #16 (18.20%), #14 (2.36%) and #106 (11.84%).]

Figure 4.21: These graphs depict trajectories of several trucks, along with their normality. The normality describes how average the fuel consumption of one truck is with regard to all other trucks.
[Figure: the mean function of average speed (km/h) over distance (km), the PCs (PC1 43%, PC2 16%, PC3 13%, PC4 7%) and a scatter plot of all observations.]

Figure 4.22: This figure shows the mean curve of the average vehicle speed, the PCs and a scatter plot of all available observations. The mean curve is an indicator that trucks with a high odometer count have a higher average speed.
4.5 Expansion of our Application
As an example of the application of PACE to other data, the average vehicle speed is
used. Furthermore, in this section the PACE method is applied to cyclic fuel
consumption data, even though PACE was developed for longitudinal data.

The results of PACE on the average vehicle speed can be seen in Figure 4.22. The
speed data has a distribution of observations similar to that of the fuel consumption
data.
The most interesting outcome of the analysis of seasonal fuel consumption was not
the mean curve, but the second and the third principal component, which seem to
reflect high fuel efficiency in spring and autumn. Based on those two PCs, it is
possible to calculate the strength of these two seasonal effects relative to the
other trucks.
[Figure: left, the mean function of fuel consumption (km/l) over the day of the year; right, the first four PCs.]

Figure 4.23: The left figure shows the mean fuel consumption when the fuel consumption is observed over the year. The peak at the very beginning and the very end of the year is probably caused by the lack of observations at this time of the year. On the right, the first four PCs are shown. The first PC is a linear offset, whereas the second and the third components probably show seasonal effects.
Figure 4.23 shows the mean curve as well as the strongest PCs. Traces of interesting
trucks in relation to the strength of their seasonal effects can be seen in Figure
4.24. Note that the data was collected over no more than three years, so these
results might be affected by a mild winter or a rainy summer.
[Figure: four panels of fuel consumption (km/l) over the day of the year for Vehicles #121, #192, #190 and #6.]

Figure 4.24: Vehicle #121 exhibits no seasonal dependencies, which suggests a constant climate. Vehicle #192 has a high PC2 score, i.e. a high fuel efficiency in spring. Vehicle #190 has a low fuel efficiency in winter and high values in both its PC2 and PC3 scores. Vehicle #6 has a low fuel efficiency only in spring; this may be because of a change in the utilisation of this truck, or because those measurements are from a different year.
5
Discussion
A natural perspective for a possible continuation of this work would be the expansion
of the method applications to different datasets, especially to more specialized
ones: for example, data in a similar quantity to the one used, but from a corporate
fleet of quasi-identical trucks which are in service within the same climate zone,
with similar loads, etc. Such data would likely be better suited for research on
detecting trends, as well as on detecting trucks with unexpected behavior, i.e.
outliers. An example of such a dataset would be data from Scandinavian long-distance
trucks, where the differences in fuel consumption between summer and winter should be
clearly visible. To further the research on seasonal variation of fuel consumption,
an expansion of PACE onto cyclic data might be useful.
Furthermore, the analysis of data containing more observations might be interesting,
as many small underlying influences in the fuel consumption data could be uncovered.
With such datasets, research on the asymptotic properties as well as on the
distribution of the data would be more useful than with the small amount of mixed
data at hand. With denser data, it might also be viable to switch to calculating the
fuel consumption from the fuel amount used between two observations instead of the
fuel amount consumed since the truck was manufactured.
6
Conclusion
Sparse and irregular data is hard to analyse with multivariate as well as with
functional statistics. The steps which make analysis with these two approaches hard
are feature extraction and data interpolation. Feature extraction requires a careful
selection of relevant features, and the manual work involved is not very desirable.
The main problem with interpolating this data is its open-ended nature: outside the
given observations for a single truck, it is very difficult to estimate a function
without knowledge of the underlying model.
However, it is possible to analyse such data in a functional way if the Principal
Components Analysis through Conditional Expectation (PACE) method is used. If the
Gaussian assumptions made by PACE are acceptable, the method provides a completely
data-centric approach to extract a mean curve and principal components from the data,
as well as complete trajectories regenerated from the principal component scores of
the individuals.
These results can be used as a basis for further analysis, such as classification and
regression. While these tasks can also be approached with feature extraction, PACE
uses all available data and is much more non-parametric. Also, the functional
approach of PACE keeps the data in a more natural format than the abstract extracted
features of the individuals.
Most of the variation in the data used in this work (long-distance articulate truck
fuel consumption data) can be captured with a small number of principal components.
However, the data does not contain highly significant general trends or easily
separable clusters. Some outlying individuals are contained in this data, but because
of the meta-data nature of the fuel consumption, it is not possible to distinguish
between a possible truck fault and environmental influences.
Fuel consumption is difficult to predict, as it can change very rapidly when the
environment changes. The available truck data has no definitive start or end; samples
were taken at arbitrary times and are connected only by the truck configuration.
Thus, prediction for this data can only give an educated guess about the fuel
consumption based on the data from the other trucks, not from the individual truck
itself.
However, using the principal component scores, it is possible to calculate whether a
truck is normal relative to its peers, or whether its behavior is relatively
unlikely, i.e. an outlier.

The data-centric approach of these methods is bound by the quality and quantity of
the data used, which is true for all statistical methods. To overcome the
difficulties with sparseness and irregularity in the observations, the methods
described above are very reasonable and enable a functional analysis of this data.
Bibliography
[1] Hastie, T., R. Tibshirani, and J. Friedman: The Elements of Statistical Learning.
Springer Verlag, 2001.
[2] Ramsay, J. O. and B. W. Silverman: Functional Data Analysis. Second Edition.
Springer Verlag, 2006.
[3] Ferraty, F. and P. Vieu: Nonparametric Functional Data Analysis: Theory and
Practice. Springer Verlag, 2006.
[4] Yao, F., H.G. Müller, and J.L. Wang: Functional Data Analysis for Sparse Longitudinal Data. Journal of the American Statistical Association, 100(470):577–591, 2005.
[5] Liu, B. and H.G. Müller: Functional Data Analysis for Sparse Auction Data. 2007. http://www.smith.umd.edu/ceme/statistics/Liu_Muller_FDA%20for_Sparse_Auction_Data.pdf Preprint published online. Retrieved 2007-11-25.
[6] Sandberg, T.: Heavy Truck Modeling for Fuel Consumption Simulations and Measurements. Master's thesis, Division of Vehicular Systems, Department of Electrical Engineering, Linköping University, Linköping, 2001.
[7] Stodolsky, F., L. Gaines, and A. Vyas: Analysis of Technology Options to Reduce
the Fuel Consumption of Idling Trucks. Technical report, ANL/ESD-43, Argonne
National Lab., IL (US), 2000.
[8] James, GM, TJ Hastie, and CA Sugar: Principal component models for sparse
functional data. Biometrika, 87(3):587–602, 2000.
[9] Zou, H., T. Hastie, and R. Tibshirani: Sparse principal component analysis. Jour-
nal of Computational and Graphical Statistics, 15(2):265–286, 2006.
[10] Hall, P., H.G. Müller, and J.L. Wang: Properties of principal component methods for functional and longitudinal data analysis. Ann. Statist., 34(3):1493–1517, 2006.
[11] Yao, F., H.G. Müller, A.J. Clifford, S.R. Dueker, J. Follett, Y. Lin, B.A. Buchholz, and J.S. Vogel: Shrinkage Estimation for Functional Principal Component Scores with Application to the Population Kinetics of Plasma Folate. Biometrics, 59:676–685, 2003.
[12] Liang, K.Y. and S.L. Zeger: Longitudinal data analysis using generalized linear
models. Biometrika, 73(1):13–22, 1986.
[13] Maaten, L. J. P. van der, E. O. Postma, and H. J. van den Herik: Dimensionality reduction: A comparative review. 2007. http://www.cs.unimaas.nl/l.vandermaaten/dr/DR_draft.pdf Preprint published online. Retrieved 2007-12-01.
[14] Fan, J. and I. Gijbels: Variable Bandwidth and Local Linear Regression Smoothers.
The Annals of Statistics, 20(4):2008–2036, 1992.
List of Abbreviations
LVD . . . . . . . . . . . . . . . . . . . Logged Vehicle Data
EECU . . . . . . . . . . . . . . . . . Engine Electric Control Unit
FDA . . . . . . . . . . . . . . . . . . . Functional Data Analysis
FD . . . . . . . . . . . . . . . . . . . . Functional Data
PCA . . . . . . . . . . . . . . . . . . . Principal Component Analysis
PC . . . . . . . . . . . . . . . . . . . . Principal Component
PCs . . . . . . . . . . . . . . . . . . . Principal Components
PACE . . . . . . . . . . . . . . . . . Principal Component Analysis through Conditional Expectation
MSE . . . . . . . . . . . . . . . . . . . Mean Squared Error
AIC . . . . . . . . . . . . . . . . . . . Akaike Information Criterion
FVE . . . . . . . . . . . . . . . . . . . Fraction of Variance Explained