From the Institute of Animal Breeding and Husbandry
of the Faculty of Agricultural and Nutritional Sciences
of Christian-Albrechts-Universität zu Kiel
___________________________________________________________________
ARTIFICIAL NEURAL NETWORKS FOR
GENOME-ENABLED PREDICTION IN CATTLE:
POTENTIAL AND LIMITATIONS
Dissertation
submitted for the doctoral degree
of the Faculty of Agricultural and Nutritional Sciences
of Christian-Albrechts-Universität zu Kiel
by
M.Sc. agr.
ANITA EHRET
from Hannover, Lower Saxony
Dean: Prof. Dr. Dr. h.c. Rainer Horn
First examiner: Prof. Dr. Georg Thaller
Second examiner: Prof. Dr. Dr. h.c. mult. Daniel Gianola
Date of oral examination: 2 July 2014
___________________________________________________________________
This dissertation was prepared with the gratefully acknowledged financial support of the BMBF within the
AgroClustEr "SYNBREED - Synergistic Plant and Animal Breeding" (Project A3.3).
Publication series of the Institute of Animal Breeding and Husbandry of Christian-Albrechts-Universität zu Kiel; No. 206, 2014.
© 2014 Self-published by the Institute of Animal Breeding and Husbandry of Christian-Albrechts-Universität zu Kiel, Olshausenstraße 40, 24098 Kiel. Editor: Prof. Dr. Dr. h.c. mult. E. Kalm.
ISSN: 0720-4272. Printed with the permission of the Dean of the Faculty of Agricultural and Nutritional Sciences of Christian-Albrechts-Universität zu Kiel.
ANN: Anita's Neural Network, version 1.4
(__) (__)
(oo) (oo)
/------\/ cows \/------\
/ | || beware || | \
* /\---/\ /\---/\ *
~~ ~~ ~~ ~~
"Another Chapter in the biological Alice-in-Wonderland"
(Hazel 1942)
Table of Contents

CHAPTER 1  General Introduction
CHAPTER 2  Artificial neural networks for genome-enabled prediction in animal and plant breeding: A review
CHAPTER 3  An empirical application of neural networks with back-propagation to genome-enabled predictions of complex traits in Holstein Friesian and German Fleckvieh cattle
CHAPTER 4  Use of genomic and metabolic information as well as milk performance records for prediction of subclinical ketosis risk via artificial neural networks
CHAPTER 5  Extreme Learning Machine: A New Approach for Genomic Prediction of Complex Traits
CHAPTER 6  General Discussion
CHAPTER 7  General Summary
CHAPTER 8  Zusammenfassung (German Summary)
General Introduction

Genome-enabled prediction is increasingly important in animal and plant breeding and has
also received attention in human genetics (de los Campos et al., 2013). In the last few years,
the rapid development of high-throughput genotyping and sequencing technologies has allowed
large-scale genotyping of thousands of genetic markers (e.g. single-nucleotide polymorphisms,
SNPs) in the genomes of plant and animal species. Such dense panels of molecular markers
allow the mapping of quantitative trait loci (QTLs) in individuals and populations through
linkage disequilibrium between single markers and gene variants. Most traits are assumed to be
determined by large numbers of genes with small effects (Falconer and Mackay, 1996), acting
additively to express the phenotype of an individual. Based on this assumption,
Meuwissen et al. (2001) derived a ground-breaking approach for genome-based
prediction of complex traits. The authors implemented a whole-genome regression model
in which phenotypes are linearly regressed on thousands of markers simultaneously. In contrast
to pedigree-based predictions, this approach allows accurate prediction of the genetic
values of animals early in life. A few years later, Schaeffer (2006) laid the foundations for the
great practical success of this approach. In recent years genome-enabled prediction has
gained ground both in animal (VanRaden et al., 2009) and plant breeding (Bernardo and Yu,
2007; Crossa et al., 2010), and genomic selection (GS) has been implemented in several plant
and animal breeding programs.
At the same time, the phenomenon of "missing heritability" of complex traits attracted
worldwide interest in the scientific community of animal breeders and in related research areas
such as human medicine (Manolio et al., 2009). Although genome-enabled studies had already
provided valuable insights into the genetic basis of complex traits, they explained only a
relatively small proportion of the heritability of most of them (Eichler et al., 2010). Many
explanations for this missing (or hidden) heritability have been suggested, and several working
groups have pointed out that the fundamental problem might be a mismatch between the genetic
architecture of complex traits and the statistical techniques applied (Yang et al., 2010; de los
Campos et al., 2010). The existence of non-additive effects and possibly higher-order
non-linearities is therefore also an important explanatory approach. Genetic mechanisms might
involve complex interactions among genes, between genes and environmental conditions, or
epigenetic effects, which are not fully captured by additive models (de los Campos et al., 2010).
In this context there has been growing interest in using semi- and non-parametric methods
for genome-enabled prediction of quantitative traits, to account for non-additive gene effects
and higher-order non-linearities as well as genotype-by-environment interactions (Gianola and
van Kaam, 2008; Gianola et al., 2006; de los Campos et al., 2010). However, when performing
genome-enabled predictions by regressing phenotypes on high-density SNP panels, the number of
markers (p) vastly exceeds the number of available samples (n). Fitting this large-p,
small-n regression is a statistical problem, and such regressions require methods
using some type of variable selection or shrinkage estimation (de los Campos et al., 2013).
Because of the potential loss of information through such data- and parameter-reduction-based
methods, pattern-recognition-based methods have become promising alternatives for genome-
enabled approaches. Pattern-recognition-based methods are part of a wide range of machine
learning methods that are able to detect and characterize gene interactions expressed as
genotype patterns (Musani et al., 2007). In this context, so-called artificial neural
networks (ANNs) are especially interesting tools for genome-enabled prediction, as they take
advantage of the full dimensionality of the data. ANNs are universal approximators of complex
functions (Hornik et al., 1989). According to Kolmogorov's theorem, such networks can
represent arbitrary continuous functions, and thus might be able to approximate non-linear
relationships between predictors and responses. This property is achieved through
a universal learning ability, which is provided by particular training algorithms. With respect
to genome-enabled prediction this concept offers another advantage: no fixed
genetic model has to be assumed, so the genetic architecture of the target trait can remain
unknown a priori.
In the following, a more general overview is first provided of the historical background
and development of ANNs in the field of artificial intelligence. Research on ANNs
has been inspired by the structure and information processing of biological nervous systems,
in particular the brain with its densely connected neurons. This natural concept is used to
express statistical models in the form of network diagrams (Pereira and Rao, 2009). The first
models of ANNs were introduced by Warren S. McCulloch and Walter Pitts in 1943. They
described the function of a biological neuron through a simple threshold model based on
logical inputs (active = 1, inactive = 0; McCulloch and Pitts, 1943). They showed that
combinations of artificial neurons are able to mimic nearly any logical and arithmetic
function. In 1949, the psychologist Donald O. Hebb postulated that such networks can learn
through changes in the activation levels of neurons that are proportional to changes in the
strength of the connections between them (Hebb, 1949). Both the advent of the
computer and the learning ability of ANNs led to intensive research and further
improvement of the basic concepts in the following years. In 1956, ANNs were brought
to broad attention when Marvin Minsky, John McCarthy, Nathaniel Rochester and Claude
Shannon organized the first international conference on artificial intelligence on the campus
of Dartmouth College in Hanover, New Hampshire (Garson, 1998). A few years later, the
neuropsychologist Frank Rosenblatt introduced a fundamental principle, the perceptron
model, to the artificial intelligence community (Rosenblatt, 1958). The original concept was
conceived for classification tasks, since the perceptron was able to recognize characters;
however, it was not yet capable of generalizing its classifications to new patterns. Substantial
developments followed in the next years. Then, in 1969, Marvin Minsky and Seymour
Papert showed that single-layer perceptrons are unable to solve 'exclusive or' (XOR) problems
(Minsky and Papert, 1969). Hence, ANNs were almost forgotten until 1982, when the physicist
John Hopfield rediscovered their natural possibilities and introduced his Hopfield neural
network (Hopfield, 1982). This led ANNs through another period of
applications and innovations. During this time, the currently well-known multilayer
perceptron was introduced by Paul Werbos (Werbos, 1972). However, only in 1986 were ANNs
brought to wide popularity, when David Rumelhart introduced a ground-breaking
learning rule, the back-propagation of error algorithm, which conferred universal learning
ability on the multilayer perceptron (Rumelhart et al., 1986). Many of the advantages of
using ANNs have to do with the great flexibility of the concept, with new innovative
architectures and training rules being just as important as the availability of powerful
computers (Hagan et al., 1996). Developments and applications in the area of ANNs are
increasing steadily. ANNs are very flexible machines that have been introduced to
different application fields, such as classification, pattern recognition, prediction and
forecasting, process control, optimization and decision support (Hanrahan, 2011). But
investigating this machinery is not trivial, as ANNs are not easy to handle when stable
outputs are required. Furthermore, computational issues have to be considered when ANNs are
used in empirical studies. In the neural network community there is general consensus that the
performance of ANNs, e.g. in prediction, strongly depends on the data structure, the settings
of the learning algorithm and the network architecture. Because of their complexity and
computational costs, the number of studies reporting ANNs used for genome-enabled prediction
in animal and plant breeding remains small.
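The XOR limitation noted above can be made concrete: no single perceptron can separate XOR, but a two-layer network can. The following sketch uses hand-chosen (not trained) weights purely for illustration:

```python
import numpy as np

def step(a):
    """Threshold activation of the classical perceptron."""
    return (a > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # the four XOR inputs

# Hidden layer: an OR-like unit and an AND-like unit.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])   # W1[j, t]: weight from input j to hidden unit t
b1 = np.array([-0.5, -1.5])   # unit 1 fires if x1+x2 > 0.5 (OR), unit 2 if > 1.5 (AND)
h = step(X @ W1 + b1)

# Output unit: OR minus twice AND reproduces XOR.
w2 = np.array([1.0, -2.0])
b2 = -0.5
y = step(h @ w2 + b2)
print(y.tolist())             # XOR truth table: [0, 1, 1, 0]
```

A single-layer perceptron cannot realize this mapping for any choice of weights and bias, which is precisely the Minsky and Papert (1969) result.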
The thesis is organized as follows. In Chapter 2 a more detailed survey is provided of the
methodology and of recent applications to genome-enabled prediction in animal and plant
breeding. An application of ANNs to genome-enabled prediction using real data is then
presented in Chapter 3. Different data sets as well as network input data and network
configurations were tested to assess the influence of several factors on the predictive
performance for milk traits in dairy cattle. The results emphasize that ANNs can be
implemented on large-scale SNP data, but they were not able to outperform standard linear
techniques used in animal breeding. Differences in prediction performance were small when
dimension-reduced inputs were used, whereas ANNs failed to produce reasonable predictions
when the entire SNP marker information was used as input to the network. At a glance, the
results obtained in Chapter 3 showed, first, that feature selection methods are recommended
when ANNs are applied to large-scale real data for genome-enabled prediction. Second, due to
their complexity, ANNs reach limits in predicting traits that seem to behave mainly additively
(e.g. milk traits) and that can be predicted well with less heavily parameterized linear
methods. ANNs might show their predictive power for traits where non-linearity is evident
(e.g. functional traits) and parametric methods fail. Therefore, in Chapter 4 ANNs were
applied to the prediction of ketosis, a functional trait for which non-linearity is plausible.
In predictions of traits where several factors determine the phenotype, ANNs seem to be a
sophisticated tool for incorporating different sources of information simultaneously to
enhance prediction performance and accuracy. In Chapter 5 a modified ANN approach was
introduced for genome-enabled prediction of complex traits in animal breeding. A new learning
algorithm was implemented in ANNs to take advantage of the full dimension of the data without
losing performance through model complexity. The extreme learning machine (ELM) is a
basis-function expansion method introduced to the artificial intelligence community by Huang
et al. (2004). The ELM can be used instead of training an ANN with back-propagation or other
learning rules. Far fewer parameters are estimated, and therefore this method is promising in
cases where the number of features vastly exceeds the sample size and the full dimension of
the data is to be used. Results show that the modified network outperforms
back-propagation-trained ANNs when the entire marker matrix is used, while reducing
computational costs.
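The ELM idea just described (a random, untrained hidden layer combined with a single closed-form solve for the output weights) can be sketched as follows. The simulated data, dimensions and ridge constant are illustrative assumptions, not values from the thesis:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, s = 300, 2000, 100               # samples, markers, hidden neurons
X = rng.integers(0, 3, size=(n, p)).astype(float)   # SNP genotypes 0/1/2
y = np.tanh(X[:, 0] * X[:, 1]) + 0.1 * X[:, :5].sum(1) + rng.normal(0, 0.1, n)

# Step 1: hidden-layer weights and biases are drawn at random and never trained.
W = rng.normal(0.0, 1.0 / np.sqrt(p), size=(p, s))
b = rng.normal(0.0, 1.0, s)
H = np.tanh(X @ W + b)                 # random feature map of the full marker matrix

# Step 2: output weights come from one regularized least-squares solve, no iteration.
lam = 1e-2
beta = np.linalg.solve(H.T @ H + lam * np.eye(s), H.T @ y)
y_hat = H @ beta
print(y_hat.shape)
```

Because only the s output weights are estimated, the entire marker matrix can be used as input even when p >> n, which is the computational advantage exploited in Chapter 5.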
References
Bishop, C. M. 2006. Pattern Recognition and Machine Learning. Springer, New York.
Gianola, D., H. Okut, K. Weigel, and G. Rosa. 2011. Predicting complex quantitative traits
with Bayesian neural networks: a case study with Jersey cows and wheat. BMC
Genetics 12:87.
Hastie, T., R. Tibshirani, J. Friedman, and J. Franklin. 2009. The elements of statistical
learning: data mining, inference and prediction. The Mathematical Intelligencer 27:83-
85.
Manolio, T. A., F. S. Collins, N. J. Cox, D. B. Goldstein, L. A. Hindorff, D. J. Hunter, M. I.
McCarthy, E. M. Ramos, L. R. Cardon, A. Chakravarti, J. H. Cho, A. E. Guttmacher,
A. Kong, L. Kruglyak, E. Mardis, C. N. Rotimi, M. Slatkin, D. Valle, A. S.
Whittemore, M. Boehnke, A. G. Clark, E. E. Eichler, G. Gibson, J. L. Haines, T. F. C.
Mackay, S. A. McCarroll, and P. M. Visscher. 2009. Finding the missing heritability
of complex diseases. Nature 461:747-753.
Musani, S. K., D. Shriner, N. Liu, R. Feng, C. S. Coffey, N. Yi, H. K. Tiwari, and D. B.
Allison. 2007. Detection of gene × gene interactions in genome-wide association
studies of human population data. Human Heredity 63:67-84.
Pereira, B. D. B. and C. R. Rao. 2009. Data Mining using Neural Networks: A Guide for
Statisticians
http://www.po.ufrj.br/basilio/publicacoes/livros/2009_datamining_using_neural_netw
orks.pdf.
Yang, J., B. Benyamin, B. P. McEvoy, S. Gordon, A. K. Henders, D. R. Nyholt, P. A.
Madden, A. C. Heath, N. G. Martin, G. W. Montgomery, M. E. Goddard, and P. M.
Visscher. 2010. Common SNPs explain a large proportion of the heritability for
human height. Nat Genet 42:565-569.
Rosenblatt, F. 1958. The perceptron: a probabilistic model for information storage and
organization in the brain. Psychological Review 65:386.
Rumelhart, D. E., G. E. Hinton, and R. J. Williams. 1986. Learning representations by back-
propagating errors. Nature 323:533-536.
McCulloch, W. and W. Pitts. 1943. A logical calculus of the ideas immanent in nervous
activity. The bulletin of mathematical biophysics 5:115-133.
Minsky, M. and S. Papert. 1969. Perceptrons. M.I.T. Press, Oxford, England.
Hebb, D. 1949. The Organization of Behavior. Wiley, New York.
Hornik, K., M. Stinchcombe, and H. White. 1989. Multilayer feedforward networks are
universal approximators. Neural Networks 2:359-366.
Hopfield, J. J. 1982. Neural networks and physical systems with emergent collective
computational abilities. Proceedings of the National Academy of Sciences 79:2554-
2558.
Hagan, M. T., H. B. Demuth, and M. Beale. 1996. Neural network design. PWS, Boston.
Hanrahan, G. 2011. Artificial Neural Networks in Biological and Environmental Analysis.
CRC Press, Inc.
Garson, G. D. 1998. Neural Networks: An Introductory Guide for Social Scientists. Sage
Publications, Inc.
Widrow, B. and M. E. Hoff. 1960. Adaptive switching circuits. IRE WESCON Convention
Record, Pages 96-104.
Artificial neural networks for genome-enabled prediction in animal and plant breeding: A review
Anita Ehret 1, Llibertat Tusell 2, Daniel Gianola 2,3,4 and Georg Thaller 1

1 Institute of Animal Breeding and Husbandry, Christian-Albrechts-University, Germany
2 Department of Animal Sciences, University of Wisconsin-Madison, USA
3 Department of Dairy Science, University of Wisconsin, USA
4 Department of Biostatistics and Medical Informatics, University of Wisconsin, USA
Submitted for publication in Theoretical and Applied Genetics
Abstract
In genome-enabled prediction, different regression machines that are non-linear in either
features or parameters have been recently proposed and tested with empirical data. Because of
their flexibility, artificial neural networks are good candidates for enhancing prediction of
complex traits, especially if these are underlain by non-additive gene action such as
dominance and epistatic interactions. The aim of this work is to give an overview of artificial
neural network methodology and recent applications in animal and plant breeding with a
special focus on genome-enabled prediction of complex traits. Multilayer feed-forward
networks and radial basis function networks are the most frequently used neural networks in
genome-enabled prediction of complex traits. Successful implementation requires avoiding
over-fitting of the training data, to enhance predictive ability for future records while
keeping computational costs affordable. This is an important, data-dependent model
selection task with a crucial influence on the prediction of yet-to-be-observed phenotypes.
Aspects such as data preprocessing, choice of a proper network architecture and setting of
parameter values must be assessed carefully. Molecular marker pre-selection, or use of
genomic relationships instead of whole-genome markers as inputs to the network, seems
advisable to overcome over-parameterization and over-fitting and to reduce computational costs.
Recent studies in animal and plant breeding suggest that the superiority of properly specified
artificial neural networks over other prediction machines used in genome-enabled prediction
depends strongly on the species, the target trait, the environmental factors and the sample
size, as well as on the genetic relatedness between individuals in the training and testing sets.
Keywords: artificial neural networks, plant breeding, animal breeding, genome-enabled
prediction
Introduction
Classical prediction models based on additive inheritance, successfully applied in the
breeding and genetics of agricultural species, often mismatch the actual genetic architecture
of complex traits, which may involve non-additive effects such as various types of gene by
gene interactions and genotype-by-environment interactions (e.g. Yang et al., 2010). How, for
example, dominance and other effects can contribute to the understanding of quantitative
genetic variation of complex traits is discussed by Wellmann et al. (2011). To relax the
additivity assumption in whole-genome-enabled prediction, non-parametric and machine
learning methods, such as artificial neural networks (ANNs), have recently been proposed,
with the expectation of outperforming standard linear whole-genome approaches in predictive
performance (e.g. Okut et al., 2011; Gianola et al., 2011). A good overview and
comparison of prediction skills and requirements of the wide range of methods (also
addressing ANNs) commonly used in the whole genome prediction area in animal and plant
breeding is provided by de los Campos et al. (2013).
The technique of ANNs was developed in the field of artificial intelligence to emulate a
biological nervous system (neural network) as a mathematical information processing system.
In general, ANNs are interconnected artificial neurons organized in several layers thus they
mimic the structure of the human brain (Pereira and Rao, 2009). Algebraically, an ANN can
be viewed as a schematic of Kolmogorov's theorem (1957) for representation of multivariate
functions (Kůrková, 1992) such they can be seen as universal function approximators. A great
advantage of ANNs is their “universal learning ability”, which is achieved during a training
phase, such they are able to learn properties (patterns) from e.g. genomic data without a need
of explicitly define a genetic model. After successful training, the information is stored in the
connections strengths of the ANN and is dynamically adapted in a process that is not
centralized but highly parallelly distributed. Hence, an ANN can approximate solutions to
problems to the same class that were not explicitly trained, even when the problem is highly
non-linear (Kriesel, 2007). A properly specified ANN can be a powerful tool able for learning
about complex traits affected by cryptic genomic interactions and for predict future outcomes,
such as phenotypes in personalized medicine.
In spite of this great potential, the use of such flexible machinery is not straightforward:
ANNs are prone to over-fit the training data and are computationally costly when
generalization methods are used. Moreover, due to their great flexibility, results can be
variable in cross-validation, i.e., outputs may not be stable.
This work aims to give an overview of ANNs and their current and potential applications in
animal and plant breeding, with special emphasis on genome-enabled prediction of complex
traits. First, the types of ANN models used for regression analyses in this field are described.
Second, specific data setting requirements and evaluation parameters needed to properly
specify an ANN are discussed. Finally, ANN applications, properties and limitations in the
field of genetics are reviewed and appraised critically.
ANN models for genome-enabled prediction
ANNs transfer readily to genome-enabled prediction; in general, they can be implemented in a
non-penalized, a penalized, or a Bayesian framework (de los Campos et al., 2013). The two
ANNs most used in genome-enabled prediction of complex traits are multilayer feed-forward
networks, also called multilayer feed-forward perceptrons (MLPs), and radial basis function
neural networks (RBFNNs).
Multilayer feed-forward neural networks
The MLP is the most widely used neural network for universal classification and regression
approaches. It is an extension of the concept of Rosenblatt's perceptron (Rosenblatt, 1958),
where single artificial neurons are interconnected across several layers. The information
always flows in one direction, so that the outputs from one layer form the input to the next
layer. The network is completely linked if every single neuron is connected with all neurons
in the next layer.
The two-layer feed-forward neural network, also called the single-hidden-layer perceptron,
covers most statistical applications and is able to solve classification problems that are not
linearly separable (Hagan et al., 1996). Its architecture consists of one input, one hidden and
one output layer, as shown in Figure 1.
Figure 1 Architecture of a two-layer feed-forward neural network with three neurons in the hidden
layer.
x_ip = network input, e.g. marker genotype p of individual i; w1.p = network weight from the input to a
neuron of the hidden layer; w2.s = network weight from the hidden to the output layer for neuron s
(s = 1,…,5); y_i = network output, e.g. fitted phenotype of individual i; f(.) = activation function of the
hidden neurons; g(.) = activation function of the output neuron. Σ denotes the computation of each of the
neurons, "bias1" represents an intercept for each of the neurons in the hidden layer, and "bias2" is an
intercept in the output layer.
Intuitively, a two-layer feed-forward neural network can be thought of as a two-step regression
(Hastie et al., 2009). The central idea is to extract linear combinations of the inputs as derived
features in the hidden layer and then model the target as a function of these features in the
output layer.
In the hidden layer of an ANN for genome-enabled prediction, the genome-derived covariates of
an individual $i$ ($i = 1,\dots,n$), $x_{ij}$ ($j = 1,\dots,p$ genomic covariates, e.g. marker
genotypes), are linearly combined with a vector of weights, $w_j^{[1]t}$, plus an intercept, or
"bias", $a_t$, where $t = 1,\dots,s$ indexes the hidden neurons. The resultant linear
combination is then transformed using an activation function $f_t(.)$, giving the output of
hidden neuron $t$ for individual $i$:

$$z_i^{[t]} = f_t\Big(a_t + \sum_{j=1}^{p} w_j^{[1]t} x_{ij}\Big) \qquad (1)$$

The activation function $f_t(.)$ can be either linear or non-linear and has to be monotonically
increasing. A non-linear activation function in the hidden layer gives the neural network
greater flexibility than standard linear regression models (MacKay, 2008), because the
transformation is adaptive via the $w$ parameters. In the output layer, the $s$ data-derived
scores and an output intercept ("bias") $b$ are combined and transformed by another
activation function to yield the predicted phenotype:

$$y_i = g\Big(b + \sum_{t=1}^{s} w_t^{[2]} z_i^{[t]}\Big) + \varepsilon_i \qquad (2)$$

Here, $w_1^{[2]},\dots,w_s^{[2]}$ are the weights connecting the $s$ derived features with the
output, and $g(.)$ is another activation function, which can be linear or non-linear, although
an identity function is used in most cases when $y_i$ is a continuous-valued phenotype.
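Equations (1) and (2) amount to the following forward pass. The dimensions, random initial weights and the tanh/identity activations are illustrative assumptions:

```python
import numpy as np

def forward(x, W1, a, w2, b):
    """Two-layer MLP of equations (1)-(2); x is a vector of p marker genotypes."""
    z = np.tanh(a + x @ W1)    # eq (1): s hidden outputs, with f_t = tanh
    return b + z @ w2          # eq (2): g = identity for a continuous phenotype

rng = np.random.default_rng(3)
p, s = 50, 5                        # markers, hidden neurons
W1 = rng.normal(0, 0.1, (p, s))     # weights from inputs to hidden layer
a = np.zeros(s)                     # hidden-layer biases
w2 = rng.normal(0, 0.1, s)          # weights from hidden layer to output
b = 0.0                             # output bias

x = rng.integers(0, 3, p).astype(float)   # one individual's genotypes, coded 0/1/2
print(float(forward(x, W1, a, w2, b)))
```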
The training of a two-layer MLP consists of finding the set of weights and biases of the
network that minimizes the differences between the network output (i.e. predicted phenotype)
and the observed value (i.e. true phenotype), typically measured as a sum of squared
differences. The adjustment of the weights is done via an iterative process where several
examples of the task are presented to the network (Gurney, 1997). The set of examples, i.e.
genotypes and their corresponding phenotypes, presented to the network during the training is
known as the training set.
The most widely used supervised learning algorithm for MLPs in a regression framework is
the back-propagation of errors algorithm (Rumelhart et al., 1986). This rule is based on a
minimization of the error function and typically employs the gradient descent method to
locate the optimal solution (Werbos, 1994). Algebraically, the back-propagation algorithm is
a generalized delta rule (Kriesel, 2007) and can be viewed as a non-linear least-squares
regression problem. Intuitively, the algorithm can be decomposed into five steps as shown in
Figure 2.
Figure 2 Scheme representation of the back-propagation algorithm.
MLP = multilayer perceptron
First, weights are initialized with small random values and a small positive value for the
learning constant. Second, in the forward propagation step, each data point is fed to the input
layer, while the output is generated based on the weights. The predicted output is then
compared with the true (observed) output value and the squared error of each unit is
calculated. In a third step, the total squared error is propagated back from the output over the
hidden layers to the input units. In a fourth step, error gradients in both the output and the
hidden layers are computed and used for calculating the weight corrections. Once weight
changes are computed for all units, all weights are updated at the same time. The last step
consists of going back to the second step and iterating the process until the error function
converges. The back-propagation algorithm is powerful and simple. From a practical
perspective, this learning rule and its variations are flexible enough to cover highly complex
interactions between predictor variables. However, it can be slow to reach convergence
and prone to getting stuck in local minima. In most cases a threshold on the error is used to
terminate the algorithm (Hanrahan, 2011). Flexibility reaches a limit when the network architecture is not
optimally chosen, or when the number of predictors is very large compared to the data cases
available. In this case there is a drastic increase in model complexity and, hence, in the
number of parameters to estimate. This is prone to occur when MLPs are used for genome-enabled
prediction problems where p >> n. It also leads to very slow convergence of the back-
propagation algorithm and can cause over-fitting. For an empirical control of model
complexity, the distribution of the connection strengths (i.e., weights and biases) of the
network can be taken as an indicator of the extent of the regularization attained. Typically,
weight values decrease as model complexity increases (Gianola et al., 2011). A natural and
automated approach to dealing with model complexity is to use penalized (typically
Bayesian) methods, where a compromise between model goodness of fit and complexity of
the network is attained without the need for artificial constraints (Gianola et al.,
2011; Okut et al., 2011). This leads to a subgroup of MLPs, called Bayesian regularized
artificial neural networks (BRNNs), first introduced by Neal (1992) and MacKay
(1992a). The main difference from MLPs, where an optimal set of weights is chosen by
minimizing a cost function, is that the Bayesian approach involves a prior probability
distribution over the network weights. Regularization is obtained by treating the weights as
random effects following a specified prior distribution, given the observed data (Lampinen
and Vehtari, 2001). Gianola et al. (2011) adapted this methodology to the prediction of
complex phenotypes through the use of genomic data.
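The five back-propagation steps summarized in Figure 2 can be sketched for the two-layer network of equations (1)-(2). This is a plain (non-Bayesian) gradient-descent sketch; the toy data, architecture and learning rate are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, s = 100, 20, 5
X = rng.integers(0, 3, (n, p)).astype(float)         # marker genotypes
y = np.sin(X[:, 0]) + 0.1 * X[:, 1] + rng.normal(0, 0.1, n)

# Step 1: initialize weights with small random values, small positive learning rate.
W1 = rng.normal(0, 0.1, (p, s)); a = np.zeros(s)
w2 = rng.normal(0, 0.1, s);      b = 0.0
lr = 0.01

for epoch in range(500):
    # Step 2: forward propagation.
    z = np.tanh(a + X @ W1)           # hidden outputs, eq (1)
    y_hat = b + z @ w2                # network output, eq (2)
    e = y_hat - y                     # prediction errors
    # Step 3: propagate the error back through the network.
    d_hidden = np.outer(e, w2) * (1 - z ** 2)   # chain rule through tanh
    # Step 4: compute error gradients and update all weights simultaneously.
    w2 -= lr * (z.T @ e / n)
    b  -= lr * e.mean()
    W1 -= lr * (X.T @ d_hidden / n)
    a  -= lr * d_hidden.mean(axis=0)
    # Step 5: go back to step 2 and iterate (a fixed epoch budget stands in
    # for a convergence threshold on the error function).

print(float(np.mean(e ** 2)))         # training mean squared error
```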
Denote by $\psi$ all connection strengths of the neural network (including hidden and output
biases). Under a BRNN, estimates of the parameters of the network are obtained from the
posterior density of $\psi$ given the output data and the residual and connection-strength
variance parameters ($\sigma_\varepsilon^2$ and $\sigma_\psi^2$, respectively):

$$p(\psi \mid \mathbf{y}, \sigma_\varepsilon^2, \sigma_\psi^2) =
\frac{p(\mathbf{y} \mid \psi, \sigma_\varepsilon^2)\, p(\psi \mid \sigma_\psi^2)}
{p(\mathbf{y} \mid \sigma_\varepsilon^2, \sigma_\psi^2)} \qquad (3)$$

where $p(\mathbf{y} \mid \psi, \sigma_\varepsilon^2)$ is the sampling model with variance
$\sigma_\varepsilon^2$, $p(\psi \mid \sigma_\psi^2)$ is a prior distribution indexed by a
variance $\sigma_\psi^2$, and $p(\mathbf{y} \mid \sigma_\varepsilon^2, \sigma_\psi^2)$ is the
marginal density of the data. Although this representation implies that biases and connection
strengths are assigned the same prior, it is advisable to avoid regularization of the biases.
Samples from this distribution can be obtained via Markov chain Monte Carlo sampling, but
this process can be computationally expensive given the high number of non-linearities that
need to be accounted for. Alternatives are an empirical evaluation of the Bayesian sampling
via a Hybrid Monte Carlo algorithm (HMC; Neal, 1996) and a procedure known as empirical
Bayes, which does not use sampling but can get stuck at local modes. The latter first
computes the conditional posterior mode of $\psi$ assuming that the variance components are
known, and estimates these in a second step. The first step is equivalently achieved by
minimizing the following 'penalized' sum of squares:

$$F(\psi) = \eta \sum_{i=1}^{n} e_i^2 + \varphi \sum_{i=1}^{m} \psi_i^2 \qquad (4)$$

where $\eta = \frac{1}{2\sigma_\varepsilon^2}$, $\varphi = \frac{1}{2\sigma_\psi^2}$, and
$e_i$ is the difference between observed and predicted phenotypes for the fitted model. The
inference on variance components in the second step is done via maximization of an
approximation to the marginal likelihood of the data,
$p(\mathbf{y} \mid \sigma_\varepsilon^2, \sigma_\psi^2)$. This is based on a Laplacian
approximation that can fail badly if the number of predictors is large. Refer to Gianola et
al. (2011) for further details regarding the whole estimation procedure and the tuning of the
parameters $\eta$ and $\varphi$.
MacKay (1992b) states that an effect of the Bayesian framework should be a reduction in the
high cost of the learning process. In comparison with non-regularized ANNs, a Bayesian/MCMC
approach gives an effective way of sampling relevant parts of the model space and then
averaging predictions over high-probability models (Hastie et al., 2009). In theory, Bayesian
model averaging results in better and more robust predictive performance.
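To make the optimization in equation (4) concrete, the following Python sketch (an illustration of ours, not code from any of the cited packages; all function and variable names are hypothetical) evaluates the penalized sum of squares for a one-hidden-layer network:

```python
import numpy as np

def mlp_predict(X, W1, b1, w2, b2):
    """Forward pass of a one-hidden-layer MLP with hyperbolic tangent
    activations in the hidden layer and an identity output function."""
    hidden = np.tanh(X @ W1 + b1)   # hidden-layer outputs
    return hidden @ w2 + b2         # linear output neuron

def penalized_sse(y, X, W1, b1, w2, b2, sigma2_e, sigma2_psi):
    """Penalized sum of squares F(psi) of equation (4):
    eta * sum(e_i^2) + phi * sum(psi_i^2), with eta = 1/(2*sigma2_e)
    and phi = 1/(2*sigma2_psi). The biases are left out of the
    penalty, as advised in the text."""
    e = y - mlp_predict(X, W1, b1, w2, b2)          # residuals e_i
    eta = 1.0 / (2.0 * sigma2_e)
    phi = 1.0 / (2.0 * sigma2_psi)
    psi = np.concatenate([W1.ravel(), w2.ravel()])  # connection strengths
    return eta * np.sum(e**2) + phi * np.sum(psi**2)

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 5))     # 20 individuals, 5 genomic covariates
y = rng.standard_normal(20)
W1, b1 = rng.standard_normal((5, 3)), np.zeros(3)   # 3 hidden neurons
w2, b2 = rng.standard_normal(3), 0.0
F = penalized_sse(y, X, W1, b1, w2, b2, sigma2_e=1.0, sigma2_psi=1.0)
```

A full BRNN implementation would iterate between minimizing F(ψ) over the connection strengths and updating σ²_ε and σ²_ψ, as described above.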
Radial basis function neural networks
Radial basis function neural networks (RBFNNs) are another type of neural network for
solving regression problems. These were developed later than MLPs and were first proposed
by Broomhead and Lowe (1988) and Poggio and Girosi (1990). RBFNNs are single hidden
layer, feed-forward and completely linked ANNs with an identity output function and radial
basis functions (RBFs) in the hidden layer. The radial basis functions calculated in the hidden
neurons are similar to kernel functions in kernel regressions (Kriesel, 2007). The most
employed radial basis function is the Gaussian with form:
z_i^[t] = exp(−h_t ‖x_i − c_t‖²)     (5)
where ‖x_i − c_t‖ is the Euclidean norm between the input vector x_i and the center vector c_t for neuron t. The bandwidth h_t of the RBF for neuron t needs to be empirically determined. Other norms, like the Mahalanobis distance (Ripley, 1996), as well as other activation functions, e.g. thin-plate splines or multi-quadratics (Kriesel, 2007), can also be used. In an RBFNN the connections between the input and hidden layer are not weighted, so it can be seen intuitively that the input layer does not participate in information processing. The output for data example i results from weighting and combining the outputs z_i^[t] of the RBFs of the hidden neurons and a bias, as in equation (2).
The training of a RBFNN proceeds in two steps. First the basis function parameters of the
hidden layer (center, bandwidth) are determined, and second, the weights between the hidden
and the output layer are inferred from the output of the hidden layer and the target data
(Pereira and Rao, 2009). There are several ways of dealing with the centers and the bandwidth
parameters of the RBF from the hidden layer: (1) fixed selection, (2) conditional fixed
selection, and (3) selection adaptive to the learning process (Kriesel, 2007). In the latter case
the number of neurons in the hidden layer is not constant. A certain number of neurons and
centers and bandwidths are selected a priori and then are extended or reduced during learning.
This method is particularly suited to and widely used for function approximations (Fritzke,
1994). After setting the default numbers of neurons and centers, the weights are analytically
calculated using, for example, the ordinary least-squares method (Chen et al. 1991). However,
unsupervised learning algorithms like the k-means algorithm can be implemented as well
(Pereira and Rao, 2009). Once all errors of the output values are calculated, the position of the maximum error is sought, and the network grows by placing a new RBF neuron at that position. The effect of this new neuron depends on the bandwidths of the RBFs, and the already existing RBF neurons are adjusted when the new neuron is added: the centers of the other neurons are moved away from the new neuron, and their bandwidths are slightly reduced. The current output vector of the network is then compared to the teaching input, and the weight vector is improved by means of training. Therefore, the selected centers maximize the decrease in the mean squared error of the RBFNN, and the algorithm stops when the number of added centers attains a desired precision, or when the number of centers is equal to the number of input neurons (González-Camacho et al., 2012).
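The two training steps can be sketched as follows; this minimal Python illustration (all names are ours) uses a simple fixed selection of centers and ordinary least squares for the output weights, not the adaptive growing scheme:

```python
import numpy as np

def rbf_hidden(X, centers, h):
    """Gaussian RBF outputs z_i[t] = exp(-h * ||x_i - c_t||^2), as in
    equation (5), here with a common bandwidth h for all neurons."""
    d2 = ((X[:, None, :] - centers[None, :, :])**2).sum(axis=2)  # squared norms
    return np.exp(-h * d2)                                       # n x t matrix

def train_rbfnn(X, y, centers, h):
    """Step 2: infer output-layer weights (plus a bias) by ordinary
    least squares from the hidden-layer outputs and the target data."""
    Z1 = np.column_stack([rbf_hidden(X, centers, h), np.ones(len(X))])
    w, *_ = np.linalg.lstsq(Z1, y, rcond=None)
    return w

def predict_rbfnn(X, centers, h, w):
    Z1 = np.column_stack([rbf_hidden(X, centers, h), np.ones(len(X))])
    return Z1 @ w

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 4))        # 50 individuals, 4 inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
centers = X[:10]                        # step 1: fixed selection of 10 centers
w = train_rbfnn(X, y, centers, h=0.5)
yhat = predict_rbfnn(X, centers, h=0.5, w=w)
```

Because the output weights enter the model linearly, this second step is a standard linear least-squares problem, which is what makes RBFNN training comparatively cheap.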
Construction of ANNs for genome-enabled predictions
The choice of a network architecture and the setting of specific parameters of an ANN have a
crucial influence on the predictive ability. This is an important model selection task required
to ensure optimal implementation of ANNs. In many cases, the only way to tune the
parameters of an ANN for a given dataset is to run several models and keep the one with the
specifications giving the best predictive performance. Here, general guidelines regarding
specific settings for ANNs in genome-enabled prediction are provided.
Choosing the Network architecture
When using ANNs, a challenge is to keep the network architecture as simple as possible to avoid capturing random error or noise (i.e. to prevent over-fitting), while allowing the network enough flexibility to describe the underlying relationship and solve the mapping problem (Geman et al., 1992).
The number of input and output neurons in the network is a priori defined by the data. In
genome-enabled prediction the number of input neurons is given by the number of genomic
covariates included in the analyses, and the number of output neurons is commonly
determined by the phenotype (e.g. a continuous or a categorical trait).
In the case of MLPs, the number of hidden layers and the number of hidden neurons per layer depend strongly on the mapping problem, and choosing them becomes a model selection task needed to avoid poor performance of the MLP. Most MLPs have at most two or three layers; networks with four or more layers are rarely employed. In ANNs for genome-enabled prediction, only two-layer MLPs have been used, with up to 10 neurons tested for predictive performance (Gianola et al., 2011; Okut et al., 2011). In general, a compromise should be attained: if too many neurons are kept in the hidden layer, the network is good at memorizing the training data, but worse at predicting yet-to-be-observed data. On the other hand, a model with too few neurons may impede a good approximation of the underlying function of the problem.
The choice of the activation function used in the hidden layer of the MLP is related to the
expected underlying function for the problem. Typically, the activation function is smooth,
continuous, and increases monotonically with input values, so that its derivative is always
positive. The most common ones are the identity, the step or threshold, the logistic (sigmoid)
and the hyperbolic tangent, as shown in Figure 3. The type of activation function used in the output layer is determined by the type of output. For example, if the output is continuous, a linear function is used.
Figure 3: Activation functions commonly used in artificial neural networks
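The functions plotted in Figure 3 can be written compactly as follows (a small illustrative sketch in Python):

```python
import numpy as np

def identity(x): return x                              # for continuous outputs
def step(x):     return np.where(x >= 0.0, 1.0, 0.0)   # threshold at zero
def logistic(x): return 1.0 / (1.0 + np.exp(-x))       # sigmoid, range (0, 1)
def tanh(x):     return np.tanh(x)                     # hyperbolic tangent, range (-1, 1)

print(logistic(0.0), tanh(0.0))   # 0.5 0.0
```

The logistic and hyperbolic tangent functions are the smooth, monotonically increasing choices typically used in the hidden layer; the step function is not differentiable at zero, which is why it is unsuitable for gradient-based training such as back-propagation.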
Setting the parameters of the back propagation algorithm
Minimizing the cost function with the back-propagation algorithm in an MLP often leads to suboptimal solutions and a very slow convergence rate. The error function in most
cases is non-convex and possesses many local minima. As a result, the final solution obtained
is dependent on the data quality, the network architecture chosen and on the parameters of the
algorithm such as the initialization of the weights and the learning rate adopted. In practice, it
is always sensible to try a number of different training configurations, and to choose the
solution giving the lowest error. To avoid the algorithm getting stuck in local minima, one approach is to modify the initialization of the weights in order to optimize the starting point of the algorithm over a few training iterations. This should be done by using small random values and by modifying their variances. The learning rate defines the step length of the weight corrections and controls the speed and accuracy of the learning procedure (Kriesel, 2007). In general, there is no unique optimal learning rate for any ANN model, so it has to be adjusted individually to the given problem. Starting with a larger learning rate and then gradually decreasing it is advisable.
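As a bare-bones illustration of these recipes (small random initial weights, a learning rate that starts larger and is then decreased), consider gradient descent on a single linear neuron; the code and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
true_w = np.array([0.5, -1.0, 2.0])
y = X @ true_w + 0.01 * rng.standard_normal(100)

w = 0.01 * rng.standard_normal(3)   # initialization: small random values
lr = 0.1                            # start with a larger learning rate...
for epoch in range(200):
    e = y - X @ w                   # residuals for the current weights
    grad = -2.0 * X.T @ e / len(y)  # gradient of the mean squared error
    w -= lr * grad                  # weight correction: step length set by lr
    lr *= 0.99                      # ...and decrease it as training proceeds
```

With a decaying learning rate, the early large steps move the weights quickly toward a good region, while the later small steps refine the solution without overshooting.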
Types of genomic information used as input data
Several sources of genomic information have been used as inputs for genome-enabled
prediction with ANNs. If marker genotypes are used, e.g. SNPs or DArTs, an input matrix X = {x_ij} of dimension n×p containing all marker genotypes of all individuals is required. With the software available at present, this is computationally feasible only when a small number of markers is used, because the number of parameters to estimate increases drastically with the number of inputs used (Gianola et al., 2011). Up to now, most of the implementations of ANNs in animal breeding have used only a sub-panel of the original marker matrix, and 41,995 has been the largest number of markers used as inputs in an MLP for genome-enabled prediction (Ehret et al., to be published).
An alternative approach that reduces model complexity and computational costs is the use of genome-derived relationships among individuals, G = {g_ij} (e.g. as suggested by VanRaden, 2008), or of pedigree information (e.g. the additive relationship matrix A = {a_ij} computed from the pedigree), instead of the entire marker matrix as inputs to the network (Pérez-Rodríguez et al., 2013). This is feasible in ANNs because these are standard metrics in animal breeding, and one can think of the infinitesimal model as a regression on pedigrees or, in the case of genomic BLUP, on a genomic relationship matrix (de los Campos et al., 2009; Gianola et al., 2011). In these cases, the number of inputs is equal to the number of individuals (n). Recently, Pérez-Rodríguez et al. (2013) presented software which also allows the inclusion of dominance genomic relationships as inputs to the network. Aiming at reducing the number of parameters to be estimated, but without losing much of the information contained in the entire marker matrix X = {x_ij}, the principal component scores (UD) of X can also be used as inputs to the network (Tusell et al., 2013). UD is an n×n matrix obtained from the singular value decomposition of X (Mandel, 1982) as X = UDV′. Overall, ANNs are flexible and can deal well with any kind of genomic covariate used as input. However, given that computational limitations still exist, dimensionality reduction is advisable when the number of inputs is very large.
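A minimal sketch of how these alternative inputs can be derived from a coded marker matrix is given below (our illustration; the 0/1/2 allele-count coding and the scaling of G follow VanRaden, 2008):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 30, 200
M = rng.integers(0, 3, size=(n, p)).astype(float)   # genotypes coded 0/1/2

# VanRaden (2008) genomic relationship matrix: G = ZZ' / (2 * sum p_j (1 - p_j))
freq = M.mean(axis=0) / 2.0                          # allele frequencies p_j
Z = M - 2.0 * freq                                   # centred marker matrix
G = Z @ Z.T / (2.0 * np.sum(freq * (1.0 - freq)))    # n x n input matrix

# Principal component scores: from the SVD Z = U D V', use UD (n x n) as inputs
U, d, Vt = np.linalg.svd(Z, full_matrices=False)
UD = U * d                                           # scores; UD @ Vt recovers Z
```

Both G and UD are n×n, so the number of network inputs grows with the number of individuals rather than with the number of markers.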
Feature scaling
Input and output data need to be adapted to the network peculiarities for the learning
algorithm to work properly. This is known as feature scaling, which can be achieved by
normalizing the data, i.e., putting the input and output data on the same scale (e.g. ranging
from -1 to 1). This ensures that all inputs are treated equally in the regularization process, an
issue that may have a large influence on the final solution. Any input or output value i of a variable y = {y_i} can be easily normalized as:

y_i* = (y_i − μ_y) / (Max_y − Min_y)     (6)

where μ_y is the sample mean of the variable and Max_y − Min_y is the range of y, although the standard deviation of y can be used instead of the range.
When the input is the matrix X = {x_ij} of marker genotypes for all SNP or DArT genotypes of all individuals, feature scaling can be accomplished by using a proper genotype codification. For example, assuming additive allele effects, SNP genotypes can be coded as −1, 0, and 1 for the homozygote for the minor allele, the heterozygote, and the homozygote for the other allele, i.e. x_ij ∈ {−1, 0, 1}, respectively. Assuming dominance effects, SNP genotypes can be coded as −1 or 0 for the homozygotes and the heterozygotes, respectively, i.e. x_ij ∈ {−1, 0}. DArTs are binary marker genotypes that can be coded as x_ij ∈ {0, 1}. Subsequently, the codes can be centered and standardized.
Often, phenotypes are averages of several records. If a varying number of records is used to compute the phenotypes, a possible way of accounting for heterogeneous residual variance when using ANNs is to multiply both inputs and phenotypes by √(b_i / b̄), where b̄ = n⁻¹ Σ_{i=1}^{n} b_i and b_i stands for the number of records used to compute phenotype i (i = 1, ..., n). Predictions obtained with the ANN should then be converted back to the original scale.
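Equation (6) and the additive genotype codification can be sketched as follows (illustrative Python; names are ours):

```python
import numpy as np

def normalize(y):
    """Equation (6): centre by the sample mean and divide by the range,
    which maps every value into the interval [-1, 1]."""
    return (y - y.mean()) / (y.max() - y.min())

def code_additive(counts):
    """Additive codification: map 0/1/2 allele counts to -1/0/1 for the
    minor-allele homozygote, heterozygote and other homozygote."""
    return counts - 1.0

y = np.array([4.0, 6.0, 5.0, 9.0, 6.0])
y_star = normalize(y)
counts = np.array([0.0, 1.0, 2.0, 2.0, 0.0, 1.0])
x = code_additive(counts)
```

After this codification, the columns of the marker matrix can be centered and standardized in the same way as any other input variable.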
Assessment of the ANN predictive ability
Prediction ability of a neural network is assessed in terms of its ability to generalize to an
independent data set. To achieve this, the assessment must be done on observations that have not been presented to the ANN during the estimation (training) of the ANN parameters.
Usually, data is partitioned into two groups. One group, the training set, is used for designing
and estimating parameters of the ANN. The remaining data, the testing set, is only used to
assess prediction performance. In the training set, genotypic as well as phenotypic data are
presented as examples to the network, whereas in the testing set only the genotypes are
presented to the ANN to obtain the predicted phenotypes. The phenotypes in the testing set,
viewed as the yet-to-be observed phenotypes, are only used as a reference value in the
posterior assessment of the predictive ability.
It is also possible to divide the dataset into 3 groups: training, testing and validation set. In
this case, the latter is used to optimize the training by assessing how well the model is
behaving during the training process by means of some error measures calculated in the
validation set. Once the weights are estimated, predictive ability is assessed the testing set. In
order to evaluate the predictive ability of an ANN, any measure of fit that is appropriate for
the data, model, and problems at hand can be used. When the observations are continuously
distributed, the Pearson’s correlation coefficient between the yet-to-be observed and the
predicted phenotypes in the testing set is the most common one:
r = Σ_{i=1}^{n_test} (y_true,i − ȳ_true)(y_pred,i − ȳ_pred) / √[ Σ_{i=1}^{n_test} (y_true,i − ȳ_true)² · Σ_{i=1}^{n_test} (y_pred,i − ȳ_pred)² ]     (7)

where n_test is the number of records in the testing set, y_true,i and y_pred,i are the yet-to-be observed and the predicted phenotype for the i-th record, respectively, and ȳ_true and ȳ_pred are the average yet-to-be observed and predicted phenotypes, respectively. Some studies refer to this correlation r as the "accuracy" of prediction.
Another goodness-of-fit criterion used is the predictive mean squared error,

PMSE = n_test⁻¹ Σ_{i=1}^{n_test} (y_true,i − y_pred,i)²     (8)

but the 'root mean squared error' or the 'median absolute deviation' could also be used to summarize the errors. PMSE gives a measure of "closeness" (prediction bias and prediction variance), whereas r measures the degree of scatter. When these measurements are used in the training set, they provide empirical measures of the degree of over-fitting of the ANN; for example, an r² close to 1 in the training data indicates over-fitting and poor out-of-sample generalization. For binary classification ANNs, each observation can be correctly or incorrectly predicted. In this case, other measures of goodness of fit can be used, such as the misclassification error rate, the sensitivity or the specificity (Cavero et al., 2008).
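Equations (7) and (8) are straightforward to compute; a small illustrative sketch:

```python
import numpy as np

def pearson_r(y_true, y_pred):
    """Equation (7): Pearson correlation between yet-to-be observed
    and predicted phenotypes in the testing set."""
    dt, dp = y_true - y_true.mean(), y_pred - y_pred.mean()
    return np.sum(dt * dp) / np.sqrt(np.sum(dt**2) * np.sum(dp**2))

def pmse(y_true, y_pred):
    """Equation (8): predictive mean squared error in the testing set."""
    return np.mean((y_true - y_pred)**2)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
print(round(pearson_r(y_true, y_pred), 3))   # 0.991
print(round(pmse(y_true, y_pred), 3))        # 0.025
```

Note that r is invariant to the location and scale of the predictions, whereas PMSE is not, which is why the two criteria can rank models differently.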
To obtain a stable estimate that accounts for the sampling variance originating in the creation of the training and testing sets, the predictive performance of the ANN is commonly evaluated by k-fold cross-validation. Here, the entire dataset D is randomly split into k disjoint subsets D_1, D_2, ..., D_k of approximately equal size, called folds. The network is then trained and
tested k times. At each time l ∈ {1, 2, ..., k}, the ANN is trained on D − D_l and tested on D_l. When k is equal to the number of observations in the entire dataset, this type of design is called "leave-one-out" cross-validation, and only one observation is left in the testing set at each fold. The overall predictive ability assessed in a k-fold cross-validation is obtained as the average predictive ability over the testing sets of the k folds (Kohavi, 1995). The advantage of k-fold cross-validation is that all observations are used for training several times, but each observation is used for validation only once.
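A k-fold partition as described above can be generated, for example, as:

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Randomly split n observations into k disjoint folds D_1,...,D_k
    of approximately equal size."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

n, k = 23, 5
folds = kfold_indices(n, k)
for test_fold in folds:
    train = np.setdiff1d(np.arange(n), test_fold)   # D minus D_l
    # ...train the ANN on `train`, then assess predictions on `test_fold`...
```

The reported predictive ability is then the average of the testing-set criterion (e.g. r or PMSE) over the k folds.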
Another approach that allows obtaining an empirical distribution of the estimates of the
predictive ability measurements is to use a repeated random sub-sampling validation. With
this method random partitions of the dataset into training and testing are generated. The
model is fitted in the training data of each partition, and predictive ability is assessed using
the testing data. The results are then averaged over the partitions. Some whole genome-
enabled predictions with ANN have used this type of cross-validation design by generating 50
random partitions with 90% of the data in training and 10% in testing sets (González-
Camacho et al., 2012; Gianola et al., 2011; Pérez et al., 2013). The advantage of this design
in comparison to the k-fold cross validation is that the proportion of the data in the
training/validation split is not dependent on the number of folds. However, some observations
may never be selected in the testing sets, whereas others may be selected more than once.
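The repeated random sub-sampling design (here with the 90%/10% split used in the cited studies) can likewise be sketched as:

```python
import numpy as np

def random_partitions(n, n_rep=50, test_frac=0.10, seed=0):
    """Generate n_rep random training/testing partitions of n observations,
    assigning a fraction test_frac of the data to the testing set."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_frac * n))
    splits = []
    for _ in range(n_rep):
        idx = rng.permutation(n)
        splits.append((idx[n_test:], idx[:n_test]))   # (training, testing)
    return splits

splits = random_partitions(100)
train0, test0 = splits[0]
print(len(train0), len(test0))   # 90 10
```

Averaging the testing-set criterion over the 50 partitions yields the empirical distribution of the predictive ability estimates mentioned above.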
Software available for implementation of ANNs in genomic prediction
Few software implementations of ANNs are available. Common programs for fitting ANN models include SAS (SAS Institute Inc., 2008), R (R Core Team, 2011) and MATLAB (Beale et al., 2011), and there are also some stand-alone programs such as the fbm software package of Neal (1996) for BRNNs. In SAS, the procedure NEURAL can be used to fit ANN models, but it is not included in the standard version. An alternative is MATLAB, where the function trainbr can be used to fit MLPs and BRNNs, and RBFNN models can be fitted using the newrb function (Beale et al., 2011). However, most of the
software available lacks flexibility for dealing with the data and models commonly used in
genome-wide enabled prediction and is not publicly available. Pérez-Rodríguez et al. (2013)
developed an R-package called brnn that implements BRNNs for animal and plant breeding
applications. It includes the feature of fitting together in the same ANN different sources of
information, such as additive and dominance genomic relationships as well as pedigree-based
relationships and provides a scaling function for the data pre-processing (inputs and outputs).
Application of artificial neural networks in animal and plant breeding
The first application of ANNs in animal breeding was by Macrossan et al. (1999), who
employed them as a method for optimizing mating allocation to maximize production traits in
the Australian dairy industry. A linear and a Bayesian linear regression, as well as a BRNN
and a non-regularized MLP model, were compared in their ability to predict first daughter
milk yield phenotypes using dam and sire production and environmental effects as input
variables. All models gave similar predictive ability: the correlation between predicted and
observed response variables in the testing set of all the models was 0.76. This suggested that there was little or no non-linearity in the dairy traits analyzed. Furthermore, the authors
pointed out that the choice of an optimal training set size was important to deal with the
computational costs of using MLPs and BRNNs in prediction. Aiming at improving reliability
of mastitis detection in practical applications, Cavero et al. (2008) used ANNs as a
classification method for mastitis detection of cows in automatic milking systems. They used
several MLPs with a resilient back-propagation learning rule and tested different network
architectures, including a network with two hidden layers. Days in milk, maximal electrical
conductivity, relative deviation of electrical conductivity, milk yield and milk flow were used
as inputs to the network. The output layer was a single neuron with a continuous output value
that was classified as positive mastitis if the somatic cell counts of the cow exceeded a certain
threshold, or negative otherwise. Sensitivity, specificity and error rate were the parameters
used to assess the average accuracy of the classification in the testing sets of a 5-fold cross-
validation. They concluded that ANNs are generally useful tools for computerized decision
support systems and recommended the use of neural networks as a first screening tool to
detect groups of cows with a high susceptibility of developing mastitis. However, the performance of the ANNs was not completely satisfactory: for a block sensitivity of at least 80% in the training set, specificity ranged from 51.1 to 74.9% and the error rate from 51.3 to 80.5%, depending on the threshold used to diagnose mastitis, possibly due to insufficient input information related to mastitis.
Models that are non-linear in either features or parameters were proposed as an alternative to
the classical linear ones to enhance genome-enabled prediction of complex traits in animal
and plant breeding (Gianola et al., 2006; Gianola et al., 2011; González-Camacho et al., 2012;
Long et al., 2007). Okut et al. (2011) and Gianola et al. (2011) suggested the use of a
BRNN as a potential approach to account for possible non-linear relationships between
genomic covariates and complex traits. Empirical results pointed out that the ability of
predicting body mass index data in mice with BRNN was low irrespective of the network
architecture considered, and not different from the one attained by a linear model (the
correlation between predicted and true phenotype in the testing set ranged from 0.14 to 0.18).
However, they concluded that, even when using only a subset of all the SNPs available, BRNN could attain a predictive performance similar to the one obtained with linear regression models. They found that all the network architectures examined were
consistent in flagging SNPs associated with the trait. Gianola et al. (2011) investigated MLPs
with Bayesian regularization for genome-enabled prediction of grain yield and milk
production traits in wheat and cattle, respectively. The study examined different network
structures with up to 6 neurons in the hidden layer and different activation functions. In order
to avoid the high dimensionality of the SNP data in the cow dataset, they used as inputs to the
network either the additive genomic relationship matrix among individuals (VanRaden, 2008) or the numerator additive genetic relationship matrix among individuals obtained from
the pedigree. For the wheat dataset biallelic markers (DArTs) were used as inputs. They
noticed that a network with one output neuron and one hidden neuron and a linear activation
function is equivalent to a linear random regression, where the weights are the regression
coefficients of a standard linear approach. Here, the weights of a neural network can be
interpreted as marker effects. This is the only case where an ANN can be linked to an infinitesimal model. They concluded that non-linear ANN approaches can outperform
linear models in predictive ability of complex traits when using genomic information, where
the number of inputs is typically much larger than the number of animals in the data sample
(p >> n), especially when Bayesian regularization is used. However, like in other predictive
models, their superiority highly depends on characteristics of the data used.
It has been pointed out that a large source of genetic variability in plant traits comes from
non-additive, interaction effects e.g., dominance and epistasis (Holland, 2008). González-
Camacho et al. (2012) proposed the use of RBFNN to predict maize traits planted in different
environments through the use of genomic markers. A reproducing kernel Hilbert spaces
model (RKHS; Gianola et. al., 2006) and the linear Bayesian LASSO model (Park and
Casella, 2008) were also included in the analyses. The RKHS and RBFNN models showed a similar, slight superiority over the linear model in predictive ability for records simulated under gene × gene interaction, possibly because of their better ability to capture complex non-additive interactions. Further, the authors cautioned that redundant information derived from
inclusion of a large number of markers as covariates can impair predictive ability of non-
linear models. This can be overcome by directly introducing differential weights of the SNPs
in the kernel (Long et al., 2010), using predefined SNP weights, performing a prior SNP
selection, or using marker haplotypes.
Heslot et al. (2012) compared, on 8 wheat datasets, the predictive ability of 11 standard, non-parametric and machine-learning prediction models, including an ANN with a back-propagation algorithm and a linear activation function in the hidden layer. The performance of the models was assessed by the Pearson correlation coefficient between the observed and predicted genomic breeding values (referred to as accuracy). In cross-validation, the average
accuracy over several datasets of the ANN (0.55) was similar to the best performing model
(RKHS), but the ANN performed poorly in terms of the variance of the estimates (mean
squared error). This suggested that the ANN produced large variance and unstable results.
Pérez-Rodríguez et al. (2012) compared linear and non-parametric regression models for
genome-enabled prediction in wheat, including BRNNs and RBFNNs approaches. Predictive
ability of the models was evaluated by the average Pearson's correlation between predicted
and yet-to-be observed output values and by the predictive mean squared error (PMSE)
obtained in 50 random partitions (90% and 10% of the records assigned to the training and
testing sets, respectively). The study indicated clearly that RKHS, RBFNN and BRNN
models had a markedly better predictive accuracy than the linear Bayesian LASSO, Bayes A
or Bayes B procedures. This may be due to the fact that in wheat, like in maize, additive ×
additive as well as genotype × environmental interactions play an important role in the
expression of traits. The authors pointed out the importance of forming training and validation
sets properly, given that the degree of genetic relatedness of the individuals belonging to each
group has an influence on the accuracy of the predicted values. Similarly, Tusell et al. (2013)
compared the performance of different linear and non-linear prediction models in association
with different input covariates for predicting litter size in pigs. They used a pedigree-based
mixed-effects model, Bayesian Ridge Regression, Bayesian LASSO, Genomic BLUP, RKHS,
BRNN and RBFNN. The neural networks employed either the additive genomic relationship
matrix (G) or the principal components scores (UD) of the marker matrix as inputs. The
model comparison was assessed on 3 large datasets: approximately 2,600 and 1,600 purebred
and 1,900 crossbred sows genotyped for 60K SNPs. Predictive ability was measured by the
average correlation between realized and predicted litter size in the testing sets (10% of the
data) in a 10-fold cross-validation, and separately for each line. On average, the non-
parametric models (RKHS, BRNN, RBFNN) gave similar predictions to their parametric
counterparts. However, the predictive ability of BRNN varied drastically, giving the best
prediction when G was used in crossbreds (0.31), where non-additive genetic effects are
expected to be important, but the worst when UD was used in one of the purebred lines (0.02).
The predictive ability of RBFNN ranged from 0.16 to 0.23. The authors concluded that the
performance of ANNs was promising, but sensitive to the type of genomic information used
as input and to intrinsic characteristics of the data set, such as population structure and
effective sample size, given that predictive ability was higher in the crossbred than in the purebred lines. The latter showed evidence of population substructure and, naturally, a higher degree of inbreeding.
In recent years several studies have used genotype imputation approaches aiming at
increasing predictive accuracy and performing genetic evaluation of individuals genotyped
with low density panels, together with individuals with high density SNP genotypes (Weigel
et al., 2010; Mulder at al., 2012). Under the assumption that imputed genotypes do not add
additional information to the one already supplied by the observed genotypes, Felipe et al.
(unpublished), observed that genotype imputation would have little or no effect in predictive
accuracy of models already capable of extracting more information from the data, such as
ANNs. The effect of genotype imputation on predictive ability of two mice traits was
evaluated using the Bayesian LASSO, RKHS and BRNN in a cross validation scheme (2/3 of
the data in the training set and 1/3 in the testing set). The BRNN could not deal well with high
imputation errors and performed poorly with the imputed markers, whereas the other two
models gave similar (and better) predictive results. Using the same dataset but more markers
than Okut et al. (2011), they obtained a lower prediction performance for the same trait,
indicating that BRNN performs better using effect based selected markers instead of evenly
spaced ones. The use of imputation techniques together with the increasing availability of
genomic data makes feature selection methods an important tool to be used for reducing
dimensionality of whole-genome prediction approaches.
The usefulness of ANNs as prediction models for the analysis of functions of interest in animal breeding was assessed by Okut et al. (2013). To evaluate the predictive performance
of ANNs in a linear additive system, they predicted expected progeny difference of marbling
score in Angus cattle (a trait computed with BLUP under additive inheritance assumptions).
Two different kinds of ANNs were investigated: BRNN and a scaled conjugate gradient back-
propagation ANN (SCGANN; Møller 1993; Kriesel 2007; Okut et al. 2013) which is an
extension of the back-propagation algorithm. They also investigated linear and non-linear
activation functions and different network architectures with 1 to 4 neurons in the hidden
layer. Feature selection was examined as well by using two different datasets, one with a 3K
panel and one with 700 preselected markers. There was no evidence that more complex
networks (more neurons in the hidden layer) produced better predictions than the linear model
used as benchmark, possibly because of the additive pre-smoothing of the trait. Within ANNs,
BRNNs produced more accurate predictions than the SCGANNs. Table 1 shows a
summarized description of the studies reviewed above.
Table 1 Studies that used artificial neural networks for prediction of complex traits in plant and animal breeding

Year | Authors | Species | Input data | Target trait | Network architecture | Network model | CV | Other models analyzed
1999 | Macrossan et al. | cattle | production + environmental traits | first daughter milk yield | 2-layer, 1-4, 8 neurons | BRNN, MLP with BP | no | BLR, LR
2008 | Cavero et al. | cattle | milk traits | mastitis | 2-3 layer, 10, 20, 30 neurons | MLP with a resilient BP | yes | –
2011 | Gianola et al. | cattle, wheat | A, G, DArT | milk traits, yield | 2-layer, 1-6 neurons | BRNN | yes | –
2011 | Okut et al. | mouse | SNP | BMI | 2-layer, 1-7 neurons | BRNN | yes | –
2012 | González-Camacho et al. | maize | SNP | trait-environment combinations | 2-layer | RBFNN | yes | RKHS, BL
2012 | Pérez-Rodríguez et al. | wheat | DArT | trait-environment combinations | 2-layer | BRNN, RBFNN | yes | BRR, BL, BayesA, BayesB, RKHS
2012 | Heslot et al. | wheat, barley, maize | DArT, SNP, SSR | different yield traits | 2-layer | MLP with BP and weight decay | yes | BL, wBSR, RF, BayesC, SVM, RKHS, elastic net, empirical Bayes
2013 | Tusell et al. | pig | UD, G | litter size | 2-layer, 4 neurons | BRNN, RBFNN | yes | pedigree model, RKHS, BL
2013 | Okut et al. | cattle | SNP | marbling score | 2-layer | BRNN, SCGANN | yes | BayesCπ, BayesCπ=0
unpublished | Felipe et al. | mouse | SNP | BMI, body weight | 2-layer, 5 neurons | BRNN | yes | BL, RKHS

A = numerator relationship matrix; G = genomic relationship matrix; SNP = single nucleotide polymorphism; DArT = Diversity Arrays Technology markers; UD = principal component scores of the marker matrix; SSR = simple sequence repeats; BMI = body mass index; BP = back-propagation; RKHS = reproducing kernel Hilbert spaces regression; BLR = Bayesian linear regression; LR = linear regression; BRR = Bayesian ridge regression; BL = Bayesian LASSO; RF = random forest; SVM = support vector machines; wBSR = weighted Bayesian shrinkage regression; CV = cross-validation; – = none.
Potential benefits and pitfalls in using ANNs for genome-enabled prediction
Due to their great flexibility, ANNs have been used in a wide spectrum of different
applications including genome-enabled prediction of complex traits in animal and plant
breeding. ANNs have great potential as predictive machines. One of their most appealing
properties is that they do not require most of the prior assumptions that commonly underlie
parametric statistical models: they can be non-linear in features, in parameters, or in both
and, if properly specified, they may capture complex signals from the data and deliver better
predictive accuracy. However, the ability of ANNs to accommodate
interactions and non-linear relationships between genomic covariates and phenotypes depends
strongly on the quality of the learning process, the data structure and the network model
employed. Hence, although ANNs can potentially outperform linear prediction methods in
genome-enabled prediction, this is not always the case. Further, less computationally
expensive semi-parametric methods, such as RKHS, have shown more consistent performance
across different datasets (Heslot et al., 2012; Tusell et al., 2013).
The large number of parameters in a richly structured ANN often impairs its predictive power
compared with that of less parameterized prediction models for a given dataset size. This is
due to the tendency of ANNs to over-fit, that is, to capture both signal and noise present in
the training data. Predictive performance would be higher if the number of examples
presented to the network during training were sufficient for correctly estimating connection
strengths. Unfortunately, when the dimension of the network increases markedly, as is the
case in genome-enabled prediction, the number of parameters grows quickly and ANNs become
prone to over-fitting, as stated above. Therefore, the use of generalization approaches, such
as Bayesian regularization, or of adapted algorithms such as weight decay (Kriesel, 2007), is advisable.
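Weight decay amounts to a penalized gradient step that shrinks every weight toward zero. The following Python sketch is purely illustrative (the helper name and values are hypothetical, not taken from the reviewed studies):

```python
def weight_decay_step(weights, gradients, lr=0.1, decay=0.5):
    # One gradient-descent step with weight decay: besides following the
    # error gradient g, every weight w is shrunk toward zero by lr * decay * w.
    return [w - lr * (g + decay * w) for w, g in zip(weights, gradients)]

# With a zero error gradient, only the decay term acts, so each weight
# shrinks by the factor (1 - lr * decay) = 0.95 per step.
w = weight_decay_step([1.0, -2.0, 0.5], [0.0, 0.0, 0.0])
```

The shrinkage keeps connection strengths small, which limits the network's effective complexity and thereby its capacity to fit noise.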
Up to now, even though computations are fast and can be highly parallelized, ANN analyses
are still limited to a number of inputs that may fall far short of what high-dimensional
genomic data demand. This is one of the reasons why, instead of using an entire marker
dataset, SNP selection, genome-derived relationships or other dimensionality reduction
techniques are used to obtain ANNs with reasonable predictive ability.
The studies reviewed show that the performance of ANNs depends strongly on the
characteristics of the dataset used, and the optimal architecture is highly dependent on
intrinsic properties of the data, especially when the dataset is small. This is why it is
essential to use a suitable cross-validation scheme for evaluating the results of different
architectures and parameter settings, as well as to attain a sensible measure of uncertainty.
If the researcher is not merely interested in prediction but aims at inferring causal
relationships or interactions between genomic covariates and traits, the use of ANNs may not
be advisable. The reason is that ANNs behave as a 'black box', meaning that there is
typically no direct way to interpret the parameters learned by the network. In this sense,
practical implementation of ANNs is limited to cases where no biological interpretation of the
parameters is needed, such as breeding value estimation for selection purposes or finding
associations between markers and quantitative traits. Overall, ANNs have great potential
predictive ability, but sound knowledge of the network requirements is needed for a
successful implementation in genome-enabled prediction of complex traits. Also, prediction
performance of ANNs depends strongly on the peculiar characteristics of the data and the
problem to be solved. Hence, there are no universal recommendations that ensure good
predictive ability of ANNs across different case studies, and the optimal settings of an ANN
are specific to each task. To a large extent, realizing the postulated superiority of ANNs
over other prediction methods depends on the knowledge and experience of the user and on the
structure of the data at hand.
Author’s contributions
AE worked out the object of this study and drafted the manuscript. LT and AE finalized the
manuscript. DG supervised the study and edited the manuscript. GT revised the manuscript.
All authors read and approved the final manuscript.
Acknowledgements
This research was funded by the German Federal Ministry of Education and Research
(BMBF) within the AgroCluster “Synbreed – Synergistic plant and animal breeding”. The
authors thank Paulino Pérez-Rodríguez (Department of Animal Sciences, University of
Wisconsin-Madison, Madison, 53706, USA) for comments and suggestions.
References
Beale, M. H., M. T. Hagan, and H. B. Demuth. 2011. Neural Network Toolbox™ MATLAB
Broomhead, D. and D. Lowe. 1988. Multivariable functional interpolation and adaptive
networks. Complex Syst 2:321-355.
Cavero, D., K. H. Tølle, C. Henze, C. Buxadé, and J. Krieter. 2008. Mastitis detection in dairy
cows by application of neural networks. Livestock Science 114:280-286.
Chen, S., C. F. N. Cowan, and P. M. Grant. 1991. Orthogonal least squares learning algorithm
for radial basis function networks. IEEE Transactions on Neural Networks 2:302-309.
de los Campos, G., H. Naya, D. Gianola, J. Crossa, A. Legarra, E. Manfredi, K. Weigel,
and J. M. Cotes. 2009. Predicting quantitative traits with regression models for dense
molecular markers and pedigree. Genetics 182:375-385.
de los Campos, G., J. M. Hickey, R. Pong-Wong, H. D. Daetwyler, and M. P. L. Calus. 2013.
Whole-genome regression and prediction methods applied to plant and animal
breeding. Genetics 193:327-345.
Fritzke, B. 1994. Fast learning with incremental RBF networks. Neural Processing Letters
1:2-5.
Geman, S., E. Bienenstock, and R. Doursat. 1992. Neural Networks and the Bias/Variance
Dilemma. Neural Computation 4:1-58.
Gianola, D., R. L. Fernando, and A. Stella. 2006. Genomic-assisted prediction of genetic
value with semiparametric procedures. Genetics 173:1761 - 1776.
Gianola, D., H. Okut, K. Weigel, and G. Rosa. 2011. Predicting complex quantitative traits
with Bayesian neural networks: a case study with Jersey cows and wheat. BMC
Genetics 12:87.
González-Camacho, J. M., G. de los Campos, P. Pérez, D. Gianola, J. E. Cairns, G. Mahuku, R.
Babu, and J. Crossa. 2012. Genome-enabled prediction of genetic values using radial
basis function neural networks. Theoretical and Applied Genetics 125:759-771.
Gurney, K. 1997. An Introduction to Neural Networks. Taylor & Francis, Inc.
Hagan, M. T., H. B. Demuth, and M. Beale. 1996. Neural network design. PWS, Boston.
Hanrahan, G. 2011. Artificial Neural Networks in Biological and Environmental Analysis.
CRC Press, Inc.
Hastie, T., R. Tibshirani, J. Friedman, and J. Franklin. 2009. The elements of statistical
learning: data mining, inference and prediction. The Mathematical Intelligencer 27:83-
85.
Holland, JB. 2008. Theoretical and biological foundations of plant breeding. In Plant
Breeding: The Arnel R. Hallauer International Symposium. Blackwell Publishing,
pages 127-140.
Hornik, K., M. Stinchcombe, and H. White. 1989. Multilayer feedforward networks are
universal approximators. Neural Networks 2:359-366.
Kohavi, R. 1995. A study of cross-validation and bootstrap for accuracy estimation and model
selection. The International Joint Conference on Artificial Intelligence (IJCAI),
pp. 1137-1145.
Kriesel, D. 2007. A brief introduction to neural networks. Available at
http://www.dkriesel.com
Kolmogorov AN. 1957. On the representations of continuous functions of many variables by
superpositions of continuous functions of one variable and addition. Doklady
Akademii Nauk USSR 114:953-956.
Kůrková, V. 1992. Kolmogorov's theorem and multilayer neural networks. Neural Networks
5:501-506
Lampinen, J. and A. Vehtari. 2001. Bayesian approach for neural networks - review and case
studies. Neural Networks 14:257-274.
Long, N., D. Gianola, G. J. M. Rosa, K. A. Weigel, and S. Avendaño. 2007. Machine learning
classification procedure for selecting SNPs in genomic selection: application to early
mortality in broilers. Journal of Animal Breeding and Genetics 124:377-389.
Long, N., D. Gianola, G. J. M. Rosa, K. A. Weigel, A. Kranis, and O. Gonzalez-Recio. 2010.
Radial basis function regression methods for predicting quantitative traits using SNP
markers. Genetics Research 92:209-225.
MacKay, DJC. 1992a. Bayesian interpolation. Neural Computation 4:415-447.
MacKay, DJC. 1992b. A practical Bayesian framework for backpropagation networks. Neural
Computation 4:448-472.
MacKay, DJC. 2008. Information theory, inference and learning algorithms. Cambridge University Press.
Macrossan, P., D. Hand, J. Kok, M. Berthold, H. Abbass, K. Mengersen, M. Towsey, and G.
Finn. 1999. Bayesian neural network learning for prediction in the Australian dairy
industry. Advances in Intelligent Data Analysis. Vol. 1642. Springer Berlin
Heidelberg. Pages 395-406
Mandel, J. 1982. Use of the singular value decomposition in regression analysis. The
American Statistician 36:15-24.
Møller, MF. 1993. A scaled conjugate gradient algorithm for fast supervised learning. Neural
Networks 6:525-533
Mulder, H. A., M. P. L. Calus, T. Druet, and C. Schrooten. 2012. Imputation of genotypes
with low-density chips and its effect on reliability of direct genomic values in Dutch
Holstein cattle. Journal of Dairy Science 95:876-889.
Neal, RM. 1992. Bayesian training of backpropagation networks by the hybrid Monte Carlo
method. Dept of Computer Science, University of Toronto, Tech Rep CRG-TR-92-1
Neal, RM. 1996. Sampling from multimodal distributions using tempered transitions.
Statistics and Computing 6:353-366.
Okut, H., D. Gianola, G. J. M. Rosa, and K. A. Weigel. 2011. Prediction of body mass index
in mice using dense molecular markers and a regularized neural network. Genetics
Research 93:189 - 201.
Okut, H., X.-L. Wu, G. J. M. Rosa, S. Bauck, B. W. Woodward, R. D. Schnabel, J. F. Taylor,
and D. Gianola. 2013. Predicting expected progeny difference for marbling score in
Angus cattle using artificial neural networks and Bayesian regression models.
Genetics Selection Evolution 45:34.
Park, T. and G. Casella. 2008. The Bayesian Lasso. Journal of the American Statistical
Association 103:681-686.
Pereira, B. D. B. and C. R. Rao. 2009. Data Mining using Neural Networks: A Guide for
Statisticians
http://www.po.ufrj.br/basilio/publicacoes/livros/2009_datamining_using_neural_netw
orks.pdf.
Pérez-Rodríguez, P., D. Gianola, J. M. González-Camacho, J. Crossa, Y. Manès, and S.
Dreisigacker. 2012. Comparison between linear and non-parametric regression models
for genome-enabled prediction in wheat. G3: Genes|Genomes|Genetics 2:1595-1605.
Pérez-Rodríguez, P., D. Gianola, K. A. Weigel, G. J. M. Rosa, and J. Crossa. 2013. Technical
Note: An R package for fitting Bayesian regularized neural networks with applications
in animal breeding. Journal of Animal Science.
Poggio, T. and F. Girosi. 1990. Networks for approximation and learning. Proceedings of the
IEEE 78:1481-1497.
R Development Core Team. 2011. R: A Language and Environment for Statistical
Computing. R Foundation for Statistical Computing, Vienna, Austria. URL
http://www.R-project.org. R version 2.14.0.
Ripley, BD. 1996. Pattern recognition and neural networks. Cambridge University Press,
Cambridge
Rosenblatt, F. 1958. The perceptron: a probabilistic model for information storage and
organization in the brain. Psychological Review 65:386
Rumelhart, D. E., G. E. Hinton, and R. J. Williams. 1986. Learning representations by back-
propagating errors. Nature 323:533-536.
SAS Institute Inc. 2008. SAS User's Guide, Version 9.2., Cary, NC.
Tusell, L., P. Pérez-Rodríguez, S. Forni, X. L. Wu, and D. Gianola. 2013. Genome-enabled
methods for predicting litter size in pigs: a comparison. animal 7:1739–1749
VanRaden, P. M. 2008. Efficient methods to compute genomic predictions. Journal of Dairy
Science 91:4414-4423.
Weigel, K. A., G. de los Campos, A. I. Vazquez, G. J. M. Rosa, D. Gianola, and C. P. Van
Tassell. 2010. Accuracy of direct genomic values derived from imputed single
nucleotide polymorphism genotypes in Jersey cattle. Journal of Dairy Science
93:5423-5435.
Wellmann, R. and J. Bennewitz. 2011. The contribution of dominance to the understanding of
quantitative genetic variation. Genetics Research 93:139.
Yang, J., B. Benyamin, B. P. McEvoy, S. Gordon, A. K. Henders, D. R. Nyholt, P. A.
Madden, A. C. Heath, N. G. Martin, G. W. Montgomery, M. E. Goddard, and P. M.
Visscher. 2010. Common SNPs explain a large proportion of the heritability for
human height. Nat Genet 42:565-569.
An empirical application of neural networks with back-propagation to genome-enabled predictions of complex traits in Holstein Friesian and German Fleckvieh cattle

Anita Ehret¹, David Hochstuhl², Daniel Gianola³,⁴,⁵ and Georg Thaller¹

¹ Institute of Animal Breeding and Husbandry, Christian-Albrechts-University, Germany
² Institute for Theoretical Physics and Astrophysics, Christian-Albrechts-University, Germany
³ Department of Animal Sciences, University of Wisconsin-Madison, USA
⁴ Department of Dairy Science, University of Wisconsin, USA
⁵ Department of Biostatistics and Medical Informatics, University of Wisconsin, USA

Submitted for publication in Genetics Selection Evolution
Abstract
Background: Recently, artificial neural networks (ANN) have been proposed as promising
machines for molecular marker based genomic predictions of complex traits in animal and
plant breeding. ANNs are universal approximators of complex functions, able to capture
cryptic relationships between SNP-markers and phenotypic values without the need of
explicitly defining a genetic model. This concept is attractive for high dimensional and noisy
data, especially when the genetic architecture of the trait is unknown. However, properties of
ANNs in prediction of future outcomes on real data for genomic selection are not well
characterized and, due to high computational costs, using entire marker sets is difficult. We
examined different non-linear network architectures as well as several genomic covariate
structures as network inputs in order to assess the ability of predicting milk traits in three
dairy cattle data sets using large scale SNP data. For training, a regularized back propagation
algorithm was employed. The average correlation between the observed and the predicted
phenotypes in a 20 times 5-fold cross-validation was used to assess predictive ability. A linear
network model served as benchmark.
Results: Predictive abilities of different ANN models varied markedly, whereas differences
between data sets were small. Dimension reduction methods enhanced prediction performance
in all data sets, while at the same time computational cost was reduced. For the Holstein-
Friesian bull data set, an ANN with 10 neurons in the hidden layer achieved a predictive
correlation of r = 0.47 for milk yield when the entire marker matrix was used. Predictive
ability increased with employing the genomic relationship matrix (r = 0.64) as an input and
was best (r = 0.67) when principal component scores of the marker genotypes were used.
Similar results were found for the other traits in all data sets.
Conclusion: ANNs are powerful machines for non-linear genome-enabled predictions in
animal breeding. However, to produce stable and high-quality outputs, variable selection
methods are highly recommended, when the number of markers vastly exceeds sample size.
Keywords: Artificial neural networks; genome-based prediction; milk traits; genomic selection
48
Introduction
In genome-enabled prediction of traits in animal and plant breeding, building appropriate
models can be extremely challenging, especially when the association between predictors and
target variable involves non-additive effects (de los Campos et al., 2009; Wellmann and
Bennewitz, 2009; Gianola et al., 2006; Gianola and van Kaam, 2008). Linear methods
frequently used in genome-enabled predictions typically ignore gene by gene interactions as
well as higher order non-linearities. To meet this challenge, and to take possible non-
linearities into account in prediction, there has been a growing interest in the use of semi- and
non-parametric methods (de los Campos et al., 2013). In this context, machine learning
methods and, in particular, artificial neural networks (ANNs) have been considered to be
promising predictive machineries (Gianola et al., 2011; Tusell et al., 2013; Okut et al., 2013).
Nevertheless, there have been only a few empirical applications of ANNs to genome-enabled
prediction in animal and plant breeding. In what follows, we will first give a short overview
of this method and of the state of the art in the field of animal breeding. For a more detailed
survey on the use of ANNs with respect to genome-enabled predictions, we refer to Ehret et
al. (to be published).
Initially ANNs were developed in the field of artificial intelligence and were first introduced
for image recognition. The central concept was inspired by knowledge of the nervous system,
especially the human brain with its closely connected neurons (Pereira and Rao, 2009). The
idea has been used for defining statistical models in form of neuron diagrams, as shown in
Figure 1.
49
Figure 1 Schematic representation of an artificial neuron (x_i = input value; w_j = weights linked to
single input values; f(.) = activation function of the artificial neuron; z = output of the artificial
neuron; Σ indicates summation)
In an idealized artificial neuron, all received input information x_i (with i indexing the inputs, e.g.
marker genotypes) is weighted by appropriate elements w_j and summed. The sum of all
weighted inputs is then transformed by an activation function f(.) to produce the neuron output z
(e.g. the predicted phenotype). The activation function can be either linear or non-linear and
its purpose is to restrict the amplitude of the neuron's output.
To mimic the physical structure of the human nervous system, artificial neurons are
interconnected and organized in networks with several layers which together form the ANN.
In most applications the information flows strictly forward, so that the output of one
neuron forms the input of another one. Algebraically, an ANN can be represented as a
schematic of Kolmogorov's theorem (Kolmogorov, 1957) for representation of complex
functions, by which ANNs are proven to be universal function approximators (Kůrková,
1992). Therefore, in the context of genome-enabled prediction (Meuwissen et al., 2001),
ANNs are theoretically able to account for cryptic interactions between genotypes and
phenotypes without the need of explicitly specifying a fixed genetic model (Gianola et al.,
2011), thus the genetic architecture of the trait may remain unknown a priori.
In short, an appealing property of ANNs is that they do not require most of the prior
assumptions that commonly underlie parametric statistical models. ANNs can be non-linear in
both features and parameters and, if properly specified, may capture complex signals from the
data and deliver a better predictive accuracy. This is achieved through a learning phase where,
for example, several pairs of genotype-phenotype combinations are fed into the network.
50
According to a specific learning rule, the ANN learns the functional relationship underlying the
training samples. Learning is an iterative process, in which the weights (connections
between single artificial neurons) of the ANN are steadily adjusted, in order to minimize the
difference between observed and predicted output (e.g. phenotypes) or training error (Gurney,
1997). The process is stopped when a previously defined threshold of the error function in
training samples is reached. Hence, adjusting the weights properly to a given mapping
problem is an optimization task. The most widespread supervised learning rule for ANNs is
the back-propagation of error algorithm (Rumelhart et al., 1986). It can be seen either as a
gradient descent method to locate the optimal solution (Werbos, 1994), or as a
generalization of the delta rule (Kriesel, 2007). The algorithm is based
on minimization of the error function with respect to the weights and in general employs a
least squares solution, so that the procedure can be viewed as a non-linear regression
algorithm. A trained ANN is able to predict unknown future outcomes of the same physical
process that created the training samples.
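For a single linear neuron, the gradient-descent idea behind back-propagation reduces to the delta rule. The following minimal Python sketch is illustrative only, not the training code used in this study:

```python
def train_neuron(xs, ys, lr=0.05, epochs=2000):
    # Iteratively adjust weight and bias of one linear neuron so as to
    # minimize the squared difference between observed and predicted output.
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = (w * x + b) - y   # prediction error for this example
            w -= lr * err * x       # gradient of 0.5 * err**2 w.r.t. w
            b -= lr * err           # gradient of 0.5 * err**2 w.r.t. b
    return w, b

# Learn the mapping y = 2x + 1 from four noise-free training pairs.
w, b = train_neuron([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

Back-propagation generalizes exactly this update to networks with hidden layers by propagating the error gradient backwards through the chain of activation functions.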
ANNs are flexible and powerful and can be implemented in various ways (de los Campos et
al., 2013). In animal breeding, ANNs were claimed to have the ability of outperforming
frequently used standard linear regression models in prediction of yet to be observed
phenotypic values through genomic data (Okut et al., 2013; Tusell et al., 2013; Gianola et al.,
2011). However, ANNs are computationally costly, especially when applied to high
dimensional genomic data, where the number of parameters to be estimated typically exceeds
the number of available samples. Therefore, at least in animal breeding, only subsets of
markers have been used to make ANNs computationally feasible (Okut et al., 2011; Okut et al.,
2013).
This study is the first application employing an entire genomic marker set as source of input
information for genome-enabled prediction of milk traits in dairy cattle. The predictive
performance of ANNs will depend on network architecture, training phase and peculiar
characteristics of the data (Gianola et al., 2011; Ehret et al., to be published). In order to
assess the importance of these factors, we applied different ANN structures to prediction of
yet to be observed phenotypic values from large scale SNP data. A linear ANN with one
neuron in the hidden layer and with the genomic relationship matrix (VanRaden, 2008)
used as input information served as a benchmark. Such an ANN produces results
approximately corresponding to those of GBLUP (Pérez-Rodríguez et al., 2013; Gianola et
al., 2011), which is a standard method for genome-enabled prediction in animal breeding.
51
Material and Methods
Single hidden layer feed-forward ANN with back-propagation
Research on theory and applications of artificial networks is steadily increasing. Several types
of ANNs have been extensively used for various purposes, such as classification, pattern
recognition, prediction and forecasting, process control, optimization and decision support
(Hanrahan, 2011). A frequently used type of ANN for regression and forecasting is the two-
layer feed-forward perceptron (Ehret et al., to be published), also called single hidden layer
feed-forward neural network. These are ANNs where an input layer of source nodes and an
output unit are completely linked, with only one hidden layer between them, as illustrated in
Figure 2.
Figure 2 Architecture of a single hidden layer feed-forward neural network.
x_ij = network input, e.g. marker genotype j of individual i; w_1m = network weights from the input to
the hidden layer; w_2s = network weights from the hidden to the output layer; y_i = network output, e.g.
predicted phenotype of individual i; f(.) = activation function at the hidden neurons; g(.) = activation
function at the output neuron; Σ indicates summation
52
This kind of network is able to reproduce most mathematical functions fairly well, while
keeping a simple architecture. Moreover, it has good properties when working with high
dimensional data, as is the case in genome-enabled prediction (Ehret et al., to be
published).
Mathematically, the mapping of such an ANN can be viewed as a two-step regression (Hastie,
2009). The central concept is to extract in the hidden layer linear combinations of the inputs
as basis functions and then model the target as a function of these basis functions in the output
layer. In terms of genome-enabled prediction, in the hidden layer the genomic covariates x_ij
(for j = 1, ..., m, where m denotes the number of genomic covariates) of an individual i (for
i = 1, ..., n) are linearly combined with a vector of weights w_j^[1t] that is specified in the
training phase, plus an intercept a_t, in ANN terminology also called "bias", with t = 1, ..., s
denoting a neuron.
The resulting linear score is then transformed using an activation function f(.) to produce the
output of the single hidden neuron

z_i^[t] = f( a_t + Σ_{j=1}^{m} w_j^[1t] x_ij ) .   (1)
In order to model a non-linear relationship between phenotype and input, the hyperbolic
tangent activation function, tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)), can be employed in the
hidden neurons, giving the ANN a greater flexibility than that of standard linear regression
models (MacKay, 2003).
In the output layer, the s genotype-derived basis functions resulting from the hidden layer
are also linearly combined, using the weights w_21, w_22, ..., w_2s and an output intercept b.
In the output neuron, the resulting linear score is transformed, this time by use of a linear
activation function g(.), to calculate the predicted phenotype of individual i (for i = 1, ..., n)
as

y_i = g( b + Σ_{t=1}^{s} w_2t z_i^[t] ) .   (2)
The linear activation function is often an identity function.
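Equations (1) and (2) together define the network's forward pass. A small Python sketch of that mapping follows (illustrative only, with arbitrary weights, not the authors' C++ implementation):

```python
import math

def ann_predict(x, W1, a, w2, b):
    # Equation (1): hidden output z_t = tanh(a_t + sum_j W1[t][j] * x_j)
    z = [math.tanh(a_t + sum(w * xj for w, xj in zip(row, x)))
         for row, a_t in zip(W1, a)]
    # Equation (2): identity output activation, y = b + sum_t w2[t] * z_t
    return b + sum(wt * zt for wt, zt in zip(w2, z))

# Two inputs, two hidden neurons (all parameter values chosen arbitrarily).
y = ann_predict([1.0, -1.0],
                W1=[[0.5, 0.5], [1.0, -1.0]],  # input-to-hidden weights
                a=[0.0, 0.0],                  # hidden biases
                w2=[1.0, 2.0],                 # hidden-to-output weights
                b=0.1)                         # output bias
```

Setting the hidden activation to the identity instead of tanh would collapse the whole mapping into a multiple linear regression, which is exactly the property exploited by the benchmark model below.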
At the training phase, the ANN locates the optimal weights by minimization of an error
function of the training set. Here, a back-propagation algorithm was used for training. It is a
convenient and simple iterative algorithm that usually performs well, even with complex data.
Unlike other learning algorithms (like Bayesian learning) it has good computational properties
when dealing with large scale data. To enforce the network generalization ability,
53
regularization was used for training, since this is not naturally achieved through the learning
algorithm. Training was stopped either when the maximum number of 1000 iterations was
reached, or when the averaged mean squared error (aMSE) between predicted and true
phenotypes fell below a threshold (aMSE ≤ 10^-3).
Parameters of the learning algorithm were optimally adjusted in the individual data sets in a
pre-processing step, and subsequently were held constant for all following runs. In each
training round, the weights were initialized to small random values (ranging between -0.1 and
0.1). This helps the algorithm to find the optimal solution. Beyond choosing the best training
configuration of the learning algorithm, we examined different artificial neural network
architectures to assess the best predictive ANN. Up to 20 neurons in the hidden layer were
tested for their influence on predictive quality. All ANN calculations were performed using a
C++ program written by the authors, while the data pre-processing was done with the publicly
available statistical software R (R Core Team, 2011).
Benchmark model
To compare the non-linear ANN models with a standard method used in animal breeding, all
data sets were also evaluated using a quasi GBLUP. Here, the input variable was the genomic
relationship matrix G which was fed into an ANN with one neuron in the hidden layer and
linear activation functions in the hidden as well as in the output layer. The network is similar
to GBLUP, in the sense that the network performs a multiple linear regression, in which the
weights of the hidden layer can be interpreted as regression coefficients. When G is
proportional to XX', where X is the incidence matrix of a linear regression model on
markers, this is equivalent to ridge regression (Gianola et al., 2011; Pérez-Rodríguez et al.,
2013), as is the case in GBLUP (Whittaker, 2000).
Phenotype and genotype data
To evaluate the impact of data structure, three different data sets were presented separately to
the ANN. Inputs were genomic data on 3,341 German Fleckvieh bulls, 2,303 Holstein-
Friesian bulls, and 777 Holstein-Friesian dams. All animals were genotyped with a 50k SNP-
panel and, after quality control, 39,344, 41,995 and 41,718 SNP markers were used in the
analyses respectively, as shown in Table 1. Quality control included eliminating SNPs with
minor allele frequency <0.05 and missing genotype frequency >0.95. For the remaining loci,
missing genotypes were imputed using the population based imputing algorithm Minimac
54
(Howie et al., 2012), a computationally efficient extension of MaCH (Li et al., 2010) which
takes pre-phased haplotypes as inputs (Pausch et al., 2013).
Table 1 Data used.

Dataset | Animals in analysis | Number of markers after quality control | Type of phenotype records
German Fleckvieh bulls | 3341 | 39344 SNPs | DYD of milk traits
Holstein Friesian bulls | 2303 | 41995 SNPs | DYD of milk traits
Holstein Friesian dams | 777 | 41718 SNPs | YD of milk traits

SNP = single nucleotide polymorphism; YD = yield deviations; DYD = daughter yield deviations.
As phenotypic records, three milk traits (milk, fat and protein yield) were employed. For the
Holstein-Friesian and German Fleckvieh bulls, daughter yield deviations (DYD) were used as
phenotypes and, for the Holstein-Friesian cows, yield deviation (YD) was the response
variable. A summary of the phenotypes is given in Table 2.
Table 2 Summary statistics of the phenotypes.

Trait | Mean | Variance | Min | Max
German Fleckvieh bulls
Milk yield DYD | 1779.16 | 219257.40 | -852.48 | 3372.67
Protein yield DYD | 59.34 | 214.18 | -23.56 | 108.65
Fat yield DYD | 63.94 | 320.81 | -39.12 | 137.11
Holstein Friesian bulls
Milk yield DYD | 707.44 | 434324.64 | -852.09 | 3706.01
Protein yield DYD | 41.88 | 391.42 | -24.19 | 104.57
Fat yield DYD | 41.14 | 645.42 | -45.81 | 139.74
Holstein Friesian dams
Milk yield YD | 3.26 | 26.13 | -14.36 | 19.37
Protein yield YD | 0.91 | 1.54 | -4.86 | 4.23
Fat yield YD | 0.21 | 0.50 | -4.17 | 1.80

DYD = daughter yield deviations; YD = yield deviations.
55
Feature scaling was applied to the data sets to enhance numerical stability. This is needed,
as otherwise the learning algorithm may not work properly (Hastie et al., 2009). Feature
scaling ensures that all sources of information are treated equally in the training process,
and it often has a large influence on the final solution. In particular, inputs and outputs
must be on the same scale, i.e., ranging approximately from -1 to 1. For all phenotypes the
following normalization was used:

y_i* = (y_i - μ_y) / Max_y ,   (3)

where μ_y is the sample mean of the variable and Max_y is its maximum.
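Equation (3) amounts to a one-line transformation; a brief Python sketch (illustrative only):

```python
def normalize(ys):
    # Equation (3): subtract the sample mean, then divide by the maximum.
    mu = sum(ys) / len(ys)
    return [(y - mu) / max(ys) for y in ys]

scaled = normalize([2.0, 4.0, 6.0])   # mean 4, max 6 -> [-1/3, 0, 1/3]
```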
Genomic information
To test the impact of different genomic inputs on ability of the ANN to predict yet to be
observed phenotypes, three different genomic covariate structures were used for all data sets.
First, the raw genomic marker matrix X = {x_ij} of all SNPs of all individuals was employed.
X is of dimension n × m, where n is the number of animals and m the number of markers.
Here, feature scaling was accomplished by coding SNP genotypes as -1, 0, and 1 for the
homozygote of the minor allele, the heterozygote, and the homozygote of the other allele,
assuming additive allele effects (Ehret et al., to be published).
Second, aiming to reduce model complexity and computational cost, genome-derived
relationships among individuals were also used as inputs. The genomic relationship matrix
G = {g_ij} was calculated following VanRaden (2008):

G = (X - E)(X - E)' / ( 2 Σ_{j=1}^{m} q_j (1 - q_j) ) .   (4)

Here, G is a standardized genomic relationship matrix, where the genotype codes in X = {x_ij}
are centered by subtracting their expected values, derived from the allele frequencies q_j at
each locus. The dimension of the resulting matrix is n × n. Third, to minimize the loss of
information in the original marker matrix, while keeping the dimension small, principal
component scores (UD) of X were used as inputs as well. UD is obtained from the singular
value decomposition of X (Mandel, 1982):

X = UDV' .   (5)

Here, U is an n × n orthogonal matrix whose columns are the eigenvectors of XX'. D is a
matrix of dimension n × m containing the square roots of the non-zero eigenvalues of XX' on
its diagonal, and the columns of the m × m matrix V are the eigenvectors of X'X
(Tusell et al., 2013).
For feature scaling, the G and UD matrices were linearly transformed using the
normalization function of the package brnn (Pérez-Rodríguez et al., 2013) in the R program
(R Core Team 2011) so all elements of the resulting input matrices ranged between -1 and 1.
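The three genomic input structures and the final [-1, 1] scaling can be sketched as follows. This is a toy Python/numpy illustration under simplifying assumptions (the -1/0/1 frequency arithmetic and function names are ours, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 10, 50                                   # animals x markers (toy sizes)
X = rng.integers(0, 3, size=(n, m)).astype(float) - 1.0  # SNP codes -1/0/1

# VanRaden (2008)-style G matrix: centre genotype codes by their expectation,
# here derived from the observed frequency p of the allele coded +1
p = (X.mean(axis=0) + 1.0) / 2.0
Z = X - (2.0 * p - 1.0)                         # column-centred genotype codes
G = Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))     # n x n relationship matrix

# Principal component scores UD from the SVD X = U D V'
U, d, Vt = np.linalg.svd(X, full_matrices=False)
UD = U * d                                      # n x n score matrix (since n < m)

def scale_minus1_1(A):
    """Linear min-max transform of all elements into [-1, 1]."""
    lo, hi = A.min(), A.max()
    return 2.0 * (A - lo) / (hi - lo) - 1.0

G_scaled, UD_scaled = scale_minus1_1(G), scale_minus1_1(UD)
```

Note that UD reproduces X exactly when multiplied by V', so the scores lose no information while reducing the input dimension from m to n.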
Model validation
For comparing predictive abilities of the ANN models in the different scenarios, a five-fold
cross validation scheme was applied and repeated 20 times (Kohavi, 1995). The data sets
were randomly divided into five subsets of genotypes and associated phenotypes. One subset
(testing set) was left out for testing the predictive ability of the model, whereas the other four
subsets were used as training samples (training set) to estimate model parameters. During
cross-validation, each of the five subsets served as testing set in one round, with its phenotypes treated as missing. In every round, Pearson's correlation coefficient (r) between observed and predicted phenotypes in the testing set was calculated. Since 20 different randomizations were used to assign the genotype-phenotype combinations to the five folds, this scheme yielded 100 independent cross-validation runs. For the ANNs, different initializations of the back-propagation weights in [-0.1, 0.1] were used, to prevent the algorithm from repeatedly getting stuck in the same local minimum. The predictive ability of each model reported here was
Pearson's correlation between observed and predicted phenotype values averaged over all 100
individual cross-validation runs.
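The validation scheme above can be sketched as follows; a Python illustration with an ordinary least-squares stand-in for the ANN (the `fit_predict` interface and toy data are our assumptions):

```python
import numpy as np

def repeated_kfold_r(X, y, fit_predict, k=5, repeats=20, seed=0):
    """Repeated k-fold cross-validation: returns Pearson's r between observed
    and predicted phenotypes in the testing sets, averaged over all runs."""
    rng = np.random.default_rng(seed)
    n, rs = len(y), []
    for _ in range(repeats):
        idx = rng.permutation(n)                # new random fold assignment
        for fold in np.array_split(idx, k):     # each fold is the testing set once
            train = np.setdiff1d(idx, fold)
            y_hat = fit_predict(X[train], y[train], X[fold])
            rs.append(np.corrcoef(y[fold], y_hat)[0, 1])
    return float(np.mean(rs))

# Toy check with an ordinary least-squares stand-in for the ANN
rng = np.random.default_rng(2)
Xt = rng.normal(size=(100, 5))
yt = Xt @ np.array([1.0, -0.5, 0.2, 0.0, 0.3]) + 0.1 * rng.normal(size=100)
ols = lambda Xtr, ytr, Xte: Xte @ np.linalg.lstsq(Xtr, ytr, rcond=None)[0]
r_avg = repeated_kfold_r(Xt, yt, ols, k=5, repeats=5)
```

With 20 repeats of 5 folds, as in the study, this yields the 100 individual runs over which r is averaged.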
Results and Discussion
Figure 3 presents the combined results for all prediction scenarios tested. It shows the results for the different data sets in the columns, with milk yield, protein yield and fat yield as response variables in the rows. The single panels (a-i) show the dependency of the average Pearson's correlation coefficients of the cross-validation runs on network architecture (1 to 20 neurons in the hidden layer) for the different genomic covariate structures (X, G, UD) used as input to the ANNs.
Figure 3 Comparison of predictive abilities for all scenarios.
Different data sets are shown in the columns; milk, protein and fat yield are shown in the rows. Panels (a-i) show the average Pearson's correlation coefficients over cross-validation runs on the vertical axis and the number of hidden neurons tested on the horizontal axis. Results for the different genomic covariate structures used as inputs (X, G, UD) are presented in each panel.
Predictive ability of different ANN architectures and input factors
The ANN models differed little in predictive performance with respect to the number of neurons in the hidden layer when either the G matrix or the UD matrix was used. This was observed in all data sets for all traits, as shown in Figure 3. These results are consistent with those of Okut et al. (2011), who showed that the predictive ability of ANN models did not depend on network architecture when the sample size was larger than the number of markers used in the analyses. Our results agree for the case in which the number of features equals the sample size (G and UD).
Results obtained using the G matrix as input to the network are consistent with those of
Gianola et al. (2011), who applied Bayesian regularized artificial neural networks to genome-enabled prediction of milk traits in Jersey cows. Even with much larger data sets and a different training algorithm, we also found that increasing the number of neurons up to six yielded slightly better predictions of yet-to-be-observed values than very simple architectures (1 or 2 neurons in the hidden layer) when the G matrix was used as input to the network.
Furthermore, predictive abilities of the ANN models in the bull data sets were slightly better
when using the UD matrix than when using G as inputs to the network. Differences between
these two inputs in the cow data set were negligible. The same result was obtained by Tusell
et al. (2013) when radial basis function networks were used to predict litter size in pigs. This
might be due to the fact that the UD decomposition re-expresses the data taking its variance into account. This mapping may be more accurate and numerically stable for prediction than using G.
However, when the marker matrix X was fed into the ANNs, so that the number of features vastly exceeded the sample size, Pearson's correlation coefficients in the cross-validation runs depended strongly on the number of neurons used in the hidden layer of the network. A similar pattern was obtained in all data sets and traits. Averaged over all three data sets, the correlations achieved differed by 0.09 across the different network architectures when the X matrix was used. The maximum range was observed for protein yield in the Holstein-Friesian dams (Figure 3h), in which r ranged between 0.13 and 0.29 across architectures. The minimum range of 0.07 (r between 0.41 and 0.48) across different numbers of neurons in the hidden layer was obtained in the German Fleckvieh data set for milk and protein yield (Figure 3d,g).
The results indicate that when using X, so that the number of features is larger than the sample size, a relatively simple ANN architecture (2-6 neurons in the hidden layer) is not able to learn the specifics of the data, whereas with complex architectures (over 15 neurons in the hidden layer) the ANN will learn irrelevant details of the data. These phenomena are called under- and over-fitting, respectively. Both types of architectures make prediction of future target values worse. In such situations less heavily parameterized models, like linear ones, might be advantageous. Furthermore, prediction performance was substantially worse when the whole marker set was used as input to the network. This is probably due to over-parameterization of the model, as the number of effective parameters grows quickly with the number of inputs. The effect was independent of the data set used, and using X increased computational cost markedly. In the Holstein-Friesian dams the pattern described was slightly different, notably
when protein and fat yield were predicted (Figure 3f,i). This might be due to the lower sample
size in this data set. Since the ratio between number of markers and sample size was large,
network complexity grew quickly as the number of neurons increased. Further, in all runs learning was stopped when the maximum number of iterations was reached, rather than when the optimal aMSE in the training configuration was attained.
Furthermore, when the X matrix was used, the results showed that even an architecture with a single neuron in the hidden layer produced good predictions, as such a network is approximately a multiple linear regression on marker genotypes. These results confirm the strength of linear methods in p >> n problems.
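The robustness of penalized linear regression in the p >> n setting can be illustrated with a small sketch (a ridge penalty on marker effects, a GBLUP-like benchmark; the toy data and the penalty value are our assumptions, not the study's model):

```python
import numpy as np

def ridge_marker_regression(X, y, lam=1.0):
    """Penalized multiple linear regression on marker genotypes:
    solves (X'X + lam*I) b = X'y, which remains well posed even when
    the number of markers m far exceeds the sample size n."""
    m = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ y)

rng = np.random.default_rng(3)
n, m = 40, 500                                   # p >> n, as with the raw marker matrix
X = rng.integers(0, 3, size=(n, m)).astype(float) - 1.0
b_true = np.zeros(m)
b_true[:10] = rng.normal(size=10)                # ten truly additive loci
y = X @ b_true + 0.5 * rng.normal(size=n)
b_hat = ridge_marker_regression(X, y, lam=10.0)
```

The penalty term makes the m × m system invertible despite m > n, which is exactly why a linear model with shrinkage stays stable where an unregularized, richly parameterized ANN does not.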
Predictive ability across different data sets and traits
The highest average correlations between observed and predicted phenotypes in the testing
sets were obtained with the German Fleckvieh data set, followed by the prediction of DYD in
the Holstein-Friesian bull data set. Prediction of future YD in Holstein-Friesian dams was the worst: using YD yielded lower correlation coefficients, and the ANNs failed to give accurate predictions of the measured YD values. This is simply because DYD are always based on much more information than the YD of cows. However, this might also be influenced by the lower number of animals in the cow data set.
Within data sets the predictive ability of the models varied only slightly between traits (Figure
3). This might reflect strong additive action for the traits examined, and the fact that the linear
pre-corrections used for processing the phenotype tend to normalize the distributions. Thus
the traits behaved similarly, as expected, because phenotypic and marker-based genetic
correlations between traits were high, as shown in Table 3.
Table 3 Phenotypic and marker-based genetic correlations between traits within data sets. The diagonal of each panel shows the marker-based heritability; the upper off-diagonal shows the marker-based genetic correlation and the lower off-diagonal the phenotypic correlation.
German Fleckvieh bulls
Milk yield Protein yield Fat yield
Milk yield DYD 0.87 (0.04) 0.58 (0.03) 0.73 (0.04)
Protein yield DYD 0.70 (0.01) 0.79 (0.04) 0.62 (0.03)
Fat yield DYD 0.89 (0.01) 0.81 (0.01) 0.77 (0.04)
Holstein Friesian bulls
Milk yield Protein yield Fat yield
Milk yield DYD 0.67 (0.05) 0.24 (0.04) 0.52 (0.04)
Protein yield DYD 0.43 (0.02) 0.82 (0.05) 0.42 (0.04)
Fat yield DYD 0.86 (0.01) 0.63 (0.01) 0.60 (0.04)
Holstein Friesian dams
Milk yield Protein yield Fat yield
Milk yield YD 0.61 (0.08) 0.24 (0.06) 0.51 (0.08)
Protein yield YD 0.48 (0.03) 0.67 (0.08) 0.31 (0.07)
Fat yield YD 0.92 (0.01) 0.60 (0.02) 0.51 (0.08)
Standard errors (SE) are shown in brackets; DYD = daughter yield deviation, YD = yield deviation.
Predictive ability compared to a standard linear model
Additionally, we investigated a linear ANN, which acts as a quasi-GBLUP, as a benchmark model. As shown in Table 4, the more complex non-linear architectures could not outperform the linear ANN, i.e., a multiple linear regression on marker relationships.
Table 4 Model comparison of linear and non-linear models with same architecture and input
information (1 neuron in hidden layer and G matrix as input to the network) and best non-linear ANN.
linear ANN non-linear ANN best non-linear ANN
r r r
German Fleckvieh bulls
Milk yield DYD 0.68 (0.0007) 0.52 (0.0016) 0.68 (0.0008)
Protein yield DYD 0.68 (0.0006) 0.53 (0.0011) 0.67 (0.0005)
Fat yield DYD 0.66 (0.0005) 0.56 (0.0008) 0.65 (0.0005)
Holstein Friesian bulls
Milk yield DYD 0.60 (0.0006) 0.53 (0.0011) 0.58 (0.0008)
Protein yield DYD 0.59 (0.0009) 0.50 (0.0013) 0.57 (0.0009)
Fat yield DYD 0.57 (0.0009) 0.51 (0.0010) 0.56 (0.0009)
Holstein Friesian dams
Milk yield YD 0.47 (0.0031) 0.44 (0.0040) 0.47 (0.0027)
Protein yield YD 0.37 (0.0033) 0.35 (0.0039) 0.35 (0.0032)
Fat yield YD 0.46 (0.0037) 0.39 (0.0049) 0.47 (0.0028)
DYD = daughter yield deviation, YD = yield deviation, r = average Pearson correlation coefficient of the cross-validation runs; the variance is shown in brackets.
Differences between the linear ANN and the best predictive ANN were very small. Nevertheless, the linear ANN was superior to a non-linear ANN with the same architecture, even though both were fed the same input information (G). This pattern was consistent over all data sets, independent of the trait investigated, and pronounced when no dimension reduction of the network input was performed. Overall, the results indicate that linear methods are reliable when working with large-scale data, and provide results as good as those of the much more computationally intensive non-linear ANNs when milk traits are used as response variables in genome-enabled prediction.
Conclusions
We used several ANN models for genome-enabled prediction using large-scale SNP panels
and investigated the influence of various inputs and architectures with milk traits in different
dairy cattle data sets. The aim was to assess the impact of data structure, type of genomic
information used as input to the network, and network architecture on the predictive
performance.
Our results indicated that dimension reduction yielded more accurate and more consistent predictions of future phenotypes, irrespective of the trait and data set used. Thus, we recommend feature selection methods and regularization in the training phase of an ANN (e.g., weight decay; Kriesel, 2007) for genome-enabled predictions on large SNP panels. In this context, the large number of parameters in a richly structured ANN impairs its predictive power, and our results confirm the robustness of linear methods. However, we wish to underline the potential of ANNs for mapping non-linear relationships between genotype and phenotype. In particular, this method is expected to show its strength for traits where evidence of non-linearity is present and where less heavily parameterized linear methods fail. ANNs may therefore be more useful for functional traits than for milk traits, which seem to behave additively and can be predicted well with linear methods. Nevertheless, back-propagation with early stopping is a useful learning algorithm for ANNs in genome-enabled prediction from large-scale SNP information, in the sense that a regularized back-propagation learning algorithm keeps computational cost as low as possible while maintaining good predictive performance, when feature selection is used.
Author’s contributions
AE conceived the study, performed all computations and drafted the manuscript. DH provided the computation program and supervised the computations. DG helped conceive the study and provided critical insights, and GT supervised the studies and revised the manuscript.
Acknowledgements
This research was supported by the German Federal Ministry of Education and Research
(BMBF) within the AgroClustEr “Synbreed – Synergistic plant and animal breeding”. The
authors thank Claas Heuer (Institute of Animal Breeding and Husbandry, Christian-Albrechts-
University, Olshausenstr. 40, 24098 Kiel, Germany) for the comments and suggestions to the
manuscript.
References
de los Campos, G., H. Naya, D. Gianola, J. Crossa, A. Legarra, E. Manfredi, K. Weigel,
and J. M. Cotes. 2009. Predicting quantitative traits with regression models for dense
molecular markers and pedigree. Genetics 182:375-385.
de los Campos, G., J. M. Hickey, R. Pong-Wong, H. D. Daetwyler, and M. P. L. Calus. 2013.
Whole-genome regression and prediction methods applied to plant and animal
breeding. Genetics 193:327-345.
Gianola, D., R. L. Fernando, and A. Stella. 2006. Genomic-assisted prediction of genetic
value with semiparametric procedures. Genetics 173:1761 - 1776.
Gianola, D. and J. van Kaam. 2008. Reproducing kernel Hilbert spaces regression methods for
genomic assisted prediction of quantitative traits. Genetics 178:2289 - 2303.
Gianola, D., H. Okut, K. Weigel, and G. Rosa. 2011. Predicting complex quantitative traits
with Bayesian neural networks: a case study with Jersey cows and wheat. BMC
Genetics 12:87.
Gurney, K. 1997. An Introduction to Neural Networks. Taylor & Francis, Inc.
Hanrahan, G. 2011. Artificial neural networks in biological and environmental analysis. CRC
Press
Hastie, T., R. Tibshirani, J. Friedman, and J. Franklin. 2009. The elements of statistical
learning: data mining, inference and prediction. The Mathematical Intelligencer 27:83-
85.
Howie, B., C. Fuchsberger, M. Stephens, J. Marchini, and G. R. Abecasis. 2012. Fast and
accurate genotype imputation in genome-wide association studies through pre-
phasing. Nat Genet 44:955-959.
Kohavi, R. 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence (IJCAI), pp. 1137-1145.
Kolmogorov, A.N. 1957. On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition. Doklady Akademii Nauk SSSR 114:953-956.
Kriesel, D. 2007. A brief introduction to neural networks. Available at
http://www.dkriesel.com
Kurková, V. 1992. Kolmogorov's theorem and multilayer neural networks. Neural Networks 5:501-506.
MacKay, D.J. 2003. Information Theory, Inference and Learning Algorithms. Cambridge University Press.
Mandel, J. 1982. Use of the singular value decomposition in regression analysis. The
American Statistician 36:15–24
Meuwissen, T. H. E., B. J. Hayes, and M. E. Goddard. 2001. Prediction of total genetic value
using genome-wide dense marker maps. Genetics 157:1819 - 1829.
Okut, H., D. Gianola, G. J. M. Rosa, and K. A. Weigel. 2011. Prediction of body mass index
in mice using dense molecular markers and a regularized neural network. Genetics
Research 93:189 - 201.
Okut, H., X.-L. Wu, G. J. M. Rosa, S. Bauck, B. W. Woodward, R. D. Schnabel, J. F. Taylor,
and D. Gianola. 2013. Predicting expected progeny difference for marbling score in
Angus cattle using artificial neural networks and Bayesian regression models.
Genetics Selection Evolution 45:34.
Pausch, H., B. Aigner, R. Emmerling, C. Edel, K.-U. Gotz, and R. Fries. 2013. Imputation of
high-density genotypes in the Fleckvieh cattle population. Genetics Selection
Evolution 45:3.
Pérez-Rodríguez, P., D. Gianola, K. A. Weigel, G. J. M. Rosa, and J. Crossa. 2013. Technical
Note: An R package for fitting Bayesian regularized neural networks with applications
in animal breeding. Journal of Animal Science.
Rumelhart, D. E., G. E. Hinton, and R. J. Williams. 1986. Learning representations by back-
propagating errors. Nature 323:533-536.
R Development Core Team. 2011. R: A Language and Environment for Statistical
Computing. R Foundation for Statistical Computing, Vienna, Austria. URL
http://www.R-project.org. R version 2.14.0.
Tusell, L., P. Pérez-Rodríguez, S. Forni, X. L. Wu, and D. Gianola. 2013. Genome-enabled methods for predicting litter size in pigs: a comparison. Animal 7:1739-1749.
VanRaden, P. 2008. Efficient methods to compute genomic predictions. Journal of Dairy Science 91:4414-4423.
Wellmann, R. and J. Bennewitz. 2011. The contribution of dominance to the understanding of
quantitative genetic variation. Genetics Research 93:139
Werbos, P.J. 1994. The Roots of Backpropagation: from Ordered Derivatives to Neural
Networks and Political Forecasting vol. 1. John Wiley & Sons
Whittaker, J. C., R. Thompson, and M. C. Denham. 2000. Marker-assisted selection using
ridge regression. Genetics Research 75:249-252.
Use of genomic and metabolic information as well as milk performance records for prediction of subclinical ketosis risk via
artificial neural networks
A. Ehret1, D. Hochstuhl2, N. Krattenmacher1, J. Tetens1, M. S. Klein3, W. Gronwald4, and G. Thaller1

1Institute of Animal Breeding and Husbandry, Christian-Albrechts-University, Germany
2Institute for Theoretical Physics and Astrophysics, Christian-Albrechts-University, Germany
3Faculty of Kinesiology, University of Calgary, Alberta, Canada
4Institute of Functional Genomics, University of Regensburg, Germany
Submitted for publication in Journal of Dairy Science
Abstract
Subclinical ketosis is one of the most prevalent metabolic disorders in high producing dairy
cows during early lactation. This renders its early detection and prevention important for both
economic and animal welfare reasons. The construction of reliable predictive models is
challenging, because traits like ketosis are commonly affected by multiple factors. In this
context, machine learning methods offer great advantages due to their universal learning
ability and flexibility in integrating various sorts of data. Here, an artificial neural network
approach was applied to investigate the utility of metabolic, genetic and milk performance
data, alone and in various combinations, for the prediction of milk levels of BHBA, an
established marker of subclinical ketosis risk, within and across consecutive weeks
postpartum. Data were collected from 218 dairy cows during their first five weeks in milk. All
animals were genotyped with a 50k SNP-panel and weekly information on the concentrations
of the milk metabolites glycerophosphocholine (GPC) and phosphocholine (PC) as well as
milk composition data (milk yield, fat and protein percentage) was available. The
concentration of beta-hydroxybutyric acid in milk was used as the target variable in all
prediction models. Average correlations between observed and predicted target values of up to 0.643 were obtained when milk metabolite and routine milk recording data were combined for same-day prediction within weeks. The predictive performance of metabolite-based as well as milk-performance-based models was higher than that of models based on genetic information. In
summary, the performance of artificial neural networks is encouraging for predicting lowly
heritable traits, such as ketosis.
Keywords: Artificial neural network, prediction, ketosis, milk metabolite
Introduction
In dairy cattle, breeding for an increased milk yield has led to a pronounced energy deficit
postpartum (Veerkamp and Koenen, 1999). Cows experiencing a strong negative energy
balance in early lactation are more prone to metabolic disorders (Collard et al., 2000) and
related diseases. Ketosis, or acetonaemia, is one of the most prevalent metabolic disorders in high-producing dairy cows, showing the highest incidence within the first weeks of lactation
(Duffield et al., 2009; Asl et al., 2011). While clinical cases are comparatively rare,
prevalences of up to 28.3% have been reported for subclinical ketosis in European countries
(Suthar et al., 2013). Subclinical stages of ketosis are associated with decreased milk
production and an increased risk of displaced abomasum, lameness and metritis (Dohoo and
Martin, 1984). Detection of the preclinical disease stages would allow for a timely treatment
of affected animals, which is beneficial from both an economic and an animal welfare
point of view (Asl et al., 2011). Subclinical ketosis (SCK) is characterized by an increased
production of ketone bodies, such as BHBA, acetone and acetoacetate in the absence of
clinical signs of ketosis (Andersson, 1984). Ketone body concentrations can be readily
measured in both, blood or milk, (Enjalbert et al., 2001) and BHBA levels are well-
established indicators of early stage subclinical ketosis (Dohoo and Martin, 1984; Duffield et
al., 1997; Suthar et al., 2013). A serum BHBA concentration of ≥ 1200-1400 µmol/L is
typically defined as SCK (Geishauser et al., 2000; LeBlanc, 2010). However, even cows with
serum BHBA concentrations exceeding 1000 µmol/L in the first week of lactation are already
almost 14 times more likely to develop displaced abomasum (Seifi et al., 2011).
Concentrations of BHBA have been reported to be approximately ten times lower in milk than in plasma (Klein et al., 2013), and correspondingly lower threshold concentrations have to be applied in ketosis detection (van der Drift et al., 2012a).
Exceeding a certain BHBA threshold indicates an already existing ketotic state, but in order to
reach the goal of ultimately preventing SCK, an early forecast of individual ketosis risk is
required. However, prediction of functional traits, which are commonly affected by multiple factors, is extremely challenging. Typically, such traits are characterized by low heritability, and phenotypic measurements are often costly and not part of routine recording. Thus, it is
reasonable to exploit new sources of information for the detection of preketotic metabolic
conditions. The rapidly evolving field of metabolomics is promising in terms of enabling a
direct readout of biological processes involved in disease progress. The consequences are
twofold: on the one hand, decreasing costs of metabolic measurements make it possible to obtain an increasing amount of valuable information on the metabolic status of individual animals. On the other hand, this raises profound questions on how to utilize this sort of information for
disease prevention. Recently, Melzer et al. (2013) proposed the integration of metabolite
profiles in SNP-based prediction of milk traits. However, aiming at a more precise detection
of evolving ketosis, it is of high interest for breeders as well as for developers of processes
and control strategies in dairy production to find ways to integrate multiple sources of
information (e.g. performance records, metabolic and genomic information). For such
purposes, machine learning methods like artificial neural networks (ANN) are promising
tools. They are able to learn specific data patterns, and ANNs in particular are considered universal approximators of virtually any type of function (Hornik et al., 1989). Therefore,
they have gained considerable popularity in predicting and forecasting complex systems,
including various applications in animal and plant sciences (Gianola et al., 2011). ANNs are
able to cover cryptic interactions between various predictors and response variables without a
need to explicitly specify a fixed model. For the prediction of ketosis risk this approach
allows for the simultaneous incorporation of different types of information to enhance
prediction accuracy and performance.
A recent study on potential predictor variables for ketosis showed that dairy cows with ratios
of milk glycerophosphocholine (GPC) to milk phosphocholine (PC) equal or greater than 2.5
are significantly less prone to ketosis within and over subsequent lactations (Klein et al.,
2012). Raised GPC/PC ratios and GPC concentrations in milk are believed to reflect an
animal’s ability to mobilize fatty acids from the breakdown of blood phosphatidylcholines
rather than from body fat as a source for milk lipid synthesis (Klein et al. 2012). Additionally,
genome-wide association studies revealed marker genotypes associated with ketosis linked
factors (Buitenhuis et al., 2013; Tetens et al., 2013; Gladdis et al., 2014), underscoring the
potential contribution of genetic factors to disease development. Furthermore, changes in milk
composition are an indirect indicator of the health status of a cow (e.g. Hamann and Krömker
1997). The close relationship between metabolic disorders and energy balance allows for the
utilization of energy balance traits (like fat to protein ratio) in the prediction of disease risk
(Buttchereit et al., 2010). Performance records for milk yield and milk composition like fat
and protein percentage, protein to fat ratio, as well as dry matter intake (DMI) are promising
indicators for the susceptibility to metabolic disorders (Duffield et al., 1997).
The main objective of this study was to systematically assess the applicability of different
sources of information in predicting subclinical ketosis in early lactation. Prediction was
accomplished via ANNs for the same day within one week and for consecutive weeks
(forecast). The concentration of BHBA in milk is a direct measure of the ketotic state of a
cow and was used as the target variable. Several combinations of metabolic and genetic
information as well as performance records of cows from the first weeks of lactation were
used alone and in various combinations as predictor variables and their influence on
prediction accuracy was investigated.
Material and Methods
Data collection
Genomic and metabolic data as well as information on milk yield and milk composition of
218 dairy cows in early lactation were available. Data were collected on a weekly basis
between October 2008 and June 2010 on the dairy research farm Karkendamm, which is
affiliated to the Institute of Animal Breeding and Husbandry of Kiel University, Germany.
Milk samples were collected weekly during the first five weeks in milk. Eighteen animals had
full records for five consecutive weeks and 41 animals had complete records for four
consecutive weeks, but the vast majority of cows (N=159) had consecutive records for only
three or less weeks. While precluding the analysis of time series for predicting the SCK risk,
611 weekly records of 218 cows were available for the prediction within and across
consecutive weeks as shown in Table 1.
Table 1. Structure of data sets used for the prediction of milk BHBA within and across the
first five weeks in milk.
Lactation week, predictors1      Lactation week, BHBA         No. of animals with
were recorded in                 was predicted for            complete records
Within weeks
1                                1                             60
2                                2                            121
3                                3                            149
4                                4                            142
5                                5                            119
Across weeks (forecast)
1                                2                             34
2                                3                             85
3                                4                            101
4                                5                             83
1Milk concentrations of glycerophosphocholine and phosphocholine, milk performance records
The milk levels of GPC and PC as well as their ratio were used as predictor variables, while
the concentration of BHBA in milk (see Figure 1) served as the target variable.
Figure 1 Boxplots of BHBA concentrations in milk samples across the first five weeks in milk.
The three variables had been measured by nuclear magnetic resonance spectroscopy as described previously (Klein et al., 2012). Additionally, weekly milk performance records of milk yield and protein and fat percentage were available for all cows.
All animals were genotyped using the Illumina BovineSNP50 BeadChip® (Illumina Inc., San
Diego, CA, USA) comprising a total number of 54,001 single nucleotide polymorphisms
(SNPs). SNPs with a minor allele frequency < 0.05 and missing genotype frequency > 0.05
were discarded from the analyses resulting in final set of 41,423 markers. Sporadically
missing genotypes were imputed using the population based imputing algorithms MaCH (Li
et al., 2010) and its extension Minimac (Howie et al., 2011).
Data pre-processing
Aiming at a reduction of model complexity and computational cost, genomic information was incorporated into the prediction using the principal component scores, referred to as the UD matrix, of the original genotype matrix X. The UD matrix is of dimension n × n, with n being the number of animals, and is obtained from the singular value decomposition of the original genotype matrix X (Mandel, 1982):

    X = U D V'.

Here, U is an orthogonal matrix whose columns are the eigenvectors of XX', D is a matrix containing the square roots of the non-zero eigenvalues of XX', and the columns of V are the eigenvectors of X'X (Tusell et al., 2013). The UD decomposition re-expresses the data taking its variance into account.
Additionally, feature scaling was performed for all metabolites, metabolite ratios and milk performance records, as well as for the obtained principal component score matrix of genotypes (UD), to incorporate variables measured on different scales into the model. Feature scaling is essentially a normalization process in which variables are linearly transformed. To scale all elements to a range between -1 and 1, the normalization function of Pérez-Rodríguez et al. (2013) was used.
Artificial neural networks
For prediction, a multilayer feed-forward artificial neural network (ANN) was used. ANNs belong to a wide range of machine learning methods used in data mining and were first applied to tasks such as image recognition. They are very flexible prediction machines allowing for the incorporation of several types of data. Following a learning rule, an ANN can be trained to predict unknown outcomes of the same kind of task as represented by the training samples. For further information, in particular on applications in animal and plant science, we refer to Gianola et al. (2011).

We implemented a single-hidden-layer feed-forward network with five neurons in the hidden layer and a learning rate of 0.02. For further information regarding the fitting of ANNs we refer to Kriesel (2007).
Further, we applied the non-linear hyperbolic tangent activation function to the hidden neurons, aiming to give the network greater flexibility for capturing the relation between different combinations of information sources and milk BHBA concentrations (in the following also referred to as SCK risk). In the output layer, one neuron with a linear activation was used. For the training process a regularized back-propagation algorithm was used, which corresponds to a non-linear least-squares optimization. All models were run using a C++ program written by the authors, whereas data pre-processing was done in R (version 2.14.1; R Core Team, 2011).
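The described architecture (one hidden layer with five tanh neurons, a linear output neuron, learning rate 0.02, and back-propagation with weight decay) can be sketched in Python. This is a minimal illustration under our own simplifying assumptions, not the authors' C++ implementation:

```python
import numpy as np

class TinyANN:
    """Single-hidden-layer feed-forward net: tanh hidden units, one linear
    output unit, trained by back-propagation with weight decay."""
    def __init__(self, n_in, n_hidden=5, lr=0.02, decay=1e-4, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.uniform(-0.1, 0.1, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.uniform(-0.1, 0.1, size=n_hidden)
        self.b2 = 0.0
        self.lr, self.decay = lr, decay

    def predict(self, X):
        self.H = np.tanh(X @ self.W1 + self.b1)       # hidden activations
        return self.H @ self.W2 + self.b2             # linear output unit

    def train_step(self, X, y):
        err = self.predict(X) - y                     # gradient of 0.5 * MSE
        n = len(y)
        dW2 = self.H.T @ err / n + self.decay * self.W2
        db2 = err.mean()
        dH = np.outer(err, self.W2) * (1.0 - self.H ** 2)   # tanh derivative
        dW1 = X.T @ dH / n + self.decay * self.W1
        db1 = dH.mean(axis=0)
        self.W2 -= self.lr * dW2; self.b2 -= self.lr * db2
        self.W1 -= self.lr * dW1; self.b1 -= self.lr * db1
        return float(np.mean(err ** 2))

# Fit a smooth non-linear toy target with full-batch back-propagation
rng = np.random.default_rng(4)
Xd = rng.uniform(-1, 1, size=(200, 3))
yd = np.tanh(Xd @ np.array([1.0, -1.0, 0.5]))
net = TinyANN(n_in=3)
mse = [net.train_step(Xd, yd) for _ in range(3000)]
```

The weight-decay term plays the role of the regularization in the training algorithm described above; without it, the effective number of parameters grows unchecked with network size.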
Prediction models
We evaluated the predictive performance of several models for SCK risk. In detail, we
investigated different combinations of information to assess their impact on the ability to predict BHBA concentrations within and across consecutive weeks. An overview of all chosen
prediction models is presented in Table 2.
Table 2. Schematic overview of selected combinations of information sources for the
different prediction models
GPC/PC1 GPC2+PC3 MPR4 UD5
Within weeks
GPC/PC1 x x
GPC2+PC3 x x x
MPR4 x x x
UD5 x x x
Across weeks
GPC/PC1
GPC2+PC3 x x x
MPR4 x x x
UD5 x x x
1GPC/PC = ratio of glycerophosphocholine to phosphocholine, 2GPC = glycerophosphocholine, 3PC = phosphocholine, 4MPR = milk performance records (milk yield, fat and protein percentage), 5UD = principal component scores of genotypes, p.p. = postpartum
Model validation
Taking into account the small sample size, we applied a five-fold cross validation with 100
individual repetitions (Kohavi, 1995) to properly assess the predictive ability of the different
models within and between consecutive Weeks. Cross-validation for prediction models was
performed by randomly dividing data sets into five subsets of predictor combinations and
corresponding BHBA status of individual animals, having no missing values for the desired
combination of information. One subset (testing set) was left out to evaluate the predictive
ability of the model, whereas the other four subsets were used as training set to configure the
ANN. This scheme was rotated so that each subset served as testing set once. In each of the
five rounds, the BHBA concentrations of the testing set were set to missing and Pearson's
correlation between observed and predicted response was calculated. One hundred different
randomizations assigning the different combination of predictors and corresponding BHBA
status to five different subsets were used, yielding 500 independent cross-validation runs per
model. The predictive ability of each model was then calculated as Pearson’s correlation (r)
between the observed and predicted BHBA concentration averaged over all 500 cross-
validation runs. Further, a multiple contrast test (Tukey's Honest Significant Difference (HSD) test) was applied to the cross-validation results to indicate possible significant differences between single models (note that in the following, results from the HSD test are not shown in full). The HSD test was applied simultaneously to the set of all pairwise comparisons and identifies differences between two means that are greater than the expected standard error.
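The validation scheme above can be sketched as follows (a simplified illustration in the spirit of the study's C++ implementation; the function names are our own). One call to make_folds per repetition, each with a different seed, yields the 100 randomizations:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Pearson correlation between observed and predicted values, the
// criterion used to score each cross-validation run.
double pearson(const std::vector<double>& obs, const std::vector<double>& pred) {
    const std::size_t n = obs.size();
    const double mo = std::accumulate(obs.begin(), obs.end(), 0.0) / n;
    const double mp = std::accumulate(pred.begin(), pred.end(), 0.0) / n;
    double num = 0.0, so = 0.0, sp = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        num += (obs[i] - mo) * (pred[i] - mp);
        so  += (obs[i] - mo) * (obs[i] - mo);
        sp  += (pred[i] - mp) * (pred[i] - mp);
    }
    return num / std::sqrt(so * sp);
}

// Random assignment of n animals to k folds; each fold serves as
// testing set once while the remaining k-1 folds form the training set.
std::vector<std::vector<int>> make_folds(int n, int k, unsigned seed) {
    std::vector<int> idx(n);
    std::iota(idx.begin(), idx.end(), 0);
    std::mt19937 gen(seed);
    std::shuffle(idx.begin(), idx.end(), gen);
    std::vector<std::vector<int>> folds(k);
    for (int i = 0; i < n; ++i)
        folds[i % k].push_back(idx[i]);
    return folds;
}
```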
Results and Discussion
Prediction of subclinical ketosis risk within Weeks
Table 3 shows the average Pearson’s correlation coefficient between observed and predicted
BHBA concentrations of 500 independent cross-validation runs for the different models to
predict yet-to-be observed concentrations of BHBA within the same Week.
Table 3. Results of predictions at the same day, within Weeks. Shown are the average
Pearson's correlation coefficients between observed and predicted BHBA concentrations in
milk over all 500 CV runs (the variance of estimates is given in brackets).

Prediction model1   Week 1 p.p.  Week 2 p.p.  Week 3 p.p.  Week 4 p.p.  Week 5 p.p.  Average2
GPC/PC              -0.027       0.005        0.278        0.140        0.193        0.118
                    (0.073)      (0.037)      (0.015)      (0.017)      (0.027)
GPC+PC              -0.127       0.085        0.545        0.322        0.255        0.216
                    (0.066)      (0.040)      (0.065)      (0.129)      (0.050)
MPR                 0.489        0.393        0.335        0.542        0.463        0.444
                    (0.064)      (0.027)      (0.023)      (0.040)      (0.037)
UD                  0.453        -0.011       0.095        0.198        0.043        0.156
                    (0.060)      (0.041)      (0.024)      (0.031)      (0.035)
GPC/PC+UD           0.439        -0.046       0.108        0.212        0.088        0.160
                    (0.061)      (0.036)      (0.026)      (0.030)      (0.034)
GPC+PC+UD           0.408        0.032        0.259        0.274        0.170        0.229
                    (0.067)      (0.035)      (0.026)      (0.040)      (0.035)
MPR+UD              0.528        0.135        0.278        0.484        0.349        0.355
                    (0.059)      (0.033)      (0.020)      (0.020)      (0.033)
MPR+GPC+PC          0.439        0.212        0.519        0.643        0.371        0.437
                    (0.047)      (0.045)      (0.048)      (0.068)      (0.067)
MPR+GPC+PC+UD       0.526        0.186        0.341        0.510        0.368        0.386
                    (0.044)      (0.030)      (0.019)      (0.026)      (0.027)

1 GPC/PC = ratio of glycerophosphocholine to phosphocholine, GPC = glycerophosphocholine, PC = phosphocholine, MPR = milk performance records (milk yield, fat and protein percentage), UD = principal component scores of genotypes, p.p. = postpartum
2 Averaged over all five Weeks
Generally, due to the small number of animals with full records for all available sources of information, the variance of the calculated correlations between observed and predicted phenotypes in the CV runs was high (Table 3). Further, the very limited number of clinical ketosis cases with very high BHBA levels (Figure 1) might contribute in particular to the high variance in the correlations obtained from the CV runs. Depending on how the animals with outstandingly high BHBA levels are distributed between training and testing set, predictive performance can become poor for the upper concentration range. Using a data set with more clinical cases, or restricting the data set to a certain BHBA range, which was precluded by the small sample size here, might well result in a much better performance.
The results indicate that prediction of SCK risk works sufficiently well in Weeks 1, 3, 4 and 5 postpartum. On average over all implemented models, Pearson's correlations of 0.348, 0.306, 0.369, and 0.256, respectively, were obtained for the respective Weeks in milk. In comparison, predictive performance was, for unknown reasons, worse in Week 2 postpartum (correlation averaged over all models r = 0.110). Against the background of the small sample size, however, it is possible that animals with particular patterns of predictor values biased the predictions. Overall, the predictive performance of ANNs within Weeks varied strongly with the model used as input to the network, i.e. the different combinations of predictors.
Impact of metabolic information
As shown in Table 3, the prediction performance was, irrespective of the inclusion of genetic information, significantly better when the absolute abundances of GPC and PC were used as input to the network rather than their ratio (GPC/PC). The only exception is lactation Week 1, where the ratio combined with the UD matrix achieved an average predictive performance of r = 0.439, whereas the single metabolites combined with the genetic information resulted in a performance of r = 0.408. However, evidence is limited, because the difference was not significant (HSD). These results suggest that the relationship between the metabolites is not linear; thus, the ANN captures the relationship better through the separate information on the single metabolites than through their ratio. However, when metabolic information was the only source of input information, predictions failed for the first two Weeks postpartum. This effect might be due to the fast changes in the metabolic constitution of animals in the first Weeks of lactation. In Week 1, the prediction performance increased significantly when genetic information was added to the metabolic models. In all other Weeks, however, combining metabolic and genetic information as predictors achieved no significant improvement in predictive ability, and rather a decline. These results indicate that genetic information from 50K SNP-panels has little informative value as a predictor of BHBA status. This might, however, also be due to statistical limitations, as the inclusion of genomic information in predictions increases the number of parameters to be estimated and, thus, model complexity.
Overall, the highest prediction performances and lowest variances of cross-validation runs
were obtained with the use of both milk performance and metabolite data. This effect is
independent of the data set (Weeks postpartum) used for the analyses.
Impact of genetic information
Predictive ability of the genetic covariates as the only information to the networks was best in
the first Week and decreased with increasing number of Weeks postpartum, which might
indicate a decrease of heritability during lactation. This is in line with Oikonomou et al. (2008), who reported heritability estimates for serum BHBA concentrations dropping from 0.40 in the first lactation Week to 0.08 in Week seven. To analyze the trajectory of heritability throughout lactation, random regression models could be used, which was hardly possible in the
current study owing to the limited amount of data. Moreover, it is known from literature that
the use of random regression for disease data often results in erratic estimates and
computational difficulties, especially when disease frequencies are low (Carlen et al., 2009).
Our results show that the benefit of genetic information for increasing the predictive ability of the prediction models is low. The use of genomic covariates led to a significant increase in predictive performance only for Week 1, but not for any other Week.
Generally, adding genomic information to the prediction models led to a significantly higher
predictive performance in 20% of the models. However, 60% of the models performed
significantly worse than the reduced model without genomic information. The results
therefore indicate that the exclusive use of genetic covariate information fails in the prediction
of SCK risk. This is in concordance with the low heritability of ketosis and BHBA
concentrations reported in literature (e.g. Gaddis et al., 2014 [h²=0.04-0.14 for clinical
ketosis], Heringstad et al., 2005 [h²=0.13-0.15 for clinical ketosis], Van der Drift et al., 2012b
[h²=0.16 and 0.17 for milk and serum BHB, respectively]). Furthermore, we suppose that a
higher number of animals would have been needed to make SNP-based predictions valuable.
Finally, there might be an effect of the static nature of genotypes, which cannot capture the changes in BHBA concentrations of animals between Weeks. Furthermore, the use of the UD matrix might have somewhat blurred the genetic information, as individual marker information was lost when transforming the genotypes into principal component scores. Nevertheless, this transformation was needed to make regression feasible with the low sample size.
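To illustrate how such scores arise, the following sketch extracts the leading column of the UD score matrix from a (centred) genotype matrix by power iteration. This is a simplified stand-in for the full singular value decomposition actually required, and the function name is our own:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;

// Leading principal-component scores of an n x m genotype matrix X:
// power iteration on X'X converges to the first right singular
// vector v, and X v = d1 * u1 is the first column of the UD scores.
std::vector<double> leading_pc_scores(const Mat& X, int iters = 200) {
    const std::size_t n = X.size(), m = X[0].size();
    std::vector<double> v(m, 1.0), u(n);
    for (int it = 0; it < iters; ++it) {
        for (std::size_t i = 0; i < n; ++i) {        // u = X v
            u[i] = 0.0;
            for (std::size_t j = 0; j < m; ++j) u[i] += X[i][j] * v[j];
        }
        double norm = 0.0;
        for (std::size_t j = 0; j < m; ++j) {        // v = X' u, normalized
            v[j] = 0.0;
            for (std::size_t i = 0; i < n; ++i) v[j] += X[i][j] * u[i];
            norm += v[j] * v[j];
        }
        norm = std::sqrt(norm);
        for (double& x : v) x /= norm;
    }
    for (std::size_t i = 0; i < n; ++i) {            // scores = X v
        u[i] = 0.0;
        for (std::size_t j = 0; j < m; ++j) u[i] += X[i][j] * v[j];
    }
    return u;
}
```

Retaining several such score columns instead of all 50K markers is what keeps the number of network inputs manageable at this sample size.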
Impact of milk performance records
The use of Weekly milk performance records (milk yield, protein and fat percentage) as input
information to the prediction model consistently led to high prediction performance of BHBA
status of cows in early lactation and, thus, to a good mapping of SCK risk. This was
irrespective of the Week that was studied. As mentioned before, the high negative energy
balance, which is determined by milk yield and can be described by milk composition in early
lactation, is one of the most critical precursors for metabolic disorders (e.g. Hamann and
Krömker 1997). Indeed, the milk-related predictors proved well capable of predicting the BHBA status of cows within Weeks. Furthermore, a combination of milk performance records with metabolic information seems to provide a stable and well-performing predictive system.
Additionally, it would have been interesting to include dry matter intake (DMI), another variable with a substantial effect on the energy status of a cow, in the predictive model, but due to technical issues this information was not available to us. Under field conditions, however, milk performance records such as milk yield, protein and fat percentage are routinely available, whereas this is not the case for DMI.
Prediction of subclinical ketosis risk across Weeks
In the second part of the study we investigated the forecast of BHBA status. Table 4 shows
the impact of different prediction models on the performance of ANNs to predict
concentration of BHBA across consecutive Weeks. The average Pearson’s correlation
coefficient between observed and predicted BHBA concentration of 500 independent cross-
validation runs is shown. Predictions were achieved with combinations of information
originating from one Week, aiming to predict the BHBA concentrations of the following
Week for individual cows. Due to the data structure, the number of animals included in the different datasets was slightly lower than in the previous analyses. Therefore, the variances of the calculated correlations in the CV runs were higher than those obtained in the runs within Weeks (Tables 3 and 4). Increasing the number of independent CV runs (>500) did not decrease the variation of the correlations between observed and predicted phenotypes. To improve this aspect, a larger number of animals per model would have been needed, which was not available to us.
Additionally, based on the poor performance in the previous predictions, the GPC to PC ratio
was dropped from the analyses (Table 2).
Table 4. Results of predictions across consecutive Weeks. Shown are the average Pearson's
correlation coefficients between observed and predicted BHBA concentrations in milk over
all 500 CV runs (the variance of estimates is given in brackets).

Prediction model1   Week 1 to 2 p.p.  Week 2 to 3 p.p.  Week 3 to 4 p.p.  Week 4 to 5 p.p.  Average2
GPC+PC              -0.146            0.198             0.324             0.171             0.137
                    (0.140)           (0.077)           (0.094)           (0.051)
MPR                 0.053             0.285             0.262             0.285             0.221
                    (0.132)           (0.063)           (0.032)           (0.087)
UD                  -0.144            -0.105            0.183             0.093             0.007
                    (0.141)           (0.040)           (0.066)           (0.051)
GPC+PC+UD           -0.133            -0.045            0.289             0.099             0.053
                    (0.140)           (0.046)           (0.048)           (0.062)
MPR+UD              -0.063            -0.035            0.253             0.310             0.116
                    (0.159)           (0.038)           (0.046)           (0.058)
MPR+GPC+PC          -0.020            0.046             0.369             0.133             0.132
                    (0.151)           (0.063)           (0.104)           (0.061)
MPR+GPC+PC+UD       -0.119            -0.011            0.270             0.294             0.109
                    (0.160)           (0.041)           (0.051)           (0.055)

1 GPC = glycerophosphocholine, PC = phosphocholine, MPR = milk performance records (milk yield, fat and protein percentage), UD = principal component scores of genotypes, p.p. = postpartum
2 Averaged over all Weeks
Results indicate that the prediction of BHBA status across consecutive Weeks is generally applicable from Week 3 to 4 and from Week 4 to 5. Averaged over the seven implemented predictive models, Pearson's correlations between observed and predicted BHBA concentrations of 0.279 and 0.198, respectively, were obtained in these data sets. In comparison, predictive performance was worse from Week 2 to 3 (correlation averaged over all models r = 0.048) and predictions failed completely from the first to the second Week postpartum (correlation averaged over all models r = -0.081). In the latter data set, no correlation between predicted and true values was achieved. Due to the rapid changes in metabolism at the beginning of lactation, milk-related and metabolic measurements can vary strongly in the first Weeks in milk. Our assumption that some factors map the concentration of BHBA across consecutive Weeks might therefore not hold in the first two Weeks postpartum. Thus, a dynamic prediction model, e.g. a time-series analysis, might be more reasonable than a static one. However, this could also be biased by the low effective sample size as well as the limited number of animals with high milk BHBA concentrations.
Nevertheless, the results show that the forecast of SCK risk based on data from the previous
Week is possible with a reasonable quality from Week 2 to 5 postpartum. The best
combination of predictors, however, varies between Weeks (Table 4). Results indicate no significant advantage of milk performance records over metabolic information as information sources. However, when milk performance records (milk yield, protein and fat percentage) were provided as input to the ANN, more consistent results across the different data sets were achieved. Unfortunately, the use of milk performance records in conjunction with the milk concentrations of GPC and PC, both recorded one Week before, did not result in significantly better prediction performance of the network. Moreover, the performance of genomic covariates, whether as single models or in combination with other information, was also low for Weeks 1 to 2 and 2 to 3, whereas adding genetic information to the prediction models for Weeks 3 to 4 and 4 to 5 seemed to yield more consistent prediction performance than the single models.
Conclusions
We compared the performance of metabolic, genetic, and milk performance data, alone and in various combinations, for predicting BHBA concentrations in individual dairy cows by means of ANNs, within and across consecutive Weeks postpartum. Predictive abilities differed across models. Overall, metabolic and performance-based models had a higher predictive ability than models based on genetic information. However, further studies using more comprehensive data sets are needed to confirm these promising results.
Furthermore, there might be metabolites or metabolite profiles that possess a higher predictive
ability than those used in the current study.
Author’s contributions
AE conceived the study, performed all computations and drafted the manuscript. DH provided
the C++ code and supervised the computations. JT and NK participated in the conception of
the study and revised the manuscript. MK and WG revised the manuscript. GT supervised the
study and revised the manuscript.
Acknowledgements
This research was supported by the German Federal Ministry of Education and Research
(BMBF) in the context of the Fugato-plus MeGA-M framework (grant no. 0315131B) and the
AgroClustEr “Synbreed – Synergistic plant and animal breeding” (grant no. 0315528F). The
genotyping of the cows was financially supported by the German Federal Ministry of
Education and Research (project FUGATO-plus GENOTRACK, grant no. 0315134A), the
KMSH (Kompetenzzentrum Milch – Schleswig-Holstein, Kiel, Germany), and the NOG
Nord-Ost Genetic GmbH & Co. KG (Verden, Germany). The authors would like to thank P.
Oefner for a thorough review of the manuscript and many helpful suggestions.
References
Andersson, L. 1988. Subclinical ketosis in dairy cows. The Veterinary clinics of North
America. Food Anim. Prac. 4:233-251.
Asl, A. N., S. Nazifi, A. R. Ghasrodashti, and A. Olyaee. 2011. Prevalence of subclinical
ketosis in dairy cattle in the Southwestern Iran and detection of cutoff point for NEFA
and glucose concentrations for diagnosis of subclinical ketosis. Prev. Vet.
Med.100:38-43.
Buitenhuis, A. J., U. K. Sundekilde, N. A. Poulsen, H. C. Bertram, L. B. Larsen, and P.
Sørensen. 2013. Estimation of genetic parameters and detection of quantitative trait
loci for metabolites in Danish Holstein milk. J. Dairy Sci. 96:3285-3295.
Buttchereit, N., E. Stamer, W. Junge, and G. Thaller. 2010. Relationship of energy balance
and fat protein ratio of milk to disease liability in dairy cattle. In Proc. W.C.G.A.L.P.
2010: 9, 315.
Carlen, M., K. Meletis, C. Goritz, V. Darsalia, E. Evergren, K. Tanigaki, M. Amendola, F.
Barnabe-Heider, M. S. Y. Yeung, L. Naldini, T. Honjo, Z. Kokaia, O. Shupliakov, R.
M. Cassidy, O. Lindvall, and J. Frisen. 2009. Forebrain ependymal cells are Notch-
dependent and generate neuroblasts and astrocytes after stroke. Nat Neurosci. 12:259-
267.
Collard, B. L., P. J. Boettcher, J. C. M. Dekkers, D. Petitclerc, and L. R. Schaeffer. 2000.
Relationships between energy balance and health traits of dairy cattle in early
lactation. J. Dairy Sci. 83:2683-2690.
Dohoo, I. R. and S. W. Martin. 1984. Subclinical ketosis: prevalence and associations with
production and disease. Can. J. Comp. Med. 48:1-5.
Duffield, T. F., D. F. Kelton, K. E. Leslie, K. D. Lissemore, and J. H. Lumsden. 1997. Use of
test day milk fat and milk protein to detect subclinical ketosis in dairy cattle in
Ontario. Can. Vet. J. 38:713-718.
Duffield, T. F., K. D. Lissemore, B. W. McBride, and K. E. Leslie. 2009. Impact of
hyperketonemia in early lactation dairy cows on health and production. J. Dairy Sci.
92:571-580.
Enjalbert, F., M. C. Nicot, C. Bayourthe, and R. Moncoulon. 2001. Ketone bodies in milk and
blood of dairy cows: relationship between concentrations and utilization for detection
of subclinical ketosis. J. Dairy Sci. 84:583-589.
Gaddis, K. L. P., J. B. Cole, J. S. Clay, and C. Maltecca. 2014. Genomic selection for
producer-recorded health event data in US dairy cattle. J. Dairy Sci. (in press)
Geishauser, T., K. Leslie, J. Tenhag, and A. Bashiri. 2000. Evaluation of eight cow-side
ketone tests in milk for detection of subclinical ketosis in dairy cows. J. Dairy Sci.
83:296-29.
Gianola, D., H. Okut, K. A. Weigel, and G. J. Rosa. 2011. Predicting complex quantitative
traits with Bayesian neural networks: a case study with Jersey cows and wheat. BMC
Genet. 12:87.
Hamann, J. and V. Krömker. 1997. Potential of specific milk composition variables for cow
health management. Livest. Prod. Sci. 48:201-208.
Heringstad, B., Y. M. Chang, D. Gianola, and G. Klemetsdal. 2005. Genetic analysis of
clinical mastitis, milk fever, ketosis, and retained placenta in three lactations of
Norwegian Red cows. J. Dairy Sci. 88:3273–3281.
Hornik, K., M. Stinchcombe, and H. White. 1989. Multilayer feedforward networks are
universal approximators. Neural Networks 2:359-366.
Howie B., J. Marchini, and M. Stephens. 2011. Genotype imputation with thousands of
genomes. G3 1:457–470.
Klein, M. S., M. F. Almstetter, N. Nürnberger, G. Sigl, W. Gronwald, S. Wiedemann, K.
Dettmer, and P. J. Oefner. 2013. Correlations between milk and plasma levels of
amino and carboxylic acids in dairy cows. J. Prot. Res. 12:5223-5232.
Klein, M. S., N. Buttchereit, S. P. Miemczyk, A.-K. Immervoll, C. Louis, S. Wiedemann, W.
Junge, G. Thaller, P. J. Oefner, and W. Gronwald. 2012. NMR metabolomic analysis
of dairy cows reveals milk glycerophosphocholine to phosphocholine ratio as
prognostic biomarker for risk of ketosis. J. Prot. Res. 11:1373-1381.
Kohavi, R. 1995. A study of cross-validation and bootstrap for accuracy estimation and model
selection. Pages 1137-1145 in the International Joint Conference on Artificial
Intelligence (IJCAI).
Kriesel D. 2007. A brief introduction to neural networks. Available at
http://www.dkriesel.com
LeBlanc, S. 2010. Monitoring metabolic health of dairy cattle in the transition period. J.
Reprod. Dev. 56:S29–S35.
Li, Y., C. J. Willer, J. Ding, P. Scheet, and G. R. Abecasis. 2010. MaCH: using sequence and
genotype data to estimate haplotypes and unobserved genotypes. Gen. Epidem.
34:816-834.
Mandel, J. 1982. Use of the Singular Value Decomposition in Regression Analysis. Am. Stat.
36:15-24.
Melzer, N., D. R. Wittenburg, and D. Repsilber. 2013. Integrating milk metabolite profile
information for the prediction of traditional milk traits based on SNP information for
Holstein Cows. PLoS ONE 8:e70256.
Oikonomou, G., G. E. Valergakis, G. Arsenos, N. Roubies, and G. Banos. 2008. Genetic
profile of body energy and blood metabolic traits across lactation in primiparous
Holstein cows. J. Dairy Sci. 91:2814–2822.
Pérez-Rodríguez, P., D. Gianola, K. A. Weigel, G. J. M. Rosa, and J. Crossa. 2013. Technical
Note: An R package for fitting Bayesian regularized neural networks with applications
in animal breeding. J. Anim Sci. 91:3522-3531.
R Development Core Team. 2010. R: A Language and Environment for Statistical
Computing. R Foundation for Statistical Computing, Vienna, Austria. URL
http://www.R-project.org. R version 2.14.0.
Seifi, H. A., S. J. LeBlanc, K. E. Leslie, and T. F. Duffield. 2011. Metabolic predictors of
post-partum disease and culling risk in dairy cattle. Vet. J. 188:216–220.
Suthar, V. S., J. Canelas-Raposo, A. Deniz, and W. Heuwieser. 2013. Prevalence of
subclinical ketosis and relationships with postpartum diseases in European dairy cows.
J. Dairy Sci. 96:2925-2938.
Tetens, J., T. Seidenspinner, N. Buttchereit, and G. Thaller. 2013. Whole-genome association
study for energy balance and fat/protein ratio in German Holstein bull dams. Anim.
Genet. 44:1-8.
Tusell, L., P. Pérez-Rodríguez, S. Forni, X. L. Wu, and D. Gianola. 2013. Genome-enabled
methods for predicting litter size in pigs: a comparison. Animal 7:1739-1749.
Van der Drift, S.G.A., R. Jorritsma, J.T. Schoneville, H.M. Knijn, and J.A. Stegeman. 2012a.
Routine detection of hyperketonemia in dairy cows using Fourier transform infrared
spectroscopy analysis of ß-hydroxybutyrate and acetone in milk in combination with
test-day information. J. Dairy Sci. 95:4886-4898.
Van der Drift, S.G.A., K.J.E. van Hulzen, T.G. Teweldemedhn, R. Jorritsma, M. Nielen, and
H.C.M. van Heuven. 2012b. Genetic and nongenetic variation in plasma and milk ß-
hydroxybutyrate and milk acetone concentrations of early lactating dairy cows. J.
Dairy Sci. 95:6785-6787.
Veerkamp, R. F. and E. P. C. Koenen. 1999. Genetics of food intake, live weight, condition
score and energy balance. Pages 63-73 in Occasional Publication British Society of
Animal Science, No. 24, Edinburgh
Extreme Learning Machine: A New Approach for Genomic Prediction of Complex Traits
Anita Ehret1, David Hochstuhl2 and Georg Thaller1

1 Institute of Animal Breeding and Husbandry, Christian-Albrechts-University, Germany
2 Institute for Theoretical Physics and Astrophysics, Christian-Albrechts-University, Germany
Published in Proceedings, 10th World Congress of Genetics Applied to Livestock Production
Abstract
A wide range of methods for predicting phenotypes based on genomic data has become
available. Increasingly, the focus is also shifting to machine learning methodology. Despite
this progress, the prediction of complex traits from high density SNP-panels remains an
extremely demanding task, particularly in terms of the computational effort required in
calibration and prediction. We present a fast learning algorithm for artificial neural networks
which was introduced by Huang et al. in 2004. Our experimental results show that this
approach is able to achieve good generalization performance with much less computational
effort while outperforming the traditional gradient-based learning in artificial neural networks,
which is a great advantage when analyzing high dimensional data. We demonstrate the
capabilities of the new approach to genomic predictions in animal and plant breeding.
Keywords: Genomic selection, Extreme learning machine, Complex traits
Introduction
Today, in animal and plant breeding, non- and semi-parametric, as well as machine learning
methods, have become increasingly important in performing genome-enabled predictions of
complex traits. At the current times of high-density panels and sequence information,
prediction become challenging in terms of computational costs and efficiency. The number of
markers now vastly exceeds the number of records, with the effect that typical fitting
methods, such as least squares regression, often require some prior variable selection or
shrinkage estimation procedure. When using machine learning approaches, one is usually
facing a large number of parameters, non-linear training procedures and the problem of over-
fitting. Therefore, the computational costs are usually even larger than those of standard
parametric methods, which only require the solution of linear systems. A notorious example is the back-propagation algorithm for the training of artificial neural networks, which must be carried out iteratively. However, the additional effort is usually compensated by an improved
generalization performance. To overcome this computational challenge, a fast learning
algorithm for single-hidden-layer feed-forward neural networks, called extreme learning
machine (ELM), was proposed by Huang and coworkers (Huang et al., 2004; Huang et al.,
2006).
Since its invention, the ELM has had a tremendous impact in the field of machine learning as
well as on its close relatives, such as data mining and pattern recognition. For a survey on its
various applications see Huang et al. (2011). The reason for its success is twofold. On one
hand, the underlying algorithms are almost trivial when compared to more sophisticated
methods such as support vector machines (SVM) or artificial neural networks trained by back-
propagation (to which we refer here as ANN). On the other hand, despite this simplicity,
several studies have revealed that its predictive performance is generally comparable to that of
the standard methods mentioned (SVM and ANN), with often diminished training times. In
this contribution, we employ the ELM to predict milk traits from dairy cattle data with
molecular marker information, which is to our knowledge the first application of this kind,
and compare the results to an ANN approach with the same input and output configuration.
Material and Methods
The theory of the Extreme Learning Machine may be approached from two directions, starting either from ANNs or from basis function expansion methods. As the neural network perspective is the common one in the literature, we consider the second alternative in the brief summary that follows.
For this, assume we are given a data set $(x_i, y_i)_{i=1}^{N}$ of size $N$, where $x_i \in \mathbb{R}^M$ denotes an $M$-dimensional predictor and $y_i \in \mathbb{R}$ is the corresponding response variable. A common method to describe the relationship between the predictors and the observations is to assume a linear expansion into a number of $K$ basis functions $f_k : \mathbb{R}^M \to \mathbb{R}$,

$$f(x; \beta) = \sum_{k=1}^{K} \beta_k f_k(x) \qquad (1)$$

Given this model, the task is then to minimize the residual sum of squares of the data set,

$$\min_{\beta} \sum_{i=1}^{N} \big( y_i - f(x_i; \beta) \big)^2 \qquad (2)$$

which, through variation with respect to $\beta$, quickly leads to the usually over-determined (since $K \le N$) linear system

$$\begin{pmatrix} f_1(x_1) & \cdots & f_K(x_1) \\ \vdots & \ddots & \vdots \\ f_1(x_N) & \cdots & f_K(x_N) \end{pmatrix} \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_K \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix} \qquad (3)$$

Denoting the components by $F$, $\beta$, and $y$, the least-squares solution of this equation is given by

$$\beta = F^{+} y \qquad (4)$$

where $F^{+}$ stands for the Moore-Penrose pseudo-inverse (Golub and van Loan, 2012). Though there are several methods to calculate this pseudo-inverse, the singular value decomposition (SVD) is usually a convenient choice due to its good stability properties (Press, 2007). Solving the least-squares problem for the basis expansion model therefore requires an SVD of a matrix of size $N \times K$, which for $K \le N$ scales as $\mathcal{O}(N K^2)$.
Note that the predictor dimension M has not entered the derivation so far, and will
eventually do so only in the function evaluation step for the construction of the coefficient
matrix. Thus, at least in principle, this allows for large dimensions of the predictor space to be
treated, as is the case in this work.
The crucial point in model (1) is the selection of an appropriate basis set, and several
statistical methods exist which differ only in this respect, such as linear regression, logistic
regression, spline interpolation, and many more. The particular choice leading to the ELM is
given by
$$f_k(x) = g\left( \sum_{m=1}^{M} w_{km}\, x_m + b_k \right) \qquad (5)$$
Here, $w_{km}$ and $b_k$ are randomly chosen real numbers (according to any random distribution), whereas $g$ denotes a function which in the ELM is usually taken to be the sigmoid function. In combination with model (1), equation (5) allows for its interpretation as a single-hidden-layer artificial neural network with randomly chosen connections and biases. More precisely, this network consists of $M$ input neurons, $K$ hidden neurons with sigmoid activation function and a single output neuron with linear activation function. Here, $w_{km}$ stands for the connection strength between input neuron $m$ and hidden neuron $k$, $b_k$ for the corresponding bias (intercept), and the coefficients $\beta_k$ are the connection strengths between the $k$-th hidden node and the output layer, as shown in Figure 1.
Figure 1. Architecture of a single-hidden-layer artificial neural network with the ELM approach
With the model being the same as that of a conventional neural network, the major difference that makes up the ELM is its training. The standard method proceeds by optimizing all involved parameters, i.e., $w_{km}$, $b_k$, and $\beta_k$. Due to their non-trivial dependencies, the training needs to be done layer by layer in a process called back-propagation. In contrast, the ELM randomly fixes a large number of coefficients in advance, and leaves only the task of determining the output-layer connection strengths. Due to the linear activation function in the output node, this can be accomplished by the least-squares solution outlined in Eqs. (2) to (4). Further, the method presented can easily be extended to include a regularization parameter (via modification of Eq. (2)) or to use kernels such as radial Gaussian functions (via modification of Eq. (5)).
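As a concrete sketch of this training scheme (our own minimal illustration, not the implementation used for the results below), the following C++ code draws the hidden layer at random and then estimates only the output weights. For brevity it solves the ridge-regularized normal equations by Gaussian elimination instead of forming the SVD-based pseudo-inverse; all names are illustrative:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

using Mat = std::vector<std::vector<double>>;

// Solve A x = b for a small symmetric positive-definite A by Gaussian
// elimination with partial pivoting (stand-in for the pseudo-inverse).
std::vector<double> solve(Mat A, std::vector<double> b) {
    const std::size_t n = A.size();
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t p = i;                            // pivot search
        for (std::size_t r = i + 1; r < n; ++r)
            if (std::fabs(A[r][i]) > std::fabs(A[p][i])) p = r;
        std::swap(A[i], A[p]);
        std::swap(b[i], b[p]);
        const double piv = A[i][i];
        for (std::size_t j = i; j < n; ++j) A[i][j] /= piv;
        b[i] /= piv;
        for (std::size_t r = 0; r < n; ++r) {
            if (r == i) continue;
            const double f = A[r][i];
            for (std::size_t j = i; j < n; ++j) A[r][j] -= f * A[i][j];
            b[r] -= f * b[i];
        }
    }
    return b;
}

struct ELM {
    Mat w;                      // random input-to-hidden weights (K x M)
    std::vector<double> b;      // random hidden biases
    std::vector<double> beta;   // fitted output weights

    ELM(std::size_t M, std::size_t K, unsigned seed)
        : w(K, std::vector<double>(M)), b(K) {
        std::mt19937 gen(seed);
        std::uniform_real_distribution<double> u(-1.0, 1.0);
        for (auto& row : w) for (double& v : row) v = u(gen);
        for (double& v : b) v = u(gen);
    }

    double hidden(std::size_t k, const std::vector<double>& x) const {
        double a = b[k];
        for (std::size_t m = 0; m < x.size(); ++m) a += w[k][m] * x[m];
        return 1.0 / (1.0 + std::exp(-a));            // sigmoid g
    }

    // One-shot training: only beta is estimated, via the regularized
    // normal equations (H'H + lambda I) beta = H'y.
    void fit(const Mat& X, const std::vector<double>& y, double lambda = 1e-6) {
        const std::size_t N = X.size(), K = w.size();
        Mat H(N, std::vector<double>(K));
        for (std::size_t i = 0; i < N; ++i)
            for (std::size_t k = 0; k < K; ++k) H[i][k] = hidden(k, X[i]);
        Mat A(K, std::vector<double>(K, 0.0));
        std::vector<double> rhs(K, 0.0);
        for (std::size_t i = 0; i < N; ++i)
            for (std::size_t k = 0; k < K; ++k) {
                rhs[k] += H[i][k] * y[i];
                for (std::size_t l = 0; l < K; ++l) A[k][l] += H[i][k] * H[i][l];
            }
        for (std::size_t k = 0; k < K; ++k) A[k][k] += lambda;
        beta = solve(A, rhs);
    }

    double predict(const std::vector<double>& x) const {
        double yhat = 0.0;                            // linear output node
        for (std::size_t k = 0; k < beta.size(); ++k) yhat += beta[k] * hidden(k, x);
        return yhat;
    }
};

// Self-check on a toy problem: fit y = 2x + 0.5 on a grid in [-1, 1]
// and return the largest absolute training error.
double elm_demo_max_error() {
    Mat X;
    std::vector<double> y;
    for (int i = 0; i <= 20; ++i) {
        const double x = -1.0 + 0.1 * i;
        X.push_back({x});
        y.push_back(2.0 * x + 0.5);
    }
    ELM elm(1, 15, 7u);
    elm.fit(X, y);
    double err = 0.0;
    for (std::size_t i = 0; i < X.size(); ++i)
        err = std::max(err, std::fabs(elm.predict(X[i]) - y[i]));
    return err;
}
```

The regularization parameter lambda corresponds to the extension of Eq. (2) mentioned above; the only fitted quantities are the output weights beta.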
Benchmark model
To compare the predictive ability of the new algorithm with a standard network we used a
single-hidden layer artificial neural network with back-propagation algorithm as benchmark.
Here, the connections and biases were optimized on a given training set. To avoid
poor generalization ability, early stopping was implemented during training. Both models, ANN
and ELM, were set up with a sigmoid activation function in the hidden layer and a linear
function in the output layer. For the ANN, an architecture of ten hidden neurons achieved the
best predictive ability when dealing with the entire marker set.
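Early stopping, as used for the benchmark ANN, can be sketched as a simple monitor on the validation error: training is halted once the validation loss has stopped improving. The function below is our own hypothetical illustration (the patience value and names are not taken from the study).

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch with the lowest validation loss, scanning until the
    loss has failed to improve for `patience` consecutive epochs."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break               # stop training here
    return best_epoch

# validation error falls, then rises again: training stops at its minimum
losses = [1.0, 0.6, 0.4, 0.35, 0.4, 0.5, 0.7]
stop = early_stopping_epoch(losses)
```

The network weights from the returned epoch would then be kept as the final model, which curbs over-fitting without changing the training algorithm itself.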
Data
Genotypic and phenotypic information of 3,341 German Fleckvieh bulls were employed in
this study. All animals were genotyped with a 50k SNP panel, and after quality control
39,344 SNP markers remained in the analyses. Quality control included the elimination of
SNPs with minor allele frequency < 0.05 and missing genotype frequency > 0.95. For the
remaining loci, missing genotypes were imputed using the population based imputing
algorithm Minimac (Howie et al., 2012), and haplotypes were inferred using MaCH (Li et al.,
2010).
DYD of three milk traits (milk yield, fat yield and protein yield) were used as phenotype
records in the analysis. To account for ANN and ELM peculiarities, feature scaling was
applied to both phenotypic and genomic data: the data were normalized to the [-1, 1] range
to enhance numerical stability.
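The feature scaling step can be sketched as a per-column min-max transformation to [-1, 1]. This is our own illustrative sketch (names are ours), assuming SNP genotypes coded 0/1/2 and columns that actually vary.

```python
import numpy as np

def scale_to_range(x, lo=-1.0, hi=1.0):
    """Min-max scale each column of x to [lo, hi].
    Assumes every column varies (max > min)."""
    xmin = x.min(axis=0)
    xmax = x.max(axis=0)
    return lo + (hi - lo) * (x - xmin) / (xmax - xmin)

genotypes = np.array([[0.0, 2.0], [1.0, 0.0], [2.0, 1.0]])  # SNPs coded 0/1/2
scaled = scale_to_range(genotypes)  # each column now spans [-1, 1]
```

Keeping inputs in a bounded symmetric range avoids saturating the sigmoid hidden units and keeps the least-squares solve well conditioned.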
To assess the predictive performance of the ELM and of the standard ANN, a 10-fold cross-
validation scheme was used. The dataset was randomly split into ten equal folds of
phenotypes and genotypes. Then, nine folds were used for training and one for testing. This
was rotated ten times so that every fold served as test set once. The average correlation
between predicted and true phenotypes over the testing sets of all runs was then used to
evaluate predictive ability.
Results and Discussion
These results represent our first application of the ELM and serve as an illustration of
the advantages of this promising approach. Results are given in Table 1, showing the
correlations obtained in the cross-validation runs for an ELM with 1,000 hidden nodes (the
number of hidden nodes needs to be much larger than for the ANN in order to compensate
for the random initialization).
Table 1 Estimated predictive abilities for three milk traits

Trait            ELM (r)   ANN (r)
Milk yield        0.468     0.469
Fat yield         0.453     0.453
Protein yield     0.450     0.477

r = average Pearson correlation of cross-validation runs; ELM = extreme learning machine with 1000
hidden neurons; ANN = artificial neural network with 10 hidden neurons and back-propagation
The ELM was able to give predictions with accuracy well comparable to that of the ANN
approach. The crucial point here is that the ELM needed significantly less computation time. On
average, for a ten-fold cross-validation, the ELM required 1.3 hours, whereas the ANN with
ten neurons in the hidden layer needed 11.2 hours for the same task. Another advantage is that,
with increasing numbers of hidden neurons, the prediction performance of the ELM tended
to increase as well. As an example, we applied the cross-validation procedure to an
ELM with 10,000 hidden neurons for the trait protein yield. This resulted in a substantially
increased average correlation coefficient of 0.618 between true and predicted phenotype
(computation time for a ten-fold cross-validation: 26.6 hours). As shown in our recent work
(Ehret et al., to be published), it is much more difficult to tune the standard ANN to such
levels of prediction accuracy when the entire genotype matrix is used for predictions.
Conclusion
With a low number of neurons the ELM matches the performance of the ANN in predicting
milk traits from genomic information, while using the whole marker information but
drastically reducing the computational cost. Increasing the number of hidden neurons
produced promising results, as shown for the trait protein yield. Although the study was based
on limited data, our results suggest that the ELM is likely to be useful for predicting complex
traits using high-dimensional genomic information, a situation where the number of
coefficients that need to be estimated exceeds sample size. The ELM, in the same way as
SVM or ANN, has the ability to capture nonlinearities, but its great advantage lies in keeping
the computational cost at a reasonable level. The predictive ability seemed to be enhanced by
using a larger number of neurons in the ELM, which is in contrast to the standard ANN
approach. However, further studies are needed to confirm these results and explore the
capabilities of the ELM.
In summary, the ELM is a promising method for prediction of future phenotypes from high
dimensional genomic data sets, as has been suggested by the presented study. Hence, a
detailed investigation of its capabilities will be an important component in our future work.
Author’s contributions
AE and DH conceived the study and wrote the manuscript. AE performed the computations. DH
provided the computation program. GT supervised the study and revised the manuscript.
Acknowledgements
This research was supported by the German Federal Ministry of Education and Research
(BMBF) within the AgroClustEr “Synbreed – Synergistic plant and animal breeding”.
References
Golub, G. H. and C. F. Van Loan. 2013. Matrix computations. 4th ed. Johns Hopkins
University Press.
Huang, G.-B., Q.-Y. Zhu, and C.-K. Siew. 2004. Extreme learning machine: a new learning
scheme of feedforward neural networks. Proceedings of the 2004 IEEE International Joint
Conference on Neural Networks, Vol. 2, pages 985-990.
Huang, G.-B., Q.-Y. Zhu, and C.-K. Siew. 2006. Extreme learning machine: Theory and
applications. Neurocomputing 70:489-501.
Huang, G.-B., D. Wang, and Y. Lan. 2011. Extreme learning machines: a survey.
International Journal of Machine Learning and Cybernetics 2:107-122.
Howie, B., C. Fuchsberger, M. Stephens, J. Marchini, and G. R. Abecasis. 2012. Fast and
accurate genotype imputation in genome-wide association studies through pre-
phasing. Nat Genet 44:955-959.
Li, Y., C. J. Willer, J. Ding, P. Scheet, and G. R. Abecasis. 2010. MaCH: using sequence and
genotype data to estimate haplotypes and unobserved genotypes. Genetic
Epidemiology 34:816-834.
Press, W. H. 2007. Numerical recipes 3rd edition: The art of scientific computing. Cambridge
university press.
General Discussion
The present thesis comprises investigations on machine learning, with a focus on artificial
neural networks (ANNs), at the interface between statistics, breeding and genetics, to enhance
accuracy of genome-enabled prediction of complex traits. The purpose was to provide an
overview of published research in this area, insights on ANN modelling issues, applications to
real data problems in cattle and future research directions in the field of animal breeding. In
this chapter, the main findings and additional topics are elucidated and discussed separately.
Firstly, the main properties of ANNs regarding genome-enabled prediction are discussed and
the method is positioned within the wide range of methods commonly used for this purpose.
Secondly, the advantages and possible pitfalls in the construction of ANNs for genome-enabled
prediction of complex traits using large-scale marker data are evaluated. Thirdly, the impact
of the genetic architecture of traits and of data representation on the performance of ANNs in
empirical applications to real data problems is examined. Finally, the potential and the
limitations of using ANNs for genome-enabled prediction of complex traits on large-scale
genomic data are identified, and further developments and prospects for genome-enabled
prediction using ANNs are outlined.
Properties of ANNs in genome-enabled prediction of complex traits in animal breeding
Today, several methods exist that allow genome-enabled prediction of complex traits with
reasonable accuracy for yet-to-be-observed phenotypes. Many statistical and mathematical
methods can be adapted for the purpose of prediction. Traditionally, model-based methods
(linear or other parametric models) are used to estimate genetic values of individuals using
genomic data. Here, the basis functions used to regress phenotypes on markers are assigned to
a small class of particular functions a priori, which restricts the type of genetic patterns
(genetic architecture) that these models can capture from data (de los Campos et al., 2013).
The genetic architecture of a complex trait is described by the number of loci affecting trait
variation, mode of inheritance, and distribution of allelic effects (Hayes et al., 2010). Hence,
this may include a variety of genetic interactions which are not necessarily linear, which is
supposed to influence prediction accuracy of yet-to-be-observed trait values. Usually, the
highest accuracy is achieved with a model which best fits the genetic architecture of the trait
under investigation. The formulation of a non-linear parametric model to a particular data set
is a very demanding task, since there are a huge number of possible nonlinear patterns. It
would be difficult to choose suitable non-linear basis functions, and modelling all patterns
simultaneously is hardly feasible. Therefore, a pre-specified non-linear model may not be
100
general enough or misspecified to capture all important features of data samples. Contrary to
model-based parametric methods, ANNs are data-driven self-adaptive methods. There is no
need for assumptions about models being studied. ANNs learn from data examples and
capture functional relationships among data even if the underlying relationship is unknown or
hard to describe. ANNs can be considered as multivariate non-linear, non-parametric models
(Chen and Titterington, 1994). This means, that there is a wide class of non-linear basis
functions (used to regress the phenotype) available to be inferred from data by a supervised
learning algorithm (e.g. back-propagation algorithm). Here the basis functions are not fixed a
priori and their particular shape is inferred from data during learning. This in turn gives the
network a higher flexibility than standard parametric models. Hence, ANNs are well suited
for problems, where a solution is difficult to specify, but enough data is available (Zhang et
al., 1998). Additionally, ANNs are very flexible machines with regard to their application
ability and to extensions of the basic concept. In fact, many of the methods commonly used
in animal breeding can be represented by an appropriately specified ANN. Table 1 shows a
comparison of important properties of ANNs and of other methods used for
genome-enabled predictions in animal breeding.
Table 1 Comparison of methods used for genome-enabled predictions on marker data (modified
according to de los Campos et al., 2013)

Method                                        linear*  non-linear  penalizable  parametric  non-parametric
Bayesian ridge regression (1)                    x                      x            x
Ridge regression BLUP (2) / Genomic BLUP (3)     x                      x            x
BayesA (4)                                       x                                   x
BayesB (5)                                       x                                   x
BayesC (6)                                       x                                   x
Bayesian LASSO (7)                               x                                   x
LASSO (8)                                        x                      x            x
Reproducing kernel Hilbert spaces (9)            x          x           x                          x
Support vector regression (10)                   x          x           x                          x
Random forest (11)                               x          x                                      x
Elastic net (12)                                 x                      x            x
Partial least squares (13)                       x                      x            x           semi
Principal component regression (14)              x                      x            x           semi
Artificial neural networks (15)                  x          x           x                          x

* Model is linear in the parameters (SNP effects)
(1) Meuwissen et al. 2001; (2) Moser et al. 2009, Meuwissen et al. 2001, Piepho et al. 2009; (3) VanRaden et
al. 2008; (4) Meuwissen et al. 2001; (5) Meuwissen et al. 2001; (6) Habier et al. 2011; (7) Park and Casella
2008, de los Campos et al. 2009, Legarra et al. 2011; (8) Usai et al. 2009; (9) Gianola et al. 2006; (10) Vapnik
1999, Moser et al. 2009; (11) Breiman 2001, Gonzalez-Recio and Forni 2011; (12) Croiseau et al.
2011; (13) Moser et al. 2009; (14) Solberg et al. 2009; (15) Gianola et al. 2011, Okut et al. 2011
Construction of specified ANNs for genome-enabled predictions
Despite the encouraging properties of ANNs, building a neural network predictor for whole-
genome regression on large-scale genomic data is not a trivial task. The field of possible
applications of ANNs is huge and there are many different ways to construct and implement
neural networks for prediction problems. The selection of an appropriate ANN model using
large-scale genomic data in animal breeding is often a question of available computing
resources and infrastructure. This thesis focuses on multilayer feed-forward networks (see
Chapters 3 and 4) and a modification (extreme learning machine, see Chapter 5), since large-
scale data often require an excessive computational effort (i.e. computational time) if alternative
network types, e.g. Bayesian regularized networks or radial basis function networks, are
used.
There are no general recommendations available to ensure good predictive ability of ANNs
and the optimal settings of an ANN always depend on the specific task. Hence, fitting ANNs
to particular data sets usually results in a trial and error approach, in which different training
settings are tested and the best fit is chosen afterwards to learn the intrinsic structure of the
present data. However, there is no guarantee of finding the optimal configuration. One critical
task is to determine an appropriate network architecture (number of hidden layers and number
of hidden neurons per layer) to establish a well performing ANN predictor. Several authors
have shown that one hidden layer in a feed-forward network is sufficient for ANNs to
approximate any non-linear function with any desired accuracy (Hornik et al., 1989), given
that the number of neurons is sufficiently large. The drawback is a detrimental effect on
generalization ability, which likely occurs due to over-fitting when the number of hidden
neurons is increased. However, in preliminary studies it was empirically found that a higher
number of hidden layers only increased computation time, while no significant improvement
in the predictive performance of the ANN could be achieved.
The number of hidden neurons determines the network's flexibility in approximating arbitrary
functions. Unfortunately, there is no simple method for selecting this parameter from theory.
The empirical results in this thesis show that a reasonable architecture of an ANN for
genome-enabled prediction purposes particularly depends on the dimensionality of genotypes.
The number of inputs (i.e. genotypes) in a completely linked ANN is directly related to model
complexity. Considering the findings presented in Chapter 3, the number of input nodes is
probably the most critical variable to be decided in regression-based prediction problems,
since it contains the important information about the complex correlation structure in data and
is directly related to model selection tasks. Therefore, an understanding of network
configurations and data representations is essential for making efficient use of the features
of ANNs; lacking such an understanding can likewise be the reason for ANNs to fail.
Furthermore, the choice of an appropriate neural network training algorithm is also important
for the successful implementation of ANNs to genome-enabled prediction. Usually, training is
an unconstrained nonlinear optimization problem, and there is no algorithm available that
guarantees the global optimum solution for this kind of problem in a reasonable amount of
time (Zhang et al., 1998). The findings reported in Chapter 3 suggest that back-propagation is
an appropriate learning algorithm of ANNs for genome-enabled prediction on large scale
SNP-data sets, because it provides reasonable solutions in an appropriate amount of time.
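As a rough illustration of gradient-based (back-propagation) training of such a single-hidden-layer network, consider the following sketch on simulated data. The learning rate, epoch count and initialization are arbitrary choices of ours, not the settings used in Chapter 3.

```python
import numpy as np

def train_bp(X, y, K=5, lr=0.05, epochs=2000, seed=0):
    """Full-batch gradient descent (back-propagation) for a network with
    sigmoid hidden units and a linear output, minimizing squared error."""
    rng = np.random.default_rng(seed)
    n, M = X.shape
    W = rng.normal(0, 0.5, (M, K))          # input-to-hidden weights
    b = np.zeros(K)                         # hidden biases
    beta = rng.normal(0, 0.5, K)            # hidden-to-output weights
    c = 0.0                                 # output bias
    for _ in range(epochs):
        H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # forward pass
        e = H @ beta + c - y                          # output-layer error
        d_hidden = (e[:, None] * beta) * H * (1 - H)  # error propagated back
        beta -= lr * H.T @ e / n
        c -= lr * e.mean()
        W -= lr * X.T @ d_hidden / n
        b -= lr * d_hidden.mean(axis=0)
    return W, b, beta, c

def predict_bp(X, W, b, beta, c):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta + c

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, (200, 2))
y = 0.8 * X[:, 0] - 0.3 * X[:, 1]
W, b, beta, c = train_bp(X, y)
mse = float(np.mean((predict_bp(X, W, b, beta, c) - y) ** 2))
```

The chain rule carries the output error back through the sigmoid layer, which is the layer-by-layer dependency mentioned above that makes this training far more expensive than the ELM's single least-squares solve.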
In summary, there are a variety of possible configurations and misconfigurations of ANNs
with a huge impact on the success of an application. Whether ANNs used for genome-enabled
prediction realize their potential or fail depends to a large extent on the knowledge and
experience of the user, as well as on data quality and computational aspects.
Impact of genetic architecture of traits and data representation on performance of
ANNs in genome-enabled prediction on empirical data
There is general agreement in the scientific community of animal breeders that data structure
mainly influences prediction performance (4th International Conference of Quantitative
Genetics: Understanding Variation in Complex Traits, 2012). Data structure in genome-
enabled predictions includes different aspects, like sample size, sample selection (e.g.
relatedness between individuals), type of phenotypes (e.g. complex traits), genetic
architecture of traits under investigation, type of genetic information (e.g. type and number of
markers), quality of measurements, as well as form of representation (e.g. pre-selection of
variables, compression methods) and pre-corrections (e.g. use of linear models to form the
phenotype, quality controls of marker data). Most of the traits studied in cattle breeding have
a quantitative genetic background which means that the observed phenotypes are measured on
a continuous scale and the observed genetic variance is caused by many genes. It is assumed
that the phenotype of an individual is generated by linear processes, e.g. as a summation of
the effects of thousands of gene variants, in which each contribution to the phenotypic
variance is assumed to be small. However, in the last few years a paradigm shift has become
apparent. Prompted by the discovery of the "missing heritability" problem, researchers in animal
and plant breeding as well as in the field of human genetics have suggested that phenotypic variance
in the traits under investigation may be affected by various inter- and intra-genetic as well as
epigenetic effects, and may therefore be of a more complex nature than just a linear
approximation of genotype effects. A problem is that the underlying rules which create a
phenotype in most cases are not evident. This is especially true in complex traits, since
observations of phenotypes and genotypes are additionally masked e.g. by environmental
noise. Thus, using an ANN approach is a good choice, but it may lead to over-fitting when
ANNs are not properly specified for the given mapping problem. The findings in the
empirical application (see Chapter 3) using several data sets could not confirm the theoretical
superiority of ANNs over classical linear approaches. In detail, the expectations for the
application capability of ANNs were not met when non-linear ANN models were used for
prediction of milk traits in cattle breeding. A mismatch between the assumed and the actual
genetic architecture may have been the limiting factor behind the poor performance when
inferring milk traits from genomic data via complex non-linear ANN models. The great flexibility of ANNs to map any
non-linear continuous function comes along with a high number of parameters to be estimated
and the tendency to over-fit linear problems. This is related to the so-called "no free lunch"
theorem (Wolpert and Macready, 1997), which commonly affects all machine learning algorithms. The
fundamental statement of this theorem is that no machine is learning "for free" just by looking
at training samples. With respect to ANNs, this means that some network models may have such
strong biases that they can only approximate certain kinds of functions. For instance, a
powerful non-linear network architecture is biased to learn only complex non-linear functions.
This might result in a better performance compared to a simple linear architecture if the
problem is complex, but can also be a weakness in cases where the underlying problem is in
fact linear. This issue may finally explain the poor performance of non-linear ANNs in the prediction of milk traits
on genomic data. Additionally, findings in this thesis suggest that using ANNs to predict
traits for which the relationship among the predictor variables is not known but expected to
be complex, as was the case in the prediction of subclinical ketosis (see Chapter 4), is
advisable.
Potential and Limitations of using ANNs for genome-enabled prediction of complex
traits on large-scale genomic data
Using ANNs for genome-enabled prediction in animal breeding is promising in terms of
flexibility and of the capability to capture non-linear SNP-marker relationships
through data-driven learning. However, ANNs also imply some uncertainty, since
researchers do not have sound knowledge about the effect of key factors on the prediction
performance of ANNs or about data representation issues. The attractiveness of ANNs comes from
the remarkable information processing characteristics of the biological system such as non-
linearity, high parallelism, robustness, fault and failure tolerance, learning and their capability
to generalize (Jain et al., 1996). As addressed in Chapter 2 and extensively discussed above,
the great potential of ANNs for genome-enabled prediction is that they do not
require most of the prior assumptions that commonly underlie parametric statistical models,
because they can be non-linear in either features or parameters. If nothing is known about
the underlying relationships between genotypes and phenotypes, ANNs might be a good
choice. If ANN training is properly specified, they may be able to capture complex signals
from the data and result in a better predictive accuracy than a standard linear model used in
genome-enabled prediction.
Despite this great potential, findings in the present thesis indicate two main limitations of
using ANNs for genome-enabled prediction. The first is their "black box" behavior; the
second is model complexity in conjunction with the tendency to over-fit training data.
'Black box' behavior
By definition, the "black box" nature of artificial neural networks hampers the systematic
testing of theories and the communication of results into practical settings. With respect to animal
breeding, ANNs can mainly be used for predictions of e.g. future genetic or phenotypic values
rather than for association studies between e.g. marker genotypes and quantitative traits. If the
researcher is not merely interested in prediction but is aiming at inferring causal relationships
or interactions between genomic covariates and traits, the use of ANNs may not be
appropriate. Typically, there is no direct way to interpret the parameters learned by a network,
since they cannot be fully translated into a meaningful set of rules of marker contributions to
quantitative traits. The learned weight distribution mimics the function between input and
output data dependent on a certain network constitution, rather than describing a causal
relationship between marker genotypes and phenotypes. In fact, ANNs do not allow
interpreting the physical process which formed the phenotype. As mentioned in Chapters 2
and 3, the only exception is the use of an ANN with one neuron in the hidden layer and a
linear activation function in both the hidden and the output layer, as such a network represents a
multiple linear regression. Here, the obtained weights can be interpreted as usual regression
coefficients estimated by a linear regression of marker genotypes on the given phenotype. In
this sense, practical implementation of ANN is limited to cases where no biological
interpretation of the parameters is needed, such as breeding value estimation for selection
purposes (see Chapters 3, 4 and 5). For instance, no causal interpretation of the effect of
different input factors on the risk of developing subclinical ketosis is commonly available
(see Chapter 4). However, ANNs are able to incorporate different sources of information simultaneously
without making assumptions about their relationship a priori, while at the same time
providing reasonable prediction accuracy for yet-to-be-observed phenotypes of this
multifactorial trait (see Chapter 4).
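The equivalence mentioned above, a network with a single linear hidden neuron collapsing to a multiple linear regression, can be verified numerically. The small sketch below with simulated data is our own illustration; the variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
y = X @ np.array([0.5, -1.0, 0.2]) + 0.1 * rng.normal(size=100)

# ordinary least-squares fit (with intercept)
X1 = np.column_stack([np.ones(len(X)), X])
ols = np.linalg.lstsq(X1, y, rcond=None)[0]

# "network" with one linear hidden neuron: yhat = beta * (X @ w + b) + c.
# This collapses to the affine map X @ (beta * w) + (beta * b + c), so splitting
# the OLS solution into (w, b, beta, c) reproduces the regression exactly.
w, b = ols[1:] / 2.0, ols[0] / 2.0   # arbitrary split of the affine map
beta, c = 2.0, 0.0                   # hidden-to-output weight and output bias
yhat_net = beta * (X @ w + b) + c
yhat_ols = X1 @ ols
```

Since the composition of linear maps is again linear, the network weights here are simply a reparameterization of the regression coefficients, which is why they remain interpretable in this special case.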
Model complexity and over-fitting
Findings in this thesis revealed that in genome-enabled prediction on large scaled SNP
markers the high number of parameters in a strongly structured ANN may reduce its
predictive power compared to the one obtained with less parameterized prediction models.
Results show that when the entire set of markers is used in predictions, an ANN with
back-propagation fails to provide reasonable predictions of yet-to-be-observed phenotypes.
Unfortunately, when the input dimension of the network increases markedly, as is the case
with an increasing number of marker genotypes (input values), the number of parameters usually
grows quickly. Therefore, as long as the sample size is much lower than the number of
markers, ANN analyses might be limited to a certain number of inputs which may be far away
from what the high-dimensional genomic data demand. In cases where the number of
input variables (e.g. marker genotypes) exceeds sample size, the use of regularization
approaches, such as Bayesian methods or adapted algorithms like "weight decay" (Kriesel, 2007),
is preferable. Alternatively, dimension-reduction or variable selection methods applied to the
genotype data improve the performance of ANNs in genome-enabled prediction. Further, the
employed extreme learning machine (ELM), a learning architecture for ANNs that replaces
the parameter-intensive supervised learning algorithms (i.e. the back-propagation algorithm),
produced reasonable results (Chapter 5) without losing information through dimension-
reduction or variable selection. In the ELM, the hidden-layer weights of an ANN are not
adjusted via learning but are randomly generated, so far fewer parameters have to be
estimated. This approach is also able to handle model complexity and over-fitting, and seems
to be a good way forward when dimension-reduction or variable-selection methods are to be
avoided in order to use the full information in the data.
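The weight-decay idea mentioned above, applied to the ELM's output layer, amounts to a ridge-regularized least-squares solve. The following sketch is our own illustration (the name `lam` for the decay parameter and the simulated matrices are ours).

```python
import numpy as np

def ridge_output_weights(H, y, lam=1e-2):
    """Weight decay on the output layer: solve (H'H + lam*I) beta = H'y,
    shrinking the output weights beta towards zero."""
    K = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(K), H.T @ y)

rng = np.random.default_rng(5)
H = rng.normal(size=(50, 5))        # hidden-layer activations (illustrative)
y = rng.normal(size=50)
beta_small = ridge_output_weights(H, y, lam=1e-10)   # ~ plain least squares
beta_decay = ridge_output_weights(H, y, lam=100.0)   # strongly shrunk weights
```

Larger values of `lam` trade a little training fit for smaller, more stable weights, which is exactly the mechanism that counters over-fitting when the number of hidden nodes is large.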
In summary, ANNs offer a great potential in predictive ability but proper knowledge of
network peculiarities is required for a successful implementation in genome-enabled
prediction of complex traits. Model complexity and over-fitting are main reasons why, instead
of using an entire marker dataset, SNP selection, genome-derived relationships or other
dimensionality reduction techniques are used to produce ANNs with a reasonable predictive
ability. Also, prediction performance of ANNs depends strongly on the peculiar
characteristics of the data and the problem to be solved. Genome-enabled prediction has long
been the domain of linear statistics (Zhang et al., 1998). Linear models have the advantage
that they can be understood and analyzed in great detail, and they are easy to explain and to
implement. However, they may be totally inappropriate if the underlying relationship
between genotype and phenotype is nonlinear. Hence, ANN analysis cannot replace
conventional statistical methods in genome-enabled prediction but may be applicable in
specific cases such as prediction of traits in which complex relationships between different
information sources are evident and no biological interpretation is needed.
Future directions and prospects for the use of ANNs for genome-enabled prediction
In future, the analysis of whole genome sequencing data may provide very useful information
for genome-enabled predictions, because it may allow inferring the whole genetic variance
and uncovering further secrets of the genome. Initiatives to sequence large
panels are under way for humans (The 1000 Genomes Project Consortium) and cattle (The
1000 Bull Genomes Project). However, up to now it has not been possible to incorporate all
the desired data into genome-enabled prediction systems using ANNs. Therefore, the
requirement for much larger data samples as well as powerful computation systems has to be
met. However, we are currently in the age of "omics", where large amounts of diverse data
are generated. Thus, ANNs may be valuable at a time when the wealth of available data is
revolutionizing the animal breeder's work. Especially for functional traits the use of
ANNs will be favourable today, since observations of phenotypes are costly and masked by noise,
and the underlying relationship between the several possible predictor variables is hardly
tangible.
Overall, understanding complex genetic systems and their impact on phenotypic variance
requires research at the interface between different disciplines, like computer science,
statistics, bioinformatics, animal breeding and genetics. Developments of new technologies
and availability of powerful computers have created a promising platform. Hence, it is
necessary that different fields in academics progressively work together to create more
effective solutions to enhance performance in genome-enabled predictions.
References
Breiman, L. 2001. Random Forests. Machine Learning 45:5-32.
Croiseau, P., A. S. Legarra, F. O. Guillaume, S. B. Fritz, A. L. Baur, C. Colombani, C. L.
Robert-Granie, D. Boichard, and V. Ducrocq. 2011. Fine tuning genomic evaluations
in dairy cattle through SNP pre-selection with the Elastic-Net algorithm. Genetics
Research 93:409-417.
de los Campos, G., D. Gianola, and G. J. M. Rosa. 2009. Reproducing kernel Hilbert spaces
regression: A general framework for genetic evaluation. Journal of Animal Science
87:1883-1887.
de los Campos, G., J. M. Hickey, R. Pong-Wong, H. D. Daetwyler, and M. P. L. Calus. 2013.
Whole-genome regression and prediction methods applied to plant and animal
breeding. Genetics 193:327-345.
Gianola, D., R. L. Fernando, and A. Stella. 2006. Genomic-assisted prediction of genetic
value with semiparametric procedures. Genetics 173:1761-1776.
Gianola, D., H. Okut, K. Weigel, and G. Rosa. 2011. Predicting complex quantitative traits
with Bayesian neural networks: a case study with Jersey cows and wheat. BMC
Genetics 12:87.
Gonzalez-Recio, O. and S. Forni. 2011. Genome-wide prediction of discrete traits using
Bayesian regressions and machine learning. Genetics Selection Evolution 43:1-12.
Habier, D., R. Fernando, K. Kizilkaya, and D. Garrick. 2011. Extension of the Bayesian
alphabet for genomic selection. BMC Bioinformatics 12:186.
Hornik, K., M. Stinchcombe, and H. White. 1989. Multilayer feedforward networks are
universal approximators. Neural Networks 2:359-366.
Jain, A. K., M. Jianchang, and K. M. Mohiuddin. 1996. Artificial neural networks: a tutorial.
Computer 29:31-44.
Kriesel D. 2007. A brief introduction to neural networks. Available at
http://www.dkriesel.com
Legarra, A. S., C. L. Robert-Granie, P. Croiseau, F. O. Guillaume, and S. B. Fritz. 2011.
Improved Lasso for genomic selection. Genetics Research 93:77-87.
Meuwissen, T. H. E., B. J. Hayes, and M. E. Goddard. 2001. Prediction of total genetic value
using genome-wide dense marker maps. Genetics 157:1819-1829.
Okut, H., D. Gianola, G. J. M. Rosa, and K. A. Weigel. 2011. Prediction of body mass index
in mice using dense molecular markers and a regularized neural network. Genetics
Research 93:189 - 201.
Park, T. and G. Casella. 2008. The Bayesian Lasso. Journal of the American Statistical
Association 103:681-686.
Piepho, H. P. 2009. Ridge Regression and Extensions for Genomewide Selection in Maize.
Crop Sci 49:1165 - 1176.
Solberg, T., A. Sonesson, J. Woolliams, and T. Meuwissen. 2009. Reducing dimensionality
for prediction of genome-wide breeding values. Genetics Selection Evolution 41:29.
Usai, M. G., M. E. Goddard, and B. J. Hayes. 2009. LASSO with cross-validation for
genomic selection. Genetics Research 91:427-436.
VanRaden, P. M. 2008. Efficient methods to compute genomic predictions. J Dairy Sci
91:4414-4423.
Vapnik, V. N. 1999. An overview of statistical learning theory. IEEE Transactions on
Neural Networks 10:988-999.
Wolpert, D. H. and W. G. Macready. 1997. No free lunch theorems for optimization. IEEE
Transactions on Evolutionary Computation 1:67-82.
Zhang, G., B. Eddy Patuwo, and M. Y. Hu. 1998. Forecasting with artificial neural networks:
The state of the art. International Journal of Forecasting 14:35-62.
General Summary
Genome-enabled predictions have become an important tool in many animal and plant
breeding programs. However, the application to predictions of complex traits based on
thousands of molecular markers faces several statistical and computational challenges.
Enhancing prediction performance and accuracy is demanding, because models have to cope
with the curse of dimensionality, collinearity between markers and the complexity of
quantitative traits. The aim of the present thesis was to investigate a suitable, well performing
and broadly applicable method considering various relationships between markers in high
dimensional data for genome-enabled prediction of complex traits. A state of the art survey
and applications of artificial neural networks (ANNs) to real problems of genome-enabled
prediction in animal breeding are presented. ANNs are a part of a wide range of machine
learning methods that are characterized by their general learning ability, especially for non-
linear problems, their generalization capability and noise tolerance.
The focus of Chapters 1 and 2 was to provide an overview of artificial neural network
methodology and of recent applications in animal and plant breeding. A comprehensive
literature review demonstrates the great potential of ANNs and points to possible
problems in their application. Several network types and training algorithms were analyzed,
and major preconditions that have to be met for proper application were identified. Aspects
such as data pre-processing, the choice of a proper network architecture and the setting of
training-algorithm parameters were treated carefully. Furthermore, recent studies
employing genome-enabled prediction via ANNs indicate that the superiority of properly
specified ANNs over other prediction machines depends especially on the species, the target
trait, environmental factors, sample size, and the genetic relatedness between individuals in
the training and testing sets.
In Chapter 3, the theoretical considerations from the previous section were implemented in
real data applications, aiming to obtain empirical evidence for the expected suitability of
ANNs for genome-enabled prediction. An important aspect when applying ANNs to
large-scale genomic data is the computational cost, which has to be kept feasible. Programs
were therefore written in C++ to keep the computational effort manageable. A regularized
back-propagation algorithm was used to train a multilayer feed-forward network. To assess
the major factors influencing the predictive performance for yet-to-be-observed phenotypes
of three milk traits (milk yield, fat and protein percentage), three dairy cattle data sets were
analyzed. Inputs to the network were data on 3,341 German Fleckvieh bulls, 2,303
Holstein-Friesian bulls, and 777 Holstein-Friesian dams. All animals were genotyped with a
50k SNP panel. Different non-linear network architectures as well as the impact of the data
structure were evaluated. A linear network model was used as a benchmark.
Results showed that the predictive abilities of the different ANN models varied markedly,
whereas differences between data sets were small. Dimension-reduction methods enhanced
prediction performance and consistency in all data sets. The implemented learning algorithm,
used as the training rule for the ANN in conjunction with dimension-reduction methods, kept
the computational cost as low as possible while maintaining good predictive performance.
However, the large number of parameters in a richly structured ANN seems to limit its
predictive power if the full dimension of the SNP data is used as network input. Moreover, if
the number of markers vastly exceeds the sample size (the p >> n problem), the results
confirm the robustness of linear methods in genome-enabled prediction. ANNs may prove
their superiority for traits for which there is evidence of non-linearity and for which less
heavily parameterized linear methods fail. Hence, ANNs may be more useful for the
prediction of functional traits than for milk traits, which seem to be additively inherited and
can be predicted well with linear methods.
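As a concrete illustration of the training scheme summarized above, the following sketch trains a multilayer feed-forward network by back-propagation with a simple weight-decay penalty as regularization. It is a minimal toy example on synthetic marker-like data, not the C++ implementation used in the thesis; the network size, learning rate and penalty value are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: n samples with p marker-like inputs coded 0/1/2, one response.
n, p, h = 60, 12, 5                       # samples, inputs, hidden neurons
X = rng.integers(0, 3, size=(n, p)).astype(float)
y = (X @ rng.normal(size=p) + 0.3 * rng.normal(size=n)).reshape(-1, 1)

# One hidden layer with tanh activation, linear output neuron.
W1 = rng.normal(scale=0.1, size=(p, h)); b1 = np.zeros((1, h))
W2 = rng.normal(scale=0.1, size=(h, 1)); b2 = np.zeros((1, 1))

eta, lam = 0.01, 0.001                    # learning rate, weight-decay penalty

def forward(X):
    H = np.tanh(X @ W1 + b1)              # hidden-layer activations
    return H, H @ W2 + b2                 # activations, network output

losses = []
for epoch in range(200):
    H, yhat = forward(X)
    err = yhat - y
    # Penalized squared-error loss (weight decay regularizes the weights).
    losses.append(float(np.mean(err**2) + lam * (np.sum(W1**2) + np.sum(W2**2))))
    # Back-propagate the error through output and hidden layers.
    g2 = H.T @ err / n + lam * W2
    gb2 = err.mean(axis=0, keepdims=True)
    dH = (err @ W2.T) * (1.0 - H**2)      # derivative of tanh
    g1 = X.T @ dH / n + lam * W1
    gb1 = dH.mean(axis=0, keepdims=True)
    # Gradient-descent updates.
    W2 -= eta * g2; b2 -= eta * gb2
    W1 -= eta * g1; b1 -= eta * gb1
```

With these settings the penalized loss decreases over the epochs; in practice the learning rate, the penalty and the network architecture would be tuned, for example by cross-validation, as discussed above.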
This issue is addressed in Chapter 4. An ANN approach was used for the prediction of the
risk of developing ketosis, a trait for which a non-linear relationship between predictors and
response variable is plausible. Building reliable predictive models of functional traits is
challenging in general, because such traits are commonly affected by multiple factors. In this
study, several combinations of metabolic, genetic and milk performance records were tested
simultaneously with respect to their influence on prediction performance. Data were collected
from 218 dairy cows during the first five weeks of lactation, and predictions were computed
within and across weeks. All animals were genotyped with a 50k SNP panel. Metabolic
information comprised two milk metabolites (GPC and PC), and milk yield, fat and protein
percentage were available as milk performance records. The beta-hydroxybutyric acid
(BHBA) concentration in milk samples was used as the target variable in all prediction
models, since it reflects the risk of suffering from subclinical ketosis. Average correlations
between observed and predicted target values of up to 0.643 were obtained when
combinations of metabolic information and test-day records were used for prediction within
weeks. The predictive performance of metabolic as well as milk-performance-based models
was higher than that of models based on genetic information. In summary, the performance
of ANNs is encouraging for traits that are influenced by several factors.
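The evaluation scheme described above, namely testing combinations of predictor blocks by the average correlation between observed and predicted values over cross-validation folds, can be sketched as follows. All data are synthetic stand-ins, the variable names are hypothetical, and ordinary least squares serves purely as a placeholder for the ANN predictor used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic stand-ins for metabolic and milk-performance predictor blocks
# and a BHBA-like response (all names and effect sizes are hypothetical).
n = 218
metabolic = rng.normal(size=(n, 3))
milk = rng.normal(size=(n, 3))
response = (metabolic @ np.array([0.6, -0.4, 0.3])
            + 0.2 * milk[:, 0]
            + rng.normal(scale=0.5, size=n))

def cv_correlation(X, y, folds=5):
    """Average correlation of observed vs. predicted values over k folds,
    with ordinary least squares as a simple stand-in predictor."""
    idx = np.arange(len(y))
    rng.shuffle(idx)
    rs = []
    for part in np.array_split(idx, folds):
        train = np.setdiff1d(idx, part)
        coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        yhat = X[part] @ coef
        rs.append(np.corrcoef(y[part], yhat)[0, 1])
    return float(np.mean(rs))

# Combine predictor blocks and score the combination.
combined = np.hstack([metabolic, milk])
r_comb = cv_correlation(combined, response)
```

Each predictor combination is scored the same way, so models based on metabolic, genetic or milk-performance information can be ranked by their cross-validated correlation, mirroring the comparison reported above.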
The aim of Chapter 5 was to find a method that maintains the important advantages of ANNs
without losing information through, for example, the variable pre-selection otherwise
required (as shown in Chapter 3). The extreme learning machine (ELM), a modified ANN
approach, was implemented for genome-enabled prediction using the entire marker set of
large-scale data. The ELM is a fast learning scheme for ANNs that replaces parameter-intensive
iterative training. In this chapter, important properties of the ELM method are
presented, and its relationship to ANNs and to other machine-learning methods, such as
support vector machines (SVMs), is examined. Like an SVM or an ANN, the ELM is able to
capture non-linearities; its great advantage lies in keeping the computational costs low.
Experimental results show that the ELM approach is able to achieve good generalization
performance with much less computational effort. With a small number of hidden neurons,
the ELM matches the results of the ANN in predicting milk traits from genomic information.
Increasing the number of hidden neurons led to promising results, as shown for the trait
protein yield. In summary, the results indicate that the ELM architecture is able to outperform
traditional gradient-based learning in ANNs when the entire information of large-scale
genomic data is analyzed.
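The core of the ELM summarized above can be sketched in a few lines: the hidden-layer weights are drawn at random and left untrained, and only the output weights are obtained in a single least-squares step. This is a minimal illustration on synthetic data with arbitrary size choices, not the implementation evaluated in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for SNP genotypes (coded 0/1/2) and a phenotype.
n, p, h = 80, 200, 30                        # animals, markers, hidden neurons
X = rng.integers(0, 3, size=(n, p)).astype(float)
y = X @ rng.normal(scale=0.1, size=p) + rng.normal(scale=0.5, size=n)

# ELM step 1: hidden-layer weights are random and are never trained.
W = rng.normal(scale=1.0 / np.sqrt(p), size=(p, h))
b = rng.normal(size=h)
H = np.tanh(X @ W + b)                       # random non-linear feature map

# ELM step 2: only the output weights are fitted, in one least-squares
# step instead of iterative gradient-based learning.
beta, *_ = np.linalg.lstsq(H, y, rcond=None)
yhat = H @ beta
r = float(np.corrcoef(y, yhat)[0, 1])        # training-set fit
```

Increasing h enlarges the random feature basis and so raises the capacity of the model, which is the lever behind the protein-yield result mentioned above; the cost of the fit stays that of a single linear least-squares solve.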
Zusammenfassung
Genome-based performance predictions have become an important instrument in animal and
plant breeding programs and are nowadays partly used routinely for selecting the parents of
the next generation. The corresponding prediction models should cope as well as possible
with the large scale of genomic data, the dependencies between markers and the complexity
of quantitative traits. To improve the accuracy of genomic prediction, the search for methods
that adequately meet these requirements is of particular interest.
The aim of the present work was to find a suitable and broadly applicable method for
genomic prediction in animal and plant breeding that accounts for complex genetic
relationships, and to examine its practical applicability. Artificial neural networks (ANNs)
represent a promising method for the prediction of complex traits, since they are able to
account for a multitude of different gene interactions. ANNs belong to the machine-learning
methods and are characterized by a universal learning property, their generalization ability
and their tolerance to noise in the data.
In the first and second chapters of this work, a comprehensive literature review of the method
and its applications in plant and animal breeding was given. Different network types and
topologies, as well as learning algorithms and data requirements, were analyzed and
evaluated with regard to their use for genome-based prediction in animal breeding. A focus
was placed on so-called multilayer networks and on radial basis function networks, since
these appear particularly promising for use in breeding value estimation. In the prediction of
unknown phenotypes, aspects such as data preparation, the tuning of the learning-algorithm
parameters and the optimal choice of the network topology play a major role for the
successful use of ANNs. Previously published studies show that the superiority of ANNs
over other methods depends strongly on the respective data structure, the species, the target
trait, the environmental factors, the sample size, and the genetic relatedness between the
individuals in the training and test data sets.
In the third chapter, the findings from the literature review were put into practice. For this
purpose, a dedicated program was written in C++, which also enabled the analysis of larger
data sets. A regularized back-propagation algorithm was used as the learning rule for a
multilayer ANN. To capture the main factors influencing the prediction quality for three milk
traits (milk yield, fat and protein percentage), three different data sets were analyzed. Data on
3,341 Fleckvieh bulls, 2,303 Holstein Friesian bulls and 777 Holstein Friesian cows were
available. All animals were genotyped with a 50k SNP array, and after quality control
39,344, 41,995 and 41,718 SNP markers were used for prediction. In the analyses, different
non-linear network topologies and different input data structures were examined with respect
to prediction quality. The results showed that dimension-reducing measures led to
substantially more accurate and more consistent predictions for the traits investigated, and
that using the entire genetic information reduced the predictive power. With considerably
more markers than animals in the analysis, the robustness of linear methods was confirmed.
Furthermore, the results indicated that the potential of ANNs comes into play particularly in
the prediction of traits for which non-additive gene effects also play a role (e.g. functional
traits). The method is therefore preferable above all when genome-based prediction with
linear, less heavily parameterized methods fails.
Building on these results, the fourth chapter deals with the prediction of ketosis risk in dairy
cows, since this risk is multifactorial and prediction with linear methods is only partly
possible. In the analysis, several independent variables and variable combinations were tested
simultaneously for their suitability for predicting the trait ketosis risk using an ANN
approach. Among others, different metabolite concentrations in milk, SNP genotypes and
milk performance traits were evaluated with regard to their influence on prediction accuracy.
In all prediction models, the beta-hydroxybutyric acid (BHBA) concentration measured in
milk samples was used as a biomarker for the risk of a cow developing subclinical ketosis.
Average correlations between observed and predicted trait values of up to 0.643 were
achieved when combinations of metabolic information and milk performance data were used
for prediction. Metabolism-based as well as milk-performance-based models achieved a
higher prediction accuracy than models using genetic information. The use of ANNs for the
prediction of lowly heritable, complex traits (such as ketosis) is therefore promising.
Referring to the results of the third chapter, the fifth chapter examined the suitability of a
so-called extreme learning machine (ELM) for genome-based prediction of milk traits. The
ELM is a fast learning architecture for ANNs that replaces the very parameter-intensive
learning and can thus, in contrast to an iterative learning algorithm, guarantee the use of the
entire marker information. Important characteristics of the ELM method were presented, and
its relationship to ANNs and to other machine-learning methods, such as support vector
machines (SVMs), was worked out. Like an SVM or an ANN, the ELM is able to capture
non-linearities in the data. Experimental results showed that the ELM approach is able to
achieve good prediction quality while keeping the computational effort low. With a small
number of neurons, the results of the ELM method matched those of the ANN approach with
respect to the predictive power for the milk traits, while the entire SNP genotype information
was used. Increasing the number of hidden neurons led to promising results for the trait
protein yield. The results showed that the ELM is able to replace traditional gradient-based
learning in ANNs in the analysis of large-scale genomic data.
Acknowledgements
At this point I would like to thank all those who have contributed in various ways to my research.
First of all I want to thank my supervisors Prof. Georg Thaller and Prof. Daniel Gianola for
their constant support, their never-ending encouragement and their belief in me. Thank you
for always being available for my questions, even in busy times, and for giving me advice
whenever I requested it. I am grateful for the opportunity to spend four months with the
wonderful animal science group at the University of Wisconsin-Madison to work on my
first paper.
I warmly thank David Hochstuhl for drawing my attention to artificial neural networks and
his steady support in all questions of computations and related fields. Thank you for sharing
the ANN program with me and for all the encouraging discussions.
Thanks to Llibertat Tusell for making my stay in Madison as comfortable as possible and for
your steady cross-border support far beyond Madison. Thank you for sharing your scientific
creativity and for your continuous interest in my first manuscript drafts.
I am also indebted to my coauthors and colleagues from my working group for joint work on
my papers and all the lively discussions. Above all, I would like to mention Nina
Krattenmacher, Jens Tetens, Dirk Hinrichs, Jennifer Salau and especially Claas Heuer, who
was my number one contact person for all questions related to animal breeding and genetics.
My research was made possible through the financial support I received from the German
Federal Ministry of Education and Research (BMBF) within the AgroClustEr “Synbreed –
Synergistic plant and animal breeding”.
I also want to thank my fellow PhD students for regular lunch times, for quality times on
courses and conferences, for frequent cheering up, for their patient companionship and for
their friendship. My special thanks go to Marvin Gertz, Lukas Roos, Christoph Scheel, Karo
Reckmann, Kathrin Büttner and Danica Sindt.
Finally, I would like to thank my family, who always encouraged me to pursue my aims and
supported me wherever they could. Last but not least, this PhD would not have been possible
without Anna Eigen and Anja Feddern. Thank you for your deep friendship and your
assistance in all situations of life. Ultimately, I want to thank Achim Seidel for his love and
his never-ending patience and support during the completion of this PhD.
Curriculum Vitae
Anita Ehret
Born, December 26th, 1985 in Hannover, Germany
Education
11/2010- 07/2014
PhD student at the Institute of Animal Breeding and Husbandry, Christian-Albrechts-University Kiel
10/2010 MSc Agricultural Sciences Master Thesis: Ehret, Anita (2010) Analyse von Defensintranskripten im Knochenmark des Pferdes.
01/2009 BSc Agricultural Sciences Bachelor Thesis: Ehret, Anita (2009) Datenbank-basierte Kandidatengensuche für equine Osteochondrose.
10/2005 – 10/2010
Study of Agricultural Sciences at Christian Albrechts University Kiel
2005 Higher education entrance qualification at Kronwerk Gymnasium, Rendsburg
Qualification
08/2012- 12/2012
Research stay at the University of Wisconsin-Madison, Madison, USA
10/2011 Course: Statistical Learning Methods For DNA-based Prediction Of Complex Traits, Gianola
03/2011 Course: Programming in Animal Breeding, Misztal