Integer Programming to Evaluate
Operational Impact of Penetration Rate
Predictive Model
Sebastian Arenas Bermúdez
Department of Mining and Materials Engineering
McGill University, Montreal, Quebec, Canada
December 2020
A thesis submitted to McGill University in partial fulfillment of the
requirements of the degree of Master of Engineering
© Sebastian Arenas Bermúdez, 2020
Acknowledgements
First, I would like to express my deepest gratitude to professors Alessandro Navarra
and Roussos Dimitrakopoulos; they were both extraordinary mentors during this
graduate program and I could never have done it without their guidance. I would
also like to thank Barbara Hanley and Deborah Frankland for their outstanding
administrative support during my time at McGill University.
I would also like to share my gratitude with those who were not only my partners
but my counselors during this journey: Daniel, Zachary, Zeyneb, Christian,
Matheus, Joao, Luiz, Ashish, Lingquing. I would like to give a very special thank
you to Dr. Zeyneb Brika for always taking the time to assist me during the
development of my recent research as well as her diligent efforts correcting and
suggesting changes to this thesis.
In addition, I would like to thank all the professors and colleagues at Universidad
Nacional de Colombia for helping me grow personally and professionally. In
particular, I would like to extend my thanks to Professor Jorge Martin Molina
Escobar for his guidance and advice during my undergraduate program.
Lastly and most importantly, I would like to thank our almighty God and my family,
Gloria, Otoniel, Susana, Mercedes, Sinforiano, Otoniel, and Luz Dary; their tireless
support and unconditional love make this achievement also theirs.
Contributions
All of the chapters in this thesis have been entirely written by the student in
question, Sebastian Arenas Bermúdez.
Abstract
Short-term mine production scheduling for underground mines is both complex and
crucial, not only for running successful operations but also for meeting the
targets set by the medium- and long-term mine plans. Moreover, underground
operations are known for involving a large number of independent decisions
concerning the available resources, the tasks to be accomplished, and the
technical aspects of the underground mining openings. Consequently, to sustain
high productivity, it is necessary to use a decision-making tool to build
short-term production plans. Furthermore, incorporating performance predictions
based on historical operational data into this tool allows decision-makers to
accumulate more knowledge about the reality of the operation. This thesis
explores the benefits that such predictions can offer when incorporated into the
generation of drilling plans for underground mines.
In particular, an artificial neural network (ANN) is trained with operational
data from an underground gold mine. The trained network is then used to predict
the rate of penetration (ROP) for all possible combinations of available
drilling machines and operators, activities to be performed, and openings or
destinations where these tasks can be executed. An integer programming
formulation is then constructed to demonstrate the impact of incorporating these
predictions into operational decision-making. The formulation aims to maximize
the drilled meters per day while respecting physical and operational
constraints. The initial drilling plan of the operation is compared against the
plans given by the optimization model with and without the predictions. Over the
first 15 days of operations, the plan generated by the model that uses the
neural network predictions outperforms both the initial drilling plan and the
plan generated by the optimization model without the predictions.
Résumé
La planification minière à court terme des opérations souterraines est considérée complexe
et à la fois cruciale, non seulement pour la réussite des opérations, mais aussi pour la
réalisation des objectifs fixés par la planification à moyen et long terme. De plus, les
opérations souterraines sont également connues pour avoir un nombre important de décisions
indépendantes concernant les ressources disponibles, les tâches particulières à accomplir et
les aspects techniques des tunnels. Pour avoir une productivité élevée, il est donc nécessaire
d'utiliser un outil d’aide à la décision pour établir des plans de production à court terme. Par
ailleurs, l'intégration de prédictions d’efficacité basées sur des données opérationnelles
historiques dans l’outil d’aide à la décision permet aux décideurs d’accumuler davantage de
connaissances sur la réalité de l'opération. Dans ce contexte, cette thèse explore les avantages
que ces prédictions peuvent offrir lors de leur intégration dans la génération de plans de
forage dans le contexte des mines souterraines.
En particulier, la formation d’un réseau neuronal artificiel (RNA) est exécutée avec des
données industrielles provenant d'une mine d'or souterraine. Ce réseau est ensuite utilisé pour
prédire le taux de pénétration (TP) pour toutes les combinaisons possibles de machines de
forage et d’opérateurs disponibles, l’assignation de tâches et les tunnels ou destinations où
ces tâches devront être exécutées. De plus, une formulation de programme en nombre entier
est construite pour démontrer l'impact de l'incorporation de ces prédictions dans la prise de
décision opérationnelle. Cette formulation vise à maximiser les mètres forés par jour, tout en
respectant les contraintes physiques et opérationnelles. Une comparaison entre le plan
d'opérations de forage initial et les plans améliorés par le modèle d'optimisation avec et sans
prédictions de TP est effectuée. Au cours d’une période de 15 jours d'exploitation, le plan
généré par le modèle d’optimisation qui utilise les prédictions résultant du réseau neuronal
surpasse à la fois le plan initial et le plan généré par le modèle d'optimisation qui n’utilise
pas les prédictions.
Contents
Acknowledgements
Contributions
Abstract
Résumé
List of Figures
List of Tables
List of Terms
Chapter 1 – Introduction
1.1 Objectives
1.2 Thesis outline
Chapter 2 – Literature Review
2.1 Rate of penetration (ROP) predictive models
2.1.1 Traditional models
2.1.2 Artificial intelligence-based models
2.2 Mathematical programming in the short-term planning of mining activities for underground mines
Chapter 3 – Methods
3.1 Artificial neural networks (ANN)
3.1.1 Relation between least-squares regression and neural net training
3.1.2 Adjustment of parameters
3.1.3 Gradient descent optimization
3.1.4 Error back-propagation
3.2 Integer linear programming
3.2.1 Solution approaches
Chapter 4 – Case Study: Underground Gold Project
4.1 Datasets
4.2 Underground operation outline
4.2.1 Mine layout
4.2.2 Mining method
4.2.3 Drilling operation
4.3 Prediction of rates of penetration (ROP)
4.3.1 Data preprocessing
4.3.2 Testing different architectures of neural networks
4.3.3 Performance of the final architecture on the testing set
4.4 Incorporation of predictions into an integer program
4.4.1 Integer program formulation
4.4.2 Evaluating the impact of the ANN predictions in the ILP results
Chapter 5 – Conclusions and Future Work
5.1 Conclusions and objectives met
5.2 Future work
References
List of Figures
Figure 3.1 – Network diagram for a two-layer neural network (Bishop & Nabney, 2008)
Figure 3.2 – Geometrical view of the error function E(w) as a surface sitting over a weight space (Bishop & Nabney, 2008)
Figure 3.3 – Forward and backward propagation of error information (Bishop & Nabney, 2008)
Figure 4.1 – Flowchart of the proposed approach
Figure 4.2 – Boomer 282 (Epiroc, 2020)
Figure 4.3 – Mine layout (Red Eagle Mining, 2014)
Figure 4.4 – MSDF: first and second lifts (Red Eagle Mining, 2014)
Figure 4.5 – Subsequent lifts and mucking (Red Eagle Mining, 2014)
Figure 4.6 – Correlation matrix, including operators and performance measures
Figure 4.7 – Correlation matrix, including shift, team, rock types, and performance measures
Figure 4.8 – Left: ROP histogram. Right: normalized ROP histogram
Figure 4.9 – Left: sample from training data. Right: sample from training data after one-hot encoding of the team attribute
Figure 4.10 – Architecture and variables
Figure 4.11 – Comparison of training loss evolution with different optimizers for a one-hidden-layer model
Figure 4.12 – Comparison of training loss evolution with different optimizers for a two-hidden-layer model
Figure 4.13 – Comparison of training loss evolution with different optimizers for a three-hidden-layer model
Figure 4.14 – Comparison of training loss evolution with different batch sizes for a two-hidden-layer model
Figure 4.15 – Comparison of training loss evolution with different batch sizes for a three-hidden-layer model
Figure 4.16 – Training and validation performance
Figure 4.17 – Features and label for the ANN model
Figure 4.18 – Results of the case study
List of Tables
Table 4.1 – Model features and target
Table 4.2 – Dimensions of the ANN parameters; individual parameters are referenced by the dimension letter in lowercase as a subscript
Table 4.3 – Combinations during hyperparameter optimization
Table 4.4 – Comparison of different neural network architectures
Table 4.5 – Drilling plan example for one day of operations
Table 4.6 – Computational performance of the proposed model, number of decision variables, and number of constraints
Table 4.7 – Cumulative results for the case study
List of Terms
ANN Artificial neural network
AI Artificial intelligence
AR Advance rating
BTS Brazilian tensile strength
CDF Cumulative distribution function
DNN Deep neural network
DPS Distance between planes of weakness
FL Fuzzy logic
FPI Field penetration index
GPU Graphics processing unit
ICA Imperialism competitive algorithm
IMS Integrated mass system
IP Integer programming
LOM Life of mine
LP Linear programming
LTMPS Long-term mine production schedule
MIP Mixed-integer programming
MLP Multilayer perceptron
MSDF Mechanized shrinkage with delayed fill
PDF Probability density function
RCS Rock compressive strength
RMR Rock mass rating
ROP Rate of penetration
RPM Revolution per minute
RQD Rock quality designation
RSR Rock structure rating
SGD Stochastic gradient descent
STMPS Short-term mine production schedule
SVM Support vector machines
SVR Support vector regression
TBM Tunnel boring machine
UCS Uniaxial compressive strength
WOB Weight on bit
Chapter 1
_________________________________________________
Introduction
Large capital investments are generally necessary when a company initiates a
mining project. High costs are encountered at the beginning of the operations and
returns on investment are not seen before developing enough openings to connect to
the orebody. On the other hand, mining is also known for being a very lucrative
industry when the operations involved have an efficient execution. Effectively
managing long-, medium-, and short-term mine planning is crucial in order to
achieve successful and highly productive operations, which, in the end, translates
into profits. Moreover, operations can be analyzed and scheduled differently
according to the timeframe of the given planning exercise. In fact, numerous
operational activities to be executed during the life of mine (LOM) are typically
planned considering the short-term state of the mine (Campeau & Gamache, 2019).
This is often seen as a problem when one is required to move from the long-term to
shorter and more precise operational plans. This thesis proposes a mathematical
formulation aiming to optimize and test multiple scenarios for the scheduling of
drilling operations while incorporating predictions of rates of penetration (ROP) in
a mechanized underground gold mine.
Furthermore, the performance of drilling activities is an essential indicator in
underground mining operations with a high correlation to production targets. In
mechanized projects where drilling machines are not autonomous, the efficiency of
performing a drilling activity is measured by considering both the performance of
the operator and the equipment. Other aspects such as the duration of the shifts,
the particular team of miners working on the job, as well as the geometry, the
geology, and geomechanical characteristics of the openings also have an important
impact on the output of drilling activities (Awuah-Offei, 2016; Kahraman, 2006).
Scheduling drilling operations in an underground project is known for being a highly
demanding and complex activity. This complexity is often related to the great
number of independent decisions to be made, involving the available operational
resources and accounting for their respective restrictions. Factors affecting this
decision-making process include analyzing previous relations between those
resources and the geological and geotechnical characteristics of the excavation, as
well as the high demand for openings to be drilled, among others. Consequently, the
implementation of a decision-making tool in order to perform the planning and
scheduling of short-term activities is imperative to maintain a high level of
productivity within the operations. Furthermore, incorporating data-driven
predictions of operational performance allows decision-makers to be better
informed about the reality of the process before making any decision. Therefore,
this thesis proposes an optimization model for planning short-term drilling
activities while accounting for predictions of operational performance.
In this thesis, the case study that follows derives predicted performances from
the collected operational data using a feedforward network approach. These
predictions are then included among the parameters of the integer programming
model in order to generate an optimal drilling plan over a short-term horizon.
An initial experiment tests the optimization model without the predictions given
by the neural network; a second experiment tests the model with the predictions
added. Finally, the results of both scenarios are compared to the conventional
drilling plan provided by the mining project. This thesis thus demonstrates an
approach for justifying the implementation of novel ROP predictors within
short-term underground planning.
In particular, the thesis develops a DNN predictor, but the approach could be
adapted for other predictors including traditional regressions or specialized machine
learning formulations.
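To illustrate how such predictions enter a short-term plan, the toy sketch below brute-forces machine-to-heading assignments to maximize total predicted drilled meters over a shift. All machine names, heading names, ROP values, and the shift length are hypothetical placeholders rather than data from the case study, and the actual formulation in Chapter 4 uses integer programming rather than enumeration.

```python
from itertools import permutations

# Hypothetical predicted ROP (m/h) for each (machine, heading) pair,
# standing in for the neural network's output; values are illustrative only.
predicted_rop = {
    ("boomer_1", "heading_A"): 1.8,
    ("boomer_1", "heading_B"): 2.4,
    ("boomer_1", "heading_C"): 1.1,
    ("boomer_2", "heading_A"): 2.0,
    ("boomer_2", "heading_B"): 1.5,
    ("boomer_2", "heading_C"): 2.2,
}
machines = ["boomer_1", "boomer_2"]
headings = ["heading_A", "heading_B", "heading_C"]
SHIFT_HOURS = 8  # assumed shift length

def best_assignment():
    """Enumerate one-heading-per-machine assignments and keep the one
    that maximizes total predicted drilled meters in a shift."""
    best_plan, best_meters = None, -1.0
    for chosen in permutations(headings, len(machines)):
        meters = sum(predicted_rop[(m, h)] * SHIFT_HOURS
                     for m, h in zip(machines, chosen))
        if meters > best_meters:
            best_plan, best_meters = dict(zip(machines, chosen)), meters
    return best_plan, best_meters

plan, meters = best_assignment()
print(plan, meters)
```

With these invented numbers, the search assigns each machine to the heading combination with the highest joint predicted output, which is exactly the role the ROP predictions play as objective-function coefficients in the integer program.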
In the following chapters, both the artificial neural network approach and the IP
model are outlined, followed by a comprehensive case study. Subsequently,
conclusions and future work are presented.
1.1 Objectives

The goal of this thesis is to investigate and test frameworks that predict and
optimize penetration rates as well as short-term mine drilling schedules based on
historical data coming from an underground gold mine. To accomplish this goal,
the following objectives must be achieved.
1. Develop and demonstrate an artificial neural network (ANN) approach to
predict rates of penetration for an underground mining operation. These
predictions must be based on drilling information coming from the
underground operation.
2. Develop an integer programming (IP) formulation for the optimization of the
drilling activities in order to maximize the drilled meters per day in an
underground operation.
3. Demonstrate that, by running the IP optimization model for scenarios with
and without the incorporation of the ROP predictions, it is possible to
evaluate and determine whether these predictions should be included in the
computation of future drilling plans.
1.2 Thesis outline
1. The current chapter presents the motivation, goals, objectives, and the
outline of the thesis.
2. The second chapter contains the literature review regarding different
methods used to predict the rate of penetration (ROP) values. This entails
covering developments in both traditional and machine learning
approaches. This chapter also presents the literature review corresponding
to different mathematical programming approaches in the mining industry,
specifically in applications that depend on the optimal short-term
scheduling of underground activities.
3. The third chapter outlines the methods used in this thesis. First, concepts
on artificial neural networks (ANN) are reviewed, followed by key concepts
in mathematical programming, including integer programming techniques.
4. The fourth chapter presents a case study where both methods are applied.
First, different artificial neural networks (ANN) are implemented and
tested, and an optimal architecture is found for predicting rates of
penetration. Subsequently, the proposed integer programming formulation
is tested on two different scenarios.
5. The fifth chapter summarizes the discoveries of this thesis and suggests
options for future work.
Chapter 2
_______________________________________________
Literature Review
This chapter outlines the literature pertinent to the subjects of artificial neural
networks for the prediction of drilling productivity, as well as mathematical
programming approaches applied to optimization problems in different
industries, with a focus on the mining sector. More specifically, this literature review is
structured into two sections.
1. Section 2.1 reviews past works associated with the prediction of penetration
rates in the civil, petroleum, and mining industries. It covers traditional
approaches proposed since the early sixties. The transition between the
earlier empirical and analytical models is detailed in this section. Moreover,
it contains the development of machine learning techniques from more
conventional methods to algorithms that provide important improvements in
terms of the accuracy of the results. Some data-driven approaches to predict
rates of penetration are covered.
2. Section 2.2 reviews the literature focusing on different mathematical
programming approaches used in the mining industry, more specifically in
the successful planning of activities for underground mines. Following this,
optimization models regarding short-term mine plans are covered.
2.1 Rate of penetration (ROP) predictive models
The rate of penetration (ROP) can be expressed as the quotient between the distance
excavated and the operating time in the course of an uninterrupted underground
development phase. This ROP can be used to measure the performance of the
underground drilling machines, which is widely known for being a challenging and
crucial task in the development and success of mining excavations. A precise
prediction of the ROP facilitates efficient and accurate planning. In recent decades,
several studies have been developed in order to formulate models with higher
accuracies when determining parameters involved in the ROP predictions, which
are not exclusive to underground mining operations. In fact, most of the applications
found in the literature belong to industries such as tunneling and oil and gas.
Indeed, the following subsections review some applications where the ROP is
predicted for underground mining as well as for tunneling and petroleum projects.
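The definition above amounts to a simple quotient; a minimal sketch:

```python
def rate_of_penetration(advance_m: float, operating_time_h: float) -> float:
    """ROP as the quotient between the distance excavated (m) and the
    uninterrupted operating time (h), following the definition above."""
    if operating_time_h <= 0:
        raise ValueError("operating time must be positive")
    return advance_m / operating_time_h

# e.g. a 3.2 m advance drilled over 2 h of uninterrupted operation
print(rate_of_penetration(3.2, 2.0))  # → 1.6 m/h
```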
2.1.1 Traditional models
Planning underground projects, controlling costs, and selecting a construction
approach all require effective prediction of the performance of underground
drilling machines. Theoretically, there is a complex relationship between the
rock mass and the drilling machines. This complexity makes it hard to obtain a
reliable estimation of performances for the underground drilling machines. Many
earlier works aim to employ empirical and field models where the relationship
between the ROP and parameters involved in the underground drilling operations
is utilized by implementing mathematical functions. Early literature documenting
the prediction of ROP by employing analytical models was published in the early
sixties and corresponds to Maurer (1962) and Bingham (1965), two researchers
belonging to the petroleum industry. Maurer (1962) proposed a mathematical model
focused on the estimation of the ROP for rolling cutter bits. This analytical model is
known for utilizing the rock cratering approach while being based on input features
such as the rock compressive strength (RCS), the weight on bit (WOB), the diameter
of the drill bit, and the rotational speed in revolutions per minute (RPM). Moreover, an
empirical coefficient was added in this theoretical model in order to include the type
of rock where the drilling operation is performed. Bingham (1965) established a
mathematical model in order to perform ROP predictions based on only the WOB,
RPM, and bit diameter. As can be expected, these initial applications carried several
limitations that compromise the accuracy of the predictions (Soares C, 2016).
Another model was developed by Eckel (1967), aiming to utilize a Reynolds number
function to establish the relationship between the ROP and the characteristics of
the mud, considering the effects of the latter as an additional feature. Almost one
decade later, Bourgoyne & Young (1974) proposed an additional mathematical
model employing multiple regression analysis with new parameters in order to
incorporate different physical and geological features ignored by previous
investigations.
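The shape of these early analytical forms can be illustrated schematically. The sketch below uses a Bingham-style power law, ROP = K · (WOB/d)^a · RPM; the constant K and exponent a are purely assumed here, whereas the published models use field-calibrated coefficients.

```python
def bingham_rop(wob: float, bit_diameter: float, rpm: float,
                k: float = 0.05, a: float = 1.2) -> float:
    """Bingham-style ROP relation: ROP = K * (WOB / d)**a * RPM.
    K and a are illustrative placeholders; in practice they are
    calibrated to field data for a given formation."""
    return k * (wob / bit_diameter) ** a * rpm

# Higher weight on bit or rotational speed raises the predicted ROP.
print(bingham_rop(wob=20.0, bit_diameter=0.2, rpm=90.0))
```

The relation is monotone in each input, which is precisely the limitation the text notes: it cannot capture rock-type effects unless an extra empirical coefficient, as in Maurer (1962), is introduced.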
The earliest attempts to develop prediction models in the tunneling and mining
industries sought to estimate the performances of tunnel boring machines (TBMs)
(Graham, 1976, Nelson et al., 1985, Hughes, 1986) and specifically the disc cutters
in mining excavations (Farmer & Glossop, 1980, Ozdemir, 1978) based on the
features having the highest impact in the drilling operations. The model developed
by Graham (1976) only requires parameters such as the uniaxial compressive
strength (UCS) and the average thrust per cutter when being computed. Farmer &
Glossop (1980) and Nelson et al. (1985) correlated the ROP with the rock fracture
toughness and the rock tensile strength, respectively. Ozdemir (1978) adopted new
parameters such as the UCS according to the diameter of a disc, the radius of the
cutter, the penetration of the disc, the spacing between disc grooves, shear and
compressive strength, and the angle corresponding to the cutter edge. McFeat-
Smith & Tarkoy (1979) proposed a model where different relations are used to
perform ROP predictions. Even though this study considers several machines and
geologies, in practice the established model cannot be applied elsewhere since it was developed
for a particular excavation. In the model presented by Hughes (1986), parameters
such as the number of cutters per kerf (groove) as well as the radius of the discs
were considered. Although these empirical and theoretical equations are well known
for being easy to implement, they have the limitation of only being developed for
homogeneous and isotropic environments. In general, the predictions performed
with these models are underestimated as a result of the lack of joint parameters,
making these models limited in terms of their applicability.
Following the simple models listed above, the ROP predictive models continued to
account for multiple parameters while considering both rock mass and drilling
machine features. Models developed by the Colorado School of Mines (Rostami &
Ozdemir, 1993, Rostami, 1997), and Norwegian University of Science and
Technology (Blindheim, 1979, Bruland, 1998) are the best-known ROP predictive
models using multiple parameters. Tests performed by Blindheim (1979) and
Bruland (1998) allowed the authors to derive parameters related to indentation,
drillability, and boreability in order to build an improved predictive model for cutter
penetration with different correlations between the drilling machine and the rock
mass, including most of the relevant influencing features. These models were
established through a multivariate regression approach. In the investigations
presented by Rostami & Ozdemir (1993) and Rostami (1997), rock fragmentation
produced by the disc cutters was recreated in the full-scale linear cutting tests.
These tests permitted the authors to produce predictions with higher accuracies due
to their capability to adapt to field conditions. Other studies focusing on models with
multiple parameters were developed in Barton (1999) and Barton (2000), wherein
the proposed QTBM model was based on the Q rock mass classification system. The
QTBM model is able to predict the ROP and the advance rate of the corresponding
drilling machine after computing the value of QTBM. It is important to mention that
new parameters such as joint conditions, Rock Quality Designation (RQD), the
quartz content, stress condition, intact rock strength, and the TBM thrust were
adopted in order to make the model adaptable to drilling operations.
There was an improvement in terms of the robustness of the datasets being utilized
to build these models with multiple parameters. However, too many features are
considered for practical applications, and special laboratory tests are still required.
In addition, these models expect data from specific zones, meaning that in areas
where it is impossible to perform the test, the predictions must be based on data
from areas with similar properties, generating a considerable error within the
method (Farrokh et al., 2012).
More recent work has focused on developing probabilistic-based models like the ones
presented in Laughton (1998), Nelson et al. (1999), and Al-Jalil (1998), which
consider more elaborate statistical approaches to predict the ROP. Probabilistic
models have the advantage of accounting for the uncertainty inherent in the data
when executing a performance assessment. However, these types of models are not
very common when predicting features such as the ROP since they suffer from one
major drawback: they rely on information such as the probability density function
(PDF) for each of the features involved in the method. These PDFs must be gathered
from projects with similar characteristics in order to support the new predictions.
Nevertheless, this type of information is often hard to obtain, and the conditions
between one project and the others often do not have the similarity required for
these methods, resulting in critical errors. Another disadvantage of these
probabilistic models is that the interaction between the drilling machine and
the rock mass is ignored. In general, probabilistic models are considered to be
more complex to implement, and they are therefore less used than deterministic
multi-parameter models.
Other authors aimed to correlate the ROP with the rock mass classification systems
(Cassinelli et al., 1982; Innaurato et al., 1991; Grandori et al., 1995; Sundaram et
al., 1998; McFeat-Smith, 1999; Sapigni et al., 2002). Rock classification systems
such as the rock mass rating (RMR), the rock structure rating (RSR), the Q system,
and the integrated mass system (IMS) were utilized in these models. For instance,
Cassinelli et al. (1982) utilized the RSR system in conjunction with the performance
of a drilling machine in order to estimate the ROP. Subsequently, Innaurato et al.
(1991) added the uniaxial compressive strength (UCS) to the model proposed by
Cassinelli et al. (1982). McFeat-Smith (1999) and Grandori et al. (1995) formulated
a model by generating a correlation between the IMS and features such as the
utilization of the drilling machine, the ROP, and the advance rate (AR). A model
presented in Sundaram et al. (1998) showed different correlations with features
such as the ROP, field penetration index (FPI), massic energy, torque, and
utilization. In addition, Sapigni et al. (2002) studied the correlation between the FPI
and the RMR system. Overall, the studies listed above concluded that the ROP
decreases when the rock being drilled is competent. However, results also showed
that the ROP is low where rock quality is poor and discontinuities are frequent.
2.1.2 Artificial intelligence-based models
Regression techniques are extensively employed in applications where the goal is to
predict continuous values. Moreover, machine learning methods such as
feedforward neural networks have become common tools to perform both
classification and regression tasks. A feedforward neural network is a mathematical
representation of computation loosely mimicking the behavior of the human brain.
The model consists of many basic computing units, known as neurons, that are
densely interconnected and carry out highly complex computations collectively. The
approach of learning with neural networks was first suggested in the mid-20th
century, and it remains an effective paradigm for a wide range of learning
applications.
In terms of mathematics, a neural network can be defined as a directed graph with
nodes acting as the neurons and edges acting as neural connections. At each node,
a weighted sum of the outputs is taken as an input linked to its incoming edges
(Shalev-Shwartz & Ben-David, 2014; McCulloch & Pitts, 1943; Widrow & Hoff,
1960; Rosenblatt, 1962; Rumelhart & McClelland, 1986).
Modern machine learning methods offer a very robust framework for supervised
learning. Supervised and unsupervised learning are two common paradigms within
the machine learning field: supervised methods are trained with prior knowledge of
what the output values should be, whereas unsupervised methods infer the structure
of the dataset without labeled outputs. The step from a feedforward to a deep
feedforward architecture consists in adding layers and units within each layer.
Networks with more than two layers constitute deep networks, which can
characterize functions of increasing complexity in high-dimensional data. Deep
neural nets are the foundation for so-called deep
learning. Most applications based on mapping an input vector of features to an
output vector of labels can be achieved with the implementation of a deep learning
model (Goodfellow et al., 2016; Bishop, 2006; Murphy, 2013). Although neural
networks are sometimes costly to train, they are comparatively robust and
adaptable relative to other mathematical representations. The structure of
multi-layered neural nets involves several parameters that must be fitted through a
non-convex optimization problem, which can have multiple local optima. However,
owing to their universal function approximation property, neural nets can be made
broad enough to represent most complex distributions (Goodfellow et al., 2016).
There have been several advancements to determine more accurate predictions,
consisting of basic regressions and classifications that involve linear combinations
of predetermined basis functions, with some advantageous analytical and
computational attributes but low applicability, restricted by the so-called curse of
dimensionality that refers to the numerous phenomena encountered when studying
and organizing data in high-dimensional spaces (Bishop & Nabney, 2008; Bishop,
2006; Goodfellow et al., 2016). Recent frameworks can adapt the basis functions to
the data in order to develop representations that can handle a large number of
features. Support vector machines (SVMs) address this by identifying basis
functions that are centered on the points corresponding to the chosen training
dataset and subsequently picking a subset of these while the model is training. The
adjustment of the fit, i.e., the learning algorithm, depends on the convexity of the
loss function which is to be minimized as part of the training process. The number
of basis functions active in the final model is usually far smaller than the
number of points in the training dataset (Vapnik, 1995; Burges, 1998;
Cristianini & Shawe-Taylor, 2000; Müller et al., 2001; Schölkopf et al., 2000;
Herbrich, 2002). Arguably, the most effective and robust approach among the
supervised learning frameworks for pattern identification is the artificial neural
network (ANN); this approach is considered to be the most similar to the human
approach of learning. In many applications, the final ANN model is substantially
more compact, and consequently faster to evaluate, than a support vector machine
with the same performance on the validation and testing sets (Shalev-Shwartz &
Ben-David, 2014; McCulloch & Pitts, 1943; Widrow
& Hoff, 1960; Rosenblatt, 1962; Rumelhart & McClelland, 1986).
The adaptable qualities of artificial intelligence (AI) models have allowed many
engineering problems with complex and nonlinear relationships between features to
be captured in an effective manner. Prediction of underground drilling
machine performances, and specifically of ROPs, has indeed been observed to
involve complex and nonlinear relationships between features.
Similar to the traditional models, artificial intelligence (AI) based models were
suggested first within the petroleum industry. Bilgesu et al. (1997) proposed an
ANN with three hidden layers and 27 hidden units; the layer structuring of ANN
will be described in the following chapter. This neural net was trained to predict
ROP values based on data coming from different formation types and drilling
features. This study aimed to provide a more accurate solution by modeling the
complex patterns involved in the drilling operations that other traditional models
and statistical approaches failed to represent. Consequently, several studies were
developed in the petroleum industry in order to demonstrate how AI techniques can
outperform the empirical and theoretical formulations as well as the mathematical
models previously proposed (Al-AbdulJabbar et al., 2018; Amar & Ibrahim, 2012;
Arabjamaloei & Shadizadeh, 2011; Bataee & Mohseni, 2011; Salaheldin, 2018;
Hegde et al., 2018; Elkatatny et al., 2017).
Likewise, several investigations were developed within the tunneling and the
mining industries in order to utilize the robustness of the AI-based models when
predicting the drilling performance of underground machines. Both Bruines (1988)
and Grima et al. (2000) proposed hybrid neuro-fuzzy approaches where both ANN
and fuzzy logic (FL) are combined to generate ROP predictions while accounting for
the uncertainty and imprecision of the data, and performing inference and decision
making. Okubo et al. (2003) presented an expert system approach aimed to predict
TBM performances for competent rocks for various underground projects in Japan.
In this study, the performance of the drilling machines is determined in terms of the
ROP, advance rate (AR), thrust force, rolling force, rotational speed, and other
selected features. Additionally, the approach presented in Okubo et al. (2003)
evaluates the predictions obtained by the model with respect to different approaches
presented in the literature.
Benardos & Kaliampakos (2004) trained an ANN model by using the Athens Metro
dataset. Unlike traditional studies, the aim of this research was to predict
the advance rate (AR) of the tunnel boring machines by including different
geomechanical and geological features. The architecture of the ANN presented in
Benardos & Kaliampakos (2004) consists of four layers: eight units in the input layer
corresponding to the number of features, and thirteen units in the remaining hidden
and output layers for a total of twenty-one neurons. Following the mathematical
formulation showed in Yagiz (2008), an ANN model was proposed by Yagiz et al.
(2009) in order to predict the cutting performance of TBMs for competent rocks in a
tunneling project in the USA. In this study, only four features (UCS, Brazilian
tensile strength (BTS), the distance between planes of weakness (DPW), and the
angle between the plane of weakness and the direction of the TBM) were used to
train the model and to predict the ROP. This ANN model presented by Yagiz et al.
(2009) was built with only one hidden layer composed of eight units. Moreover, the
performances between the ANN model and a nonlinear multivariable regression
method were compared. Zhao et al. (2007) developed a model to predict the ROP of
TBMs for a tunneling project. This model was trained only with data from two
tunnels, one type of rock, and one type of drilling machine; therefore, it is not
suitable for scenarios with different characteristics. Additionally, the model does
not take into account the effect of the in situ stress on the ROP
values. Another ANN model was presented by Gholamnejad & Tayarani (2010),
seeking to predict the ROP of one TBM. The architecture used for this model
consisted of a five-layer neural network with three input features (UCS, DPW, and
RQD), three hidden layers with nineteen units, and one output layer.
More recent AI approaches have been developed by hybridization. A particle swarm
optimization (PSO) approach for predicting ROP values from a competent rock
dataset was presented by Yagiz & Karahan (2011). Moreover, a support vector
regression (SVR) and fuzzy-logic models were proposed by Mahdevari et al. (2014)
and Ghasemi et al. (2014), respectively, aiming to predict rates of penetration for a
hard rock dataset. A more robust work was presented by Armaghani et al. (2017)
aiming to investigate and test new AI-based models for estimating the performance
of TBMs in terms of rates of penetration. Several input features were utilized in this
research to test different combinations of intelligent systems. The hybrid systems
used were combinations of the imperialist competitive algorithm (ICA) with ANN,
of PSO with ANN, and a standalone ANN. Finally, Zhou et al. (2019) and
Koopialipoor et al. (2020) developed two hybrid models (ANN-genetic programming
and ANN-firefly algorithm) aiming to predict the performance of TBMs.
The results of the majority of the studies reviewed in this section showed that AI-
based models have higher accuracy compared with traditional models. However, it
is important to notice that all of this research was developed for predicting the
performance of nearly autonomous drilling machines, as is the case for the TBMs.
Furthermore, no studies regarding highly skilled, human-operated equipment were
found.
2.2 Mathematical programming in the short-term
planning of mining activities for underground
mines
Many of the mathematical programming models developed for the mining industry
have been focused on long-term mine production scheduling (LTMPS). Nevertheless,
attention to short-term mine production scheduling (STMPS) has significantly
increased in the past decades. The acceleration of scheduling tasks and the
reduction of operational costs are some of the reasons for the increasing interest in
the optimization of STMPS. Modeling real operational conditions is considered
fundamental and complex, given the wide variety of conditions from one mine to the
other. Indeed, the formulation of an optimization model for short-term mine
planning is often linked with the nature of the operations taking place in the
particular mine site that is under investigation. Therefore, the optimization goals
are also highly related to the operations themselves and the mining projects.
However, almost all the mathematical formulations for optimizing both LTMPS and
STMPS are directed to open-pit mines. Newman et al. (2010) stated that the layout
of the underground mines is more complex and is limited by more aspects than those
of open-pit mines. Furthermore, there is a broad range of underground mining
methods, making the models developed to optimize short-term mine plans specific
to each method. Therefore, the optimization goals are even more varied and diverge
from one underground project to the other.
Some of the research on the mathematical programming of STMPS for underground
projects is directed to the development of real-time monitoring and optimization
systems, as is outlined in Song et al. (2013). These models consider
problems that have been widely investigated in open-pit mining, such as the
dispatching of mobile equipment. On the other hand, Nehring et al. (2012) proposed
an integrated short-and medium-term model for underground mine production
scheduling. This integrated scheduling tool proved to be useful and effective in
providing globally optimal scheduling while considering all the connections that
appear between both the medium and the short-term planning. Moreover,
O’Sullivan & Newman (2015) developed a model aiming to optimize the scheduling
of complex operations in short-term mine planning. In this research, a heuristic is
implemented in order to enhance the tractability of the given problem. Finally, a
larger study is presented by Campeau & Gamache (2019), where a preemptive
mixed-integer program was used to generate optimal short-term schedules of
several underground activities while accounting for multiple independent decisions
involved within the operation. Multiple tests were run within this research,
including scenarios that simulate different stages of the project. Unlike previous
works, this investigation aims to simplify the transition from medium to short-term
mine planning by assessing different operational scenarios and guaranteeing an
optimal allocation of assets. Furthermore, this research does not follow the
assumption made in the works outlined above, namely that an activity started at
any location of the project will be completed within a predetermined period of time.
Chapter 3
________________________________________________
Methods
This chapter summarizes the methods used for building the artificial neural
network implemented for predicting the ROP as well as the IP model for optimizing
the drilling activities.
3.1 Artificial neural networks (ANN)
Artificial neural networks are widely recognized for their inherent and efficient
manner of approaching nonlinear problems. In contrast, logistic regression and
linear regression are usually found attractive due to the efficiency and reliability
with which they can be fitted to data, whether in closed form or by iterative
optimization. However, linear models suffer from the restricted capacity of their
linear functions, which often translates into models that cannot capture a nonlinear
interdependence between any two input features. In order to use linear approaches
to model nonlinear functions of 𝑥, one can apply the linear model to a transformed
input 𝜑(𝑥), with 𝜑 a nonlinear transformation, as presented in Equation 3.1.
y(\mathbf{x}, \mathbf{w}) = f\left( \sum_{j=1}^{M} w_j \phi_j(\mathbf{x}) \right)    (3.1)
Here, 𝑓 stands for a nonlinear activation function in the case of a classification
task, mapping an input 𝑥 to a category or label 𝑦, and for the identity in the case
of regression tasks. The goal in a feedforward neural network is
defining a mapping 𝑦 = 𝑓(𝒙; 𝒘) by developing a model that allows making the basis
functions 𝜑𝑗(𝒙) dependent on some adaptive parameters and then learning the value
of those parameters, along with the coefficients {𝑤𝑗}, resulting in the best function
approximation.
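The idea behind Equation 3.1 can be illustrated with fixed (rather than learned) basis functions: a model that is linear in its weights, applied to nonlinearly transformed inputs 𝜑(𝑥), can capture a nonlinear target. The following NumPy sketch uses an illustrative polynomial basis and toy data of our own choosing, with 𝑓 as the identity (regression case):

```python
import numpy as np

# Toy target: a nonlinear function of x that a purely linear model cannot
# represent, but a model linear in w over a polynomial basis can.
x = np.linspace(-1.0, 1.0, 50)
t = x**3 - x

# Fixed nonlinear basis functions phi_j(x) = x^j, j = 0..3 (phi_0 = 1 absorbs the bias)
Phi = np.vander(x, N=4, increasing=True)

# Least-squares fit of the weights w_j in y(x, w) = sum_j w_j * phi_j(x)
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
y = Phi @ w

print(np.max(np.abs(y - t)))   # essentially zero: the basis spans the target
```

With adaptive basis functions, as in a neural network, the 𝜑𝑗 themselves would be tuned during training instead of being fixed in advance.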
A more basic neural network model can be explained as a series of functional
transformations. Initially, 𝑀 linear combinations of the input variables 𝑥1, . . . , 𝑥𝐷 are
assembled.
a_j = \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)}    (3.2)

with 𝑗 going from 1 to 𝑀, and the superscript (1) indicating the layer to which the
related parameters belong. The parameters w_{ji}^{(1)} and w_{j0}^{(1)} stand for the
weights and the biases, respectively. The quantities 𝑎𝑗 are referred to as the
activations. Each activation is transformed using a nonlinear activation function ℎ,
as is shown in Equation 3.3.
z_j = h(a_j)    (3.3)
The values taken by 𝑧𝑗 are the corresponding outputs of the basis functions
presented in Equation 3.1, also known as the hidden units. The differentiable,
nonlinear activation functions ℎ are commonly selected to be sigmoidal functions.
However, recent developments have shown alternative activation functions such as
the ReLU (Rectified Linear Unit) to be more efficient in some cases. Following
Equation 3.1, these values are again linearly combined to give the output unit
activations of the second layer.
a_k = \sum_{j=1}^{M} w_{kj}^{(2)} z_j + w_{k0}^{(2)}    (3.4)

In Equation 3.4, 𝑘 goes from 1 to 𝐾, the total number of outputs, and 𝑀 is the
number of hidden units.
Ultimately, the output unit activations are again transformed, applying a suitable
activation function to output a set of network results 𝑦𝑘.
In the case of standard regression, it is common to choose the identity as the
activation function (𝑦𝑘 = 𝑎𝑘 ). Correspondingly, in the case of binary classification,
the activation function commonly used is the sigmoid function presented in
Equation 3.6.
y_k = \sigma(a_k)    (3.5)

\sigma(a) = \frac{1}{1 + \exp(-a)}    (3.6)
It is possible to merge the previously discussed stages to provide the overall network
function, as in the case of sigmoidal output unit activation functions presented in
Equation 3.7.
y_k(\mathbf{x}, \mathbf{w}) = \sigma\left( \sum_{j=1}^{M} w_{kj}^{(2)} \, h\left( \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)} \right) + w_{k0}^{(2)} \right)    (3.7)
It is important to note that Equation 3.7 shows a set of all weights and biases
gathered into a vector 𝒘; the bias parameters have a zero as the second subindex
and are discussed below. Subsequently, the neural network model is no more than
a nonlinear function from a set of input features {𝑥𝑖} to a set of output labels {𝑦𝑘}
parametrized by a vector 𝒘 of adaptable parameters. This function can be illustrated
as a network graph, as is shown in Figure 3.1.
Figure 3.1 - Network diagram for a two-layer neural network (Bishop & Nabney, 2008)
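The overall network function of Equation 3.7 can be written out directly. The sketch below is an illustrative NumPy implementation for a two-layer network such as that of Figure 3.1; the dimensions, the tanh hidden activation, and the random weights are our own example choices:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, b1, W2, b2):
    """Network function of Equation 3.7 (tanh chosen here as the hidden activation h)."""
    a_hidden = W1 @ x + b1          # first-layer activations a_j (Equation 3.2)
    z = np.tanh(a_hidden)           # hidden units z_j = h(a_j)   (Equation 3.3)
    a_out = W2 @ z + b2             # output activations a_k      (Equation 3.4)
    return sigmoid(a_out)           # sigmoidal output units      (Equation 3.5)

rng = np.random.default_rng(0)
D, M, K = 3, 4, 2                   # input features, hidden units, outputs (arbitrary)
W1, b1 = rng.normal(size=(M, D)), rng.normal(size=M)
W2, b2 = rng.normal(size=(K, M)), rng.normal(size=K)

y = forward(rng.normal(size=D), W1, b1, W2, b2)
print(y.shape)                      # (2,): one value in (0, 1) per output unit
```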
The bias parameters in Equation 3.2 can be merged into the set of weight
parameters by identifying an additional input feature 𝑥0, with 𝑥0 = 1. Then
Equation 3.2 can be re-expressed, as is shown in Equation 3.8.
a_j = \sum_{i=0}^{D} w_{ji}^{(1)} x_i    (3.8)
Likewise, it is possible to merge the biases corresponding to the second layer so that
the network function can be expressed as:
y_k(\mathbf{x}, \mathbf{w}) = \sigma\left( \sum_{j=0}^{M} w_{kj}^{(2)} \, h\left( \sum_{i=0}^{D} w_{ji}^{(1)} x_i \right) \right)    (3.9)
Note that Equation 3.9 is equivalent to Equation 3.7 but presented in a more concise
manner. Moreover, Figure 3.1 presents a standard configuration of a two-layer
neural network where the bias contributions are represented as 𝑥0 and 𝑧0, for the
first and second layers, respectively.
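The bias-absorption trick of Equations 3.8 and 3.9 is straightforward to verify numerically. In this illustrative sketch (the dimensions and values are arbitrary), the activations computed with an explicit bias term coincide with those computed after prepending the constant input 𝑥0 = 1:

```python
import numpy as np

rng = np.random.default_rng(1)
D, M = 3, 4                                       # arbitrary illustrative sizes
W = rng.normal(size=(M, D))                       # weights w_ji^(1)
b = rng.normal(size=M)                            # biases  w_j0^(1)
x = rng.normal(size=D)

# Equation 3.2: activations with an explicit bias term
a_explicit = W @ x + b

# Equation 3.8: absorb the bias by prepending the constant input x_0 = 1
x_aug = np.concatenate(([1.0], x))                # (x_0, x_1, ..., x_D)
W_aug = np.concatenate((b[:, None], W), axis=1)   # column 0 now holds the biases
a_absorbed = W_aug @ x_aug

print(np.allclose(a_explicit, a_absorbed))        # True: the two forms coincide
```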
3.1.1 Relation between least-squares regression and
neural net training
In finding the best value of parameters for the network, a comparison can be made
with the polynomial curve fitting approach, in which the sum of the squares error
function is minimized. For a chosen training dataset containing several feature
vectors {𝒙𝒏}, for 𝑛 = 1, . . . , 𝑁, and the corresponding set of label vectors {𝒕𝒏}, the
error function to be minimized is:

E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \| y(\mathbf{x}_n, \mathbf{w}) - \mathbf{t}_n \|^2    (3.10)
In the particular case of regression applications, a single target variable 𝑡 that can
take any real value is considered. It can be assumed that 𝑡 has a Gaussian
distribution with 𝑦(𝒙, 𝒘) as the mean and 𝛽 as the inverse variance, also known as
the precision, of the noise. The distribution of 𝑡 is presented in Equation 3.11.

p(t \mid \mathbf{x}, \mathbf{w}) = \mathcal{N}\left( t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1} \right)    (3.11)
For the conditional distribution shown in Equation 3.11, it is sufficient to take the
output unit activation function to be the identity, since the neural network under
consideration can approximate any continuous function from 𝒙 to 𝑦. Moreover,
given a dataset of 𝑁 independent, identically distributed (i.i.d.) features
𝑿 = {𝒙𝟏, . . . , 𝒙𝑵}, and the corresponding labels 𝒕 = {𝒕𝟏, . . . , 𝒕𝑵}, it is possible to
build the corresponding likelihood function as follows:

p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} p(t_n \mid \mathbf{x}_n, \mathbf{w}, \beta)
Then, by taking the negative logarithm, the following error function is obtained:

\frac{\beta}{2} \sum_{n=1}^{N} \{ y(\mathbf{x}_n, \mathbf{w}) - t_n \}^2 - \frac{N}{2} \ln \beta + \frac{N}{2} \ln(2\pi)    (3.12)
From Equation 3.12, it is now possible to learn the adaptable parameters 𝒘 and 𝛽.
Consider first the determination of 𝒘: since the terms involving only 𝛽 are constant
with respect to 𝒘, maximizing the likelihood is equivalent to minimizing the sum of
squares error function 𝐸(𝒘) described for polynomial fitting, shown in
Equation 3.13.
E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(\mathbf{x}_n, \mathbf{w}) - t_n \}^2    (3.13)
The value of 𝒘 obtained by minimizing the error 𝐸(𝒘) is denoted 𝒘𝑴𝒊𝒏, since it is
equivalent to the maximum likelihood solution. In fact, the nonlinearity of the
network function 𝑦(𝒙𝒏, 𝒘) causes the error 𝐸(𝒘) to be nonconvex, meaning that
there can be local maxima of the likelihood, corresponding to local minima of 𝐸(𝒘).
After computing 𝒘𝑴𝒊𝒏, 𝛽 is found by minimizing the negative log-likelihood, as is
shown in Equation 3.14.
\frac{1}{\beta_{Min}} = \frac{1}{N} \sum_{n=1}^{N} \{ y(\mathbf{x}_n, \mathbf{w}_{Min}) - t_n \}^2    (3.14)
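Equation 3.14 can be illustrated numerically. In the sketch below, a simple linear model fitted by least squares stands in for the network 𝑦(𝒙, 𝒘) (the estimator for 𝛽 is the same either way); the data, true precision, and sample size are illustrative choices of our own:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 20000
beta_true = 4.0                                  # noise precision; variance 1/beta = 0.25

# Illustrative data: linear signal plus Gaussian noise of known precision
x = rng.uniform(-1.0, 1.0, size=N)
t = 1.5 * x + 0.3 + rng.normal(scale=np.sqrt(1.0 / beta_true), size=N)

X = np.column_stack((np.ones(N), x))
w_min, *_ = np.linalg.lstsq(X, t, rcond=None)    # maximum-likelihood weights w_Min

# Equation 3.14: 1/beta_Min is the mean squared residual of the fitted model
inv_beta = np.mean((X @ w_min - t) ** 2)
print(inv_beta)                                  # close to 1/beta_true = 0.25
```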
In applications where several labels are involved, it is possible to assume that the
labels or target variables are independent conditional on 𝒙 and 𝒘, and that they
share the same 𝛽. The conditional distribution of the labels can then be written as
in Equation 3.15.
p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}) = \mathcal{N}\left( \mathbf{t} \mid \mathbf{y}(\mathbf{x}, \mathbf{w}), \beta^{-1} \mathbf{I} \right)    (3.15)
Subsequently, for 𝐾 number of labels, the weights corresponding to the max
likelihood can be computed by minimizing the sum of the squares error function,
and 𝛽 can be found as follows:
\frac{1}{\beta_{Min}} = \frac{1}{NK} \sum_{n=1}^{N} \| \mathbf{y}(\mathbf{x}_n, \mathbf{w}_{Min}) - \mathbf{t}_n \|^2
As presented in Bishop & Nabney (2008), Bishop (2006), and Goodfellow et al.
(2016), the negative log-likelihood induces a natural pairing between the error
function being used and the output activation function. In the case of classical
regression, one might consider a network with an identity output activation, where
𝑦𝑘 = 𝑎𝑘, implying that:

\frac{\partial E}{\partial a_k} = y_k - t_k    (3.16)
The training of a neural net involves the minimization of the error function 𝐸,
which, when using identity output activation functions, corresponds exactly to
least-squares regression. From this perspective, neural nets are a generalization of
classical regression.
3.1.2 Adjustment of parameters
During the training, the goal is indeed to find a weight vector 𝒘 that minimizes the
error function 𝐸(𝒘), also called the loss function, which is depicted in Figure 3.2. It
is important to mention that minor moves in the weight space from 𝒘 to 𝒘 + 𝛿𝒘
produce a variation in the error function such that 𝛿𝐸 ≈ 𝛿𝒘𝑻𝛻𝐸(𝒘), with 𝛻𝐸(𝒘) as
the gradient vector which points toward the direction of highest rate of increase of
the error function. The error 𝐸(𝒘) can be seen as a smooth continuous function of
the weights, which is bounded below. Therefore, the smallest value of the error
occurs at a location in the weight space where 𝛻𝐸(𝒘) = 0, i.e., at one of the critical
points.
Figure 3.2 - Geometrical view of the error function E(w) as a surface sitting over a
weight space (Bishop & Nabney, 2008)
The aim of the parameter optimization during the training is to find the weights for
which the error is the lowest. Nevertheless, there is typically a nonlinear
dependence between the error and both the weights and the bias parameters,
meaning that there can be several points in the weight space where the error attains
only a local minimum (Bishop, 2006; Goodfellow et al., 2016; Murphy, 2013).
Since an analytical solution of 𝛻𝐸(𝒘) = 0 is generally not available, iterative
numerical procedures must be implemented. Most techniques for the optimization
of continuous nonlinear functions involve initializing the weight vector with some
chosen (typically random) values and then stepping through weight space as is
presented in Equation 3.17, with 𝜏 as the current step of the iterative procedure.

\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} + \Delta \mathbf{w}^{(\tau)}    (3.17)
3.1.3 Gradient descent optimization
During gradient descent optimization, the gradient information gathered at each
step is used to update the weights in the direction of the negative gradient, as is
presented in Equation 3.18.

\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E(\mathbf{w}^{(\tau)})    (3.18)

where 𝜂 stands for the learning rate parameter, which takes a positive value. After
each update, the gradient is re-evaluated for the new weight vector, and the process
is repeated. It is important to point out that at each step 𝜏, the entire training set
is used to evaluate the gradient (Bishop, 2006).
Furthermore, several runs of the gradient descent with different random 𝒘 values
must be executed in order to find the lowest local minimum, and ideally the true
global minimum. The results from the gradient descent runs must be validated with
a separate chosen validation dataset; this dataset must be distinct from the training
dataset to give a fair (unbiased) evaluation of the fittings (Bishop, 2006).
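The restart strategy can be sketched on a one-dimensional toy error surface with more than one minimum (the function below is our illustrative stand-in for 𝐸(𝒘), not an actual network error):

```python
import numpy as np

# Illustrative one-dimensional "error surface" with more than one local minimum,
# standing in for the E(w) of Figure 3.2.
def E(w):
    return np.sin(3.0 * w) + w ** 2

def grad_E(w):
    return 3.0 * np.cos(3.0 * w) + 2.0 * w

def gradient_descent(w0, eta=0.01, steps=2000):
    w = w0
    for _ in range(steps):
        w -= eta * grad_E(w)       # step in the direction of the negative gradient
    return w

# Several runs from random initial weights; keep the lowest minimum found.
rng = np.random.default_rng(3)
candidates = [gradient_descent(w0) for w0 in rng.uniform(-3.0, 3.0, size=10)]
best = min(candidates, key=E)
print(best, E(best))               # the deepest minimum among the runs
```

In practice each converged weight vector would also be scored on a held-out validation set, not just by its training error.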
Le Cun & Boser (1989) introduced an online version of the gradient descent, also
known as the stochastic gradient descent (SGD), that has been widely used,
especially for large datasets. In this case, error functions based on maximum
likelihood comprise a sum of terms, one per data point, with the following general
form:

E(\mathbf{w}) = \sum_{n=1}^{N} E_n(\mathbf{w})
in which 𝐸𝑛 is the error contribution associated with the 𝑛𝑡ℎ term. Stochastic
gradient descent performs the weight updates based on one data point at a time, as
is presented in Equation 3.19.
\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E_n(\mathbf{w}^{(\tau)})    (3.19)

which facilitates the updating of 𝐸(𝒘), as

E(\mathbf{w}^{(\tau+1)}) = E(\mathbf{w}^{(\tau)}) - E_n(\mathbf{w}^{(\tau)}) + E_n(\mathbf{w}^{(\tau+1)})    (3.20)
This SGD weight update is reiterated by cycling through the data and choosing
random points with replacement (Bishop & Nabney, 2008). Nowadays, SGD powers
most of the current deep learning models. Indeed, a frequent challenge for current
machine learning implementations is the size of the datasets, for which SGD and its
variations are especially effective.
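A minimal sketch of stochastic gradient descent, updating the weight with one data point's gradient at a time; the toy least-squares problem below is our own illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy problem: E(w) = sum_n E_n(w), with E_n = 0.5*(w*x_n - t_n)^2
# and a recoverable true slope of 2.
x = rng.uniform(0.2, 2.0, size=100)
t = 2.0 * x

w, eta = 0.0, 0.05
for _ in range(2000):
    n = rng.integers(len(x))                 # draw one data point at random
    grad_n = (w * x[n] - t[n]) * x[n]        # gradient of E_n alone
    w -= eta * grad_n                        # descend along the single-point gradient

print(w)                                     # converges to the true slope, 2.0
```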
3.1.4 Error back-propagation
Back-propagation is an effective technique for evaluating the gradient of an error
function 𝐸(𝒘) for a neural network. The effectiveness of this technique is based on
the so-called ‘local message-passing scheme’, where the data travels consecutively
forwards and backward through the network (Rumelhart & McClelland, 1986).
The back-propagation algorithm will be explained in terms of the sum of squares
error function, first for the case of a simple linear model:

y_k = \sum_i w_{ki} x_i    (3.21)

resulting in an error function 𝐸𝑛(𝒘) for an input pattern 𝑛, such that:

E_n = \frac{1}{2} \sum_k (y_{nk} - t_{nk})^2    (3.22)
with 𝑦𝑛𝑘 standing for 𝑦𝑘(𝒙𝒏, 𝒘). The gradient of this particular error function
𝐸𝑛(𝒘) with respect to a weight 𝑤𝑘𝑖 can be observed in Equation 3.23.

\frac{\partial E_n}{\partial w_{ki}} = (y_{nk} - t_{nk}) \, x_{ni}    (3.23)
A more general formulation considers so-called feedforward neural nets, in which
the error function depends on a weight 𝑤𝑗𝑖 only via the summed input 𝑎𝑗 entering
a unit 𝑗. This favors a particular application of the chain rule for the partial
derivatives:

\frac{\partial E_n}{\partial w_{ji}} = \frac{\partial E_n}{\partial a_j} \frac{\partial a_j}{\partial w_{ji}}    (3.24)

\delta_j \equiv \frac{\partial E_n}{\partial a_j}    (3.25)
in which 𝛿𝑗 is referred to as the error of unit 𝑗. One can show that in a feedforward
network, a weighted sum of the inputs is calculated for each unit individually, such
that

a_j = \sum_i w_{ji} z_i    (3.26)
Then, by substituting Equations 3.25 and 3.26 into Equation 3.24,

\frac{\partial E_n}{\partial w_{ji}} = \delta_j z_i    (3.27)

Thus, the derivative with respect to a weight 𝑤𝑗𝑖 is obtained by multiplying the
error 𝛿𝑗 at the output end of the weight by the value 𝑧𝑖 at its input end.
Figure 3.3 - Forward and backward propagation of error information (Bishop &
Nabney, 2008)
From Equation 3.27, one can observe that multiplying the errors 𝛿𝑗 by the values
of 𝑧𝑖 provides the value of the derivative without computing it directly. Hence,
instead of computing the derivatives individually, one can calculate the value of 𝛿𝑗
for each hidden and output unit as follows:

\delta_j \equiv \frac{\partial E_n}{\partial a_j} = \sum_k \frac{\partial E_n}{\partial a_k} \frac{\partial a_k}{\partial a_j}    (3.28)
Then, by substituting Equations 3.3, 3.25, and 3.26,

\delta_j = h'(a_j) \sum_k w_{kj} \delta_k    (3.29)

Finally, Equation 3.29 and Figure 3.3 show that the value of the error 𝛿 for any
hidden unit can be computed by propagating the errors backward from the units
that follow it in the network. (The factor ℎ′(𝑎𝑗), the derivative of the activation
function, emerges from the summation as a common factor of the ∂𝑎𝑘/∂𝑎𝑗
contributions, since each 𝑎𝑘 depends on 𝑎𝑗 only through 𝑧𝑗 = ℎ(𝑎𝑗).)
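The forward and backward passes can be sketched for a two-layer network with the bias absorbed as in Equation 3.8, tanh hidden units, and identity outputs (the dimensions and values are illustrative choices of our own). The back-propagated gradients can be checked against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(5)
D, M, K = 3, 4, 2                               # illustrative dimensions
W1 = rng.normal(scale=0.5, size=(M, D + 1))     # first layer; column 0 holds the biases
W2 = rng.normal(scale=0.5, size=(K, M + 1))     # second layer; column 0 holds the biases
x = rng.normal(size=D)
t = rng.normal(size=K)                          # target vector for one pattern

def forward(W1_, W2_):
    x_aug = np.concatenate(([1.0], x))          # x_0 = 1 absorbs the first-layer bias
    a = W1_ @ x_aug                             # hidden activations a_j
    z_aug = np.concatenate(([1.0], np.tanh(a))) # z_j = h(a_j), plus z_0 = 1 for the bias
    y = W2_ @ z_aug                             # identity output units (regression)
    return x_aug, a, z_aug, y

def sq_error(W1_, W2_):
    y = forward(W1_, W2_)[3]
    return 0.5 * np.sum((y - t) ** 2)

def backprop(W1_, W2_):
    x_aug, a, z_aug, y = forward(W1_, W2_)
    delta_out = y - t                           # output errors (Equation 3.16)
    gW2 = np.outer(delta_out, z_aug)            # dE/dw_kj = delta_k * z_j (Equation 3.27)
    delta_hid = (1 - np.tanh(a) ** 2) * (W2_[:, 1:].T @ delta_out)   # Equation 3.29
    gW1 = np.outer(delta_hid, x_aug)            # dE/dw_ji = delta_j * x_i (Equation 3.27)
    return gW1, gW2

def numeric_grad(wrt):
    # central finite differences, for checking the back-propagated gradients
    W = W1 if wrt == "W1" else W2
    g, h = np.zeros_like(W), 1e-6
    for idx in np.ndindex(*W.shape):
        W[idx] += h; Ep = sq_error(W1, W2)
        W[idx] -= 2 * h; Em = sq_error(W1, W2)
        W[idx] += h                             # restore the original weight
        g[idx] = (Ep - Em) / (2 * h)
    return g

gW1, gW2 = backprop(W1, W2)
print(np.max(np.abs(gW1 - numeric_grad("W1"))),
      np.max(np.abs(gW2 - numeric_grad("W2"))))   # both differences are tiny
```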
3.2 Integer linear programming
Mathematical programming (MP) is the organized listing of (i.e., the programming
of) variables, sets, constraints, objective functions (to be minimized or maximized),
and potentially other mathematical constructs. The following paragraphs outline
basic concepts of mathematical programming techniques such as linear
programming (LP) and integer programming (IP).
In scenarios with linear objective function and constraints, linear programming (LP)
is a widely used technique to state and ultimately solve optimization problems.
“Linear” stands for the required linearity of the constraints and objective function
that the mathematical models must have when representing a problem. This linear
nature of LP is often satisfied in classical operations research contexts, such as the
so-called task assignment problems, where the user needs to allocate limited
resources among activities of interest in an optimal way (Hillier & Lieberman,
2016).
In the general LP scenario where the objective function and all constraints are
linear, the feasible domain is a region bounded by linear hyperplanes, and an
optimal value, when it exists, is attained at one of the vertices of this domain.
However, LP methods are not only used for solving linear optimization problems;
they can also be applied recursively to problems with a nonlinear objective function
and/or constraints by reformulating the model in an appropriate way (Hillier &
Lieberman, 2016).
Unless otherwise specified, the variables within an LP are presumed to have
continuous domains. However, in reality, resources such as operators, drilling
machines, or openings only make sense if they are expressed as integer values.
Therefore, problems that require decision variables to take integer values should be
formulated as integer programming (IP) problems. Moreover, integer programs in
which the objective function and all of the constraints are linear are often called
integer linear programs (ILP), although it is common to use IP and ILP
interchangeably. The only difference between LP and ILP problems is that the latter
include the additional restriction that the decision variables must take integer
values. In addition, models dealing with both integer and non-integer decision
variables are known as mixed integer programming (MIP) models; the terminology
mixed integer linear program (MILP) is reserved for such problems having a linear
objective function and linear constraints (Hillier & Lieberman, 2016).
3.2.1 Solution approaches
3.2.1.1 Simplex method
The simplex method is the most popular technique for solving linear programming
problems. The method is based on an iterative search that moves through the
collection of extreme points (i.e., vertices) of the feasible domain, one by one,
until reaching an optimal value or determining that the problem is unbounded
or infeasible. A test is performed at each iteration to determine whether a
neighboring vertex offers an improvement in the objective function, or whether
there is a direction along which the objective improves without bound; if no
direction offers improvement, then the current vertex corresponds to an optimal
solution (Hillier & Lieberman, 2016).
3.2.1.2 Branch-and-bound algorithm
The approach provided by the branch-and-bound algorithm is commonly applied for
discrete optimization problems as an alternative to combinatorial enumeration, and
especially ILP and MILP. This algorithm works under the so-called ‘divide and
conquer’ concept, where the initial (large) problem is divided into smaller problems.
Subsequently, a selection of the smaller problems is subdivided and/or solved. Each of
these subproblems is a branch along which the optimal integer solution may be
located. This procedure of dividing and subdividing the initial problem is known
as the branching step of the algorithm. The selection, or fathoming, step involves
bounding how good the best solution along the branch of a smaller problem can be,
and subsequently abandoning the branches along which there is no possibility of
finding an optimal integer solution for the original problem.
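To make the branching and fathoming steps concrete, the following toy sketch (not from the thesis) maximizes a small integer program by repeatedly solving LP relaxations with SciPy and branching on a fractional variable; real solvers add far more sophisticated bounding, node selection, and cutting planes.

```python
import math
from scipy.optimize import linprog

def branch_and_bound(c, A_ub, b_ub, bounds):
    """Maximize c @ x over integer x via naive branch-and-bound on LP
    relaxations (illustrative only)."""
    best_val, best_x = -math.inf, None
    stack = [bounds]
    while stack:
        bnds = stack.pop()
        res = linprog([-ci for ci in c], A_ub=A_ub, b_ub=b_ub, bounds=bnds)
        if not res.success or -res.fun <= best_val:
            continue  # infeasible branch, or bound shows it cannot improve
        frac = next((i for i, v in enumerate(res.x)
                     if abs(v - round(v)) > 1e-6), None)
        if frac is None:                  # integral: candidate solution
            best_val, best_x = -res.fun, [round(v) for v in res.x]
        else:                             # branch on the fractional variable
            f = res.x[frac]
            lo, hi = bnds[frac]
            if math.floor(f) >= lo:
                down = list(bnds); down[frac] = (lo, math.floor(f))
                stack.append(down)
            if math.ceil(f) <= hi:
                up = list(bnds); up[frac] = (math.ceil(f), hi)
                stack.append(up)
    return best_val, best_x

# Tiny knapsack-style IP: maximize 5*x1 + 4*x2 s.t. 6*x1 + 4*x2 <= 10
val, x = branch_and_bound([5, 4], [[6, 4]], [10], [(0, 10), (0, 10)])
print(val, x)
```

Here the LP relaxation is fractional, so the algorithm branches on floor/ceiling bounds until an integral optimum is proven.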
3.2.1.3 The cutting-plane method
A cut, or cutting plane, for an integer programming problem is a new functional
constraint that reduces the feasible region of the LP relaxation without
removing any feasible solutions of the IP problem. This method works under the
philosophy that any IP can have several equivalent LP formulations, which translates
into different sets of linear inequalities describing the same set of integer points
(Hillier & Lieberman, 2016).
3.2.1.4 The branch-and-cut approach
The branch-and-cut approach is generally an augmentation of the branch-and-
bound algorithm and is commonly used for ILP and MILP. More specifically, the
bounding stage is enhanced by introducing additional constraints, known as cutting
planes. In each of the subproblems, these cutting planes reduce the feasible domain
of solutions for the continuous LP relaxation but are formulated so as not to exclude
integer solutions (i.e., solutions that are genuine candidates for the original
problem). In practice, these additional constraints can eliminate a considerable
number of branches, in addition to those that are eliminated through standard
bounding. In fact, state-of-the-art MILP solvers employ branch-and-cut approaches.
Chapter 4
____________________________________________
Case Study – Underground Gold
Project
In this section, the proposed methods are applied to historical data from a
mechanized underground gold mine to demonstrate the potential effectiveness of
artificial neural networks in the prediction of ROP. After training a neural net model
with the historical data, its predictions are used to reparametrize an ILP thereby
observing its impact on operational decision-making. The outline of the
underground operation and the details of the dataset and input parameters are
presented in the following subsections. Some labels corresponding to operational
parameters have been modified for confidentiality purposes.
4.1 Datasets
Operational data gathered from drilling and support activities in an underground
mechanized gold mine was used for training the artificial neural network (ANN).
The training data is composed of 10000 instances and 11 independent variables. The
dependent variable to be used as the target for the prediction task is the ROP. Table
4.1 presents the model features and target used in the predictive model.
Table 4.1 - Model features and target

Features: Team, Geology, Shift, Equipment, Rock Type, Opening Name, Activity, Drillings, Length of Drillings, Operator, Meters
Target: ROP (m/h)
Furthermore, a monthly plan developed by the mine planners was provided in order
to obtain the ROP predictions with the ANN approach. The drilling plan from a
specific part of the mine was extracted. More information on the datasets used for
both training and testing can be found in the following sections. Figure 4.1 presents
the proposed steps to be developed during this case study.
Figure 4.1 - Flowchart of the proposed approach
4.2 Underground operation outline
This mining project uses underground jumbo boomer drills to perform the drilling
activities in both the development and production openings of the project.
Conventional scoop-trams and dumpers are utilized in order to load and transport
the material from the underground mining operation to the processing plant.
However, the only activities within the operation that are relevant for this study are
those where jumbo boomer drills are directly involved. Furthermore, openings to be
developed are planned to be drilled and supported every shift in order to guarantee
the continuity of the mining production. Figure 4.2 shows the Boomer 282, a
hydraulically controlled face-drilling rig used within the mining project under
consideration.
Figure 4.2 – Boomer 282 (Epiroc, 2020)
4.2.1 Mine layout
The underground project begins with the opening of a single portal entrance,
followed by the development of the main ramp, which goes through weathered
saprolite and schist. As the depth of the ore body is reached and more competent
rock is encountered, the main ramp is partitioned into secondary development
ramps. Ventilation drifts, shafts, and ventilation raises are also constructed to
ensure a proper ventilation system within the excavation. In addition, permanent
galleries such as muck-bays, workshops, drill stations, and safety bays are dispersed
along the openings. Once the development areas for secondary development become
available, haulage drifts and attack ramps are excavated. Attack ramps are
designed to access the stopes, and many of them link to the sublevel development.
As the attack ramps are opened, the stopes are mined in accordance with the mining
method. Figure 4.3 shows the underground development design, as was previously
discussed.
Figure 4.3 – Mine layout (Red Eagle Mining, 2014)
4.2.2 Mining method
The mining method used in the underground project is the Mechanized Shrinkage
with Delayed Fill (MSDF). This method is similar to mechanized cut and fill,
with an additional breast blasting of the back between ore accesses. Furthermore,
instead of mucking the ore and backfilling immediately, the ore is left in place in the
stope, with only enough material cleaned from the stope to remove swell. For every ore
access, or lift, the attack ramp must be drilled and blasted in order to open new
access for the next lift. An important aspect of the MSDF is that, even though cleaning
is not mandatory, support plays a key role during the exploitation of the stopes for
each lift.
Figure 4.4 - MSDF: first and second lifts (Red Eagle Mining, 2014)
Figure 4.5 – Subsequent lifts and mucking (Red Eagle Mining, 2014)
After all the lifts have been developed, the ore is mucked out entirely, followed
by the bottom-up backfilling of the stopes and, finally, the backfilling of the attack
ramp.
4.2.3 Drilling operation
Operations are carried out over two twelve-hour shifts in which the different
activities programmed in the openings must be achieved to accomplish the goals
proposed for the shift, and consequently for the day and week. The drilling activities
are performed by trained and experienced operators. Each operator has one
assistant that helps during the entire shift. Available jumbos must perform drilling
activities in order to accomplish blasting or support in the openings. Not finishing
an activity affects not only what was proposed for the current shift but also all the
upcoming activities planned for the following shifts. In fact, failing to drill
an opening that is programmed to be blasted means leaving the opening inactive for
the entire following shift, since blasting only occurs at the end of each shift.
Consequently, blasting, more than any other factor, limits the progress of
development; thus, the time that an operator and their jumbo need to complete a
drilling task during a shift plays a crucial role when trying to achieve development
targets. Moreover, Campeau & Gamache (2019) pointed out that assuming that
tasks started at any of the openings of a project will be completed in a fixed
time only works when these activities are viewed from a long-term perspective. This
assumption does not hold for scenarios similar to the one presented in
this thesis, where the duration of the activities can vary significantly,
creating disparities between the planned and actual start and end times of the tasks.
4.3 Prediction of rates of penetration (ROP)
With the ultimate goal of enhancing the decision-making tool, predictions of ROP
values were obtained. In this case, the predictions were achieved after training an
artificial neural network (ANN) with historical data from the mine site. In
particular, data related to the drilling operations were gathered, preprocessed, and
used for training the AI-model. The following subsections outline the preprocessing
of the data as well as the different architectures of neural networks that were
implemented and tested to determine the optimal arrangement which obtains the
lowest error in the validation set.
4.3.1 Data preprocessing
The dataset with the information gathered from the mine site was first read into a
pandas DataFrame in order to compute basic statistics and understand the
subsequent inputs of the model. Figures 4.6 and 4.7 show portions of the correlation
matrix between features and labels using the Pearson method.
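A minimal sketch of this step, using hypothetical rows in place of the confidential mine data (the column names below only mimic the structure of the real features):

```python
import pandas as pd

# Hypothetical stand-ins for Rock Type, Drillings, and ROP
df = pd.DataFrame({
    "rock_type": [1, 2, 2, 3, 4, 4, 1, 3],
    "drillings": [75, 60, 58, 50, 40, 42, 73, 52],
    "rop_m_per_h": [0.62, 0.55, 0.54, 0.44, 0.35, 0.36, 0.60, 0.46],
})
corr = df.corr(method="pearson")  # Pearson is also pandas' default
print(corr["rop_m_per_h"].round(2))
```

On data like this, a strong negative correlation between rock type and ROP would show up directly in the corresponding matrix entry.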
Figure 4.6 - Correlation matrix, including operators and performance measures
Figure 4.7 - Correlation matrix, including shift, team, rock types and performance measures
Figures 4.6 and 4.7 show an important correlation between some jumbo operators
and their performance in terms of penetration rates and holes drilled per
activity. Moreover, the correlation matrix reveals a logical and strong correlation
between the number of drillholes, rock types, and geology classes, establishing how
important a proper classification of the geology and geomechanics of the rock is
for developing a production schedule.
The inputs, weights, and gradients were all implemented as TensorFlow tensors to
take advantage of the Graphics Processing Units (GPUs) available through Google
Colab for computational efficiency. In addition, the batches for SGD were selected
using TensorFlow's data preprocessing utilities, and the predict function took the
dataset under study as input. The dataset with the information gathered from the mine was
subsequently transferred to data loaders, where the iteration over batches of input
data was done automatically, in a memory- and speed-efficient manner. Even when
calculating the training error on all 10000 examples in the training set, the runs
never took more than 6 minutes.
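The batching described above is handled by the TensorFlow data pipeline; as a library-agnostic sketch of what that iteration does, assuming nothing beyond NumPy (array sizes mirror the 10000-instance, 64-feature dataset, but the arrays themselves are random placeholders):

```python
import numpy as np

def minibatches(X, y, batch_size, rng):
    """Yield shuffled (X, y) minibatches once per epoch, as done
    implicitly by the data pipeline described above."""
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 64))   # placeholder: 10000 instances, 64 features
y = rng.normal(size=(10000, 1))
n_batches = sum(1 for _ in minibatches(X, y, 100, rng))
print(n_batches)
```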
4.3.1.1 Batch-normalization
Batch normalization is widely used for optimizing artificial neural networks. Batch
normalization is a method of adaptive reparametrization applied to architectures
comprising an arrangement of multiple activation functions or layers. As
explained before, the gradient informs how to update each adaptable parameter under
the assumption that the remaining layers do not vary. In practice, however, all of the
layers are updated simultaneously, so unexpected results can occur because several
activation functions change at once, even though each gradient was computed while
holding the other functions constant.
For this implementation, batch normalization was applied to the batch of input
features described in the matrix 𝐻 as follows:
H′_ij = (H_ij − μ_j) / σ_j    (4.1)
in which the individual elements of 𝐻 are described by 𝐻𝑖𝑗, and 𝜇𝑗 and 𝜎𝑗 correspond
to the column means and standard deviations, respectively. The resulting elements
𝐻𝑖𝑗′ are assembled into the batch normalized matrix 𝐻′.
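Equation 4.1 can be implemented directly; the following NumPy version is an illustrative sketch (the thesis pipeline uses TensorFlow's built-in batch normalization, and the `eps` guard and the omitted trainable scale/shift are simplifications):

```python
import numpy as np

def batch_normalize(H, eps=1e-8):
    """Column-wise normalization of a batch matrix H (Equation 4.1).
    eps guards against a zero standard deviation."""
    mu = H.mean(axis=0)      # column means, mu_j
    sigma = H.std(axis=0)    # column standard deviations, sigma_j
    return (H - mu) / (sigma + eps)

H = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
H_prime = batch_normalize(H)
print(H_prime.mean(axis=0), H_prime.std(axis=0))  # ~0 and ~1 per column
```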
Figure 4.8 - Left: ROP histogram. Right: normalized ROP histogram
4.3.1.2 One-Hot Encoding
One-hot encoding was used to handle the categorical features in this implementation,
avoiding the spurious ordinal importance that would otherwise be assigned to larger
category codes, a common problem with label encoding. The one-hot representation can
be captured by a binary vector with n mutually exclusive bits (only one is allowed to
be active). In this type of encoding, the representations contain many entries, but
without significant meaningful separate control over each entry (Goodfellow et al.,
2016; Nair & Hinton, 2009).
The one-hot code vectors can be defined with the variable c, where c_y = 1 and c_i =
0 for all other values of i. The one-hot code provides some statistical benefits when
treating all instances within a similar cluster, and it offers a computational
improvement when an entire representation can be captured by a single integer.
During the implementation, binary columns were created for each category, with the
categories derived from the unique values in each feature.
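As an illustration of this step, pandas provides `get_dummies` for exactly this transformation; the team values below are hypothetical stand-ins for the confidential data:

```python
import pandas as pd

# Hypothetical sample with a categorical 'team' attribute, as in Figure 4.9
df = pd.DataFrame({"team": ["A", "B", "C", "A"],
                   "meters": [3.3, 6.6, 6.6, 3.3]})
encoded = pd.get_dummies(df, columns=["team"])  # one binary column per category
print(encoded.columns.tolist())
```

Each row of `encoded` now has exactly one active team column, matching the mutually exclusive one-hot vectors described above.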
Figure 4.9 - Left: sample from training data. Right: sample from training data after one-hot encoding of the team attribute
4.3.2 Testing different architectures of neural networks
4.3.2.1 Neural network with one hidden layer
First, as a baseline, a feedforward neural network class and its methods were
written. The first feedforward neural network implemented has a single hidden
layer and an adjustable number of hidden units. The inputs correspond to the 64
features resulting after normalizing and one-hot encoding the dataset. Parameters
used in the model are defined in Figure 4.10, with the dimensions defined in Table
4.2. The final layer of the feedforward neural net presented in Figure 4.10 includes
one neuron in charge of returning a continuous numerical value. In the following
section, it will be possible to understand the accuracy of the prediction by comparing
it with the true value, corresponding to the penetration rate, which is indeed
continuous.
Figure 4.10 – Architecture and variables
The ReLu activation on the hidden layer was chosen because of its well-behaved
gradient and proven robustness (Shalev-Shwartz & Ben-David, 2014). More
traditional choices of activation function such as the sigmoid have gradients that
saturate at large activations, which impedes a gradient descent based optimization
algorithm from working efficiently. In the following sections, the behavior of the
different activation functions will be observed, as well as the different
hyperparameters involved in the model.
Table 4.2 - Dimension of parameters for the ANN. Individual parameters are referenced by the dimension letter in lowercase as a subscript

| Parameter | Dimensions | Value |
| Input, x | D features + bias | 64 |
| Input weights, W | M × (D+1) | M × 64 |
| Hidden units, z | M units + bias | M + 1 |
| Hidden weights, V | C × (M+1) | 1 |
| Output, ŷ | Continuous output, C | 1 |
4.3.2.2 Back-propagation
By adapting the results of Section 3.1.4, the gradient of the loss 𝐸 with respect to
the parameter matrix 𝑉 is given by:
∂E/∂V_{C×M} = (∂E/∂ŷ_C) (∂ŷ_C/∂u_C) (∂u_C/∂V_{C×M})    (4.2)
The individual partial derivatives can be calculated by first evaluating the values of
each unit in the network at the current epoch in a "forward pass", then substituting
these values into the analytic forms for the partial derivatives. The analytic form of
the partial derivatives depends on the activation functions chosen at each layer and
the loss. With the linear activation on the output layer, the partial derivative of the
loss function 𝐸 with respect to 𝑉𝐶×𝑀 can be computed as follows:
For regression: ŷ = g(u) = Vz, with loss E(y, ŷ) = ½ ‖y − Vz‖₂²

Then, by substituting and calculating the derivative:

E(y, z) = ½ ‖y − Vz‖₂²    (4.3)

∂E/∂V_{C×M} = (ŷ_C − y_C) z_M    (4.4)
For numerical stability, the maximum value of the linear input u was subtracted
from all u_C, which does not change the output but prevents overflows. The partial
derivative of the loss with respect to W can be derived similarly; Equation 4.6
presents it, involving the partial derivative of the ReLu activation function.
∂E/∂W_{M×D} = (∂E/∂ŷ_C) (∂ŷ_C/∂u_C) (∂u_C/∂z_M) (∂z_M/∂q_M) (∂q_M/∂W_{M×D})    (4.5)

The first two factors were already computed in Equation 4.4. Note that:

∂ReLu(q_M)/∂q_M = { 0 if q_M ≤ 0;  1 if q_M > 0 }

Then,

∂E/∂W_{M×D} = Σ_C (ŷ_C − y_C) V_{C×M} · ∂ReLu(q_M)/∂q_M · x_D    (4.6)
where the derivative of the ReLu activation is zero for inputs less than or equal to
zero and one for inputs greater than zero. Note that the partial derivatives of the
loss with respect to the parameters 𝑉 are part of this equation. This aspect makes
back-propagation efficient: the required values for each step of the calculation are
pre-stored.
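A NumPy sketch of the forward pass and of the gradients in Equations 4.4 and 4.6, verified against a finite-difference approximation (symbols follow Table 4.2; the bias terms are omitted for brevity, so this is illustrative rather than the thesis implementation):

```python
import numpy as np

def forward(x, W, V):
    q = W @ x                      # hidden pre-activation
    z = np.maximum(q, 0.0)         # ReLu
    y_hat = V @ z                  # linear output (regression)
    return q, z, y_hat

def gradients(x, y, W, V):
    """Backward pass for E = 0.5 * ||y - Vz||^2 (Equations 4.4 and 4.6)."""
    q, z, y_hat = forward(x, W, V)
    delta = y_hat - y              # dE/du at the linear output layer
    dV = np.outer(delta, z)        # Equation 4.4
    relu_grad = (q > 0).astype(float)
    dW = np.outer((V.T @ delta) * relu_grad, x)  # Equation 4.6
    return dW, dV

rng = np.random.default_rng(1)
D, M, C = 5, 4, 1
x, y = rng.normal(size=D), rng.normal(size=C)
W, V = rng.normal(size=(M, D)), rng.normal(size=(C, M))
dW, dV = gradients(x, y, W, V)

# Finite-difference check on one entry of W
eps = 1e-6
Wp = W.copy(); Wp[0, 0] += eps
E = lambda W_: 0.5 * np.sum((y - forward(x, W_, V)[2]) ** 2)
err = abs((E(Wp) - E(W)) / eps - dW[0, 0])
print(err)  # small: analytic and numerical gradients agree
```

The finite-difference check is a standard way to validate a hand-written back-propagation before training.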
4.3.2.3 Performance and adjustment of
hyperparameters
Initially, the model was predicting only ones or zeros as outputs for every input;
this was solved by applying Xavier initialization (Glorot & Bengio, 2010) to the
initial weights, which were drawn from a normal distribution with mean zero and
standard deviation 1/√(D+1). This roughly conserves the variance of the inputs
as they propagate through the layers, improving performance and helping to prevent
numerical issues.
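The initialization described above can be sketched as follows (D = 64 and 1500 hidden units are taken from the tuned model; the extra bias column is an assumption about how the weights were arranged):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64                                         # input features after encoding
std = 1.0 / np.sqrt(D + 1)                     # scale used in the thesis
W = rng.normal(0.0, std, size=(1500, D + 1))   # 1500 hidden units, bias column
print(W.std())  # empirically close to 1/sqrt(65)
```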
A stochastic gradient descent (SGD) algorithm was written as a baseline to perform
the weight adjustments. Within each epoch, the following updates are performed:
W ← W − τ · dW    (4.7)

V ← V − τ · dV    (4.8)
where 𝜏 is the learning rate and 𝑑𝑊 and 𝑑𝑉 are the gradients of the chosen cost
function calculated by the back-propagation algorithm. Since the task is a
regression, the mean square error (MSE) was used as the loss function, averaged
over a batch of 100 sampled training instances. The weights were adjusted until the
error on the validation set did not change by more than 0.01 for three iterations or
until the maximum number of iterations was reached. The early stopping helped to
prevent overfitting and wasteful use of computation time. The baseline optimizer
was compared to four widely used algorithms that extend SGD with variants such as
adaptive learning rates and momentum. AdaGrad, or adaptive gradient, is an optimizer
that adapts the learning rate of each parameter being updated. Learning rates are
controlled in AdaGrad by dividing a base learning rate by the square root of the sum
of the parameter's past squared gradients. Consequently, learning rates corresponding
to parameters with high gradients decrease rapidly, while learning rates
corresponding to parameters with low gradients decay more slowly (Duchi et al.,
2011). AdaDelta is an extension of AdaGrad in which decay is applied over the
learning rate without any manual adjustment by the user; only first-order
information is employed to dynamically adjust the learning rate (Zeiler, 2012).
Adam, or adaptive moments, employs the principle of root mean square
propagation (Tieleman et al., 2012) with the addition of momentum. In Adam,
exponentially decaying averages of both the historical gradients and the historical
squared gradients are employed (Kingma et al., 2014). Finally, the Nadam optimizer
works under the same principle as Adam, with the only difference being the
incorporation of Nesterov momentum (Dozat, 2016).
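These optimizers are available off the shelf in frameworks such as Keras; purely to illustrate the adaptive-moment idea, a bare-bones Adam step on a toy quadratic objective might look like this (hyperparameter defaults follow Kingma et al., 2014; the objective f(w) = w² is hypothetical):

```python
import numpy as np

def adam_step(w, grad, m, v, t, tau=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponentially decaying averages of the gradient (m)
    and squared gradient (v), with bias correction."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)          # bias-corrected second moment
    w = w - tau * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimizing f(w) = w^2, so grad = 2w
w, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * w, m, v, t)
print(w)  # ends near the minimizer at 0
```

Because the step is normalized by the square root of the second moment, the effective step size stays close to τ regardless of the raw gradient magnitude.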
The performance of the five optimizers was assessed during the training of three
different neural network architectures, and the progression of the loss is presented
in Figures 4.11, 4.12, and 4.13.
Figure 4.11 - Comparison of training loss evolution with different optimizers for a one hidden layer model
Figure 4.12 - Comparison of training loss evolution with different optimizers for a two hidden layers model
Figure 4.13 - Comparison of training loss evolution with different optimizers for a three hidden layers model
As outlined above, five different optimizers were compared: SGD, AdaGrad,
AdaDelta, Adam, and Nadam. For each optimizer, three networks (with one,
two, or three hidden layers) were each trained ten times for 50 epochs with the same
combination of parameters. Overall, one can note that the best training performance
regarding loss values and convergence is achieved by Adam.
Moreover, after selecting Adam as the optimizer for the model, the training
performance for different batch sizes (16, 32, 64, and 128) was evaluated, and the
results are presented in Figures 4.14 and 4.15.
Figure 4.14 - Comparison of training loss evolution with different batch sizes for a two hidden layers model
Figure 4.15 - Comparison of training loss evolution with different batch sizes for a three hidden layers model
Larger batch sizes yielded lower errors and faster convergence, likely because each
update draws information from more examples. However, the larger the batch size,
the longer each optimization step takes and the more GPU memory is used. As a
consequence, a batch size of 64 was chosen for all subsequent experiments. The
following parameters were tuned by implementing a grid search optimization:
Table 4.3 - Combinations during hyperparameter optimization

| Parameter | Options | Best Performance |
| Activation Function | ReLu, Sigmoid, Tanh | ReLu |
| Number of Units | 100, 200, 500, 1000, 1500, 2000, 3000, 5000 | 1500 |
| Optimizer | Adam, AdaGrad, AdaDelta, SGD, Nadam | Adam |
| Batch Normalization | True or False | True |
| Batch Size | 16, 32, 64, 128 | 64 |
The validation loss decreased with more hidden units (a more expressive model).
The danger with expressivity is that the model will overfit; however, implementing
early stopping helped to prevent this eventuality. Moreover, it is common wisdom
in machine learning that increasing the depth is more beneficial than increasing the
width, so one can be wary of creating more hidden units than input features
(Goodfellow et al., 2016). However, Figure 4.13 shows that augmenting the depth of
the model produces more hyperparameters to be optimized, and therefore more
computational power is required. Additionally, it was shown that applying
batch normalization not only to the input batches but also to all the hidden layers
produced better loss values, increasing the learning speed and the stability of the
model. Table 4.4 presents the performance of four different architectures, where
the model with three hidden layers provides better results in terms of both training
and validation.
With the tuned hyperparameters, the evolution of the training and validation losses
is shown in Figure 4.16. The training and validation losses remain close, so there is
no evidence of overfitting. Note that the training loss is computed over the entire
10000 training instances.
Table 4.4 - Comparison with different neural network architectures

| Hidden layers | val_mse | val_mape | mse | mape | Activation | Batch Size | Last Activation | Loss | Optimizer |
| 1 | 0.000889 | 198.45 | 0.000166 | 1266.38 | relu | 32 | linear | mse | Adam |
| 2 | 0.000917 | 178.26 | 0.000174 | 19198.97 | relu | 64 | linear | mse | Adam |
| 3 | 0.000141 | 228.68 | 0.000125 | 12184.30 | relu | 64 | linear | mse | Adam |
| 4 | 0.000152 | 209.53 | 0.000134 | 14922.94 | relu | 64 | linear | mse | Adam |

Figure 4.16 - Training and validation performance
4.3.3 Performance of the final architecture on the testing
set
The optimal artificial neural network model obtained in the previous subsection was
tested with a monthly drilling plan provided by the mine planners of the project
with the drilling tasks distributed between shifts. The drilling plan used for testing
the model includes information such as the geometry of the opening; the activity to
be performed, whether drilling for blasting (i.e., for advancing the opening) or
only for support; the operator performing the task; the equipment used for the
activity; the corresponding geology and rock type of the rock to be drilled; and the
shift, whether day or night. This information is illustrated in Table 4.5.
Table 4.5 - Drilling plan example for one day of operations

| Team | Geology | Shift | Equipment | Rock Type | Opening Name | Activity | Drillings | Length of Drillings | Operator | Meters | ROP (m/h) |
| C | 2 | DAY | A | 2 | AR25 | ADVANCE | 54 | 3.6 | OP_1 | 3.3 | 0.55 |
| C | 2 | DAY | B | 1 | SR09 | ADVANCE | 75 | 4.2 | OP_2 | 6.6 | 0.38 |
| B | 2 | DAY | C | 1 | SR10 | ADVANCE | 75 | 4.2 | OP_3 | 6.6 | 0.38 |
| B | 4 | DAY | A | 2 | AR12 | ADVANCE | 54 | 3.6 | OP_4 | 3.3 | 0.55 |
| A | 4 | NIGHT | A | 3 | AR05 | ADVANCE | 54 | 3.6 | OP_5 | 3.3 | 0.55 |
| A | 2 | NIGHT | B | 2 | AR25 | SUPPORT | 54 | 3.6 | OP_7 | 3.3 | 0.55 |
| A | 2 | NIGHT | C | 3 | AR23 | ADVANCE | 54 | 3.6 | OP_9 | 3.3 | 0.55 |
| A | 2 | NIGHT | C | 1 | AR13 | ADVANCE | 54 | 3.6 | OP_8 | 3.3 | 0.55 |
Figure 4.17 - Features and label for the ANN model
Figure 4.17 also clarifies which independent variables are used as input
features within the model and which dependent variable is used as the target for
the prediction task. It is important to mention that the feature denominated
‘opening name’ indicates the geometry of the opening, where for instance, SR and
MB stand for secondary ramps and muck bays with dimensions of 5m x 5m. The
rock type corresponds to the rock mass rating (RMR) from one to four, where one
stands for very good rock, while four corresponds to very poor rock. Team stands
for the pair performing the drilling task, referring to the operator using the
drilling machine and the assistant. As in the training set, each of the
examples is linked to the length of the drillings made during the drilling task as
well as the number of drillings, which is directly related to the dimension of the
openings. Moreover, the meters planned per shift give a reference for what the
planners consider as ROP values for each of the tasks. The ROP values in this initial
drilling plan are no more than constant values, since the meters planned for the
secondary and attack ramps are always constant as well. These values are used by
the mine planners as references and are applied to every single operator, machine,
type of rock, etc. Finally, the drilling plan provides the number of meters planned
for each specific opening for each of the shifts.
Consequently, with the ANN model already trained, part of the monthly drilling
plan was used as the testing set. The loss value found on the test set was 0.000169.
After predicting the ROP with the neural net, predicted meters were computed for each
of the openings scheduled in the monthly plan. Comparing both sets of ROP values
shows that the predicted ROP values diverge from the ones scheduled initially in
the project. It is important to note that the predictions are calculated based on
historical data collected in the underground mine, so the predicted ROP values are
adjusted to the reality of the daily operations. Unlike predictions obtained
with traditional regression methods, these ANN predictions are able to model the
nonlinear relationships present in the training data.
Moreover, once the predictions for all the available combinations are obtained, the
decision-maker can start assessing the combination of operational variables that
will maximize the targets within the drilling plan. Additionally, different
decisions can be taken based on the predictions by focusing on how to reorganize
the distribution of operational variables to achieve the additional meters
that the model is predicting. On the other hand, it is also important to focus on the
openings for which the model predicts higher ROP values than the conventional plan,
since these areas may have been assigned more operational resources than needed,
resources that could be used in the more critical parts of the mine. In addition,
these predictions can be strategically applied to a wide range of activities within
the preparation of drilling plans. In cases where the mine planners assume a constant
value, as is the case for the type of rock or the geology of the opening, the
proposed model can be useful for determining the real performance to be achieved
for the different types of rocks or geologies that can be found during the
development of the openings.
4.4 Incorporation of predictions into an integer
program
4.4.1 Integer program formulation
The model presented in this section is formulated as an integer linear programming
problem (ILP) that optimizes the planning of drilling activities while considering
historical data from an underground mining operation. The objectives are to
maximize the drilled meters per day and generate a more realistic drilling plan that
can be accomplished with greater confidence by taking into consideration historical
data.
In the following subsections, indices, parameters, and decision variables are defined.
Subsequently, the ILP formulation, including the objective function and operational
constraints, is introduced, followed by the solution approach.
4.4.1.1 Indices
s is a shift, s = 1, …, S
o is an operator, o = 1, …, O
m is a drilling machine, m = 1, …, M
t is an opening, t = 1, …, T
a is an activity, a = 1, …, A

4.4.1.2 Model parameters
D_s is the demand, in terms of the number of openings t to be drilled, during shift s
L_t is the length of the opening t to be drilled
d_{t,s,a}^{o,m} is the duration of performing activity a at opening t using drilling machine m during shift s by operator o
U is the duration of a shift s
r_t^{soft} is a binary parameter that is equal to 1 if opening t is of soft rock, and 0 otherwise
P is the portion of soft-rock openings to be drilled

4.4.1.3 Decision variables
x_{t,s,a}^{o,m} = 1 if operator o uses machine m to drill opening t during shift s for activity a, and 0 otherwise
y_s^o = 1 if operator o works during shift s, and 0 otherwise
4.4.1.4 Objective function – Maximize:

Σ_{o=1}^{O} Σ_{t=1}^{T} Σ_{m=1}^{M} Σ_{s=1}^{S} Σ_{a=1}^{A} L_t · x_{t,s,a}^{o,m}    (4.8)
The objective function maximizes the drilled meters during a whole day of
drilling operations. Similar to what is presented in Campeau & Gamache (2019),
the objective function developed in this thesis does not consider any maximization
in terms of money, since at this short and specific level of planning (one month), the
main economic decisions have already been taken. Moreover, the available resources,
such as drilling machines and operators, are already determined, the mining
method has been chosen, and the layout of the underground project should not be exposed
to any alteration. Therefore, changes at a scale that could influence the revenue of the
underground mining project are few and not representative. In addition, the mining
method used by the project implies multiple blastings along the
production openings without requiring the material to be mucked out immediately,
and only one blasting is scheduled for the development openings (see the following
section). Therefore, accomplishing the targets proposed by the mine planners, in
terms of openings to be drilled during the available time, is the goal that this thesis
aims to achieve.
4.4.1.5 Model constraints
(M · T · A) y_s^o ≥ Σ_{m=1}^{M} Σ_{t=1}^{T} Σ_{a=1}^{A} x_{t,s,a}^{o,m}    ∀ o, s    (4.9)

y_s^o + y_{s+1}^o ≤ 1    ∀ o ∈ O and ∀ s = 1, …, (S − 1)    (4.10)

Σ_{o=1}^{O} Σ_{m=1}^{M} Σ_{s=1}^{S} x_{t,s,a}^{o,m} ≤ 1    ∀ t, a    (4.11)

Σ_{o=1}^{O} Σ_{m=1}^{M} Σ_{a=1}^{A} Σ_{t=1}^{T} x_{t,s,a}^{o,m} ≥ D_s    ∀ s    (4.12)

Σ_{t=1}^{T} Σ_{a=1}^{A} Σ_{m=1}^{M} d_{t,s,a}^{o,m} · x_{t,s,a}^{o,m} ≤ U    ∀ s, o    (4.13)

Σ_{o=1}^{O} Σ_{t=1}^{T} Σ_{a=1}^{A} d_{t,s,a}^{o,m} · x_{t,s,a}^{o,m} ≤ U    ∀ s, m    (4.14)

Σ_{o=1}^{O} Σ_{s=1}^{S} Σ_{a=1}^{A} Σ_{t=1}^{T} r_t^{soft} · x_{t,s,a}^{o,m} ≤ P · Σ_{o=1}^{O} Σ_{s=1}^{S} Σ_{a=1}^{A} Σ_{t=1}^{T} x_{t,s,a}^{o,m}    ∀ m    (4.15)

x_{t,s,a}^{o,m} ∈ {0, 1}    ∀ o, m, t, s, a    (4.16)

y_s^o ∈ {0, 1}    ∀ o, s    (4.17)
Constraints 4.9 and 4.10 ensure that an operator cannot work two consecutive shifts: constraint 4.9 links the drilling variables to each operator's working status, and constraint 4.10 forbids an operator from working consecutive shifts. Constraint 4.11 enforces that each activity on each opening can be performed at most once. Constraint 4.12 sets the demand for openings that must be drilled; this lower bound ensures that the openings planned to be advanced meet the needs of the project shift by shift, despite any deviation that might occur during the operation. Constraints 4.13 and 4.14 ensure that operators and drilling machines, respectively, cannot work longer than the duration of the shift. Limiting the working periods of both operators and machines allows the planned development operations to be accomplished while respecting the shift deadlines for blasting and bolting, among other activities that must be completed on time so as not to compromise future work. Constraint 4.15 guarantees a distribution of operations between soft and hard rocks; it avoids situations in which the machines and operators with the best performances are always scheduled to drill openings in soft rock, which would certainly yield more drilled meters but is not what the operation requires. Constraints 4.16 and 4.17 are the variable bounds. Note that the soft and hard rock classification used in this formulation is based on the RMR values forecasted by the mine planners for the openings to be drilled: rocks with an RMR below 35 were classified as soft, and those above 35 as hard. All the drilling machines are assumed to have the same specifications; their characteristics are outlined in Section 4.2. In addition, neither degradation of the equipment nor wear of the bit is considered. The parameter defining the duration of each activity is obtained using the ANN.
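The logic of this formulation can be sanity-checked by brute-force enumeration on a tiny instance before handing the full-scale problem to a commercial solver. The sketch below is illustrative only: every dimension and parameter (meters per opening, durations, shift length, demands) is invented rather than taken from the case study, constraint 4.15 (the soft/hard split) is omitted for brevity, and y is derived directly as the operator-working indicator, which is exactly what the big-M constraint 4.9 enforces at optimality.

```python
from itertools import product

# Tiny hypothetical instance (all numbers invented, not from the case study).
O, M, T, S, A = 2, 1, 2, 2, 1          # operators, machines, openings, shifts, activities
L = [4.0, 6.0]                          # planned meters per opening t
d = 2.0                                 # drilling duration (h) for any (o, m, t, s, a)
U = 8.0                                 # shift length in hours
D = [1, 0]                              # minimum openings to drill per shift

keys = list(product(range(O), range(M), range(T), range(S), range(A)))

def feasible(x):
    # y_s^o = 1 iff operator o drills anything in shift s (what constraint 4.9 enforces)
    y = {(o, s): int(any(x[o, m, t, s, a] for m in range(M)
                         for t in range(T) for a in range(A)))
         for o in range(O) for s in range(S)}
    # 4.10: no operator works two consecutive shifts
    if any(y[o, s] + y[o, s + 1] > 1 for o in range(O) for s in range(S - 1)):
        return False
    # 4.11: each (opening, activity) pair drilled at most once
    for t in range(T):
        for a in range(A):
            if sum(x[o, m, t, s, a] for o in range(O)
                   for m in range(M) for s in range(S)) > 1:
                return False
    # 4.12: per-shift drilling demand
    for s in range(S):
        if sum(x[o, m, t, s, a] for o in range(O) for m in range(M)
               for t in range(T) for a in range(A)) < D[s]:
            return False
    # 4.13 / 4.14: operator and machine working time within the shift
    for s in range(S):
        for o in range(O):
            if sum(d * x[o, m, t, s, a] for m in range(M)
                   for t in range(T) for a in range(A)) > U:
                return False
        for m in range(M):
            if sum(d * x[o, m, t, s, a] for o in range(O)
                   for t in range(T) for a in range(A)) > U:
                return False
    return True

best, best_x = -1.0, None
for bits in product([0, 1], repeat=len(keys)):  # 2^8 candidate schedules
    x = dict(zip(keys, bits))
    if feasible(x):
        obj = sum(L[t] * x[o, m, t, s, a] for (o, m, t, s, a) in keys)
        if obj > best:
            best, best_x = obj, x

print(best)  # maximum drilled meters on the toy instance
```

On this toy instance the optimum drills both openings in the first shift, for 10 drilled meters. The real problem, with thousands of variables, requires branch-and-cut as provided by CPLEX.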
4.4.2 Evaluating the impact of the ANN predictions in the
ILP results
The integer programming model described in Section 4.4.1 is solved using the branch-and-cut algorithm implemented in IBM's CPLEX package (Studio version 12.7.1.0) in a Google Colab (Python) environment. The final solution is obtained in nearly 2.5 hours.
Table 4.6 – Computational performance of the proposed model: number of decision variables, number of constraints, and solution time

Variables    Constraints    Time required to solve the problem
11566        53245          2.5 hours
Table 4.7 – Cumulative results for the case study

Scenario                          Objective (m)
Original Plan (S1)                481.95
New plan + No predictions (S2)    486.49
New plan + Predictions (S3)       528.90
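From the cumulative objective values in Table 4.7, the relative gain of the prediction-informed schedule over the two baselines follows directly:

```python
# Cumulative objective values (drilled meters) from Table 4.7.
obj = {"S1": 481.95, "S2": 486.49, "S3": 528.90}

# Relative improvement of the prediction-informed plan (S3) over each baseline.
gain_vs_s1 = (obj["S3"] - obj["S1"]) / obj["S1"] * 100
gain_vs_s2 = (obj["S3"] - obj["S2"]) / obj["S2"] * 100

print(f"S3 vs S1: +{gain_vs_s1:.1f}% drilled meters")
print(f"S3 vs S2: +{gain_vs_s2:.1f}% drilled meters")
```

Incorporating the ROP predictions thus yields roughly a 9.7% gain over the original plan and 8.7% over the optimized plan without predictions.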
Figure 4.18 – Results of the case study
The three scenarios used to simulate and compare the effectiveness of the different drilling plans, together with their optimization results, are shown in Figure 4.18. The first scenario corresponds to the initial drilling plan developed by the mine planners, in which the meters planned for each opening and the time each task will take are treated as constant. The second scenario results from optimizing over all the possible combinations of the available variables, but without accounting for the predicted ROP values. The third scenario is given by the same optimization model, but in this case accounting for the ROP values predicted from the historical operational data.
Scenario 2 provides better objective values than Scenario 1 on some days, but the distribution of the variables is still not optimal. Moreover, the solution value for the third scenario is significantly higher than those of the other two scenarios. This can be explained by the fact that the third scenario incorporates ROP predictions based on historical data, meaning that the performances used to build the drilling plans, in terms of meters per shift, are closer to the reality of the operation. It was shown earlier that the
predictions obtained with the ANN model performed outstandingly on the testing set. Although more detailed independent variables, such as the state or availability of the equipment or the geomechanical characteristics of the rocks, carry considerably more information than the equipment and geology features used here, future studies could integrate these parameters in order to achieve even higher performance in the prediction tasks.
Chapter 5
_________________________________________________
Conclusions and Future Work
5.1 Conclusions and objectives met
This thesis has shown that, when the proposed model is tested on a scenario that incorporates ROP predictions based on historical data, the meters drilled per day in an underground mining project can be increased. These results suggest that a more accurate drilling plan can be generated while mitigating the deviations that plans created by mine planners usually present when sufficient operational information is unavailable or unaccounted for. Moreover, the model demonstrates that operational information is valuable for building plans that distribute resources optimally and thereby outperform conventional plans. Below, the outcomes of this thesis are stated in terms of the pertinent literature review performed and the initial objectives set out in Section 1.1.
1. Review the literature pertaining to the applications of artificial intelligence
algorithms in the mining industry.
A literature review was presented on the topics pertinent to the development of methods for predicting rates of penetration, specifically AI-based methods and their benefits for this practice. One of the main observations in Sections 2.1 and 2.2 was the inherent limitation of traditional models: these techniques are unable to capture the complex, nonlinear relationships between crucial features, and therefore often cannot approach the prediction task efficiently. AI-based models were introduced to overcome these drawbacks and, over time, have been shown to improve prediction accuracy. The review of mathematical programming models for optimizing short-term activities in Section 2.2 reveals a lack of applications of these approaches in underground mines. Very few works have been developed, and those that exist fail to address the challenges of scheduling the different activities of all the mining methods used in underground operations. Therefore, a crucial emphasis of future work on optimizing short-term activities in underground projects should be the development of models that account for the different underground mining methods, their corresponding mining cycles, and their variants in terms of operations and equipment.
2. Predict the rates of penetration (ROP) corresponding to underground
drilling machines using an artificial neural network (ANN) approach.
Different ANN architectures were implemented and tested to predict rates of penetration from a training set of operational features from an underground mining project. An optimal architecture was found with 3 hidden layers, 1500 hidden units, Adam as the optimizer, ReLU as the activation function, batch normalization in all layers, and a batch size of 64. A validation error of 0.000125 was obtained with the resulting architecture, and the model was then used on a monthly drilling plan provided by the underground mining project to predict the rates of penetration corresponding to the available input features.
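A rough count of the trainable parameters gives a sense of this network's capacity. Both the input dimension (taken here as 10 features) and the reading that the 1500 hidden units apply to each of the 3 hidden layers are hypothetical assumptions, since the exact feature count is not restated in this chapter:

```python
# Hypothetical dimensions: 10 input features (assumed), 1500 units assumed
# per hidden layer, and a single ROP output.
n_in, n_hidden, n_layers, n_out = 10, 1500, 3, 1

params = 0
prev = n_in
for _ in range(n_layers):
    params += prev * n_hidden + n_hidden   # dense weights + biases
    params += 2 * n_hidden                 # batch-norm trainable scale and shift
    prev = n_hidden
params += prev * n_out + n_out             # linear output layer

print(params)  # 4530001 under these assumptions
```

Most of the roughly 4.5 million parameters sit in the two 1500 × 1500 hidden-to-hidden layers, which helps explain why batch normalization and Adam are useful for training a network of this width.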
3. Develop an integer programming (IP) formulation for the optimization of
the drilling activities.
A formulation was proposed to optimize the meters drilled per day in an underground mining project while accounting for operational constraints and incorporating the predicted rates of penetration. The proposed approach incorporates operational constraints such as preventing the best machines from always working in soft openings and varying the kinds of activities that a machine performs during a day, in order to realistically address problems encountered in the operation. The model was parametrized to represent a mechanized underground gold project, and the resulting problem was solved in nearly 2.5 hours using a general-purpose solver (CPLEX).
4. Demonstrate that, by running the IP optimization model for scenarios with and without the incorporation of the ROP predictions, it is possible to evaluate and determine whether these predictions should be implemented and included in the computation of future drilling plans.
Both the artificial neural network predictions for the ROP and the integer programming problem developed in Section 4.4 are applied to a realistic mining case study in order to demonstrate how the impact of an ANN predictor of ROP on operational decision-making can be evaluated. The optimizer is able to efficiently generate schedules of drilling activities that are closer to the reality of the operation and can take advantage of individual performances within the different types of openings in terms of rock types, geologies, and dimensions. The resulting formulation is compared to a more traditional approach that uses averaged inputs for operator/machine performances. The incorporation of ANN predictions of ROP tends to increase the effectiveness of assigning activities to the different resources and thus to maximize the meters drilled per day within the operation.
5.2 Future work
The formulation presented in this thesis is successful in maximizing the meters drilled per day for an underground mining operation while accounting for operational constraints and incorporating predicted ROP values. However, the approach presented is not without limitations.
Both the ANN approach and the optimization model, developed in Chapters 3 and 4, present three crucial avenues for future work. First, a mine can be seen as a dynamic system that undergoes variations and changes after every shift; rotation between operators, retirements, resignations, and constant hiring make this system even more challenging over time. Likewise, the proposed optimization formulation needs to be re-optimized as new data become available. Both the predictions and the optimization model therefore require continuous updates, and different reinforcement learning approaches can help to continuously account for new data while delivering the objectives that a decision-maker might require. Second, information regarding equipment availability would also be worth including in the proposed formulation and would help to build even more accurate drilling plans. Third, the scope of the case study presented in Chapter 4 addresses the complexity of only one part of the entire underground operation; moreover, the MSDF allows only drilling activities to be modeled. A larger case study accounting for more activities would therefore be worthwhile to investigate in order to measure the predictive power of the tool as well as the time required to obtain an optimal coordination of drilling with other critical mining operations.
References
Amar, K., & Ibrahim, A. (2012). Rate of penetration prediction and optimization
using advances in artificial neural networks, a comparative study. In:
Proceedings of the 4th international joint conference on computational
intelligence. p.647–52.
Al-AbdulJabbar, A., Elkatatny, S., Mahmoud, M., Abdelgawad, K., & Al-Majed, A.
(2018). A robust rate of penetration model for carbonate formation. ASME. J.
Energy Resour. Technol.
Al-Jalil, Y. A. (1998). Analysis of Performance of Tunnel Boring Machine-based
Systems. The University of Texas at Austin, 427 p.
Arabjamaloei, R., & Shadizadeh, S. (2011). Modeling and optimizing rate of
penetration using intelligent systems in an Iranian southern oil field (Ahwaz
oil field). Pet. Sci. Technol. 29:1637–48.
Armaghani, D., Mohamad, E., & Narayanasamy, M. (2017). Development of hybrid intelligent models for predicting TBM penetration rate in hard rock condition. Tunn Undergr Sp Technol 63: 29–43.
Awuah-Offei, K. (2016). Energy efficiency in mining: a review with emphasis on the
role of operators in loading and hauling operations. Journal of Cleaner
Production, 1-9.
Barton, N. (1999). TBM performance estimation in rock using QTBM. Tunnels and
Tunneling International 31 (9), 30–33.
Barton, N. (2000). TBM Tunneling In Jointed and Faulted Rock. Balkema,
Rotterdam, 173.
Bataee, M., & Mohseni, S. (2011). Application of artificial intelligent systems in ROP
optimization: a case study in Shadegan oil field. Paper presented in the SPE
Middle East Unconventional Gas Conference and Exhibition, Muscat, Oman.
Society of Petroleum Engineers. 31 January-2 February. SPE-140029-MS.
Benardos, A., & Kaliampakos, D. (2004). Modelling TBM performance with artificial
neural networks. Tunn Undergr Sp Technol 19:597–605.
Bilgesu, H., Tetrick, L., Altmis, U., Mohaghegh, S., & Ameri, S. (1997). A new
approach for the prediction of rate of penetration (ROP) values. Paper
presented at the SPE Eastern Regional Meeting, Lexington, Kentucky, 22–
24 October. SPE-39231-MS. https://doi.org/10.2118/39231-MS.
Bingham, M. (1965). A new approach to interpreting rock drillability. Petroleum
Publishing Company.
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. volume 4 of
Information science and statistics. Springer.
Bishop, C. M., & Nabney, I. T. (2008). Pattern Recognition and Machine Learning:
A Matlab Companion. Springer. In preparation.
Blindheim, O. (1979). Boreability Predictions for Tunneling. Ph.D. Thesis,
Department of Geological Engineering, The Norwegian Institute of
Technology, p.406.
Bourgoyne, A., & Young, F. (1974). A multiple regression approach to optimal
drilling and abnormal pressure detection. Soc. Pet. Eng. J ;14:371–84. doi:
https://doi.org/10.2118/4238-PA.
Bruines, P. (1988). Neuro-fuzzy modelling of TBM performance with emphasis on
the penetration rate. Memoirs of the Centre of Engineering Geology, Delft,
no 173.
Bruland, A. (1998). Hard Rock Tunnel Boring. Ph.D. Thesis, vol. 1–10, Norwegian
University of Science and Technology (NTNU), Trondheim, Norway.
Burges, C. J. (1998). A tutorial on support vector machines for pattern recognition.
Knowledge Discovery and Data Mining.
Campeau, L.-P., & Gamache, M. (2019). Short-term planning optimization model for
underground mines. Computers and Operations Research.
Cassinelli, F., Cina, S., Innaurato, N., Mancini, R., & Sampaolo, A. (1982). Power
consumption and metal wear in tunnel-boring machines: analysis of tunnel-
boring operation in hard rock. Tunnelling ‘82, London. Inst. Min. Metall., 73–
81.
Cristianini, N., & Shawe-Taylor, J. (2000). Support vector machines and other
kernel-based learning methods. Cambridge University Press.
Dozat, T. (2016). Incorporating Nesterov Momentum into Adam.
Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. JMLR.
Eckel, J. R. (1967). Microbit Studies of the Effect of Fluid Properties and Hydraulics.
Journal of Petroleum Technology, pp. 514-546.
Elkatatny, S., Tariq, Z., Mahmoud, M., & Al-AbdulJabbar, A. (2017). Optimization
of rate of penetration using artificial intelligent techniques. Paper presented
at the Rock Mechanics/Geomechanics Symposium, San Francisco, California,
USA, 25-28 June. ARMA-2017-0429.
Epiroc (2020). Epiroc products. Retrieved from https://www.epiroc.com/en-
gr/products/drill-rigs/face-drill-rigs/boomer-282.
Farmer, I., & Glossop, N. (1980). Mechanics of disc cutter penetration. Tunnels
Tunneling;12(6):22–5.
Farrokh, J. R. (2012). Study of various models for estimation of penetration rate of
hard rock TBMs. Tunnelling and Underground Space Technology 30 (2012)
110–123.
Ghasemi, E., Yagiz, S., & Ataei, M. (2014). Predicting penetration rate of hard rock
tunnel boring machine using fuzzy logic. Bull Eng Geol Environ 73:23–35.
Gholamnejad, J., & Tayarani, N. (2010). Application of artificial neural networks to
the prediction of tunnel boring machine penetration rate. Mining Science and
Technology. 0727–0733.
Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of AISTATS.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
Retrieved from http://www.deeplearningbook.org
Graham, P. (1976). Rock exploration for machine manufacturers, in exploration for
rock engineering. In: Bieniawski, Z.T. (Ed.), Proceedings of the Symposium,
vol. 1, Johannesburg, Balkema, pp. 173–180.
Grandori, R., Sem, M., Lembo-Fazio, A., & Ribacchi, R. (1995). Tunnelling by double
shield TBM in the Hong Kong granite. In: Proceedings of the 8th ISRM
Congress, vol. 1, pp. 569–574.
Grima, M., Bruines, P., & Verhoef, P. (2000). Modeling tunnel boring machine
performance by neuro-fuzzy methods. Tunn Undergr Sp Technol 15:259–269.
Hegde, C., Daigle, H., & Gray, K. (2018). Performance comparison of algorithms for
realtime rate-of-penetration optimization in drilling using data-driven
models. SPE J. 23:1706–22.
Herbrich, R. (2002). Learning Kernel Classifiers. MIT Press.
Hughes, H. (1986). The relative cuttability of coal measures rock. Mining Sci.
Technol. 3, 95–109.
Hillier, F., & Lieberman, G. (2016). Introduction to Operations Research. San Francisco: Holden-Day: Chapters 3 and 12.
Innaurato, N., Mancini, R., Rondena, E., & Zaninetti, A. (1991). Forecasting and effective TBM performance in a rapid excavation of a tunnel in Italy. In: Wittke, W. (Ed.), Proceedings of the 7th International Congress on Rock Mechanics. p. 1009–14.
Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift.
Kingma, D., & Ba, J. (2014). Adam: A method for stochastic optimization.
Koopialipoor, M., Fahimifar, A., Ghaleini, E., Momenzadeh, M., & Armaghani, D.
(2020). Development of a new hybrid ANN for solving a geotechnical problem
related to tunnel boring machine performance. Engineering with Computers.
Laughton, C. (1998). Evaluation and Prediction of Tunnel Boring Machine
Performance in Variable Rock Masses. Ph.D. Thesis, The University of Texas,
Austin, USA.
LeCun, Y., Boser, B., Denker, J. S., et al. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551.
Lumley, G. (2005). Reducing the variability in dragline operator performance. In: Aziz, N. (Ed.), Coal Operators' Conference (pp. 97–106). Wollongong.
Mahdevari, S., Shahriar, K., Yagiz, S., & Shirazi, M. (2014). A support vector
regression model for predicting tunnel boring machine penetration rates. Int
J Rock Mech Min Sci 72:214–229.
Maurer, W. (1962). The ‘‘perfect-cleaning” theory of rotary drilling (Vol. 14).
doi:https://doi.org/10.2118/408-PA.
Müller, K. R., Mika, S., Rätsch, G., Tsuda, K., & Schölkopf, B. (2001). An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks.
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Reprinted in Anderson and Rosenfeld (1998): Bulletin of Mathematical Biophysics.
McFeat-Smith, I. (1999). Mechanised tunnelling for Asia. Work shop manual,
organized by IMS Tunnel Consultancy Ltd.
McFeat-Smith, I., & Tarkoy, P. (1979). Assessment of tunnel boring performance.
Tunnels and Tunneling, 33-37.
Murphy, K. P. (2013). Machine learning : a probabilistic perspective. MIT Press,
Cambridge, Mass.
Nair, V., & Hinton, G. E. (2009). 3D object recognition with deep belief nets. In: Advances in Neural Information Processing Systems.
Nehring, M., et al. (2012). Integrated short- and medium-term underground mine production scheduling. J. South. Afr. Inst. Min. Metall. 112(5), 365–378.
Nelson, P., Abad-Aljali, Y., & Laughton, C. (1999). Improved strategies for TBM
performance prediction and project management. In: RETC, pp. 963–979.
Nelson, P., Ingraffea, A., & O'Rourke, T. (1985). TBM performance prediction using
rock fracture parameters. Int. J. Rock Mech. Min. Sci. Geomech. Abstr. 22
(No. 3), 189–192.
Newman, A., Rubio, E., Caro, R., Weintraub, A., & Eurek, K. (2010). A review of
operations research in mine planning. Interfaces 40 (3), 222–245.
O'Sullivan, D., & Newman, A. (2015). Optimization-based heuristics for
underground mine scheduling. Eur. J. Oper. Res. 241 (1), 248–259.
https://doi.org/10.1016/j. ejor.2014.08.020.
Okubo, S., Fukui, K., & Chen, W. (2003). Expert system for applicability of tunnel boring machines in Japan. Rock Mechanics and Rock Engineering, 36(4), 305–322.
Ozdemir, L. (1978). Development of theoretical equations for predicting tunnel
borability. PhD thesis, Colorado School of Mines, Golden, Colorado.
Rosenblatt, F. (1962). Principles of Neurodynamics: Perceptrons and the Theory of
Brain Mechanismn. Spartan.
Rostami, J. (1997). Development of a Force Estimation Model for Rock
Fragmentation with Disc Cutters Through Theoretical Modeling and
Physical Measurement of Crushed Zone Pressure. Ph. D. Thesis, Colorado
School of Mines, Golden, Colorado, USA, P. 249.
Rostami, J., & Ozdemir, L. (1993). A new model for performance prediction of hard
rock TBM. In: Bowerman, L.D., et al. (Eds.), Proceedings of RETC, Boston,
MA, pp. 793-809.
Rumelhart, D. E., McClelland, J. L., & the PDP Research Group (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations. MIT Press.
Kahraman, M. I. (2006). Performance prediction of a jumbo drill in Pozanti–Ankara
Motorway Tunnel (Turkey). Tunnelling and Underground Space Technology,
265.
Salaheldin, E. (2018). New approach to optimize the rate of penetration using
artificial neural network. Arabian J Sci Eng. 43(11):6297–304.
Sapigni, M., Berti, M., Bethaz, E., Busillo, A., & Cardone, G. (2002). TBM
performance estimation using rock mass classifications. Int J Rock Mech Min
Sci; 39: 771–88.
Schölkopf, B., Smola, A., Williamson, R. C., & Bartlett, P. L. (2000). New support vector algorithms. Neural Computation.
Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding machine learning: from
theory to algorithms. Cambridge University Press.
Soares, C., et al. (2016). Evaluation of PDC bit ROP models and the effect of rock strength on model coefficients. J. Nat. Gas Sci. Eng; 34:1225–36.
Sundaram, N., Rafek, A., & Komoo, I. (1998). The influence of rock mass properties
in the assessment of TBM performance. Rotterdam: Balkema: In:
Proceedings of the 8th international IAEG congress. p.3353–9.
Song, Z., Rinne, M., & Wageningen, A. v. (2013). A review of real-time optimization
in underground mining production. J. South. Afr. Inst. Min. Metall. 113 (12),
889-897.
Tieleman, T., & Hinton, G. (2012). RMSprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4(2).
Vapnik, V. N. (1995). The nature of statistical learning theory. Springer.
Widrow, B., & Hoff, M. E. (1960). Adaptive switching circuits. In IRE WESCON
Convention. Record, Volume 4, pp. 96–104.
Yagiz, S. (2008). Utilizing rock mass properties for predicting TBM performance in
hard rock condition. Tunn. Undergr. Space Technol. pp. 326-339.
Yagiz, S., & Karahan, H. (2011). Prediction of hard rock TBM penetration rate using particle swarm optimization. Int J Rock Mech Min Sci 48:427–433.
Yagiz, S., Gokceoglu, C., Sezer, E., & Iplikci, S. (2009). Application of two nonlinear
prediction tools to the estimation of tunnel boring machine performance. Eng
Appl Artif Intell 22:808–814.
Zeiler, M. D. (2012). Adadelta: an adaptive learning rate method.
Zhao, Z., Gong, Q., Zhang, Y., & Zhao, J. (2007). Prediction model of tunnel boring
machine performance by ensemble neural Networks. Geomech. Geoeng. – Int.
J. pp. 123-128.
Zhou, J., Bejarbaneh, B., Armaghani, D., & Tahir, M. (2019). Forecasting of TBM
advance rate in hard rock condition based on artificial neural network and genetic
programming techniques. Bulletin of Engineering Geology and the Environment.