Integer Programming to Evaluate
Operational Impact of Penetration Rate
Predictive Model
Sebastian Arenas Bermúdez
Department of Mining and Materials Engineering
McGill University, Montreal, Quebec, Canada
December 2020
A thesis submitted to McGill University in partial fulfillment of the
requirements of the degree of Master of Engineering
© Sebastian Arenas Bermúdez, 2020
Acknowledgements
First, I would like to express my deepest gratitude to professors Alessandro Navarra
and Roussos Dimitrakopoulos; they were both extraordinary mentors during this
graduate program and I could never have done it without their guidance. I would
also like to thank Barbara Hanley and Deborah Frankland for their outstanding
administrative support during my time at McGill University.
I would also like to share my gratitude with those who were not only my partners
but my counselors during this journey: Daniel, Zachary, Zeyneb, Christian,
Matheus, Joao, Luiz, Ashish, Lingquing. I would like to give a very special thank
you to Dr. Zeyneb Brika for always taking the time to assist me during the
development of my recent research as well as her diligent efforts correcting and
suggesting changes to this thesis.
In addition, I would like to thank all the professors and colleagues at Universidad
Nacional de Colombia for helping me grow personally and professionally. In
particular, I would like to extend my thanks to Professor Jorge Martin Molina
Escobar for his guidance and advice during my undergraduate program.
Lastly and most importantly, I would like to thank our almighty God and my family,
Gloria, Otoniel, Susana, Mercedes, Sinforiano, Otoniel, and Luz Dary; their tireless
support and unconditional love make this achievement also theirs.
Contributions
All of the chapters in this thesis have been entirely written by the student in
question, Sebastian Arenas Bermúdez.
Abstract
Short-term mine production scheduling for underground mines is both complex and
crucial, not only for running successful operations but also for meeting the
targets set by the medium- and long-term mine plans. Moreover, underground
operations are known for involving a large number of independent decisions
concerning the available resources, the tasks to be accomplished, and the
technical aspects of the underground mining openings. Consequently, to sustain
high productivity, it is necessary to use a decision-making tool to build
short-term production plans. Furthermore, incorporating performance predictions
based on historical operational data into this tool allows decision-makers to
accumulate more knowledge about the reality of the operation. This thesis
explores the benefits that such predictions can offer when incorporated into the
generation of drilling plans for underground mines.
In particular, an artificial neural network (ANN) is trained with operational
data from an underground gold mine. The trained network is then used to predict
the rate of penetration (ROP) for all possible combinations of available
drilling machines and operators, activities to be performed, and openings or
destinations where these tasks can be executed. An integer programming
formulation is then constructed to demonstrate the impact of incorporating these
predictions into operational decision-making. The formulation aims to maximize
the drilled meters per day while respecting physical and operational
constraints. The initial drilling plan of the operation is compared against the
plans given by the optimization model with and without the predictions. Over the
first 15 days of operations, the plan generated by the model that uses the
neural network predictions outperforms both the initial drilling plan and the
plan generated by the optimization model without the predictions.
Résumé
La planification minière à court terme des opérations souterraines est considérée complexe
et à la fois cruciale, non seulement pour la réussite des opérations, mais aussi pour la
réalisation des objectifs fixés par la planification à moyen et long terme. De plus, les
opérations souterraines sont également connues pour avoir un nombre important de décisions
indépendantes concernant les ressources disponibles, les tâches particulières à accomplir et
les aspects techniques des tunnels. Pour avoir une productivité élevée, il est donc nécessaire
d'utiliser un outil d’aide à la décision pour établir des plans de production à court terme. Par
ailleurs, l'intégration de prédictions d’efficacité basées sur des données opérationnelles
historiques dans l’outil d’aide à la décision permet aux décideurs d’accumuler davantage de
connaissances sur la réalité de l'opération. Dans ce contexte, cette thèse explore les avantages
que ces prédictions peuvent offrir lors de leur intégration dans la génération de plans de
forage dans le contexte des mines souterraines.
En particulier, la formation d’un réseau neuronal artificiel (RNA) est exécutée avec des
données industrielles provenant d'une mine d'or souterraine. Ce réseau est ensuite utilisé pour
prédire le taux de pénétration (TP) pour toutes les combinaisons possibles de machines de
forage et d’opérateurs disponibles, l’assignation de tâches et les tunnels ou destinations où
ces tâches devront être exécutées. De plus, une formulation de programme en nombre entier
est construite pour démontrer l'impact de l'incorporation de ces prédictions dans la prise de
décision opérationnelle. Cette formulation vise à maximiser les mètres forés par jour, tout en
respectant les contraintes physiques et opérationnelles. Une comparaison entre le plan
d'opérations de forage initial et les plans améliorés par le modèle d'optimisation avec et sans
prédictions de TP est effectuée. Au cours d’une période de 15 jours d'exploitation, le plan
généré par le modèle d’optimisation qui utilise les prédictions résultant du réseau neuronal
surpasse à la fois le plan initial et le plan généré par le modèle d'optimisation qui n’utilise
pas les prédictions.
Contents
Acknowledgements
Contributions
Abstract
Résumé
List of Figures
List of Tables
List of Terms
Chapter 1 – Introduction
1.1 Objectives
1.2 Thesis outline
Chapter 2 – Literature Review
2.1 Rate of penetration (ROP) predictive models
2.1.1 Traditional models
2.1.2 Artificial intelligence-based models
2.2 Mathematical programming in the short-term planning of mining activities for underground mines
Chapter 3 – Methods
3.1 Artificial neural networks (ANN)
3.1.1 Relation between least-squares regression and neural net training
3.1.2 Adjustment of parameters
3.1.3 Gradient descent optimization
3.1.4 Error back-propagation
3.2 Integer linear programming
3.2.1 Solution approaches
Chapter 4 – Case Study: Underground Gold Project
4.1 Datasets
4.2 Underground operation outline
4.2.1 Mine layout
4.2.2 Mining method
4.2.3 Drilling operation
4.3 Prediction of rates of penetration (ROP)
4.3.1 Data preprocessing
4.3.2 Testing different architectures of neural networks
4.3.3 Performance of the final architecture on the testing set
4.4 Incorporation of predictions into an integer program
4.4.1 Integer program formulation
4.4.2 Evaluating the impact of the ANN predictions in the ILP results
Chapter 5 – Conclusions and Future Work
5.1 Conclusions and objectives met
5.2 Future work
References
List of Figures
Figure 3.1 – Network diagram for a two-layer neural network (Bishop & Nabney, 2008)
Figure 3.2 – Geometrical view of the error function E(w) as a surface sitting over a weight space (Bishop & Nabney, 2008)
Figure 3.3 – Forward and backward propagation of error information (Bishop & Nabney, 2008)
Figure 4.1 – Flowchart of the proposed approach
Figure 4.2 – Boomer 282 (Epiroc, 2020)
Figure 4.3 – Mine layout (Red Eagle Mining, 2014)
Figure 4.4 – MSDF: first and second lifts (Red Eagle Mining, 2014)
Figure 4.5 – Subsequent lifts and mucking (Red Eagle Mining, 2014)
Figure 4.6 – Correlation matrix, including operators and performance measures
Figure 4.7 – Correlation matrix, including shift, team, rock types, and performance measures
Figure 4.8 – Left: ROP histogram. Right: normalized ROP histogram
Figure 4.9 – Left: sample from training data. Right: sample from training data after one-hot encoding of the team attribute
Figure 4.10 – Architecture and variables
Figure 4.11 – Comparison of training loss evolution with different optimizers for a one-hidden-layer model
Figure 4.12 – Comparison of training loss evolution with different optimizers for a two-hidden-layer model
Figure 4.13 – Comparison of training loss evolution with different optimizers for a three-hidden-layer model
Figure 4.14 – Comparison of training loss evolution with different batch sizes for a two-hidden-layer model
Figure 4.15 – Comparison of training loss evolution with different batch sizes for a three-hidden-layer model
Figure 4.16 – Training and validation performance
Figure 4.17 – Features and label for the ANN model
Figure 4.18 – Results of the case study
List of Tables
Table 4.1 – Model features and target
Table 4.2 – Dimensions of the ANN parameters; individual parameters are referenced by the dimension letter in lowercase as a subscript
Table 4.3 – Combinations during hyperparameter optimization
Table 4.4 – Comparison of different neural network architectures
Table 4.5 – Drilling plan example for one day of operations
Table 4.6 – Computational performance of the proposed model, number of decision variables, and number of constraints
Table 4.7 – Cumulative results for the case study
List of Terms
ANN Artificial neural network
AI Artificial intelligence
AR Advance rating
BTS Brazilian tensile strength
CDF Cumulative distribution function
DNN Deep neural network
DPS Distance between planes of weakness
FL Fuzzy logic
FPI Field penetration index
GPU Graphics processing unit
ICA Imperialism competitive algorithm
IMS Integrated mass system
IP Integer programming
LOM Life of mine
LP Linear programming
LTMPS Long-term mine production schedule
MIP Mixed-integer programming
MLP Multilayer perceptron
MSDF Mechanized shrinkage with delayed fill
PDF Probability density function
RCS Rock compressive strength
RMR Rock mass rating
ROP Rate of penetration
RPM Revolution per minute
RQD Rock quality designation
RSR Rock structure rating
SGD Stochastic gradient descent
STMPS Short-term mine production schedule
SVM Support vector machines
SVR Support vector regression
TBM Tunnel boring machine
UCS Uniaxial compressive strength
WOB Weight on bit
Chapter 1
_________________________________________________
Introduction
Large capital investments are generally necessary when a company initiates a
mining project. High costs are encountered at the beginning of the operations and
returns on investment are not seen before developing enough openings to connect to
the orebody. On the other hand, mining is also known for being a very lucrative
industry when the operations involved have an efficient execution. Effectively
managing long-, medium-, and short-term mine planning is crucial in order to
achieve successful and highly productive operations, which, in the end, translates
into profits. Moreover, operations can be analyzed and scheduled differently
according to the timeframe of the given planning exercise. In fact, numerous
operational activities to be executed during the life of mine (LOM) are typically
planned considering the short-term state of the mine (Campeau & Gamache, 2019).
This is often seen as a problem when one is required to move from the long-term to
shorter and more precise operational plans. This thesis proposes a mathematical
formulation aiming to optimize and test multiple scenarios for the scheduling of
drilling operations while incorporating predictions of rates of penetration (ROP) in
a mechanized underground gold mine.
Furthermore, the performance of drilling activities is an essential indicator in
underground mining operations with a high correlation to production targets. In
mechanized projects where drilling machines are not autonomous, the efficiency of
performing a drilling activity is measured by considering both the performance of
the operator and the equipment. Other aspects such as the duration of the shifts,
the particular team of miners working on the job, as well as the geometry, the
geology, and geomechanical characteristics of the openings also have an important
impact on the output of drilling activities (Awuah-Offei, 2016; Kahraman, 2006).
Scheduling drilling operations in an underground project is known for being a highly
demanding and complex activity. This complexity is often related to the great
number of independent decisions to be made, involving the available operational
resources and accounting for their respective restrictions. Factors affecting this
decision-making process include analyzing previous relations between those
resources and the geological and geotechnical characteristics of the excavation, as
well as the high demand for openings to be drilled, among others. Consequently, the
implementation of a decision-making tool in order to perform the planning and
scheduling of short-term activities is imperative to maintain a high level of
productivity within the operations. Furthermore, incorporating data-driven
predictions of operational performance allows decision-makers to be better
informed about the reality of the process before making any decision. Therefore,
this thesis proposes an optimization model for planning short-term drilling
activities while accounting for predictions of operational performance.
In this thesis, the case study that follows derives predicted performances from
the collected operational data using a feedforward network approach. These
predictions are then included among the parameters of the integer programming
model in order to generate an optimal drilling plan over a short-term horizon.
An initial experiment tests the optimization model without the predictions given
by the neural network; a second experiment tests the model with the predictions
added. Finally, the results of both scenarios are compared to the conventional
drilling plan provided by the mining project. This thesis thus demonstrates an
approach for justifying the implementation of novel ROP predictors within
short-term underground planning.
In particular, the thesis develops a DNN predictor, but the approach could be
adapted for other predictors including traditional regressions or specialized machine
learning formulations.
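To illustrate how such predictions enter a short-term plan, the toy sketch below brute-forces machine-to-heading assignments to maximize total predicted drilled meters over a shift. All machine names, heading names, ROP values, and the shift length are hypothetical placeholders rather than data from the case study, and the actual formulation in Chapter 4 uses integer programming rather than enumeration.

```python
from itertools import permutations

# Hypothetical predicted ROP (m/h) for each (machine, heading) pair,
# standing in for the neural network's output; values are illustrative only.
predicted_rop = {
    ("boomer_1", "heading_A"): 1.8,
    ("boomer_1", "heading_B"): 2.4,
    ("boomer_1", "heading_C"): 1.1,
    ("boomer_2", "heading_A"): 2.0,
    ("boomer_2", "heading_B"): 1.5,
    ("boomer_2", "heading_C"): 2.2,
}
machines = ["boomer_1", "boomer_2"]
headings = ["heading_A", "heading_B", "heading_C"]
SHIFT_HOURS = 8  # assumed shift length

def best_assignment():
    """Enumerate one-heading-per-machine assignments and keep the one
    that maximizes total predicted drilled meters in a shift."""
    best_plan, best_meters = None, -1.0
    for chosen in permutations(headings, len(machines)):
        meters = sum(predicted_rop[(m, h)] * SHIFT_HOURS
                     for m, h in zip(machines, chosen))
        if meters > best_meters:
            best_plan, best_meters = dict(zip(machines, chosen)), meters
    return best_plan, best_meters

plan, meters = best_assignment()
print(plan, meters)
```

With these invented numbers, the search assigns each machine to the heading combination with the highest joint predicted output, which is exactly the role the ROP predictions play as objective-function coefficients in the integer program.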
In the following chapters, both the artificial neural network approach and the IP
model are outlined, followed by a comprehensive case study. Subsequently,
conclusions and future work are presented.
1.1 Objectives

The goal of this thesis is to investigate and test frameworks that predict and
optimize penetration rates as well as short-term mine drilling schedules based on
historical data coming from an underground gold mine. To accomplish this goal,
the following objectives must be achieved.
1. Develop and demonstrate an artificial neural network (ANN) approach to
predict rates of penetration for an underground mining operation. These
predictions must be based on drilling information coming from the
underground operation.
2. Develop an integer programming (IP) formulation for the optimization of the
drilling activities in order to maximize the drilled meters per day in an
underground operation.
3. Demonstrate that, by running the IP optimization model for scenarios with
and without the incorporation of the ROP predictions, it is possible to
evaluate and determine whether these predictions should be included in the
computation of future drilling plans.
1.2 Thesis outline
1. The current chapter presents the motivation, goals, objectives, and the
outline of the thesis.
2. The second chapter contains the literature review regarding different
methods used to predict the rate of penetration (ROP) values. This entails
covering developments in both traditional and machine learning
approaches. This chapter also presents the literature review corresponding
to different mathematical programming approaches in the mining industry,
specifically in applications that depend on the optimal short-term
scheduling of underground activities.
3. The third chapter outlines the methods used in this thesis. First, concepts
on artificial neural networks (ANN) are reviewed, followed by key concepts
in mathematical programming, including integer programming techniques.
4. The fourth chapter presents a case study where both methods are applied.
First, different artificial neural networks (ANN) are implemented and
tested, and an optimal architecture is found for predicting rates of
penetration. Subsequently, the proposed integer programming formulation
is tested on two different scenarios.
5. The fifth chapter summarizes the discoveries of this thesis and suggests
options for future work.
Chapter 2
_______________________________________________
Literature Review
This chapter outlines the literature pertinent to the subjects of artificial neural
networks for the prediction of drilling productivity, as well as mathematical
programming approaches applied to optimization problems in different
industries, with a focus on the mining sector. More specifically, this literature review is
structured into two sections.
1. Section 2.1 reviews past works associated with the prediction of penetration
rates in the civil, petroleum, and mining industries. It covers traditional
approaches proposed since the early sixties. The transition between the
earlier empirical and analytical models is detailed in this section. Moreover,
it contains the development of machine learning techniques from more
conventional methods to algorithms that provide important improvements in
terms of the accuracy of the results. Some data-driven approaches to predict
rates of penetration are covered.
2. Section 2.2 reviews the literature focusing on different mathematical
programming approaches used in the mining industry, more specifically in
the successful planning of activities for underground mines. Following this,
optimization models regarding short-term mine plans are covered.
2.1 Rate of penetration (ROP) predictive models
The rate of penetration (ROP) can be expressed as the quotient between the distance
excavated and the operating time in the course of an uninterrupted underground
development phase. This ROP can be used to measure the performance of the
underground drilling machines, which is widely known for being a challenging and
crucial task in the development and success of mining excavations. A precise
prediction of the ROP facilitates efficient and accurate planning. In recent decades,
several studies have been developed in order to formulate models with higher
accuracies when determining parameters involved in the ROP predictions, which
are not exclusive to underground mining operations. In fact, most of the applications
found in the literature belong to industries such as tunneling and oil and gas.
Indeed, the following subsections review some applications where the ROP is
predicted for underground mining as well as for tunneling and petroleum projects.
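The definition above amounts to a simple quotient; a minimal sketch:

```python
def rate_of_penetration(advance_m: float, operating_time_h: float) -> float:
    """ROP as the quotient between the distance excavated (m) and the
    uninterrupted operating time (h), following the definition above."""
    if operating_time_h <= 0:
        raise ValueError("operating time must be positive")
    return advance_m / operating_time_h

# e.g. a 3.2 m advance drilled over 2 h of uninterrupted operation
print(rate_of_penetration(3.2, 2.0))  # → 1.6 m/h
```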
2.1.1 Traditional models
Planning underground projects, controlling costs, and selecting a construction
approach all require effective prediction of the performance of underground
drilling machines. Theoretically, there is a complex relationship between the
rock mass and the drilling machines. This complexity makes it hard to obtain a
reliable estimation of performances for the underground drilling machines. Many
earlier works aim to employ empirical and field models where the relationship
between the ROP and parameters involved in the underground drilling operations
is utilized by implementing mathematical functions. Early literature documenting
the prediction of ROP by employing analytical models was published in the early
sixties and corresponds to Maurer (1962) and Bingham (1965), two researchers
belonging to the petroleum industry. Maurer (1962) proposed a mathematical model
focused on the estimation of the ROP for rolling cutter bits. This analytical model is
known for utilizing the rock cratering approach while being based on input features
such as the rock compressive strength (RCS), the weight on bit (WOB), the diameter
of the drill bit, and the rotational speed in revolutions per minute (RPM). Moreover, an
empirical coefficient was added in this theoretical model in order to include the type
of rock where the drilling operation is performed. Bingham (1965) established a
mathematical model in order to perform ROP predictions based on only the WOB,
RPM, and bit diameter. As can be expected, these initial applications carried several
limitations that compromise the accuracy of the predictions (Soares C, 2016).
Another model was developed by Eckel (1967), aiming to utilize a Reynolds number
function to establish the relationship between the ROP and the characteristics of
the mud, considering the effects of the latter as an additional feature. Almost one
decade later, Bourgoyne & Young (1974) proposed an additional mathematical
model employing multiple regression analysis with new parameters in order to
incorporate different physical and geological features ignored by previous
investigations.
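The shape of these early analytical forms can be illustrated schematically. The sketch below uses a Bingham-style power law, ROP = K · (WOB/d)^a · RPM; the constant K and exponent a are purely assumed here, whereas the published models use field-calibrated coefficients.

```python
def bingham_rop(wob: float, bit_diameter: float, rpm: float,
                k: float = 0.05, a: float = 1.2) -> float:
    """Bingham-style ROP relation: ROP = K * (WOB / d)**a * RPM.
    K and a are illustrative placeholders; in practice they are
    calibrated to field data for a given formation."""
    return k * (wob / bit_diameter) ** a * rpm

# Higher weight on bit or rotational speed raises the predicted ROP.
print(bingham_rop(wob=20.0, bit_diameter=0.2, rpm=90.0))
```

The relation is monotone in each input, which is precisely the limitation the text notes: it cannot capture rock-type effects unless an extra empirical coefficient, as in Maurer (1962), is introduced.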
The earliest attempts to develop prediction models in the tunneling and mining
industries sought to estimate the performances of tunnel boring machines (TBMs)
(Graham, 1976, Nelson et al., 1985, Hughes, 1986) and specifically the disc cutters
in mining excavations (Farmer & Glossop, 1980, Ozdemir, 1978) based on the
features having the highest impact in the drilling operations. The model developed
by Graham (1976) only requires parameters such as the uniaxial compressive
strength (UCS) and the average thrust per cutter when being computed. Farmer &
Glossop (1980) and Nelson et al. (1985) correlated the ROP with the rock fracture
toughness and the rock tensile strength, respectively. Ozdemir (1978) adopted new
parameters such as the UCS according to the diameter of a disc, the radius of the
cutter, the penetration of the disc, the spacing between disc grooves, shear and
compressive strength, and the angle corresponding to the cutter edge. McFeat-
Smith & Tarkoy (1979) proposed a model where different relations are used to
perform ROP predictions. Even though this study considers several machines and
geologies, in practice the established model cannot be applied elsewhere since it was developed
for a particular excavation. In the model presented by Hughes (1986), parameters
such as the number of cutters per kerf (groove) as well as the radius of the discs
were considered. Although these empirical and theoretical equations are well known
for being easy to implement, they have the limitation of only being developed for
homogeneous and isotropic environments. In general, the predictions performed
with these models are underestimated as a result of the lack of joint parameters,
making these models limited in terms of their applicability.
Following the simple models listed above, the ROP predictive models continued to
account for multiple parameters while considering both rock mass and drilling
machine features. Models developed by the Colorado School of Mines (Rostami &
Ozdemir, 1993, Rostami, 1997), and Norwegian University of Science and
Technology (Blindheim, 1979, Bruland, 1998) are the best-known ROP predictive
models using multiple parameters. Tests performed by Blindheim (1979) and
Bruland (1998) allowed the authors to derive parameters related to indentation,
drillability, and boreability in order to build an improved predictive model for cutter
penetration with different correlations between the drilling machine and the rock
mass, including most of the relevant influencing features. These models were
established through a multivariate regression approach. In the investigations
presented by Rostami & Ozdemir (1993) and Rostami (1997), rock fragmentation
produced by the disc cutters was recreated in the full-scale linear cutting tests.
These tests permitted the authors to produce predictions with higher accuracies due
to their capability to adapt to field conditions. Other studies focusing on models with
multiple parameters were developed in Barton (1999) and Barton (2000), wherein
the proposed QTBM model was based on the Q rock mass classification system. The
QTBM model is able to predict the ROP and the advance rate of the corresponding
drilling machine after computing the value of QTBM. It is important to mention that
new parameters such as joint conditions, Rock Quality Designation (RQD), the
quartz content, stress condition, intact rock strength, and the TBM thrust were
adopted in order to make the model adaptable to drilling operations.
There was an improvement in terms of the robustness of the datasets being utilized
to build these models with multiple parameters. However, too many features are
considered for practical applications, and special laboratory tests are still required.
In addition, these models expect data from specific zones, meaning that in areas
where it is impossible to perform the test, the predictions must be based on data
from areas with similar properties, generating a considerable error within the
method (Farrokh et al., 2012).
More recent work has focused on developing probabilistic-based models like the ones
presented in Laughton (1998), Nelson et al. (1999), and Al-Jalil (1998), which
consider more elaborate statistical approaches to predict the ROP. Probabilistic
models have the advantage of accounting for the uncertainty inherent in the data
when executing a performance assessment. However, these types of models are not
very common when predicting features such as the ROP since they suffer from one
major drawback: they rely on information such as the probability density function
(PDF) for each of the features involved in the method. These PDFs must be gathered
from projects with similar characteristics in order to support the new predictions.
Nevertheless, this type of information is often hard to obtain, and the conditions
between one project and the others often do not have the similarity required for
these methods, resulting in critical errors. Another disadvantage of these
probabilistic models is that the interaction between the drilling machine and
the rock mass is ignored. In general, probabilistic models are considered to be
more complex to implement, and they are therefore less used than deterministic
multi-parameter models.
Other authors aimed to correlate the ROP with the rock mass classification systems
(Cassinelli et al., 1982; Innaurato et al., 1991; Grandori et al., 1995; Sundaram et
al., 1998; McFeat-Smith, 1999; Sapigni et al., 2002). Rock classification systems
such as the rock mass rating (RMR), the rock structure rating (RSR), the Q system,
and the integrated mass system (IMS) were utilized in these models. For instance,
Cassinelli et al. (1982) utilized the RSR system in conjunction with the performance
of a drilling machine in order to estimate the ROP. Subsequently, Innaurato et al.
(1991) added the uniaxial compressive strength (UCS) to the model proposed by
Cassinelli et al. (1982). McFeat-Smith (1999) and Grandori et al. (1995) formulated
a model by generating a correlation between the IMS and features such as the
utilization of the drilling machine, the ROP, and the advance rate (AR). A model
presented in Sundaram et al. (1998) showed different correlations with features
such as the ROP, field penetration index (FPI), massic energy, torque, and
utilization. In addition, Sapigni et al. (2002) studied the correlation between the FPI
and the RMR system. Overall, the studies listed above concluded that the ROP
decreases when the rock being drilled is competent. However, results also showed
that the ROP is low where rock quality is poor and discontinuities are frequent.
2.1.2 Artificial intelligence-based models
Regression techniques are extensively employed in applications where the goal is to
predict continuous values. Moreover, machine learning methods such as
feedforward neural networks have become common tools to perform both
classification and regression tasks. A feedforward neural network is a mathematical
representation of computation loosely mimicking the behavior of the human brain.
The model consists of many basic computing units, known as neurons, that are
densely interconnected and carry out highly complex computations collectively. The
approach of learning with neural networks was first suggested in the mid-20th
century, and it remains an effective paradigm for a wide range of learning
applications.
In terms of mathematics, a neural network can be defined as a directed graph with
nodes acting as the neurons and edges acting as neural connections. At each node,
a weighted sum of the outputs is taken as an input linked to its incoming edges
(Shalev-Shwartz & Ben-David, 2014; McCulloch & Pitts, 1943; Widrow & Hoff,
1960; Rosenblatt, 1962; Rumelhart & McClelland, 1986).
Modern machine learning methods offer a very robust framework for supervised
learning. Supervised and unsupervised learning are two common paradigms within
the machine learning field: supervised methods are trained with prior knowledge of
what the output values should be, whereas unsupervised methods infer the structure
of the dataset without labeled outputs. The step from a feedforward to a deep
feedforward architecture consists in adding layers and units within each layer.
Networks with more than two layers constitute deep networks, which can
characterize functions of increasing complexity in high-dimensional data. Deep
neural nets are the foundation for so-called deep
learning. Most applications based on mapping an input vector of features to an
output vector of labels can be achieved with the implementation of a deep learning
model (Goodfellow et al., 2016; Bishop, 2006; Murphy, 2013). Although neural
networks are sometimes costly to train, they are comparatively robust and
adaptable relative to other mathematical representations. The structure of
multi-layered neural nets involves several parameters that must be fitted through a
non-convex optimization problem, which can have multiple local optima. However,
owing to their universal function approximation property, neural nets can be made
broad enough to represent most complex distributions (Goodfellow et al., 2016).
There have been several advancements to determine more accurate predictions,
consisting of basic regressions and classifications that involve linear combinations
of predetermined basis functions, with some advantageous analytical and
computational attributes but low applicability, restricted by the so-called curse of
dimensionality that refers to the numerous phenomena encountered when studying
and organizing data in high-dimensional spaces (Bishop & Nabney, 2008; Bishop,
2006; Goodfellow et al., 2016). Recent frameworks can adapt the basis functions to
the data in order to develop representations that can handle a large number of
features. Support vector machines (SVMs) address this by identifying basis
functions that are centered on the points corresponding to the chosen training
dataset and subsequently picking a subset of these while the model is training. The
adjustment of the fit, i.e., the learning algorithm, depends on the convexity of the
loss function which is to be minimized as part of the training process. The number
of basis functions active in the final model is usually far smaller than the
number of points in the training dataset (Vapnik, 1995; Burges, 1998;
Cristianini & Shawe-Taylor, 2000; Müller et al., 2001; Schölkopf et al., 2000;
Herbrich, 2002). Arguably, the most effective and robust approach among the
supervised learning frameworks for pattern identification is the artificial neural
network (ANN); this approach is considered to be the most similar to the human
approach of learning. In many applications, the final ANN model is substantially
more compact, and consequently faster to evaluate, than a support vector machine
with the same performance on the validation and testing sets (Shalev-Shwartz &
Ben-David, 2014; McCulloch & Pitts, 1943; Widrow
& Hoff, 1960; Rosenblatt, 1962; Rumelhart & McClelland, 1986).
The adaptable qualities of artificial intelligence (AI) models have allowed many
engineering problems with complex and nonlinear relationships between features to
be captured in an effective manner. Prediction of underground drilling
machine performances, and specifically of ROPs, has indeed been observed to
involve complex and nonlinear relationships between features.
Similar to the traditional models, artificial intelligence (AI) based models were
suggested first within the petroleum industry. Bilgesu et al. (1997) proposed an
ANN with three hidden layers and 27 hidden units; the layer structuring of ANN
will be described in the following chapter. This neural net was trained to predict
ROP values based on data coming from different formation types and drilling
features. This study aimed to provide a more accurate solution by modeling the
complex patterns involved in the drilling operations that other traditional models
and statistical approaches failed to represent. Consequently, several studies were
developed in the petroleum industry in order to demonstrate how AI techniques can
outperform the empirical and theoretical formulations as well as the mathematical
models previously proposed (Al-AbdulJabbar et al., 2018; Amar & Ibrahim, 2012;
Arabjamaloei & Shadizadeh, 2011; Bataee & Mohseni, 2011; Salaheldin, 2018;
Hegde et al., 2018; Elkatatny et al., 2017).
Likewise, several investigations were developed within the tunneling and the
mining industries in order to utilize the robustness of the AI-based models when
predicting the drilling performance of underground machines. Both Bruines (1988)
and Grima et al. (2000) proposed hybrid neuro-fuzzy approaches where both ANN
and fuzzy logic (FL) are combined to generate ROP predictions while accounting for
the uncertainty and imprecision of the data, and performing inference and decision
making. Okubo et al. (2003) presented an expert system approach aimed to predict
TBM performances for competent rocks for various underground projects in Japan.
In this study, the performance of the drilling machines is determined in terms of the
ROP, advance rate (AR), thrust force, rolling force, rotational speed, and other
selected features. Additionally, the approach presented in Okubo et al. (2003)
evaluates the predictions obtained by the model with respect to different approaches
presented in the literature.
Benardos & Kaliampakos (2004) trained an ANN model by using the Athens Metro
dataset. Unlike traditional studies, the aim of this research was to predict
the advance rate (AR) of the tunnel boring machines by including different
geomechanical and geological features. The architecture of the ANN presented in
Benardos & Kaliampakos (2004) consists of four layers: eight units in the input layer
corresponding to the number of features, and thirteen units in the remaining hidden
and output layers for a total of twenty-one neurons. Following the mathematical
formulation showed in Yagiz (2008), an ANN model was proposed by Yagiz et al.
(2009) in order to predict the cutting performance of TBMs for competent rocks in a
tunneling project in the USA. In this study, only four features (UCS, Brazilian
tensile strength (BTS), the distance between planes of weakness (DPW), and the
angle between the plane of weakness and the direction of the TBM) were used to
train the model and to predict the ROP. This ANN model presented by Yagiz et al.
(2009) was built with only one hidden layer composed of eight units. Moreover, the
performances between the ANN model and a nonlinear multivariable regression
method were compared. Zhao et al. (2007) developed a model to predict the ROP of
TBMs for a tunneling project. This model was trained only with data from two
tunnels, one type of rock, and one type of drilling machine; therefore, it is not
suitable for scenarios with different characteristics. Additionally, the model does
not take into account the effect of the in situ stress on the ROP
values. Another ANN model was presented by Gholamnejad & Tayarani (2010),
seeking to predict the ROP of one TBM. The architecture used for this model
consisted of a five-layer neural network with three input features (UCS, DPW, and
RQD), three hidden layers with nineteen units, and one output layer.
More recent AI approaches have been developed by hybridization. A particle swarm
optimization (PSO) approach for predicting ROP values from a competent rock
dataset was presented by Yagiz & Karahan (2011). Moreover, a support vector
regression (SVR) and fuzzy-logic models were proposed by Mahdevari et al. (2014)
and Ghasemi et al. (2014), respectively, aiming to predict rates of penetration for a
hard rock dataset. A more robust work was presented by Armaghani et al. (2017)
aiming to investigate and test new AI-based models for estimating the performance
of TBMs in terms of rates of penetration. Several input features were utilized in this
research to test different combinations of intelligent systems. The hybrid systems
used were combinations of the imperialist competitive algorithm (ICA) with ANN,
of PSO with ANN, and a standalone ANN. Finally, Zhou et al. (2019) and
Koopialipoor et al. (2020) developed two hybrid models (ANN-genetic programming
and ANN-firefly algorithm) aiming to predict the performance of TBMs.
The results of the majority of the studies reviewed in this section showed that AI-
based models have higher accuracy compared with traditional models. However, it
is important to notice that all of this research was developed for predicting the
performance of nearly autonomous drilling machines, as is the case for the TBMs.
Furthermore, no studies regarding highly skilled, human-operated equipment were
found.
2.2 Mathematical programming in the short-term
planning of mining activities for underground
mines
Many of the mathematical programming models developed for the mining industry
have been focused on long-term mine production scheduling (LTMPS). Nevertheless,
attention to short-term mine production scheduling (STMPS) has significantly
increased in the past decades. The acceleration of scheduling tasks and the
reduction of operational costs are some of the reasons for the increasing interest in
the optimization of STMPS. Modeling real operational conditions is considered
fundamental and complex, given the wide variety of conditions from one mine to the
other. Indeed, the formulation of an optimization model for short-term mine
planning is often linked with the nature of the operations taking place in the
particular mine site that is under investigation. Therefore, the optimization goals
are also highly related to the operations themselves and the mining projects.
However, almost all the mathematical formulations for optimizing both LTMPS and
STMPS are directed to open-pit mines. Newman et al. (2010) stated that the layout
of the underground mines is more complex and is limited by more aspects than those
of open-pit mines. Furthermore, there is a broad range of underground mining
methods, making the models developed to optimize short-term mine plans specific
to each method. Therefore, the optimization goals are even more varied and diverge
from one underground project to the other.
Some of the research on the mathematical programming of STMPS for underground
projects is directed to the development of real-time monitoring and optimization
systems, as is outlined in Song et al. (2013). These models consider
problems that have been widely investigated in open-pit mining, such as the
dispatching of mobile equipment. On the other hand, Nehring et al. (2012) proposed
an integrated short-and medium-term model for underground mine production
scheduling. This integrated scheduling tool proved to be useful and effective in
providing globally optimal scheduling while considering all the connections that
appear between both the medium and the short-term planning. Moreover,
O’Sullivan & Newman (2015) developed a model aiming to optimize the scheduling
of complex operations in short-term mine planning. In this research, a heuristic is
implemented in order to enhance the tractability of the given problem. Finally, a
larger study is presented by Campeau & Gamache (2019), where a preemptive
mixed-integer program was used to generate optimal short-term schedules of
several underground activities while accounting for multiple independent decisions
involved within the operation. Multiple tests were run within this research,
including scenarios that simulate different stages of the project. Unlike previous
works, this investigation aims to simplify the transition from medium to short-term
mine planning by assessing different operational scenarios and guaranteeing an
optimal allocation of assets. Furthermore, this research does not follow the
assumption made in the works outlined above, namely that an activity started at
any location of the project will be completed within a predetermined period of time.
Chapter 3
________________________________________________
Methods
This chapter summarizes the methods used for building the artificial neural
network implemented for predicting the ROP as well as the IP model for optimizing
the drilling activities.
3.1 Artificial neural networks (ANN)
Artificial neural networks are widely recognized for their inherent and efficient
manner of approaching nonlinear problems. In contrast, logistic regression and
linear regression are usually found attractive due to the efficiency and reliability
with which they can be fitted to data, whether in closed form or by iterative
optimization. However, linear models suffer from the restricted capacity of their
linear functions, which often translates into models that cannot capture a nonlinear
interdependence between any two input features. In order to use linear approaches
to model nonlinear functions of 𝑥, one can apply the linear model to a transformed
input 𝜑(𝑥), with 𝜑 a nonlinear transformation, as presented in Equation 3.1.
y(\mathbf{x}, \mathbf{w}) = f\left( \sum_{j=1}^{M} w_j \phi_j(\mathbf{x}) \right)    (3.1)
Here, 𝑓 stands for a nonlinear activation function in the case of a classification
task, mapping an input 𝑥 to a category or label 𝑦, and for the identity in the case
of regression tasks. The goal in a feedforward neural network is
defining a mapping 𝑦 = 𝑓(𝒙; 𝒘) by developing a model that allows making the basis
functions 𝜑𝑗(𝒙) dependent on some adaptive parameters and then learning the value
of those parameters, along with the coefficients {𝑤𝑗}, resulting in the best function
approximation.
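The idea behind Equation 3.1 can be illustrated with fixed (rather than learned) basis functions: a model that is linear in its weights, applied to nonlinearly transformed inputs 𝜑(𝑥), can capture a nonlinear target. The following NumPy sketch uses an illustrative polynomial basis and toy data of our own choosing, with 𝑓 as the identity (regression case):

```python
import numpy as np

# Toy target: a nonlinear function of x that a purely linear model cannot
# represent, but a model linear in w over a polynomial basis can.
x = np.linspace(-1.0, 1.0, 50)
t = x**3 - x

# Fixed nonlinear basis functions phi_j(x) = x^j, j = 0..3 (phi_0 = 1 absorbs the bias)
Phi = np.vander(x, N=4, increasing=True)

# Least-squares fit of the weights w_j in y(x, w) = sum_j w_j * phi_j(x)
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
y = Phi @ w

print(np.max(np.abs(y - t)))   # essentially zero: the basis spans the target
```

With adaptive basis functions, as in a neural network, the 𝜑𝑗 themselves would be tuned during training instead of being fixed in advance.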
A more basic neural network model can be explained as a series of functional
transformations. Initially, 𝑀 linear combinations of the input variables 𝑥1, . . . , 𝑥𝐷 are
assembled.
a_j = \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)}    (3.2)

with 𝑗 going from 1 to 𝑀, and the superscript (1) indicating the layer to which the
related parameters belong. The parameters w_{ji}^{(1)} and w_{j0}^{(1)} stand for the
weights and the biases, respectively. The quantities 𝑎𝑗 are referred to as the
activations. Each activation is transformed using a nonlinear activation function ℎ,
as is shown in Equation 3.3.
z_j = h(a_j)    (3.3)
The values taken by 𝑧𝑗 are the corresponding outputs of the basis functions
presented in Equation 3.1, also known as the hidden units. The differentiable,
nonlinear activation functions ℎ are commonly selected to be sigmoidal functions.
However, recent developments have shown alternative activation functions such as
the ReLU (Rectified Linear Unit) to be more efficient in some cases. Following
Equation 3.1, these values are again linearly combined to give the output unit
activations of the second layer.
a_k = \sum_{j=1}^{M} w_{kj}^{(2)} z_j + w_{k0}^{(2)}    (3.4)

In Equation 3.4, 𝑘 goes from 1 to 𝐾, the total number of outputs, and 𝑀 is the
number of hidden units.
Ultimately, the output unit activations are again transformed, applying a suitable
activation function to output a set of network results 𝑦𝑘.
In the case of standard regression, it is common to choose the identity as the
activation function (𝑦𝑘 = 𝑎𝑘 ). Correspondingly, in the case of binary classification,
the activation function commonly used is the sigmoid function presented in
Equation 3.6.
y_k = \sigma(a_k)    (3.5)

\sigma(a) = \frac{1}{1 + \exp(-a)}    (3.6)
It is possible to merge the previously discussed stages to provide the overall network
function, as in the case of sigmoidal output unit activation functions presented in
Equation 3.7.
y_k(\mathbf{x}, \mathbf{w}) = \sigma\left( \sum_{j=1}^{M} w_{kj}^{(2)} \, h\left( \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)} \right) + w_{k0}^{(2)} \right)    (3.7)
It is important to note that Equation 3.7 shows a set of all weights and biases
gathered into a vector 𝒘; the bias parameters have a zero as the second subindex
and are discussed below. Subsequently, the neural network model is no more than
a nonlinear function from a set of input features {𝑥𝑖} to a set of output labels {𝑦𝑘}
parametrized by a vector 𝒘 of adaptable parameters. This function can be illustrated
as a network graph, as is shown in Figure 3.1.
Figure 3.1 - Network diagram for a two-layer neural network (Bishop & Nabney, 2008)
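The overall network function of Equation 3.7 can be written out directly. The sketch below is an illustrative NumPy implementation for a two-layer network such as that of Figure 3.1; the dimensions, the tanh hidden activation, and the random weights are our own example choices:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, b1, W2, b2):
    """Network function of Equation 3.7 (tanh chosen here as the hidden activation h)."""
    a_hidden = W1 @ x + b1          # first-layer activations a_j (Equation 3.2)
    z = np.tanh(a_hidden)           # hidden units z_j = h(a_j)   (Equation 3.3)
    a_out = W2 @ z + b2             # output activations a_k      (Equation 3.4)
    return sigmoid(a_out)           # sigmoidal output units      (Equation 3.5)

rng = np.random.default_rng(0)
D, M, K = 3, 4, 2                   # input features, hidden units, outputs (arbitrary)
W1, b1 = rng.normal(size=(M, D)), rng.normal(size=M)
W2, b2 = rng.normal(size=(K, M)), rng.normal(size=K)

y = forward(rng.normal(size=D), W1, b1, W2, b2)
print(y.shape)                      # (2,): one value in (0, 1) per output unit
```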
The bias parameters in Equation 3.2 can be merged into the set of weight
parameters by identifying an additional input feature 𝑥0, with 𝑥0 = 1. Then
Equation 3.2 can be re-expressed, as is shown in Equation 3.8.
a_j = \sum_{i=0}^{D} w_{ji}^{(1)} x_i    (3.8)
Likewise, it is possible to merge the biases corresponding to the second layer so that
the network function can be expressed as:
y_k(\mathbf{x}, \mathbf{w}) = \sigma\left( \sum_{j=0}^{M} w_{kj}^{(2)} \, h\left( \sum_{i=0}^{D} w_{ji}^{(1)} x_i \right) \right)    (3.9)
Note that Equation 3.9 is equivalent to Equation 3.7 but presented in a more concise
manner. Moreover, Figure 3.1 presents a standard configuration of a two-layer
neural network where the bias contributions are represented as 𝑥0 and 𝑧0, for the
first and second layers, respectively.
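The bias-absorption trick of Equations 3.8 and 3.9 is straightforward to verify numerically. In this illustrative sketch (the dimensions and values are arbitrary), the activations computed with an explicit bias term coincide with those computed after prepending the constant input 𝑥0 = 1:

```python
import numpy as np

rng = np.random.default_rng(1)
D, M = 3, 4                                       # arbitrary illustrative sizes
W = rng.normal(size=(M, D))                       # weights w_ji^(1)
b = rng.normal(size=M)                            # biases  w_j0^(1)
x = rng.normal(size=D)

# Equation 3.2: activations with an explicit bias term
a_explicit = W @ x + b

# Equation 3.8: absorb the bias by prepending the constant input x_0 = 1
x_aug = np.concatenate(([1.0], x))                # (x_0, x_1, ..., x_D)
W_aug = np.concatenate((b[:, None], W), axis=1)   # column 0 now holds the biases
a_absorbed = W_aug @ x_aug

print(np.allclose(a_explicit, a_absorbed))        # True: the two forms coincide
```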
3.1.1 Relation between least-squares regression and
neural net training
In finding the best value of parameters for the network, a comparison can be made
with the polynomial curve fitting approach, in which the sum of the squares error
function is minimized. For a chosen training dataset containing several feature
vectors {𝒙𝒏}, for 𝑛 = 1, . . . , 𝑁, and the corresponding set of label vectors {𝒕𝒏}, the
error function to be minimized is:

E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \| y(\mathbf{x}_n, \mathbf{w}) - \mathbf{t}_n \|^2    (3.10)
In the particular case of regression applications, a single target variable 𝑡 that can
take any real value is considered. It can be assumed that 𝑡 has a Gaussian
distribution with 𝑦(𝒙, 𝒘) as the mean and 𝛽 as the inverse variance, also known as
the precision, of the noise. The distribution of 𝑡 is presented in Equation 3.11.

p(t \mid \mathbf{x}, \mathbf{w}) = \mathcal{N}\left( t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1} \right)    (3.11)
For the conditional distribution shown in Equation 3.11, it is sufficient to take the
output unit activation function to be the identity, since the neural network under
consideration can approximate any continuous function from 𝒙 to 𝑦. Moreover,
given a dataset of 𝑁 independent, identically distributed (i.i.d.) features
𝑿 = {𝒙𝟏, . . . , 𝒙𝑵}, and the corresponding labels 𝒕 = {𝒕𝟏, . . . , 𝒕𝑵}, it is possible to
build the corresponding likelihood function as follows:

p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} p(t_n \mid \mathbf{x}_n, \mathbf{w}, \beta)
Then, by taking the negative logarithm, the following error function is obtained:

\frac{\beta}{2} \sum_{n=1}^{N} \{ y(\mathbf{x}_n, \mathbf{w}) - t_n \}^2 - \frac{N}{2} \ln \beta + \frac{N}{2} \ln(2\pi)    (3.12)
From Equation 3.12, it is now possible to learn the adaptable parameters 𝒘 and 𝛽.
Consider first the determination of 𝒘: since the terms involving only 𝛽 are constant
with respect to 𝒘, maximizing the likelihood is equivalent to minimizing the sum of
squares error function 𝐸(𝒘) described for polynomial fitting, shown in
Equation 3.13.
E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(\mathbf{x}_n, \mathbf{w}) - t_n \}^2    (3.13)
The value of 𝒘 obtained by minimizing the error 𝐸(𝒘) is denoted 𝒘𝑴𝒊𝒏, since it is
equivalent to the maximum likelihood solution. In fact, the nonlinearity of the
network function 𝑦(𝒙𝒏, 𝒘) causes the error 𝐸(𝒘) to be nonconvex, meaning that
there can be local maxima of the likelihood, corresponding to local minima of 𝐸(𝒘).
After computing 𝒘𝑴𝒊𝒏, 𝛽 is found by minimizing the negative log-likelihood, as is
shown in Equation 3.14.
\frac{1}{\beta_{Min}} = \frac{1}{N} \sum_{n=1}^{N} \{ y(\mathbf{x}_n, \mathbf{w}_{Min}) - t_n \}^2    (3.14)
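Equation 3.14 can be illustrated numerically. In the sketch below, a simple linear model fitted by least squares stands in for the network 𝑦(𝒙, 𝒘) (the estimator for 𝛽 is the same either way); the data, true precision, and sample size are illustrative choices of our own:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 20000
beta_true = 4.0                                  # noise precision; variance 1/beta = 0.25

# Illustrative data: linear signal plus Gaussian noise of known precision
x = rng.uniform(-1.0, 1.0, size=N)
t = 1.5 * x + 0.3 + rng.normal(scale=np.sqrt(1.0 / beta_true), size=N)

X = np.column_stack((np.ones(N), x))
w_min, *_ = np.linalg.lstsq(X, t, rcond=None)    # maximum-likelihood weights w_Min

# Equation 3.14: 1/beta_Min is the mean squared residual of the fitted model
inv_beta = np.mean((X @ w_min - t) ** 2)
print(inv_beta)                                  # close to 1/beta_true = 0.25
```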
In applications where several labels are involved, it is possible to assume that the
labels or target variables are independent conditional on 𝒙 and 𝒘, and that they
share the same 𝛽. The conditional distribution of the labels can then be written as
in Equation 3.15.
p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}) = \mathcal{N}\left( \mathbf{t} \mid \mathbf{y}(\mathbf{x}, \mathbf{w}), \beta^{-1} \mathbf{I} \right)    (3.15)
Subsequently, for 𝐾 number of labels, the weights corresponding to the max
likelihood can be computed by minimizing the sum of the squares error function,
and 𝛽 can be found as follows:
\frac{1}{\beta_{Min}} = \frac{1}{NK} \sum_{n=1}^{N} \| \mathbf{y}(\mathbf{x}_n, \mathbf{w}_{Min}) - \mathbf{t}_n \|^2
As presented in Bishop & Nabney (2008), Bishop (2006), and Goodfellow et al.
(2016), the negative log-likelihood induces a natural pairing between the error
function being used and the output activation function. In the case of classical
regression, one might consider a network with an identity output activation, where
𝑦𝑘 = 𝑎𝑘, implying that:

\frac{\partial E}{\partial a_k} = y_k - t_k    (3.16)
The training of a neural net involves the minimization of the error function 𝐸,
which, when using identity output activation functions, corresponds exactly to
least-squares regression. From this perspective, neural nets are a generalization of
classical regression.
3.1.2 Adjustment of parameters
During the training, the goal is indeed to find a weight vector 𝒘 that minimizes the
error function 𝐸(𝒘), also called the loss function, which is depicted in Figure 3.2. It
is important to mention that minor moves in the weight space from 𝒘 to 𝒘 + 𝛿𝒘
produce a variation in the error function such that 𝛿𝐸 ≈ 𝛿𝒘𝑻𝛻𝐸(𝒘), with 𝛻𝐸(𝒘) as
the gradient vector which points toward the direction of highest rate of increase of
the error function. The error 𝐸(𝒘) can be seen as a smooth continuous function of
the weights, which is bounded below. Therefore, the smallest value of the error
occurs at a location in the weight space where 𝛻𝐸(𝒘) = 0, i.e., at one of the critical
points.
Figure 3.2 - Geometrical view of the error function E(w) as a surface sitting over a
weight space (Bishop & Nabney, 2008)
The aim of the parameter optimization during the training is to find the weights for
which the error is the lowest. Nevertheless, there is typically a nonlinear
dependence between the error and both the weights and the bias parameters,
meaning that there can be several points in the weight space where the error attains
only a local minimum (Bishop, 2006; Goodfellow et al., 2016; Murphy, 2013).
Since an analytical solution of 𝛻𝐸(𝒘) = 0 is generally not available, iterative
numerical procedures must be implemented. Most techniques for the optimization
of continuous nonlinear functions involve initializing the weight vector with some
chosen (typically random) values and then stepping through weight space as is
presented in Equation 3.17, with 𝜏 as the current step of the iterative procedure.

\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} + \Delta \mathbf{w}^{(\tau)}    (3.17)
3.1.3 Gradient descent optimization
During gradient descent optimization, the gradient information gathered at each
step is used to update the weights in the direction of the negative gradient, as is
presented in Equation 3.18.

\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E(\mathbf{w}^{(\tau)})    (3.18)

where 𝜂 stands for the learning rate parameter, which takes a positive value. After
each update, the gradient is re-evaluated for the new weight vector, and the process
is repeated. It is important to point out that at each step 𝜏, the entire training set
is used to evaluate the gradient (Bishop, 2006).
Furthermore, several runs of the gradient descent with different random 𝒘 values
must be executed in order to find the lowest local minimum, and ideally the true
global minimum. The results from the gradient descent runs must be validated with
a separate chosen validation dataset; this dataset must be distinct from the training
dataset to give a fair (unbiased) evaluation of the fittings (Bishop, 2006).
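The restart strategy can be sketched on a one-dimensional toy error surface with more than one minimum (the function below is our illustrative stand-in for 𝐸(𝒘), not an actual network error):

```python
import numpy as np

# Illustrative one-dimensional "error surface" with more than one local minimum,
# standing in for the E(w) of Figure 3.2.
def E(w):
    return np.sin(3.0 * w) + w ** 2

def grad_E(w):
    return 3.0 * np.cos(3.0 * w) + 2.0 * w

def gradient_descent(w0, eta=0.01, steps=2000):
    w = w0
    for _ in range(steps):
        w -= eta * grad_E(w)       # step in the direction of the negative gradient
    return w

# Several runs from random initial weights; keep the lowest minimum found.
rng = np.random.default_rng(3)
candidates = [gradient_descent(w0) for w0 in rng.uniform(-3.0, 3.0, size=10)]
best = min(candidates, key=E)
print(best, E(best))               # the deepest minimum among the runs
```

In practice each converged weight vector would also be scored on a held-out validation set, not just by its training error.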
Le Cun & Boser (1989) introduced an online version of the gradient descent, also
known as the stochastic gradient descent (SGD), that has been widely used,
especially for large datasets. In this case, error functions based on maximum
likelihood comprise a sum of terms, one per data point, with the following general
form:

E(\mathbf{w}) = \sum_{n=1}^{N} E_n(\mathbf{w})
in which 𝐸𝑛 is the error contribution associated with the 𝑛𝑡ℎ term. Stochastic
gradient descent performs the weight updates based on one data point at a time, as
is presented in Equation 3.19.
\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E_n(\mathbf{w}^{(\tau)})    (3.19)

which facilitates the updating of 𝐸(𝒘), as

E(\mathbf{w}^{(\tau+1)}) = E(\mathbf{w}^{(\tau)}) - E_n(\mathbf{w}^{(\tau)}) + E_n(\mathbf{w}^{(\tau+1)})    (3.20)
This SGD weight update is reiterated by cycling through the data and choosing
random points with replacement (Bishop & Nabney, 2008). Nowadays, SGD powers
most of the current deep learning models. Indeed, a frequent challenge for current
machine learning implementations is the size of the datasets, for which SGD and its
variations are especially effective.
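A minimal sketch of stochastic gradient descent, updating the weight with one data point's gradient at a time; the toy least-squares problem below is our own illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy problem: E(w) = sum_n E_n(w), with E_n = 0.5*(w*x_n - t_n)^2
# and a recoverable true slope of 2.
x = rng.uniform(0.2, 2.0, size=100)
t = 2.0 * x

w, eta = 0.0, 0.05
for _ in range(2000):
    n = rng.integers(len(x))                 # draw one data point at random
    grad_n = (w * x[n] - t[n]) * x[n]        # gradient of E_n alone
    w -= eta * grad_n                        # descend along the single-point gradient

print(w)                                     # converges to the true slope, 2.0
```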
3.1.4 Error back-propagation
Back-propagation is an effective technique for evaluating the gradient of an error
function 𝐸(𝒘) for a neural network. The effectiveness of this technique is based on
the so-called ‘local message-passing scheme’, where the data travels consecutively
forwards and backward through the network (Rumelhart & McClelland, 1986).
The back-propagation algorithm will be explained in terms of the sum of squares
error function, first for the case of a simple linear model:

y_k = \sum_i w_{ki} x_i    (3.21)

resulting in an error function 𝐸𝑛(𝒘) for an input pattern 𝑛, such that:

E_n = \frac{1}{2} \sum_k (y_{nk} - t_{nk})^2    (3.22)
with 𝑦𝑛𝑘 standing for 𝑦𝑘(𝒙𝒏, 𝒘). The gradient of this particular error function
𝐸𝑛(𝒘) with respect to a weight 𝑤𝑘𝑖 can be observed in Equation 3.23.

\frac{\partial E_n}{\partial w_{ki}} = (y_{nk} - t_{nk}) \, x_{ni}    (3.23)
A more general formulation considers so-called feedforward neural nets, in which
the error function depends on a weight 𝑤𝑗𝑖 only via the summed input 𝑎𝑗 entering
a unit 𝑗. This favors a particular application of the chain rule for the partial
derivatives:

\frac{\partial E_n}{\partial w_{ji}} = \frac{\partial E_n}{\partial a_j} \frac{\partial a_j}{\partial w_{ji}}    (3.24)

\delta_j \equiv \frac{\partial E_n}{\partial a_j}    (3.25)
in which 𝛿𝑗 is referred to as the error of unit 𝑗. One can show that in a feedforward
network, a weighted sum of the inputs is calculated for each unit individually, such
that

a_j = \sum_i w_{ji} z_i    (3.26)
Then, by substituting Equations 3.25 and 3.26 into Equation 3.24,

\frac{\partial E_n}{\partial w_{ji}} = \delta_j z_i    (3.27)

Thus, the derivative with respect to a weight 𝑤𝑗𝑖 is obtained by multiplying the
error 𝛿𝑗 at the output end of the weight by the value 𝑧𝑖 at its input end.
Figure 3.3 - Forward and backward propagation of error information (Bishop &
Nabney, 2008)
From Equation 3.27, one can observe that multiplying the errors 𝛿𝑗 by the values
of 𝑧𝑖 provides the value of the derivative without computing it directly. Hence,
instead of computing the derivatives individually, one can calculate the value of 𝛿𝑗
for each hidden and output unit as follows:

\delta_j \equiv \frac{\partial E_n}{\partial a_j} = \sum_k \frac{\partial E_n}{\partial a_k} \frac{\partial a_k}{\partial a_j}    (3.28)
Then, by substituting Equations 3.3, 3.25, and 3.26,

\delta_j = h'(a_j) \sum_k w_{kj} \delta_k    (3.29)

Finally, Equation 3.29 and Figure 3.3 show that the value of the error 𝛿 for any
hidden unit can be computed by propagating the errors backward from the units
that follow it in the network. (The factor ℎ′(𝑎𝑗), the derivative of the activation
function, emerges from the summation as a common factor of the ∂𝑎𝑘/∂𝑎𝑗
contributions, since each 𝑎𝑘 depends on 𝑎𝑗 only through 𝑧𝑗 = ℎ(𝑎𝑗).)
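The forward and backward passes can be sketched for a two-layer network with the bias absorbed as in Equation 3.8, tanh hidden units, and identity outputs (the dimensions and values are illustrative choices of our own). The back-propagated gradients can be checked against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(5)
D, M, K = 3, 4, 2                               # illustrative dimensions
W1 = rng.normal(scale=0.5, size=(M, D + 1))     # first layer; column 0 holds the biases
W2 = rng.normal(scale=0.5, size=(K, M + 1))     # second layer; column 0 holds the biases
x = rng.normal(size=D)
t = rng.normal(size=K)                          # target vector for one pattern

def forward(W1_, W2_):
    x_aug = np.concatenate(([1.0], x))          # x_0 = 1 absorbs the first-layer bias
    a = W1_ @ x_aug                             # hidden activations a_j
    z_aug = np.concatenate(([1.0], np.tanh(a))) # z_j = h(a_j), plus z_0 = 1 for the bias
    y = W2_ @ z_aug                             # identity output units (regression)
    return x_aug, a, z_aug, y

def sq_error(W1_, W2_):
    y = forward(W1_, W2_)[3]
    return 0.5 * np.sum((y - t) ** 2)

def backprop(W1_, W2_):
    x_aug, a, z_aug, y = forward(W1_, W2_)
    delta_out = y - t                           # output errors (Equation 3.16)
    gW2 = np.outer(delta_out, z_aug)            # dE/dw_kj = delta_k * z_j (Equation 3.27)
    delta_hid = (1 - np.tanh(a) ** 2) * (W2_[:, 1:].T @ delta_out)   # Equation 3.29
    gW1 = np.outer(delta_hid, x_aug)            # dE/dw_ji = delta_j * x_i (Equation 3.27)
    return gW1, gW2

def numeric_grad(wrt):
    # central finite differences, for checking the back-propagated gradients
    W = W1 if wrt == "W1" else W2
    g, h = np.zeros_like(W), 1e-6
    for idx in np.ndindex(*W.shape):
        W[idx] += h; Ep = sq_error(W1, W2)
        W[idx] -= 2 * h; Em = sq_error(W1, W2)
        W[idx] += h                             # restore the original weight
        g[idx] = (Ep - Em) / (2 * h)
    return g

gW1, gW2 = backprop(W1, W2)
print(np.max(np.abs(gW1 - numeric_grad("W1"))),
      np.max(np.abs(gW2 - numeric_grad("W2"))))   # both differences are tiny
```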
3.2 Integer linear programming
Mathematical programming (MP) is the organized listing of (i.e., the programming
of) variables, sets, constraints, objective functions (to be minimized or maximized),
and potentially other mathematical constructs. The following paragraphs outline
basic concepts of mathematical programming techniques such as linear
programming (LP) and integer programming (IP).
In scenarios with linear objective function and constraints, linear programming (LP)
is a widely used technique to state and ultimately solve optimization problems.
“Linear” stands for the required linearity of the constraints and objective function
that the mathematical models must have when representing a problem. This linear
nature of LP is often satisfied in classical operations research contexts, such as the
so-called task assignment problems, where the user needs to allocate limited
resources among activities of interest in an optimal way (Hillier & Lieberman,
2016).
In the general LP scenario where the objective function and all constraints are
linear, the feasible domain is a region bounded by linear hyperplanes, and an
optimal value, when it exists, is attained at one of the vertices of this domain.
However, LP methods are not only used for solving linear optimization problems;
they can also be applied recursively to problems with a nonlinear objective function
and/or constraints by reformulating the model in an appropriate way (Hillier &
Lieberman, 2016).
Unless otherwise specified, the variables within an LP are presumed to have
continuous domains. However, in reality, resources such as operators, drilling
machines, or openings only make sense if they are expressed as integer values.
Therefore, problems that require decision variables to take integer values should be
formulated as integer programming (IP) problems. Moreover, integer programs in
which the objective function and all of the constraints are linear are often called
integer linear programs (ILP), although it is common to use IP and ILP
interchangeably. The only difference between LP and ILP problems is that the latter
include the additional restriction that the decision variables must take integer
values. In addition, models dealing with both integer and non-integer decision
variables are known as mixed integer programming (MIP) models; the terminology
mixed integer linear program (MILP) is reserved for such problems having a linear
objective function and linear constraints (Hillier & Lieberman, 2016).
3.2.1 Solution approaches
3.2.1.1 Simplex method
The simplex method is the most popular technique for solving linear programming
problems. The method is based on an iterative search that moves through the
collection of extreme points (i.e., vertices) of the feasible domain, one by one,
until reaching an optimal value or determining that the problem is unbounded
or infeasible. A test is performed at each iteration to determine whether a
neighboring vertex offers an improvement in the objective function, or whether
there is a direction along which the objective improves without bound; if no
direction offers improvement, then the current vertex corresponds to an optimal
solution (Hillier & Lieberman, 2016).
3.2.1.2 Branch-and-bound algorithm
The approach provided by the branch-and-bound algorithm is commonly applied for
discrete optimization problems as an alternative to combinatorial enumeration, and
especially ILP and MILP. This algorithm works under the so-called ‘divide and
conquer’ concept, where the initial (large) problem is divided into smaller problems.
Subsequently, a selection of the smaller problems is subdivided and/or solved. Each of
these subproblems is a branch along which the optimal integer solution may be
located. This procedure of dividing and subdividing the initial problem is known
as the branching step of the algorithm. The selection, or fathoming, step involves
bounding how good the best solution along the branch of a smaller problem can be,
and subsequently abandoning the branches along which there is no possibility of
finding an optimal integer solution for the original problem.
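To make the branching and fathoming steps concrete, the following toy sketch (not from the thesis) maximizes a small integer program by repeatedly solving LP relaxations with SciPy and branching on a fractional variable; real solvers add far more sophisticated bounding, node selection, and cutting planes.

```python
import math
from scipy.optimize import linprog

def branch_and_bound(c, A_ub, b_ub, bounds):
    """Maximize c @ x over integer x via naive branch-and-bound on LP
    relaxations (illustrative only)."""
    best_val, best_x = -math.inf, None
    stack = [bounds]
    while stack:
        bnds = stack.pop()
        res = linprog([-ci for ci in c], A_ub=A_ub, b_ub=b_ub, bounds=bnds)
        if not res.success or -res.fun <= best_val:
            continue  # infeasible branch, or bound shows it cannot improve
        frac = next((i for i, v in enumerate(res.x)
                     if abs(v - round(v)) > 1e-6), None)
        if frac is None:                  # integral: candidate solution
            best_val, best_x = -res.fun, [round(v) for v in res.x]
        else:                             # branch on the fractional variable
            f = res.x[frac]
            lo, hi = bnds[frac]
            if math.floor(f) >= lo:
                down = list(bnds); down[frac] = (lo, math.floor(f))
                stack.append(down)
            if math.ceil(f) <= hi:
                up = list(bnds); up[frac] = (math.ceil(f), hi)
                stack.append(up)
    return best_val, best_x

# Tiny knapsack-style IP: maximize 5*x1 + 4*x2 s.t. 6*x1 + 4*x2 <= 10
val, x = branch_and_bound([5, 4], [[6, 4]], [10], [(0, 10), (0, 10)])
print(val, x)
```

Here the LP relaxation is fractional, so the algorithm branches on floor/ceiling bounds until an integral optimum is proven.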
3.2.1.3 The cutting-plane method
A cut, or cutting plane, for an integer programming problem is a new functional
constraint that reduces the feasible region of the LP relaxation without
removing any feasible solutions of the IP problem. This method works under the
philosophy that any IP can have several equivalent LP formulations, which translates
into different sets of linear inequalities describing the same set of integer points
(Hillier & Lieberman, 2016).
3.2.1.4 The branch-and-cut approach
The branch-and-cut approach is generally an augmentation of the branch-and-
bound algorithm and is commonly used for ILP and MILP. More specifically, the
bounding stage is enhanced by introducing additional constraints, known as cutting
planes. In each of the subproblems, these cutting planes reduce the feasible domain
of solutions for the continuous LP relaxation but are formulated so as not to exclude
integer solutions (i.e., solutions that are genuine candidates for the original
problem). In practice, these additional constraints can eliminate a considerable
number of branches, in addition to those that are eliminated through standard
bounding. In fact, state-of-the-art MILP solvers employ branch-and-cut approaches.
Chapter 4
____________________________________________
Case Study – Underground Gold
Project
In this section, the proposed methods are applied to historical data from a
mechanized underground gold mine to demonstrate the potential effectiveness of
artificial neural networks in the prediction of ROP. After training a neural net model
with the historical data, its predictions are used to reparametrize an ILP thereby
observing its impact on operational decision-making. The outline of the
underground operation and the details of the dataset and input parameters are
presented in the following subsections. Some labels corresponding to operational
parameters have been modified for confidentiality purposes.
4.1 Datasets
Operational data gathered from drilling and support activities in an underground
mechanized gold mine was used for training the artificial neural network (ANN).
The training data is composed of 10000 instances and 11 independent variables. The
dependent variable to be used as the target for the prediction task is the ROP. Table
4.1 presents the model features and target used in the predictive model.
Table 4.1 - Model features and target

Features: Team, Geology, Shift, Equipment, Rock Type, Opening Name, Activity, Drillings, Length of Drillings, Operator, Meters
Target: ROP (m/h)
Furthermore, a monthly plan developed by the mine planners was provided in order
to obtain the ROP predictions with the ANN approach. The drilling plan from a
specific part of the mine was extracted. More information on the datasets used for
both training and testing can be found in the following sections. Figure 4.1 presents
the proposed steps to be developed during this case study.
Figure 4.1 - Flowchart of the proposed approach
4.2 Underground operation outline
This mining project uses underground jumbo boomer drills to perform the drilling
activities in both the development and production openings of the project.
Conventional scoop-trams and dumpers are utilized in order to load and transport
the material from the underground mining operation to the processing plant.
However, the only activities within the operation that are relevant for this study are
those where jumbo boomer drills are directly involved. Furthermore, openings to be
developed are planned to be drilled and supported every shift in order to guarantee
the continuity of the mining production. Figure 4.2 shows the Boomer 282, a
hydraulically controlled face-drilling rig used within the mining project under
consideration.
Figure 4.2 – Boomer 282 (Epiroc, 2020)
4.2.1 Mine layout
The underground project begins with the opening of a single portal entrance,
followed by the development of the main ramp, which goes through weathered
saprolite and schist. As the depth of the ore body is reached and more competent
rock is encountered, the main ramp is partitioned into secondary development
ramps. Ventilation drifts, shafts, and ventilation raises are also constructed to
ensure a proper ventilation system within the excavation. In addition, permanent
galleries such as muck-bays, workshops, drill stations, and safety bays are dispersed
along the openings. Once the development areas for secondary development become
available, haulage drifts and attack ramps are excavated. Attack ramps are
designed to access the stopes, and many of them link to the sublevel development.
As the attack ramps are opened, the stopes are mined in accordance with the mining
method. Figure 4.3 shows the underground development design, as was previously
discussed.
Figure 4.3 – Mine layout (Red Eagle Mining, 2014)
4.2.2 Mining method
The mining method used in the underground project is the Mechanized Shrinkage
with Delayed Fill (MSDF). This method is similar to mechanized cut and fill,
with an additional breast blasting of the back between ore accesses. Furthermore,
instead of mucking the ore and backfilling immediately, the ore is left in place in the
stope, with only enough material cleaned from the stope to remove swell. For every ore
access, or lift, the attack ramp must be drilled and blasted in order to open new
access for the next lift. An important aspect of the MSDF is that, even though cleaning
is not mandatory, support plays a key role during the exploitation of the stopes for
each lift.
Figure 4.4 - MSDF: first and second lifts (Red Eagle Mining, 2014)
Figure 4.5 – Subsequent lifts and mucking (Red Eagle Mining, 2014)
After all the lifts have been developed, the ore is mucked out entirely, followed
by the bottom-up backfilling of the stopes and, finally, the backfilling of the attack
ramp.
4.2.3 Drilling operation
Operations are carried out over two twelve-hour shifts in which the different
activities programmed in the openings must be achieved to accomplish the goals
proposed for the shift, and consequently for the day and week. The drilling activities
are performed by trained and experienced operators. Each operator has one
assistant that helps during the entire shift. Available jumbos must perform drilling
activities in order to accomplish blasting or support in the openings. Not finishing
an activity affects not only what was proposed for the current shift but also all the
upcoming activities planned for the following shifts. In fact, failing to drill
an opening that is programmed to be blasted means leaving the opening inactive for
the entire following shift, since blasting only occurs at the end of each shift.
Consequently, blasting, more than any other factor, limits the progress of
development; thus, the time that an operator and their jumbo need to complete a
drilling task during a shift plays a crucial role when trying to achieve development
targets. Moreover, Campeau & Gamache (2019) pointed out that assuming that
tasks started at any of the openings of a project will be completed in a fixed
time only works when these activities are viewed from a long-term perspective. This
assumption does not hold for scenarios similar to the one presented in
this thesis, where the duration of the activities can vary significantly,
creating disparities between the planned and actual start and end times of the tasks.
4.3 Prediction of rates of penetration (ROP)
With the ultimate goal of enhancing the decision-making tool, predictions of ROP
values were obtained. In this case, the predictions were achieved after training an
artificial neural network (ANN) with historical data from the mine site. In
particular, data related to the drilling operations were gathered, preprocessed, and
used for training the AI-model. The following subsections outline the preprocessing
of the data as well as the different architectures of neural networks that were
implemented and tested to determine the optimal arrangement which obtains the
lowest error in the validation set.
4.3.1 Data preprocessing
The dataset with the information gathered from the mine site was first read into a
pandas DataFrame in order to compute basic statistics and understand the
subsequent inputs of the model. Figures 4.6 and 4.7 show portions of the correlation
matrix between features and labels using the Pearson method.
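A minimal sketch of this step, using hypothetical rows in place of the confidential mine data (the column names below only mimic the structure of the real features):

```python
import pandas as pd

# Hypothetical stand-ins for Rock Type, Drillings, and ROP
df = pd.DataFrame({
    "rock_type": [1, 2, 2, 3, 4, 4, 1, 3],
    "drillings": [75, 60, 58, 50, 40, 42, 73, 52],
    "rop_m_per_h": [0.62, 0.55, 0.54, 0.44, 0.35, 0.36, 0.60, 0.46],
})
corr = df.corr(method="pearson")  # Pearson is also pandas' default
print(corr["rop_m_per_h"].round(2))
```

On data like this, a strong negative correlation between rock type and ROP would show up directly in the corresponding matrix entry.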
Figure 4.6 - Correlation matrix, including operators and performance measures
Figure 4.7 - Correlation matrix, including shift, team, rock types and performance measures
Figures 4.6 and 4.7 show an important correlation between some jumbo operators
and their performance in terms of penetration rates and holes drilled per
activity. Moreover, the correlation matrix reveals a logical and strong correlation
between the number of drillholes, rock types, and geology classes, establishing how
important a proper classification of the geology and geomechanics of the rock is
for developing a production schedule.
The inputs, weights, and gradients were all implemented as TensorFlow tensors to
take advantage of the Graphics Processing Units (GPUs) available through Google
Colab for computational efficiency. In addition, the batches for SGD were selected
using TensorFlow's data preprocessing utilities, and the predict function took the
dataset under study as input. The dataset with the information gathered from the mine was
subsequently transferred to data loaders, where the iteration over batches of input
data was done automatically, in a memory- and speed-efficient manner. Even when
calculating the training error on all 10000 examples in the training set, the runs
never took more than 6 minutes.
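The batching described above is handled by the TensorFlow data pipeline; as a library-agnostic sketch of what that iteration does, assuming nothing beyond NumPy (array sizes mirror the 10000-instance, 64-feature dataset, but the arrays themselves are random placeholders):

```python
import numpy as np

def minibatches(X, y, batch_size, rng):
    """Yield shuffled (X, y) minibatches once per epoch, as done
    implicitly by the data pipeline described above."""
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 64))   # placeholder: 10000 instances, 64 features
y = rng.normal(size=(10000, 1))
n_batches = sum(1 for _ in minibatches(X, y, 100, rng))
print(n_batches)
```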
4.3.1.1 Batch-normalization
Batch normalization is widely used for optimizing artificial neural networks. Batch
normalization is a method of adaptive reparametrization applied to architectures
comprising an arrangement of multiple activation functions or layers. As
explained before, the gradient informs how to update each adaptable parameter under
the assumption that the remaining layers do not vary. In practice, however, all of the
layers are updated simultaneously, so unexpected results can occur because several
activation functions change at once, even though each gradient was computed while
holding the other functions constant.
For this implementation, batch normalization was applied to the batch of input
features described in the matrix 𝐻 as follows:
H′_ij = (H_ij − μ_j) / σ_j    (4.1)
in which the individual elements of 𝐻 are described by 𝐻𝑖𝑗, and 𝜇𝑗 and 𝜎𝑗 correspond
to the column means and standard deviations, respectively. The resulting elements
𝐻𝑖𝑗′ are assembled into the batch normalized matrix 𝐻′.
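Equation 4.1 can be implemented directly; the following NumPy version is an illustrative sketch (the thesis pipeline uses TensorFlow's built-in batch normalization, and the `eps` guard and the omitted trainable scale/shift are simplifications):

```python
import numpy as np

def batch_normalize(H, eps=1e-8):
    """Column-wise normalization of a batch matrix H (Equation 4.1).
    eps guards against a zero standard deviation."""
    mu = H.mean(axis=0)      # column means, mu_j
    sigma = H.std(axis=0)    # column standard deviations, sigma_j
    return (H - mu) / (sigma + eps)

H = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
H_prime = batch_normalize(H)
print(H_prime.mean(axis=0), H_prime.std(axis=0))  # ~0 and ~1 per column
```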
Figure 4.8 - Left: ROP histogram. Right: normalized ROP histogram
4.3.1.2 One-Hot Encoding
One-hot encoding was used to handle the categorical features in this implementation,
avoiding the spurious ordinal importance that would otherwise be assigned to larger
category codes, a common problem with label encoding. The one-hot representation can
be captured by a binary vector with n mutually exclusive bits (only one is allowed to
be active). In this type of encoding, the representations contain many entries, but
without significant meaningful separate control over each entry (Goodfellow et al.,
2016; Nair & Hinton, 2009).
The one-hot code vectors can be defined with the variable c, where c_y = 1 and c_i =
0 for all other values of i. The one-hot code provides some statistical benefits when
treating all instances within a similar cluster, and it offers a computational
improvement when an entire representation can be captured by a single integer.
During the implementation, binary columns were created for each category, with the
categories derived from the unique values in each feature.
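As an illustration of this step, pandas provides `get_dummies` for exactly this transformation; the team values below are hypothetical stand-ins for the confidential data:

```python
import pandas as pd

# Hypothetical sample with a categorical 'team' attribute, as in Figure 4.9
df = pd.DataFrame({"team": ["A", "B", "C", "A"],
                   "meters": [3.3, 6.6, 6.6, 3.3]})
encoded = pd.get_dummies(df, columns=["team"])  # one binary column per category
print(encoded.columns.tolist())
```

Each row of `encoded` now has exactly one active team column, matching the mutually exclusive one-hot vectors described above.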
Figure 4.9 - Left: sample from training data. Right: sample from training data after one-hot encoding of the team attribute
4.3.2 Testing different architectures of neural networks
4.3.2.1 Neural network with one hidden layer
First, as a baseline, a feedforward neural network class and its methods were
written. The first feedforward neural network implemented has a single hidden
layer and an adjustable number of hidden units. The inputs correspond to the 64
features resulting after normalizing and one-hot encoding the dataset. Parameters
used in the model are defined in Figure 4.10, with the dimensions defined in Table
4.2. The final layer of the feedforward neural net presented in Figure 4.10 includes
one neuron in charge of returning a continuous numerical value. In the following
section, it will be possible to understand the accuracy of the prediction by comparing
it with the true value, corresponding to the penetration rate, which is indeed
continuous.
Figure 4.10 – Architecture and variables
The ReLu activation on the hidden layer was chosen because of its well-behaved
gradient and proven robustness (Shalev-Shwartz & Ben-David, 2014). More
traditional choices of activation function such as the sigmoid have gradients that
saturate at large activations, which impedes a gradient descent based optimization
algorithm from working efficiently. In the following sections, the behavior of the
different activation functions will be observed, as well as the different
hyperparameters involved in the model.
Table 4.2 - Dimension of parameters for the ANN. Individual parameters are referenced by the dimension letter in lowercase as a subscript

| Parameter | Dimensions | Value |
| Input, x | D features + bias | 64 |
| Input weights, W | M × (D+1) | M × 64 |
| Hidden units, z | M units + bias | M + 1 |
| Hidden weights, V | C × (M+1) | 1 |
| Output, ŷ | Continuous output, C | 1 |
4.3.2.2 Back-propagation
By adapting the results of Section 3.1.4, the gradient of the loss 𝐸 with respect to
the parameter matrix 𝑉 is given by:
∂E/∂V_{C×M} = (∂E/∂ŷ_C) (∂ŷ_C/∂u_C) (∂u_C/∂V_{C×M})    (4.2)
The individual partial derivatives can be calculated by first evaluating the values of
each unit in the network at the current epoch in a "forward pass", then substituting
these values into the analytic forms for the partial derivatives. The analytic form of
the partial derivatives depends on the activation functions chosen at each layer and
the loss. With the linear activation on the output layer, the partial derivative of the
loss function 𝐸 with respect to 𝑉𝐶×𝑀 can be computed as follows:
For regression: ŷ = g(u) = Vz, with loss E(y, ŷ) = ½ ‖y − Vz‖₂²

Then, by substituting and calculating the derivative:

E(y, z) = ½ ‖y − Vz‖₂²    (4.3)

∂E/∂V_{C×M} = (ŷ_C − y_C) z_M    (4.4)
For numerical stability, the maximum value of the linear input u was subtracted
from all u_C, which does not change the output but prevents overflows. The partial
derivative of the loss with respect to W can be derived similarly; Equation 4.6
presents it, involving the partial derivative of the ReLu activation function.
∂E/∂W_{M×D} = (∂E/∂ŷ_C) (∂ŷ_C/∂u_C) (∂u_C/∂z_M) (∂z_M/∂q_M) (∂q_M/∂W_{M×D})    (4.5)

The first two factors were already computed in Equation 4.4. Note that:

∂ReLu(q_M)/∂q_M = { 0 if q_M ≤ 0;  1 if q_M > 0 }

Then,

∂E/∂W_{M×D} = Σ_C (ŷ_C − y_C) V_{C×M} · ∂ReLu(q_M)/∂q_M · x_D    (4.6)
where the derivative of the ReLu activation is zero for inputs less than or equal to
zero and one for inputs greater than zero. Note that the partial derivatives of the
loss with respect to the parameters 𝑉 are part of this equation. This aspect makes
back-propagation efficient: the required values for each step of the calculation are
pre-stored.
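A NumPy sketch of the forward pass and of the gradients in Equations 4.4 and 4.6, verified against a finite-difference approximation (symbols follow Table 4.2; the bias terms are omitted for brevity, so this is illustrative rather than the thesis implementation):

```python
import numpy as np

def forward(x, W, V):
    q = W @ x                      # hidden pre-activation
    z = np.maximum(q, 0.0)         # ReLu
    y_hat = V @ z                  # linear output (regression)
    return q, z, y_hat

def gradients(x, y, W, V):
    """Backward pass for E = 0.5 * ||y - Vz||^2 (Equations 4.4 and 4.6)."""
    q, z, y_hat = forward(x, W, V)
    delta = y_hat - y              # dE/du at the linear output layer
    dV = np.outer(delta, z)        # Equation 4.4
    relu_grad = (q > 0).astype(float)
    dW = np.outer((V.T @ delta) * relu_grad, x)  # Equation 4.6
    return dW, dV

rng = np.random.default_rng(1)
D, M, C = 5, 4, 1
x, y = rng.normal(size=D), rng.normal(size=C)
W, V = rng.normal(size=(M, D)), rng.normal(size=(C, M))
dW, dV = gradients(x, y, W, V)

# Finite-difference check on one entry of W
eps = 1e-6
Wp = W.copy(); Wp[0, 0] += eps
E = lambda W_: 0.5 * np.sum((y - forward(x, W_, V)[2]) ** 2)
err = abs((E(Wp) - E(W)) / eps - dW[0, 0])
print(err)  # small: analytic and numerical gradients agree
```

The finite-difference check is a standard way to validate a hand-written back-propagation before training.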
4.3.2.3 Performance and adjustment of
hyperparameters
Initially, the model was predicting only ones or zeros as outputs for every input;
this was solved by applying Xavier initialization (Glorot & Bengio, 2010) to the
initial weights, which were drawn from a normal distribution with mean zero and
standard deviation 1/√(D+1). This roughly conserves the variance of the inputs
as they propagate through the layers, improving performance and helping to prevent
numerical issues.
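The initialization described above can be sketched as follows (D = 64 and 1500 hidden units are taken from the tuned model; the extra bias column is an assumption about how the weights were arranged):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64                                         # input features after encoding
std = 1.0 / np.sqrt(D + 1)                     # scale used in the thesis
W = rng.normal(0.0, std, size=(1500, D + 1))   # 1500 hidden units, bias column
print(W.std())  # empirically close to 1/sqrt(65)
```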
A stochastic gradient descent (SGD) algorithm was written as a baseline to perform
the weight adjustments. Within each epoch, the following updates are performed:
W ← W − τ · dW    (4.7)

V ← V − τ · dV    (4.8)
where 𝜏 is the learning rate and 𝑑𝑊 and 𝑑𝑉 are the gradients of the chosen cost
function calculated by the back-propagation algorithm. Since the task is a
regression, the mean square error (MSE) was used as the loss function, averaged
over a batch of 100 sampled training instances. The weights were adjusted until the
error on the validation set did not change by more than 0.01 for three iterations or
until the maximum number of iterations was reached. The early stopping helped to
prevent overfitting and wasteful use of computation time. The baseline optimizer
was compared to four widely used algorithms that extend SGD with variants such as
adaptive learning rates and momentum. AdaGrad, or adaptive gradient, is an optimizer
that adapts the learning rate of each parameter being updated. Learning rates are
controlled in AdaGrad by dividing a base learning rate by the square root of the sum
of the parameter's past squared gradients. Consequently, learning rates corresponding
to parameters with high gradients decrease rapidly, while learning rates
corresponding to parameters with low gradients decay more slowly (Duchi et al.,
2011). AdaDelta is an extension of AdaGrad in which decay is applied over the
learning rate without any manual adjustment by the user; only first-order
information is employed to dynamically adjust the learning rate (Zeiler, 2012).
Adam, or adaptive moments, employs the principle of root mean square
propagation (Tieleman et al., 2012) with the addition of momentum. In Adam,
exponentially decaying averages of both the historical gradients and the historical
squared gradients are employed (Kingma et al., 2014). Finally, the Nadam optimizer
works under the same principle as Adam, with the only difference being the
incorporation of Nesterov momentum (Dozat, 2016).
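These optimizers are available off the shelf in frameworks such as Keras; purely to illustrate the adaptive-moment idea, a bare-bones Adam step on a toy quadratic objective might look like this (hyperparameter defaults follow Kingma et al., 2014; the objective f(w) = w² is hypothetical):

```python
import numpy as np

def adam_step(w, grad, m, v, t, tau=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponentially decaying averages of the gradient (m)
    and squared gradient (v), with bias correction."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)          # bias-corrected second moment
    w = w - tau * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimizing f(w) = w^2, so grad = 2w
w, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * w, m, v, t)
print(w)  # ends near the minimizer at 0
```

Because the step is normalized by the square root of the second moment, the effective step size stays close to τ regardless of the raw gradient magnitude.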
The performance of the five optimizers was assessed during the training of three
different neural network architectures, and the progression of the loss is presented
in Figures 4.11, 4.12, and 4.13.
Figure 4.11 - Comparison of training loss evolution with different optimizers for a one hidden layer model
Figure 4.12 - Comparison of training loss evolution with different optimizers for a two hidden layers model
Figure 4.13 - Comparison of training loss evolution with different optimizers for a three hidden layers model
As outlined above, five different optimizers were compared: SGD, AdaGrad,
AdaDelta, Adam, and Nadam. For each optimizer, three networks (with one,
two, or three hidden layers) were each trained ten times for 50 epochs with the same
combination of parameters. Overall, one can note that the best training performance
regarding loss values and convergence is achieved by Adam.
Moreover, after selecting Adam as the optimizer for the model, the training
performance for different batch sizes (16, 32, 64, and 128) was evaluated, and the
results are presented in Figures 4.14 and 4.15.
Figure 4.14 - Comparison of training loss evolution with different batch sizes for a two hidden layers model
Figure 4.15 - Comparison of training loss evolution with different batch sizes for a three hidden layers model
Larger batch sizes yielded lower errors and faster convergence, likely because each
update draws information from more examples. However, the larger the batch size,
the longer each optimization step takes and the more GPU memory is used. As a
consequence, a batch size of 64 was chosen for all subsequent experiments. The
following parameters were tuned by implementing a grid search optimization:
Table 4.3 - Combinations during hyperparameter optimization

| Parameter | Options | Best Performance |
| Activation Function | ReLu, Sigmoid, Tanh | ReLu |
| Number of Units | 100, 200, 500, 1000, 1500, 2000, 3000, 5000 | 1500 |
| Optimizer | Adam, AdaGrad, AdaDelta, SGD, Nadam | Adam |
| Batch Normalization | True or False | True |
| Batch Size | 16, 32, 64, 128 | 64 |
The validation loss decreased with more hidden units (a more expressive model).
The danger with expressivity is that the model will overfit; however, implementing
early stopping helped to prevent this eventuality. Moreover, it is common wisdom
in machine learning that increasing the depth is more beneficial than increasing the
width, so one can be wary of creating more hidden units than input features
(Goodfellow et al., 2016). However, Figure 4.13 shows that augmenting the depth of
the model produces more hyperparameters to be optimized, and therefore more
computational power is required. Additionally, it was shown that applying
batch normalization not only to the input batches but also to all the hidden layers
produced better loss values, increasing the learning speed and the stability of the
model. Table 4.4 presents the performance of four different architectures, where
the model with three hidden layers provides better results in terms of both training
and validation.
With the tuned hyperparameters, the evolution of the training and validation losses
is shown in Figure 4.16. The training and validation losses remain close, so there is
no evidence of overfitting. Note that the training loss is computed over the entire
10000 training instances.
Table 4.4 - Comparison with different neural network architectures

| Hidden layers | val_mse | val_mape | mse | mape | Activation | Batch Size | Last Activation | Loss | Optimizer |
| 1 | 0.000889 | 198.45 | 0.000166 | 1266.38 | relu | 32 | linear | mse | Adam |
| 2 | 0.000917 | 178.26 | 0.000174 | 19198.97 | relu | 64 | linear | mse | Adam |
| 3 | 0.000141 | 228.68 | 0.000125 | 12184.30 | relu | 64 | linear | mse | Adam |
| 4 | 0.000152 | 209.53 | 0.000134 | 14922.94 | relu | 64 | linear | mse | Adam |

Figure 4.16 - Training and validation performance
4.3.3 Performance of the final architecture on the testing
set
The optimal artificial neural network model obtained in the previous subsection was
tested with a monthly drilling plan provided by the mine planners of the project
with the drilling tasks distributed between shifts. The drilling plan used for testing
the model includes information such as the geometry of the opening; the activity to
be performed, whether drilling for blasting (i.e., for advancing the opening) or
only for support; the operator performing the task; the equipment used for the
activity; the corresponding geology and rock type of the rock to be drilled; and the
shift, whether day or night. This information is illustrated in Table 4.5.
Table 4.5 - Drilling plan example for one day of operations

| Team | Geology | Shift | Equipment | Rock Type | Opening Name | Activity | Drillings | Length of Drillings | Operator | Meters | ROP (m/h) |
| C | 2 | DAY | A | 2 | AR25 | ADVANCE | 54 | 3.6 | OP_1 | 3.3 | 0.55 |
| C | 2 | DAY | B | 1 | SR09 | ADVANCE | 75 | 4.2 | OP_2 | 6.6 | 0.38 |
| B | 2 | DAY | C | 1 | SR10 | ADVANCE | 75 | 4.2 | OP_3 | 6.6 | 0.38 |
| B | 4 | DAY | A | 2 | AR12 | ADVANCE | 54 | 3.6 | OP_4 | 3.3 | 0.55 |
| A | 4 | NIGHT | A | 3 | AR05 | ADVANCE | 54 | 3.6 | OP_5 | 3.3 | 0.55 |
| A | 2 | NIGHT | B | 2 | AR25 | SUPPORT | 54 | 3.6 | OP_7 | 3.3 | 0.55 |
| A | 2 | NIGHT | C | 3 | AR23 | ADVANCE | 54 | 3.6 | OP_9 | 3.3 | 0.55 |
| A | 2 | NIGHT | C | 1 | AR13 | ADVANCE | 54 | 3.6 | OP_8 | 3.3 | 0.55 |
Figure 4.17 - Features and label for the ANN model
Figure 4.17 also clarifies which independent variables are used as input
features within the model and which dependent variable is used as the target for
the prediction task. It is important to mention that the feature denominated
‘opening name’ indicates the geometry of the opening, where for instance, SR and
MB stand for secondary ramps and muck bays with dimensions of 5m x 5m. The
rock type corresponds to the rock mass rating (RMR) from one to four, where one
stands for very good rock, while four corresponds to very poor rock. Team stands
for the pair performing the drilling task, referring to the operator using the
drilling machine and the assistant. As in the training set, each of the
examples is linked to the length of the drillings made during the drilling task as
well as the number of drillings, which is directly related to the dimension of the
openings. Moreover, the meters planned per shift give a reference for what the
planners consider as ROP values for each of the tasks. The ROP values in this initial
drilling plan are no more than constant values, since the meters planned for the
secondary and attack ramps are always constant as well. These values are used by
the mine planners as references and are applied to every single operator, machine,
type of rock, etc. Finally, the drilling plan provides the number of meters planned
for each specific opening for each of the shifts.
Consequently, with the ANN model already trained, part of the monthly drilling
plan was used as the testing set. The loss value found on the test set was 0.000169.
After predicting the ROP with the neural net, predicted meters were computed for each
of the openings scheduled in the monthly plan. Comparing both sets of ROP values
shows that the predicted ROP values diverge from the ones scheduled initially in
the project. It is important to note that the predictions are calculated based on
historical data collected in the underground mine, so the predicted ROP values are
adjusted to the reality of the daily operations. Unlike predictions obtained
with traditional regression methods, these ANN predictions are able to model the
nonlinear relationships present in the training data.
Moreover, once the predictions for all the available combinations are obtained, the
decision-maker can start assessing the combination of operational variables that
will maximize the targets within the drilling plan. Additionally, different
decisions can be taken based on the predictions by focusing on how to reorganize
the distribution of operational variables to achieve the additional meters
that the model is predicting. On the other hand, it is also important to focus on the
openings for which the model predicts higher ROP values than the conventional plan,
since these areas may have been assigned more operational resources than needed,
resources that could be used in the more critical parts of the mine. In addition,
these predictions can be strategically applied to a wide range of activities within
the preparation of drilling plans. In cases where the mine planners assume a constant
value, as is the case for the type of rock or the geology of the opening, the
proposed model can be useful for determining the real performance to be achieved
for the different types of rocks or geologies that can be found during the
development of the openings.
4.4 Incorporation of predictions into an integer
program
4.4.1 Integer program formulation
The model presented in this section is formulated as an integer linear programming
problem (ILP) that optimizes the planning of drilling activities while considering
historical data from an underground mining operation. The objectives are to
maximize the drilled meters per day and generate a more realistic drilling plan that
can be accomplished with greater confidence by taking into consideration historical
data.
In the following subsections, indices, parameters, and decision variables are defined.
Subsequently, the ILP formulation, including the objective function and operational
constraints, is introduced, followed by the solution approach.
4.4.1.1 Indices
s is a shift, s = 1, …, S
o is an operator, o = 1, …, O
m is a drilling machine, m = 1, …, M
t is an opening, t = 1, …, T
a is an activity, a = 1, …, A

4.4.1.2 Model parameters
D_s is the demand, in terms of the number of openings t to be drilled, during shift s
L_t is the length of the opening t to be drilled
d_{t,s,a}^{o,m} is the duration of performing activity a at opening t using drilling machine m during shift s by operator o
U is the duration of a shift s
r_t^{soft} is a binary parameter that is equal to 1 if opening t is of soft rock, and 0 otherwise
P is the portion of soft-rock openings to be drilled

4.4.1.3 Decision variables
x_{t,s,a}^{o,m} = 1 if operator o uses machine m to drill opening t during shift s for activity a, and 0 otherwise
y_s^o = 1 if operator o works during shift s, and 0 otherwise
4.4.1.4 Objective function – Maximize:

Σ_{o=1}^{O} Σ_{t=1}^{T} Σ_{m=1}^{M} Σ_{s=1}^{S} Σ_{a=1}^{A} L_t · x_{t,s,a}^{o,m}    (4.8)
The objective function maximizes the drilled meters during a whole day of
drilling operations. Similar to what is presented in Campeau & Gamache (2019),
the objective function developed in this thesis does not consider any maximization
in terms of money, since at this short and specific level of planning (one month), the
main economic decisions have already been taken. Moreover, the available resources,
such as drilling machines and operators, are already determined, the mining
method has been chosen, and the layout of the underground project should not be exposed
to any alteration. Therefore, changes at a scale that could influence the revenue of the
underground mining project are few and not representative. In addition, the mining
method used by the project implies multiple blastings along the
production openings without requiring the material to be mucked out immediately,
and only one blasting is scheduled for the development openings (see the following
section). Therefore, accomplishing the targets proposed by the mine planners, in
terms of openings to be drilled during the available time, is the goal that this thesis
aims to achieve.
4.4.1.5 Model constraints
(M · T · A) y_s^o ≥ Σ_{m=1}^{M} Σ_{t=1}^{T} Σ_{a=1}^{A} x_{t,s,a}^{o,m}    ∀ o, s    (4.9)

y_s^o + y_{s+1}^o ≤ 1    ∀ o ∈ O and ∀ s = 1, …, (S − 1)    (4.10)

Σ_{o=1}^{O} Σ_{m=1}^{M} Σ_{s=1}^{S} x_{t,s,a}^{o,m} ≤ 1    ∀ t, a    (4.11)

Σ_{o=1}^{O} Σ_{m=1}^{M} Σ_{a=1}^{A} Σ_{t=1}^{T} x_{t,s,a}^{o,m} ≥ D_s    ∀ s    (4.12)

Σ_{t=1}^{T} Σ_{a=1}^{A} Σ_{m=1}^{M} d_{t,s,a}^{o,m} · x_{t,s,a}^{o,m} ≤ U    ∀ s, o    (4.13)

Σ_{o=1}^{O} Σ_{t=1}^{T} Σ_{a=1}^{A} d_{t,s,a}^{o,m} · x_{t,s,a}^{o,m} ≤ U    ∀ s, m    (4.14)

Σ_{o=1}^{O} Σ_{s=1}^{S} Σ_{a=1}^{A} Σ_{t=1}^{T} r_t^{soft} · x_{t,s,a}^{o,m} ≤ P · Σ_{o=1}^{O} Σ_{s=1}^{S} Σ_{a=1}^{A} Σ_{t=1}^{T} x_{t,s,a}^{o,m}    ∀ m    (4.15)

x_{t,s,a}^{o,m} ∈ {0, 1}    ∀ o, m, t, s, a    (4.16)

y_s^o ∈ {0, 1}    ∀ o, s    (4.17)
Constraints 4.9 and 4.10 ensure that an operator cannot work two consecutive shifts: constraint 4.9 links the drilling variables to each operator's working status, and constraint 4.10 forbids an operator from working consecutive shifts. Constraint 4.11 enforces that each activity on each opening can be performed at most once. Constraint 4.12 sets the demand for openings that must be drilled; this lower bound ensures that the openings planned to be advanced meet the needs of the project shift by shift, despite any deviation that might occur during the operation. Constraints 4.13 and 4.14 ensure that operators and drilling machines, respectively, cannot work longer than the duration of the shift. Limiting the working periods of both operators and machines allows the planned development operations to be accomplished while respecting the shift deadlines for blasting and bolting, among other activities that must be completed on time so as not to compromise future work. Constraint 4.15 guarantees a distribution of operations between soft and hard rocks; it avoids situations in which the machines and operators with the best performances are always scheduled to drill openings in soft rock, which would certainly yield more drilled meters but is not what the operation requires. Constraints 4.16 and 4.17 are the variable bounds. Note that the soft and hard rock classification used in this formulation is based on the RMR values forecasted by the mine planners for the openings to be drilled: rocks with an RMR below 35 were classified as soft, and those above 35 as hard. All the drilling machines are assumed to have the same specifications; their characteristics are outlined in Section 4.2. In addition, neither degradation of the equipment nor wear of the bit is considered. The parameter defining the duration of each activity is obtained using the ANN.
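The logic of this formulation can be sanity-checked by brute-force enumeration on a tiny instance before handing the full-scale problem to a commercial solver. The sketch below is illustrative only: every dimension and parameter (meters per opening, durations, shift length, demands) is invented rather than taken from the case study, constraint 4.15 (the soft/hard split) is omitted for brevity, and y is derived directly as the operator-working indicator, which is exactly what the big-M constraint 4.9 enforces at optimality.

```python
from itertools import product

# Tiny hypothetical instance (all numbers invented, not from the case study).
O, M, T, S, A = 2, 1, 2, 2, 1          # operators, machines, openings, shifts, activities
L = [4.0, 6.0]                          # planned meters per opening t
d = 2.0                                 # drilling duration (h) for any (o, m, t, s, a)
U = 8.0                                 # shift length in hours
D = [1, 0]                              # minimum openings to drill per shift

keys = list(product(range(O), range(M), range(T), range(S), range(A)))

def feasible(x):
    # y_s^o = 1 iff operator o drills anything in shift s (what constraint 4.9 enforces)
    y = {(o, s): int(any(x[o, m, t, s, a] for m in range(M)
                         for t in range(T) for a in range(A)))
         for o in range(O) for s in range(S)}
    # 4.10: no operator works two consecutive shifts
    if any(y[o, s] + y[o, s + 1] > 1 for o in range(O) for s in range(S - 1)):
        return False
    # 4.11: each (opening, activity) pair drilled at most once
    for t in range(T):
        for a in range(A):
            if sum(x[o, m, t, s, a] for o in range(O)
                   for m in range(M) for s in range(S)) > 1:
                return False
    # 4.12: per-shift drilling demand
    for s in range(S):
        if sum(x[o, m, t, s, a] for o in range(O) for m in range(M)
               for t in range(T) for a in range(A)) < D[s]:
            return False
    # 4.13 / 4.14: operator and machine working time within the shift
    for s in range(S):
        for o in range(O):
            if sum(d * x[o, m, t, s, a] for m in range(M)
                   for t in range(T) for a in range(A)) > U:
                return False
        for m in range(M):
            if sum(d * x[o, m, t, s, a] for o in range(O)
                   for t in range(T) for a in range(A)) > U:
                return False
    return True

best, best_x = -1.0, None
for bits in product([0, 1], repeat=len(keys)):  # 2^8 candidate schedules
    x = dict(zip(keys, bits))
    if feasible(x):
        obj = sum(L[t] * x[o, m, t, s, a] for (o, m, t, s, a) in keys)
        if obj > best:
            best, best_x = obj, x

print(best)  # maximum drilled meters on the toy instance
```

On this toy instance the optimum drills both openings in the first shift, for 10 drilled meters. The real problem, with thousands of variables, requires branch-and-cut as provided by CPLEX.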
4.4.2 Evaluating the impact of the ANN predictions in the
ILP results
The integer programming model described in Section 4.4.1 is solved using the branch-and-cut algorithm implemented in IBM's CPLEX package (Studio version 12.7.1.0) in a Google Colab (Python) environment. The final solution is obtained in nearly 2.5 hours.
Table 4.6 – Computational performance of the proposed model: number of decision variables, number of constraints, and solution time

Variables    Constraints    Time required to solve the problem
11566        53245          2.5 hours
Table 4.7 – Cumulative results for the case study

Scenario                          Objective (m)
Original Plan (S1)                481.95
New plan + No predictions (S2)    486.49
New plan + Predictions (S3)       528.90
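From the cumulative objective values in Table 4.7, the relative gain of the prediction-informed schedule over the two baselines follows directly:

```python
# Cumulative objective values (drilled meters) from Table 4.7.
obj = {"S1": 481.95, "S2": 486.49, "S3": 528.90}

# Relative improvement of the prediction-informed plan (S3) over each baseline.
gain_vs_s1 = (obj["S3"] - obj["S1"]) / obj["S1"] * 100
gain_vs_s2 = (obj["S3"] - obj["S2"]) / obj["S2"] * 100

print(f"S3 vs S1: +{gain_vs_s1:.1f}% drilled meters")
print(f"S3 vs S2: +{gain_vs_s2:.1f}% drilled meters")
```

Incorporating the ROP predictions thus yields roughly a 9.7% gain over the original plan and 8.7% over the optimized plan without predictions.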
Figure 4.18 – Results of the case study
The three scenarios used to simulate and compare the effectiveness of the different drilling plans, together with their optimization results, are shown in Figure 4.18. The first scenario corresponds to the initial drilling plan developed by the mine planners, in which the meters planned for each opening and the time each task will take are treated as constant. The second scenario results from optimizing over all the possible combinations of the available variables, but without accounting for the predicted ROP values. The third scenario is given by the same optimization model, but in this case accounting for the ROP values predicted from the historical operational data.
Scenario 2 provides better objective values than Scenario 1 on some days, but the distribution of the variables is still not optimal. Moreover, the solution value for the third scenario is significantly higher than those of the other two scenarios. This can be explained by the fact that the third scenario incorporates ROP predictions based on historical data, meaning that the performances used to build the drilling plans, in terms of meters per shift, are closer to the reality of the operation. It was shown earlier that the
predictions obtained with the ANN model performed outstandingly on the testing set. Although more detailed independent variables, such as the state or availability of the equipment or the geomechanical characteristics of the rocks, carry considerably more information than the equipment and geology features used here, future studies could integrate these parameters in order to achieve even higher performance in the prediction tasks.
Chapter 5
_________________________________________________
Conclusions and Future Work
5.1 Conclusions and objectives met
This thesis has shown that, when the proposed model is tested on a scenario that incorporates ROP predictions based on historical data, the meters drilled per day in an underground mining project can be increased. These results suggest that a more accurate drilling plan can be generated while mitigating the deviations that plans created by mine planners usually present when sufficient operational information is unavailable or unaccounted for. Moreover, the model demonstrates that operational information is valuable for building plans that distribute resources optimally and thereby outperform conventional plans. Below, the outcomes of this thesis are stated in terms of the pertinent literature review performed and the initial objectives set out in Section 1.1.
1. Review the literature pertaining to the applications of artificial intelligence
algorithms in the mining industry.
A literature review was presented on the topics pertinent to the development of methods for predicting rates of penetration, specifically AI-based methods and their benefits for this practice. One of the main observations in Sections 2.1 and 2.2 was the inherent limitation of traditional models: these techniques are unable to capture the complex, nonlinear relationships between crucial features, and therefore often cannot approach the prediction task efficiently. AI-based models were introduced to overcome these drawbacks and, over time, have been shown to improve prediction accuracy. The review of mathematical programming models for optimizing short-term activities in Section 2.2 reveals a lack of applications of these approaches in underground mines. Very few works have been developed, and those that exist fail to address the challenges of scheduling the different activities of all the mining methods used in underground operations. Therefore, a crucial emphasis of future work on optimizing short-term activities in underground projects should be the development of models that account for the different underground mining methods, their corresponding mining cycles, and their variants in terms of operations and equipment.
2. Predict the rates of penetration (ROP) corresponding to underground
drilling machines using an artificial neural network (ANN) approach.
Different ANN architectures were implemented and tested to predict rates of penetration from a training set of operational features from an underground mining project. An optimal architecture was found with 3 hidden layers, 1500 hidden units, Adam as the optimizer, ReLU as the activation function, batch normalization in all layers, and a batch size of 64. A validation error of 0.000125 was obtained with the resulting architecture, and the model was then used on a monthly drilling plan provided by the underground mining project to predict the rates of penetration corresponding to the available input features.
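A rough count of the trainable parameters gives a sense of this network's capacity. Both the input dimension (taken here as 10 features) and the reading that the 1500 hidden units apply to each of the 3 hidden layers are hypothetical assumptions, since the exact feature count is not restated in this chapter:

```python
# Hypothetical dimensions: 10 input features (assumed), 1500 units assumed
# per hidden layer, and a single ROP output.
n_in, n_hidden, n_layers, n_out = 10, 1500, 3, 1

params = 0
prev = n_in
for _ in range(n_layers):
    params += prev * n_hidden + n_hidden   # dense weights + biases
    params += 2 * n_hidden                 # batch-norm trainable scale and shift
    prev = n_hidden
params += prev * n_out + n_out             # linear output layer

print(params)  # 4530001 under these assumptions
```

Most of the roughly 4.5 million parameters sit in the two 1500 × 1500 hidden-to-hidden layers, which helps explain why batch normalization and Adam are useful for training a network of this width.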
3. Develop an integer programming (IP) formulation for the optimization of
the drilling activities.
A formulation was proposed to optimize the meters drilled per day in an underground mining project while accounting for operational constraints and incorporating the predicted rates of penetration. The proposed approach incorporates operational constraints such as preventing the best machines from always working in soft openings and varying the kinds of activities that a machine performs during a day, in order to realistically address problems encountered in the operation. The model was parametrized to represent a mechanized underground gold project, and the resulting problem was solved in nearly 2.5 hours using a general-purpose solver (CPLEX).
4. Demonstrate that, by running the IP optimization model for scenarios with and without the incorporation of the ROP predictions, it is possible to evaluate and determine whether these predictions should be implemented and included in the computation of future drilling plans.
Both the artificial neural network predictions for the ROP and the integer programming problem developed in Section 4.4 are applied to a realistic mining case study in order to demonstrate how the impact of an ANN predictor of ROP on operational decision-making can be evaluated. The optimizer is able to efficiently generate schedules of drilling activities that are closer to the reality of the operation and can take advantage of individual performances within the different types of openings in terms of rock types, geologies, and dimensions. The resulting formulation is compared to a more traditional approach that uses averaged inputs for operator/machine performances. The incorporation of ANN predictions of ROP tends to increase the effectiveness of assigning activities to the different resources and thus to maximize the meters drilled per day within the operation.
5.2 Future work
The formulation presented in this thesis is successful in maximizing the meters drilled per day for an underground mining operation while accounting for operational constraints and incorporating predicted ROP values. However, the approach presented is not without limitations.
Both the ANN approach and the optimization model, developed in Chapters 3 and 4, present three crucial avenues for future work. First, a mine can be seen as a dynamic system that undergoes variations and changes after every shift; rotation between operators, retirements, resignations, and constant hiring make this system even more challenging over time. Likewise, the proposed optimization formulation needs to be re-optimized as new data become available. Both the predictions and the optimization model therefore require continuous updates, and different reinforcement learning approaches can help to continuously account for new data while delivering the objectives that a decision-maker might require. Second, information regarding equipment availability would also be worth including in the proposed formulation and would help to build even more accurate drilling plans. Third, the scope of the case study presented in Chapter 4 addresses the complexity of only one part of the entire underground operation; moreover, the MSDF allows only drilling activities to be modeled. A larger case study accounting for more activities would therefore be worthwhile to investigate in order to measure the predictive power of the tool as well as the time required to obtain an optimal coordination of drilling with other critical mining operations.
References
Amar, K., & Ibrahim, A. (2012). Rate of penetration prediction and optimization
using advances in artificial neural networks, a comparative study. In:
Proceedings of the 4th international joint conference on computational
intelligence. p.647–52.
Al-AbdulJabbar, A., Elkatatny, S., Mahmoud, M., Abdelgawad, K., & Al-Majed, A.
(2018). A robust rate of penetration model for carbonate formation. ASME. J.
Energy Resour. Technol.
Al-Jalil, Y. A. (1998). Analysis of Performance of Tunnel Boring Machine-based
Systems. The University of Texas at Austin, 427 p.
Arabjamaloei, R., & Shadizadeh, S. (2011). Modeling and optimizing rate of
penetration using intelligent systems in an Iranian southern oil field (Ahwaz
oil field). Pet. Sci. Technol. 29:1637–48.
Armaghani, D., Mohamad, E., & Narayanasamy, M. (2017). Development of hybrid intelligent models for predicting TBM penetration rate in hard rock condition. Tunn Undergr Sp Technol 63: 29–43.
Awuah-Offei, K. (2016). Energy efficiency in mining: a review with emphasis on the
role of operators in loading and hauling operations. Journal of Cleaner
Production, 1-9.
Barton, N. (1999). TBM performance estimation in rock using QTBM. Tunnels and
Tunneling International 31 (9), 30–33.
Barton, N. (2000). TBM Tunneling In Jointed and Faulted Rock. Balkema,
Rotterdam, 173.
Bataee, M., & Mohseni, S. (2011). Application of artificial intelligent systems in ROP
optimization: a case study in Shadegan oil field. Paper presented in the SPE
Middle East Unconventional Gas Conference and Exhibition, Muscat, Oman.
Society of Petroleum Engineers. 31 January-2 February. SPE-140029-MS.
Benardos, A., & Kaliampakos, D. (2004). Modelling TBM performance with artificial
neural networks. Tunn Undergr Sp Technol 19:597–605.
Bilgesu, H., Tetrick, L., Altmis, U., Mohaghegh, S., & Ameri, S. (1997). A new
approach for the prediction of rate of penetration (ROP) values. Paper
presented at the SPE Eastern Regional Meeting, Lexington, Kentucky, 22–
24 October. SPE-39231-MS. https://doi.org/10.2118/39231-MS.
Bingham, M. (1965). A new approach to interpreting rock drillability. Petroleum
Publishing Company.
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. volume 4 of
Information science and statistics. Springer.
Bishop, C. M., & Nabney, I. T. (2008). Pattern Recognition and Machine Learning:
A Matlab Companion. Springer. In preparation.
Blindheim, O. (1979). Boreability Predictions for Tunneling. Ph.D. Thesis,
Department of Geological Engineering, The Norwegian Institute of
Technology, p.406.
Bourgoyne, A., & Young, F. (1974). A multiple regression approach to optimal
drilling and abnormal pressure detection. Soc. Pet. Eng. J ;14:371–84. doi:
https://doi.org/10.2118/4238-PA.
Bruines, P. (1988). Neuro-fuzzy modelling of TBM performance with emphasis on
the penetration rate. Memoirs of the Centre of Engineering Geology, Delft,
no 173.
Bruland, A. (1998). Hard Rock Tunnel Boring. Ph.D. Thesis, vol. 1–10, Norwegian
University of Science and Technology (NTNU), Trondheim, Norway.
Burges, C. J. (1998). A tutorial on support vector machines for pattern recognition.
Knowledge Discovery and Data Mining.
Campeau, L.-P., & Gamache, M. (2019). Short-term planning optimization model for
underground mines. Computers and Operations Research.
Cassinelli, F., Cina, S., Innaurato, N., Mancini, R., & Sampaolo, A. (1982). Power
consumption and metal wear in tunnel-boring machines: analysis of tunnel-
boring operation in hard rock. Tunnelling ‘82, London. Inst. Min. Metall., 73–
81.
Cristianini, N., & Shawe-Taylor, J. (2000). Support vector machines and other
kernel-based learning methods. Cambridge University Press.
Dozat, T. (2016). Incorporating Nesterov Momentum into Adam.
Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. JMLR.
Eckel, J. R. (1967). Microbit Studies of the Effect of Fluid Properties and Hydraulics.
Journal of Petroleum Technology, pp. 514-546.
Elkatatny, S., Tariq, Z., Mahmoud, M., & Al-AbdulJabbar, A. (2017). Optimization
of rate of penetration using artificial intelligent techniques. Paper presented
at the Rock Mechanics/Geomechanics Symposium, San Francisco, California,
USA, 25-28 June. ARMA-2017-0429.
Epiroc (2020). Epiroc products. Retrieved from https://www.epiroc.com/en-
gr/products/drill-rigs/face-drill-rigs/boomer-282.
Farmer, I., & Glossop, N. (1980). Mechanics of disc cutter penetration. Tunnels
Tunneling;12(6):22–5.
Farrokh, J. R. (2012). Study of various models for estimation of penetration rate of
hard rock TBMs. Tunnelling and Underground Space Technology 30 (2012)
110–123.
Ghasemi, E., Yagiz, S., & Ataei, M. (2014). Predicting penetration rate of hard rock
tunnel boring machine using fuzzy logic. Bull Eng Geol Environ 73:23–35.
Gholamnejad, J., & Tayarani, N. (2010). Application of artificial neural networks to
the prediction of tunnel boring machine penetration rate. Mining Science and
Technology. 0727–0733.
Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of AISTATS.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
Retrieved from http://www.deeplearningbook.org
Graham, P. (1976). Rock exploration for machine manufacturers, in exploration for
rock engineering. In: Bieniawski, Z.T. (Ed.), Proceedings of the Symposium,
vol. 1, Johannesburg, Balkema, pp. 173–180.
Grandori, R., Sem, M., Lembo-Fazio, A., & Ribacchi, R. (1995). Tunnelling by double
shield TBM in the Hong Kong granite. In: Proceedings of the 8th ISRM
Congress, vol. 1, pp. 569–574.
Grima, M., Bruines, P., & Verhoef, P. (2000). Modeling tunnel boring machine
performance by neuro-fuzzy methods. Tunn Undergr Sp Technol 15:259–269.
Hegde, C., Daigle, H., & Gray, K. (2018). Performance comparison of algorithms for
realtime rate-of-penetration optimization in drilling using data-driven
models. SPE J. 23:1706–22.
Herbrich, R. (2002). Learning Kernel Classifiers. MIT Press.
Hughes, H. (1986). The relative cuttability of coal measures rock. Mining Sci.
Technol. 3, 95–109.
Hillier, F., & Lieberman, G. (2016). Introduction to Operations Research. San Francisco: Holden-Day: Chapters 3 and 12.
Innaurato, N., Mancini, R., Rondena, E., & Zaninetti, A. (1991). Forecasting and effective TBM performance in a rapid excavation of a tunnel in Italy. In: Wittke, W. (Ed.), Proceedings of the 7th International Congress on Rock Mechanics. p. 1009–14.
Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift.
Kingma, D., & Ba, J. (2014). Adam: A method for stochastic optimization.
Koopialipoor, M., Fahimifar, A., Ghaleini, E., Momenzadeh, M., & Armaghani, D.
(2020). Development of a new hybrid ANN for solving a geotechnical problem
related to tunnel boring machine performance. Engineering with Computers.
Laughton, C. (1998). Evaluation and Prediction of Tunnel Boring Machine
Performance in Variable Rock Masses. Ph.D. Thesis, The University of Texas,
Austin, USA.
LeCun, Y., Boser, B., Denker, J. S., et al. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551.
Lumley, G. (2005). Reducing the variability in dragline operator performance. In: Aziz, N. (Ed.), Coal Operators' Conference (pp. 97–106). Wollongong.
Mahdevari, S., Shahriar, K., Yagiz, S., & Shirazi, M. (2014). A support vector
regression model for predicting tunnel boring machine penetration rates. Int
J Rock Mech Min Sci 72:214–229.
Maurer, W. (1962). The ‘‘perfect-cleaning” theory of rotary drilling (Vol. 14).
doi:https://doi.org/10.2118/408-PA.
Müller, K. R., Mika, S., Rätsch, G., Tsuda, K., & Schölkopf, B. (2001). An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks.
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Reprinted in Anderson and Rosenfeld (1998): Bulletin of Mathematical Biophysics.
McFeat-Smith, I. (1999). Mechanised tunnelling for Asia. Work shop manual,
organized by IMS Tunnel Consultancy Ltd.
McFeat-Smith, I., & Tarkoy, P. (1979). Assessment of tunnel boring performance.
Tunnels and Tunneling, 33-37.
Murphy, K. P. (2013). Machine learning : a probabilistic perspective. MIT Press,
Cambridge, Mass.
Nair, V., & Hinton, G. E. (2009). 3D object recognition with deep belief nets. In: Advances in Neural Information Processing Systems.
Nehring, M., et al. (2012). Integrated short- and medium-term underground mine production scheduling. J. South. Afr. Inst. Min. Metall. 112(5), 365–378.
Nelson, P., Abad-Aljali, Y., & Laughton, C. (1999). Improved strategies for TBM
performance prediction and project management. In: RETC, pp. 963–979.
Nelson, P., Ingraffea, A., & O'Rourke, T. (1985). TBM performance prediction using
rock fracture parameters. Int. J. Rock Mech. Min. Sci. Geomech. Abstr. 22
(No. 3), 189–192.
Newman, A., Rubio, E., Caro, R., Weintraub, A., & Eurek, K. (2010). A review of
operations research in mine planning. Interfaces 40 (3), 222–245.
O'Sullivan, D., & Newman, A. (2015). Optimization-based heuristics for
underground mine scheduling. Eur. J. Oper. Res. 241 (1), 248–259.
https://doi.org/10.1016/j. ejor.2014.08.020.
Okubo, S., Fukui, K., & Chen, W. (2003). Expert system for applicability of tunnel boring machines in Japan. Rock Mechanics and Rock Engineering, 36(4), 305–322.
Ozdemir, L. (1978). Development of theoretical equations for predicting tunnel
borability. PhD thesis, Colorado School of Mines, Golden, Colorado.
Rosenblatt, F. (1962). Principles of Neurodynamics: Perceptrons and the Theory of
Brain Mechanismn. Spartan.
Rostami, J. (1997). Development of a Force Estimation Model for Rock
Fragmentation with Disc Cutters Through Theoretical Modeling and
Physical Measurement of Crushed Zone Pressure. Ph. D. Thesis, Colorado
School of Mines, Golden, Colorado, USA, P. 249.
Rostami, J., & Ozdemir, L. (1993). A new model for performance prediction of hard
rock TBM. In: Bowerman, L.D., et al. (Eds.), Proceedings of RETC, Boston,
MA, pp. 793-809.
Rumelhart, D. E., McClelland, J. L., & the PDP Research Group (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations. MIT Press.
Kahraman, M. I. (2006). Performance prediction of a jumbo drill in Pozanti–Ankara
Motorway Tunnel (Turkey). Tunnelling and Underground Space Technology,
265.
Salaheldin, E. (2018). New approach to optimize the rate of penetration using
artificial neural network. Arabian J Sci Eng. 43(11):6297–304.
Sapigni, M., Berti, M., Bethaz, E., Busillo, A., & Cardone, G. (2002). TBM
performance estimation using rock mass classifications. Int J Rock Mech Min
Sci; 39: 771–88.
Schölkopf, B., Smola, A., Williamson, R. C., & Bartlett, P. L. (2000). New support vector algorithms. Neural Computation.
Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding machine learning: from
theory to algorithms. Cambridge University Press.
Soares, C., et al. (2016). Evaluation of PDC bit ROP models and the effect of rock strength on model coefficients. J. Nat. Gas Sci. Eng; 34:1225–36.
Sundaram, N., Rafek, A., & Komoo, I. (1998). The influence of rock mass properties
in the assessment of TBM performance. Rotterdam: Balkema: In:
Proceedings of the 8th international IAEG congress. p.3353–9.
Song, Z., Rinne, M., & Wageningen, A. v. (2013). A review of real-time optimization
in underground mining production. J. South. Afr. Inst. Min. Metall. 113 (12),
889-897.
Tieleman, T., & Hinton, G. (2012). RMSprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4(2).
Vapnik, V. N. (1995). The nature of statistical learning theory. Springer.
Widrow, B., & Hoff, M. E. (1960). Adaptive switching circuits. In IRE WESCON
Convention. Record, Volume 4, pp. 96–104.
Yagiz, S. (2008). Utilizing rock mass properties for predicting TBM performance in
hard rock condition. Tunn. Undergr. Space Technol. pp. 326-339.
Yagiz, S., & Karahan, H. (2011). Prediction of hard rock TBM penetration rate using particle swarm optimization. Int J Rock Mech Min Sci 48:427–433.
Yagiz, S., Gokceoglu, C., Sezer, E., & Iplikci, S. (2009). Application of two nonlinear
prediction tools to the estimation of tunnel boring machine performance. Eng
Appl Artif Intell 22:808–814.
Zeiler, M. D. (2012). Adadelta: an adaptive learning rate method.
Zhao, Z., Gong, Q., Zhang, Y., & Zhao, J. (2007). Prediction model of tunnel boring
machine performance by ensemble neural Networks. Geomech. Geoeng. – Int.
J. pp. 123-128.
Zhou, J., Bejarbaneh, B., Armaghani, D., & Tahir, M. (2019). Forecasting of TBM
advance rate in hard rock condition based on artificial neural network and genetic
programming techniques. Bulletin of Engineering Geology and the Environment.