Artificial Neural Network
identification and control
of the inverted pendulum
Tim Callinan
August 2003
Acknowledgements
I would like to thank my supervisor Jennifer Bruton for her help, guidance and support
throughout the project.
Thank you to Conor Maguire for helping me with the inverted pendulum rig and lending me
many of his manuals and books.
Thank you to Anthony Holohan for allowing me to experiment on the inverted pendulum rig.
Declaration
I hereby declare that, except where otherwise indicated, this document is entirely my own
work and has not been submitted in whole or in part to any other university.
Signed:……………………………………………………. Date:………………………
Abstract
This project takes the area of Artificial Neural Networks (ANN) and applies it to the inverted
pendulum control problem. The inverted pendulum is typically used to benchmark new control
techniques, as it is a highly non-linear, unstable system. Neural networks have unique
characteristics, which enable them to control non-linear systems. Feedforward and Recurrent
neural networks are used to model the inverted pendulum. Multi-output online identification
was also researched. A neuro-controller for the inverted pendulum was developed. Traditional
control methods were utilized to develop a control law to stabilize the inverted pendulum. A
feedforward network was trained to mimic the control law. The neuro-control results show that if a
disturbance occurs in the system, the neural network learns to counteract this disturbance.
Finally the knowledge learned in identification and control was applied to the real time
inverted pendulum rig. An online adaptive neural network was developed to model the real
time system.
Table of Contents
1 Introduction
   Outline of the document
2 Inverted Pendulum
3 Artificial Neural Networks
   Advantages of ANNs
   Types of Learning
   Neural network structures
   Multi-layered perceptrons
4 System Identification
   System identification procedure
   Linear identification of the system
   Non-linear identification of the system
   Non-linear identification using neural networks
   Multi-output identification
5 Neural control of the inverted pendulum
   Neural control in simulink
6 Real-time identification and control
7 Conclusions
   Summary
   Scope for future work
8 Bibliography
1 Introduction
The process used in this project is the inverted pendulum system. The inverted pendulum is a
highly nonlinear and open-loop unstable system. This means that standard linear techniques
cannot model the nonlinear dynamics of the system. When the system is simulated the
pendulum falls over quickly. The characteristics of the inverted pendulum make identification
and control more challenging. There are two main aims of the project. The first is to develop
an accurate model of the inverted pendulum system using neural networks. The second aim is
to develop a neural network controller which determines the correct control action to stabilize
the system, but can also learn from experience.
System identification is the procedure that develops models of a dynamic system based on the
input and output signals from the system. The input and output data must show some of the
dynamics of the process. The parameters of the model are adjusted until the output from the
model is similar to the output of the real system. In order to develop an accurate model of the
inverted pendulum, different methods (linear and nonlinear) of identification will be tested.
One of the problems encountered early in the project was collecting experimental data from the
inverted pendulum system: the output data from the unstable system does not reveal enough of
the system dynamics. Feedback controllers are therefore developed to stabilize the system
before identification can take place.
Neural networks have shown great progress in identification of nonlinear systems. There are
certain characteristics in ANN which assist them in identifying complex nonlinear systems.
ANN are made up of many nonlinear elements and this gives them an advantage over linear
techniques in modelling nonlinear systems. ANN are trained by adaptive learning: the network
'learns' how to perform tasks and functions based on the data given for training. The
knowledge learned during training is stored in the synaptic weights. The standard ANN
structures (feedforward and recurrent) are both used to model the inverted pendulum.
The main task of this project is to design a neural network controller which keeps the
pendulum system stabilized. There are three main types of neural control - supervised, direct
inverse and unsupervised.
Supervised learning uses an existing controller or human feedback in training the neural
network. In order to train the neural network to imitate an existing controller a vector of inputs
and control targets from the controller must be collected. With supervised control, a neural
network could be trained to imitate a robust controller. The robust controller operates
correctly provided the process stays around a certain operating point. The neuro-controller operates
similarly to the robust controller but can also adapt if any disturbance occurs in the system.
Direct inverse control does not require an existing controller in training. A neural network is
trained to model the inverse of the process. The neural network is cascaded with the process.
Theoretically if the inverse model is very accurate, the nonlinearities in the ANN will cancel
out the nonlinearities in the process.
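The inverse-model idea can be sketched numerically. The following fragment is an illustration, not code from the project: the cubic process g and its bisection "inverse model" are invented for the example, standing in for a trained neural network inverse.

```python
def g(u):
    # Hypothetical process: a static, monotonically increasing nonlinearity
    return u ** 3 + u

def g_inv(r, lo=-10.0, hi=10.0):
    # "Inverse model" of g, here obtained by bisection rather than by training
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if g(mid) < r:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

reference = 2.5
y = g(g_inv(reference))  # inverse model cascaded with the process
```

If the inverse model is accurate, the cascade reproduces the reference almost exactly, which is the behaviour that direct inverse control relies on.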
Outline of the document
Chapter 2 details the research on the inverted pendulum system. The dynamic system
equations (linear and nonlinear) are derived. The simulink models of the linear and nonlinear
systems are developed. The development of the feedback controllers to stabilize the system is
also discussed. Chapter 3 covers the theory, structure and operation of artificial neural
networks. Chapter 4 covers the whole area of system identification. The procedure of system
identification is discussed first. Linear identification techniques are applied to the linear
system. Nonlinear identification using neural networks is then reported. Chapter 5 details the
development of the neuro-controller. Chapter 6 discusses the real time identification and
control using the inverted pendulum rig. Finally, Chapter 7 provides a summary of the work, a
discussion of the results and the scope for future work.
2 Inverted Pendulum
The inverted pendulum system is a classic control problem that is used in universities around
the world. It is a suitable process to test prototype controllers due to its high non-linearities
and lack of stability. The system consists of an inverted pole hinged on a cart which is free to
move in the x direction. In this chapter, the dynamical equations of the system will be derived,
the model will be developed in simulink and basic controllers will be developed. The aim of
developing an inverted pendulum in simulink is that the developed model will have the same
characteristics as the actual process. It will be possible to test each of the prototype controllers
in the simulink environment. Before the inverted pendulum model can be developed in
simulink, the system dynamical equations will be derived using ‘Lagrange Equations’. [1] The
Lagrangian equations are one of many methods of determining the system equations. Using
this method it is possible to derive dynamical system equations for a complicated mechanical
system such as the inverted pendulum. Figure 1 is a free-body diagram of the pendulum
system.
The Lagrange equations use the kinetic and potential energy in the system to determine the
dynamical equations of the cart-pole system.
M – Mass of the cart
m – mass of the pole
l – length of the pole
f – control force
Fig. 1: Free body diagram of the inverted pendulum system
The kinetic energy of the system is the sum of the kinetic energies of each mass. The kinetic
energy, T_1, of the cart is

T_1 = \frac{1}{2} M \dot{y}_1^2    (Eq. 1)

The pole can move in both the horizontal and vertical directions, so the pole kinetic energy is

T_2 = \frac{1}{2} m (\dot{y}_2^2 + \dot{z}_2^2)    (Eq. 2)

From the free body diagram, y_2 and z_2 and their derivatives are equal to

y_2 = y + l \sin\theta    (Eq. 3)

\dot{y}_2 = \dot{y} + l \dot\theta \cos\theta    (Eq. 4)

z_2 = l \cos\theta    (Eq. 5)

\dot{z}_2 = -l \dot\theta \sin\theta    (Eq. 6)

The total kinetic energy, T, of the system is equal to

T = T_1 + T_2 = \frac{1}{2} M \dot{y}_1^2 + \frac{1}{2} m (\dot{y}_2^2 + \dot{z}_2^2)    (Eq. 7)

Equations 4 and 6 are substituted into equation 7 (with \dot{y}_1 = \dot{y}, the cart velocity) to give equation 8.

T = \frac{1}{2}(M + m)\dot{y}^2 + m l \dot{y} \dot\theta \cos\theta + \frac{1}{2} m l^2 \dot\theta^2    (Eq. 8)

The potential energy, V, of the system is stored in the pendulum, so

V = m g z_2 = m g l \cos\theta    (Eq. 9)

The Lagrangian function is

L = T - V = \frac{1}{2}(M + m)\dot{y}^2 + m l \dot{y} \dot\theta \cos\theta + \frac{1}{2} m l^2 \dot\theta^2 - m g l \cos\theta    (Eq. 10)
The state-space variables of the system are y and θ, so the Lagrange equations are

\frac{d}{dt}\frac{\partial L}{\partial \dot{y}} - \frac{\partial L}{\partial y} = f    (Eq. 11)

\frac{d}{dt}\frac{\partial L}{\partial \dot\theta} - \frac{\partial L}{\partial \theta} = 0    (Eq. 12)

But,

\frac{\partial L}{\partial \dot{y}} = (M + m)\dot{y} + m l \dot\theta \cos\theta    (Eq. 13)

\frac{\partial L}{\partial y} = 0    (Eq. 14)

\frac{\partial L}{\partial \dot\theta} = m l \dot{y} \cos\theta + m l^2 \dot\theta    (Eq. 15)

\frac{\partial L}{\partial \theta} = m g l \sin\theta - m l \dot{y} \dot\theta \sin\theta    (Eq. 16)

The above derivatives (Eq. 13-16) are substituted into the Lagrange equations (Eq. 11-12), and this
results in the non-linear dynamical equations for the inverted pendulum system, shown below.

(M + m)\ddot{y} + m l \ddot\theta \cos\theta - m l \dot\theta^2 \sin\theta = f    (Eq. 17)

m l \ddot{y} \cos\theta + m l^2 \ddot\theta - m g l \sin\theta = 0    (Eq. 18)

Some of the modelling and control techniques used in the project are linear, so these
equations must be linearised. It is possible to linearise them by approximating
\cos\theta \approx 1 and \sin\theta \approx \theta, assuming θ is kept small; the quadratic
terms (such as \dot\theta^2 \sin\theta) are also negligible. Therefore the two linear system
equations are

\ddot{y} = \frac{f}{M} - \frac{m g}{M}\theta    (Eq. 19)

\ddot\theta = -\frac{f}{M l} + \frac{(M + m) g}{M l}\theta    (Eq. 20)
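The linearised equations (Eq. 19-20) can be checked numerically. The short sketch below is illustrative only (it uses the rig values M = 1.2 kg, m = 0.11 kg, l = 0.4 m quoted later in this chapter); it builds the state-space matrix for the states (y, ẏ, θ, θ̇) and confirms that the open-loop system has an unstable pole:

```python
import numpy as np

M, m, l, g = 1.2, 0.11, 0.4, 9.81  # rig values quoted later in this chapter

# States [y, y_dot, theta, theta_dot]; second and fourth rows follow Eq. 19-20
A = np.array([
    [0.0, 1.0, 0.0,                   0.0],
    [0.0, 0.0, -m * g / M,            0.0],
    [0.0, 0.0, 0.0,                   1.0],
    [0.0, 0.0, (M + m) * g / (M * l), 0.0],
])

poles = np.linalg.eigvals(A)
unstable = max(p.real for p in poles)  # equals +sqrt((M+m)g/(Ml))
```

The positive eigenvalue confirms that the upright equilibrium is open-loop unstable, which is why the simulated pendulum falls over.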
At this stage, a set of equations (linear and non-linear) describing the inverted pendulum has
been developed. The next stage is constructing a simulink model of the inverted pendulum
system. There is no fixed procedure for developing simulink models from dynamical state
equations. The diagram below is the linear pendulum model, constructed using integrators,
gain blocks, etc. The model (Fig. 2) is simply a simulink representation of the linear state
equations.
The non-linear pendulum system (Fig. 4) is shown on the next page. The non-linear system,
though more complicated, is developed in a similar manner. Both models are large, so they
are encapsulated in the subsystem blocks shown below (Fig. 3). Both models are set up using
a mask, which makes it possible to change the values of m, l, g, etc. for different simulations.
The mass of the cart, M, is set to 1.2 kg, the mass of the pendulum to 0.11 kg and the length
of the pendulum to 0.4 m. These figures are taken from the real-time inverted pendulum rig.
Fig. 2 : Simulink model of the linear pendulum system
Fig. 3 : Simulink blocks of the pendulum systems
The following simulink diagram is the non-linear pendulum model.
Fig. 4 : Simulink model of the nonlinear pendulum system
Both pendulum models are simulated in simulink. The angle of the pendulum is shown below
(Fig. 5). The simulation shows that the pendulum goes unstable and falls over.
One of the requirements in system identification is the collection of 'information rich'
input/output data. The graph above (Fig. 5) of the pendulum angle does not give us enough
information on the pendulum system; the pendulum falls over too quickly. In order to
adequately model the inverted pendulum it is necessary to stabilize it using a feedback
controller. With a feedback controller in place, the output data will contain more information
describing the process. [2]
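The fall-over seen in Fig. 5 can be reproduced with a few lines of numerical integration. This is an illustrative sketch (simple Euler integration of the nonlinear cart-pole equations, Eq. 17-18; the initial tilt of 0.05 rad is an assumption, not a value from the project):

```python
import numpy as np

M, m, l, g = 1.2, 0.11, 0.4, 9.81
dt, f = 0.001, 0.0                        # no control force applied
y_dot, theta, theta_dot = 0.0, 0.05, 0.0  # small initial tilt from upright
max_angle = 0.0

for _ in range(2000):                     # simulate 2 seconds
    # Solve Eq. 17-18 for the accelerations: mass matrix times [y_dd, th_dd]
    c, s = np.cos(theta), np.sin(theta)
    Mmat = np.array([[M + m,     m * l * c],
                     [m * l * c, m * l ** 2]])
    rhs = np.array([f + m * l * theta_dot ** 2 * s,
                    m * g * l * s])
    y_dd, th_dd = np.linalg.solve(Mmat, rhs)
    y_dot += dt * y_dd
    theta += dt * theta_dot
    theta_dot += dt * th_dd
    max_angle = max(max_angle, abs(theta))
```

Within the two simulated seconds the angle grows far beyond the initial tilt: the pendulum falls over, which is exactly why a stabilizing controller is needed before identification data can be collected.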
Fig. 5: Open loop response of the inverted pendulum (pendulum angle, deg. vs time)
A full-state feedback controller is developed to stabilize the linear pendulum system. The
linear system could have been stabilized using many different methods (PID, etc.). The full-
state feedback controller stabilizes the system by positioning the closed-loop poles in the
stable region. The simulink model with controller is shown below (Fig. 6).
The linear pendulum system is simulated and the angle of the pendulum is shown below
(Fig. 7). The controller keeps the pendulum angle stable, so the pendulum can be simulated
for longer times and the data is of better quality for system identification purposes.
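The pole-positioning step can be sketched numerically. The fragment below is an illustration (the desired pole locations are invented, not the ones used in the project): it computes a full-state feedback gain with Ackermann's formula and verifies that the closed-loop poles land in the stable region:

```python
import numpy as np

M, m, l, g = 1.2, 0.11, 0.4, 9.81
A = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, -m * g / M, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, (M + m) * g / (M * l), 0.0]])
B = np.array([[0.0], [1.0 / M], [0.0], [-1.0 / (M * l)]])

# Ackermann's formula: K = [0 0 0 1] * inv(ctrb) * phi_d(A)
ctrb = np.hstack([B, A @ B, A @ A @ B, A @ A @ A @ B])
desired = [-2.0, -2.5, -3.0, -3.5]   # assumed closed-loop pole locations
phi = np.eye(4)
for p in desired:
    phi = phi @ (A - p * np.eye(4))
K = np.array([[0.0, 0.0, 0.0, 1.0]]) @ np.linalg.inv(ctrb) @ phi

closed_poles = np.linalg.eigvals(A - B @ K)
```

With the feedback u = -Kx in place, all closed-loop poles sit in the left half-plane, so the simulated pendulum no longer falls over.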
Fig. 6: Simulink diagram of Linear Pendulum and controller
Fig. 7: Closed loop response of the inverted pendulum with controller (pendulum angle, deg. vs time)
Developing a controller for the non-linear pendulum is more difficult. Linear control
techniques such as PID and full-state feedback were tested but could not control the
non-linear pendulum. A feedback linearisation controller was therefore developed to control
the non-linear pendulum system. Feedback linearisation cancels the non-linearities in the
pendulum system so that the closed-loop system behaves more linearly.
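The cancellation idea can be shown on a reduced model. The sketch below is illustrative only: it uses a hypothetical pole-only model θ̈ = (g/l)·sinθ + u/(ml²), not the cart-pole control law developed in the project, and borrows the gain values k1 = 25, k2 = 10 quoted below.

```python
import numpy as np

m, l, g = 0.11, 0.4, 9.81
k1, k2 = 25.0, 10.0

def control(theta, theta_dot):
    # Cancel the gravity nonlinearity, then impose the linear error dynamics
    #   theta_dd = -k1*theta - k2*theta_dot
    v = -k1 * theta - k2 * theta_dot
    return m * l ** 2 * (v - (g / l) * np.sin(theta))

theta, theta_dot, dt = 0.3, 0.0, 0.001
for _ in range(5000):                       # 5 seconds of Euler integration
    u = control(theta, theta_dot)
    theta_dd = (g / l) * np.sin(theta) + u / (m * l ** 2)
    theta += dt * theta_dot
    theta_dot += dt * theta_dd
```

Because the sinθ term is cancelled exactly, the closed loop behaves like the linear system θ̈ + 10θ̇ + 25θ = 0 and the angle decays to zero.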
The following equations form the control law developed for the inverted pendulum controller.
The first four equations (Eq. 21-24) feed into the main equation (Eq. 25), which calculates the
force, u, required to keep the pendulum stable.
For the simulations, M, m, l and g are set to the values of the pendulum model. The following
numeric values are used: M = 1.2 kg, m = 0.1 kg, l = 0.4 m, g = 9.81 m/s², k1 = 25, k2 = 10,
C1 = 1, C2 = 2.6. Also x_d = 0 m and θ_d = 0 rad, which are the desired position of the cart
and angle of the pendulum respectively. For details on all the parameters see [4]. A simulink
model of the control law was developed and is shown in Figure 8.
h_1 = \frac{3 g}{4 l} \sin\theta    (Eq. 21)

h_2 = \frac{3}{4 l} \cos\theta    (Eq. 22)

f_1 = m l \dot\theta^2 \sin\theta - \frac{3}{8} m g \sin 2\theta    (Eq. 23)

f_2 = M + m\left(1 - \frac{3}{4}\cos^2\theta\right)    (Eq. 24)

u = \frac{f_2}{h_2}\left[ h_1 + k_1(\theta_d - \theta) + k_2 \dot\theta + C_1(x_d - x) + C_2 \dot{x} \right] - f_1    (Eq. 25)
The inputs to this controller are the four output states of the non-linear pendulum model. The
control law calculates the magnitude and direction of the force required to keep the pendulum
stable.
Fig. 8: Simulink model of the nonlinear control law.
The following diagram (Fig. 9) shows the set-up of the non-linear pendulum with the control
law. Figure 10 shows the closed-loop pendulum angle plotted in Matlab. The closed loop
response is stable and shows that the control law is working.
Fig. 9: Simulink diagram of the nonlinear system with control law.
Fig. 10: Closed loop response of the nonlinear pendulum with controller (pendulum angle, deg. (×10⁻³) vs time)
The linear and nonlinear models of the cart-pole system have been developed and simulated,
and the system was found to be open-loop unstable. For accurate system identification the
process must be stable; because of this, standard feedback controllers were developed and
tested. The next chapter discusses the theory and operation of artificial neural networks.
3 Artificial Neural Networks
The science of artificial neural networks is based on the neuron. In order to understand the
structure of artificial networks, the basic elements of the neuron should be understood.
Neurons are the fundamental elements in the central nervous system. The diagram below (Fig.
11) shows the components of a neuron. [5]
A neuron is made up of 3 main parts -dendrites, cell body and axon. The dendrites receive
signals coming from the neighbouring neurons. The dendrites send their signals to the body of
the cell. The cell body contains the nucleus of the neuron. If the sum of the received signals is
greater than a threshold value, the neuron fires by sending an electrical pulse along the axon to
the next neuron. The following model is based on the components of the biological neuron
(Fig. 12). The inputs X0-X3 represent the dendrites. Each input is multiplied by weights W0-
W3. The output of the neuron model, Y is a function, F of the summation of the input signals.
Fig. 11: The diagram shows the basic elements of a neuron
Fig. 12: Diagram of neuron model
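The neuron model of Fig. 12 can be written directly in code. This is a minimal numerical sketch (the input and weight values are invented for the example):

```python
def neuron(x, w, threshold=0.0):
    # Weighted sum of the inputs (the "dendrite" signals) ...
    s = sum(wi * xi for wi, xi in zip(w, x))
    # ... and the neuron "fires" only if the sum exceeds the threshold
    return 1.0 if s > threshold else 0.0

x = [1.0, 0.5, -0.25, 0.0]   # inputs X0-X3
w = [0.8, 0.4, 0.2, -0.6]    # weights W0-W3
y = neuron(x, w)
```

Here the weighted sum is 0.95, which exceeds the threshold, so the neuron fires and the output is 1.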
Advantages of ANNs
1. The main advantage of neural networks is that it is possible to train a network to
perform a particular function by adjusting the values of the connections (weights) between
elements. For example, to train a neuron model to approximate a specific function, the
weights that multiply each input signal would be updated until the output from the neuron
is similar to the function.
2. Neural networks are composed of elements operating in parallel. Parallel processing
allows increased speed of calculation compared to slower sequential processing.
3. Artificial neural networks (ANN) have memory: the memory corresponds to the weights
in the neurons. Neural networks can be trained offline and then transferred into a process
where adaptive learning takes place. In our case, a neural network controller could be
trained to control an inverted pendulum system offline, say in the simulink environment.
After training, the network weights are set and the ANN is placed in a feedback loop with
the actual process. The network will then adapt its weights to improve performance as it
controls the pendulum system.
The main disadvantage of ANNs is that they operate as black boxes: the rules of operation of
a trained network are unknown, and it is not possible to convert the neural structure into
known model structures such as ARMAX, etc. Another disadvantage is the time needed for
training; it can take considerable time to train an ANN for certain functions.
Fig. 13: Diagram shows the parallelism of neural networks
Types of Learning
Neural networks have three main modes of learning - supervised, reinforced and unsupervised
learning. [6] In supervised learning the output from the neural network is compared with a set
of targets, and the error signal is used to update the weights in the neural network. Reinforced
learning is similar to supervised learning, but instead of targets the algorithm is only given a
grade of the ANN's performance. Unsupervised learning updates the weights based on the
input data only; the ANN learns to cluster different input patterns into different classes.
NNeeuurraall nneettwwoorrkk ssttrruuccttuurreess
There are three main types of ANN structure - single-layer feedforward networks, multi-layer
feedforward networks and recurrent networks. [7] The most common type of single-layer
feedforward network is the perceptron. Other types of single-layer networks are based on the
perceptron model. The details of the perceptron are shown below (Fig. 14).
Inputs to the perceptron are individually weighted and then summed. The perceptron computes
the output as a function F of the sum. The activation function, F is needed to introduce non-
linearities into the network. This makes multi-layer networks powerful in representing
nonlinear functions.
Fig. 14: Diagram of the perceptron model
There are three main types of activation function - tan-sigmoid, log-sigmoid and linear. [8]
The choice of activation function affects the performance of an ANN.

(Plots: log-sigmoid, tan-sigmoid and linear activation functions)

The output from the perceptron is

y[k] = f(w^T[k] \, x[k])    (Eq. 26)

The weights are dynamically updated using the back-propagation algorithm. First, the
difference between the target output and the actual output (the error) is calculated:

e[k] = T[k] - y[k]    (Eq. 27)

The errors are back-propagated through the layers and the weight changes are made. The
formula for adjusting the weights is

w[k+1] = w[k] + \mu \, e[k] \, x[k]    (Eq. 28)

Once the weights are adjusted, the feed-forward process is repeated. The weights are adapted
until the error between the target and actual output is low, and the approximation of the
function improves as the error decreases. Single-layer feedforward networks are useful when
the training data is linearly separable. If the data we are trying to model is not linearly
separable, or the function has complex mappings, the simple perceptron will have trouble
modelling the function adequately.
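The update loop of Eqs. 26-28 can be demonstrated with a single linear neuron. The sketch below is an illustration (the target function t = 2·x0 - x1 and the learning rate are invented for the example); with a linear activation the rule reduces to the classic delta (LMS) rule:

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(2)        # initial weights
mu = 0.1               # learning rate

for k in range(2000):
    x = rng.uniform(-1.0, 1.0, size=2)  # input pattern
    y = w @ x                           # Eq. 26, with a linear activation f
    t = 2.0 * x[0] - x[1]               # target output
    e = t - y                           # Eq. 27
    w = w + mu * e * x                  # Eq. 28
```

After training, the weights converge to the coefficients of the target function, w ≈ (2, -1): the error signal has driven the neuron output onto the target.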
Multi-layered perceptrons
Neural networks can have several layers. There are two main types of multi-layer network -
feedforward and recurrent. In feedforward networks signals travel from input to output; there
is no feedback between the layers. The diagram below (Fig. 15) shows a 3-layered
feedforward network.
Increasing the number of neurons in the hidden layer, or adding more hidden layers, allows
the network to deal with more complex functions. Cybenko's theorem states that "a
feedforward neural network with a sufficiently large number of hidden neurons with
continuous and differentiable transfer functions can approximate any continuous function over
a closed interval." [9] The weights in MLPs are updated using backpropagation learning.
[10] There are two passes before the weights are updated.
In the first pass (forward pass) the outputs of all neurons are calculated by multiplying the
input vector by the weights. The error is calculated for each of the output layer neurons.
In the backward pass, the error is passed back through the network layer by layer and the
weights are adjusted according to the gradient descent rule, so that the actual output of the
MLP moves closer to the desired output. A momentum term can be added, which increases
the learning rate while preserving stability.
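The two-pass procedure can be sketched for a small MLP. The code below is illustrative (the network size, learning rate and the target function sin(πx) are all assumptions): a forward pass computes the outputs, then a backward pass propagates the error and applies gradient descent:

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(0.0, 0.5, (8, 1)); b1 = np.zeros((8, 1))  # hidden layer
W2 = rng.normal(0.0, 0.5, (1, 8)); b2 = np.zeros((1, 1))  # output layer
lr = 0.05

X = np.linspace(-1.0, 1.0, 50).reshape(1, -1)
T = np.sin(np.pi * X)                   # target function to approximate

def forward(X):
    H = np.tanh(W1 @ X + b1)            # forward pass: hidden activations
    return H, W2 @ H + b2               # linear output layer

_, Y0 = forward(X)
loss_before = float(np.mean((Y0 - T) ** 2))

for _ in range(5000):
    H, Y = forward(X)
    dY = 2.0 * (Y - T) / X.shape[1]     # output-layer error signal
    # Backward pass: propagate the error layer by layer, then descend
    dW2 = dY @ H.T; db2 = dY.sum(axis=1, keepdims=True)
    dH = (W2.T @ dY) * (1.0 - H ** 2)   # tanh derivative
    dW1 = dH @ X.T; db1 = dH.sum(axis=1, keepdims=True)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

_, Y1 = forward(X)
loss_after = float(np.mean((Y1 - T) ** 2))
```

Each backward pass moves the network output closer to the target, so the mean squared error falls as training proceeds.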
Fig. 15: Diagram of a multi-layered perceptron
The second type of multi-layer network is recurrent (Fig. 16). Recurrent networks have at
least one feedback loop: an output of a layer feeds back to a preceding layer. This gives the
network partial memory, because the hidden layer receives data at time t but also from time
t-1, and it makes recurrent networks powerful in approximating functions that depend on
time. [11] The simulink model of the nonlinear inverted pendulum contains many feedback
loops, so the next state of the model depends on previous states. It is expected that, to
accurately model this type of dynamic system, a recurrent neural network with feedback loops
will perform better than a static feedforward network.
Fig. 16: Diagram of a recurrent neural network
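The memory effect of the feedback loop is easy to demonstrate. A minimal recurrent unit (illustrative; the weight values are invented) keeps a hidden state h that depends on the previous time step:

```python
import numpy as np

Wx, Wh = 0.9, 0.5   # input weight and feedback (recurrent) weight

def run(sequence):
    h = 0.0
    for x in sequence:
        h = np.tanh(Wx * x + Wh * h)  # h(t) depends on x(t) AND h(t-1)
    return float(h)

out_a = run([1.0, 0.2])    # history: +1 followed by 0.2
out_b = run([-1.0, 0.2])   # history: -1 followed by the same 0.2
```

The final input is identical in both cases, yet the outputs differ because the hidden state remembers the earlier input; a static feedforward map of the current input alone would give the same output in both cases.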
4 System Identification
System identification is the process of developing a mathematical model of a dynamic system
based on the input and output data from the actual process. [12] This means it is possible to
sample the input and output signals of a system and using this data generate a mathematical
model. An important stage in control system design is the development of a mathematical
model of the system to be controlled. In order to develop a controller, it must be possible to
analyse the system to be controlled and this is done using a mathematical model. Another
advantage of system identification is evident if the process is changed or modified. System
identification allows the real system to be altered without having to calculate the dynamical
equations and model the parameters again.
System identification is concerned with developing models. The diagram below (Fig. 17)
shows the inputs and output of a system.
The mathematical model in this case is the black box; it describes the relationship between the
input and output signals. The inverted pendulum system is a non-linear process, so to model it
adequately, non-linear methods using neural networks must be used. Previous studies in system
identification have demonstrated that neural networks are successful in modelling many non-
linear systems. [13] Before neural networks are investigated for identification, linear
techniques such as auto-regressive with exogenous input (ARX) and auto-regressive moving
average with exogenous input (ARMAX) will be applied to the linear inverted pendulum
model.
Fig. 17: System showing input, disturbance and output signals
System identification procedure
Basically system identification is achieved by adjusting the parameters of the model until the
model output is similar to the output of the real system. Below is a diagram (Fig. 18)
explaining the system identification procedure. [14]
There are three main steps in the system identification procedure.
1. The first step is to generate some experimental input/output data from the process we
are trying to model. In the case of the inverted pendulum system this would be the
input force on the cart and the output pendulum angle.
2. The next step is to choose a model structure to use. For example the following model
structure is the ARX.
3. The parameters A and B will be adjusted until this model output is similar to the output
of the process. In identification, there is no perfect model structure to use. Models can
be developed using engineering intuition or a priori knowledge of the process we are
trying to model.
A \cdot y(t) = B \cdot u(t) + e(t)    (Eq. 29)

Fig. 18: Diagram of the system identification procedure
The best approach to choosing a model structure is to pick a number of different models, test
them all and use the one whose output is closest to the process. The standard linear models
(ARX, ARMAX, etc.) used in system identification were researched. [15] Below is the
diagram for the ARX model and the ARX model equation (Fig. 19). The ARX is a simple
linear difference equation describing the input-output relationship. The input-output
relationship is modelled using a transfer function block B/A. It is assumed that the noise
spectrum and the input-output model have the same characteristic dynamics, which could
contribute some modelling error.
y(t) = \frac{B}{A} u(t) + \frac{1}{A} e(t)    (Eq. 30: ARX equation)

Fig. 19: ARX model (input-output model and noise spectrum model)

The ARMAX model contains an extra C parameter in the noise spectrum model (Fig. 20).
This gives the ARMAX model more accuracy than the ARX. The input-output block and the
noise spectrum block still share the same denominator, A.

y(t) = \frac{B}{A} u(t) + \frac{C}{A} e(t)    (Eq. 31: ARMAX equation)

Fig. 20: ARMAX model
The next type of linear model is the output-error (OE) model (Fig. 21). The main difference
between this model and the previous models is that the output-error model assumes the
disturbances are white noise, so there is no noise spectrum model. The input-output
relationship is defined in the transfer block B/F.

y(t) = \frac{B}{F} u(t) + e(t)    (Eq. 32: OE equation)

Fig. 21: Output-error model

The last model used in linear system identification is the Box-Jenkins (BJ) model (Fig. 22).
This model has separate transfer functions for the input-output relationship and the noise
spectrum, which is an advantage over the ARX and ARMAX models, where the noise model
and the input-output relationship share the same denominator.

y(t) = \frac{B}{F} u(t) + \frac{C}{D} e(t)    (Eq. 33: BJ equation)

Fig. 22: Box-Jenkins model

The next stage in the identification procedure is to estimate the parameters of the model.
Matlab uses the least-squares algorithm to update the model parameters: the algorithm takes
the model structure and the input/output data from the process and estimates the model
parameters. It also generates the residuals, the error between the model and the process. If
the residuals are too high, another model structure could be tried, or the experimental data
might not show the true dynamics of the system being modelled. There are many linear
methods of estimating the parameters of a model, but least squares is the main algorithm used
in system identification. The next section of the report details the linear identification
experiments on the inverted pendulum system.
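The least-squares estimation step can be illustrated on a simulated ARX process. The sketch below is Python rather than Matlab, and the parameter values are invented: a known second-order ARX system is simulated, and its parameters are then recovered from the input/output data.

```python
import numpy as np

# True ARX parameters: y(t) + a1*y(t-1) + a2*y(t-2) = b1*u(t-1)
a1, a2, b1 = -1.5, 0.7, 0.5

rng = np.random.default_rng(2)
u = rng.normal(size=500)                 # excitation input
y = np.zeros(500)
for t in range(2, 500):
    y[t] = -a1 * y[t - 1] - a2 * y[t - 2] + b1 * u[t - 1]

# Each regression row is [-y(t-1), -y(t-2), u(t-1)]; least squares then
# estimates [a1, a2, b1] directly from the data
Phi = np.column_stack([-y[1:-1], -y[:-2], u[1:-1]])
theta, *_ = np.linalg.lstsq(Phi, y[2:], rcond=None)
```

Because the data here is noise-free, the estimate matches the true parameters almost exactly; with measurement noise the same procedure returns the parameters that minimise the squared residuals.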
Linear identification of the system
In order to generate an ARX, ARMAX or Box-Jenkins model of the inverted pendulum
system, the input-output data from the linear system must be collected. The diagram below
(Fig. 23) shows the simulink model with the feedback control system. The noise input is an
excitation signal. It is used to obtain a unique input-output response. Using ‘to workspace’
blocks the data is exported to the Matlab console. The data is split up into estimation and
validation data. Half of the generated data is used to generate the model and half will be used
to test the performance of the model.
The diagram above shows that the pendulum block is a SIMO (single-input multi-output)
system. The linear models developed here are SISO (single-input single-output) systems
which model the pendulum angle θ. The first type of model to be developed is the ARX.
Fig. 23: Linear Pendulum with feedback controller
The following code is an example of ARX estimation in Matlab. The input/output data is split
into estimation and validation data. The arx function uses least squares to estimate the
parameters of the model. The model can be converted into transfer-function format using the
command th2tf (theta to transfer function).

z1 = [y(1:500) u(1:500)];
z2 = [y(501:1000) u(501:1000)];
nn = [5 3 1];
th = arx(z1,nn);
[yh,fit1] = compare(z2,th);
[num,den] = th2tf(th);
sysarx = tf(num,den);

The nn matrix defines the orders and delay of the ARX model:

nn = [na nb nk]
na = number of parameters to be estimated in the denominator
nb = number of parameters in the numerator
nk = time delay in the model

To develop the best ARX model possible, different orders of na and nb will be tested. Note
that ARX221 means na=2, nb=2, nk=1. The following table (Table 1) shows the different
orders of na, nb used and the mean squared error between the model output and the target
output. The compare function is used to compare the model output with the validation data.
The models were tested with two sets of validation data; the second set was generated using a
different initial seed in the input signal. The mean squared error increased only slightly on
the new data, which shows that the generated models can predict the output of the process.
The number of parameters in the models was then increased. It was expected that as the
complexity of the models increased, the mean squared error would decrease. This was not
the case: from the results, the models with the lowest mean squared error are ARX221 and
ARX421.
na nb nk   Data 1   Data 2
1  1  1    0.0053   0.0057
2  2  1    0.0040   0.0054
3  2  1    0.0042   0.0057
3  3  1    0.1105   0.1158
4  2  1    0.0041   0.0052
4  3  1    0.0113   0.0175
4  4  1    0.0104   0.0167
5  2  1    0.0043   0.0058
5  3  1    0.0124   0.0152
5  5  1    0.3143   0.3249

Table 1: Mean squared error of the ARX models on the two validation data sets
The following two plots (Fig. 24, 25) show the model output against the actual process output.
The mean squared error between the two is low, and the models track the shape of
the actual output. These results for the ARX models look good, but the best way to test the
quality of a model is to generate a transfer-function block of the model in Simulink and input a
noise signal with a different seed (Fig. 26).
Fig. 24: ARX 421 model output with validation data 1 (fit 0.0040831). Fig. 25: ARX 421 model output with validation data 2 (fit 0.0051914). (Blue: model output, Black: measured output; vertical axis: pole angle, rad.)
It was expected that the model output would be similar to the actual output, but the above models
completely failed to predict the actual output.
The low error results of the models above were computed using the compare function. The
results from compare indicate that the models are accurate, but when a transfer
function of the models is used in Simulink they go completely unstable. The reason for this is that
the models are developed using closed-loop data. A controller must be used to keep the
pendulum stable, but as a consequence the input/output data contains too much of the
controller dynamics. To adequately model the process, the input and output data must show
the dynamics of the pendulum system itself. This can be achieved by using a de-tuned controller:
one that only just keeps the pendulum stable, so that more of the process dynamics can
be seen. A de-tuned PID controller is developed by adjusting the P, I and D parameters until the
output becomes more oscillatory, closer to instability.
Fig. 26: Testing the linear models using transfer function block
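The effect of de-tuning can be illustrated on a toy problem. In the Python sketch below (the scalar unstable plant x' = x + u and all gains are invented stand-ins, not the pendulum model or the actual PID settings), a discrete PD loop is run twice, once with tight gains and once with the gains scaled down; the de-tuned loop still stabilises the plant but lets the state deviate more, so the logged data shows more of the plant's own dynamics:

```python
import numpy as np

def simulate(kp, kd, steps=200, dt=0.01):
    """PD control of the unstable toy plant x' = x + u, regulating to zero."""
    x, prev_err, xs = 1.0, 0.0, []
    for _ in range(steps):
        err = -x
        u = kp * err + kd * (err - prev_err) / dt   # P and D terms
        prev_err = err
        x = x + dt * (x + u)                        # Euler step of x' = x + u
        xs.append(x)
    return np.array(xs)

tight = simulate(kp=20.0, kd=0.5)
detuned = simulate(kp=4.0, kd=0.1)   # same structure, gains scaled down

# Both loops are stable, but the de-tuned one deviates further from zero
print(np.abs(tight).max(), np.abs(detuned).max())
```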
The diagram below shows the difference between a normal controller and a de-tuned
controller (Fig. 27). Using a de-tuned controller, more of the pendulum dynamics can be seen.
Testing resumed using the de-tuned controller. A new method of online identification in
Simulink was also found: there is a toolbox of system-identification blocks for Simulink. The four
types of models that can be generated are ARX, ARMAX, Box-Jenkins and Output-Error
(Fig. 28). Using this toolbox, the model type, model orders, etc. can be changed easily.
Fig. 27: Closed-loop response with normal and de-tuned controllers (Red: PID, Blue: de-tuned PID; pendulum angle, rad, vs. time)
Fig. 28: Simulink model using the system identification toolbox
The models developed using ARX and ARMAX could not adequately predict the pendulum
system output even using the de-tuned controller. All orders of ARX and ARMAX models were
tested. The plot below is an example output from one of the ARX models (Fig. 29).
Many of the journals on closed-loop identification indicate that the best model structures to use are
Box-Jenkins and Output-Error [16]. ARX and ARMAX both assume that the
noise model and the input-output model share the same characteristic dynamics, which
explains why the ARX transfer functions were unable to model the linear inverted pendulum.
Models were generated using the BJ and OE Simulink blocks, then converted
into transfer-function blocks and simulated (Fig. 30).
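The structural difference between the four model types can be written out explicitly. In the standard polynomial notation (q the shift operator, e(t) white noise, n_k the input delay), the assumed structures are:

```latex
\begin{aligned}
\text{ARX:}   &\quad A(q)\,y(t) = B(q)\,u(t-n_k) + e(t)\\
\text{ARMAX:} &\quad A(q)\,y(t) = B(q)\,u(t-n_k) + C(q)\,e(t)\\
\text{OE:}    &\quad y(t) = \frac{B(q)}{F(q)}\,u(t-n_k) + e(t)\\
\text{BJ:}    &\quad y(t) = \frac{B(q)}{F(q)}\,u(t-n_k) + \frac{C(q)}{D(q)}\,e(t)
\end{aligned}
```

In ARX and ARMAX the noise passes through the same poles A(q) as the input-output dynamics; in OE and BJ the plant poles F(q) and the noise description are independent, which is why these structures cope better with the correlated noise that feedback introduces.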
Fig. 29: ARX model response compared to the real output
Fig. 30: Testing the accuracy of the Box-Jenkins and Output-error models
(Blue: model output, Black: process output; pendulum angle, rad. The fit value of 140322.7567 in Fig. 29 reflects the ARX model diverging.)
The following tables show the results from the Box-Jenkins and Output-Error models. The
BJ/OE orders were changed for each simulation to determine the best model. The model
BJ53331 has the lowest mean squared error. The Output-Error models have a slightly higher
error than the Box-Jenkins models. These results indicate that:
1. When trying to model an unstable system, a feedback controller must be used to keep
the system stable. If the controller is de-tuned, more of the pendulum
dynamics can be seen, which makes the model more accurate.
2. The Box-Jenkins/Output-Error models are the only structures that can adequately model
the pendulum using the closed-loop data.
3. The best way to test the quality of a model is to construct a transfer block of the model
and simulate it using a different initial input seed.
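The mean-squared-error figure used throughout these tables is simple to reproduce. The sketch below (Python; the sine signals are synthetic stand-ins for the process and model outputs) also illustrates point 3: a low MSE alone does not prove a model is useful, which is why the transfer-block simulation is the decisive test.

```python
import numpy as np

def mse(model_out, process_out):
    """Mean squared error between model output and process output."""
    diff = np.asarray(model_out) - np.asarray(process_out)
    return float(np.mean(diff ** 2))

t = np.linspace(0, 10, 500)
process = np.sin(t)                            # stand-in for the measured angle
good_model = np.sin(t) + 0.01 * np.cos(3 * t)  # small structured error
bad_model = np.zeros_like(t)                   # predicts nothing at all

print(mse(good_model, process))   # small
print(mse(bad_model, process))    # large
```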
BJ [nb nc nd nf nk]   MSE
2 1 1 1 0    3.6214e-004
2 1 1 1 1    4.1615e-004
2 1 2 1 0    4.7409e-004
2 1 2 2 0    8.7144e-004
3 1 1 1 0    2.8155e-004
3 1 2 2 0    1.2681e-004
3 2 2 2 0    1.2606e-004
4 3 3 3 1    5.3353e-005
5 3 3 3 1    4.6758e-005
Table 2: Error associated with Box-Jenkins models
OE [nb nf nk]   MSE
2 1 1    0.0014
2 2 1    0.0069
2 2 2    0.0104
3 1 1    0.0151
3 2 1    0.0074
3 2 2    0.0128
4 1 1    0.0039
4 2 1    0.0084
4 2 2    0.0126
5 1 1    0.0089
Table 3: Error associated with Output-error models
The results from the best models developed are shown below.
Fig. 31: Box-Jenkins model [5 3 3 3 1] tested on validation data (Red: actual output, Blue: predicted model output; pendulum angle, rad, vs. time in secs)
Fig. 32: Output-Error model [2 2 1] tested on validation data (Red: actual output, Blue: predicted model output; pendulum angle, rad)
Non-Linear identification of the system
The previous system identification was based on linear models. To achieve a better
approximation of the inverted pendulum system, the non-linear model must be used. Before
neural networks are utilised in identification, linear methods such as Box-Jenkins were applied
to the nonlinear pendulum. The diagram below shows the nonlinear pendulum with the
feedback controller (Fig. 33). The input and output signals from the system are directed to the
BJ algorithm, which develops the model.
It was expected that linear methods such as BJ would not be able to capture the nonlinearities of the
pendulum. However, as the diagram below shows (Fig. 34), the Box-Jenkins model can actually
model the nonlinear system. This is because the process is in a closed loop: the control law
keeps the pendulum stable by removing some of the non-linearities of the pendulum system.
Even using a de-tuned controller, the Box-Jenkins model can capture the dynamics well.
Fig. 33: Linear identification on the nonlinear system
Fig. 34: The Box-Jenkins output vs. the process output (Red: actual output, Blue: predicted model output, with the error in the predicted model; pendulum angle, rad, vs. time in secs)
Non-linear Identification using neural networks
This section discusses the different methods of identifying the pendulum process using neural
networks. The most common method of neural-network identification is called forward
modelling (Fig. 35) [17]. During training both the process and the ANN receive the same input;
the outputs from the ANN and the process are compared, and this error signal is used to update the
weights in the ANN. This is an example of supervised learning: the teacher (the pendulum
system) provides target values for the learner (the neural network).
Two main types of network (feed-forward and Elman) will be used for identification. In order
to provide a set of targets for the network to learn, the Simulink model with feedback control
is used (Fig. 36).
Fig. 35: Neural network forward modelling method
Fig. 36: The non-linear pendulum provides the neural targets
To emphasise the pendulum dynamics, the feedback controller was de-tuned. The controller is
based on a control law which essentially cancels the non-linearities in the pendulum system.
The mass of the cart, M, is set to 0.2 kg inside the control law (rather than its true value). This
makes the controller apply less control force, leaving the pendulum closer to instability, so
more of the pendulum dynamics will be seen at the output (Fig. 38).
Initially single-input single-output networks were developed, the input being the control force
and the output the pole angle. The first type of neural network developed is feed-forward.
Using MATLAB it is possible to develop multi-layer perceptrons (MLP).
The following MATLAB code creates a feed-forward network.

net = newff([-10 10],[4 1],{'tansig' 'purelin'},'trainlm');
net.trainParam.epochs = 400;
net.trainParam.lr = 0.001;
net = train(net,in(1:1000)',tethain(1:1000)');

The newff function allows the user to specify the number of layers, the number of neurons in the
hidden layers and the activation functions used. This network contains 4 neurons in its
hidden layer. The hidden layer uses tan-sigmoid activation functions and the output layer
a linear function, which is the standard set-up of activation functions in an MLP.
Fig. 37: Closed-loop response with the normal controller. Fig. 38: Closed-loop response with the de-tuned controller. (Pole angle, rad, vs. time.)
The type of training used is also set here. These networks use the back-propagation learning
rule to update the weights. The number of epochs for this example is set to 400: during
training, the input vector is passed through the neural network and the weights are
adjusted 400 times. The learning rate of the network is also set. The train function adjusts the
weights of the network so that the output of the network becomes similar to that of the
non-linear pendulum.
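The weight-update mechanics that train performs can be sketched without the toolbox. The following Python sketch (NumPy; the 1-4-1 layout mirrors the network above, but the sine target is an invented stand-in for the pendulum data) trains a tan-sigmoid-hidden, linear-output MLP by plain batch back-propagation and records the MSE falling over the epochs:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 200).reshape(-1, 1)
t = 0.5 * np.sin(3 * x)               # invented stand-in target

# 1 input -> 4 tanh hidden neurons -> 1 linear output (as in the text)
W1 = rng.standard_normal((1, 4)) * 0.5; b1 = np.zeros(4)
W2 = rng.standard_normal((4, 1)) * 0.5; b2 = np.zeros(1)
lr = 0.1

errs = []
for epoch in range(800):
    h = np.tanh(x @ W1 + b1)          # forward pass, hidden layer
    y = h @ W2 + b2                   # linear output layer
    e = y - t
    errs.append(float(np.mean(e ** 2)))
    # back-propagate the error through both layers
    gW2 = h.T @ e / len(x);  gb2 = e.mean(axis=0)
    dh = (e @ W2.T) * (1 - h ** 2)    # tanh derivative
    gW1 = x.T @ dh / len(x); gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

print(errs[0], errs[-1])   # the training MSE decreases over the epochs
```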
The following diagram shows the training of the MLP (Fig. 39). At the start of the training,
the error between the network and the pendulum is high. As the number of epochs increases,
the mean squared error decreases, and once the curve converges very little learning is
taking place. From the training diagram below it is possible to determine the appropriate
number of epochs for training.
Fig. 39: The training error decreases as the neural network 'learns' (training error, log scale, vs. epochs; performance 0.000411685 after 475 epochs)
When the training is finished, the neural network is exported to Simulink using the gensim
command. The diagram below (Fig. 40) shows the neural network in Simulink. Notice that the
neural model and the process receive the same input signal. To adequately test the quality of
the model, the initial seed of the input signal must be changed.
The quality of the neural model is tested by calculating the MSE (mean squared error), which
gives a good indication of the accuracy of the model. The MSE between the model and
the process should be low, but a model could have a low MSE and still fail to predict the
dynamics of the pendulum system, so the outputs from the model and process are also plotted to
compare the dynamics; essentially, we want to see whether the model predicts the movement of
the inverted pendulum. Increasing the number of hidden-layer neurons allows more
complex functions to be modelled. During testing, neural networks with a range of hidden-layer
sizes were simulated. It was expected that as the number of hidden neurons increased,
the model would become more accurate. The following graphs show the process output
plotted against the model outputs (Fig. 41-44). The initial seed is kept the same for all
simulations to show the difference between the models.
Fig. 40: The quality of the model is tested in simulink
Fig. 41: Feed-forward network, 1 hidden layer, 4 hidden neurons. Fig. 42: Feed-forward network, 1 hidden layer, 10 hidden neurons. (Blue: NN output, Red: process output; pendulum angle, rad, vs. time.)
Fig. 43: Feed-forward network, 1 hidden layer, 20 hidden neurons. Fig. 44: Feed-forward network, 1 hidden layer, 50 hidden neurons. (Blue: NN output, Red: process output; pendulum angle, rad, vs. time.)
The feed-forward networks model the process well: the MSE is low and the neural model
predicts the pendulum angle. The results also indicate that increasing the number of hidden
neurons does not improve the MSE between the model and the process.
Type ANN   Neurons in Hidden Layer   Training Epochs   Learning Rate   MSE
FF         50                        500               0.0001          3.2622e-6
FF         20                        500               0.0001          3.4465e-6
FF         10                        500               0.0001          1.1103e-5
FF         4                         500               0.0001          1.54e-5
Most of the research in system identification uses open-loop identification, where increasing the
number of hidden-layer neurons has a direct influence on the accuracy of the model. In
the closed-loop case, it was found that using a de-tuned controller had more of an influence on
the model accuracy than increasing the number of hidden-layer neurons.
The next type of ANN tested is the Elman network, discussed in Chapter 3. Elman networks
have built-in feedback loops which give them dynamic memory, making them more suitable
for predicting dynamic systems. Elman networks are set up in MATLAB similarly to
feed-forward networks: using the command newelm, the number of hidden layers, the number
of hidden-layer neurons and the type of activation functions can be set.
net = newelm([-10 10],[4 1],{'tansig' 'purelin'},'trainlm');
net.trainParam.epochs = 400;
net.trainParam.lr = 0.001;
net = train(net,in(1:1000)',tethain(1:1000)');
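The defining feature of the Elman network, the hidden state fed back through context units, can be sketched in a few lines (Python; the weights are random and untrained, purely to illustrate the recurrence, not a pendulum model):

```python
import numpy as np

class ElmanCell:
    """Minimal Elman recurrence: h(t) = tanh(Wx x(t) + Wh h(t-1) + b)."""
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.Wx = rng.standard_normal((n_hidden, n_in)) * 0.5
        self.Wh = rng.standard_normal((n_hidden, n_hidden)) * 0.5
        self.bh = np.zeros(n_hidden)
        self.Wo = rng.standard_normal((n_out, n_hidden)) * 0.5
        self.h = np.zeros(n_hidden)           # the context units

    def step(self, x):
        # The previous hidden state re-enters here: the dynamic memory
        self.h = np.tanh(self.Wx @ x + self.Wh @ self.h + self.bh)
        return self.Wo @ self.h               # linear output layer

net = ElmanCell(n_in=1, n_hidden=4, n_out=1)
y1 = net.step(np.array([1.0]))
y2 = net.step(np.array([1.0]))
print(y1, y2)   # same input, different outputs: the context carries history
```

A static feed-forward network would return identical outputs for identical inputs; the difference between y1 and y2 is exactly the dynamic memory described above.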
Elman networks with different hidden-layer sizes were tested. The learning rate, training
epochs and set-up of the activation functions were all varied to determine the most accurate
Elman model. The following graphs show the process output plotted against the Elman model
outputs (Fig. 45-47).
Table 4: Setup/Results of the Feedforward networks
Fig. 45: Elman network, 1 hidden layer, 4 hidden neurons (Red: NN output, Blue: process output). Fig. 46: Elman network, 1 hidden layer, 10 hidden neurons (Blue: NN output, Red: process output). (Pendulum angle, rad, vs. time.)
Type ANN   Neurons in Hidden Layer   Training Epochs   Learning Rate   MSE
Elman      20                        500               0.0001          0.4086
Elman      10                        500               0.0001          0.2233
Elman      4                         500               0.0001          1.2804
The results indicate that the Elman neural models were not accurate. The maximum
hidden-layer size that could be trained was approximately 20 neurons: when training Elman
networks with more than 20 neurons in the hidden layer, MATLAB crashes, due to the high
memory requirements of training Elman networks in MATLAB. It was expected that Elman
networks would approximate a dynamic process such as the pendulum system better than the
static feed-forward networks. The poor results of the Elman networks are due to the fact that
the training data comes from a closed-loop system.
Fig. 47: Elman network, 1 hidden layer, 20 hidden neurons (Red: NN output, Blue: process output; pendulum angle, rad, vs. time)
Table 5: Setup/Results of the Elman networks
Multi-output identification
All of the neural models developed so far have been single-input single-output systems,
modelling just the pendulum angle output. A neural network is now developed which models
the 1 input and 4 output states. The diagram below shows the nonlinear pendulum with the
control law (Fig. 48). In this example feed-forward networks are used to model the
multi-output system.
The process and the neural model receive the same input. Instead of having just one
target, the neural network has 4 targets to learn, and is trained by presenting the 4 targets
together at each time interval: an array combines the 4 different targets into one target matrix.
Two sizes of feed-forward network (50 and 100 hidden neurons) are used to model the
multi-output process.
tempP = [];   % start with an empty target matrix
for k = 1:200,
    P = [y(k);ydot(k);tetha(k);tethadot(k)];
    tempP = [tempP P];
end
net = newff([-10 10],[50 4],{'tansig' 'purelin'},'trainlm');
net.trainParam.epochs = 500;
net = train(net,in(1:200)',tempP);
Fig. 48: The nonlinear pendulum/controller is used for training the ANN
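The target assembly in the loop above can be sketched in Python (NumPy; the four state arrays are random stand-ins for the logged signals y, ydot, tetha, tethadot): each column of the target matrix holds the full state at one time step, so a single network learns all four outputs at once.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
# Random stand-ins for the four logged state signals
y, ydot, tetha, tethadot = rng.standard_normal((4, N))

# Column k of the target matrix is the full state at step k,
# mirroring P = [y(k);ydot(k);tetha(k);tethadot(k)] in the MATLAB loop
targets = np.vstack([y, ydot, tetha, tethadot])

print(targets.shape)   # (4, 200): 4 targets per time step
```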
The neural network is trained in MATLAB and, when the training is over, the neural network is
generated in Simulink. The quality of the neural model is checked by comparing the 4 outputs
from the neural network to the 4 outputs of the process.
The following diagrams (Fig. 50-53) show the response from the process and model outputs
for each state. The blue signal is the process output. The neural model with 100 neurons in the
hidden layer is consistently more accurate than the neural model with 50 hidden neurons.
The neural models do a good job of modelling two of the output states; however, both neural
networks fail to model the velocity of the cart and the angular velocity of the pendulum
accurately.
Fig. 49: The outputs from the ANN and the process are compared
Fig. 50: Displacement of the cart, m (Blue: real output, Red: neural model 50, Black: neural model 100)
Fig. 51: Velocity of the cart, m/s (Blue: process output, Red: neural model 50, Black: neural model 100)
Fig. 52: Angle of the pendulum, rad (Blue: process output, Red: neural model 50, Black: neural model 100)
Fig. 53: Angular velocity of the pendulum, rad/sec (Blue: process output, Red: neural model 50, Black: neural model 100)
System identification techniques have been applied to the inverted pendulum system. The
results from linear identification indicate that, using closed-loop techniques, it is possible to
generate accurate transfer-block models. Box-Jenkins and Output-Error models were developed
which have similar dynamics to the pendulum process and a low mean squared error.
Neural networks were utilised to model the nonlinear pendulum. Before the nonlinear
pendulum is identified, the process is stabilised using feedback control. Because feedback
control removes some of the non-linearities of the process, de-tuned controllers were used,
which allow the process to exhibit more of its dynamics. This improved the quality of the data
used in the system identification. Different sizes of static feed-forward networks were trained
using the input/output data from the nonlinear pendulum, and the FF networks were generated
in Simulink for testing. The FF networks showed similar dynamics to the pendulum model and
a low MSE.
Recurrent neural networks were also trained to identify the inverted pendulum. Recurrent
networks have built-in feedback loops which should enable them to model dynamic systems
more accurately than static feed-forward networks. In practice, this was not the case: the
results from the Elman networks were not as accurate as those from the feed-forward networks.
Previous research using recurrent networks has concentrated on identification of open-loop
stable systems. When identifying open-loop unstable systems, feedback control must stabilise
the system. Unfortunately this also removes some of the dynamics of the process and makes
accurate identification more difficult. This is one of the main reasons why the results from the
Elman networks are not as accurate.
5 Neural control of the inverted pendulum
The main task of this project is to design a controller which keeps the pendulum system
inverted. There are a few important points to remember when designing a controller for the
inverted pendulum: the inverted pendulum is open-loop unstable, non-linear and a
multi-output system. To show the advantages of using neural control in this project, a
comparison between standard PID control and neuro-control is made.
Nonlinear system: standard linear PID controllers cannot be used for this system because they
cannot map the complex nonlinearities in the pendulum process. ANNs have shown that they
are capable of identifying complex nonlinear systems, so they should be well suited to
generating the complex internal mapping from inputs to control actions.
Multi-output system: the inverted pendulum has four outputs, so in order to have full-state
feedback control four PID controllers would have to be used. Neural networks have a big
advantage here due to their parallel nature: one ANN can be used instead of four PIDs.
Open-loop unstable: the inverted pendulum is open-loop unstable; as soon as the system is
simulated, the pendulum falls over. Neural networks take time to train, so the pendulum system
has to be stabilised somehow before a neural network can be trained.
Before the actual neuro-controller is developed in MATLAB, the main types of neuro-control
are discussed. The five types of neural-network control methods that have been researched are
supervised control, model reference control, direct inverse control, internal model control and
unsupervised control.
Supervised Control:
It is possible to teach a neural network the correct actions by using an existing controller or
human feedback. This type of control is called supervised learning. But why copy an existing
controller that already does the job? Most traditional controllers (feedback linearisation,
rule-based control) are designed around an operating point, which means the controller
operates correctly only while the plant/process stays around that point. Such controllers will
fail if there is any sort of uncertainty or change in the plant. The advantage of neuro-control
is that if an uncertainty in the plant occurs, the ANN can adapt its parameters and continue
controlling the plant where other robust controllers would fail. In supervised control, a teacher
provides correct actions for the neural network to learn (Fig. 54). In offline training the targets
are provided by an existing controller, and the neural network adjusts its weights until the
output from the ANN is similar to that of the controller [18]. When the neural network is
trained, it is placed in the feedback loop. Because the ANN is trained using the existing
controller's targets, it should be able to control the process.
Fig. 54: Supervised learning using an existing controller
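Copying a teacher in this way is ordinary supervised regression on (state, action) pairs. In the Python sketch below the "existing controller" is an invented linear state-feedback law u = -Kx standing in for the report's feedback-linearising control law, and the learner is a least-squares fit rather than a neural network; the training signal, the teacher's own action, is the same in both cases:

```python
import numpy as np

rng = np.random.default_rng(0)
K = np.array([2.0, 3.0, 25.0, 4.0])     # invented teacher gains, 4 states

# Log (state, action) pairs produced by the existing controller
states = rng.standard_normal((500, 4))  # y, ydot, tetha, tethadot samples
actions = states @ (-K)                 # teacher: u = -K x

# Supervised learner: fit states -> teacher actions by least squares
W, *_ = np.linalg.lstsq(states, actions, rcond=None)

test_state = np.array([0.1, 0.0, 0.05, -0.02])
print(test_state @ W, test_state @ (-K))   # learner reproduces the teacher
```

A neural network replaces the linear fit when the teacher is nonlinear, but the structure of the training problem is identical.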
At this stage, there is an ANN which controls the process much like the existing controller. The
real advantage of neuro-control is the ability to adapt online (Fig. 55). An error signal
(desired signal minus real output signal) is calculated and used to adjust the weights online.
If a large disturbance or uncertainty occurs in the process, the large error signal is fed back into
the ANN, which adjusts the weights so that the system remains stable.
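The simplest concrete version of this online adjustment is the LMS (Adaline) rule: at every sample the weights are nudged in proportion to the current error. In the sketch below (Python; the "plant" is just a fixed linear map whose coefficients jump mid-run, an invented stand-in for a disturbance) the error collapses, jumps when the plant changes, then collapses again as the weights re-adapt:

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(2)                   # adaptive weights
lr = 0.1
true_w = np.array([1.0, -0.5])    # invented plant coefficients
errors = []

for k in range(400):
    if k == 200:
        true_w = np.array([1.5, 0.2])   # the "disturbance": plant changes
    x = rng.standard_normal(2)
    target = true_w @ x
    e = target - w @ x
    w += lr * e * x               # LMS online weight update
    errors.append(abs(e))

# Converged before the change, a burst of error at it, converged again after
print(np.mean(errors[190:200]), max(errors[200:205]), np.mean(errors[-10:]))
```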
Model Reference Control
In the diagram above (Fig. 55) the error signal is generated by subtracting the output signal
from the desired system response. In model reference control the desired closed-loop response
is specified through a stable reference model (Fig. 56) [19]. The control system attempts to
make the process output similar to the reference-model output.
Fig. 55: Adaptive neural control
Fig. 56: Model reference control
Direct Inverse Control:
The next control technique researched is direct inverse control. The advantage of using
inverse control over supervised control is that inverse control does not require an existing
controller for training. Inverse control utilises the inverse of the system model. The diagram
below (Fig. 57) is a simple example of direct inverse control: a neural network is trained to
model the inverse of the process [20]. When the inverse controller is cascaded with the
process, the output of the combined system is equal to the setpoint, because the inverse
nonlinearities in the controller cancel out the nonlinearities in the process. For the
nonlinearities to be effectively cancelled, the inverse model must be very accurate.
Setpoint → Gc(s) → Gp(s) → Output, where the controller is the inverse of the model: Gc(s) = [Gmodel(s)]^(-1)
Inverse modelling is used to generate the inverse of the process. The system output is used as
the input to the network; the ANN output is compared with the training signal (the system
input) and this error signal is used to train the network (Fig. 58). This training method
forces the neural network to represent the inverse of the system.
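For a static, invertible process the idea can be demonstrated end to end. In the Python sketch below the "process" is an invented monotonic nonlinearity y = tanh(u), not the pendulum, and a polynomial regression stands in for the neural network; the learner is trained from process outputs back to process inputs, then cascaded with the process so that output ≈ setpoint:

```python
import numpy as np

def process(u):
    return np.tanh(u)                # invented static, invertible plant

# Inverse modelling: learner inputs are process OUTPUTS,
# learner targets are the process INPUTS that produced them
u_train = np.linspace(-1.5, 1.5, 400)
y_train = process(u_train)
inverse_model = np.poly1d(np.polyfit(y_train, u_train, deg=9))

setpoint = 0.5
u_cmd = inverse_model(setpoint)      # inverse controller computes the input
output = process(u_cmd)              # cascade: inverse model then process
print(setpoint, output)              # output is close to the setpoint
```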
Fig. 57: Direct inverse control
Fig. 58: Inverse modelling of a process
There are certain problems associated with direct inverse control. In the case of the inverted
pendulum, the process may not be invertible: the inverted pendulum is open-loop unstable,
so open-loop training data would not show the dynamics of the system, as the pendulum falls
over quickly. There can also be process-model mismatches, and training an ANN as an inverse
model might leave the model not strictly proper. These problems appear as unknown
disturbances in the system.
Internal model control:
Internal model control (IMC) is based on direct inverse control. The problems associated with
direct inverse control, such as process-model mismatch, are reduced using IMC. The diagram
below shows the set-up of IMC (Fig. 59) [21].
A neural network model is placed in parallel with the real system, and the controller is an
inverse model of the process. The filter makes the system robust to process-model mismatch.
With the IMC scheme, the aim is to eliminate the unknown disturbance affecting the system:
the difference d(s) between the process and the model is determined, and if the ANN model is
a good approximation of the process then d(s) is equal to the unknown disturbance. The signal
d(s) is exactly the information that is missing from the NN model and can be used to improve
the control: it is subtracted from the input setpoint Uin. In theory, using this method it is
possible to achieve perfect control.
Fig. 59: Diagram of Internal model control
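The disturbance-cancelling mechanism can be shown with a deliberately simple static example (Python; the gain of 2 and the disturbance of 0.3 are invented numbers). The internal model is a perfect copy of the plant apart from an additive disturbance; feeding back d = process - model and subtracting it from the setpoint removes the offset:

```python
def plant(u, disturbance):
    return 2.0 * u + disturbance     # invented process: gain 2 plus disturbance

def model(u):
    return 2.0 * u                   # internal model, no disturbance term

def inverse_controller(r):
    return r / 2.0                   # exact inverse of the model

setpoint, disturbance = 1.0, 0.3

# Without the IMC correction the disturbance passes straight to the output
u = inverse_controller(setpoint)
y_plain = plant(u, disturbance)          # 1.3: offset by the disturbance

# IMC: d = process - model recovers the disturbance, subtract it from Uin
d = plant(u, disturbance) - model(u)     # d = 0.3
u = inverse_controller(setpoint - d)
y_imc = plant(u, disturbance)            # 1.0: back on the setpoint
```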
Unsupervised Control:
The previous neural control methods are all trained using a priori knowledge, such as
an explicit teacher providing correct actions. In the unsupervised learning set-up, no existing
controller can be imitated and the ANN has no target to compare to its output. The ANN must
try different actions and determine which of them produce a good outcome, and learning
through long periods with no performance feedback is difficult.
Anderson et al. [22,23] developed an unsupervised controller for an inverted pendulum system.
Modifications to this controller are driven by a failure signal, which occurs when the pole falls
past a certain angle or when the cart reaches the end of the track. A long period of the pole
being inverted can pass before the failure signal occurs, and the controller must then decide
which actions in the sequence contributed to the failure. The graph below shows the results of
the unsupervised controller (Fig. 60).
It takes over 5000 failures until the total time the pendulum stays inverted increases above
1000. The research by Anderson et al. shows that the learning time in unsupervised control is
very high, but the unsupervised ANN can deal with uncertainty and the complexities of
nonlinear control.
Fig. 60: Graph of Unsupervised controller
To determine the type of neural control to be used, the pros and cons of each of the control
methods were weighed up specifically in relation to the pendulum system. To develop a
supervised neural controller for the inverted pendulum, an existing controller is required. A
nonlinear controller has already been developed for the inverted pendulum using feedback
linearisation, and this controller could be used as a teacher. The main disadvantage of this
method is that the neural controller is based on a control law which can only effectively control
the inverted pendulum model developed earlier. If this controller were applied to a more
complex pendulum model (e.g. one including friction), it would fail to keep the pendulum
stable.
Inverse control and internal model control are both based on developing an accurate inverse
model of the inverted pendulum system. The problem with this approach is that the pendulum
is open-loop unstable: to develop an inverse model, the pendulum must be stabilised using a
controller, and when a feedback controller is used the inverse model absorbs some of the
dynamics of the controller. An inverse model developed using a feedback controller would
never be accurate enough to be used in direct inverse control. This type of neural control is
better suited to robotic applications and control of open-loop stable systems.
The unsupervised control of the inverted pendulum developed by Anderson is the only neural
control method that does not require some sort of existing controller for training. The results
using unsupervised learning are promising and show that it is possible for the controller to
'learn' to keep the pendulum upright. Realistically, though, the unsupervised method covered
by Anderson is too complex for the project time frame.
Neural-control in Simulink:
The first type of neural control developed uses supervised learning. There is an existing
feedback controller for the nonlinear pendulum, and a feed-forward neural network will be
trained to imitate it. The neural controller is developed similarly to the identification methods
covered earlier. Below is a diagram (Fig. 61) of the nonlinear pendulum model and the control
law. The four input signals (y, ydot, tetha, tethadot) and the target output (the output from the
controller) are exported to the MATLAB workspace. The following MATLAB code trains the
neural network. The first section of code builds the array that combines the 4 different inputs
into one input matrix. The FF network has 50 neurons in the hidden layer; the activation
functions in the hidden layer are tan-sigmoid and the output layer is a linear function.

tempP = [];   % start with an empty input matrix
for k = 1:500,
    P = [y(k);ydot(k);tetha(k);tethadot(k)];
    tempP = [tempP P];
end
net = newff([-2 2;-2 2;-2 2;-2 2],[50 1],{'tansig','purelin'},'trainlm');
net.trainParam.epochs = 500;
net = train(net,tempP,out(1:500)');
Fig. 61: Supervised learning using the control law.
When the training is finished, the weights are fixed and a Simulink ANN is generated. The
network is placed in the feedback loop in place of the existing controller (Fig. 62).
The plot on the left (Fig. 63) shows the squared error between the neural network and the
original controller. The error between the ANN and the controller is of the order of 10^-7, so
the network is an accurate approximation of the controller. The diagram on the right (Fig. 64)
is a plot of the pendulum angle with the above system.
The ANN above (Fig. 62) was developed using the Neural Network Toolbox. This toolbox
allows the weights to be adjusted in the MATLAB environment, but once a Simulink neural
network is created the weights are fixed and cannot be adjusted, so online learning is not
possible.
Fig. 62: The original controller is replaced by the neural network
Fig. 63: Squared error between the original controller and the ANN, vs. time. Fig. 64: Closed-loop response with neural control (pendulum angle, rad, vs. time)
This is not a problem if the parameters of the pendulum system are fixed and there is no
disturbance to the system. The ANN cannot adapt its weights if a disturbance or uncertainty
occurs. This is shown below (Fig.65). Using the set-up below, the mass of the cart is going to
be changed from 1.2 kg to 1.28 kg midway through the simulation using the switch.
The plot below shows the angle of the pendulum. (Fig. 66) The disturbance to the system is
added at 200 time units (approx). As the disturbance occurs the pendulum goes unstable. This
is because the ANN cannot adapt its weights using an error signal to counteract the
disturbance.
Fig. 65: Introducing a disturbance to the system
Fig. 66: Plot of the pendulum angle (rad)
This problem was solved by using an adaptive neural toolbox, an add-on for simulink [24].
This toolbox allows online neural learning to occur. The block diagram of the toolbox is
shown below (Fig.67). The toolbox contains Adaline and MLP (multi-layered perceptron)
simulink blocks.
All of the blocks have the same interface, so it's possible to try out many different networks
quickly and easily.
There is an interface for each block so that the user can set the network parameters such as
the learning rate, the number of neurons in each layer, etc.
The inputs to each block are:
x: The input vector to the neural network.
e: The error between the real output and the network approximation.
LE: A logic signal that enables or disables the learning.
The outputs of each block are:
Ys: The value of the approximated function.
X: All the “states” of the network, namely the weights and all the
parameters that change during the learning process.
Fig. 67: Simulink toolbox of adaptive neural networks
The first type of adaptive network to be used is the adaline. The adaline is used to approximate
'almost linear' functions. The adaline will be trained offline using the original feedback
controller. The diagram below (Fig.68) shows the simulink set-up of the adaptive offline
learning. The error signal is equal to the output of the controller minus the output of the
network. This signal is fed back into the adaline. The weights in the adaline are updated by
steepest-descent gradient learning, which minimises the squared error between measurements
and estimates. The learning rate of the adaline is set to 0.01 and the sample time to 0.05.
Fig. 68: Adaptive network trained using the original controller
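The offline training loop above can be sketched in Python with NumPy. The block below mirrors the toolbox's x/e/LE interface, but the linear "teacher" controller, its gains, and the block internals are illustrative assumptions, not the toolbox code or the rig's tunings.

```python
import numpy as np

class AdalineBlock:
    """Minimal stand-in for the toolbox block interface: inputs are the
    network input x, the error e, and a learning-enable flag LE; outputs
    are the approximation Ys and the internal state X (the weights)."""
    def __init__(self, n_inputs, learning_rate=0.01):
        self.w = np.zeros(n_inputs)
        self.lr = learning_rate

    def step(self, x, e, LE=True):
        Ys = self.w @ x                 # linear (adaline) output
        if LE:                          # adapt only while learning is enabled
            self.w += self.lr * e * x   # steepest-descent (LMS) update
        return Ys, self.w.copy()

# Offline training as in Fig. 68: the error fed to the block is the
# controller output minus the network output. The feedback gains k are
# hypothetical, standing in for the original control law.
rng = np.random.default_rng(0)
k = np.array([-1.0, -1.6, 18.7, 3.5])
block = AdalineBlock(4, learning_rate=0.01)
for _ in range(20000):
    x = rng.uniform(-1.0, 1.0, 4)
    u = k @ x                                  # teacher controller output
    block.step(x, e=u - block.w @ x)           # error drives the LMS update
```

As the squared error is driven towards zero, the weights converge on the teacher's gains, which is the behaviour seen in Figs. 69 and 70.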
Figure 69 shows the four weight values of the adaline as the training progresses. Figure 70
shows the error between the controller output and the ANN output. As the error goes to zero,
the network weights converge on their final values.
When the error converges to zero, the network is trained. The network can be now placed in
the feedback loop instead of the original controller.
Fig. 69: Plot of the network weights
Fig. 70: Plot of the error signal
The diagram below (Fig.71) is the simulink set-up of the adaptive neural controller. The
weights trained offline are now used as the initial weights of the adaline. The adaline network
has an input error signal equal to the desired pendulum angle minus the actual pendulum
angle. The desired pendulum angle is produced by the stable linear pendulum model block.
This error signal is fed into the adaline, which adapts the weights online. This improves the
performance of the network.
Figure 72 shows the pendulum angle from the simulation. The results indicate that the neural
controller keeps the pendulum angle stable.
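One adaptation step of this scheme can be sketched as follows; this is a simplified steepest-descent update driven by the model-following error, an assumption about the block internals rather than a reproduction of the toolbox code.

```python
import numpy as np

def mrac_step(w, x, theta_ref, theta, lr=0.01):
    """One MRAC-style adaptation step (sketch): the error between the
    reference-model angle and the actual pendulum angle drives a
    steepest-descent update of the adaline weights."""
    e = theta_ref - theta            # model-following error
    u = float(w @ x)                 # adaline control output
    w_new = w + lr * e * x           # online weight update
    return u, w_new

# When the pendulum tracks the reference model exactly (e = 0),
# the weights are left untouched:
w0 = np.array([1.0, 2.0, 3.0, 4.0])
x = np.array([0.1, -0.2, 0.05, 0.0])
u, w1 = mrac_step(w0, x, theta_ref=0.0, theta=0.0)
```

Any mismatch between the reference angle and the measured angle produces a non-zero update, which is how the controller copes with disturbances later in the chapter.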
Fig. 71: Adaptive online neural controller using MRAC
Fig. 72: Pendulum angle (rad)
The neural controller developed above shows that a neural network can be trained offline
using another controller as a teacher. The neural controller can then be placed online, where
it will continuously update its weights. The advantage of using adaptive control can be shown
if a disturbance occurs during operation. Using the set-up below, the mass of the cart is going
to be changed from 1.2 kg to 1.28 kg midway through the simulation using the switch.
Figure 74 shows the pendulum angle during the simulation. The pendulum angle oscillates
until 2000 time units. When the mass is changed, the oscillations increase to 0.1 radians. The
error signal increases, which adjusts the weights, and the pendulum angle returns to its
normal oscillation.
Fig.73: Introducing a disturbance to the system
All of the previous work on adaptive neural control has used the Adaline network. It has been
proven that an MLP has greater accuracy than the Adaline in approximating nonlinear
functions. The reason the adaptive research used the Adaline is the failure of the MLP in
controlling the inverted pendulum. The MLP was trained offline in the same manner as the
Adaline to model the control law. The error between the MLP and the control law was of the
order of 10^-3. The MLP was placed in the feedback loop instead of the existing controller.
During simulations, the inverted pendulum goes unstable every time. Since the MLP was
trained to model the control law, there is no obvious reason why it should fail to control the
inverted pendulum.
Fig. 74: Effect of the disturbance on the pendulum angle (pendulum angle, rad, vs. time)
6 Real-time identification and control
The previous research covered in identification and control has been based on a nonlinear
pendulum model. The dynamics of this model might be similar to the real system, but
implementing identification and control is much more complicated on the real system. This
chapter discusses the practical work on the inverted pendulum rig.
The pendulum rig consists of a simple cart which runs along a track. The cart is restricted to
travelling in the track axis. The position of the cart is controlled by a DC motor and drive belt.
A pole with mass on the end is pivoted on the cart and is free to swing in the same axis. The
outputs from this system are the position of the cart along the track and the angle of the
pendulum. These are both measured using optical encoder sensors. The two output signals are
sent to a control algorithm in matlab via a data acquisition card. The control algorithm
determines a control action to keep the pendulum inverted. A DC signal controls the speed and
direction of the motor, which determines the position of the cart. Figure 75 shows the digital
pendulum system.
Fig. 75: Setup of the digital pendulum system.
At this stage there is a system that measures the position of the cart and the angle of the
pendulum. There is also an interface which makes it possible to control the position of the
cart. The next important part of the pendulum system is the control algorithm in matlab. The
diagram below shows the real time kernel (RTK) in the matlab environment. The RTK is an
encapsulated block which covers all the control tasks. The input to the RTK block is the
desired cart position. The output of the real time task is a vector which contains the pendulum
angle, angular velocity, cart position, cart velocity and the control value for the DC drive.
There is no feedback control loop because the controller is embedded in the RTK. Two PID
controllers are utilized to stabilize the inverted pendulum. The first PID controls the angular
position of the pendulum. The second is used to control the position of the cart. The outputs of
the PID controllers are added to produce the final DC control signal. Figure 77 shows the
structure of the PID controllers.
Fig. 76: Real time task in simulink environment
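The two-loop structure of Fig. 77 can be sketched as follows; the gains, sample time, and sign conventions here are illustrative assumptions, not the rig's actual tunings.

```python
class PID:
    """Textbook discrete PID; dt is the sample time in seconds."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, err):
        self.integral += err * self.dt              # accumulate integral term
        deriv = (err - self.prev_err) / self.dt     # backward-difference derivative
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

# One PID regulates the pendulum angle, the other the cart position;
# their outputs are summed into the single DC drive signal.
angle_pid = PID(kp=20.0, ki=1.0, kd=2.0, dt=0.05)   # hypothetical gains
pos_pid = PID(kp=1.0, ki=0.1, kd=0.5, dt=0.05)      # hypothetical gains

def dc_drive(angle_err, position_err):
    return angle_pid.step(angle_err) + pos_pid.step(position_err)
```

Summing the two loops into one actuator signal is what lets a single DC drive regulate both the angle and the cart position at once.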
Previously in the report it has been stated that it is not possible to control the nonlinear
pendulum using PID control. In this case, it is possible for the RTK to utilize PID control
because the pendulum is placed in the upright position in the linearised region before the
experiment starts. During the experiment the inverted pendulum can sometimes swing past the
linearised region and fall over. If the pendulum falls down during the experiment, it has to be
set upright manually. Figure 78 shows the pendulum angle during the experiment. It is clear that
the PID control stabilizes the inverted pendulum.
Fig. 77: Structure of the PID controller
Fig. 78: Plot of the pendulum angle (rad vs. time)
The next stage is to develop a neural network which identifies the pendulum system online. A
multi-layered perceptron (MLP) is used to model the angle of the pendulum, θ, and the
position of the cart, y. Figure 79 shows the setup of the identification process. The PID
controller is used to stabilize the inverted pendulum; closed loop identification is necessary
for open loop unstable systems. The control signal applied to the pendulum system is also
input to the ANN. The error signals, e_θ and e_y, between the process outputs and the ANN
outputs are backpropagated to adjust the weights of the MLP.
Figure 80 shows the simulink setup of the real time identification. A multi-layered perceptron
with 100 neurons in the hidden layer and a learning rate of 0.05 is utilized. The input into the
MLP is the control signal from the PID controller. The MLP has 2 outputs modelling the
pendulum angle and the cart position.
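The per-sample update can be sketched in Python with NumPy. The network size, learning rate, and the stand-in static "plant" below are illustrative assumptions (the report's MLP has 100 hidden neurons and identifies the real rig); the point is the online backpropagation of the two output errors.

```python
import numpy as np

rng = np.random.default_rng(1)

H = 20                                   # hidden neurons (illustrative)
W1 = rng.normal(0, 0.3, (H, 1)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.3, (2, H)); b2 = np.zeros(2)
lr = 0.05

def plant(u):
    # hypothetical static map from control signal to (angle, position)
    return np.array([0.3 * u, -0.5 * u])

errs = []
for _ in range(5000):
    u = rng.uniform(-1.0, 1.0)           # control signal sample
    target = plant(u)                    # measured (theta, y)
    h = np.tanh(W1[:, 0] * u + b1)       # hidden activations
    yhat = W2 @ h + b2                   # network estimates of (theta, y)
    e = target - yhat                    # error signals e_theta, e_y
    errs.append(float(np.abs(e).sum()))
    dh = (W2.T @ e) * (1.0 - h ** 2)     # error backpropagated to hidden layer
    W2 += lr * np.outer(e, h); b2 += lr * e
    W1[:, 0] += lr * dh * u; b1 += lr * dh
```

Each new sample immediately adjusts the weights, which is what allows the identifier to track the process online rather than in a separate training phase.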
Fig. 79: Setup of the online identification process
Figure 81 shows the plot of the process pendulum angle and the neural network angle. The
blue graph is the inverted pendulum angle and the red is the neural model output. The MLP
shows that it is possible to identify the pendulum angle online.
Fig. 80: Simulink setup of the online identification
Fig. 81: Plot of the pendulum angle (rad vs. time)
Figure 82 shows the plot of the process cart position. The blue graph is the cart position and
the red is the neural model output.
The real time kernel for the inverted pendulum system contains many different types of
controller: PID, a nonlinear control law, etc. It is possible to develop and test prototype
controllers using the external controller function. The external controller is a file containing
the control routine, which is accessed at the interrupt time. The control algorithm (e.g.
neural, fuzzy, adaptive) must be written in C code. To develop a correct external controller,
the input/output architecture and the limits of the signals must be obeyed.
In order to develop an external neural controller for the real time inverted pendulum, it will
have to be written in C. The online neural toolbox used previously in the project contains
multi-layer perceptrons (MLP) which are all written in C. Instead of writing a neural
controller from scratch, an MLP could be adapted to the external controller format.
Before the external controller is developed, a MLP must be trained to control the inverted
pendulum. It is possible to train a neural network to imitate the existing PID controller.
Fig. 82: Plot of the cart position (m vs. time)
Figure 83 shows the training of a neural network. The inputs to the MLP are the pendulum
angle and the cart position. The error signal between the output of the MLP and the PID
control signal is backpropagated to adjust the neural weights.
Figure 84 shows the plot of the real control signal and the output from the MLP. The blue
signal is the PID control and the red is the neural controller output.
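The routine that would be ported to the external controller is just the forward pass of this trained MLP. A Python sketch follows (the real external controller must be written in C, and the weight names and shapes here are assumptions, not the stored values):

```python
import numpy as np

def mlp_control(theta, y, W1, b1, W2, b2):
    """Forward pass of the trained MLP controller: tanh hidden layer,
    linear output. This is the computation that would be rewritten in C
    and called at each interrupt, with the weights taken from the
    offline training."""
    x = np.array([theta, y])           # inputs: pendulum angle, cart position
    h = np.tanh(W1 @ x + b1)           # hidden layer
    return float(W2 @ h + b2)          # control value for the DC drive

# Shape check with placeholder (untrained) weights:
H = 8
u = mlp_control(0.01, -0.02,
                W1=np.zeros((H, 2)), b1=np.zeros(H),
                W2=np.zeros((1, H)), b2=np.zeros(1))
```

Because no training occurs at run time, the C port only needs matrix multiplies and a tanh, which keeps it well within the interrupt-time budget.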
Fig. 83: Training a MLP to imitate the PID controller
Fig. 84: Plot showing the MLP output and PID output (control signal vs. time)
At this stage, there is a trained MLP which imitates the PID controller. The weights in the
MLP are stored. The next step is developing a neural controller in C and adapting it to external
controller format. Unfortunately there was not enough time in the project to implement the
real time neural controller. If a neural controller were implemented in C, the neural weights
from the trained network in Fig. 83 could be transferred to the new network and set as initial
weights. In theory this ANN should be able to control the real time inverted pendulum.
7 Conclusions
Summary
This research has applied artificial neural networks to the identification and control of the
inverted pendulum. Before identification techniques could be tested, a model representing the
inverted pendulum was developed in simulink. Some of the modelling and control techniques
involved in the project are linear so a linearized version of the inverted pendulum was
developed. Open loop identification was initially tested but it was found that the inverted
pendulum is open loop unstable. One of the requirements for accurate identification is
experimental input-output data that shows the dynamics of the system. It was decided that
system identification would be performed in closed-loop so stabilizing feedback controllers
had to be developed for the linear and nonlinear inverted pendulum. A simple full-state
feedback controller stabilized the linear pendulum and a control law was developed to
stabilize the nonlinear pendulum. The closed loop data is stable and the inverted pendulum can
be simulated for longer times so more data can be collected.
Linear identification techniques were applied to the linear pendulum. The aim was to develop
a transfer function block that accurately modelled the inverted pendulum. An accurate model
will have a low MSE in relation to the process and the model will show some of the dynamics
of the process. The four types of model tested are ARX, ARMAX, Box-Jenkins and Output-
Error. It was found that the ARX and ARMAX could not model the inverted pendulum at all.
The Box-Jenkins and Output-Error models had low MSE but did not show any of the
dynamics of the inverted pendulum system. The few journals on closed loop identification
have all indicated that the best linear model structures are Box-Jenkins and Output-Error.
ARX and ARMAX both make assumptions that the noise spectrum models and the input-
output models have the same characteristic dynamics. This explains why the ARX and
ARMAX could not model the linear inverted pendulum. One of the reasons why the Box-
Jenkins and Output-Error models did not show any of the process dynamics is due to the
closed loop identification. Closed loop identification must be used on open-loop unstable
systems such as the inverted pendulum but one of the disadvantages is that feedback
controllers mask some of the dynamics of the system.
A detuned controller was used to control the process. This type of controller keeps the
inverted pendulum barely stable, but more of the process dynamics can be seen. The Box-
Jenkins and Output-Error identification was then repeated with much better results. The main
conclusions from the linear identification are:
1. When trying to model an unstable system a feedback controller must be used to keep the
system stable. If the controller is de-tuned this will allow more of the pendulum dynamics
to be seen. This will make the model more accurate.
2. The Box-Jenkins/Output-Error models are the only structures that can adequately model the
   pendulum using the closed loop data.
3. The best way to test the quality of a model is to construct a transfer block of the model and
simulate it using a different initial input seed.
To achieve a better approximation of the inverted pendulum, the nonlinear system must be
used. The linear identification techniques were applied to the nonlinear pendulum system and
were found to be inadequate in modelling the nonlinear nature of the system. The nonlinear
nature of neural networks gives them an advantage over linear models in the prediction of
non-linear systems. Before the inverted pendulum system is identified, the process is
stabilized using the control law. The control law removes some of the nonlinearities from the
process so a detuned control law was used which allows the process to exhibit more of its
dynamics. This improves the quality of the data used in the system identification.
Initially single-input single-output networks were developed, the input being the control force
and the output the pendulum angle. The first type of neural network to be developed was the
feedforward network. Feedforward networks with a range of hidden layer neurons were tested. The
feedforward networks modelled the inverted pendulum well. The MSE between the process
and the neuron model is low and the model predicts the dynamics of the pendulum angle. In
open-loop identification, increasing the number of hidden layer neurons will have a direct
influence on the accuracy of the model. In the closed loop case, it was found that using a
detuned controller had more of an influence on the model accuracy than increasing the number
of hidden layer neurons.
Recurrent ‘Elman’ networks are the second type of neural networks to be developed. Elman
networks have built in feedback loops which enable them to model dynamic systems such as
the inverted pendulum more accurately than static feedforward networks. Elman networks
with different sizes of hidden layer were tested. The results indicate that Elman networks
were not as accurate as feedforward networks in approximating the inverted pendulum. It was
found that when training the Elman networks to model the inverted pendulum, the training
would get stuck at a local minimum. This affected the accuracy of the models developed. The poor results
of the Elman networks are due to the fact that the training data is from a closed loop system.
The next stage in system identification was to develop a multi-output neural network which
models the four outputs of the inverted pendulum. A feedforward network with 100 hidden
layer neurons was used to model the process. The neural network developed could model the
pendulum angle and the cart position accurately but completely fails to model the velocity of
the cart and angular velocity of the pendulum.
The main task in the project was to design a controller which keeps the pendulum system
inverted. The four main types of neural control (supervised, unsupervised, direct inverse and
internal model control) were researched to determine which control technique would be the
most efficient to implement. The earliest application of neural networks to the inverted
pendulum is by Widrow and Smith [25] and Widrow [26]. They used traditional control
methods to derive a control law to stabilize the linearized system. They then trained a neural
network to mimic the output of the control law. It was decided that supervised control would
be the least complex to implement. It was not possible to develop direct inverse control
because this control method requires that the process to be controlled is already open-loop
stable. The unsupervised control technique developed by Anderson was just too complex for
the project time frame. The first neuro-controller was developed by training a feedforward
network to model the control law. Elman networks were also used here to model the control
law but were not as accurate. When the training was finished, the neural network was exported
into simulink and placed in the feedback loop instead of the existing control law. The neural
network controlled the inverted pendulum similarly to the control law.
An experiment was set-up which creates a disturbance to the process during the simulation.
The neural network lost control of the inverted pendulum because it was unable to adjust its
weights to counteract this disturbance. This problem was solved by using the adaptive neural
toolbox. This toolbox makes it possible for online neural learning to occur. Two types of
neural network were used: the Adaline and the multi-layered perceptron (MLP). The ANN was
trained offline using the control law. The advantage of this type of network is that if a
disturbance occurs during operation, the error signal is fed back into the Adaline, which
adjusts the weights of the network to counteract the disturbance. The Adaline adaptive block
is designed for approximating 'almost linear' functions. It was found that the Adaline could
approximate the control law very accurately.
It was decided to test some of the identification and control techniques on the real time
inverted pendulum rig. The real-time inverted pendulum is also open-loop unstable. The real
time kernel (RTK) uses standard PID controllers to stabilize the system. Online identification
was possible using the adaptive neural toolbox. It was not possible to develop a neural
controller for the real time system but significant progress was made.
Scope for future work
The results from the Elman networks were not as accurate as the feedforward networks. The
dynamic Elman networks should have been more accurate when modelling a dynamic system
such as the inverted pendulum. This could be investigated. When modelling the inverted
pendulum closed loop identification must be used. One of the faults of closed loop
identification is the controller removes some of the dynamics of the process. More research is
needed in developing models from closed loop data.
The neural network controllers developed in the project were all based on the traditional
control law developed. When training an ANN using supervised learning there must be an
existing controller to copy. In order to develop a control law the dynamics of the process must
be known. If it is not possible to develop a control law or the dynamics of the process are not
known then there is no way to train a neural network. A solution to this problem is developing
an unsupervised controller. Unsupervised control does not require an accurate model of the
system dynamics or the system's desired behaviour. The only feedback signal to the controller
is a failure signal when the pendulum falls past a certain angle. The controller must learn
through experience by trying various actions. The work done by Anderson [23] in
unsupervised control gives practical guidelines in developing a controller. The next possible
future research could be on unsupervised control of the inverted pendulum. Supervised control
with neural networks has been done a thousand times now and unsupervised control is a more
difficult but interesting problem.
8 Bibliography
[1] Friedland, Bernard. (1987), Control System design, New York: McGraw-Hill,
pp 30-52
[2] Ljung, L (1987) System Identification-Theory for the user, Prentice Hall
[3] Nechyba and Xu, (1994), “Neural network approach to control system identification
with variable activation functions”, Robotics Institute, Carnegie-Mellon University
[4] Guez, A. and Selinsky, J., "A trainable neuromorphic controller", Journal of Robotic
Systems, Vol 5, No. 4, pp 363-388, 1988.
[5] Davalo, E. and Naïm, P., Neural Networks, Macmillan.
[6] Hunt and Sbarbaro, “Neural Networks for Control System - A Survey”,
Automatica, Vol. 28, 1992, pp. 1083-1112.
[7] Neural Network Toolbox Users Guide, October 1998, The Mathworks Inc.
[8] Pham and Liu, Neural Networks for Identification, Prediction and Control, Springer
[9] Cybenko, G., "Approximation by superpositions of a sigmoidal function", Mathematics
of Control, Signals and Systems, Vol 2, No. 4, pp 303-314, 1989.
[10] Saerens, M. and Soquet, A., "Neural controller based on back-propagation algorithm",
IEE Proceedings-F, Vol. 138, No. 1, pp 55-62, 1991.
[11] Narendra, K.S. and Parthasarathy, K., "Identification and control of dynamical systems
using neural networks", IEEE Transactions on Neural Networks, Vol. 1, No. 1, pp 4-27, 1990.
[12] Ljung, L System Identification-Theory for the user, Prentice Hall
[13] Billings, S.A., “Introduction to nonlinear system analysis and identification”. In K
Godfrey and P Jones, Signal Processing for control, Springer-Verlag, Berlin.
[14] Ljung, L System Identification-Theory for the user, Prentice Hall
[15] Johansson, Rolf – System modelling and identification, Prentice Hall
[16] Snow,W. and Emigholz,K., “Increase model predictive control (MPC) project
efficiency by using a modern identification method.”, ERTC Computing, Paris, France
[17] Hunt and Sbarbaro, “Neural Networks for Control System - A Survey”,
Automatica, Vol. 28, 1992, pp. 1083-1112.
[18] Hagan,M and Demuth,H- Neural Network Design, Boston,PWS, 1996
[19] Marco, P. and Raul, L., "Application of several neurocontrol schemes to a 2 DOF
manipulator".
[20] Magnus Norgaard, Neural Network Design Toolkit,
http://www.iau.dtu.dk/research/control/nnlib/manual.pdf
[21] Hunt and Sbarbaro, “Neural Networks for Control System - A Survey”,
Automatica, Vol. 28, 1992, pp. 1083-1112.
[22] Barto, Sutton and Anderson, “Neuronlike adaptive elements that can solve difficult
learning control problems”, IEEE Trans on Systems, Man and Cybernetics,
Vol SMC-13, pp834-846, Sept-Oct 1983
[23] C.W. Anderson. “Learning to control an inverted pendulum using neural networks”,
IEEE Controls Systems Magazine, 9:31-37, 1989.
[24] Campa, Fravolini, Napolitano- “A library of Adaptive neural networks for control
purposes.” The simulink library can be downloaded from the Mathworks file
exchange website in the ANN section.
http://www.mathworks.com/matlabcentral/fileexchange/
[25] Widrow, B. and Smith,F., “Pattern-Recognising Control Systems,”1963 Computer and
Information Sciences (COINS) Symp. Proc., Washington DC: Spartan, pp288-317,
1964
[26] Widrow, B., “The Original Adaptive Neural Net Broom-Balancer,” Int. Symp. Circuits
and Systems, Vol.5, no.4, pp. 363-388, Aug. 1988.