Feedforward neural network

[Figure: In a feed-forward network, information always moves in one direction; it never goes backwards.]

A feedforward neural network is an artificial neural network in which connections between the units do not form a directed cycle. This distinguishes it from recurrent neural networks.

The feedforward neural network was the first and arguably simplest type of artificial neural network devised. In this network, information moves in only one direction, forward, from the input nodes, through the hidden nodes (if any), and to the output nodes. There are no cycles or loops in the network.

Single-layer perceptron

The simplest kind of neural network is a single-layer perceptron network, which consists of a single layer of output nodes; the inputs are fed directly to the outputs via a series of weights. In this way it can be considered the simplest kind of feedforward network. The sum of the products of the weights and the inputs is calculated in each node, and if the value is above some threshold (typically 0) the neuron fires and takes the activated value (typically 1); otherwise it takes the deactivated value (typically -1). Neurons with this kind of activation function are also called artificial neurons or linear threshold units. In the literature the term perceptron often refers to networks consisting of just one of these units. A similar neuron was described by Warren McCulloch and Walter Pitts in the 1940s.
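
To make the arithmetic concrete, here is a minimal sketch of such a linear threshold unit in Python; the function name and the hand-picked example weights are illustrative assumptions, not from the original article.

    def threshold_unit(inputs, weights, threshold=0.0):
        """Linear threshold unit: output the activated value (+1) if the
        weighted sum of the inputs exceeds the threshold, else -1."""
        weighted_sum = sum(w * x for w, x in zip(weights, inputs))
        return 1 if weighted_sum > threshold else -1

    # Hand-crafted example: logical OR over {-1, +1} inputs.
    print(threshold_unit([-1, -1], [0.5, 0.5], threshold=-0.5))  # -1
    print(threshold_unit([-1, 1], [0.5, 0.5], threshold=-0.5))   # +1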

A perceptron can be created using any values for the activated and deactivated states as long as the threshold value lies between the two. Most perceptrons have outputs of 1 or -1 with a threshold of 0, and there is some evidence that such networks can be trained more quickly than networks created from nodes with different activation and deactivation values.

Perceptrons can be trained by a simple learning algorithm usually called the delta rule. It calculates the error between the calculated output and the sample output data, and uses this to adjust the weights, thus implementing a form of gradient descent.
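
A minimal sketch of this update rule in Python, reusing the threshold_unit sketch above; the learning rate and the bipolar training data are illustrative assumptions.

    def delta_rule_update(weights, inputs, target, lr=0.1):
        """One update step: adjust each weight in proportion to the
        error (target - output) times the corresponding input."""
        output = threshold_unit(inputs, weights)  # unit from the sketch above
        error = target - output
        return [w + lr * error * x for w, x in zip(weights, inputs)]

    # Train toward logical AND over {-1, +1} inputs (illustrative data).
    examples = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
    weights = [0.0, 0.0]
    for _ in range(10):  # a few passes over the data
        for inputs, target in examples:
            weights = delta_rule_update(weights, inputs, target)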

Single-unit perceptrons are only capable of learning linearly separable patterns; in 1969, in a famous monograph entitled Perceptrons, Marvin Minsky and Seymour Papert showed that it was impossible for a single-layer perceptron network to learn an XOR function. It is often believed that they also conjectured (incorrectly) that a similar result would hold for a multi-layer perceptron network. However, this is not true, as both Minsky and Papert already knew that multi-layer perceptrons were capable of producing an XOR function. (See the page on Perceptrons for more information.)


Although a single threshold unit is quite limited in its computational power, it has been shown that networks of parallel threshold units can approximate any continuous function from a compact interval of the real numbers into the interval [-1, 1]. This result can be found in Peter Auer, Harald Burgsteiner and Wolfgang Maass, "A learning rule for very simple universal approximators consisting of a single layer of perceptrons".[1]

A multi-layer neural network can compute a continuous output instead of a step function. A common choice is the so-called logistic function:

$$ y = \frac{1}{1 + e^{-x}} $$

(In general form, $f(X)$ takes the place of $x$, where $f(X)$ is an analytic function of the inputs $X$.) With this choice, the single-layer network is identical to the logistic regression model, widely used in statistical modeling. The logistic function is also known as the sigmoid function. It has a continuous derivative, which allows it to be used in backpropagation. This function is also preferred because its derivative is easily calculated:

$$ \frac{dy}{dx} = y(1 - y) $$

(times $f'(X)$, in general form, according to the chain rule)
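
A short Python sketch of the logistic function and its derivative (the function names are illustrative):

    import math

    def sigmoid(x):
        """Logistic (sigmoid) function: squashes x into (0, 1)."""
        return 1.0 / (1.0 + math.exp(-x))

    def sigmoid_derivative(x):
        """Derivative of the sigmoid, using dy/dx = y * (1 - y)."""
        y = sigmoid(x)
        return y * (1.0 - y)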

Multi-layer perceptron

[Figure: A two-layer neural network capable of calculating XOR. The numbers within the neurons represent each neuron's explicit threshold (which can be factored out so that all neurons have the same threshold, usually 1). The numbers that annotate the arrows represent the weights of the inputs. This net assumes that if the threshold is not reached, zero (not -1) is output. Note that the bottom layer of inputs is not always considered a real neural network layer.]
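
As a concrete reading of the figure, here is a small Python sketch of one standard two-layer XOR construction; the specific weights and thresholds below are an illustrative assumption (one common choice), not necessarily those in the original figure.

    def step(value, threshold):
        """Binary step: 1 if the threshold is reached, else 0."""
        return 1 if value >= threshold else 0

    def xor_net(x1, x2):
        """Two-layer threshold network computing XOR on {0, 1} inputs."""
        h_or = step(x1 + x2, 1)           # hidden unit: fires on OR
        h_and = step(x1 + x2, 2)          # hidden unit: fires on AND
        return step(h_or - 2 * h_and, 1)  # output: OR and not AND

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_net(a, b))  # prints 0, 1, 1, 0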

This class of networks consists of multiple layers of computational units, usually interconnected in a feed-forward way. Each neuron in one layer has directed connections to the neurons of the subsequent layer. In many applications the units of these networks apply a sigmoid function as an activation function.

The universal approximation theorem for neural networks states that every continuous function that maps intervals of real numbers to some output interval of real numbers can be approximated arbitrarily closely by a multi-layer perceptron with just one hidden layer. This result holds only for restricted classes of activation functions, e.g. for the sigmoidal functions.
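
Stated a little more formally (a standard textbook formulation, supplied here for concreteness rather than quoted from the original article): for any continuous function $F$ on a compact interval and any $\varepsilon > 0$, there exist a number of hidden units $N$, weights $w_i, v_i$, and biases $b_i$ such that

$$ \sup_{x} \left| F(x) - \sum_{i=1}^{N} v_i\, \varphi(w_i x + b_i) \right| < \varepsilon, $$

where $\varphi$ is a sigmoidal activation function.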

Multi-layer networks use a variety of learning techniques, the most popular being back-propagation. Here, the output values are compared with the correct answer to compute the value of some predefined error function. By various techniques, the error is then fed back through the network. Using this information, the algorithm adjusts the weights of each connection in order to reduce the value of the error function by some small amount. After repeating this process for a sufficiently large number of training cycles, the network will usually converge to some state where the error of the calculations is small. In this case, one would say that the network has learned a certain target function. To adjust weights properly, one applies a general method for non-linear optimization called gradient descent. For this, the derivative of the error function with respect to the network weights is calculated, and the weights are then changed such that the error decreases (thus going downhill on the surface of the error function). For this reason, back-propagation can only be applied to networks with differentiable activation functions.
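
To ground the procedure, here is a minimal back-propagation sketch in Python for a one-hidden-layer network trained on XOR with squared error; the network size, learning rate, and initialization are illustrative assumptions.

    import math, random

    random.seed(0)

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    H = 3    # hidden units (illustrative choice)
    lr = 0.5 # learning rate (illustrative choice)

    # Hidden weights: H units, each with 2 input weights and a bias.
    w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(H)]
    # Output weights: one per hidden unit, plus a bias.
    w_out = [random.uniform(-1, 1) for _ in range(H + 1)]

    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

    for epoch in range(5000):
        for (x1, x2), target in data:
            # Forward pass.
            h = [sigmoid(w[0] * x1 + w[1] * x2 + w[2]) for w in w_hidden]
            out = sigmoid(sum(wo * hi for wo, hi in zip(w_out, h)) + w_out[H])
            # Backward pass: error signal times the sigmoid derivative.
            d_out = (out - target) * out * (1 - out)
            for i in range(H):
                d_h = d_out * w_out[i] * h[i] * (1 - h[i])
                w_out[i] -= lr * d_out * h[i]
                w_hidden[i][0] -= lr * d_h * x1
                w_hidden[i][1] -= lr * d_h * x2
                w_hidden[i][2] -= lr * d_h
            w_out[H] -= lr * d_out

    for (x1, x2), target in data:
        h = [sigmoid(w[0] * x1 + w[1] * x2 + w[2]) for w in w_hidden]
        out = sigmoid(sum(wo * hi for wo, hi in zip(w_out, h)) + w_out[H])
        print((x1, x2), round(out, 2))  # should move toward 0, 1, 1, 0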


In general, the problem of teaching a network to perform well, even on samples that were not used as training samples, is a quite subtle issue that requires additional techniques. This is especially important for cases where only very limited numbers of training samples are available.[2] The danger is that the network overfits the training data and fails to capture the true statistical process generating the data. Computational learning theory is concerned with training classifiers on a limited amount of data. In the context of neural networks, a simple heuristic called early stopping often ensures that the network will generalize well to examples not in the training set.
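
A minimal sketch of the early-stopping heuristic in Python; train_one_epoch and validation_error are hypothetical helpers standing in for a real training loop, and the patience value is an illustrative assumption.

    def train_with_early_stopping(model, train_one_epoch, validation_error,
                                  patience=10, max_epochs=1000):
        """Stop once validation error has not improved for `patience`
        consecutive epochs (a common early-stopping heuristic)."""
        best_error = float("inf")
        epochs_without_improvement = 0
        for epoch in range(max_epochs):
            train_one_epoch(model)
            error = validation_error(model)
            if error < best_error:
                best_error = error
                epochs_without_improvement = 0
            else:
                epochs_without_improvement += 1
                if epochs_without_improvement >= patience:
                    break  # validation error has stopped improving
        return model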

Other typical problems of the back-propagation algorithm are the speed of convergence and the possibility of ending up in a local minimum of the error function. Today, there are practical solutions that make back-propagation in multi-layer perceptrons the solution of choice for many machine learning tasks.

ADALINE

ADALINE stands for Adaptive Linear Element. It was developed by Professor Bernard Widrow and his graduate student Ted Hoff at Stanford University in 1960. It is based on the McCulloch-Pitts model and consists of a weight, a bias and a summation function.

Operation: its adaptation is defined through a cost function (error metric) of the residual $e = d - o$, where $d$ is the desired output and $o = b + \sum_i w_i x_i$ is the actual output. With the MSE error metric $E = e^2$, the adapted weight and bias become:

$$ w_i \leftarrow w_i + \eta\, e\, x_i \quad \text{and} \quad b \leftarrow b + \eta\, e, $$

where $\eta$ is the learning rate.
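
A minimal least-mean-squares sketch in Python of this update; the function name and default learning rate are illustrative assumptions.

    def adaline_step(weights, bias, inputs, desired, lr=0.01):
        """One LMS step: linear output, residual, proportional update."""
        output = bias + sum(w * x for w, x in zip(weights, inputs))
        error = desired - output
        weights = [w + lr * error * x for w, x in zip(weights, inputs)]
        bias = bias + lr * error
        return weights, bias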

The Adaline has practical applications in the controls area. A single neuron with tap-delayed inputs (the number of inputs is bounded by the lowest frequency present and the Nyquist rate) can be used to determine the higher-order transfer function of a physical system via the bilinear z-transform. This is possible because the Adaline is, functionally, an adaptive FIR filter. Like the single-layer perceptron, ADALINE has a counterpart in statistical modelling, in this case least squares regression.

There is an extension of the Adaline, called the Multiple Adaline (MADALINE), that consists of two or more Adalines serially connected.

References

[1] Auer, Peter; Harald Burgsteiner; Wolfgang Maass (2008). "A learning rule for very simple universal approximators consisting of a single layer of perceptrons" (http://www.igi.tugraz.at/harry/psfiles/biopdelta-07.pdf). Neural Networks 21 (5): 786-795. doi:10.1016/j.neunet.2007.12.036. PMID 18249524.

[2] Roman M. Balabin, Ravilya Z. Safieva, and Ekaterina I. Lomakina (2007). "Comparison of linear and nonlinear calibration models based on near infrared (NIR) spectroscopy data for gasoline properties prediction". Chemometrics and Intelligent Laboratory Systems 88 (2): 183-188. doi:10.1016/j.chemolab.2007.04.006.

External links

• Feedforward neural networks tutorial (http://www.emilstefanov.net/Projects/NeuralNetworks.aspx)
• Feedforward Neural Network: Example (http://wiki.syncleus.com/index.php/DANN:Backprop_Feedforward_Neural_Network)
• Feedforward Neural Networks: An Introduction (http://media.wiley.com/product_data/excerpt/19/04713491/0471349119.pdf)
