Artificial Intelligence: Machine Learning (Unit 4)



    Machine Learning


    Introduction

    What is learning? Learning is any process by which a system improves performance from experience. (Herbert Simon)

    Learning is constructing or modifying representations of what is being experienced. (Ryszard Michalski)


    Why learn?

    Build software agents that can adapt to their users, to other software agents, or to changing environments:
    Personalized news or mail filter
    Personalized tutoring
    Mars robot

    Discover new things or structure that were previously unknown to humans:
    Examples: data mining, scientific discovery

    ML, as a subfield of AI, is concerned with the design and development of algorithms and techniques that allow computers to learn. Simulation of intelligence requires features such as knowledge acquisition, inference, and updating or refinement of the knowledge base. Thus we can sum up by saying that learning is an important aspect of intelligence.


    Types of Learning Methodologies

    Inductive learning: required rules and patterns are extracted from massive data sets.

    Deductive learning: deducing new knowledge from already existing knowledge.


    Applications

    Assign an object/event to one of a given finite set of categories:
    Medical diagnosis
    Credit card applications or transactions
    Fraud detection in e-commerce
    Spam filtering in email
    Recommended books, movies, music
    Financial investments
    Game playing
    Handwritten letters


    Machine-Learning Systems

    Components of a Learning System

    1. Learning component: makes changes or improvements to the system depending on its performance.
    2. Performance element: performs the task of choosing the actions that need to be taken.
    3. Critic: the job of the critic is to inform the learning component about its performance.


    4. Problem generator: suggests problems or actions that would lead to the generation of new examples or experiences.
    5. Sensors and effectors: both these components are external to the system.


    A general model of learning agents [Figure]


    Major Paradigms of Machine Learning

    Rote learning: learning by memorization, e.g. caching. Steps: organization, generalization, stability of knowledge.

    Learning by taking advice: taking high-level, abstract advice and then converting it into rules, e.g. expert systems. Steps: request, interpret, operationalize, integrate, evaluate.

    Learning by parameter adjustment: initially start with some estimate of the correct weight settings, then modify the weights in the program on the basis of accumulated experience. Increase or decrease the weights of features that appear to be good or bad predictors, respectively.

    Learning by macro-operators: similar to rote learning, except that we avoid expensive re-computation by using macro-operators that are learned for subsequent use.

    Learning by analogy: determine a correspondence between two different representations. Identified as CASE-BASED REASONING.

    Supervised and Unsupervised Learning

    Supervised learning: use specific examples to reach general conclusions or extract general rules.
    Classification (concept learning)
    Regression

    Unsupervised learning (clustering): unsupervised identification of natural groups in data.


    1) Neural Network Based Learning

    It is a system loosely modeled on the human brain. The basic computational element (model neuron) is often called a node or unit. It receives input from some other units, or perhaps from an external source. Each input has an associated weight w, which can be modified by the learning methods.


    2) Supervised Concept Learning

    Given a training set of positive and negative examples of a concept, construct a description that will accurately classify whether future examples are positive or negative.

    That is, learn some good estimate of a function f given a training set {(x1, y1), (x2, y2), ..., (xn, yn)} where each yi is either + (positive) or - (negative), or a probability distribution over {+, -}.


    3) Probably Approximately Correct (PAC) Learning

    In the PAC model, we specify two small parameters, ε and δ, and require that with probability at least (1 - δ) a system learns a concept with error at most ε.
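    The slide states only the PAC criterion; for concreteness, the standard sample-complexity bound for a consistent learner over a finite hypothesis space H (a textbook fact, not on the slide) is

    $m \ge \frac{1}{\varepsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$

    training examples, which guarantee that, with probability at least $1-\delta$, every hypothesis consistent with the sample has true error at most $\varepsilon$.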


    4) Reinforcement Learning

    Decision making (robot, chess machine).
    Basic kinds: utility function, action-value function.



    The Inductive Learning Problem

    Extrapolate from a given set of examples to make accurate predictions about future examples.

    Supervised versus unsupervised learning:
    Learn an unknown function f(X) = Y, where X is an input example and Y is the desired output.
    Supervised learning implies we are given a training set of (X, Y) pairs by a teacher.
    Unsupervised learning means we are only given the Xs and some (ultimate) feedback function on our performance.

    Concept learning or classification:
    Given a set of examples of some concept/class/category, determine if a given example is an instance of the concept or not.
    If it is an instance, we call it a positive example.
    If it is not, it is called a negative example.
    Or we can make a probabilistic prediction (e.g., using a Bayes net).


    Inductive Learning Framework

    Raw input data from sensors are typically preprocessed to obtain a feature vector, X, that adequately describes all of the relevant features for classifying examples.

    Each x is a list of (attribute, value) pairs. For example, X = [Person:Sue, EyeColor:Brown, Age:Young, Sex:Female].

    The number of attributes is fixed. Each attribute has a fixed, finite number of possible values (or could be continuous).

    Each example can be interpreted as a point in an n-dimensional feature space, where n is the number of attributes.


    Learning Decision Trees

    Goal: build a decision tree to classify examples as positive or negative instances of a concept, using supervised learning from a training set.

    A decision tree is a tree where:
    each non-leaf node has associated with it an attribute (feature);
    each leaf node has associated with it a classification (+ or -);
    each arc has associated with it one of the possible values of the attribute at the node from which the arc is directed.

    Generalization: allow for >2 classes, e.g., {sell, hold, buy}.

    [Figure: a decision tree splitting on Color (red / green / blue), then on Size (big / small) and Shape (round / square), with + and - leaves.]


    Decision Tree-Induced Partition: Example

    [Figure: a tree splitting on Color (red / green / brown), then on Size (big / small) and Shape (round / square), and the partition of the feature space it induces.]


    Inductive Learning and Bias

    Suppose that we want to learn a function f(x) = y and we are given some sample (x, y) pairs, as in figure (a). There are several hypotheses we could make about this function, e.g. (b), (c), and (d). A preference for one over the others reveals the bias of our learning technique, e.g.:
    prefer piecewise functions
    prefer a smooth function
    prefer a simple function and treat outliers as noise


    Choosing the Best Attribute

    The key problem is choosing which attribute to split a given set of examples on. Some possibilities are:

    Random: select any attribute at random.
    Least-Values: choose the attribute with the smallest number of possible values.
    Most-Values: choose the attribute with the largest number of possible values.
    Max-Gain: choose the attribute that has the largest expected information gain, i.e., the attribute that will result in the smallest expected size of the subtrees rooted at its children.

    The ID3 algorithm uses the Max-Gain method of selecting the best attribute, as in the sketch below.
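    A minimal sketch of Max-Gain selection as ID3 uses it; the entropy and gain formulas are standard, while the toy attributes and labels are invented for illustration:

        import math
        from collections import Counter

        def entropy(labels):
            """Entropy of a list of class labels, in bits."""
            total = len(labels)
            return -sum((c / total) * math.log2(c / total)
                        for c in Counter(labels).values())

        def information_gain(examples, labels, attr):
            """Expected entropy reduction from splitting on `attr`.
            `examples` is a list of dicts: attribute name -> value."""
            remainder = 0.0
            for value in set(ex[attr] for ex in examples):
                subset = [lab for ex, lab in zip(examples, labels) if ex[attr] == value]
                remainder += len(subset) / len(labels) * entropy(subset)
            return entropy(labels) - remainder

        # Max-Gain: split on the attribute with the largest expected gain.
        examples = [{"Color": "red", "Size": "big"}, {"Color": "red", "Size": "small"},
                    {"Color": "green", "Size": "big"}, {"Color": "green", "Size": "small"}]
        labels = ["+", "+", "-", "-"]
        best = max(["Color", "Size"], key=lambda a: information_gain(examples, labels, a))
        print(best)  # -> Color: it separates + from - perfectly, so its gain is 1 bit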


    Deductive Learning

    Working on already existing facts and knowledge, and simply deducing new knowledge from the existing knowledge: if A (assertion) then B (conclusion).

    1) Probability-based learning (Bayesian learning)
    2) Adaptive dynamic learning


    Clustering Algorithms

    Exclusive clustering: K-means
    Overlapping clustering: Fuzzy C-means (FCM)
    Hierarchical clustering
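    A plain sketch of exclusive clustering with K-means (Lloyd's iteration); the points and k are invented for illustration, and fuzzy C-means would instead keep fractional memberships:

        import random

        def kmeans(points, k, iters=20):
            """Alternate between assigning points to the nearest centroid
            and moving each centroid to the mean of its cluster."""
            centroids = random.sample(points, k)
            for _ in range(iters):
                clusters = [[] for _ in range(k)]
                for p in points:
                    i = min(range(k),
                            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                    clusters[i].append(p)
                for i, cl in enumerate(clusters):
                    if cl:  # keep the old centroid if a cluster goes empty
                        centroids[i] = tuple(sum(xs) / len(cl) for xs in zip(*cl))
            return centroids, clusters

        points = [(1, 1), (1.5, 2), (0.5, 1.2), (8, 8), (9, 9), (8.5, 9.5)]
        centroids, clusters = kmeans(points, k=2)
        print(centroids)  # two centers, one near (1, 1.4), one near (8.5, 8.8)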

    Support Vector Machines

    The classifier is a separating hyperplane. The most important training points are support vectors; they define the hyperplane. Quadratic optimization algorithms can identify which training points $\mathbf{x}_i$ are support vectors, i.e., have non-zero Lagrangian multipliers $\alpha_i$. Both in the dual formulation of the problem and in the solution, training points appear only inside inner products:

    Find $\alpha_1, \ldots, \alpha_N$ such that

    $Q(\alpha) = \sum_i \alpha_i - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, \mathbf{x}_i \cdot \mathbf{x}_j$ is maximized, and

    (1) $\sum_i \alpha_i y_i = 0$
    (2) $0 \le \alpha_i \le C$ for all $i$

    $f(\mathbf{x}) = \sum_i \alpha_i y_i \, \mathbf{x}_i \cdot \mathbf{x} + b$
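    A toy check of the dual decision function, with numbers chosen by hand (not from the slides) so the solution is easy to verify: two points straddling the line x1 = 1, both of them support vectors:

        import numpy as np

        X = np.array([[0.0, 0.0], [2.0, 0.0]])   # training points x_i
        y = np.array([-1.0, 1.0])                # labels y_i
        alpha = np.array([0.5, 0.5])             # dual solution; satisfies sum_i alpha_i y_i = 0

        w = (alpha * y) @ X                      # w = sum_i alpha_i y_i x_i
        b = y[1] - w @ X[1]                      # from y_i (w . x_i + b) = 1 on a support vector

        def f(x):
            # f(x) = sign(sum_i alpha_i y_i (x_i . x) + b): only inner products appear
            return np.sign(np.sum(alpha * y * (X @ x)) + b)

        print(w, b)                                               # -> [1. 0.] -1.0
        print(f(np.array([3.0, 1.0])), f(np.array([-1.0, 5.0])))  # -> 1.0 -1.0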


    Linear Classifiers

    f(x, w, b) = sign(w · x - b)

    [Figure: 2-D training data; one marker denotes +1, the other denotes -1.]

    How would you classify this data? Any of these lines would be fine... but which is best?

    (Copyright 2001, 2003, Andrew W. Moore)


    Classifier Margin

    f(x, w, b) = sign(w · x - b)

    Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.


    Maximum Margin

    f(x, w, b) = sign(w · x - b)

    The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (called an LSVM: Linear SVM).

    Support vectors are those datapoints that the margin pushes up against.

    [Figure: the maximum-margin line, with the support vectors lying on the margin boundaries.]


    Estimate the Margin

    What is the distance expression for a point x to the line wx + b = 0?

    $d(\mathbf{x}) = \frac{|\mathbf{x} \cdot \mathbf{w} + b|}{\sqrt{\mathbf{w} \cdot \mathbf{w}}} = \frac{|\mathbf{x} \cdot \mathbf{w} + b|}{\sqrt{\sum_{i=1}^{d} w_i^2}}$
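    The same distance expression as a few lines of NumPy (a sketch; the numbers are made up so the answer is checkable by hand):

        import numpy as np

        def distance_to_hyperplane(x, w, b):
            """d(x) = |w . x + b| / sqrt(sum_i w_i^2)."""
            return abs(w @ x + b) / np.sqrt(w @ w)

        w, b = np.array([3.0, 4.0]), -5.0
        print(distance_to_hyperplane(np.array([0.0, 0.0]), w, b))  # -> 1.0, i.e. |-5| / 5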


    Estimate the Margin

    What is the expression for the margin?

    $\text{margin} = \min_{\mathbf{x} \in D} d(\mathbf{x}) = \min_{\mathbf{x} \in D} \frac{|\mathbf{x} \cdot \mathbf{w} + b|}{\sqrt{\sum_{i=1}^{d} w_i^2}}$


    Maximize the Margin

    $\underset{\mathbf{w},b}{\operatorname{argmax}} \; \text{margin}(\mathbf{w}, b, D) = \underset{\mathbf{w},b}{\operatorname{argmax}} \; \min_{\mathbf{x}_i \in D} d(\mathbf{x}_i) = \underset{\mathbf{w},b}{\operatorname{argmax}} \; \min_{\mathbf{x}_i \in D} \frac{|\mathbf{x}_i \cdot \mathbf{w} + b|}{\sqrt{\sum_{i=1}^{d} w_i^2}}$


    Maximize the Margin

    $\underset{\mathbf{w},b}{\operatorname{argmax}} \; \min_{\mathbf{x}_i \in D} \frac{|\mathbf{x}_i \cdot \mathbf{w} + b|}{\sqrt{\sum_{i=1}^{d} w_i^2}} \quad \text{subject to } \forall \mathbf{x}_i \in D : \; y_i(\mathbf{x}_i \cdot \mathbf{w} + b) \ge 0$


    Maximize the Margin

    Strategy: require $\forall \mathbf{x}_i \in D : |\mathbf{x}_i \cdot \mathbf{w} + b| \ge 1$. Then the problem

    $\underset{\mathbf{w},b}{\operatorname{argmax}} \; \min_{\mathbf{x}_i \in D} \frac{|\mathbf{x}_i \cdot \mathbf{w} + b|}{\sqrt{\sum_{i=1}^{d} w_i^2}} \quad \text{subject to } \forall \mathbf{x}_i \in D : \; y_i(\mathbf{x}_i \cdot \mathbf{w} + b) \ge 0$

    becomes

    $\underset{\mathbf{w},b}{\operatorname{argmin}} \; \sum_{i=1}^{d} w_i^2 \quad \text{subject to } \forall \mathbf{x}_i \in D : \; y_i(\mathbf{x}_i \cdot \mathbf{w} + b) \ge 1$


    Maximum Margin Linear Classifier

    $(\mathbf{w}^*, b^*) = \underset{\mathbf{w},b}{\operatorname{argmin}} \; \sum_{k=1}^{d} w_k^2$

    subject to

    $y_1(\mathbf{w} \cdot \mathbf{x}_1 + b) \ge 1$
    $y_2(\mathbf{w} \cdot \mathbf{x}_2 + b) \ge 1$
    ....
    $y_N(\mathbf{w} \cdot \mathbf{x}_N + b) \ge 1$

    How to solve such a convex optimization problem?
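    In practice one rarely codes the QP solver by hand. Assuming scikit-learn is available (the slides do not name a solver), a linear SVC with a large C approximates this hard-margin problem:

        import numpy as np
        from sklearn.svm import SVC

        X = np.array([[0.0, 0.0], [0.5, 1.0], [2.0, 0.0], [2.5, 1.0]])
        y = np.array([-1, -1, 1, 1])

        clf = SVC(kernel="linear", C=1e6)    # large C: almost no slack allowed
        clf.fit(X, y)
        print(clf.support_vectors_)          # the training points that define the hyperplane
        print(clf.coef_, clf.intercept_)     # w and b of the separating hyperplane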


    Lagrange Multiplier Method

    The new objective function is called the Lagrangian for the optimization problem:

    $L_p = \tfrac{1}{2}\lVert \mathbf{w} \rVert^2 - \sum_i \alpha_i \big( y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \big)$  ----(1)

    where the $\alpha_i$ are the Lagrange multipliers. Partially differentiating $L_p$ w.r.t. w and b, we get

    $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$ and $\sum_i \alpha_i y_i = 0$  ----(2)

    Because the Lagrange multipliers are ...


    The problem can be handled only when

    $\alpha_i \ge 0$ and $\alpha_i \big[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \big] = 0$

    These are known as the Karush-Kuhn-Tucker (KKT) conditions. From the above equation, b can be calculated. Substituting the values from eqn. (2) in eqn. (1), we get ...


    Linear SVM [Figure]

    Support Vector Machine (SVM) for ... [Figure]

    SVM Kernel Functions

    K(a, b) = (a · b + 1)^d is an example of an SVM kernel function.

    Beyond polynomials, there are other very high-dimensional basis functions that can be made practical by finding the right kernel function.

    Radial-basis-style kernel function:
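    The transcript loses the radial-basis formula with the figure; the standard Gaussian form is assumed below, next to the polynomial kernel from the slide:

        import numpy as np

        def poly_kernel(a, b, d=2):
            """Polynomial kernel K(a, b) = (a . b + 1)^d."""
            return (a @ b + 1.0) ** d

        def rbf_kernel(a, b, sigma=1.0):
            """Radial-basis-style kernel (assumed Gaussian form):
            K(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
            diff = a - b
            return np.exp(-(diff @ diff) / (2.0 * sigma ** 2))

        a, b = np.array([1.0, 2.0]), np.array([2.0, 0.0])
        print(poly_kernel(a, b), rbf_kernel(a, b))  # -> 9.0 and exp(-2.5)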


    Kernel Tricks

    Replacing the dot product with a kernel function K(a, b) = φ(a) · φ(b).

    Could K(a, b) = (a - b)^3 be a kernel function?
    Could K(a, b) = (a - b)^4 - (a + b)^2 be a kernel function?


    An Artificial ... [Figure]


    Bias of a ... [Figure]


    Step function [Figure]


    Ramp function [Figure]


    Sigmoid function [Figure: logistic curve; see http://en.wikipedia.org/wiki/Image:Logistic-curve.png]

    The Gaussian function is the probability density function of the normal distribution, sometimes also called the frequency curve.
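    The four activation functions above, in one runnable sketch (the breakpoints of the step and ramp are fixed by the lost figures, so the values of a and b here are assumptions):

        import math

        def step(v):                          # jumps from 0 to 1 at the threshold
            return 1.0 if v >= 0 else 0.0

        def ramp(v, a=-1.0, b=1.0):           # linear between a and b, clipped outside
            return max(0.0, min(1.0, (v - a) / (b - a)))

        def sigmoid(v):                       # logistic curve, output in (0, 1)
            return 1.0 / (1.0 + math.exp(-v))

        def gaussian(v, mu=0.0, sigma=1.0):   # density of the normal distribution
            return math.exp(-((v - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

        for v in (-2.0, 0.0, 2.0):
            print(step(v), ramp(v), sigmoid(v), gaussian(v))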



    Perceptron: Neuron Model (a special form of single-layer feed-forward network)

    The perceptron was first proposed by Rosenblatt (1958). It is a simple neuron that is used to classify its input into one of two categories.

    A perceptron uses a step function that returns +1 if the weighted sum of its input is >= 0, and -1 otherwise.

    [Figure: inputs x1, x2, ..., xn with weights w1, w2, ..., wn and a bias b feed a summation v, passed through φ(v) to give the output y.]


    Learning Process for Perceptron

    Initially, assign random weights to the inputs, between -0.5 and +0.5.
    Training data are presented to the perceptron and its output is observed.
    If the output is incorrect, the weights are adjusted using the following formula: wi = wi + (a * xi * e), where e is the error produced and a (-1 < a < 1) is the learning rate.


    Example: Perceptron to Learn the OR Function

    Initially consider w1 = -0.2 and w2 = 0.4.
    For training data x1 = 0 and x2 = 0, the target output is 0. Compute y = Step(w1*x1 + w2*x2) = Step(0) = 0. The output is correct, so the weights are not changed.
    For training data x1 = 0 and x2 = 1, the target output is 1. Compute y = Step(w1*x1 + w2*x2) = Step(0.4) = 1. The output is correct, so the weights are not changed. A full training run under these conventions is sketched below.
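    A runnable version of this training process for OR, following the slide's conventions (initial weights -0.2 and 0.4, no bias, Step(0) treated as 0 as in the worked example; the learning rate a = 0.2 is an assumption):

        def step(s):
            return 1 if s > 0 else 0     # the slide's example computes Step(0) = 0

        data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # OR truth table

        w = [-0.2, 0.4]                  # initial weights from the slide
        a = 0.2                          # learning rate (assumed)

        for epoch in range(20):
            errors = 0
            for (x1, x2), target in data:
                y = step(w[0] * x1 + w[1] * x2)
                e = target - y           # error 'e' in the update rule
                if e != 0:
                    w[0] += a * x1 * e   # w_i = w_i + (a * x_i * e)
                    w[1] += a * x2 * e
                    errors += 1
            if errors == 0:              # a full epoch with no mistakes: done
                break

        print(w)                         # -> [0.2, 0.4]: both weights positive realizes OR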


    Perceptron: Limitations

    The perceptron can only model linearly separable functions: those functions for which the values, drawn in a 2-dimensional graph, can be separated into two parts by a single straight line. Boolean functions such as AND and OR are linearly separable.


    XOR [Figure]


    These two classes (true and false) cannot be separated using a single line; hence XOR is not linearly separable.

    x1   x2   x1 XOR x2
    0    0    0 (false)
    0    1    1 (true)
    1    0    1 (true)
    1    1    0 (false)

    [Figure: the four input points plotted in the plane; the true points (0,1) and (1,0) and the false points (0,0) and (1,1) cannot be split by one straight line.]


    Multilayer Feed-Forward (FF) Networks


    ... (1, -1) and (-1, 1). The output node is used to combine the outputs of the two hidden nodes.

    [Figure: input nodes, hidden layer, output layer, output.]


    Training Algorithm: Backpropagation

    The Backpropagation algorithm learns in the same way as a single perceptron: it searches for weight values that minimize the total error of the network over the set of training examples (the training set).

    Backpropagation consists of the repeated application of the following two passes:

    Forward pass: the network is activated on one example and the error of each neuron at the output layer is computed.

    Backward pass: the network error is used for updating the weights. The error is propagated backwards from the output layer through the network, layer by layer, by recursively computing the local gradient of each neuron. A compact sketch of both passes follows.

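    A batch version of the two passes on the XOR data, with sigmoid units and bias terms (network size, learning rate, and seed are choices made here, not given on the slides):

        import numpy as np

        rng = np.random.default_rng(0)
        sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

        X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # XOR inputs
        T = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

        W1, b1 = rng.uniform(-0.5, 0.5, (2, 4)), np.zeros(4)   # input -> hidden
        W2, b2 = rng.uniform(-0.5, 0.5, (4, 1)), np.zeros(1)   # hidden -> output
        eta = 0.5

        for epoch in range(20000):
            # forward pass: activate the network and measure the output error
            H = sigmoid(X @ W1 + b1)
            Y = sigmoid(H @ W2 + b2)
            E = T - Y
            if np.sum(E ** 2) < 0.01:          # stopping criterion used on the slides
                break
            # backward pass: local gradients, propagated layer by layer
            dY = E * Y * (1 - Y)               # output-layer gradient
            dH = (dY @ W2.T) * H * (1 - H)     # hidden-layer gradient
            W2 += eta * H.T @ dY;  b2 += eta * dY.sum(axis=0)
            W1 += eta * X.T @ dH;  b1 += eta * dH.sum(axis=0)

        print(np.round(Y.ravel(), 2))          # approaches [0, 1, 1, 0]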

    Contd...

    Consider a network of three layers. Let us use i to represent nodes in the input layer, j to represent nodes in the hidden layer, and k to represent nodes in the output layer. wij refers to the weight of the connection between a node in the input layer and a node in the hidden layer.

    The following equation is used to derive the output value Yj of node j:

    $Y_j = \frac{1}{1 + e^{-X_j}}$, where $X_j = \sum_{i=1}^{n} x_i w_{ij} - \theta_j$,

    n is the number of inputs to node j, and $\theta_j$ is the threshold for node j.


    Weight Update Rule

    The Backprop weight update rule is based on the gradient descent method: it takes a step in the direction yielding the maximum decrease of the network error E. This direction is the opposite of the gradient of E:

    $w_{ij} = w_{ij} + \Delta w_{ij}$, where $\Delta w_{ij} = -\eta \, \frac{\partial E}{\partial w_{ij}}$

    Iteration of the Backprop algorithm is usually terminated when the sum of squares of errors of the output values for all training data in an epoch is less than some threshold, such as 0.01.

    Stopping Criterions

    Total mean squared error change: Backprop is considered to have converged when the absolute rate of change in the average squared error per epoch is sufficiently small (in the range [0.1, 0.01]).

    Generalization-based criterion: after each epoch, the ...


    A function is a Radial Basis Function (RBF) if its output depends on the distance of the input from a given stored vector.

    The RBF neural network has an input layer, a hidden layer, and an output layer. In such RBF networks, the hidden layer uses neurons with RBFs as activation functions, and the outputs of all these hidden neurons are combined linearly at the output node.

    These networks have a wide variety of applications, such as function approximation, time series prediction, control and regression, and complex (non-linear) pattern classification tasks.


    RBF Architecture

    One hidden layer with RBF activation functions; an output layer with a linear activation function:

    $y = w_1 \varphi(\lVert x - t_1 \rVert) + \cdots + w_{m_1} \varphi(\lVert x - t_{m_1} \rVert)$

    where $\lVert x - t \rVert$ is the distance of $x = (x_1, \ldots, x_m)$ from the center $t$.

    [Figure: inputs x1 ... xm fan into the RBF units, whose outputs are weighted by w1 ... wm1 and summed to give y.]


    Cont...

    Here we require the weights wi from the hidden layer to the output layer only. The weights wi can be determined with the help of any of the standard iterative methods described earlier for neural networks.

    However, since the approximating function given below is linear w.r.t. wi, the weights can be calculated directly using the matrix methods of linear least squares, without having to determine wi iteratively:

    $Y = f(X) = \sum_{i=1}^{N} w_i \, \varphi(\lVert X - t_i \rVert)$

    It should be noted that the approximate function f(X) is differentiable with respect to wi.
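    The direct linear least-squares fit mentioned above, sketched on a one-dimensional toy problem (the Gaussian basis, centers, and data are all invented for illustration):

        import numpy as np

        def rbf(r, sigma=1.0):
            return np.exp(-(r ** 2) / (2 * sigma ** 2))

        X = np.linspace(0, 2 * np.pi, 40)       # sample inputs
        Y = np.sin(X)                           # target function values
        centers = np.linspace(0, 2 * np.pi, 8)  # the stored vectors t_i

        # Design matrix Phi[j, i] = phi(||x_j - t_i||). Because f is linear in
        # the w_i, the weights come from one least-squares solve, no iteration.
        Phi = rbf(np.abs(X[:, None] - centers[None, :]))
        w, *_ = np.linalg.lstsq(Phi, Y, rcond=None)

        print(np.max(np.abs(Phi @ w - Y)))      # the fitting error should be small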


    Comparison: RBF NN vs FF NN

    Both are non-linear layered feed-forward networks.
    RBF NN: the hidden layer is non-linear, the output layer is linear. FF NN: hidden and output layers are usually non-linear.
    RBF NN: one single hidden layer. FF NN: may have more hidden layers.
    RBF NN: the neuron model of the hidden neurons is different from the one of the output nodes. FF NN: hidden and output neurons share a common neuron model.
    RBF NN: the activation function of each hidden neuron computes the Euclidean distance between the input vector and the center of that unit. FF NN: the activation function of each hidden neuron computes the inner product of the input vector and the synaptic weight vector of that neuron.

    NN Design Issues


    Data Representation


    Data representation depends on the problem. In general ...


    Network Topology

    The number of layers and neurons depends on the specific task. In practice, this issue is solved by trial and error. Two types of adaptive algorithms can be used:
    start from a large network and successively remove some neurons and links until network performance degrades;
    begin with a small network and introduce new neurons until performance is satisfactory.


    Initialization of Weights

    In general, initial weights are randomly chosen, with typical values between -1.0 and 1.0, or -0.5 and 0.5.

    If some inputs are much larger than others, random initialization may bias the network to give much more importance to the larger inputs. In such a case, weights can be initialized as follows:

    [Equations lost in transcription: one scaling rule for the weights from the input to the first layer, and one for the weights from the first to the second layer, both normalizing by the magnitudes of the inputs. A common normalization of this kind is sketched below.]

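    One plausible reading of that idea in code: uniform random weights, optionally rescaled by each input's typical magnitude (the exact formula is an assumption, since the transcript's version is unreadable):

        import numpy as np

        rng = np.random.default_rng(0)

        def init_weights(n_in, n_out, low=-0.5, high=0.5, inputs=None):
            """Random weights in [low, high]; if a sample of inputs is given,
            divide each weight row by that input's mean magnitude so large
            inputs do not dominate the initial activations."""
            W = rng.uniform(low, high, (n_in, n_out))
            if inputs is not None:
                scale = np.maximum(np.mean(np.abs(inputs), axis=0), 1e-12)
                W /= scale[:, None]     # rows feeding large inputs get small weights
            return W

        X = np.array([[0.1, 100.0], [0.2, 150.0], [0.3, 120.0]])
        print(init_weights(2, 3, inputs=X))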

    Choice of Learning Rate

    The right value of η depends on the application. Values between 0.1 and 0.9 have been used in many applications. Other heuristics adapt η during the training, as described in previous slides.


    Training

    Rule of thumb: the number of training examples should be at least five to ten times the number of weights of the network.

    Other rule:

    $N > \frac{|W|}{1 - a}$

    where |W| is the number of weights and a is the expected accuracy on the test set.


    Recurrent



    Hopfield


    Activation Algorithm

    An active unit is represented by 1 and an inactive one by 0.

    Repeat:
    Choose any unit randomly. The chosen unit may be active or inactive.
    For the chosen unit, compute the sum of the weights on the connections to its active neighbours only, if any. If the sum > 0 (the threshold is assumed to be 0), then the chosen unit becomes active; otherwise it becomes inactive.
    If the chosen unit has no active neighbours, ignore it; its status remains the same.
    Until the network reaches a stable state. (A runnable version of this loop follows.)
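    The activation loop above as a sketch (the weights are hypothetical, and a fixed step budget stands in for the stability test):

        import numpy as np

        rng = np.random.default_rng(0)

        def hopfield_run(W, state, steps=200):
            """Asynchronous updates with 1 = active, 0 = inactive."""
            state = state.copy()
            for _ in range(steps):                    # enough steps to settle here
                i = rng.integers(len(state))          # choose any unit randomly
                active = (W[i] != 0) & (state == 1)   # the unit's active neighbours
                if not active.any():
                    continue                          # no active neighbours: unchanged
                s = W[i] @ state                      # sum of weights to active neighbours
                state[i] = 1 if s > 0 else 0          # threshold assumed to be 0
            return state

        # Symmetric weights with a zero diagonal (hypothetical numbers).
        W = np.array([[ 0, 1, -2],
                      [ 1, 0,  1],
                      [-2, 1,  0]])
        print(hopfield_run(W, np.array([1, 0, 1])))   # settles to [0 1 1]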


    [Figure: a table of network states: current state, selected unit from the current state, and corresponding new state. Here, the sum of the weights of the active neighbours of a selected unit is calculated: for the first selected unit the sum is positive (sum > 0), so the unit is activated; for the second the sum is negative (sum < 0), so it is deactivated.]


    [Figure: the corresponding sequence of state vectors after each update, ending in a stable network.]

    Stable Network


    Example

    Let us now consider a Hopfield network with four units and three training input vectors that are to be learned by the network. Consider three input examples, namely X1, X2, and X3, defined as follows:

    [The component values of X1, X2, and X3 are not recoverable from the transcript.]

    The weight matrix is obtained as

    $W = X_1 X_1^T + X_2 X_2^T + X_3 X_3^T - 3I$
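    The weight computation in code, with hypothetical bipolar vectors (the transcript's actual X1, X2, X3 are unreadable):

        import numpy as np

        X1 = np.array([ 1, -1,  1, -1])   # assumed +1/-1 patterns over four units
        X2 = np.array([ 1,  1, -1, -1])
        X3 = np.array([-1, -1,  1,  1])

        # W = X1 X1^T + X2 X2^T + X3 X3^T - 3I: subtracting 3I cancels the
        # self-connections, leaving a symmetric matrix with a zero diagonal.
        W = sum(np.outer(x, x) for x in (X1, X2, X3)) - 3 * np.eye(4, dtype=int)
        print(W)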


    [Matrix arithmetic lost in transcription: the three outer products are summed and the diagonal cancelled to give W, and candidate states are checked against it.]

    Stable positions of the network, which are at Hamming distance 1.

    Finally, with the obtained weights and stable states (X1 and X3), we can stabilize any new (partial) pattern to one of those.