
  • Seminar in Deep Learning

    Lecture 1: Perceptron

    Alexander Tkachenko

    University of Tartu

    16 September, 2014

  • Neuron model

    Let x = (x1, x2, x3) be the input vector, w = (w1, w2, w3) the weight vector, and b the bias term.

    First the inputs are linearly aggregated: a = x1·w1 + x2·w2 + x3·w3 + b.

    Then the output is obtained as: y = f(a).

  • Binary Threshold Neuron for Classification

    y = f(a) = 1 if a ≥ 0, and 0 otherwise.

    Let x = (x1, x2, x3) be the input vector, w = (w1, w2, w3) the weight vector, and b the bias term.

    First the inputs are linearly aggregated: a = x1·w1 + x2·w2 + x3·w3 + b.

    Then the output is obtained as: y = f(a).
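    A minimal sketch of this neuron in Python (NumPy assumed; the function name binary_threshold_neuron and the example numbers are mine, not from the lecture):

        import numpy as np

        def binary_threshold_neuron(x, w, b):
            # Linear aggregation: a = x1*w1 + x2*w2 + x3*w3 + b
            a = np.dot(w, x) + b
            # Binary threshold non-linearity: f(a) = 1 if a >= 0, else 0
            return 1 if a >= 0 else 0

        # Example with three inputs and hand-picked weights
        x = np.array([1.0, 0.0, 1.0])
        w = np.array([0.5, -0.3, 0.2])
        b = -0.4
        print(binary_threshold_neuron(x, w, b))  # a = 0.5 + 0.2 - 0.4 = 0.3 >= 0, so prints 1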

  • Geometrical view

    y = f(a) = 1 if a ≥ 0, 0 otherwise   ⇔   y(x) = 1 if wᵀx + b ≥ 0, 0 otherwise.

    The hyperplane wᵀx + b = 0 divides the input space into two regions: wᵀx + b > 0 (output 1) and wᵀx + b < 0 (output 0).

  • Eliminating bias term

    With x = (x1, x2, x3) and w = (w1, w2, w3), the activation is a = wᵀx + b.

    Prepend a constant 1 to the input and fold the bias into the weights: with x = (1, x1, x2, x3) and w = (w0, w1, w2, w3), where w0 = b, the activation becomes simply a = wᵀx.
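    A small Python check of this trick (a sketch with made-up numbers): the activation with an explicit bias equals the activation with the bias folded into an augmented weight vector.

        import numpy as np

        x = np.array([2.0, -1.0, 0.5])
        w = np.array([0.3, 0.7, -0.2])
        b = 0.1

        # Activation with an explicit bias term
        a_with_bias = np.dot(w, x) + b

        # Fold the bias into the weights: x -> (1, x1, x2, x3), w -> (w0 = b, w1, w2, w3)
        x_aug = np.concatenate(([1.0], x))
        w_aug = np.concatenate(([b], w))
        a_folded = np.dot(w_aug, x_aug)

        print(a_with_bias, a_folded)  # both give the same number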

  • Geometrical view

    y = f(a) = 1 if a ≥ 0, 0 otherwise   ⇔   y(x) = 1 if wᵀx ≥ 0, 0 otherwise.

    The hyperplane wᵀx = 0 divides the space into the regions wᵀx > 0 (output 1) and wᵀx < 0 (output 0).

  • Learning a hyperplane

    Given training data {(xi, yi)}, with yi ∈ {−1, +1}, find w such that yi(wᵀxi) > 0 for all i.

  • Learning a hyperplane

    [Figure: example training points x1 (label +1) and x2 (label −1) in the input space.]

  • Learning a hyperplane

    Given training data {(xi, yi)}, with yi ∈ {−1, +1}, find w such that yi(wᵀxi) > 0 for all i.

    • What if there is no solution?

    • If more than one solution w exists, then which one should we choose?

    • How to solve the problem in a simple and efficient way?

  • Perceptron algorithm

    Given training data {(xi, yi)}, with yi ∈ {−1, +1}, find w such that yi(wᵀxi) > 0 for all i.

    initialize w = 0

    while any training example (x, y) is not classified correctly:

        set w := w + y·x
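    A straightforward Python sketch of this procedure, assuming labels in {-1, +1} and inputs already augmented with a leading 1 as above (function and data names are illustrative, not from the slides):

        import numpy as np

        def perceptron_train(X, y, max_epochs=1000):
            # Perceptron algorithm: repeat w := w + y*x over misclassified examples
            w = np.zeros(X.shape[1])              # initialize w = 0
            for _ in range(max_epochs):
                errors = 0
                for x_i, y_i in zip(X, y):
                    if y_i * np.dot(w, x_i) <= 0:  # example not classified correctly
                        w = w + y_i * x_i          # update: w := w + y*x
                        errors += 1
                if errors == 0:                    # all examples classified correctly
                    break
            return w

        # Toy linearly separable data; first column is the constant 1 (bias feature)
        X = np.array([[1.0, 2.0, 1.0], [1.0, 1.5, 2.0], [1.0, -1.0, -1.5], [1.0, -2.0, -1.0]])
        y = np.array([1, 1, -1, -1])
        w = perceptron_train(X, y)
        print(np.sign(X @ w))  # -> [ 1.  1. -1. -1.]

    The loop stops as soon as every example satisfies y(wᵀx) > 0; on non-separable data it would simply run until max_epochs.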

  • Perceptron: online version

    Given an infinite stream of training examples (xi, yi), where time is indexed t = 1, 2, 3, ..., this version is as follows:

    initialize w = 0

    for t = 1, 2, 3, ...:

        if yt(wᵀxt) ≤ 0 then set w := w + yt·xt
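    The same update written as a streaming procedure in Python (a sketch; the infinite stream is simulated here by a finite generator, and the names are mine):

        import numpy as np

        def online_perceptron(stream, n_features):
            # Online perceptron: initialize w = 0, then for t = 1, 2, 3, ...
            # update w := w + y_t * x_t whenever y_t * (w . x_t) <= 0
            w = np.zeros(n_features)
            mistakes = 0
            for x_t, y_t in stream:
                if y_t * np.dot(w, x_t) <= 0:   # mistake at time t
                    w = w + y_t * x_t
                    mistakes += 1
            return w, mistakes

        # Simulate a finite "stream" by cycling over two examples (bias feature folded in)
        X = np.array([[1.0, 2.0, 1.0], [1.0, -1.0, -1.5]])
        y = np.array([1, -1])
        stream = ((X[i % 2], y[i % 2]) for i in range(20))
        print(online_perceptron(stream, n_features=3))  # final w and the number of mistakes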

  • Perceptron convergence theorem [Novikoff, 1962]

    Assume there exists a unit vector w* and a margin δ > 0 such that yi(w*ᵀxi) ≥ δ for all i, and that ‖xi‖ ≤ R for all i.

    Then the perceptron makes at most (R/δ)² mistakes.
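    A small numerical illustration of the bound (my own toy data and choice of w*, not from the lecture): with the margin δ and radius R computed for a hand-picked unit-norm separator, the perceptron's mistake count stays within (R/δ)².

        import numpy as np

        # Toy separable data with the bias folded in as a leading 1; labels in {-1, +1}
        X = np.array([[1.0, 2.0, 1.0], [1.0, 1.0, 2.0], [1.0, -2.0, -1.0], [1.0, -1.0, -2.0]])
        y = np.array([1, 1, -1, -1])

        # A unit-norm separator w* and the quantities from the theorem
        w_star = np.array([0.0, 1.0, 1.0]) / np.sqrt(2.0)
        delta = np.min(y * (X @ w_star))        # margin: min_i  y_i * (w*^T x_i)
        R = np.max(np.linalg.norm(X, axis=1))   # radius: max_i ||x_i||

        # Run the perceptron over the data repeatedly, counting mistakes
        w, mistakes = np.zeros(3), 0
        for _ in range(100):
            for x_i, y_i in zip(X, y):
                if y_i * np.dot(w, x_i) <= 0:
                    w = w + y_i * x_i
                    mistakes += 1

        print(mistakes, (R / delta) ** 2)       # the mistake count stays within (R/delta)^2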

  • The history of perceptron

    • They were popularised by Frank Rosenblatt in the early 1960’s.

    – They appeared to have a very powerful learning algorithm.

    – Lots of grand claims were made for what they could learn to do.

    • In 1969, Minsky and Papert published a book called “Perceptrons” that analysed what they could do and showed their limitations.

    – Many people thought these limitations applied to all neural network models.

    • The perceptron learning procedure is still widely used today for tasks with enormous feature vectors that contain many millions of features.

  • Perceptron Limitations

    Binary threshold neuron cannot learn XOR
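    A quick sketch of this limitation (my own illustration): running the perceptron update on the four XOR points never reaches an error-free pass, because no hyperplane separates them.

        import numpy as np

        # The four XOR points, with a leading 1 for the bias; XOR labels mapped to {-1, +1}
        X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
        y = np.array([-1, 1, 1, -1])

        w = np.zeros(3)
        for _ in range(1000):                      # far more passes than any separable problem needs
            errors = 0
            for x_i, y_i in zip(X, y):
                if y_i * np.dot(w, x_i) <= 0:      # misclassified: apply the perceptron update
                    w = w + y_i * x_i
                    errors += 1
            if errors == 0:
                break
        print(errors)  # never reaches 0: no weight vector classifies all four XOR points correctly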

  • Limitations of a single neuron network

    • Can a binary threshold unit discriminate between different patterns that have the same number of on pixels?

    – Not if the patterns can translate with wrap-around!

  • Learning with hidden units

    • Networks without hidden units are very limited in the input-output mappings they can learn to model.

    – More layers of linear units do not help. It’s still linear.

    – Fixed output non-linearities are not enough.

    • We need multiple layers of adaptive, non-linear hidden units. But how can we train such nets?

    – We need an efficient way of adapting all the weights, not just the last layer. This is hard.

    – Learning the weights going into hidden units is equivalent to learning features.

    – This is difficult because nobody is telling us directly what the hidden units should do.

  • Materials

    • Perceptron Classifiers, Charles Elkan, 2010

    http://www.cs.columbia.edu/~smaskey/CS6998-0412/supportmaterial/perceptron.pdf

    • Neural Networks for Machine Learning, Lecture 2, Geoffrey Hinton

    https://d396qusza40orc.cloudfront.net/neuralnets/lecture_slides/lec2.pdf