Wed June 12
• Goals of today’s lecture:
– Learning mechanisms
– Where is AI and where is it going? What to look for in the future? Status of the Turing test?
– Material and guidance for the exam.
– Discuss any outstanding problems on the last assignment.
Automated Learning Techniques
• ID3: a technique for automatically building a good decision tree from a given classification of examples and counter-examples; a sketch of the idea follows.
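To make the idea concrete, here is a minimal, hypothetical sketch of ID3’s core loop: pick the attribute with the highest information gain, split, and recurse. The helper names (`entropy`, `information_gain`, `id3`) and the dict-of-dicts tree encoding are illustrative choices, not from the lecture.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attr):
    """Entropy reduction obtained by splitting on attribute `attr`."""
    gain = entropy(labels)
    for value in set(ex[attr] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels) if ex[attr] == value]
        gain -= len(subset) / len(labels) * entropy(subset)
    return gain

def id3(examples, labels, attrs):
    """Build a decision tree: nested dicts for tests, labels at the leaves.
    `examples` are dicts mapping attribute -> value."""
    if len(set(labels)) == 1:
        return labels[0]                             # pure node: stop
    if not attrs:
        return Counter(labels).most_common(1)[0][0]  # majority label
    best = max(attrs, key=lambda a: information_gain(examples, labels, a))
    tree = {best: {}}
    for value in set(ex[best] for ex in examples):
        idx = [i for i, ex in enumerate(examples) if ex[best] == value]
        tree[best][value] = id3([examples[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attrs if a != best])
    return tree
```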
Automated Learning Techniques
• Algorithm W (Winston): an algorithm that develops a “concept” based on examples and counter-examples.
Automated Learning Techniques
• Perceptron: an algorithm that develops a classification based on examples and counter-examples.
• Techniques for non-linearly separable problems (neural networks, support vector machines).
Perceptrons
Learning in Neural Networks
Natural versus Artificial Neuron
• (Figure: a natural neuron side by side with the McCulloch-Pitts neuron.)
One Neuron: McCulloch-Pitts
• The biological neuron is very complicated. Abstracting away the details, we have the integrate-and-fire neuron:
• (Figure: inputs x1, x2, …, xn enter with weights w1, w2, …, wn; the unit integrates the weighted sum and fires if it exceeds a threshold.)
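As a minimal sketch (the function name and the AND example are mine, not the slide’s), the abstraction is just a weighted sum compared against a threshold:

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Integrate-and-fire: output 1 if the weighted sum of the inputs
    exceeds the threshold, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > threshold else 0

# An AND gate as one neuron: it fires only when both inputs are 1.
assert mcculloch_pitts([1, 1], [1, 1], threshold=1.5) == 1
assert mcculloch_pitts([1, 0], [1, 1], threshold=1.5) == 0
```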
Perceptron
• Pattern identification (note: the neuron is trained).
• The trained weights implement the test: if Σᵢ wᵢxᵢ > θ, the letter A is in the receptive field.
Three Main Issues
• Representability
• Learnability
• Generalizability
One Neuron (Perceptron)
• What can be represented by one neuron?
• Is there an automatic way to learn a function from examples?
Feed Forward Network
• Layers of weighted connections: each unit fires when the weighted sum Σᵢ wᵢxᵢ over its receptive field exceeds its threshold.
• (Figure: a feed-forward network of such units.)
Representability
• What functions can be represented by a network of McCulloch-Pitts neurons?
• Theorem: every logic function of an arbitrary number of variables can be represented by a three-level network of such neurons.
Proof
• Show the simple functions are representable: AND, OR, NOT, IMPLIES.
• Recall that every logic function is representable in DNF (disjunctive normal form): an OR of ANDs of possibly negated variables, which is exactly a three-level network. A sketch follows.
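A small illustrative sketch of this construction (the helper `unit` and the particular weights and thresholds are my choices, not the slides’): single units realize the basic gates, and XOR is then assembled from its DNF.

```python
def unit(weights, threshold):
    """A McCulloch-Pitts unit as a function of its 0/1 inputs."""
    return lambda *xs: 1 if sum(w * x for w, x in zip(weights, xs)) > threshold else 0

# Basic gates as single units (thresholds chosen to realize each truth table).
AND = unit([1, 1], 1.5)
OR  = unit([1, 1], 0.5)
NOT = unit([-1], -0.5)

# XOR in DNF is (x AND NOT y) OR (NOT x AND y): a three-level network.
def xor(x, y):
    return OR(AND(x, NOT(y)), AND(NOT(x), y))

assert [xor(0, 0), xor(0, 1), xor(1, 0), xor(1, 1)] == [0, 1, 1, 0]
```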
Perceptron
• What is representable? Linearly separable sets.
• Examples: the AND and OR functions.
• Not representable: XOR. (The constraints contradict: XOR(1,0) = 1 and XOR(0,1) = 1 force w₁ > θ and w₂ > θ, while XOR(0,0) = 0 forces θ ≥ 0 and XOR(1,1) = 0 forces w₁ + w₂ ≤ θ; the first two give w₁ + w₂ > 2θ ≥ θ, a contradiction.)
• High dimensions: how do we tell?
• Question: is convexity representable? Is connectedness?
AND (figure: a line separates the point (1,1) from the other three input points)
OR (figure: a line separates the point (0,0) from the other three input points)
XOR (figure: no single line separates {(0,1), (1,0)} from {(0,0), (1,1)})
Convexity: Representable by a simple extension of the perceptron
• Clue: a body is convex if, whenever two points are inside it, every point between them is also inside.
• So take a perceptron with one input (an order-3 sensor) for each triple of points; a sketch follows this slide.
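A toy sketch of that construction on a 1-D grid of points (the function name, the example figures, and the weight/threshold choices are mine; in the plane, “between” would mean “on the segment”): one order-3 predicate per triple, each wired with weight −1 into a unit with threshold −1/2.

```python
from itertools import combinations

def convex_by_order3(inside, grid):
    """Perceptron-style convexity test on a 1-D grid (sketch).
    One order-3 predicate per triple a < b < c: it fires (value 1)
    when a and c are inside the figure but b is not. Each predicate
    gets weight -1; the figure is convex iff the weighted sum of all
    predicates exceeds the threshold -1/2 (i.e. no predicate fires)."""
    s = -sum(1
             for a, c in combinations(sorted(grid), 2)
             for b in grid
             if a < b < c and a in inside and c in inside and b not in inside)
    return s > -0.5

grid = set(range(10))
print(convex_by_order3({2, 3, 4, 5}, grid))  # True: an interval is convex
print(convex_by_order3({2, 5}, grid))        # False: 3 and 4 are gaps
```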
Connectedness: Not Representable
Representability
• Perceptron: only linearly separable sets
– AND versus XOR
– Convex versus connected
• Many linked neurons: universal
– Proof: show AND, OR, NOT representable
– Then apply the DNF representation theorem
Learnability
• Perceptron Convergence Theorem:
– If a function is representable, the perceptron algorithm converges
– Proof (from slides)
• Multi-neuron networks: good heuristic learning techniques
Generalizability
• Typically train a perceptron on a sample set of examples and counter-examples, then use it on the general class.
• Training can be slow, but execution is fast.
• Main question: how does training on the training set carry over to the general class? (Not simple.)
Programming: Just find the weights!
• AUTOMATIC PROGRAMMING (or learning)
• One Neuron: Perceptron or Adaline
• Multi-level: gradient descent on a continuous neuron (a sigmoid instead of a step function); a sketch follows.
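A minimal sketch of that idea for a single sigmoid unit (the function names, learning rate, and epoch count are illustrative assumptions, not the lecture’s): gradient descent on squared error.

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_sigmoid_neuron(data, lr=0.5, epochs=2000):
    """Gradient descent on squared error for a single sigmoid neuron.
    `data` is a list of (inputs, target) pairs with targets in {0, 1}."""
    n = len(data[0][0])
    w = [random.uniform(-0.5, 0.5) for _ in range(n)]
    b = 0.0
    for _ in range(epochs):
        for x, t in data:
            y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = (y - t) * y * (1 - y)   # d(error)/d(pre-activation)
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

# Learn OR (linearly separable); outputs should round to 0, 1, 1, 1.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_sigmoid_neuron(data)
```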
Perceptron Convergence Theorem
• If a perceptron for the task exists, then the perceptron learning algorithm will find one in finite time.
• That is, IF some set of weights and a threshold correctly classifies a class of examples and counter-examples, THEN one such set of weights can be found by the algorithm.
Perceptron Training Rule
• LOOP: take a positive or negative example and apply it to the network.
– If the answer is correct, go to LOOP.
– If incorrect, go to FIX.
• FIX: adjust the network weights by the input example X.
– If a positive example was misclassified: Wnew = Wold + X; decrease the threshold.
– If a negative example was misclassified: Wnew = Wold − X; increase the threshold.
• Go to LOOP. (The rule is sketched in code below.)
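Here is that rule as runnable code, a sketch under my own naming, with labels in {+1, −1} and an illustrative epoch cap (the convergence theorem below guarantees termination in the separable case):

```python
def train_perceptron(examples, max_epochs=100):
    """The LOOP/FIX rule above. `examples` holds (vector, label) pairs,
    with label +1 for positive and -1 for negative examples."""
    n = len(examples[0][0])
    w, theta = [0.0] * n, 0.0
    for _ in range(max_epochs):
        converged = True
        for x, label in examples:
            fires = 1 if sum(wi * xi for wi, xi in zip(w, x)) > theta else -1
            if fires == label:
                continue                  # correct answer: back to LOOP
            converged = False             # incorrect: go to FIX
            w = [wi + label * xi for wi, xi in zip(w, x)]
            theta -= label                # positive: lower θ; negative: raise θ
        if converged:
            break
    return w, theta

# Learns AND: the trained unit fires only on (1, 1).
w, theta = train_perceptron([([0, 0], -1), ([0, 1], -1),
                             ([1, 0], -1), ([1, 1], 1)])
```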
Perceptron Convergence Theorem (again)
• Preliminary: we can simplify the proof without loss of generality:
– use only positive examples (replace each negative example X by −X);
– assume the threshold is 0 (go up one dimension by encoding X as (X, 1)).
Perceptron Training Rule (simplified)
• LOOP: take a positive example and apply it to the network.
– If the answer is correct, go to LOOP.
– If incorrect, go to FIX.
• FIX: adjust the network weights by the input example: Wnew = Wold + X.
• Go to LOOP.
Proof of Convergence Theorem
• Notes:
1. By hypothesis, there is a δ > 0 such that V*·X > δ for all X in F.
2. We can eliminate the threshold (add an extra dimension to the input): W·(x, y, z) > threshold if and only if W′·(x, y, z, 1) > 0, where W′ is W with −threshold appended.
3. We can assume all examples are positive (replace each negative example by its negation): W·(x, y, z) < 0 if and only if W·(−x, −y, −z) > 0.
Perceptron Convergence Theorem (ready for proof)
• Let F be a set of unit-length vectors. If there is a (unit) vector V* and a value δ > 0 such that V*·X > δ for all X in F, then the perceptron program goes to FIX only a finite number of times (regardless of the order in which the vectors X are chosen).
• Note: if F is a finite set, then such a δ automatically exists.
Proof (cont.)
• Consider the quotient V*·W / (|V*| |W|).
(Note: this is the cosine of the angle between V* and W.)
Since V* is a unit vector, the quotient equals V*·W / |W|.
The quotient is ≤ 1.
Proof (cont.)
• Consider the numerator V*·W.
Each time FIX is visited, W changes by adding the example X:
V*·W(n+1) = V*·(W(n) + X)
          = V*·W(n) + V*·X
          > V*·W(n) + δ
Hence after n visits to FIX:
V*·W(n) > nδ   (*)
Proof (cont.)
• Now consider the denominator:
|W(n+1)|² = W(n+1)·W(n+1)
          = (W(n) + X)·(W(n) + X)
          = |W(n)|² + 2 W(n)·X + 1   (recall |X| = 1)
          ≤ |W(n)|² + 1              (we are in FIX, so W(n)·X ≤ 0)
So after n visits to FIX (starting from W = 0):
|W(n)|² ≤ n   (**)
Proof (cont.)
• Putting (*) and (**) together:
Quotient = V*·W(n) / |W(n)| > nδ / sqrt(n) = sqrt(n)·δ
Since the quotient is ≤ 1, this gives sqrt(n)·δ ≤ 1, i.e. n ≤ 1/δ².
So we enter FIX a bounded number of times. Q.E.D.
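As a quick numeric sanity check (the example set F, the separator V*, and the loop structure below are my own toy choices, not the slides’), we can run the simplified rule on unit-length positive examples and compare the FIX count against 1/δ²:

```python
import math

# Unit-length positive examples F and a unit separator V*.
v_star = (1 / math.sqrt(2), 1 / math.sqrt(2))
F = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8)]
delta = min(v_star[0] * x + v_star[1] * y for x, y in F)

w, fixes = (0.0, 0.0), 0
changed = True
while changed:                            # cycle through F until no FIX occurs
    changed = False
    for x, y in F:
        if w[0] * x + w[1] * y <= 0:      # misclassified: go to FIX
            w = (w[0] + x, w[1] + y)      # W(n+1) = W(n) + X
            fixes += 1
            changed = True

print(fixes, "<=", 1 / delta ** 2)        # FIX count stays within 1/δ²
```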
Geometric Proof
• See hand slides.
Additional Facts
• Note: if the X’s are presented in a systematic way, then a solution W is always found.
• Note: the solution is not necessarily the same as V*.
• Note: if F is not finite, the algorithm may not reach a solution in finite time.
• The algorithm can be modified in minor ways and remains valid (e.g. bounded rather than unit-length examples, with corresponding changes in the bound on W(n)).
Percentage of Boolean Functions Representable by a Perceptron

| Inputs | Perceptrons | Boolean functions |
| ------ | ----------- | ----------------- |
| 1 | 4 | 4 |
| 2 | 14 | 16 |
| 3 | 104 | 256 |
| 4 | 1,882 | 65,536 |
| 5 | 94,572 | ≈ 10^9 |
| 6 | 15,028,134 | ≈ 10^19 |
| 7 | 8,378,070,864 | ≈ 10^38 |
| 8 | 17,561,539,552,946 | ≈ 10^77 |
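The n = 2 row can be checked by brute force (a sketch of my own; at this size a coarse grid search over weights and thresholds finds every linearly separable truth table):

```python
from itertools import product

def is_threshold(truth, grid):
    """Grid search: is the 2-input Boolean function with truth table
    `truth` (outputs for (0,0), (0,1), (1,0), (1,1)) computed by some
    perceptron with weights and threshold drawn from `grid`?"""
    for w1, w2, t in product(grid, repeat=3):
        if all((1 if w1 * x + w2 * y > t else 0) == out
               for (x, y), out in zip(product([0, 1], repeat=2), truth)):
            return True
    return False

grid = [k / 2 for k in range(-4, 5)]      # weights/thresholds -2.0 .. 2.0
count = sum(is_threshold(truth, grid) for truth in product([0, 1], repeat=4))
print(count, "of 16")                     # 14 of 16: all but XOR and XNOR
```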
What won’t work?
• Example: connectedness with a bounded-diameter perceptron.
• Compare with convexity, which is representable (using sensors of order three, as above).
What won’t work?
• Try XOR on a single perceptron.
What about non-linearly separable problems?
• Find “nearly separable” solutions.
• Transform the data into a space where they are separable (the SVM approach); a sketch follows.
• Use multi-level neurons.
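A tiny illustration of the transformation idea (the feature map `phi` and the separating weights are my own choices): XOR is not linearly separable in (x, y), but it becomes separable after adding the product coordinate.

```python
# XOR is not linearly separable in (x, y), but lifting each point with
# the product coordinate, (x, y) -> (x, y, x*y), makes it separable.
points = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def phi(x, y):
    return (x, y, x * y)

w = (1, 1, -2)                            # separating weights in the lifted space
for (x, y), label in points.items():
    z = phi(x, y)
    fires = 1 if sum(wi * zi for wi, zi in zip(w, z)) > 0.5 else 0
    assert fires == label                 # every XOR point classified correctly
```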
Multi-Level Neurons
• It is difficult to find a global learning algorithm like the perceptron’s.
• But… it turns out that methods related to gradient descent on multi-parameter weights often give good results. This is what you see commercially now.
Applications
• Detectors (e.g. medical monitors)
• Noise filters (e.g. hearing aids)
• Future predictors (e.g. stock markets; also adaptive PDE solvers)
• Learn to steer a car!
• Many, many others …