
Page 1: Mini-course on Artificial Neural Networks and Bayesian Networks

Mini-course on Artificial Neural Networks and Bayesian Networks

Michal Rosen-Zvi

Mini-course on ANN and BN, The Multidisciplinary Brain Research center, Bar-Ilan University, May 2004

Intensive course at the Multidisciplinary Brain Research Center, Bar-Ilan University, 5764 (2004)

Page 2: Mini-course on Artificial Neural Networks and Bayesian Networks

Section 1: Introduction


Page 3: Mini-course on Artificial Neural Networks and Bayesian Networks

Networks (1)

Networks serve as a visual way for displaying relationships:

Social networks are examples of ‘flat’ networks, where the only information is the relations between entities


Page 4: Mini-course on Artificial Neural Networks and Bayesian Networks

Example: collaboration network


1. Itay Gat, Naftali Tishby, and Moshe Abeles, "Analyzing Cortical Activity Using Hidden Markov Models," Network: Computation in Neural Systems, August 1997.

2. Moshe Abeles, Hagai Bergman, Itay Gat, Isaac Meilijson, Eyal Seidemann, Naftali Tishby, and Eilon Vaadia, "Cortical Activity Flips Among Quasi-Stationary States," PNAS, 1995.

3. David Haussler, Michael Kearns, H. Sebastian Seung, and Naftali Tishby, "Rigorous Learning Curve Bounds from Statistical Mechanics," full version in Machine Learning, 1997.

4. H. S. Seung, Haim Sompolinsky, and Naftali Tishby, "Learning Curves in Large Neural Networks," COLT 1991: 112-127.

5. Yann LeCun, Ido Kanter, and Sara A. Solla, "Second Order Properties of Error Surfaces," NIPS 1990: 918-924.

6. Esther Levin, Naftali Tishby, and Sara A. Solla, "A Statistical Approach to Learning and Generalization in Layered Neural Networks," COLT 1989: 245-260.

7. V. Litvak, H. Sompolinsky, I. Segev, and M. Abeles (2003), "On the Transmission of Rate Code in Long Feedforward Networks with Excitatory-Inhibitory Balance," Journal of Neuroscience, 23(7): 3006-3015.

8. W. Senn, I. Segev, and M. Tsodyks (1998), "Reading Neural Synchrony with Depressing Synapses," Neural Computation 10: 815-819.

9. M. Tsodyks, I. Mit'kov, and H. Sompolinsky (1993), "Pattern of Synchrony in Inhomogeneous Networks of Oscillators with Pulse Interactions," Phys. Rev. Lett.

10. Yuval Aviel, David Horn, and Moshe Abeles, "Memory Capacity of Balanced Networks."

11. Ofer Hendin, David Horn, and Misha Tsodyks, "The Role of Inhibition in an Associative Memory Model of the Olfactory Bulb."

12. Gal Chechik, Amir Globerson, Naftali Tishby, and Yair Weiss, "Information Bottleneck for Gaussian Variables," June 2003, submitted to NIPS 2003.

[matlab]

Page 5: Mini-course on Artificial Neural Networks and Bayesian Networks

Networks (2)

Artificial Neural Networks represent rules, i.e., deterministic relations between input and output


Page 6: Mini-course on Artificial Neural Networks and Bayesian Networks

Networks (3)

Bayesian Networks represent probabilistic relations: conditional independencies and dependencies between variables


Page 7: Mini-course on Artificial Neural Networks and Bayesian Networks

Outline

Introduction/Motivation

Artificial Neural Networks
– The Perceptron, multilayered feed-forward NN and recurrent NN
– On-line (supervised) learning
– Unsupervised learning and PCA
– Classification
– Capacity of networks

Bayesian networks (BN)
– Bayes rule and the BN semantics
– Classification using generative models

Applications: Vision, Text


Page 8: Mini-course on Artificial Neural Networks and Bayesian Networks

Motivation

Research on ANNs is inspired by neurons in the brain and is (partially) driven by the need for models of reasoning in the brain.

Scientists are challenged to use machines more effectively for tasks traditionally solved by humans (for example, driving a car, assigning scientific papers to referees, and many others)


Page 9: Mini-course on Artificial Neural Networks and Bayesian Networks

Questions

How can a network learn? What will be the learning rate?

What are the limitations on the network capacity?

How can networks be used to classify results with no labels (unsupervised learning)?

What are the relations and differences between learning in ANNs and learning in BNs?

How can network models explain high-level reasoning?


Page 10: Mini-course on Artificial Neural Networks and Bayesian Networks

History of (modern) ANNs and BNs


[Timeline, 1940-2000: McCulloch and Pitts model; Hebbian learning rule; Perceptron; Minsky and Papert's book; Hopfield network; statistical physics methods; Pearl's book]


Page 11: Mini-course on Artificial Neural Networks and Bayesian Networks

Section 2: On-line Learning


Based on slides from Michael Biehl’s summer course


Page 12: Mini-course on Artificial Neural Networks and Bayesian Networks

Section 2.1: The Perceptron


Page 13: Mini-course on Artificial Neural Networks and Bayesian Networks

The Perceptron

Input: ξ

Adaptive weights: J

Output: S


Page 14: Mini-course on Artificial Neural Networks and Bayesian Networks

Perceptron: binary output

Implements a linearly separable classification of inputs

Milestones:
– Perceptron convergence theorem, Rosenblatt (1958)
– Capacity, Winder (1963), Cover (1965)
– Statistical physics of perceptron weights, Gardner (1988)

How does this device learn?


Page 15: Mini-course on Artificial Neural Networks and Bayesian Networks

Learning a linearly separable rule from reliable examples

Unknown rule: S_T(ξ) = sign(B·ξ) = ±1

Defines the correct classification.

Parameterized through a teacher perceptron with weights B ∈ R^N (B·B = 1)

Only available information: the example data

D = { ξ^μ, S_T(ξ^μ) = sign(B·ξ^μ), for μ = 1…P }


Page 16: Mini-course on Artificial Neural Networks and Bayesian Networks

Learning a linearly… (Cont.)

Training: finding the student weights J
– J parameterizes a hypothesis S_S(ξ) = sign(J·ξ)
– Supervised learning is based on the student's performance with respect to the training data D
– Binary error measure:

ε_T^μ(J) = Θ[−S_S(ξ^μ) S_T(ξ^μ)]

ε_T^μ(J) = 1 if S_S(ξ^μ) ≠ S_T(ξ^μ)

ε_T^μ(J) = 0 if S_S(ξ^μ) = S_T(ξ^μ)


Page 17: Mini-course on Artificial Neural Networks and Bayesian Networks

Off-line learning

Guided by the minimization of a cost function H(J), e.g., the training error

H(J) = Σ_μ ε_T^μ(J)

Equilibrium statistical mechanics treatment:
– Energy H of N degrees of freedom
– Ensemble of systems in thermal equilibrium at a formal temperature
– Disorder average over random examples (replicas) assumes a distribution over the inputs
– Macroscopic description, order parameters
– Typical properties of large systems, P = αN


Page 18: Mini-course on Artificial Neural Networks and Bayesian Networks

On-line training

Single presentation of an uncorrelated (new) example {ξ^μ, S_T(ξ^μ)}

Update of student weights

Learning dynamics in discrete time


Page 19: Mini-course on Artificial Neural Networks and Bayesian Networks

On-line training - Statistical Physics approach

– Consider a sequence of independent, random examples
– Thermodynamic limit N → ∞
– Disorder average over the latest example; self-averaging properties
– Continuous time limit


Page 20: Mini-course on Artificial Neural Networks and Bayesian Networks

Generalization

Performance of the student (after training) with respect to arbitrary, new input

In practice: empirical mean of the error measure over a set of test inputs

In the theoretical analysis: average over the (assumed) probability density of inputs

Generalization error:


Page 21: Mini-course on Artificial Neural Networks and Bayesian Networks

Generalization (cont.)

The simplest model distribution:

Isotropic density P(ξ), uncorrelated with B and J

Consider vectors ξ of independent, identically distributed (iid) components ξ_j with ⟨ξ_j⟩ = 0, ⟨ξ_j²⟩ = 1


Page 22: Mini-course on Artificial Neural Networks and Bayesian Networks

Geometric argument

Projection of the data into the (B, J)-plane yields an isotropic density of inputs

ε_g = θ/π, where θ is the angle between B and J (for |B| = 1)


Page 23: Mini-course on Artificial Neural Networks and Bayesian Networks

Overlap Parameters

Sufficient to quantify the success of learning

R = B·J,  Q = J·J

Random guessing: R = 0, ε_g = 1/2

Perfect generalization: R/√Q = 1, ε_g = 0
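The relation between the overlaps and the generalization error, ε_g = arccos(R/√Q)/π, can be checked numerically. The course points to matlab; the sketch below uses Python/NumPy instead, and the dimension N and the particular student vector are arbitrary choices, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200                                   # input dimension (arbitrary choice)
B = rng.standard_normal(N)
B /= np.linalg.norm(B)                    # teacher weights, B·B = 1
# An arbitrary student vector with overlap R ~ 0.6 and norm Q ~ 1
J = 0.6 * B + 0.8 * rng.standard_normal(N) / np.sqrt(N)

R = J @ B                                 # teacher-student overlap
Q = J @ J                                 # student norm squared
eps_theory = np.arccos(R / np.sqrt(Q)) / np.pi

# Empirical generalization error: disagreement rate on fresh random inputs
xi = rng.standard_normal((50_000, N))
eps_empirical = np.mean(np.sign(xi @ J) != np.sign(xi @ B))
```

For Gaussian inputs the fields (x, y) are exactly jointly Gaussian, so the empirical rate matches the formula up to Monte Carlo noise.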


Page 24: Mini-course on Artificial Neural Networks and Bayesian Networks

Derivation for large N

Given B, J, and an uncorrelated random input with ⟨ξ_i⟩ = 0, ⟨ξ_i ξ_j⟩ = δ_ij, consider student/teacher fields that are sums of (many) independent random quantities:

x = J·ξ = Σ_i J_i ξ_i

y = B·ξ = Σ_i B_i ξ_i


Page 25: Mini-course on Artificial Neural Networks and Bayesian Networks

Central Limit Theorem

The joint density of (x, y) is, for N → ∞, a two-dimensional Gaussian, fully specified by the first and second moments:

⟨x⟩ = Σ_i J_i ⟨ξ_i⟩ = 0,  ⟨y⟩ = Σ_i B_i ⟨ξ_i⟩ = 0

⟨x²⟩ = Σ_ij J_i J_j ⟨ξ_i ξ_j⟩ = Σ_i J_i² = Q

⟨y²⟩ = Σ_ij B_i B_j ⟨ξ_i ξ_j⟩ = Σ_i B_i² = 1

⟨xy⟩ = Σ_ij J_i B_j ⟨ξ_i ξ_j⟩ = Σ_i J_i B_i = R


Page 26: Mini-course on Artificial Neural Networks and Bayesian Networks

Central Limit Theorem (Cont.)

Details of the input are irrelevant.

Some possible examples: binary ξ_i = ±1 with equal probability; uniform; Gaussian.


Page 27: Mini-course on Artificial Neural Networks and Bayesian Networks

Generalization Error

The isotropic distribution is also assumed to describe the statistics of the example data inputs


Exercise: Derive the generalization error as a function of R and Q (use the mathematical notes)

Page 28: Mini-course on Artificial Neural Networks and Bayesian Networks

Assumptions about the data

– No spatial correlations
– No distinguished directions in input space
– No temporal correlations
– No correlations with the rule
– Single presentation without repetitions

Consequences:
– The average over the data can be performed step by step
– The actual choice of B is irrelevant; it is not necessary to average over the teacher


Page 29: Mini-course on Artificial Neural Networks and Bayesian Networks

Hebbian learning (revisited) Hebb 1949

Off-line interpretation (Vallet 1989)

Choice of student weights given D = {ξ^μ, S_T^μ}, μ = 1…P:

J(P) = (1/N) Σ_μ S_T^μ ξ^μ

Equivalent on-line interpretation

Dynamics upon single presentation of examples:

J(μ) = J(μ−1) + S_T^μ ξ^μ / N
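The equivalence of the two interpretations is easy to verify numerically: a single on-line sweep with unit modulation accumulates exactly the off-line sum. A sketch in Python/NumPy (N, P, and the binary input distribution are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 100, 500                            # dimension and examples (arbitrary)
B = rng.standard_normal(N)
B /= np.linalg.norm(B)                     # teacher perceptron, |B| = 1
xi = rng.choice([-1.0, 1.0], size=(P, N))  # binary example inputs
S_T = np.sign(xi @ B)                      # teacher labels

# Off-line form: J(P) = (1/N) * sum_mu S_T^mu * xi^mu
J_offline = (S_T @ xi) / N

# On-line form: one pass, J(mu) = J(mu-1) + S_T^mu * xi^mu / N
J_online = np.zeros(N)
for x, s in zip(xi, S_T):
    J_online += s * x / N
```

Both forms give the same weight vector, and its overlap R = B·J with the teacher is positive from the very first examples.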


Page 30: Mini-course on Artificial Neural Networks and Bayesian Networks

Hebb: on-line

From microscopic to macroscopic: recursions for overlaps


Exercise: Derive the update equations for R and Q

Page 31: Mini-course on Artificial Neural Networks and Bayesian Networks

Hebb: on-line (Cont.)

Average over the latest example …

The random input enters only through the fields x and y

The random input and J(μ−1), B are statistically independent

The Central Limit Theorem applies and yields the joint density


Page 32: Mini-course on Artificial Neural Networks and Bayesian Networks

Hebb: on-line (Cont.)


Exercise: Derive the update equations for R and Q as a function of α (use the mathematical notes) [off-line]

Page 33: Mini-course on Artificial Neural Networks and Bayesian Networks

Hebb: on-line (Cont.)

Continuous time limit, N → ∞: α = μ/N, dα = 1/N


Initial conditions (tabula rasa): R(0) = Q(0) = 0

What are the mean values after training with P = αN examples?

[See matlab code]

Page 34: Mini-course on Artificial Neural Networks and Bayesian Networks

Hebb: on-line mean values

The order parameters Q and R are self-averaging for infinite N

Self-averaging property of A(J):
– The observation of a value of A different from its mean occurs with vanishing probability


Page 35: Mini-course on Artificial Neural Networks and Bayesian Networks

Learning curve: dependence on the order parameters


Exercise: Solve the differential equations for R and Q

Exercise: Find the function ε_g(α)
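One numerical route through these exercises: averaging the Hebb update under the stated assumptions gives the flow equations dR/dα = √(2/π) and dQ/dα = 2√(2/π)R + 1 (this reconstruction is mine, not copied from the slides). A short Euler integration in Python/NumPy then reproduces the closed-form solution R = √(2/π)α, Q = α + (2/π)α² and the asymptotic decay ε_g ≈ 1/√(2πα):

```python
import numpy as np

# Reconstructed flow equations for on-line Hebb learning (tabula rasa):
#   dR/dalpha = sqrt(2/pi)
#   dQ/dalpha = 2*sqrt(2/pi)*R + 1
c = np.sqrt(2.0 / np.pi)
dalpha = 1e-4
alphas = np.arange(0.0, 20.0, dalpha)
R = np.zeros_like(alphas)
Q = np.zeros_like(alphas)
for i in range(1, len(alphas)):            # forward Euler integration
    R[i] = R[i - 1] + c * dalpha
    Q[i] = Q[i - 1] + (2.0 * c * R[i - 1] + 1.0) * dalpha

# Generalization error eps_g(alpha) = arccos(R / sqrt(Q)) / pi
with np.errstate(invalid="ignore", divide="ignore"):
    eps = np.arccos(R / np.sqrt(Q)) / np.pi
eps[0] = 0.5                               # random guessing at alpha = 0
```

The integration range α ∈ [0, 20] and the step size are arbitrary choices; the curve starts at the random-guessing value 1/2 and decays toward zero.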

Page 36: Mini-course on Artificial Neural Networks and Bayesian Networks

Learning curve: dependence on the order parameters


The normalized overlap ρ = R/√Q between the two vectors B and J provides the angle θ between them:

−1 ≤ cos θ = ρ ≤ 1,  ε_g = θ/π

Page 37: Mini-course on Artificial Neural Networks and Bayesian Networks

Learning curve: dependence on the order parameters


Exercise: Find the asymptotic behavior of ε_g(α)

Page 38: Mini-course on Artificial Neural Networks and Bayesian Networks

Asymptotic expansion [draw w. matlab]


Page 39: Mini-course on Artificial Neural Networks and Bayesian Networks

Questions:

What are other learning algorithms that can be used for efficient learning?

What training algorithm will provide the best learning/ the fastest asymptotic decrease?


Page 40: Mini-course on Artificial Neural Networks and Bayesian Networks

Modified Hebbian learning

The training algorithm is defined by a modulation function f

J(μ) = J(μ−1) + f(…) S_T^μ ξ^μ / N

Restriction: f may depend only on the available quantities: f(J(μ−1), ξ^μ, S_T^μ)


Page 41: Mini-course on Artificial Neural Networks and Bayesian Networks

Perceptron Rosenblatt 1959


If the classification is correct, don't change the weights.

If the classification is incorrect:
– if the right class for the example is +1, J(μ)·ξ^μ increases
– if the right class for the example is −1, J(μ)·ξ^μ decreases

Page 42: Mini-course on Artificial Neural Networks and Bayesian Networks

Perceptron


– Only informative points are used (mistake-driven)
– The solution is a linear combination of the training points
– Converges only for linearly separable data

Exercise: Derive the update equations for R and Q as a function of η, J, B and ξ
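The mistake-driven rule described above can be sketched in a few lines of Python/NumPy (the course suggests matlab); the dimension N, the number of examples, and the effective learning rate 1/N are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
N, P = 50, 20_000                     # dimension and examples (arbitrary)
B = rng.standard_normal(N)
B /= np.linalg.norm(B)                # teacher defining the rule
J = np.zeros(N)                       # student, tabula rasa

for _ in range(P):
    xi = rng.standard_normal(N)
    s_t = np.sign(B @ xi)             # correct class from the teacher
    if np.sign(J @ xi) != s_t:        # mistake-driven: update only when wrong
        J += s_t * xi / N             # moves J·xi toward the correct sign

# Normalized overlap rho = R/sqrt(Q) = cos(angle between J and B)
rho = (J @ B) / np.linalg.norm(J)
```

As training proceeds, ρ grows toward 1, i.e., the student direction aligns with the teacher.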

Page 43: Mini-course on Artificial Neural Networks and Bayesian Networks

On-line dynamics Biehl and Riegler 1994


Page 44: Mini-course on Artificial Neural Networks and Bayesian Networks

Questions:

Find the asymptotic behavior (by simulations and/or analytically) of the generalization error for the perceptron algorithm and the Hebb algorithm; which one is better?

What training algorithm will provide the best learning/ the fastest asymptotic decrease?


Page 45: Mini-course on Artificial Neural Networks and Bayesian Networks

Learning Curve - Hebb and Perceptron


Page 46: Mini-course on Artificial Neural Networks and Bayesian Networks

Section 2.2: On-line by gradient descent


Page 47: Mini-course on Artificial Neural Networks and Bayesian Networks

Introduction

Commonly used in practical applications:

Multilayered neural network with continuous activation functions, where output is a differentiable function of the adaptive parameters

Can be used for fitting a function to data


Page 48: Mini-course on Artificial Neural Networks and Bayesian Networks

Linear perceptron and linear regression (1D)

x = J·ξ

Using a quadratic loss function and gradient descent to find the best curve fitting a data set [off-line]


Page 49: Mini-course on Artificial Neural Networks and Bayesian Networks

Simple case: ‘Linear perceptron’

Teacher: S_T(ξ) = y = B·ξ
Student: S_S(ξ) = x = J·ξ

Training and performance evaluation are based on the quadratic error


Consider the training dynamics

Exercise: Derive the update equations for R and Q as a function of α

Page 50: Mini-course on Artificial Neural Networks and Bayesian Networks

‘Linear perceptron’ (cont.)

Some exercises:
– Write matlab code for the linear perceptron, teacher-student scenario
– Show that …
– Investigate the role of the learning rate η
– Find the asymptotic decrease to zero error
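A sketch of the first exercise in Python/NumPy rather than matlab; N, the number of steps, and the learning rate η are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(3)
N, steps, eta = 100, 20_000, 0.05     # all arbitrary choices
B = rng.standard_normal(N)
B /= np.linalg.norm(B)                # teacher: y = B·xi
J = np.zeros(N)                       # student: x = J·xi

for _ in range(steps):
    xi = rng.standard_normal(N)
    x, y = J @ xi, B @ xi
    # on-line gradient step on the quadratic error (x - y)^2 / 2
    J -= eta * (x - y) * xi / N
```

Because the rule is realizable (the noise term vanishes as J approaches B), the student converges to the teacher; varying η trades convergence speed against stability.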


Page 51: Mini-course on Artificial Neural Networks and Bayesian Networks

Adatron: binary output

J(μ) = J(μ−1) + f(…) S_T^μ ξ^μ / N

Some exercises:
– Write matlab code for the linear perceptron, teacher-student scenario
– Find the asymptotic decrease to zero error
– Compare with the performance of the Perceptron and the Hebb rule


Page 52: Mini-course on Artificial Neural Networks and Bayesian Networks

Multilayered feed-forward NN

Example architecture: the soft-committee machine


Page 53: Mini-course on Artificial Neural Networks and Bayesian Networks

Multilayered ff NN (cont.)

Transfer function: sigmoidal g(x), e.g., g(x) = tanh(x) or g(x) = erf(x/√2)

Error function is defined:

The total output:
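The slide's formulas did not survive the transcript; assuming the standard soft-committee form σ(ξ) = Σ_k v_k g(J_k·ξ) with g = tanh, a minimal forward pass looks like:

```python
import numpy as np

def soft_committee(xi, J, v):
    """sigma(xi) = sum_k v_k * g(J_k · xi), with g = tanh."""
    return v @ np.tanh(J @ xi)

rng = np.random.default_rng(4)
N, K = 10, 3                                   # input dim, hidden units (arbitrary)
J = rng.standard_normal((K, N)) / np.sqrt(N)   # input-to-hidden weights
v = np.ones(K)                                 # hidden-to-output weights
out = soft_committee(rng.standard_normal(N), J, v)
```

With v fixed to ones, the output is bounded by K in absolute value, since |tanh| < 1.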


Page 54: Mini-course on Artificial Neural Networks and Bayesian Networks

Teacher-Student scenario

If teacher and student have the same architecture, but the student has K hidden units and the teacher has M hidden units,

can the student learn the rule?


K < M: unlearnable rule

K = M: learnable rule

K > M: over-learnable rule

In the following we will discuss matching architectures

Page 55: Mini-course on Artificial Neural Networks and Bayesian Networks

The error measure

One (obvious) choice for continuous outputs


Page 56: Mini-course on Artificial Neural Networks and Bayesian Networks

On-line gradient descent

Assuming the same learning rate η over the whole network, the update equations for fixed, known hidden-to-output weights {v_k} are:


Page 57: Mini-course on Artificial Neural Networks and Bayesian Networks

Assumptions and definitions

– Isotropic, uncorrelated input data
– The number of input components N is huge


The rule is specified by the norms of the teacher, say all 1s

Page 58: Mini-course on Artificial Neural Networks and Bayesian Networks

Order parameters role

The set of order parameters and weights is sufficient for describing the learning; this is the macroscopic set of parameters


Microscopic: KN+K degrees of freedom

Macroscopic: K(K-1)/2+KM+K different order parameters

Page 59: Mini-course on Artificial Neural Networks and Bayesian Networks

Generalization error for the erf transfer function, Saad & Solla 1995

Reflects the symmetries of the soft committee machine


Page 60: Mini-course on Artificial Neural Networks and Bayesian Networks

Permutation Symmetry

The generalization error is characterized by invariance under permutations of branches

How do you think this feature affects learning performance?


Page 61: Mini-course on Artificial Neural Networks and Bayesian Networks

A simple case

Hidden-to-output weights are fixed and known: w_i = v_i = 1

The update rule is


Page 62: Mini-course on Artificial Neural Networks and Bayesian Networks

Update of the order parameters


Page 63: Mini-course on Artificial Neural Networks and Bayesian Networks

Differential Equations


Page 64: Mini-course on Artificial Neural Networks and Bayesian Networks

Learning curves


Page 65: Mini-course on Artificial Neural Networks and Bayesian Networks

Section 3: Unsupervised learning


Based on slides from Michael Biehl’s summer course


Page 66: Mini-course on Artificial Neural Networks and Bayesian Networks

Introduction

Learning without a teacher!?

Real-world data is, in general, not isotropic and structureless in input space.

Unsupervised learning = extraction of information from unlabelled inputs


Page 67: Mini-course on Artificial Neural Networks and Bayesian Networks

Potential aims

– Correlation analysis
– Clustering of data: grouping according to some similarity criterion
– Identification of prototypes: representing a large amount of data by a few examples
– Dimension reduction: representing high-dimensional data by a few relevant features


Page 68: Mini-course on Artificial Neural Networks and Bayesian Networks

A simple example

Prototypes for high dimensional data – directions in the space

Assume data points are distributed as


Page 69: Mini-course on Artificial Neural Networks and Bayesian Networks

A simple example (cont)

The student's task is to find the directions B_1 and B_2. The data looks different in different planes!


Page 70: Mini-course on Artificial Neural Networks and Bayesian Networks

Student scenario

Search for the two vectors using two student vectors:
– Define a set of possible learning rules
– Analyze learning abilities
– Compare and choose the best learning rule

This provides the two principal components of the data.


Page 71: Mini-course on Artificial Neural Networks and Bayesian Networks

PCA: General setting [matlab]


Given a set of data points X = {x_1, …, x_N}:

1. Compute the covariance matrix
   Σ = (1/N) Σ_{n=1}^{N} (x_n − x̄)(x_n − x̄)^T

2. Compute the eigenvalues and eigenvectors of the covariance matrix.

3. Arrange the eigenvalues from the biggest to the smallest. Take the first d eigenvectors as principal components if the input dimensionality is to be reduced to d.

4. Project the input data onto the principal components; these projections form the reduced representation of the input data.
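The four numbered steps can be sketched in a few lines of NumPy (an illustrative sketch rather than the course's MATLAB demo; the function name and toy data are my own):

```python
import numpy as np

def pca(X, d):
    """Reduce an N x D data matrix X to d dimensions via PCA."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)              # center the data
    cov = Xc.T @ Xc / X.shape[0]         # step 1: covariance matrix
    evals, evecs = np.linalg.eigh(cov)   # step 2: eigenvalues / eigenvectors
    order = np.argsort(evals)[::-1]      # step 3: sort eigenvalues, biggest first
    components = evecs[:, order[:d]]     # first d principal components
    return Xc @ components               # step 4: project the data

# toy usage: 200 points stretched strongly along the first axis
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
Y = pca(X, 1)
print(Y.shape)  # (200, 1)
```

The variance of the one-dimensional projection is the largest eigenvalue of the covariance matrix, so it is at least the variance along any single original coordinate.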

Page 72: Mini-course on Artificial Neural Networks and Bayesian Networks

Principal Component Analysis

Algebraic viewpoint: given the data, find the linear transformation that minimizes the sum of squared distances over all linear transformations.

Statistical viewpoint: given the data, assume each point is a random variable sampled from a Gaussian with unit covariance. Find the ML estimator of the means under the constraint that there are K different means that are linearly related to the data.


Page 73: Mini-course on Artificial Neural Networks and Bayesian Networks

Example: vision


Page 74: Mini-course on Artificial Neural Networks and Bayesian Networks

Example: vision (cont)


Average results for each of the 6400 pixels

Page 75: Mini-course on Artificial Neural Networks and Bayesian Networks

First nine eigenfaces


Page 76: Mini-course on Artificial Neural Networks and Bayesian Networks

Dimensionality Reduction

The goal is to compress information with minimal loss.

Methods (unsupervised learning):
– Principal Component Analysis
– Nonnegative Matrix Factorization
– Bayesian models (the matrices are probabilities)


Page 77: Mini-course on Artificial Neural Networks and Bayesian Networks

Section 4: Bayesian Networks

Some slides are from Baldi’s course on Neural Networks


Page 78: Mini-course on Artificial Neural Networks and Bayesian Networks

Bayesian Statistics

Bayesian framework for induction: we start with a hypothesis space and wish to express relative preferences in terms of background information (the Cox–Jaynes axioms).

Axiom 0: Transitivity of preferences.
Theorem 1: Preferences can be represented by a real number π(A).
Axiom 1: There exists a function f such that π(non A) = f(π(A)).
Axiom 2: There exists a function F such that π(A, B) = F(π(A), π(B|A)).
Theorem 2: There is always a rescaling w such that p(A) = w(π(A)) is in [0, 1] and satisfies the sum and product rules.


Page 79: Mini-course on Artificial Neural Networks and Bayesian Networks

Probability as Degree of Belief

Sum Rule: P(non A) = 1 − P(A)

Product Rule: P(A and B) = P(A) P(B|A)

Bayes' Theorem: P(B|A) = P(A|B) P(B) / P(A)

Induction Form: P(M|D) = P(D|M) P(M) / P(D)

Equivalently: log P(M|D) = log P(D|M) + log P(M) − log P(D)
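A quick numeric check of the induction form (the prior and likelihood values below are invented for illustration):

```python
# Hypothetical numbers for illustration: prior P(M) = 0.01 on a model M,
# likelihoods P(D|M) = 0.9 and P(D|not M) = 0.05 for observing data D.
p_m = 0.01
p_d_given_m = 0.9
p_d_given_not_m = 0.05

# Sum and product rules give the evidence P(D):
p_d = p_d_given_m * p_m + p_d_given_not_m * (1 - p_m)

# Bayes' theorem (induction form): P(M|D) = P(D|M) P(M) / P(D)
p_m_given_d = p_d_given_m * p_m / p_d
print(round(p_m_given_d, 4))  # 0.1538 -- the data raise P(M) from 0.01 to ~0.15
```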


Page 80: Mini-course on Artificial Neural Networks and Bayesian Networks

The Asia problem

“Shortness-of-breath (dyspnoea) may be due to Tuberculosis, Lung cancer or bronchitis, or none of them. A recent visit to Asia increases the chances of tuberculosis, while Smoking is known to be a risk factor for both lung cancer and Bronchitis. The results of a single chest X-ray do not discriminate between lung cancer and tuberculosis, as neither does the presence or absence of Dyspnoea.”


Lauritzen & Spiegelhalter 1988


Page 81: Mini-course on Artificial Neural Networks and Bayesian Networks

Graphical models

“A successful marriage between Probability Theory and Graph Theory”

M. I. Jordan


Example: three nodes x1, x2, x3; the joint distribution factorizes over potentials on the graph, e.g. P(x1, x2, x3) ∝ ψ(x1, x3) ψ(x2, x3).

Applications: vision, speech recognition, error-correcting codes, bioinformatics.


Page 82: Mini-course on Artificial Neural Networks and Bayesian Networks

Directed acyclic Graphs

Involves conditional dependencies


Example: nodes x1, x2, x3 with arrows x1 → x3 and x2 → x3:

P(x1, x2, x3) = P(x1) P(x2) P(x3 | x1, x2)
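For binary variables this factorization can be checked by brute-force enumeration (the conditional probability tables below are hypothetical, chosen only for illustration):

```python
from itertools import product

# Hypothetical CPTs for binary variables x1, x2, x3 (arrows x1 -> x3 <- x2)
p_x1 = {0: 0.7, 1: 0.3}                 # P(x1)
p_x2 = {0: 0.6, 1: 0.4}                 # P(x2)
p_x3 = {(0, 0): 0.1, (0, 1): 0.5,
        (1, 0): 0.4, (1, 1): 0.9}       # P(x3 = 1 | x1, x2)

def joint(x1, x2, x3):
    """P(x1, x2, x3) = P(x1) P(x2) P(x3 | x1, x2)."""
    p3 = p_x3[(x1, x2)]
    return p_x1[x1] * p_x2[x2] * (p3 if x3 == 1 else 1 - p3)

# The joint sums to 1 over all 2^3 configurations:
total = sum(joint(*cfg) for cfg in product([0, 1], repeat=3))
print(round(total, 10))  # 1.0

# Marginal inference by summing the joint, e.g. P(x3 = 1):
p_x3_1 = sum(joint(a, b, 1) for a, b in product([0, 1], repeat=2))
print(round(p_x3_1, 4))  # 0.362
```

The same enumeration, scaled up, is what exact inference in the Asia network does; real networks need smarter algorithms because the state space grows exponentially with the number of nodes.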


Page 83: Mini-course on Artificial Neural Networks and Bayesian Networks

Directed Graphical Models (2)

Each node is associated with a random variable.

Each arrow encodes a conditional dependency (parent → child).

A shaded node indicates an observed variable.

Plates stand for repeated i.i.d. draws of the random variables.


Page 84: Mini-course on Artificial Neural Networks and Bayesian Networks

Classification problem

This problem is ‘unsupervised’: one searches for the labels that best fit the data without any labeled examples.

Perceptrons and support vector machines are widely used for classification; these are discriminative methods.


Page 85: Mini-course on Artificial Neural Networks and Bayesian Networks

Classification: assigning labels to data


Discrete classifier: models the boundaries between the different classes of the data; predicts a categorical output (e.g., SVM).

Density estimator: models the distribution of the data points themselves; generative models (e.g., Naïve Bayes).

Page 86: Mini-course on Artificial Neural Networks and Bayesian Networks

Density estimator

The simplest model for density estimation is the Naïve Bayes classifier


Assumes that the features of each data point are distributed independently (given the class):

– Results in a trivial learning algorithm
– Usually does not suffer from overfitting
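A minimal sketch of such a Naïve Bayes density estimator for binary features (the toy data and the Laplace smoothing constant are my own choices, not from the course):

```python
import math

def train_nb(X, y, alpha=1.0):
    """Estimate P(class) and Bernoulli P(feature=1 | class) with Laplace smoothing."""
    classes = sorted(set(y))
    n_features = len(X[0])
    prior, cond = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        prior[c] = len(rows) / len(X)
        cond[c] = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                   for j in range(n_features)]
    return prior, cond

def predict_nb(prior, cond, x):
    """Pick the class maximizing log P(c) + sum_j log P(x_j | c)."""
    def score(c):
        s = math.log(prior[c])
        for j, v in enumerate(x):
            p = cond[c][j]
            s += math.log(p if v else 1 - p)
        return s
    return max(prior, key=score)

# toy data: feature 0 indicates class 1, feature 1 indicates class 0
X = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]]
y = [1, 1, 1, 0, 0, 0]
prior, cond = train_nb(X, y)
print(predict_nb(prior, cond, [1, 0]))  # 1
```

"Trivial learning" here is literal: training is nothing more than counting feature occurrences per class.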

Page 87: Mini-course on Artificial Neural Networks and Bayesian Networks

Directed graph: ‘real world’ example


[Plate diagram of the author-topic model: observed author set a_d and words w; latent author x and topic z for each word; plates over N_d words, D documents, A authors, and T topics.]

Statistical modeling for data mining: in a huge corpus, authors and words are observed; topics and author-topic relations are learned.

Page 88: Mini-course on Artificial Neural Networks and Bayesian Networks

Goal

Automatically extract the topical content of documents and learn the association of topics to the authors of documents.

Expand existing probabilistic topic models to include author information.

Some queries the model should be able to answer:
– What topics does author X work on?
– Which authors work on topic X?
– What are interesting temporal patterns in topics?


Page 89: Mini-course on Artificial Neural Networks and Bayesian Networks

Previous topic-based models

Hofmann (1999): Probabilistic Latent Semantic Indexing (pLSI)
– EM implementation
– Problem of overfitting

Blei, Ng, & Jordan (2003): Latent Dirichlet Allocation (LDA)
– Clarified the pLSI model
– Variational EM; scalability?

Griffiths & Steyvers (PNAS 2004): Gibbs sampling technique for inference
– Computationally simple and efficient (linear in the size of the data); can easily be applied to >100K documents


Page 90: Mini-course on Artificial Neural Networks and Bayesian Networks


Page 91: Mini-course on Artificial Neural Networks and Bayesian Networks

Classification


Page 92: Mini-course on Artificial Neural Networks and Bayesian Networks

Topics Model for Semantic Representation

Based on slides by Professor Mark Steyvers; joint work of Mark Steyvers (UCI) and Tom Griffiths (Stanford)


Page 93: Mini-course on Artificial Neural Networks and Bayesian Networks

The DRM Paradigm

The Deese (1959), Roediger and McDermott (1995) paradigm:

Subjects hear a series of word lists during the study phase, each comprising semantically related items that are strongly related to another, non-presented word (the “false target”).

Subjects later receive recognition tests for all the studied words plus other distractor words, including the false target.

DRM experiments routinely demonstrate that subjects claim to recognize the false targets.


Page 94: Mini-course on Artificial Neural Networks and Bayesian Networks

Example: test of false memory effects in the DRM Paradigm

STUDY: Bed, Rest, Awake, Tired, Dream, Wake, Snooze, Blanket, Doze, Slumber, Snore, Nap, Peace, Yawn, Drowsy

FALSE RECALL: “Sleep” 61%


Page 95: Mini-course on Artificial Neural Networks and Bayesian Networks

A Rational Analysis of Semantic Memory

Our associative/semantic memory system might arise from the need to efficiently predict word usage with just a few basis functions (i.e., “concepts” or “topics”)

The topics model provides such a rational analysis


Page 96: Mini-course on Artificial Neural Networks and Bayesian Networks

A Spatial Representation: Latent Semantic Analysis (Landauer & Dumais, 1997)


[Figure: a document/term count matrix (rows: LOVE, SOUL, RESEARCH, SCIENCE; columns: Doc1, Doc2, Doc3, …) is reduced via SVD to a high-dimensional semantic space in which each word is a single point.]


Page 97: Mini-course on Artificial Neural Networks and Bayesian Networks

Triangle Inequality constraint on words with multiple meanings

Euclidean distance: AC ≤ AB + BC


[Figure: FIELD sits between MAGNETIC and SOCCER, with distances AB (MAGNETIC–FIELD), BC (FIELD–SOCCER), and AC (MAGNETIC–SOCCER).]


Page 98: Mini-course on Artificial Neural Networks and Bayesian Networks

A generative model for topics

Each document (i.e., context) is a mixture of topics.

Each topic is a distribution over words.

Each word is chosen from a single topic.


[Plate diagram: topic z → word w, with plates over N words, D documents, and T topics.]


Page 99: Mini-course on Artificial Neural Networks and Bayesian Networks

A toy example


Topic mixture: P(z = 1), P(z = 2); each word w_i is drawn from the chosen topic.

One topic, P(w | z): SCIENTIFIC 0.4, KNOWLEDGE 0.2, WORK 0.1, RESEARCH 0.1, MATHEMATICS 0.1, MYSTERY 0.1

The other topic, P(w | z): HEART 0.3, LOVE 0.2, SOUL 0.2, TEARS 0.1, MYSTERY 0.1, JOY 0.1

Words can occur in multiple topics (e.g., MYSTERY).
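The toy generative process can be simulated directly. The word probabilities below come from the slide; the numbering of the topics and the mixture-weight argument are choices of this sketch (the following slides associate topic 1 with the HEART words):

```python
import random

# P(w|z) for the two toy topics; here topic 1 generates the HEART/LOVE
# words and topic 2 the SCIENTIFIC/KNOWLEDGE words.
topics = {
    1: {"HEART": 0.3, "LOVE": 0.2, "SOUL": 0.2,
        "TEARS": 0.1, "MYSTERY": 0.1, "JOY": 0.1},
    2: {"SCIENTIFIC": 0.4, "KNOWLEDGE": 0.2, "WORK": 0.1,
        "RESEARCH": 0.1, "MATHEMATICS": 0.1, "MYSTERY": 0.1},
}

def generate_document(p_z1, n_words, rng):
    """For each word: pick a topic z from the mixture, then draw the word from P(w|z)."""
    doc = []
    for _ in range(n_words):
        z = 1 if rng.random() < p_z1 else 2
        words, probs = zip(*topics[z].items())
        doc.append(rng.choices(words, weights=probs)[0])
    return doc

rng = random.Random(0)
# all probability on one topic, as on the "P(z=1)=1" slide:
doc = generate_document(p_z1=1.0, n_words=5, rng=rng)
print(doc)  # only words from the HEART/LOVE topic
```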

Page 100: Mini-course on Artificial Neural Networks and Bayesian Networks

All probability to topic 1…


P(z = 1) = 1, P(z = 2) = 0: a single topic generates every word.

Resulting document: HEART, LOVE, JOY, SOUL, HEART, …

Page 101: Mini-course on Artificial Neural Networks and Bayesian Networks

All probability to topic 2…


P(z = 1) = 0, P(z = 2) = 1: the other topic generates every word.

Resulting document: SCIENTIFIC, KNOWLEDGE, SCIENTIFIC, RESEARCH, …

Page 102: Mini-course on Artificial Neural Networks and Bayesian Networks

Application to corpus data

TASA corpus: text from first grade to college – a representative sample of text
– 26,000+ word types (stop words removed)
– 37,000+ documents
– 6,000,000+ word tokens


Page 103: Mini-course on Artificial Neural Networks and Bayesian Networks

Fitting the model

Learning is unsupervised; learning means inverting the generative model.

– We estimate P(z | w): assign each word in the corpus to one of T topics.
– With T = 500 topics and 6×10^6 words, the size of the discrete state space is 500^6,000,000. Help!
– Efficient sampling approach: Markov Chain Monte Carlo (MCMC).
– Time and memory requirements are linear in T and N.


Page 104: Mini-course on Artificial Neural Networks and Bayesian Networks

Gibbs Sampling & MCMC (see Griffiths & Steyvers, 2003, for details)

Assign every word in corpus to one of T topics

Sampling distribution for z (the standard collapsed Gibbs update from Griffiths & Steyvers; α and β are the Dirichlet hyperparameters):

P(z_i = j | z_−i, w) ∝ (C^WT_{w_i, j} + β) / (Σ_w C^WT_{w, j} + Wβ) × (C^DT_{d_i, j} + α) / (Σ_t C^DT_{d_i, t} + Tα)

where C^WT_{w, j} is the number of times word w is assigned to topic j, and C^DT_{d, j} is the number of times topic j is used in document d (both counts excluding the current word i).
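A compact (and unoptimized) collapsed Gibbs sampler in this spirit might look as follows; the toy corpus, hyperparameter values, and function name are my own:

```python
import random
from collections import Counter

def gibbs_lda(docs, T, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for a topic model: repeatedly resample each
    word's topic from P(z=j | rest) ∝ (C_wj + beta)/(C_.j + W*beta) * (C_dj + alpha)."""
    rng = random.Random(seed)
    W = len({w for d in docs for w in d})   # vocabulary size
    z = [[rng.randrange(T) for _ in d] for d in docs]  # random initial topics
    cwt, ct, cdt = Counter(), Counter(), Counter()     # count tables
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            j = z[d][i]
            cwt[w, j] += 1; ct[j] += 1; cdt[d, j] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                j = z[d][i]                 # remove the current assignment
                cwt[w, j] -= 1; ct[j] -= 1; cdt[d, j] -= 1
                # document-length denominator is constant in t, so it is dropped
                weights = [(cwt[w, t] + beta) / (ct[t] + W * beta) * (cdt[d, t] + alpha)
                           for t in range(T)]
                j = rng.choices(range(T), weights=weights)[0]
                z[d][i] = j                 # restore counts with the new topic
                cwt[w, j] += 1; ct[j] += 1; cdt[d, j] += 1
    return z, cwt

# toy corpus with two obvious themes
docs = [["heart", "love", "soul"] * 3,
        ["science", "research", "work"] * 3,
        ["love", "soul", "heart", "science"]]
z, cwt = gibbs_lda(docs, T=2)
print(z[0])
```

Each sweep touches every word once, which is where the linear time and memory requirements in T and N come from.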


Page 105: Mini-course on Artificial Neural Networks and Bayesian Networks

A selection from 500 topics [P(w|z = j)]


– THEORY, SCIENTISTS, EXPERIMENT, OBSERVATIONS, SCIENTIFIC, EXPERIMENTS, HYPOTHESIS, EXPLAIN, SCIENTIST, OBSERVED, EXPLANATION, BASED, OBSERVATION, IDEA, EVIDENCE, THEORIES, BELIEVED, DISCOVERED

– SPACE, EARTH, MOON, PLANET, ROCKET, MARS, ORBIT, ASTRONAUTS, FIRST, SPACECRAFT, JUPITER, SATELLITE, SATELLITES, ATMOSPHERE, SPACESHIP, SURFACE, SCIENTISTS, ASTRONAUT

– ART, PAINT, ARTIST, PAINTING, PAINTED, ARTISTS, MUSEUM, WORK, PAINTINGS, STYLE, PICTURES, WORKS, OWN, SCULPTURE, PAINTER, ARTS, BEAUTIFUL, DESIGNS

– BRAIN, NERVE, SENSE, SENSES, ARE, NERVOUS, NERVES, BODY, SMELL, TASTE, TOUCH, MESSAGES, IMPULSES, CORD, ORGANS, SPINAL, FIBERS, SENSORY


Page 106: Mini-course on Artificial Neural Networks and Bayesian Networks

Polysemy: words with multiple meanings represented in different topics


– FIELD, MAGNETIC, MAGNET, WIRE, NEEDLE, CURRENT, COIL, POLES, IRON, COMPASS, LINES, CORE, ELECTRIC, DIRECTION, FORCE, MAGNETS, BE, MAGNETISM

– SCIENCE, STUDY, SCIENTISTS, SCIENTIFIC, KNOWLEDGE, WORK, RESEARCH, CHEMISTRY, TECHNOLOGY, MANY, MATHEMATICS, BIOLOGY, FIELD, PHYSICS, LABORATORY, STUDIES, WORLD, SCIENTIST

– BALL, GAME, TEAM, FOOTBALL, BASEBALL, PLAYERS, PLAY, FIELD, PLAYER, BASKETBALL, COACH, PLAYED, PLAYING, HIT, TENNIS, TEAMS, GAMES, SPORTS

– JOB, WORK, JOBS, CAREER, EXPERIENCE, EMPLOYMENT, OPPORTUNITIES, WORKING, TRAINING, SKILLS, CAREERS, POSITIONS, FIND, POSITION, FIELD, OCCUPATIONS, REQUIRE, OPPORTUNITY

(The word FIELD appears in all four topics, with a different meaning in each.)


Page 107: Mini-course on Artificial Neural Networks and Bayesian Networks

Predicting word association

LSA: finds the closest word in the semantic space.

Topics model: does inference – given that one word was observed, which word will follow with the highest probability?


Page 108: Mini-course on Artificial Neural Networks and Bayesian Networks

Word Association (norms from Nelson et al. 1998)


CUE: PLANET

People's associates (Nelson et al. norms): 1. EARTH, 2. STARS, 3. SPACE, 4. SUN, 5. MARS

Model's associates: STARS, SUN, EARTH, SPACE, SKY


First associate “EARTH” is in the set of 5 associates (from the model)

Page 109: Mini-course on Artificial Neural Networks and Bayesian Networks

P( set contains first associate )


[Plot: P(set contains first associate), from 0 to 1, versus set size (10^0 to 10^3, log scale), comparing LSA and the TOPICS model.]

Page 110: Mini-course on Artificial Neural Networks and Bayesian Networks

Explaining variability in false recall

One factor: mean associative strength of list items to critical item (Deese 1959; Roediger et al. 2001).


[Figure: arrows from the study-list words (BED, REST, AWAKE, TIRED, …, DROWSY) to the critical item SLEEP, with associative strengths (.638, .475, .618, .493, .551, …); mean = .431. For 55 DRM lists, R = .69 (with the given lexicon).]

Page 111: Mini-course on Artificial Neural Networks and Bayesian Networks

One recall component: inference

Encoding: the study words lead to a stored topics distribution (the “gist”).

Retrieval: words are inferred from the stored topics distribution.


Page 112: Mini-course on Artificial Neural Networks and Bayesian Networks

Predictions for the “Sleep” list


[Bar plot of P(w | study) for the “Sleep” list, on a 0–0.25 scale. Study-list words: REST, TIRED, BED, WAKE, AWAKE, NAP, DREAM, YAWN, DROWSY, BLANKET, SNORE, SLUMBER, DOZE, PEACE. Top 8 extra-list words: SLEEP, NIGHT, HOURS, MORNING, ASLEEP, SLEEPY, AWAKENED, SPENT; the critical non-studied word SLEEP heads the extra-list set.]

Page 113: Mini-course on Artificial Neural Networks and Bayesian Networks

Correlation between intrusion rates and predictions


[Plots: correlation (0.2–0.8, word association) as a function of the number of LSA dimensions (0–800) and of the number of topics in the TOPICS model (0–2000).]

Page 114: Mini-course on Artificial Neural Networks and Bayesian Networks

Other recall components??? One possibility: two routes add strength


[Repeats the P(w | study) bar plot from Page 112 for the “Sleep” list, with the study-list words and the top 8 extra-list words.]