
Page 1: CMSC 671 Fall 2001


CMSC 671 Fall 2001

Class #25-26 – Tuesday, November 27 / Thursday, November 29

Page 2: CMSC 671 Fall 2001


Today’s class

• Neural networks

• Bayesian learning

Page 3: CMSC 671 Fall 2001


Machine Learning: Neural and Bayesian

Chapter 19

Some material adapted from lecture notes by Lise Getoor and Ron Parr

Page 4: CMSC 671 Fall 2001


Neural function

• Brain function (thought) occurs as the result of the firing of neurons

• Neurons connect to each other through synapses, which propagate action potentials (electrical impulses) by releasing neurotransmitters

• Synapses can be excitatory (potential-increasing) or inhibitory (potential-decreasing), and have varying activation thresholds

• Learning occurs as a result of the synapses’ plasticity: they exhibit long-term changes in connection strength

• There are about 10^11 neurons and about 10^14 synapses in the human brain

Page 5: CMSC 671 Fall 2001


Biology of a neuron

Page 6: CMSC 671 Fall 2001


Brain structure

• Different areas of the brain have different functions
– Some areas seem to have the same function in all humans (e.g., Broca’s region); the overall layout is generally consistent
– Some areas are more plastic, and vary in their function; also, the lower-level structure and function vary greatly

• We don’t know how different functions are “assigned” or acquired
– Partly the result of the physical layout / connection to inputs (sensors) and outputs (effectors)
– Partly the result of experience (learning)

• We really don’t understand how this neural structure leads to what we perceive as “consciousness” or “thought”

• Our neural networks are not nearly as complex or intricate as the actual brain structure

Page 7: CMSC 671 Fall 2001


Comparison of computing power

• Computers are way faster than neurons…

• But there are a lot more neurons than we can reasonably model in modern digital computers, and they all fire in parallel

• Neural networks are designed to be massively parallel

• The brain is effectively a billion times faster

Page 8: CMSC 671 Fall 2001


Neural networks

• Neural networks are made up of nodes or units, connected by links

• Each link has an associated weight and activation level

• Each node has an input function (typically summing over weighted inputs), an activation function, and an output

Page 9: CMSC 671 Fall 2001


Layered feed-forward network

[Figure: a layered feed-forward network, with a layer of input units feeding a layer of hidden units, which in turn feeds a layer of output units]

Page 10: CMSC 671 Fall 2001


Neural unit

Page 11: CMSC 671 Fall 2001


“Executing” neural networks

• Input units are set by some exterior function (think of these as sensors), which causes their output links to be activated at the specified level

• Working forward through the network, the input function of each unit is applied to compute the input value
– Usually this is just the weighted sum of the activation on the links feeding into this node

• The activation function transforms this input function into a final value
– Typically this is a nonlinear function, often a sigmoid function corresponding to the “threshold” of that node
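To make the forward pass concrete, here is a minimal Python sketch (not from the slides; the layer sizes, weights, and helper names such as `forward` are illustrative): each unit computes the weighted sum of its incoming activations plus a bias, then applies a sigmoid activation.

```python
import math

def sigmoid(x):
    """Sigmoid activation: squashes the weighted-sum input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, layers):
    """Propagate activations forward through a layered network.

    `layers` is a list of weight matrices; layers[k][j] holds the weights
    from every unit in layer k (plus a bias as the last entry) to unit j
    of the next layer.
    """
    activations = list(inputs)
    for weight_matrix in layers:
        next_activations = []
        for weights in weight_matrix:
            # Input function: weighted sum of incoming activations plus bias
            total = sum(w * a for w, a in zip(weights[:-1], activations))
            total += weights[-1]              # bias weight
            # Activation function: nonlinear "threshold"
            next_activations.append(sigmoid(total))
        activations = next_activations
    return activations

# Example: 2 inputs -> 2 hidden units -> 1 output unit (weights are arbitrary)
hidden = [[0.5, -0.6, 0.1], [0.3, 0.8, -0.2]]
output = [[1.0, -1.0, 0.05]]
print(forward([1.0, 0.0], [hidden, output]))
```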

Page 12: CMSC 671 Fall 2001


Learning neural networks

• Backpropagation

• Cascade correlation: adding hidden units
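The slide only names backpropagation, so as a hedged illustration of the underlying idea (gradient-descent weight updates), here is a sketch of the delta rule for a single sigmoid unit; the learning rate, epoch count, and training data are made up for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_unit(examples, n_inputs, rate=0.5, epochs=1000):
    """Gradient-descent training of one sigmoid unit (delta rule).

    `examples` is a list of (input_vector, target) pairs with targets in [0, 1].
    """
    weights = [0.0] * (n_inputs + 1)         # last weight is the bias
    for _ in range(epochs):
        for x, target in examples:
            out = sigmoid(sum(w * xi for w, xi in zip(weights[:-1], x)) + weights[-1])
            # Gradient of squared error for a sigmoid unit
            delta = (target - out) * out * (1.0 - out)
            for i, xi in enumerate(x):
                weights[i] += rate * delta * xi
            weights[-1] += rate * delta       # bias update
    return weights

# Example: learn the OR function
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
print(train_unit(data, n_inputs=2))
```

Backpropagation extends this single-unit rule by propagating the error terms backward through the hidden layers.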

Page 13: CMSC 671 Fall 2001


Learning Bayesian networks

• Given training set D = {x[1], ..., x[M]}

• Find B that best matches D
– model selection
– parameter estimation

[Figure: the data D, a table of samples E[1] B[1] A[1] C[1] through E[M] B[M] A[M] C[M], is fed into an Inducer, which outputs a Bayesian network over B, E, A, and C]

Page 14: CMSC 671 Fall 2001


Parameter estimation

• Assume known structure

• Goal: estimate BN parameters – entries in local probability models, P(X | Parents(X))

• A parameterization Θ is good if it is likely to generate the observed data:

L(Θ : D) = P(D | Θ) = ∏_m P(x[m] | Θ)     (i.i.d. samples)

• Maximum Likelihood Estimation (MLE) Principle: Choose Θ so as to maximize L
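As a small worked example (not from the slides): for a single binary variable with parameter θ = P(X = 1), ten i.i.d. samples containing seven 1s and three 0s give

L(θ : D) = ∏_m P(x[m] | θ) = θ^7 (1 − θ)^3

which is maximized at θ = 7/10, the observed frequency.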

Page 15: CMSC 671 Fall 2001


Parameter estimation in BNs

• The likelihood decomposes according to the structure of the network
→ we get a separate estimation task for each parameter

• The MLE (maximum likelihood estimate) solution:
– for each value x of a node X
– and each instantiation u of Parents(X)

θ*_{x|u} = N(x, u) / N(u)

where the counts N(x, u) and N(u) are the sufficient statistics

– Just need to collect the counts for every combination of parents and children observed in the data
– MLE is equivalent to an assumption of a uniform prior over parameter values
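A minimal Python sketch of this counting-based MLE (the `mle_cpt` helper and the data values are illustrative, not part of the course materials; the variable names follow the lecture's alarm example):

```python
from collections import Counter

def mle_cpt(data, child, parents):
    """Estimate P(child | parents) by maximum likelihood from complete data.

    `data` is a list of dicts mapping variable names to observed values.
    Returns a dict: (parent_values, child_value) -> estimated probability.
    """
    joint = Counter()     # N(x, u): counts of (parent instantiation, child value)
    parent = Counter()    # N(u): counts of the parent instantiation alone
    for row in data:
        u = tuple(row[p] for p in parents)
        joint[(u, row[child])] += 1
        parent[u] += 1
    # theta*_{x|u} = N(x, u) / N(u)
    return {(u, x): n / parent[u] for (u, x), n in joint.items()}

# Example with alarm-style variables (data values made up)
data = [
    {"Earthquake": 0, "Burglary": 1, "Alarm": 1},
    {"Earthquake": 0, "Burglary": 0, "Alarm": 0},
    {"Earthquake": 1, "Burglary": 0, "Alarm": 1},
    {"Earthquake": 0, "Burglary": 0, "Alarm": 0},
]
print(mle_cpt(data, "Alarm", ["Earthquake", "Burglary"]))
```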

Page 16: CMSC 671 Fall 2001


Sufficient statistics: Example

• Why are the counts sufficient?

[Figure: Bayesian network over Earthquake, Burglary, Alarm, Moon-phase, and Light-level]

Page 17: CMSC 671 Fall 2001


Model selection

Goal: Select the best network structure, given the data

Input:

– Training data

– Scoring function

Output:

– A network that maximizes the score

Page 18: CMSC 671 Fall 2001


Structure selection: Scoring

• Bayesian: prior over parameters and structure
– get balance between model complexity and fit to data as a byproduct

• Score(G : D) = log P(G | D) ∝ log [P(D | G) P(G)]
where P(D | G) is the marginal likelihood and P(G) is the prior on the structure

• Marginal likelihood just comes from our parameter estimates

• Prior on structure can be any measure we want; typically a function of the network complexity

Same key property: Decomposability

Score(structure) = Σ_i Score(family of X_i)
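As an illustration of a decomposable score (a sketch only, not the specific score used in the course), the following BIC-style score sums, over nodes, a per-family log-likelihood minus a complexity penalty; the data format and `arity` argument are assumptions made for the example:

```python
import math
from collections import Counter

def family_score(data, child, parents, arity):
    """Log-likelihood of one family (child given its parents), minus a
    BIC-style penalty proportional to the number of free parameters."""
    joint, parent_counts = Counter(), Counter()
    for row in data:
        u = tuple(row[p] for p in parents)
        joint[(u, row[child])] += 1
        parent_counts[u] += 1
    loglik = sum(n * math.log(n / parent_counts[u]) for (u, _), n in joint.items())
    n_params = (arity[child] - 1) * math.prod(arity[p] for p in parents)
    return loglik - 0.5 * math.log(len(data)) * n_params

def structure_score(data, structure, arity):
    """Decomposable score: a sum of per-family scores, one for each node."""
    return sum(family_score(data, child, parents, arity)
               for child, parents in structure.items())
```

Because the total is a sum over families, a move that changes only one node's parent set requires re-scoring only that family, which is exactly what the next two slides exploit.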

Page 19: CMSC 671 Fall 2001


Heuristic search

[Figure: local search in structure space. Starting from a candidate network over B, E, A, and C, neighboring structures are generated by moves such as Add E→C (Δscore(C)), Delete E→A (Δscore(A)), and Reverse E→A, each evaluated by its change in score]

Page 20: CMSC 671 Fall 2001


Exploiting decomposability

[Figure: the same candidate moves (Add E→C, Delete E→A, Reverse E→A) applied to the network over B, E, A, and C, each annotated with the Δscore of the affected family]

To recompute scores, only need to re-score families that changed in the last move
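A hedged sketch of this greedy search loop (the move set and data structures are illustrative, edge reversal is omitted for brevity, and it reuses the `family_score` helper from the scoring sketch above):

```python
def creates_cycle(structure, child, new_parent):
    """True if adding the edge new_parent -> child would create a directed
    cycle, i.e., if child is already an ancestor of new_parent."""
    stack, seen = [new_parent], set()
    while stack:
        node = stack.pop()
        if node == child:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(structure[node])     # walk upward through parents
    return False

def hill_climb(data, nodes, arity):
    """Greedy structure search with add/delete moves. Exploits decomposability:
    each move changes exactly one family, so only that family is re-scored."""
    structure = {x: set() for x in nodes}     # start from the empty graph
    improved = True
    while improved:
        improved = False
        best = None                           # (delta, child, new parent set)
        for child in nodes:
            old = family_score(data, child, structure[child], arity)
            for parent in nodes:
                if parent == child:
                    continue
                new_parents = set(structure[child])
                if parent in new_parents:
                    new_parents.discard(parent)                  # Delete parent -> child
                elif not creates_cycle(structure, child, parent):
                    new_parents.add(parent)                      # Add parent -> child
                else:
                    continue
                delta = family_score(data, child, new_parents, arity) - old
                if best is None or delta > best[0]:
                    best = (delta, child, new_parents)
        if best is not None and best[0] > 0:
            _, child, new_parents = best
            structure[child] = new_parents
            improved = True
    return structure
```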

Page 21: CMSC 671 Fall 2001


Variations on a theme

• Known structure, fully observable: only need to do parameter estimation

• Unknown structure, fully observable: do heuristic search through structure space, then parameter estimation

• Known structure, missing values: use expectation maximization (EM) to estimate parameters

• Known structure, hidden variables: apply adaptive probabilistic network (APN) techniques

• Unknown structure, hidden variables: too hard to solve!

Page 22: CMSC 671 Fall 2001


Handling missing data

• Suppose that in some cases, we observe earthquake, alarm, light-level, and moon-phase, but not burglary

• Should we throw that data away??

• Idea: Guess the missing values based on the other data

[Figure: Bayesian network over Earthquake, Burglary, Alarm, Moon-phase, and Light-level]

Page 23: CMSC 671 Fall 2001


EM (expectation maximization)

• Guess probabilities for nodes with missing values (e.g., based on other observations)

• Compute the probability distribution over the missing values, given our guess

• Update the probabilities based on the guessed values

• Repeat until convergence

Page 24: CMSC 671 Fall 2001


EM example

• Suppose we have observed Earthquake and Alarm but not Burglary for an observation on November 27

• We estimate the CPTs based on the rest of the data

• We then estimate P(Burglary) for November 27 from those CPTs

• Now we recompute the CPTs as if that estimated value had been observed

• Repeat until convergence!

[Figure: network fragment showing Earthquake and Burglary as parents of Alarm]
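A minimal, self-contained sketch of this EM loop for the Earthquake/Burglary/Alarm fragment (the data, the initial guesses, and the CPT layout are assumptions made for the example, not taken from the course):

```python
def em_burglary(rows, iters=50):
    """EM for the fragment where Alarm depends on Burglary and Earthquake,
    and Burglary is sometimes unobserved.

    Each row is (earthquake, alarm, burglary), with burglary possibly None.
    """
    pB = 0.5                                              # initial guess for P(B = 1)
    pA = {(b, e): 0.5 for b in (0, 1) for e in (0, 1)}    # P(A = 1 | B = b, E = e)

    def p_alarm(a, b, e):
        return pA[(b, e)] if a == 1 else 1.0 - pA[(b, e)]

    for _ in range(iters):
        # E-step: for each row, the expected value of Burglary (weight on B = 1)
        weights = []
        for e, a, b in rows:
            if b is None:
                num = pB * p_alarm(a, 1, e)
                den = num + (1.0 - pB) * p_alarm(a, 0, e)
                weights.append(num / den)                 # P(B = 1 | e, a) under current CPTs
            else:
                weights.append(float(b))
        # M-step: re-estimate the CPTs from the expected ("fractional") counts
        pB = sum(weights) / len(rows)
        for b in (0, 1):
            for e in (0, 1):
                num = den = 0.0
                for (ev, a, _), w in zip(rows, weights):
                    if ev != e:
                        continue
                    wb = w if b == 1 else 1.0 - w         # expected count of B = b
                    num += wb * a
                    den += wb
                if den > 0:
                    pA[(b, e)] = num / den
    return pB, pA

# Rows are (Earthquake, Alarm, Burglary); None marks a missing Burglary value
rows = [(0, 1, 1), (0, 0, 0), (1, 1, None), (0, 1, None), (0, 0, 0), (1, 1, 0)]
print(em_burglary(rows))
```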