Bayesian models as a tool for revealing inductive biases
Tom GriffithsUniversity of California, Berkeley
Inductive problems
blicket toma
dax wug
blicket wug
S → X Y
X → {blicket, dax}
Y → {toma, wug}
Learning languages from utterances
Learning functions from (x,y) pairs
Learning categories from instances of their members
Revealing inductive biases
• Many problems in cognitive science can be formulated as problems of induction
– learning languages, concepts, and causal relations
• Such problems are not solvable without bias (e.g., Goodman, 1955; Kearns & Vazirani, 1994; Vapnik, 1995)
• What biases guide human inductive inferences?
How can computational models be used to investigate human inductive biases?
Models and inductive biases
• Transparent
Reverend Thomas Bayes
Bayesian models
Bayes’ theorem

P(h | d) = P(d | h) P(h) / Σ_{h′ ∈ H} P(d | h′) P(h′)

Posterior probability: P(h | d). Likelihood: P(d | h). Prior probability: P(h). The denominator sums over the space of hypotheses H.

h: hypothesis
d: data
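As a concrete sketch of how this update works (the two-hypothesis coin example below is illustrative, not from the talk):

```python
def posterior(prior, likelihood, data):
    """Bayes' theorem: P(h|d) = P(d|h)P(h) / sum over h' of P(d|h')P(h')."""
    unnorm = {h: likelihood(data, h) * p for h, p in prior.items()}
    z = sum(unnorm.values())  # normalizing sum over the hypothesis space H
    return {h: p / z for h, p in unnorm.items()}

# Toy example: is a coin fair or biased toward heads?
prior = {"fair": 0.5, "biased": 0.5}

def likelihood(data, h):
    p_heads = 0.5 if h == "fair" else 0.9
    return p_heads ** data.count("H") * (1 - p_heads) ** data.count("T")

post = posterior(prior, likelihood, "HHH")  # favors "biased"
```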
Three advantages of Bayesian models
• Transparent identification of inductive biases through hypothesis space, prior, and likelihood
• Opportunity to explore a range of biases expressed in terms that are natural to the problem at hand
• Rational statistical inference provides an upper bound on human inferences from data
Two examples
Causal induction from small samples(Josh Tenenbaum, David Sobel, Alison Gopnik)
Statistical learning and word segmentation(Sharon Goldwater, Mark Johnson)
Two examples
Causal induction from small samples(Josh Tenenbaum, David Sobel, Alison Gopnik)
Statistical learning and word segmentation(Sharon Goldwater, Mark Johnson)
Blicket detector (Dave Sobel, Alison Gopnik, and colleagues)
See this? It’s a blicket machine. Blickets make it go.
Let’s put this one on the machine.
Oooh, it’s a blicket!
“One cause” (Gopnik, Sobel, Schulz, & Glymour, 2001)
– Two objects: A and B
– Trial 1: A and B on detector – detector active
– Trial 2: B on detector – detector inactive
– 4-year-olds judge whether each object is a blicket
• A: a blicket (100% say yes)
• B: almost certainly not a blicket (16% say yes)
Hypotheses: causal models
[Four causal graphs over A, B, and E: neither object causes E; A→E; B→E; both A→E and B→E]
Each model defines a probability distribution over variables (for both observation and intervention)
(Pearl, 2000; Spirtes, Glymour, & Scheines, 1993)
Prior and likelihood: causal theory
• Prior probability an object is a blicket is q
– defines a distribution over causal models
• Detectors have a deterministic “activation law”– always activate if a blicket is on the detector– never activate otherwise
(Tenenbaum & Griffiths, 2003; Griffiths, 2005)
Prior and likelihood: causal theory

Likelihoods under the four causal models (columns: h00, h01, h10, h11):

P(E=1 | A=0, B=0):  0  0  0  0
P(E=1 | A=1, B=0):  0  0  1  1
P(E=1 | A=0, B=1):  0  1  0  1
P(E=1 | A=1, B=1):  0  1  1  1

(P(E=0 | ·) is the complement in each case.)

[Causal graphs: h00 = neither A nor B causes E; h01 = B→E; h10 = A→E; h11 = A→E and B→E]

Priors: P(h00) = (1 – q)²,  P(h01) = (1 – q)q,  P(h10) = q(1 – q),  P(h11) = q²
Modeling “one cause”
Before any data, all four causal models are in play, with priors
P(h00) = (1 – q)², P(h01) = (1 – q)q, P(h10) = q(1 – q), P(h11) = q²
and the deterministic likelihoods above.
Modeling “one cause”
Trial 1 (A and B on the detector, detector active) eliminates h00, which predicts no activation. Three hypotheses remain:
P(h01) = (1 – q)q, P(h10) = q(1 – q), P(h11) = q²
Modeling “one cause”
Trial 2 (B on the detector, detector inactive) eliminates h01 and h11, which predict activation. Only h10 remains, with P(h10) = q(1 – q), so:
A is definitely a blicket
B is definitely not a blicket
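The hypothesis elimination above can be sketched in code; the value of q is illustrative:

```python
from itertools import product

q = 0.3  # illustrative prior probability that an object is a blicket

def prior(h):
    a, b = h  # h = (is A a blicket?, is B a blicket?)
    return (q if a else 1 - q) * (q if b else 1 - q)

def likelihood(trial, h):
    """Deterministic activation law: detector fires iff a blicket is on it."""
    on, active = trial
    a, b = h
    predicted = bool(("A" in on and a) or ("B" in on and b))
    return 1.0 if predicted == active else 0.0

trials = [({"A", "B"}, True), ({"B"}, False)]  # the "one cause" evidence
post = {h: prior(h) for h in product([0, 1], repeat=2)}
for t in trials:
    post = {h: p * likelihood(t, h) for h, p in post.items()}
z = sum(post.values())
post = {h: p / z for h, p in post.items()}

p_a_blicket = sum(p for (a, b), p in post.items() if a)  # 1.0
p_b_blicket = sum(p for (a, b), p in post.items() if b)  # 0.0
```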
“One cause” (Gopnik, Sobel, Schulz, & Glymour, 2001)
– Two objects: A and B
– Trial 1: A and B on detector – detector active
– Trial 2: B on detector – detector inactive
– 4-year-olds judge whether each object is a blicket
• A: a blicket (100% say yes)
• B: almost certainly not a blicket (16% say yes)
Building on this analysis
• Transparent
Other physical systems
From stick-ball machines… …to lemur colonies
(Kushnir, Schulz, Gopnik, & Danks, 2003; Griffiths, Baraff, & Tenenbaum, 2004; Griffiths & Tenenbaum, 2007)
Two examples
Causal induction from small samples(Josh Tenenbaum, David Sobel, Alison Gopnik)
Statistical learning and word segmentation(Sharon Goldwater, Mark Johnson)
Bayesian segmentation
• In the domain of segmentation, we have:
– Data: unsegmented corpus (transcriptions).
– Hypotheses: sequences of word tokens.
• Optimal solution is the segmentation with highest prior probability, because the likelihood is all-or-none:
P(d | h) = 1 if concatenating the words forms the corpus, 0 otherwise.
The prior encodes assumptions about the structure of language.
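In code, the all-or-none likelihood is just a consistency check (a minimal sketch; the phonemic strings are illustrative):

```python
def likelihood(corpus, segmentation):
    """P(d | h) = 1 if concatenating the hypothesized words yields the
    unsegmented corpus, 0 otherwise."""
    return 1.0 if "".join(segmentation) == corpus else 0.0

consistent = likelihood("yuwanttulUk", ["yu", "want", "tu", "lUk"])    # 1.0
inconsistent = likelihood("yuwanttulUk", ["yu", "wants", "tu", "lUk"]) # 0.0
```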
Brent (1999)
• Describes a Bayesian unigram model for segmentation.
– Prior favors solutions with fewer words, shorter words.
• Problems with Brent’s system:
– Learning algorithm is approximate (non-optimal).
– Difficult to extend to incorporate bigram info.
A new unigram model (Dirichlet process)
Assume word wi is generated as follows:
1. Is wi a novel lexical item?
P(yes) = α / (n + α)
P(no) = n / (n + α)
Fewer word types = Higher probability
A new unigram model (Dirichlet process)
Assume word wi is generated as follows:
2. If novel, generate phonemic form x1…xm:
P(wi = x1…xm) = ∏_{i=1..m} P(xi)
Shorter words = Higher probability
If not, choose lexical identity of wi from previously occurring words:
P(wi = l) = count(l) / n
Power law = Higher probability
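Both steps can be sketched as a single generative loop (a Chinese-restaurant-process view of the Dirichlet process; the phoneme inventory, stop probability, and α below are illustrative, not from the model's actual settings):

```python
import random

alpha = 1.0           # concentration parameter (illustrative)
phonemes = "abdgiktu" # assumed toy phoneme inventory

def generate_word(counts, rng):
    """counts maps each lexical item to how often it has been generated."""
    n = sum(counts.values())
    if n == 0 or rng.random() < alpha / (n + alpha):   # novel lexical item
        # Geometric word length: each extra phoneme multiplies in another
        # factor, so shorter words are more probable.
        word = rng.choice(phonemes)
        while rng.random() < 0.5:
            word += rng.choice(phonemes)
    else:                                              # P(w = l) = count(l)/n
        word = rng.choices(list(counts), weights=list(counts.values()))[0]
    counts[word] = counts.get(word, 0) + 1
    return word

rng = random.Random(0)
counts = {}
corpus = [generate_word(counts, rng) for _ in range(200)]
# The rich-get-richer reuse step yields a skewed, power-law-like
# frequency profile: far fewer word types than word tokens.
```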
Unigram model: simulations
• Same corpus as Brent (Bernstein-Ratner, 1987):
– 9790 utterances of phonemically transcribed child-directed speech (19–23 months).
– Average utterance length: 3.4 words.
– Average word length: 2.9 phonemes.
• Example input:
yuwanttusiD6bUklUkD*z6b7wIThIzh&t&nd6dOgiyuwanttulUk&tDIs...
Example results
What happened?
• Model assumes (falsely) that words have the same probability regardless of context.
• Positing amalgams allows the model to capture word-to-word dependencies.
P(D&t) = .024 P(D&t|WAts) = .46 P(D&t|tu) = .0019
What about other unigram models?
• Brent’s learning algorithm is insufficient to identify the optimal segmentation.
– Our solution has higher probability under his model than his own solution does.
– On a randomly permuted corpus, our system achieves 96% accuracy; Brent gets 81%.
• Formal analysis shows undersegmentation is the optimal solution for any (reasonable) unigram model.
Bigram model (hierarchical Dirichlet process)
Assume word wi is generated as follows:
1. Is (wi−1, wi) a novel bigram?
P(yes) = β / (n_{wi−1} + β)
P(no) = n_{wi−1} / (n_{wi−1} + β)
2. If novel, generate wi using unigram model (almost).
If not, choose lexical identity of wi from words previously occurring after wi−1:
P(wi = l | wi−1 = l′) = count(l′, l) / count(l′)
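A sketch of these bigram quantities (β and the toy bigram counts are illustrative):

```python
beta = 1.0
# count(l', l): how often word l has followed word l' (toy numbers)
bigram_counts = {("the", "dog"): 3, ("the", "cat"): 2}

def n_after(prev):
    """n_{wi-1}: total tokens observed after prev."""
    return sum(c for (l, _), c in bigram_counts.items() if l == prev)

def p_novel_bigram(prev):
    """P(yes) = beta / (n_{wi-1} + beta)."""
    return beta / (n_after(prev) + beta)

def p_reuse(prev, word):
    """P(no) * count(l', l) / count(l'): reuse an observed continuation."""
    n = n_after(prev)
    if n == 0:
        return 0.0
    return (n / (n + beta)) * bigram_counts.get((prev, word), 0) / n
```

With these toy counts, p_novel_bigram("the") is 1/6 and p_reuse("the", "dog") is (5/6) · (3/5) = 0.5.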
Example results
Conclusions
• Both adults and children are sensitive to the nature of mechanisms in using covariation
• Both adults and children can use covariation to make inferences about the nature of mechanisms
• Bayesian inference provides a formal framework for understanding how statistics and knowledge interact in making these inferences
– how theories constrain hypotheses, and are learned
A probabilistic mechanism?
• Children in Gopnik et al. (2001) who said that B was a blicket had seen evidence that the detector was probabilistic
– one block activated the detector 5/6 times
• Replace the deterministic “activation law”…
– activate with probability 1 – ε if a blicket is on the detector
– never activate otherwise
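Swapping the probabilistic law into the earlier Bayesian analysis shows why B is no longer ruled out (the values of q and ε are illustrative):

```python
from itertools import product

q, eps = 0.3, 1 / 6  # illustrative prior and miss rate (cf. 5/6 activation)

def prior(h):
    a, b = h  # h = (is A a blicket?, is B a blicket?)
    return (q if a else 1 - q) * (q if b else 1 - q)

def likelihood(trial, h):
    """Probabilistic law: activates with probability 1 - eps if a blicket
    is on the detector, never otherwise."""
    on, active = trial
    a, b = h
    blicket_on = ("A" in on and a) or ("B" in on and b)
    p_active = (1 - eps) if blicket_on else 0.0
    return p_active if active else 1 - p_active

trials = [({"A", "B"}, True), ({"B"}, False)]  # the "one cause" evidence
post = {h: prior(h) for h in product([0, 1], repeat=2)}
for t in trials:
    post = {h: p * likelihood(t, h) for h, p in post.items()}
z = sum(post.values())
post = {h: p / z for h, p in post.items()}

p_b_blicket = sum(p for (a, b), p in post.items() if b)
# Unlike the deterministic case, B retains nonzero posterior probability:
# the inactive B trial could be a miss rather than proof B is not a blicket.
```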
Deterministic vs. probabilistic
[Figure: probability of being a blicket in the “one cause” condition, under deterministic and probabilistic mechanisms]
Mechanism knowledge affects interpretation of contingency data
Manipulating mechanisms
I. Familiarization phase: establish nature of mechanism (same block)
II. Test phase: one cause (AB trial, B trial)
At the end of the test phase, adults judge the probability that each object is a blicket
Manipulating mechanisms (n = 12 undergraduates per condition)
[Figure: probability of being a blicket in the “one cause” condition, Bayes vs. people, deterministic and probabilistic mechanisms]
Manipulating mechanisms (n = 12 undergraduates per condition)
[Figure: probability of being a blicket in the “one cause”, “one control”, and “three control” conditions, Bayes vs. people, deterministic and probabilistic mechanisms]
Acquiring mechanism knowledge
I. Familiarization phase: establish nature of mechanism (same block)
II. Test phase: one cause (AB trial, B trial)
At the end of the test phase, adults judge the probability that each object is a blicket
Results with children
• Tested 24 four-year-olds (mean age 54 months)
• Instead of rating, yes or no response
• Significant difference in “one cause” B responses
– deterministic: 8% say yes
– probabilistic: 79% say yes
• No significant difference in “one control” trials
– deterministic: 4% say yes
– probabilistic: 21% say yes
(Griffiths & Sobel, submitted)
Comparison to previous results
• Proposed boundaries are more accurate than Brent’s, but fewer proposals are made.
• Result: word tokens are less accurate.

        Boundary Precision   Boundary Recall
Brent        .80                 .85
GGJ          .92                 .62

        Token F-score
Brent       .68
GGJ         .54

Precision: #correct / #found [= hits / (hits + false alarms)]
Recall: #correct / #true [= hits / (hits + misses)]
F-score: the harmonic mean of precision and recall.
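These measures are straightforward to compute from hit, false-alarm, and miss counts (the counts below are illustrative, not from the corpus):

```python
def precision_recall_f(hits, false_alarms, misses):
    precision = hits / (hits + false_alarms)           # #correct / #found
    recall = hits / (hits + misses)                    # #correct / #true
    f = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f

p, r, f = precision_recall_f(hits=8, false_alarms=2, misses=2)
```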
![Page 43: Bayesian models as a tool for revealing inductive biases Tom Griffiths University of California, Berkeley](https://reader038.vdocuments.site/reader038/viewer/2022110323/56649d805503460f94a64cb4/html5/thumbnails/43.jpg)
Quantitative evaluation
• Compared to unigram model, more boundaries are proposed, with no loss in accuracy:

                Boundary Precision   Boundary Recall
GGJ (unigram)        .92                 .62
GGJ (bigram)         .92                 .84

• Accuracy is higher than previous models:

                Token F-score   Type F-score
Brent (unigram)      .68             .52
GGJ (bigram)         .77             .63
Two examples
Causal induction from small samples(Josh Tenenbaum, David Sobel, Alison Gopnik)
Statistical learning and word segmentation(Sharon Goldwater, Mark Johnson)