BAYESIAN NETWORK CLASSIFIER
Not so naïve any more, or: bringing causality into the equation
Bayes Network 2
Review
8/29/03

Before covering Bayesian Belief Networks, a little review of the naïve Bayesian classifier.
Approach

The naïve Bayesian classifier looks at the probability that an instance belongs to each class, given its value in each of its dimensions.
[Figure: distribution (density) of redness values by fruit: apples, peaches, oranges, lemons.]

Example: Redness

If one of the dimensions were "redness", then for a given redness value, which is the most probable fruit?
Bayes Theorem

$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$$

From the book: $h$ is a hypothesis, $D$ is the training data. Applied to the fruit example:

$$P(\text{apple} \mid \text{redness}=4.05) = \frac{P(\text{redness}=4.05 \mid \text{apple})\,P(\text{apple})}{P(\text{redness}=4.05)}$$
[Figure: histogram of redness values for apples and oranges.]

If Non-Parametric…

2506 apples, 2486 oranges. The probability that redness would be 4.05 if we know it is an apple: about 10/2506. P(apple)? 2506/(2506+2486). P(redness=4.05)? About (10+25)/(2506+2486). So:

$$P(\text{apple} \mid \text{redness}=4.05) = \frac{P(\text{redness}=4.05 \mid \text{apple})\,P(\text{apple})}{P(\text{redness}=4.05)} = \frac{\frac{10}{2506}\cdot\frac{2506}{2506+2486}}{\frac{10+25}{2506+2486}} \stackrel{?}{=} \frac{10}{10+25}$$
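The cancellation suggested on the slide can be checked directly. A quick sketch, using exact rational arithmetic and the counts from the slide (10 apples and 25 oranges in the redness ≈ 4.05 bin):

```python
from fractions import Fraction

# Counts from the slide: 2506 apples, 2486 oranges; in the redness ≈ 4.05
# bin there are 10 apples and 25 oranges.
apples, oranges = 2506, 2486
total = apples + oranges

likelihood = Fraction(10, apples)          # P(redness=4.05 | apple)
prior      = Fraction(apples, total)       # P(apple)
evidence   = Fraction(10 + 25, total)      # P(redness=4.05)

posterior = likelihood * prior / evidence  # Bayes' rule
print(posterior)                           # 2/7, i.e. 10/(10+25)
```

The apple count of 2506 cancels between the likelihood and the prior, and the population total cancels between the prior and the evidence, leaving just the within-bin fraction 10/35.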
Bayes

$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$$

I think of the ratio of P(h) to P(D) as an adjustment to the easily determined P(D|h), to account for differences in sample size. P(h) and P(D) are the prior probabilities, or "priors"; P(h|D) is the posterior probability.
Naïve Bayes Classifier

The "naïve" term comes from the assumption that the attributes are conditionally independent given the class. Where $v_j$ is a class and $a_i$ is an attribute (derivation in the book):

$$v_{NB} = \operatorname*{argmax}_{v_j \in V} \; P(v_j) \prod_i P(a_i \mid v_j)$$
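A minimal categorical naïve Bayes sketch following the argmax formula above. The tiny dataset is made up for illustration (it is not the lecture's fruit data), and Laplace smoothing is added so unseen values don't zero out a class:

```python
import math
from collections import Counter, defaultdict

def train(records):  # records: list of (attributes_tuple, label)
    class_counts = Counter(label for _, label in records)
    attr_counts = defaultdict(Counter)        # (dim, label) -> value counts
    for attrs, label in records:
        for i, a in enumerate(attrs):
            attr_counts[(i, label)][a] += 1
    return class_counts, attr_counts, len(records)

def predict(model, attrs):
    class_counts, attr_counts, n = model
    def log_score(label):
        s = math.log(class_counts[label] / n)          # log P(v_j)
        for i, a in enumerate(attrs):                  # sum of log P(a_i | v_j)
            s += math.log((attr_counts[(i, label)][a] + 1) /
                          (class_counts[label] + 2))   # add-one smoothing
        return s
    return max(class_counts, key=log_score)            # argmax over classes

data = [(("red", "round"), "apple"), (("red", "round"), "apple"),
        (("yellow", "oblong"), "lemon"), (("yellow", "round"), "lemon")]
model = train(data)
print(predict(model, ("red", "round")))   # apple
```

Working in log space (summing logs instead of multiplying probabilities) avoids underflow, a point the lecture returns to later for the K2 score.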
Can Remove the Naïveness

Go with a covariance matrix instead of per-dimension standard deviations. The univariate normal density

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2}$$

becomes the multivariate normal density

$$f(\vec{x}) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\vec{x}-\vec{\mu})^T \Sigma^{-1} (\vec{x}-\vec{\mu})\right)$$
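A sketch of the multivariate density for d = 2, with the 2×2 matrix algebra written out so no linear-algebra library is needed. The mean and covariance values are made-up illustration numbers; the check at the end shows that with a diagonal Σ the joint density factors into the product of univariate normals, which is exactly the naïve (independence) case:

```python
import math

def mvn_pdf_2d(x, mu, sigma):
    (a, b), (c, d) = sigma                       # covariance matrix rows
    det = a * d - b * c                          # |Σ|
    inv = [[d / det, -b / det], [-c / det, a / det]]   # Σ⁻¹ (2x2 inverse)
    dx = [x[0] - mu[0], x[1] - mu[1]]            # x − μ
    # quadratic form (x−μ)ᵀ Σ⁻¹ (x−μ)
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1]) +
         dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.exp(-q / 2) / (2 * math.pi * math.sqrt(det))

def norm_pdf(x, mu, s):                          # univariate normal density
    return math.exp(-(x - mu) ** 2 / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

p_joint = mvn_pdf_2d((1.0, 2.0), (0.0, 0.0), [[1.0, 0.0], [0.0, 4.0]])
p_naive = norm_pdf(1.0, 0.0, 1.0) * norm_pdf(2.0, 0.0, 2.0)
print(abs(p_joint - p_naive) < 1e-12)   # True
```

A non-zero off-diagonal entry is precisely where this model stops being naïve: the quadratic form then couples the two dimensions.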
Solution

Counts:

           Red   Yellow   Mass   Vol
apples       0      235    106     3
peaches      0      262    176    57
oranges      9      263    143     7
lemons      22      239    239   184
Total       31      999    664   251

Per-dimension probabilities (count / column total), with their product in the last column:

           Red    Yellow   Mass   Vol    Product
apples     0      0.24     0.16   0.01   0
peaches    0      0.26     0.27   0.23   0
oranges    0.29   0.26     0.22   0.28   0.0004
lemons     0.71   0.24     0.36   0.73   0.0044
Lot of Work

Need Bayes rule. Instead of simply multiplying each dimensional probability, we must compute a multivariate covariance matrix (num dims × num dims), then calculate the multivariate PDF and all priors, which includes getting the inverse of the covariance matrix. Only useful if covariance is a strong and predictive component of the data.
Other Ways of Removing Naiveté

Sometimes it is useful to infer causal relationships.

[Diagram: Dim x → Dim y ("causes").]
If We Can Figure Out…

…a causal relationship between dimensions, they are no longer independent: the probabilities become conditional. In terms of bins:

[Diagram: Dim x → Dim y ("causes").]

$$P(y \text{ is class } A \text{ given that it is caused by } x) = P(y \text{ is class } A \mid x \text{ is class } A)$$
Problem

Determine a dependency network from the data. Use the dependencies to determine the probabilities that an instance is a given class. Use those probabilities to classify.

[Diagram: five nodes, Dim 1 through Dim 5, connected by directed edges.]
DAG

A directed acyclic graph, used to represent a dependency network. This structure is known as a Bayesian Belief Network.

[Diagram: five nodes, Dim 1 through Dim 5, connected by directed edges.]
Algorithm

Not in the book: a 39-page paper, "A Bayesian Method for the Induction of Probabilistic Networks from Data" (Cooper and Herskovits). The method is known as the K2 algorithm.
Similar to a Decision Tree

Each node is a dimension, but instead of representing a decision it represents a conditional relationship. The algorithm for selecting nodes is greedy. Quote from the paper: "The algorithm is named K2 because it evolved from a system named Kutató (Herskovits & Cooper, 1990) that applies the same greedy-search heuristics. As we discuss in section 6.1.3, Kutató uses entropy to score network structures."

[Diagram: five nodes, Dim 1 through Dim 5.]
General Approach

Determine the no-parent score for a given node (dimension). Then, for each remaining dimension (node), determine the probability (score) that that dimension is a parent of the given node (does a dependency appear to exist?). Compare the score of the best candidate to the no-parent score: if better, keep it as a parent and repeat (see if another parent can be added); otherwise, done.

Greedy: for each node, find the best parent; if it improves the score, keep it; then look for the next "best parent".
How to Score?

Score by the probability that a given data "configuration" could belong to a given DAG: do records with a given value in one dimension tend to have a specific value in another dimension?

[Diagram, example from the book: nodes Storm, BusTourGroup, Campfire, Lightning, Thunder, ForestFire connected by directed edges.]
Bayesian Belief Network

$B_S$ = belief network structure. How probable is it?

$$P(B_S, D) = P(B_S) \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$$

Don't panic! We'll get through it.
Proof

The last 5 pages of the paper (39 pages total), "A Bayesian Method for the Induction of Probabilistic Networks from Data".
Bayesian Belief Network

$B_S$ = belief network structure. How probable?

$$P(B_S, D) = P(B_S) \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$$

• $n$ = number of dimensions (nodes)
• $q_i$ = number of unique instantiations of node $i$'s parents
  • If one parent, $q_i$ = number of distinct values in the parent
  • If two parents, $q_i$ = (number of distinct values in parent 1) × (number in parent 2)
• $r_i$ = number of distinct values possible in dimension $i$
• $N_{ijk}$ = number of records with value $k$ in the current dimension that match parental instantiation $j$
• $N_{ij}$ = number of records that match parental instantiation $j$ (the sum of the $N_{ijk}$'s)
Intuition

Think of the factor $(r_i-1)!\,/\,(N_{ij}+r_i-1)!$ as a random-match probability: what are the chances that the values seen in a dimension (for records that match the parental instantiation) could occur randomly?

Think of the $\prod_k N_{ijk}!$ factor as an adjustment upward (since it shows up in the numerator) indicating how the data is actually organized: how organized is the data in the child dimension? Example: $6!\,0!$ is 720 while $3!\,3!$ is 36. Sound familiar?

$$P(B_S, D) = P(B_S) \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$$
The per-node score:

$$g(i, \pi_i) = \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$$

Algorithm: a greedy algorithm for finding parents. For a given dimension, check the no-parent probability and store it in Pold. Then choose the parent that maximizes g. If that probability is greater than Pold, add it to the list of parents and update Pold. Keep adding until the probability can't be increased.
No-Parent Probability? (Orphan)

With no parents there is only one "instantiation". There is no parent filtering, so $N_{ij}$ is all training samples, and $N_{ijk}$ is the number of records in the training set where the current dimension has value $v_k$.

$$g(i, \pi_i) = \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$$
Example from the Paper

Three nodes. Two instantiations for parent X2 (of child X3): X2 has value absent, or X2 has value present. Two instantiations for parent X1 (of child X2): X1 has value absent, or X1 has value present.
Some Numbers

X2 instantiation with value absent: number of X3 absents that were X2 absents: 4; number of X3 presents that were X2 absents: 1. X2 instantiation with value present: X3 absent given X2 present: 0; X3 present given X2 present: 5.
X3 Calculations

For dimension $i = 3$, in

$$P(B_S, D) = P(B_S) \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$$

the X2 = absent instantiation contributes $\dfrac{(2-1)!\; 4!\; 1!}{(5+2-1)!}$ and the X2 = present instantiation contributes $\dfrac{(2-1)!\; 0!\; 5!}{(5+2-1)!}$.
Some More Numbers

X1 instantiation with value absent: number of X2 absents that were X1 absents: 4; number of X2 presents that were X1 absents: 1. X1 instantiation with value present: X2 absent given X1 present: 1; X2 present given X1 present: 4.
X2 Calculations

For dimension $i = 2$: the X1 = absent instantiation contributes $\dfrac{(2-1)!\; 4!\; 1!}{(5+2-1)!}$ and the X1 = present instantiation contributes $\dfrac{(2-1)!\; 1!\; 4!}{(5+2-1)!}$.
Some More Numbers

Dimension 1 has no parents. Number of X1 absents: 5; number of X1 presents: 5.
X1 Calculations

For dimension $i = 1$ (no parents, so one instantiation over all 10 records):

$$\frac{(2-1)!\; 5!\; 5!}{(10+2-1)!}$$
Putting It All Together

The whole enchilada. The article calls this structure $B_{S1}$ (the chain X1 → X2 → X3):

$$P(B_{S1}, D) = P(B_{S1}) \cdot \underbrace{\frac{1!\,5!\,5!}{11!}}_{X1} \cdot \underbrace{\frac{1!\,4!\,1!}{6!} \cdot \frac{1!\,1!\,4!}{6!}}_{X2} \cdot \underbrace{\frac{1!\,4!\,1!}{6!} \cdot \frac{1!\,0!\,5!}{6!}}_{X3} = P(B_{S1}) \cdot 2.23 \times 10^{-9}$$
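Multiplying the five terms out confirms the paper's number. A quick check in exact arithmetic, where each `term` is one j-factor of the K2 score, $(r-1)!\,\prod_k N_{ijk}!\,/\,(N_{ij}+r-1)!$:

```python
from math import factorial as f
from fractions import Fraction

def term(counts):
    """One parental-instantiation factor of the K2 score."""
    r, nij = len(counts), sum(counts)        # r values, Nij matching records
    num = f(r - 1)
    for n in counts:                          # prod of Nijk!
        num *= f(n)
    return Fraction(num, f(nij + r - 1))

score = (term([5, 5]) *                       # X1, no parents
         term([4, 1]) * term([1, 4]) *        # X2 given X1 absent / present
         term([4, 1]) * term([0, 5]))         # X3 given X2 absent / present
print(float(score))   # ≈ 2.2268e-09, the paper's 2.23 × 10⁻⁹
```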
Comparing Networks

The article compares $B_{S1}$ to a second structure, $B_{S2}$. Assume that $P(B_{S1}) = P(B_{S2})$: then $B_{S1}$ comes out about ten times more probable than $B_{S2}$.

[Diagram: the alternative structure over X1, X2, X3.]
Remember

We are not calculating the whole tree, just a set of parents for a single node, so there is no need for the first product:

$$g(i, \pi_i) = \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$$
Result

We get a list of the most likely parents for each node.

[Diagram: the book's example network (Storm, BusTourGroup, Campfire, Lightning, Thunder, ForestFire).]
K2 Algorithm (More Formally)

1.  procedure K2;
2.  {Input: A set of n nodes, an ordering on the nodes, an upper bound u on the
3.   number of parents a node may have, and a database D containing m cases.}
4.  {Output: For each node, a printout of the parents of the node.}
5.  for i := 1 to n do
6.      πi := ∅;
7.      Pold := g(i, πi);  {This function is computed using equation (12).}
8.      OKToProceed := true;
9.      while OKToProceed and |πi| < u do
10.         let z be the node in Pred(xi) − πi that maximizes g(i, πi ∪ {z});
11.         Pnew := g(i, πi ∪ {z});
12.         if Pnew > Pold then
13.             Pold := Pnew;
14.             πi := πi ∪ {z};
15.         else OKToProceed := false;
16.     end {while};
17.     write('Node:', xi, 'Parents of this node:', πi);
18. end {for};
19. end {K2};

$$g(i, \pi_i) = \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$$
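A sketch of K2 under simplifying assumptions: binary-valued dimensions, records stored as tuples of 0/1, and the node ordering taken to be index order, so Pred(i) = {0, …, i−1}. The g function works in log space (via `math.lgamma`, where lgamma(n+1) = log n!) to avoid huge factorials. The 10-record dataset at the bottom is reconstructed from the example's counts, not copied from the paper:

```python
import math
from itertools import product

def log_g(i, parents, data, r=2):
    score = 0.0
    for inst in product(range(r), repeat=len(parents)):   # each instantiation j
        counts = [0] * r
        for rec in data:
            if all(rec[p] == v for p, v in zip(parents, inst)):
                counts[rec[i]] += 1                        # N_ijk
        nij = sum(counts)                                  # N_ij
        score += (math.lgamma(r) - math.lgamma(nij + r)    # log (r-1)!/(Nij+r-1)!
                  + sum(math.lgamma(c + 1) for c in counts))  # log prod N_ijk!
    return score

def k2(data, n, u=2):
    all_parents = []
    for i in range(n):
        parents, p_old = [], log_g(i, [], data)
        while len(parents) < u:
            candidates = [z for z in range(i) if z not in parents]  # Pred(x_i) - pi_i
            if not candidates:
                break
            z = max(candidates, key=lambda z: log_g(i, parents + [z], data))
            p_new = log_g(i, parents + [z], data)
            if p_new > p_old:
                p_old, parents = p_new, parents + [z]
            else:
                break
        all_parents.append(parents)
    return all_parents

data = [(0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1),
        (1, 0, 0), (1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, 1)]
print(k2(data, 3))   # [[], [0], [1]]  -- recovers the chain X1 -> X2 -> X3
```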
The "g" Function

$$g(i, \pi_i) = \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$$

Function g(i, set of parents) {
    Set score = 1
    If the set of parents is empty:
        Nij is the size of the entire training set and Sv is the entire training set
        Score *= (ri − 1)! / (Nij + ri − 1)!
        For each child instantiation (e.g. 0 and 1):
            Get the count of records in Sv with that value; that is Nijk
            (with two instantiations there will be two Nijk's, and ri = 2)
            Score *= Nijk!
    Else:
        Get the parental instantiations (e.g. 00, 01, 10, 11)
        For each parental instantiation:
            Get the training records that match (Sv); the size of that set is Nij
            Score *= (ri − 1)! / (Nij + ri − 1)!
            For each child instantiation (e.g. 0 and 1):
                Get the count of records in Sv with that value; that is Nijk
                Score *= Nijk!
    Return Score
}
Implementation

Straightforward. Pred(i) returns all nodes that come before i in the ordering; g is our formula:

$$g(i, \pi_i) = \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$$
How to Get Instantiations: No-Parent Approach

If there are no parents, work with all records in the training set to accumulate counts for the current dimension.

$$\frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$$
How to Get Instantiations: With Parents

My first attempt. What's wrong with this?

For each parent
    For each possible value in that parent's dimension
        Accumulate values
    End for
End for
Instantiations

You have to know which values to use for every parent when accumulating counts, so you must get all the instantiations first.

parents: 0 2 3 4
instantiations:
0 0 0 0
0 0 0 1
0 0 1 0
0 0 1 1
0 1 0 0
0 1 0 1
0 1 1 0
0 1 1 1
1 0 0 0
1 0 0 1
1 0 1 0
1 0 1 1
1 1 0 0
1 1 0 1
1 1 1 0
1 1 1 1

For each instantiation
    Compute the first portion of the numerator: log((ri − 1)!)
    For each possible value in the current dimension
        Get the counts that match the instantiation and the value in the current dimension
        Update a running sum of counts (for Nij)
        Update a running sum of log factorials (for the Nijk's)
    End for
    Add the sum of log factorials to the original numerator
    Compute the denominator: log((Nij + ri − 1)!)
    Subtract it from the numerator
End for
But How…

How do you generate the instantiations? What if there were a different number of legal values in each dimension? I generated an increment function.
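A mixed-radix "increment" in the spirit of the function described above, sketched as a generator. `radices` gives the number of legal values in each parent dimension, and they need not all be equal:

```python
def instantiations(radices):
    inst = [0] * len(radices)
    while True:
        yield tuple(inst)
        # increment the rightmost digit, carrying leftward like an odometer
        for pos in reversed(range(len(radices))):
            inst[pos] += 1
            if inst[pos] < radices[pos]:
                break
            inst[pos] = 0
        else:
            return   # carried off the left end: all instantiations emitted

print(list(instantiations([2, 3])))
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```

With an empty radix list it yields a single empty tuple, which conveniently matches the no-parent case. Python's `itertools.product(*(range(r) for r in radices))` yields the same sequence.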
What's This Ordering Nonsense?

"An ordering on the nodes…" It's got to be a DAG (acyclic). The algorithm ensures this with an ordering: an assumed order on the nodes. Pred(i) returns all nodes that occur earlier in the ordering.
How to Get Around the Order Issue?

The paper gives a couple of suggestions.

Random: randomly shuffle the order, do this several times, and take the best-scoring network.

Backwards: perhaps a whole different approach to generating the network. Start fully connected; remove the edge that increases P(BS) the most; continue until the score can't be increased. Use whichever result is better (random or reverse).
Even With Ordering…

The number of possible structures grows exponentially. The paper states that even with an ordering constraint there are $2^{n(n-1)/2}$ networks: once again binary membership (1 means an edge is part of the graph, 0 not) over all unidirectional edges. Think of a distance matrix (from row to column).

The book states that exact inference of probabilities in general, for an arbitrary Bayesian network, is known to be NP-hard (guess who: Cooper, 1990).
P(BS)

$P(B_S)$ is one over the count of all Bayesian belief structures ($B_S$'s). That's a lot of networks, and it is at the heart of the derivation.

$$P(B_S, D) = P(B_S) \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$$
How Do We Know the Order?

A priori, we have no knowledge of the network structure.

[Diagram: the book's example network (Storm, BusTourGroup, Campfire, Lightning, Thunder, ForestFire).]
Bigger Example

What if we have a thousand training records? When determining the first Pold (no parents), what will Nij be? How do we get around that? (Perl presents 1000! as "inf".)

$$g(i, \pi_i) = \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$$
Switching to Log Values

The paper discusses this in its time-complexity section. With log values you can add and subtract instead of multiplying and dividing (faster). They even pre-calculated every log-factorial value, up to the number of training values plus the maximum number of distinct values. The approach that helped time complexity also helps in managing extremely large numbers.
Formula for Log Factorial

Easy enough to write your own function:

$$\log(n!) = \sum_{i=1}^{n} \log(i)$$
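A sketch of the pre-calculated table the paper describes: log(n!) for n = 0..N as a cumulative sum of logs, so scoring only ever adds and subtracts. The table size N = 1000 matches the "thousand training records" example:

```python
import math

N = 1000
log_fact = [0.0] * (N + 1)
for i in range(1, N + 1):
    log_fact[i] = log_fact[i - 1] + math.log(i)   # log(i!) = log((i-1)!) + log(i)

# 1000! overflows a float, but its log is perfectly manageable:
print(log_fact[1000])                                     # ≈ 5912.128
print(abs(log_fact[1000] - math.lgamma(1001)) < 1e-6)     # True (lgamma sanity check)
```

`math.lgamma(n + 1)` computes log(n!) directly and is a handy cross-check, but a precomputed table turns each factorial lookup into a constant-time array access.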
How Does This Impact Implementation?
Revisit Algorithm

$$g(i, \pi_i) = \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$$

A greedy algorithm for finding parents. For a given dimension, check the no-parent probability and store it in Pold. Then choose the parent that maximizes g. If that probability is greater than Pold, add it to the list of parents and update Pold. Keep adding until the probability can't be increased.
Have a Network. Now What?

Training: for dimensions (nodes) with no parents, calculate counts as usual (they are considered independent, so the naïve process is appropriate). For dimensions with parents, you will need to calculate counts for all possible combinations of parent values.

TRAINING
One parent: just count records with the parent value
• Present
• Absent
Two parents: just count records with the parent values
• Present, present
• Present, absent
• Absent, present
• Absent, absent
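The "counts for all combinations of parent values" tables can be sketched as nested dictionaries keyed by parental instantiation. The record format (dicts of dimension name to value) and the little storm/lightning network here are made up for illustration:

```python
from collections import defaultdict

def train_counts(records, parents):
    # counts[node][parental_instantiation][node_value] -> count
    counts = {node: defaultdict(lambda: defaultdict(int)) for node in parents}
    for rec in records:
        for node, pars in parents.items():
            inst = tuple(rec[p] for p in pars)   # () when the node has no parents
            counts[node][inst][rec[node]] += 1
    return counts

records = [{"storm": 1, "lightning": 1}, {"storm": 1, "lightning": 1},
           {"storm": 1, "lightning": 0}, {"storm": 0, "lightning": 0}]
net = {"storm": [], "lightning": ["storm"]}   # lightning's parent is storm
counts = train_counts(records, net)
print(dict(counts["lightning"][(1,)]))   # {1: 2, 0: 1}, so P(lightning=1|storm=1)=2/3
```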
Lazy

I didn't do this. I made my algorithm lazy: I did my "with-parents" counts during the test process.

Total = 0
Num in class = 0
For each training record
    Count this record = true
    For each parent
        If the parent's value (or bin) in this record does not equal the test instance's value (or bin)
            Count this record = false
    If count this record is true
        Total++
        If the training class equals the test class
            Num in class++
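The lazy counting above can be sketched as a single filtered pass over the training set at test time. Record layout and helper names here are illustrative, not the lecture's actual code:

```python
def lazy_conditional(train, parents, test_rec, test_class, class_of):
    total = in_class = 0
    for rec in train:
        if all(rec[p] == test_rec[p] for p in parents):  # parental values match
            total += 1
            if class_of(rec) == test_class:
                in_class += 1
    return in_class, total   # estimate = in_class / total (guard total == 0)

# Hypothetical records: (dim0, dim1, label); dim1's parent is dim0.
train = [(0, 0, "A"), (0, 1, "A"), (0, 1, "B"), (1, 1, "B")]
num, den = lazy_conditional(train, parents=[0], test_rec=(0, 1, None),
                            test_class="A", class_of=lambda r: r[2])
print(num, den)   # 2 3
```

The trade-off is the usual lazy one: no up-front tables for every parental combination, but a scan of the training data per test instance.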
Whole Algorithm

Train:
• Build the network
• Generate counts, with all possible parental value combinations

Test (just like naïve):
• For each dimension (node), calculate the probability for each class
  • Note that those with parents will be conditional probabilities, i.e. just look at training samples that match the parental values
• For each class, multiply together the probabilities that the test instance is that class, across all nodes (dimensions)
• Choose the class with the maximum probability as your prediction
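An end-to-end sketch of the test step, assuming the network (a parents list per dimension) has already been learned. Records are tuples with the class label last; the data, the probability floor, and all names are illustrative only:

```python
import math

def classify(train, parents, x, classes):
    def prob(dim, cls):
        # conditional probability of x[dim] for this class, filtered to
        # records matching x's parental values (plain naive if no parents)
        matches = [r for r in train if r[-1] == cls
                   and all(r[p] == x[p] for p in parents[dim])]
        if not matches:
            return 1e-9                       # crude floor to avoid log(0)
        hits = sum(1 for r in matches if r[dim] == x[dim])
        return max(hits / len(matches), 1e-9)
    def log_score(cls):
        prior = sum(1 for r in train if r[-1] == cls) / len(train)
        return math.log(prior) + sum(math.log(prob(d, cls))
                                     for d in range(len(x)))
    return max(classes, key=log_score)        # class with maximum probability

train = [(0, 0, "A"), (0, 0, "A"), (1, 1, "B"), (1, 1, "B"), (0, 1, "B")]
parents = {0: [], 1: [0]}                     # dim 1 depends on dim 0
print(classify(train, parents, (0, 0), ["A", "B"]))   # A
```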
Lazy Algorithm

Train:
• Build the network
• Generate counts

Test:
• For each dimension (node), calculate the probability for each class
  • Note that those with parents will have to be calculated on the fly (lazily)
• For each class, multiply together the probabilities that the test instance is that class, across all nodes (dimensions)
• Choose the class with the maximum probability as your prediction
Lazy and Random-Ordering Algorithm

Train:
• Try multiple random orderings; keep the best network
• Generate counts

Test:
• For each dimension (node), calculate the probability for each class
  • Note that those with parents will have to be calculated on the fly (lazily)
• For each class, multiply together the probabilities that the test instance is that class, across all nodes (dimensions)
• Choose the class with the maximum probability as your prediction
Formulaic Representation

Probability representation; note the comma. The book utilizes

$$P(y_1, \ldots, y_n) = \prod_{i=1}^{n} P(y_i \mid \text{Parents}(Y_i))$$

For instance, if a node's parents were dimensions 1 and 2, its factor would be conditioned on those two values.
Terminology

Quotes from the paper: "Cases occur independently given a belief network"; nodes that are not connected represent variables which are conditionally independent of each other.

From the book: "A Bayesian belief network describes the probability distribution governing a set of variables by specifying a set of conditional independence assumptions along with a set of conditional probabilities."

Frame it in terms of independence vs. dependence.
Can Be Given a Network

You don't necessarily have to "learn" it; it could be the result of domain knowledge.

[Diagram: the book's example network (Storm, BusTourGroup, Campfire, Lightning, Thunder, ForestFire).]
Causality?

Do edges really represent causal relationships? Maybe not. A lack of conditional independence does not necessarily imply causality: "correlation does not imply causation". Both variables could be symptoms of some other cause… though correlation is necessary for causation.

[Diagram: Dim x → Dim y ("causes").]
Famous Example

Studies showed that women who were taking combined hormone replacement therapy (HRT) also had a lower-than-average incidence of coronary heart disease (CHD), which led doctors to propose that HRT was protective against CHD. Randomized controlled trials then showed that HRT caused a small but statistically significant increase in the risk of CHD. It turned out the HRT subjects were more likely to be from higher socio-economic groups.
Another Third-Factor Example

Sleeping with one's shoes on is strongly correlated with waking up with a headache. Therefore, sleeping with one's shoes on causes headaches. Or maybe drunk people are more likely both to sleep with their shoes on and to wake up with a headache.
Another Third Factor

As ice cream sales increase, the rate of drowning deaths increases sharply. Therefore, ice cream causes drowning. Or: ice cream sales increase greatly in summer, as do drownings.
Directionality

It is also tough to determine the direction of a causal relationship. Example: the more firemen fighting a fire, the bigger the fire is observed to be. Therefore, firemen cause fire.
Project Note

The project has only 7 positives (993 non-forest-fires). Both Weka and my first version achieved 99.3% accuracy: each learned that it could simply predict negative every time and still get 99.3%. Solution? 99.3% accuracy ain't bad. Or is it?
Summary

A greedy algorithm. Learn the network (using $g(i, \pi_i)$ as the scoring mechanism). Learn the underlying probabilities from the training data given the network, or just the counts, and learn the probabilities lazily. Classify new instances based upon these probabilities. The algorithm must have an assumed order; you can try several random orders and choose the best. The network is consistent with the implied causal relationships, but…

[Diagram: five nodes, Dim 1 through Dim 5.]