7/14/2006uai06-inference evaluationpage 1 uai 2006 report from the 1 st evaluation of probabilistic...
Post on 22-Dec-2015
215 views
TRANSCRIPT
7/14/2006 UAI06-Inference Evaluation Page 1
UAI 2006report from the
1st Evaluation of Probabilistic InferenceJuly 14th, 2006
B,C,D,FF,D,G
D,F
7/14/2006 UAI06-Inference Evaluation Page 2
What is this presentation about?
• Goal: The purpose of this evaluation is to compare the performance of a variety of different software systems on a single set of Bayesian network (BN) problems.
• By creating a friendly evaluation (as is often done in other communities such as SAT, and also in speech recognition and machine translation with their DARPA evaluations), we hope to foster new research in fast inference methods for performing a variety of queries in graphical models.
• Over the past few months, the 1st such an evaluation took place at UAI.
• This presentation summarizes the outcome of this evaluation.
7/14/2006 UAI06-Inference Evaluation Page 3
Who we are
• Evaluators– Jeff Bilmes – University of Washington, Seattle– Rina Dechter – University of California, Irvine
• Graduate Student Assistance – Chris Bartels – University of Washington, Seattle– Radu Marinescu – University of CA, Irvine– Karim Filali – University of Washington, Seattle
• Advisory Council– Dan Geiger -- Technion - Israel Institute of Technology – Faheim Bacchus – University of Toronto– Kevin Murphy – University of British Columbia
7/14/2006 UAI06-Inference Evaluation Page 4
Outline
• Background, goals.• Scope (rational)• Final chosen queries• The UAI 2006 BN benchmark evaluation corpus • Scoring strategies• Participants and team members• Results for PE and MPE• Team presentations
– team1 (UCLA), team2 (IET), team3 (UBC), team4 (U. Pitt/DSL), team 5 (UCI)
• Conclusion/Open discussion
7/14/2006 UAI06-Inference Evaluation Page 5
Acknowledgements: Graduate Student Help
Radu Marinescu,University of CA, Irvine
Chris Bartels,University of Washington
Karim Filali,University of Washington
Also, thanks to another U. Washington Student, Mukund Narasimhan (now at MSR)
7/14/2006 UAI06-Inference Evaluation Page 6
Background
• Early 2005: Rina Dechter & Dan Geiger decide there should be some form of UAI inference evaluation (like in the SAT community) and discuss the idea (by email) with Adnan Darwiche, Faheim Bacchus, Hector Geffner, Nir Friedman, Thomas Richardson.
• I (Jeff Bilmes) take on the task to run it this first time.– Speech recognition and DARPA evaluations
• evaluation of ASR systems using error rate as a metric.
7/14/2006 UAI06-Inference Evaluation Page 7
Scope
• Many “queries” could be evaluated including:– MAP – maximal a posteriori hypothesis– MPE – most probable explanation (also called Viterbi assignment)– PE – probability of evidence– N-best – compute the N-best of the above
• Many algorithmic variants– Exact inference– Enforced limited time-bounds and/or space bounds– Approximate inference, and tradeoffs between time/space/accuracy
• Classes of models– Static BNs with a generic description (list of CPTs)– More complex description language (e.g., context specific indep.)– Static models vs. Dynamic models (e.g., Dynamic Bayesian
Networks, and DGMs) vs. relational models
7/14/2006 UAI06-Inference Evaluation Page 8
Decisions for this first evaluation.
• Emphasis: Keep things simple.• Focus on exact inference
– exact inference can still be useful. – “Exact inference is NP-complete, so we perform approximate
inference” is often seen in the literature– With smart algorithms, and for fixed (but real-world) problem sizes,
exact is quite doable and can be better for applications.
• Focus on small number of queries:• Original plan: PE, MPE, and MAP for both static and dynamic
models• From final participants list, narrowed this down to:
PE and MPE on static Bayesian networks
7/14/2006 UAI06-Inference Evaluation Page 9
Query: Probability of Evidence (PE)
7/14/2006 UAI06-Inference Evaluation Page 10
Query: Most Probable Explanation (MPE)
7/14/2006 UAI06-Inference Evaluation Page 11
The UAI06 BN Evaluation Corpus
• J=78 BNs used for PE, and J=57 BNs used for MPE queries. The BNs were not exactly the same.
• BNs were the following (more details will appear on web page):– random mutations of the burglar alarm graph– diagnosis network (Druzdzel)– DBNs from speech recognition that were unrolled a fixed amount. – Variations on the Water DBN – Various forms of grids– Variations on the ISCAS 85 electrical circuit– Variations on the ISCAS 89 electrical circuit– Various genetic linkage graphs (Geiger)– BNs from computer-based patient care system (cpcs)– Various randomly generated graphs (F. Cozman’s alg).– Various known-tree-width random k-trees, with determinism (k=24)– Various known-tree-width random positive k-trees, (k=24)– Various linear block coding graphs.
• While some of these have been seen before, BNs were “anonymized” before being distributed.
• BNs distributed in xbif format (basically XML)
7/14/2006 UAI06-Inference Evaluation Page 12
Timing Platform and Limits
• Timing machines: dual-CPU 3.8GHz Pentium Xeons with 8Gb of RAM each, with hyper-threading turned on.
• Single threaded performance only in this evaluation.• Each team had 4 days of dedicated machine usage to
complete there timings (other than this, there was no upper time bound).
• No-one asked for more time than these 4 days -- after timing the BNs, teams could use rest of 4 days as they wish for further tuning. After final numbers were sent to me, no further adjustment of timing numbers have taken place (say based on seeing other’s results).
• Each timing number was the result of running a query 10 times, and then reporting the fastest (lowest) time.
7/14/2006 UAI06-Inference Evaluation Page 13
The Teams
• Thanks to every member of every team: Each member of every team was crucial to making this a successful event!!
7/14/2006 UAI06-Inference Evaluation Page 14
Team 1: UCLA
•David Allen (now at HRL Labs, CA)•Mark Chavira (graduate student)•Adnan Darwiche
• Keith Cascio• Arthur Choi (graduate student)• Jinbo Huang (now at NICTA, Australia)
7/14/2006 UAI06-Inference Evaluation Page 15
Team 2: IET
From right to left in photo: • Masami Takikawa• Hans Dettmar• Francis Fung• Rick Kissh
Other team members:•Stephen Cannon •Chad Bisk•Brandon Goldfedder
Other key contributors:• Bruce D'Ambrosio• Kathy Laskey• Ed Wright• Suzanne Mahoney• Charles Twardy• Tod Levitt
7/14/2006 UAI06-Inference Evaluation Page 16
Team 3: UBC
Jacek Kisynski,University of British Columbia
Michael Chiang ,University of British Columbia
David Poole,University of British Columbia
7/14/2006 UAI06-Inference Evaluation Page 17
Team 4: U. Pittsburgh, DSL
Tomasz Sowinski,University of Pittsburgh, DSL Marek J. Druzdzel,
University of Pittsburgh, DSL
UAI Software Competition
Team 4: Decision Systems LaboratoryTeam 4: Decision Systems Laboratory
7/14/2006 UAI06-Inference Evaluation Page 19
Team 5: UCI
Robert Mateescu ,University of CA, Irvine
Rina Dechter,University of CA, IrvineRadu Marinescu,
University of CA, Irvine
7/14/2006 UAI06-Inference Evaluation Page 20
The Results
7/14/2006 UAI06-Inference Evaluation Page 21
Definition Correct Answer
7/14/2006 UAI06-Inference Evaluation Page 22
Definition of “FAIL”
• Each team had 4 days to complete the evaluation• No time-limit placed on any particular BN.• A “FAILED” score meant that either the system failed to
complete the query, or that the system underflowed their own numeric precision.
– some of the networks were designed not to fit within IEEE 64-bit double precision, so either scaling or log-arithmetic needed to be used (which is a speed hit for PE).
• Teams had the option to submit multiple separate submissions, none did.
• Systems were allowed to “backoff” from no-scaling to, say, a log-arithmetic mode (but that was included in the time charge)
7/14/2006 UAI06-Inference Evaluation Page 23
Definition of Average Speedup
7/14/2006 UAI06-Inference Evaluation Page 24
Results – PE and MPE
78 PE BNs, and 57 MPE BNs
1. Failure rates – number of networks that each team failed to produce a score or a correct score (including underflow)
2. Speedup results on categorized BNs– a BN is categorized based on how many teams failed to produce a
score (so 0-5 for PE, or 0-3 for MPE)– Average speedup of the best performance over a particular team’s
performance on each category.
3. Rank scores– Number of times a particular team was rank n for various n
4. Workload scores– In what workload region (if any) is a particular team optimal.
7/14/2006 UAI06-Inference Evaluation Page 25
PE Results
7/14/2006 UAI06-Inference Evaluation Page 26
PE Failure Rate Results (low is best)
0
10.26
19.23
14.1
19.23
0
2
4
6
8
10
12
14
16
18
20F
ailu
re R
ate
(per
cen
t)
Reminder: 78 BNs total for PE
7/14/2006 UAI06-Inference Evaluation Page 27
Avg. Speedups: BNs that ran on all 5 systems
85.59
52.44
77.33
1.02
17.15
0
10
20
30
40
50
60
70
80
90S
pee
du
p o
f fa
stes
t o
ver
team
j
61 out of 78 BNs ran on all systems.
Remember: lower is better!
7/14/2006 UAI06-Inference Evaluation Page 28
Avg. Speedups: BNs that ran on all systems
-1
-0.5
0
0.5
1
1.5
2
2.5
3
Sp
eed
up
sta
ts (
log
10)
Team 1Team 2
Team 3Team 4
Team 5
Average Std Min Max
7/14/2006 UAI06-Inference Evaluation Page 29
Avg. Speedups: BNs that ran on (3-4)/5 systems
4.93
12.95
1.851
18.42
0
2
4
6
8
10
12
14
16
18
20S
pee
du
p o
f fa
stes
t o
ver
team
j
8 out of 78 BNs ran on only 3 or 4 systems.
7/14/2006 UAI06-Inference Evaluation Page 30
Avg. Speedups: BNs that ran on (3-4)/5 systems
0
10
20
30
40
50
60
Sp
eed
up
sta
ts
Team 1Team 2
Team 3Team 4
Team 5
Average Std Min Max
7/14/2006 UAI06-Inference Evaluation Page 31
Avg. Speedups: BNs that ran on (1-2)/5 systems
1
11.28
0
2
4
6
8
10
12S
pee
du
p o
f fa
stes
t o
ver
team
j
9 out of 78 BNs ran on only 1 or 2 systems.
Only 2 teams (team 1 and 2) had systems that could run this category BN.
These include the genetic linkage BNs.
7/14/2006 UAI06-Inference Evaluation Page 32
Rank Proportions (how often was each team a particular rank, rank 1 is best)
0%
20%
40%
60%
80%
100%
Ran
k P
rop
ort
ion
s
Team 1Team 2
Team 3Team 4
Team 5
Rank 1 Rank 2 Rank 3 Rank 4 Rank 5 Fail
7/14/2006
UAI06-Inference Evaluation
33
Team 1 Team 2 Team 3 Team 4 Team 5 Team 1 Team 2 Team 3 Team 4 Team 5
BN_0 Alarm graph 0.476 1.680 0.400 0.010 0.020 0.17700 0.07000 0.17000 0.00065 0.02000
BN_1 0.548 2.130 0.520 0.010 0.040 0.25500 0.07700 0.29000 0.00133 0.03000BN_2 0.507 2.160 0.500 0.010 0.010 0.17500 0.06800 0.22000 0.00065 0.01000BN_3 0.484 3.020 0.490 0.010 0.020 0.19900 0.09500 0.21000 0.00144 0.02000BN_4 0.564 2.220 0.690 0.010 0.020 0.26700 0.10600 0.36000 0.00097 0.02000BN_5 0.623 2.490 0.670 0.010 0.060 0.26400 0.12500 0.32000 0.00217 0.05000BN_6 0.785 2.970 0.820 0.010 0.070 0.36400 0.16000 0.41000 0.00241 0.06000BN_7 0.781 3.000 0.810 0.010 0.140 0.44300 0.11100 0.53000 0.00392 0.13000BN_8 0.675 1.780 0.540 0.010 0.080 0.36600 0.09000 0.28000 0.00278 0.08000BN_9 0.876 3.200 0.530 0.010 0.170 0.56800 0.10000 0.27000 0.00568 0.17000BN_10 0.669 1.610 0.540 0.010 0.060 0.37400 0.09100 0.29000 0.00200 0.06000BN_11 0.916 2.070 0.690 0.010 0.220 0.55700 0.14600 0.37000 0.00808 0.21000BN_12 0.636 2.540 0.780 0.010 0.200 0.30100 0.11000 0.51000 0.00438 0.20000BN_13 0.628 2.240 0.790 0.020 0.120 0.28800 0.11700 0.47000 0.01415 0.11000BN_14 0.979 3.580 1.780 0.100 1.790 0.64700 0.19800 1.45000 0.09400 1.79000BN_15 0.908 2.370 0.920 0.060 1.270 0.55400 0.16300 0.57000 0.05521 1.26000
BN_16 Diagnosis 1.581 4.430 1.110 0.070 0.170 0.43600 0.32300 0.16000 0.00225 0.02000
BN_18 1.583 3.590 1.050 0.070 0.170 0.45800 0.29600 0.12000 0.00229 0.02000
BN_20 Speech 66.103 28.300 56.910 FAIL 467.440 60.07600 20.53900 46.81000 FAIL 463.24000
BN_22 recognition 8.070 11.110 11.710 1.960 13.640 2.72100 5.40400 4.07000 0.72258 10.44000BN_24 7.030 8.290 9.680 1.420 7.340 2.44000 2.36000 3.50000 0.34183 4.50000BN_26 52.795 16.630 19.850 FAIL 123.120 47.56100 8.35000 11.84000 FAIL 119.23000
BN_28 Water DBN 3.723 3.620 3.630 0.720 1.620 0.22400 0.09400 0.41000 0.02208 0.21000
BN_30 Grids 2.744 4.630 FAIL 0.420 FAIL 1.86700 2.21700 FAIL 0.39248 FAILBN_32 3.192 6.350 FAIL 0.520 FAIL 2.25000 3.99400 FAIL 0.47857 FAILBN_34 3.158 30.190 FAIL 0.490 FAIL 2.21000 26.72500 FAIL 0.45130 FAILBN_36 3.124 5.470 FAIL 0.100 FAIL 2.14500 3.47500 FAIL 0.39153 FAILBN_38 3.042 6.850 FAIL 0.590 FAIL 2.06600 4.29300 FAIL 0.55135 FAILBN_40 3.074 5.810 FAIL 0.320 FAIL 2.07200 3.36700 FAIL 0.28734 FAIL
WALL TIME INFERENCE TIMEAnother look at PE:
7/14/2006
UAI06-Inference Evaluation
34
Team 1 Team 2 Team 3 Team 4 Team 5 Team 1 Team 2 Team 3 Team 4 Team 5
BN_42 iscas85 1.296 2.960 0.950 0.020 0.130 0.59800 0.48400 0.34000 0.00738 0.11000
BN_43 1.212 3.040 0.990 0.030 0.100 0.53700 0.48600 0.41000 0.00707 0.08000BN_44 1.608 2.890 1.060 0.130 0.300 0.91200 0.52700 0.46000 0.10763 0.27000BN_45 0.981 3.030 0.680 0.020 0.050 0.34400 0.32700 0.14000 0.00217 0.03000BN_46 1.192 2.880 1.210 0.160 1.370 0.63800 0.43600 0.71000 0.15363 1.36000
BN_47 iscas89 1.373 3.480 2.170 0.190 0.750 0.70000 0.55200 1.52000 0.17385 0.74000
BN_49 1.325 3.410 9.120 0.420 0.650 0.64200 0.52800 8.36000 0.40277 0.63000BN_51 1.056 2.960 1.240 0.030 0.200 0.44100 0.50400 0.68000 0.01871 0.18000BN_53 0.834 2.740 0.630 0.010 0.040 0.21000 0.26000 0.16000 0.00104 0.02000BN_55 1.161 2.180 1.450 0.140 0.720 0.55700 0.47200 0.88000 0.12664 0.70000BN_57 0.817 2.950 0.630 0.020 0.020 0.21000 0.25400 0.12000 0.00097 0.01000BN_59 0.748 2.470 0.580 0.010 0.020 0.17700 0.27100 0.07000 0.00067 0.01000BN_61 1.100 2.700 1.390 0.100 0.100 0.41600 0.47200 0.78000 0.08385 0.08000BN_63 0.974 2.330 1.010 0.030 0.260 0.36700 0.42900 0.48000 0.02013 0.25000BN_65 0.739 2.490 0.630 0.010 0.030 0.24200 0.24500 0.19000 0.00327 0.02000BN_67 0.938 2.630 0.910 0.040 0.370 0.47600 0.33000 0.46000 0.02433 0.36000
BN_69 Genetic 5.103 FAIL FAIL FAIL FAIL 3.99500 FAIL FAIL FAIL FAIL
BN_70 linkage 8.069 75.300 FAIL FAIL FAIL 6.38800 72.02900 FAIL FAIL FAILBN_71 8.844 FAIL FAIL FAIL FAIL 7.27100 FAIL FAIL FAIL FAILBN_72 16.581 FAIL FAIL FAIL FAIL 14.79000 FAIL FAIL FAIL FAILBN_73 16.634 FAIL FAIL FAIL FAIL 15.00800 FAIL FAIL FAIL FAILBN_74 10.532 FAIL FAIL FAIL FAIL 9.58500 FAIL FAIL FAIL FAILBN_75 25.656 FAIL FAIL FAIL FAIL 24.10700 FAIL FAIL FAIL FAILBN_76 31.216 FAIL FAIL FAIL FAIL 29.22600 FAIL FAIL FAIL FAILBN_77 59.385 FAIL FAIL FAIL FAIL 57.87100 FAIL FAIL FAIL FAIL
WALL TIME INFERENCE TIME
7/14/2006
UAI06-Inference Evaluation
35
Team 1 Team 2 Team 3 Team 4 Team 5 Team 1 Team 2 Team 3 Team 4 Team 5BN_78 CPCS 0.409 1.520 0.350 0.010 0.010 0.14800 0.04900 0.12000 0.00044 0.01000BN_80 0.903 1.740 0.970 0.040 0.080 0.19200 0.12800 0.22000 0.00112 0.02000BN_82 0.910 2.080 1.150 0.050 0.090 0.25200 0.12300 0.29000 0.00116 0.03000BN_84 1.004 1.940 1.070 0.040 0.090 0.29500 0.15700 0.34000 0.00162 0.03000BN_86 2.397 4.520 3.220 0.780 1.790 0.47400 0.32800 0.45000 0.04229 0.67000BN_88 2.590 4.470 3.330 0.760 1.820 0.47000 0.31100 0.46000 0.02511 0.74000BN_90 2.621 4.370 3.210 0.780 1.800 0.49300 0.30800 0.40000 0.04246 0.74000BN_92 2.691 4.260 3.300 0.800 1.870 0.52900 0.27700 0.54000 0.07392 0.77000BN_94 Random 0.754 1.550 0.730 0.040 0.090 0.11500 0.03400 0.05000 0.00040 0.01000BN_96 graphs 0.603 1.570 0.880 0.020 0.090 0.19200 0.08200 0.48000 0.00472 0.07000BN_98 0.771 1.590 0.940 0.050 0.140 0.19300 0.07800 0.23000 0.00435 0.05000BN_100 1.089 1.630 1.040 0.060 0.270 0.49200 0.09900 0.32000 0.01411 0.18000BN_102 1.748 2.020 3.190 0.330 1.700 0.62000 0.15900 1.64000 0.10794 1.27000
BN_104 k-trees 15.708 21.630 14.550 4.020 5.930 0.34800 0.86300 0.19000 0.00699 0.34000
BN_106 (k=24) 11.097 18.500 20.170 3.650 6.880 0.54900 0.87300 8.53000 0.38662 0.19000BN_108 determinism 9.527 17.080 13.390 3.300 5.670 0.48900 0.87000 3.00000 0.42090 0.87000BN_110 7.704 14.930 11.250 2.790 4.120 0.40000 0.69100 1.62000 0.17202 0.73000BN_112 10.310 16.790 19.390 3.430 6.990 0.57100 0.84000 8.69000 0.44628 0.40000
BN_114 k-trees 5.345 7.550 7.620 1.880 6.480 1.04400 0.49500 1.96000 0.14516 3.50000
BN_116 (k=24) 8.592 12.350 10.040 3.140 5.790 0.63700 0.60200 0.38000 0.01697 0.58000BN_118 positive 4.928 8.790 6.630 1.920 4.290 0.41500 0.42700 0.76000 0.04710 1.04000BN_120 5.457 8.220 7.380 2.020 5.060 0.78500 0.44400 1.06000 0.06795 1.67000BN_122 6.891 12.020 8.890 2.580 5.440 0.68400 0.52500 0.84000 0.04319 1.22000BN_124 4.297 8.020 6.810 1.700 4.340 0.43400 0.39900 1.44000 0.07180 1.54000
WALL TIME INFERENCE TIME
7/14/2006 UAI06-Inference Evaluation Page 36
MPE Results
• Only three teams participated:– Team 1– Team 2– Team 5
• 57 BNs, not the same ones, but some are variations of the same original BN.
7/14/2006 UAI06-Inference Evaluation Page 37
MPE Failure Rate Results
0
38.6
7.01
0
5
10
15
20
25
30
35
40
Fai
lure
Rat
e (p
erce
nt)
Team 1Team 2
Team 5
57 B
Ns
tota
l
7/14/2006 UAI06-Inference Evaluation Page 38
MPE Avg. Speedups: BNs that ran on all 3 systems
5.38
2.75
7.37
0
1
2
3
4
5
6
7
8S
pee
du
p o
f fa
stes
t o
ver
team
j
31 out of 57 BNs ran on all systems.
7/14/2006 UAI06-Inference Evaluation Page 39
MPE Avg. Speedups: BNs that ran on all 3 systems
0
10
20
30
40
50
60
70
80
90
Sp
eed
up
sta
ts
Team 1Team 2
Team 5
Average Std Min Max31 out of 57 BNs ran on all systems.
7/14/2006 UAI06-Inference Evaluation Page 40
6.6
1.2
47.46
0
5
10
15
20
25
30
35
40
45
50S
pee
du
p o
f fa
stes
t o
ver
team
j
26 out of 57 BNs ran on 2 systems.
MPE Avg. Speedups: BNs that ran on 2/3 systems
7/14/2006 UAI06-Inference Evaluation Page 41
MPE Avg. Speedups: BNs that ran on 2/3 systems
0
100
200
300
400
500
600
700
800
Sp
eed
up
sta
ts
Team 1Team 2
Team 5
Average Std Min Max
7/14/2006 UAI06-Inference Evaluation Page 42
Rank Proportions (how often was each team a particular rank, rank 1 is best)
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Ran
k P
rop
ort
ion
s
Team 1Team 2
Team 5
Rank 1 Rank 2 Rank 3 Fail
7/14/2006 UAI06-Inference Evaluation 43
team 1 team 2 team 5 team 1 team 2 team 5BN_17 46.193 FAIL 0.79 44.988 FAIL 0.75BN_19 47.436 FAIL 0.72 46.259 FAIL 0.68 Diagnosis graphBN_21 34.689 24.82 u/flow 28.182 17.66 u/floeBN_23 9.514 12.46 u/flow 4.389 5.857 u/flowBN_25 8.284 8.47 u/flow 3.765 2.951 u/flowBN_27 10.963 16.76 u/flow 5.688 8.843 u/flow Speech recognitionBN_29 3.324 4.07 1.54 0.4 0.153 0.17 Water DBNBN_31 2.951 3.31 74.47 2.073 0.835 74.46BN_33 3.253 21.98 38.76 2.261 19.551 38.76BN_35 3.192 5.1 41.81 2.203 2.243 41.8BN_37 3.263 4.73 12.39 2.296 2.895 12.38BN_39 3.254 7.84 137.24 2.265 5.158 137.23BN_41 3.08 4.58 21.7 2.097 2.29 21.68 GridsBN_48 1.216 FAIL 0.25 0.559 FAIL 0.25BN_50 1.36 FAIL 0.7 0.673 FAIL 0.69BN_52 1.165 FAIL 0.85 0.488 FAIL 0.84BN_54 0.818 FAIL 8.36 0.203 FAIL 8.35BN_56 1.133 FAIL 97.05 0.513 FAIL 97.04BN_58 0.81 FAIL 0.78 0.205 FAIL 0.77BN_60 0.758 FAIL 118.39 0.168 FAIL 118.38BN_62 1.107 FAIL 5.01 0.454 FAIL 5BN_64 0.925 FAIL 16.7 0.375 FAIL 16.69BN_66 0.693 FAIL 4.81 0.233 FAIL 4.81BN_68 1.047 FAIL 0.43 0.54 FAIL 0.42 ISCAS 85BN_79 0.462 2.15 0.01 0.173 0.056 0.01BN_81 1.481 3.65 0.11 0.814 0.513 0.05BN_83 1.687 2.85 0.13 1.003 0.544 0.07BN_85 2.067 2.91 0.19 1.403 0.514 0.13BN_87 9.019 5.9 1.42 6.415 1.21 0.33BN_89 3.523 4.94 1.43 1.254 0.644 0.32BN_91 8.531 5.61 1.39 6.28 1.183 0.32BN_93 5.016 5.42 1.67 2.958 0.902 0.57 CPCSBN_95 1.789 2.7 0.27 0.966 0.541 0.19BN_97 1.896 2.56 0.27 1.433 0.396 0.25BN_99 6.213 3.06 1.01 5.607 1.021 0.93BN_101 1.648 2.37 0.19 1.087 0.259 0.1BN_103 3.257 3.47 0.94 2.226 1 0.52 RandomBN_105 14.65 22.23 6.88 0.473 1.108 0.17BN_107 37.053 19.16 6.25 0.992 1.929 0.82BN_109 9.782 16.33 5.64 0.774 1.206 0.88BN_111 8.319 14.94 4.96 0.606 1.002 0.53BN_113 10.093 17.3 5.69 0.835 1.516 0.96 k-trees w determinismBN_115 6.955 7.69 4.5 2.438 1.005 1.38BN_117 8.57 12.71 5.88 0.888 0.738 0.6BN_119 5.616 8.49 4.06 1.099 0.607 0.88BN_121 6.36 8.05 4.2 1.168 0.681 0.84BN_123 6.958 11.52 5.07 0.925 0.674 0.67BN_125 4.866 8.29 3.6 1.114 0.589 0.81 k-trees positiveBN_126 30.904 FAIL 30.12 30.276 FAIL 30.11BN_127 23.6 FAIL 111.65 22.979 FAIL 111.64BN_128 8.306 FAIL 2.07 7.713 FAIL 2.07BN_129 42.683 FAIL 147.32 42.106 FAIL 147.32BN_130 37.685 FAIL 13.59 37.071 FAIL 13.58BN_131 10.146 FAIL 16.47 9.577 FAIL 16.46BN_132 78.846 FAIL 164.41 78.243 FAIL 164.4BN_133 24.212 FAIL 9.86 23.6 FAIL 9.86BN_134 21.243 FAIL 130.48 20.655 FAIL 130.48 Coding
WALL TIME INFERENCE TIME
Raw
MP
E R
esul
ts
7/14/2006 UAI06-Inference Evaluation Page 44
PE and MPE Results
7/14/2006 UAI06-Inference Evaluation Page 45
Workload Scores: PE and MPE
7/14/2006 UAI06-Inference Evaluation Page 46
Workload Scores and Linear Programming
7/14/2006 UAI06-Inference Evaluation Page 47
Workload Scores: PE and MPE
• So each team is a winner, it depends on the workload.
• Could attempt further to rank teams based on volume of workload region where a team wins.
• Which measure, however, should we on the simplex, uniform? Why not something else.
• “A Bayesian approach to performance ranking”– UAI does system performance measures …
7/14/2006 UAI06-Inference Evaluation Page 48
Team technical descriptions
• 5 minute for each team.• Current plan: more details to ultimately appear on the
inference evaluation web site (see main UAI page).
7/14/2006 UAI06-Inference Evaluation Page 49
Team 1: UCLA Technical Description
• presented by Adnan Darwiche
7/14/2006 UAI06-Inference Evaluation 50
Team UCLA Performance summary:
Solved all 78 P(e) networks in 319s: about 4s per instance Solved all 57 MPE networks in 466s: about 8s per instance
MPE approach Prune network If network has treewidth 25 or less, run RC Else if network has enough local structure, run Ace Else run BnB/Ace
P(e) approach Prune network If network has genetic net characteristics, run RC_Link Else if network has treewidth 25 or less, run RC Else run Ace
Approach is powerful enough to solve every network in every suite. Yet, it incurs a fixed overhead that disadvantages it on easy networks
7/14/2006 UAI06-Inference Evaluation 51
RC and RC Link Recursive Conditioning
Conditioning/Search algorithm Based on decomposing the network Inference exponential in treewidth VE/Jointree could have been used for
this!
RC Link RC with local structure exploitation Not necessarily exponential in treewith Download:
http://reasoning.cs.ucla.edu/rc_link
7/14/2006 UAI06-Inference Evaluation 52
Ace
Compiles BN to arithmetic circuit Reduce to logical inference Strength: Local Structure
(determinism & CSI) Strength: online inference Inference not exponential in
treewidth http://reasoning.cs.ucla.edu/ace
7/14/2006 UAI06-Inference Evaluation 53
Branch & Bound
Approximate network by deleting edges to provides an upper bound on MPE
Compile the network using Ace and use to drive search
Use belief propagation to construct seed a static variable order for each variable, an ordering on
values
7/14/2006 UAI06-Inference Evaluation 54
7/14/2006 UAI06-Inference Evaluation Page 55
Team 2: IET Technical Description
• presented by Masami Takikawa
Basic SPI Algorithm
Collect factors
Order nodes
d-separation, barren nodes removal & evidence propagation
Minimum weight heuristic
Multiplication
Summation
Was able to solve 59 out of 78 challenges.BN
factors
Repeat these steps until all variables are eliminated
UAI2006 BN Engine Eval. Copyright©, IET, 2006.
#BNs MaxWeight
4 <=100
2 <=1,000
20 <=10,000
16 <=100,000
13 <=1,000,000
4 <=2,000,000
Team 2Team 2
Extensions (aka additional overhead)
Collect factors
Order nodes
Time-slice ordering if DBN
Multiplication
Normalization
factored graph
factors
Summation
Intra-node factorizationBN
Needed to avoid underflow for BN_20-26.
BN Without INF With INF
BN_30 130M (27) 66K (16)
BN_32 17G (34) 260K (18)
BN_34 1.1G (30) 3.1M (21)
BN_36 4.3G (32) 130K (17)
BN_38 4.3G (32) 520K (19)
BN_40 1.1G (30) 130K (17)
BN Min-Weight Time-slice
BN_70 8.3E19 (56) 9.0E7 (21)
BN_72 2.9E21 (46) 3.5E11 (38)
BN_73 6.4E22 (51) 4.3E9 (26)
BN_75 5.5E18 (43) 1.7E10 (34)
BN_76 1.9E20 (35) 1.7E11 (24)
UAI2006 BN Engine Eval. Copyright©, IET, 2006.
Max Weight (Max
Width)
Max Weight (Max
Width)
Solved additional 11 challenges.
7/14/2006 UAI06-Inference Evaluation Page 58
Team 3: UBC Technical Description
• presented by David Poole
7/14/2006 UAI06-Inference Evaluation 59
Variable Elimination Codeby David Poole and Jacek Kisyński
•This is an implementation of variable elimination in Java 1.5 (without threads).
•We wanted to test how well our base VE system that we were using compared with other systems.
•We use the min-fill heuristic for the elimination orderings and 2GB of memory.
7/14/2006 UAI06-Inference Evaluation 60
The most interesting part of the implementation is in the representation of factors:
• A factor is essentially a list of variables, and a one-dimensional array of values.
• There is a total order of all variables and a total ordering of all values, which gives a canonical order of the values in a factor.
• We can multiply factors and sum out variables without doing random access to the values, but rather using the canonical ordering to enumerate the values.
• Multiplication is done lazily.
• We can sum out multiple variables at once (but this was not actually done for this competition).
7/14/2006 UAI06-Inference Evaluation 61
• This code was written for David Poole and Nevin Lianwen Zhang, ``Exploiting contextual independence in probabilistic inference'', Journal of Artificial Intelligence Research,18, 263-313, 2003. http://www.jair.org/papers/paper1122.html
• This is also the code that is used in the CIspace belief and decision network applet. A new version of the applet will be released in July. See: http://www.cs.ubc.ca/labs/lci/CIspace/
• We plan to release the VE code as open source.
7/14/2006 UAI06-Inference Evaluation Page 62
Team 4: U. Pitt/DSL Technical Description
• presented by Jeff Bilmes (Marek Druzdzel was unable to attend).
UAI Software Competition
Decision Systems LaboratoryUniversity of Pittsburgh
http://dsl.sis.pitt.edu/
UAI-06 Software Evaluation
UAI Competition: Sources of speedupUAI Competition: Sources of speedup
1. Clustering algorithm at the foundation of the program [Lauritzen & Spiegelhalter] (Pr(E) as the normalizing factor).
2. Relevance reasoning, based on conditional independence [Dawid 1979, Geiger 1990], as structured in [Suermondt 1992] and [Druzdzel 1992], summarized in [Druzdzel & Suermondt 1994].
3. For very large models: Relevance-based Decomposition [Lin & Druzdzel 1997] and Relevance-based Incremental Belief Updating [Lin & Druzdzel 1999].
Full references are included in GeNIe on-line help, http://genie.sis.pitt.edu/.
Good theory (in addition to good implementation)
Good engineering (Tomek Sowinski).• Efficient and reliable implementation in C++ (SMILE)
• Tested by over eight years of both academic and industrial use
Relevance steps:
1. In p(E), focusing inference on the evidence nodes
2. Removal of barren nodes
3. Evidence absorption
4. Removal of nuisance nodes
5. Reuse of valid posteriors
UAI-06 Software Evaluation
bn_18 (diagnosis)
Relevance
Triangulation
Find Hosts
Init Potentials
Collect
Distribute
Other
Where our program spent the most time?Where our program spent the most time?
bn_22 (speech dbn)
Relevance
Triangulation
Find Hosts
Init Potentials
Collect
Distribute
Other
bn_82 (cpcs)
Relevance
Triangulation
Find Hosts
Init Potentials
Collect
Distribute
Other
bn_94 (random)
Relevance
Triangulation
Find Hosts
Init Potentials
Collect
Distribute
Other
38x
1,046x
1x
∞
Speedup due to relevance
Eas
y p
rob
lem
s
Har
d p
rob
lem
s
UAI-06 Software Evaluation
Model developer module: GeNIe.
Implemented in Visual C++ in Windows environment.
GeNIe
GeNIeRate
SMILE.NET
Wrappers: SMILE.NET jSMILE, Pocket SMILE
Allow SMILE to be accessed from applications other than C++compiler
jSMILE
Pocket SMILE
Broader context: GeNIe and SMILE
Broader context: GeNIe and SMILE
A developer’s environment for graphical decision models(http://genie.sis.pitt.edu/).
Reasoning engine: SMILE (Structural Modeling, Inference, and Learning Engine).
A platform independent library of C++ classes for graphical models.
SMILE
SMiner
Learning and discovery module: SMiner
Support for model building: ImaGeNIe
ImaGeNIe
Diagnosis: Diagnosis
Diagnosis
Qualitative interface: QGeNIe
UAI Software Competition
UAI Competition: Sources of speedupUAI Competition: Sources of speedup
1. Clustering algorithm at the foundation of the program [Lauritzen & Spiegelhalter] (Pr(E) as the normalizing factor).
2. Relevance reasoning, based on conditional independence [Dawid 1979, Geiger et al, 1990], structured in [Suermondt 1992] and [Druzdzel 1992], summarized in [Druzdzel & Suermondt 1994].
3. For very large models: Relevance-based Decomposition [Lin & Druzdzel 1997] and Relevance-based Incremental Belief Updating [Lin & Druzdzel 1999].
Full references are included in GeNIe on-line help, http://genie.sis.pitt.edu/.
Good theory rather than engineering tricks
Top research programmer (Tomek Sowinski).
Efficient and reliable implementation
in C++ (SMILE).
UAI Software Competition
Model developer module: GeNIe.
Implemented in Visual C++ in Windows environment.
GeNIe
GeNIeRate
SMILE.NET
Wrappers: SMILE.NET jSMILE, Pocket SMILE
Allow SMILE to be accessed from applications other than C++compiler
jSMILE
Pocket SMILE
Broader Context: GeNIe and SMILE
Broader Context: GeNIe and SMILE
A developer’s environment for graphical decision models(http://genie.sis.pitt.edu/).
Reasoning engine: SMILE (Structural Modeling, Inference, and Learning Engine).
A platform independent library of C++ classes for graphical models.
SMILE
SMiner
Learning and discovery module: SMiner
Support for model building: ImaGeNIe
ImaGeNIe
Diagnosis: Diagnosis
Diagnosis
Qualitative interface: QGeNIe
7/14/2006 UAI06-Inference Evaluation Page 69
Team 5: UCI Technical Description
• presented by Rina Dechter
7/14/2006
UAI06-Inference Evaluation
70
PE & MPE – AND/OR Search
0
A
B
0
E C
0 1
D
0 1
D
0 1 0 1
1
E C
0 1
D
0 1
D
0 1 0 1
1
B
0
E C
0 1
D
0 1
D
0 1 0 1
1
E C
0 1
D
0 1
D
0 1 0 1
Bayesian network
A
D
B
CE
[ ]
[A]
[AB]
[BC]
[AB]
A
D
B C
E
0
A
B
0
E C
0 1
D
0 1
D
0 1 0 1
1
E C
0 1
D
0 1
D
0 1 0 1
1
B
0
E C
0 1 0 1
1
E C
0 1 0 1
Pseudo tree
AND/OR search tree Context minimal graph
7/14/2006
UAI06-Inference Evaluation
71
Adaptive caching
context(X) = [X1X2… Xk-i Xk-i+1… Xk]
i-bound < k
i-context(X) = [Xk-i+1…Xk] in conditioned subproblem
i-cache for X is purged for every new instantiation of Xk-i
X1
XK-i
X
XK
XK-i+1
X1
XK-i
X
XK
XK-i+1 ]
7/14/2006
UAI06-Inference Evaluation
72
PE solver - implementation
• C++ implementation• Caching based on context (table caching)
– Adaptive caching when contexts are too large
• Switch to variable elimination for small and nondeterministic problems
• Constraint propagation• No good learning – just caching no goods• Dynamic range support (for very small
probabilities)
7/14/2006
UAI06-Inference Evaluation
73
MPE solver - AOMB(i,j)
Node value v(n): most probable explanation of the sub-problem rooted by n
Caching: identical sub-problems rooted at AND nodes (identified by their contexts) are solved once and the results cached
j-bound (context size) controls the memory used for caching
Heuristics: pruning is based on heuristics estimates which are pre-computed by bounded inference (i.e. mini-bucket approximation)
i-bound (mini-bucket size) controls the accuracy of the heuristic
No constraint propagation
7/14/2006
UAI06-Inference Evaluation
74
AOMB(i,j) – Mini-Bucket Heuristics
• Each node n has a static heuristic estimate h(n) of v(n)
– h(n) is an upper bound on the value v(n)– h(h) is computed based on the augmented bucket
structure generated by the mini-bucket approximation MBE(i)
• For every node n in the AND/OR search graph:– lb(n) – current best solution cost rooted at n– ub(n) – upper bound on the most probable explanation at n– Prune the search space below current node t if ub(m) < lb(m), where m is an
ancestor of t along the current path from the root
• During search, merge nodes based on context (caching); maintain cache tables of size O(exp(j)), where j is a bound on the size of the context.
7/14/2006
UAI06-Inference Evaluation
75
AOMB(i,j) – Implementation• C++ implementation• B&B procedure is recursive
– Could be a bit faster if we simulate the stack• Cache tables implemented as hash tables• No ASM code or other optimizations• Static variable ordering determined by the min-fill
ordering (minimizes the context size)• Choosing (i,j) parameters:
– i-bound: choose i such that the augmented bucket structure generated by MBE(i) fits 2GB of RAM (i < 22)
– j-bound: j = i + 0.75*i (j < 30)• No Constraint propagation
7/14/2006
UAI06-Inference Evaluation
76
References
• 1. Rina Dechter and Robert Mateescu. AND/OR Search Spaces for Grahpical Models. In Artificial Intelligence, 2006. To appear.
• 2. Rina Dechter and Robert Mateescu. Mixtures of Deterministic-Probabilistic Networks and their AND/OR Search Space. In proceedings of UAI-04, Banff, Canada.
• 3. Robert Mateescu and Rina Dechter. AND/OR Cutset Conditioning. In proceedings of IJCAI-05, Edinburgh, Scotland.
• 4. Radu Marinescu and Rina Dechter. Memory Intensive Branch-and-Bound Search for Graphical Models. In proceedings of AAAI-06, Boston, USA.
• 5. Radu Marinescu and Rina Dechter. AND/OR Branch-and-Bound for Graphical Models“. In proceedings of IJCAI-05, Edinburgh, Scotland
7/14/2006 UAI06-Inference Evaluation Page 77
Conclusions
7/14/2006 UAI06-Inference Evaluation Page 78
Conclusions and Discussion
• Most teams said they had fun and it was a learning experience – people also became somewhat competitive
• Teams that used C++ (teams 4-5) arguably had faster times than those who used Java (teams 1-3).
• Use harder BNs and or harder queries next year– hard to find real-world BNs that are easily available but that are hard.
If you have a BN that is hard, please make it available for next year.– Regardless of who runs it next year, please send candidate networks
directly to me for now <[email protected]>
• Provide better resolution and standardized/unified timer (use IPM package).
• Have a dynamic category (needs to be more interest).• Have an approximate inference category, look at
time/space/accuracy tradeoffs