TRANSCRIPT
David MacKay’s wooden blocks
John Skilling ([email protected]), MaxEnt2016, Ghent
This talk is dedicated to the memory of Professor Sir David MacKay FRS.
How many “crystal” arrangements are there?
David’s son wants to pack ½n² blocks of size 2×1 into a square n×n box.
Ω_2 = 2
Ω_4 = 36
Ω_12 = 53060477521960000
Ω_30 = 131841545472244027406496188757912375363891696443221279694626947912188459956437700105571773334900360294912000000 ≈ 10^110
Asymptotically, expect Ω to be multiplicative, Ω(A ∪ B) ∼ Ω(A) Ω(B), so log Ω will be additive: Ω ∼ exp(size).
Get Ω ≈ exp(0.5829 × ½n²).
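As a sanity check on the small counts above, a brute-force backtracking enumeration (a minimal Python sketch, feasible only for small n) reproduces Ω_2 = 2 and Ω_4 = 36:

```python
# Count 2x1 domino tilings of an n x n box by backtracking over cells
# in row-major order (only practical for small n).
def count_tilings(n):
    filled = [[False] * n for _ in range(n)]

    def fill(pos):
        if pos == n * n:
            return 1                      # every cell covered: one valid tiling
        r, c = divmod(pos, n)
        if filled[r][c]:
            return fill(pos + 1)          # already covered, move on
        total = 0
        if c + 1 < n and not filled[r][c + 1]:        # horizontal block
            filled[r][c] = filled[r][c + 1] = True
            total += fill(pos + 1)
            filled[r][c] = filled[r][c + 1] = False
        if r + 1 < n and not filled[r + 1][c]:        # vertical block
            filled[r][c] = filled[r + 1][c] = True
            total += fill(pos + 1)
            filled[r][c] = filled[r + 1][c] = False
        return total

    return fill(0)

print(count_tilings(2), count_tilings(4), count_tilings(6))   # 2 36 6728
```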
Start with states of the n×n model.
Each of N = 2n(n−1) bonds can be ON or OFF.
Define valency V = # ON bonds at node.
Define energy E = Σ_nodes |V − 1|.
Start with unrestricted bonds: 2^N possibilities.
. . . Compress . . .
End with E=0 crystals: Ω_n possibilities ≪ 2^N.
Get Ω by computing the proportion X = Ω_n / 2^N of crystals.
(X* = proportion of states with E ≤ E*: the computer does not “know” about size 2^N.)
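A minimal sketch of this representation and energy, assuming the N = 2n(n−1) bonds are stored as one array of horizontal bonds and one of vertical bonds (the helper names random_state and energy are illustrative, not the talk’s actual program):

```python
import numpy as np

def random_state(n, rng):
    """One of the 2^N unrestricted bond states of the n x n model."""
    h = rng.integers(0, 2, size=(n, n - 1))   # horizontal bonds, node (r,c)-(r,c+1)
    v = rng.integers(0, 2, size=(n - 1, n))   # vertical bonds,   node (r,c)-(r+1,c)
    return h, v

def energy(h, v):
    """E = sum over nodes of |V - 1|, with V = number of ON bonds at the node."""
    n = v.shape[1]
    V = np.zeros((n, n), dtype=int)
    V[:, :-1] += h; V[:, 1:] += h             # each ON horizontal bond feeds two nodes
    V[:-1, :] += v; V[1:, :] += v             # each ON vertical bond feeds two nodes
    return int(np.abs(V - 1).sum())

rng = np.random.default_rng(0)
print(energy(*random_state(4, rng)))          # E = 0 only for a perfect crystal
```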
[Figure: two 4×4 bond states with node valencies marked: a random model (one of the 16777216 models) and a crystal with every valency 1 (one of the 36 crystals).]
Strategy
Energy histogram for the 4×4 box: 16777216 models, 36 crystals.
Compression by ÷½ million for 4×4.
Compression needs guidance — control by energy.
Use random samples, limited by energy.
            # states            Entropy         Information
Models:     2^N = 16777216      S = log(2^N)    I = 0
Crystals:   Ω = 36              S = log Ω       I = log(2^N / Ω)
MCMC exploration
[Figure: a single MCMC move on a 4×4 state: randomly chosen bonds are inverted ON/OFF and the node valencies update.]
Best moves usually small and simple. Here, random bonds were inverted ON/OFF.
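A sketch of one such move and accept rule, reusing the hypothetical random_state/energy helpers above: invert a randomly chosen bond, and keep the move only if the sample stays within an energy ceiling E_max (random samples “limited by energy”). The real program would update E incrementally rather than recompute it each step:

```python
def explore(h, v, E_max, n_steps, rng):
    """MCMC exploration: random ON/OFF bond inversions, constrained to E <= E_max."""
    E = energy(h, v)
    for _ in range(n_steps):
        a = (h, v)[rng.integers(2)]                    # pick horizontal or vertical bonds
        idx = tuple(int(rng.integers(s)) for s in a.shape)
        a[idx] ^= 1                                    # invert the bond ON/OFF
        E_new = energy(h, v)
        if E_new <= E_max:
            E = E_new                                  # accept: still inside the ceiling
        else:
            a[idx] ^= 1                                # reject: restore the bond
    return h, v, E
```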
[Figure: snapshots from a random model to an ordered crystal, E = 148 → 48 → 2 → 0, with dislocations marked.]
Program tested up to 2000×2000 (2 million blocks and 2^(8 million) model states).
Program limited by annihilation of the last pair of dislocations.
For each crystal, the last pair could be in O(N²) places: the “flagpole in the Atlantic” problem — solutions wanted!
A compressive run
First (random) sample at E1 = 14. Proportion u1 ∼ Uniform(0, 1) lies inside: X1 = u1. Discard E > 14.
Second (E ≤ 14) sample at E2 = 10. Proportion u2 ∼ Uniform(0, 1) lies inside: X2 = X1 u2. Discard E > 10.
Third (E ≤ 10) sample at E3 = 8. Proportion u3 ∼ Uniform(0, 1) lies inside: X3 = X2 u3. Discard E > 8.
Next (E ≤ 8) sample also at E3 = 8. Another proportion u3′ ∼ Uniform(0, 1) lies inside: X3′ = X2 u3′. Compression reaches X̂3 = X2 min(u3, u3′). Discard E > 8.
Another (E ≤ 8) sample at E3 = 8. Yet another proportion u3″ ∼ Uniform(0, 1) lies inside: X3″ = X2 u3″. Compression reaches X̂3 = X2 min(u3, u3′, u3″). Discard E > 8.
Finally, a sample reaches lower, to E = 6. Proportion u4 ∼ Uniform(0, 1) lies inside: X4 = X̂3 u4. Discard E > 6.
More samples arrive at E = 6. Compression reaches X̂4 = X̂3 min(u4, u4′, …).
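Putting the pieces together, a loose end-to-end sketch of such a run, built on the hypothetical helpers above: draw successive samples under the current ceiling, count how many accumulate at each energy level before the ceiling can be lowered, and stop at E = 0 (in practice it can stall near E = 2, the dislocation problem noted earlier):

```python
import numpy as np

def compressive_run(n, steps_per_sample=10000, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    h, v = random_state(n, rng)                 # first (random) sample
    ceiling = energy(h, v)
    counts = {ceiling: 1}                       # samples accumulated at each level
    while ceiling > 0:
        # a fresh sample restricted to E <= ceiling ("discard E > ceiling")
        h, v, E = explore(h, v, ceiling, steps_per_sample, rng)
        counts[E] = counts.get(E, 0) + 1
        ceiling = min(ceiling, E)               # compress whenever a lower level is reached
    return counts                               # e.g. {14: 1, 10: 1, 8: 3, 6: 12, ...}
```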
Complete run:
1 sample at E=14:   X = u1
1 sample at E=10:   X = u1 u2
3 samples at E=8:   X = u1 u2 min(u3, u3′, u3″)
12 samples at E=6:  X = u1 u2 min(u3, u3′, u3″) min(u4, u4′, …)   [12 values u4, u4′, …]
21 samples at E=4:  X = u1 u2 min(u3, u3′, u3″) min(u4, u4′, …) min(u5, u5′, …)   [21 values u5, u5′, …]
71 samples at E=2:  X = u1 u2 min(u3, u3′, u3″) min(u4, u4′, …) min(u5, u5′, …) min(u6, u6′, …)   [71 values u6, u6′, …]
1 sample at E=0: end
Can infer X by simulation (every u ∼ Uniform(0, 1)), but prob(X) is badly skew (X ≪ 1, yet X cannot fall below 0).
log min(u, u′, …) over k values = −Σ_{i=1}^{k} 1/i ± (Σ_{i=1}^{k} 1/i²)^{1/2}
Use log X instead, which is additive so has meaningful moments.
Get log X = −15.43 ± 2.86.
Truth is X = 36/16777216 = exp(−13.05) = 2.15 × 10⁻⁶. ✓
Without logs, X = (3 ± 13) × 10⁻⁶. ✗
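As a check, the quoted estimate follows directly from that formula and the sample counts listed above (a short Python sketch):

```python
import math

counts = [1, 1, 3, 12, 21, 71]        # samples at E = 14, 10, 8, 6, 4, 2

# E[log min of k uniforms] = -sum_{i<=k} 1/i,  Var = sum_{i<=k} 1/i^2
mean = -sum(sum(1.0 / i for i in range(1, k + 1)) for k in counts)
var  =  sum(sum(1.0 / i ** 2 for i in range(1, k + 1)) for k in counts)

print(f"log X = {mean:.2f} +- {math.sqrt(var):.2f}")   # log X = -15.43 +- 2.86
print(f"truth   {math.log(36 / 16777216):.2f}")        # -13.05
```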
Density of states

1 sample at E=14:   log X = −1
1 sample at E=10:   log X = −2
3 samples at E=8:   log X = −3.83
12 samples at E=6:  log X = −6.94
21 samples at E=4:  log X = −10.58
71 samples at E=2:  log X = −15.43
1 sample at E=0: end

[Figure: staircase plot of log X (0 down to −15) against E (0 to 16), with each level Ei carrying its compressive range ΔXi.]

Associate each E with its compressive range ΔX.
g(E) = proportion of models per unit energy = Σ_i δ(E − Ei) ΔXi = “density of states”.
We get the full relationship between E and X, not just the final compression Xend.
This compressive algorithm is called nested sampling.
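A small sketch of this read-out, turning the log X sequence of the run above into the compressive ranges ΔXi that weight each level; here ΔXi is taken as X_{i−1} − X_i with X_0 = 1, one natural reading of “compressive range”:

```python
import math

E    = [14, 10, 8, 6, 4, 2]
logX = [-1, -2, -3.83, -6.94, -10.58, -15.43]

X  = [math.exp(l) for l in logX]
dX = [prev - cur for prev, cur in zip([1.0] + X[:-1], X)]

# g(E) = sum_i delta(E - E_i) dX_i : each level E_i carries weight dX_i
for Ei, dXi in zip(E, dX):
    print(f"E = {Ei:2d}   dX = {dXi:.3e}")
```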
For “model” and “crystal”, read “prior” and “posterior”.
For “energy βE”, read “−log L” where L is likelihood.
For “partition function Z”, read “evidence Z”.
For “X”, read “enclosed proportion of prior”.
Bayesian inference is the elementary special case of unit temperature.
Nested sampling gets sequence (X1, L1), (X2, L2), . . .
Bayes: Prior × Likelihood = Evidence × Posterior
Evidence Z = Σ_i Li ΔXi.   Posterior Pr(i) = Li ΔXi / Z.
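A minimal sketch of that Bayesian read-out from the sequence (X1, L1), (X2, L2), …, using the same ΔXi = X_{i−1} − X_i convention as above (an illustrative helper, not a library routine):

```python
def evidence_and_posterior(X, L):
    """Evidence Z = sum_i L_i dX_i and posterior weights Pr(i) = L_i dX_i / Z."""
    Z, w = 0.0, []
    X_prev = 1.0                        # the whole prior encloses proportion 1
    for Xi, Li in zip(X, L):
        dXi = X_prev - Xi               # compressive range associated with L_i
        w.append(Li * dXi)
        Z += Li * dXi
        X_prev = Xi
    return Z, [wi / Z for wi in w]
```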
Physics
Physical applications have meaningful energy and often involve temperature T = 1/β.
Occupancies are modulated proportionally to e^(−βE), and physical properties follow.
Partition function:  Z = ∫ e^(−βE) g(E) dE,   evaluated as   Z = Σ_i e^(−βEi) ΔXi
Internal energy:  U = ∫ E e^(−βE) g(E) dE / ∫ e^(−βE) g(E) dE ≡ ⟨E⟩_β,   evaluated as   U = Σ_i Ei e^(−βEi) ΔXi / Σ_i e^(−βEi) ΔXi
Specific heat:  C = dU/dT, etc.
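A minimal sketch of these thermal sums from nested-sampling output (Ei, ΔXi). The specific heat is evaluated with the standard fluctuation identity C = β² Var(E) (taking k_B = 1), which is equivalent to dU/dT but not spelled out on the slide:

```python
import numpy as np

def thermal_properties(E, dX, beta):
    """Partition function, internal energy and specific heat at inverse temperature beta."""
    E, dX = np.asarray(E, float), np.asarray(dX, float)
    w = np.exp(-beta * E) * dX
    Z = w.sum()                         # Z = sum_i exp(-beta E_i) dX_i
    U = (E * w).sum() / Z               # U = <E>_beta
    C = beta ** 2 * ((E ** 2 * w).sum() / Z - U ** 2)   # C = dU/dT = beta^2 Var(E)
    return Z, U, C
```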
Note that ∫ … e^(−βE) dX is a Laplace transform, which smooths the operand.
micro-property(E)  --[Laplace transform]-->  Macro-property(T)
signal + noise (over energy)  --[Laplace transform]-->  smooth property (over temperature)
--[Thermal algorithms]-->  signal + noise (over temperature)  =  noisy property (over temperature)
Contrast thermal algorithms (annealing) which control on temperature.
Nested sampling controls on energy, which is the right way round.
Finale
Wooden blocks would make a great student assignment.
The problem is simply understood and visual, but captures the essence of a real problem.
It’s concerned merely with counting, with no confusion about meaning or philosophy.
Exact results are available for small sizes.
The exercise is reasonably challenging.
It’s open-ended (different exploration engines, algorithm efficiency, 3 dimensions, . . . )
The student who can do this will be well placed to program Bayesian inference professionally.
And David would surely have approved with enthusiasm.