
Inside-Out: Characterisation of Computed Tomography Noise in Projection and Image Space with Applications to 3D Printing

OxWaSP 2015-16 University of Warwick Cohort

Mini-Project 1 Report

Sherman Ip

Supervisor: Dr. J. Brettschneider (Statistics, Warwick)

Supervisor: Prof. T. Nichols (Statistics, Warwick)

10th June 2016


Abstract

X-ray computed tomography can be used for quality control on 3D printed samples. However, there are sources of error in the 3D printing, in the behaviour of the photons and in the X-ray detector. This project aims to find a relationship between the sample mean and sample variance of the grey values in images obtained from the X-ray detector by fitting linear regressions. In addition, latent variable models such as principal component analysis, factor analysis and the compound Poisson model were fitted in an attempt to find sources of variance.


Acknowledgements

• Supervisors: Julia Brettschneider and Tom Nichols

• Inside Out Team: Wilfrid Kendall, Audrey Kueh, Jay Warnett and Clair Barnes

• OxWaSP 2015 Cohort: Nathan Cunningham, Giuseppe di Benedetto, Beniamino Hadj-Amar, Jack Jewson, Ella Kaye, Leon Law, Kaspar Martens, Marcin Mider, Xenia Miscouridou, Paul Vanetti and Andi Wang.

• EPSRC Funding: EP/L016710/1


Contents

1 Introduction

2 Literature Review
  2.1 X-Ray Production
  2.2 Photon Interactions
  2.3 Detection and Reconstruction

3 About the Data
  3.1 Histogram
  3.2 Normality
  3.3 Autocorrelation

4 Mean and Variance Relationship
  4.1 Weighted Least Squares
    4.1.1 Theory
    4.1.2 Methods
    4.1.3 Results
  4.2 Resampling Weighted Least Squares
    4.2.1 Methods
    4.2.2 Results
  4.3 Mixture of Linear Regression
    4.3.1 Theory
    4.3.2 Methods
    4.3.3 Results
  4.4 Distance Weighted Least Squares
    4.4.1 Methods
    4.4.2 Results

5 Latent Variable Model
  5.1 Principal Component Analysis
    5.1.1 Theory
    5.1.2 Methods
    5.1.3 Results
  5.2 Factor Analysis
    5.2.1 Theory
    5.2.2 Methods
    5.2.3 Results
  5.3 Compound Poisson
    5.3.1 Theory
    5.3.2 Toy Example

6 Conclusion
  6.1 Mean and Variance Relationship
  6.2 Latent Variable Model

A Source Code

B Proofs and Identities
  B.1 Matrix Calculus
  B.2 M Step for Mixture of Linear Regression
  B.3 M Step for Factor Analysis
  B.4 Obtaining a One-Layer Compound Poisson Model
  B.5 Joint Compound Poisson is in the Exponential Family
  B.6 M Step for Compound Poisson
  B.7 Cramer-Rao Lower Bounds for the Compound Poisson

Bibliography


Chapter 1

Introduction

3D printing, formally additive manufacturing [1], has recently become a state-of-the-art technology which manufactures objects of complicated shapes with very high precision. It was developed in the 1980s [2] but has recently been commercialised and is used in medical science [3] and engineering [1]. Because of such a wide range of applications, there has been a need for quality control, and this can be done by scanning the 3D printed sample using X-ray computed tomography.

The Inside-Out group at the University of Warwick aims to develop statistical methodology for using X-ray computed tomography to do quality control on 3D printing. Recent research includes how the penumbra can be modelled [4], simulating and detecting defects, and the behaviour of dead pixels [5].

Images of X-ray scans of a stationary 3D printed sample were obtained for this project. The aim of this specific project was to find a mean and variance relationship for the grey values. This was done by fitting linear regressions. Separately, latent variable models were fitted to the images to find sources of variance. The latent variable models included principal component analysis, factor analysis and the compound Poisson.

This report was written in a Master's dissertation style and assumes knowledge up to graduate level in physics and statistics. The mathematics and statistics used can be reviewed in undergraduate textbooks [6] [7]; however, complicated matrix calculus results are stated without proof in Appendix B.1.


Chapter 2

Literature Review

Computed tomography (CT) scanning is a 3D imaging technique. It reconstructs the geometry of the sample through a series of 2D X-ray images of the sample, with the sample rotating after each image is taken [8].

Figure 2.1 shows a diagram of how CT scanning works. A 2D image is taken by projecting X-ray photons onto the stationary sample. The photons are then scattered or absorbed by the sample. Some of these photons are detected by an X-ray detector on the other side of the sample, which produces an image. After an image has been taken, the object rotates and another image is taken. Finally, after a number of images, a 3D reconstruction of the object can be estimated [8].

CT scanning was invented by G. Hounsfield [9] in the early 1970s and was mainly used for medical imaging. The setup for CT scanning is different when scanning patients because the detector and X-ray source rotate around the patient [8]. Recently CT has been used industrially for non-destructive testing in manufacturing [8]. One possible application is inspecting 3D printed samples.

There are many sources of error in CT scanning [8], and these can cause problems when reconstructing the geometry of the sample. Sources of error include defects in the detector, environmental noise and the behaviour of photons.

This chapter aims to give a brief description of how CT scanning works.


Figure 2.1: X-ray computed tomography reconstructs the sample by projecting photons onto a rotating sample. The photons are then detected by the detector. Source: http://www.phoenix-xray.com/

2.1 X-Ray Production

Photons in CT scanning are produced in an X-ray tube. In an X-ray tube, a cathode, consisting of a heated filament, fires projectile electrons through an electric potential to a target which forms the anode [10], as shown in Figure 2.2. Most of the kinetic energy of the projectile electrons is converted into heat; however, some is converted into electromagnetic radiation. This depends on how the projectile electrons interact with the atoms in the anode [8].

Bremsstrahlung radiation is the result of projectile electrons decelerating in the electrostatic field produced by the nuclei of the target. The kinetic energy of the projectile electrons is then converted to electromagnetic radiation to produce X-ray radiation. As a result, the photon energies in bremsstrahlung radiation form a continuous spectrum and can range up to the maximum kinetic energy of the projectile electrons [10].

Characteristic radiation is due to projectile electrons colliding with electrons in the target atoms and ionizing them. This produces vacancies in the electron shells, and photons are emitted when the electrons in the target atoms drop back down to the ground state. The energy of the emitted radiation is monoenergetic and depends on the binding energy of the target's atoms [10].

A typical energy spectrum of photons emitted from an X-ray tube is shown in Figure 2.3.


Figure 2.2: An X-ray tube produces photons by firing projectile electrons from a cathode to an anode. Source: G. Michael (2001) [10]

The energy spectrum consists of both bremsstrahlung and characteristic radiation [10].

The voltage and current in the X-ray tube can be varied to produce different energy spectra and rates of photon production. This can vary the results produced when collecting CT data [8]. Another important factor is the focal spot size, because smaller spot sizes produce sharper edges. Larger spot sizes produce unsharp results, and this is known as the penumbra effect, as shown in Figure 2.4. However, spot sizes that are too small can produce concentrated heat [11] and can damage the X-ray tube.

2.2 Photon Interactions

Photons emitted by the X-ray tube are projected onto the sample and interact with it in a number of ways [8].

The sample can effectively absorb photons via the photoelectric effect or pair production [8]. In the photoelectric effect, a photon transfers all its energy to a bound electron and ejects it from the sample's atom [12]. In pair production, photons convert into electron-positron pairs by interacting with the Coulomb field of the sample's atomic nuclei [13].


Figure 2.3: A typical energy spectrum of photons emitted from an X-ray tube. The continuous spectrum is the result of bremsstrahlung radiation. The peaks are the result of characteristic radiation. Source: G. Michael (2001) [10]

Figure 2.4: Larger focal spot sizes produce unsharp results. This is known as the penumbra effect. Source: F. Welkenhuyzen et al. (2009) [11]


In addition to being absorbed, photons can be scattered by the sample. This happens when photons collide inelastically with the sample's electrons and transfer energy to them. This process is known as Compton scattering [14].

Suppose a mono-energetic X-ray pencil beam attenuates through an object with position-dependent attenuation coefficient \mu = \mu(x). If the beam starts at x = 0 and is detected at x = L, then the attenuation (the decrease in X-ray intensity from I_0 to I_1) is given by [8]

I_1 = I_0 \exp\left[ \int_0^L -\mu(x)\,dx \right] .    (2.1)

By comparing the intensity before and after attenuation, the integral of the attenuation coefficient along the path of the X-ray can be calculated. Because of the discrete nature of pixels, the integral is usually replaced by a sum [10].
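As an illustrative sketch of that discretisation (the step length \Delta x and the per-step coefficients \mu_j are notation introduced here, not taken from the report), the measured intensities give the path integral as a finite sum over the voxels crossed by the ray:

    \ln\frac{I_0}{I_1} = \int_0^L \mu(x)\,dx \approx \sum_{j=1}^{J} \mu_j \,\Delta x .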

However, it has been shown that the attenuation coefficient depends on the energy of the photons [15]. Thus \mu = \mu(x, E) should be made dependent on the energy of the photons [8]. In general, low energy photons are more likely to be absorbed than high energy photons; this increases the average energy of the attenuated photons and can be a source of error in CT scanning. This is referred to as beam hardening and can cause inaccuracies in Equation (2.1) [10]. It can be reduced by placing a thin filter sheet to absorb low energy photons [11] or by correcting for it in the data analysis stage [10].

2.3 Detection and Reconstruction

Most X-ray detectors are scintillator-photodiode detectors. The photons interact with the scintillator material and produce visible light. The visible light is then detected by photodiodes and converted into electrical current [10]. The detectors used in CT scanning are flat-bed scanners which consist of an array of panels of photodiodes [8].

The scale of the 2D images can be calibrated with the use of reference standards [16]. The method used to reconstruct the 3D geometry of the sample is called filtered back-projection [10]. The mathematics can be reviewed in [17], and recent methods consider polyenergetic photons [15].


Chapter 3

About the Data

A dataset was given for this project. 100 images of a stationary 3D printed cuboid, with its vertices in the middle, were taken using a Perkin Elmer XRD 1621 digital X-ray detector. These greyscale images were 1 996 × 1 996 pixels in size and have a colour depth of 16 bits. This means each pixel can have grey values which take integer values from 0 to 2^16 − 1 inclusive. In addition, these images were calibrated so that the air has a grey value of 60 000; as a result, the grey values have arbitrary units. Unfortunately a scale, in terms of distance, was not recorded. A sample of the images can be viewed in Figure 3.1.

Figure 3.1: The 1st, 50th and 99th image in the dataset of X-ray images of a 3D printed cuboid.

The X-ray tube was stated to have the following properties, with the number of significant figures as given and no error bars provided:


• Voltage = V = 100 kV

• Power = P = 33.0 W

• Exposure time = τ = 500 ms

• Target material is tungsten.

From this, some additional properties of the X-ray tube were estimated. It is given that the efficiency of photon production \eta is [10]

\eta = a V Z = 8.1 \times 10^{-3}    (3.1)

where a = 1.1 \times 10^{-9} V^{-1} is a constant and Z = 74 is the atomic number of tungsten. The rate of photon production \nu and the energy per photon E can then be calculated:

\nu = \frac{\eta P}{V e} = 1.7 \times 10^{13} \ \mathrm{s^{-1}}    (3.2)

E = V e = 1.0 \times 10^{2} \ \mathrm{keV}    (3.3)

where e is the charge per electron.
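As a quick check of the arithmetic in Equations (3.1)-(3.3), the quantities can be re-evaluated in a few lines. The sketch below is illustrative only; it simply re-evaluates the stated constants, working in electronvolts so that the elementary charge enters only through the photon-rate conversion.

    # Sketch: reproduce the X-ray tube quantities in Equations (3.1)-(3.3).
    a = 1.1e-9      # V^-1, constant in the efficiency formula
    Z = 74          # atomic number of tungsten
    V = 100e3       # V, tube voltage
    P = 33.0        # W, tube power
    e = 1.602e-19   # C, charge per electron

    eta = a * V * Z            # efficiency of photon production, approx 8.1e-3
    nu = eta * P / (V * e)     # rate of photon production, approx 1.7e13 per second
    E = V * e                  # energy per photon in joules, i.e. 100 keV
    print(eta, nu, E / 1.602e-16)   # last value converted to keV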

3.1 Histogram

The distribution of all the grey values from all pixels in the dataset is shown in Figure 3.2. It was observed that there were 3 distinct peaks in the histogram, at grey values:

• ∼ 2.4 × 10^4 arb. unit

• ∼ 4.1 × 10^4 arb. unit

• ∼ 5.0 × 10^4 arb. unit.

By applying a threshold, as shown in Figure 3.3, it was clear that the 3 main sources for the peaks in the histogram were the 3D printed sample, the background and the foam holder at the bottom of the image.


Figure 3.2: Histogram of all the grey values from all the pixels in the 100 images from the dataset. The grey values ranged from 18 286 to 60 045. (Axes: grey value (arb. unit) against frequency density (arb. unit^-1).)

(a) Threshold pixels (b) Histogram

Figure 3.3: A threshold was selected to highlight pixels with certain grey values. Blue: pixels with grey values less than 34 765. Green: pixels with grey values more than 40 269.


3.2 Normality

For a given pixel, a Normal distribution was fitted on the 100 grey values (one from each image) and the χ2 goodness of fit test was conducted. This was done for all 1 996^2 pixels, as shown in Figure 3.4. When considering the Bonferroni correction [18] for multiple hypothesis testing, the Normal distribution appeared to be a good fit, because only 9 pixels had p values less than 10%. In addition, these significant pixels did not seem to cluster and appeared random in space. When ignoring the Bonferroni correction, there was a suspicious patch of low p values on the bottom right. It should be reasonable to fit models where the grey values are Normally distributed; however, one should be dubious if unusual results are obtained from the bottom right of the images.
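A minimal sketch of the per-pixel test described above (the binning choice, the SciPy routines and the handling of the fitted parameters via ddof are assumptions made here; the report does not state how its test was implemented):

    # Sketch: chi-squared goodness-of-fit test for Normality of one pixel's grey values.
    import numpy as np
    from scipy import stats

    def normal_gof_pvalue(values, n_bins=10):
        mu, sigma = values.mean(), values.std(ddof=1)
        edges = np.linspace(values.min(), values.max(), n_bins + 1)
        observed, _ = np.histogram(values, bins=edges)
        cdf = stats.norm.cdf(edges, loc=mu, scale=sigma)
        expected = len(values) * np.diff(cdf)
        expected *= observed.sum() / expected.sum()   # match totals, as chisquare requires
        # two degrees of freedom are lost to the fitted mean and standard deviation
        _, p_value = stats.chisquare(observed, expected, ddof=2)
        return p_value

    # Bonferroni correction over all pixels: compare each p value against 0.10 / 1996**2.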

3.3 Autocorrelation

The sample mean and standard error of the grey values for each of the 100 images were calculated and plotted as a time series, as shown in Figure 3.5. There was evidence of dependence between samples when looking at the sample autocorrelation plots shown in Figure 3.6. Because there was a peak at lag 1 in the sample partial autocorrelation plot, the time series could be modelled using an autoregressive model with one parameter. Despite this, the images were assumed to be i.i.d. in this project.
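A sketch of this time-series check, assuming the 100 images are already loaded as a NumPy stack (a small random stand-in is used below) and using statsmodels, which the report does not name:

    # Sketch: autocorrelation of the per-image mean grey values and an AR(1) fit.
    import numpy as np
    from statsmodels.tsa.stattools import acf, pacf
    from statsmodels.tsa.ar_model import AutoReg

    rng = np.random.default_rng(0)
    images = rng.normal(3.85e4, 1e2, size=(100, 200, 200))   # small stand-in; the real images are 1996 x 1996

    mean_grey = images.reshape(100, -1).mean(axis=1)   # one mean grey value per image, in order

    sample_acf = acf(mean_grey, nlags=20)
    sample_pacf = pacf(mean_grey, nlags=20)

    # A single spike at lag 1 in the partial autocorrelation suggests an AR(1) model.
    ar1_fit = AutoReg(mean_grey, lags=1).fit()
    print(ar1_fit.params)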


Figure 3.4: p values from the χ2 goodness of fit test on fitting the Normal distribution on the grey values for each pixel. Panels: (a) p values; (b) log10 p values. In (b), circled are pixels with p values less than 10%, corrected for multiple hypothesis testing using the Bonferroni correction.


Figure 3.5: The sample mean and standard error grey value for each of the 100 images, plotted against observation number.

Figure 3.6: Sample autocorrelation (a) and sample partial autocorrelation (b) of the mean grey value for each image as a time series, up to lag 20.


Chapter 4

Mean and Variance Relationship

The aim of this chapter was to do supervised learning: predicting the variance of a pixel's grey value given its sample mean. This was done by fitting a linear regression to obtain an explanatory model, rather than using out-of-the-box machine learning algorithms. Once a good model was found, a single CT scan could be used to quantify the uncertainty on each pixel's grey value. This uncertainty can then be carried forward when doing volumetric reconstruction.

Figure 4.1 shows the sample variance and sample mean grey value for each pixel. It can be seen that for sample means of (4.5 ± 0.5) × 10^4, the range of sample variances increased. In addition, there were a few outliers with extremely large sample variances; one example can be seen in Figure 4.2. To take the outliers into consideration, a simple weighted least squares regression was used to give less weight to them.

4.1 Weighted Least Squares

4.1.1 Theory

The sample variance is a random variable and thus itself has uncertainty. Weighted least squares was used to take into consideration the uncertainty of the sample variance. This was done by giving low weights to data points with high uncertainty on the grey value sample variance.


Figure 4.1: Frequency density plot of the sample variance against sample mean grey value for all 1 996^2 pixels.

Figure 4.2: Scatter plots of the sample variance-mean grey value pairs for a sample of pixels. Panels: (a) a sample of 7 968 points; (b) another sample of 7 968 points.


Let S_i^2 and \sigma_i^2 be the sample variance and true variance, respectively, of the ith pixel's grey value for i = 1, 2, ..., m with m = 1 996^2. The sample variance has a sampling distribution such that [7, pp. 195-198]

\frac{(n-1) S_i^2}{\sigma_i^2} \sim \chi^2_{n-1}    (4.1)

where n = 100 is the number of samples used in estimating the variance. Equating the variance of both sides, 2(n-1) = \frac{(n-1)^2}{\sigma_i^4} \mathrm{Var}(S_i^2), thus

\mathrm{Var}(S_i^2) = \frac{2\sigma_i^4}{n-1} .    (4.2)

A good weight for pixel i, w_i, would be one proportional to the precision, or reciprocal of the variance, of S_i^2. \sigma_i^2 is generally unknown but can be estimated using S_i^2. By doing so, the weight for each sample mean-variance pair was chosen to be

w_i \propto \frac{1}{(S_i^2)^2} .    (4.3)

The weights can be introduced into the least squares problem. Let Y_1, Y_2, ..., Y_m and x_1, x_2, ..., x_m be the sample variances and sample mean grey values respectively. Let \mathbf{x}_1, \mathbf{x}_2, ..., \mathbf{x}_m be the p-length feature vectors of the sample mean grey values. For example the feature vectors can contain polynomials such that

\mathbf{x}_i = (1, x_i, x_i^2, ..., x_i^{p-1})^T .    (4.4)

Let Y_i and \mathbf{x}_i have a linear relationship such that

Y_i = \mathbf{x}_i^T \beta + \varepsilon_i    (4.5)

where \beta is a p-length parameter vector and \varepsilon_i \sim N(0, \sigma_\varepsilon^2) i.i.d. for i = 1, 2, ..., m. Weighted least squares is an optimisation problem where the objective T is minimised by varying \beta:

T = \frac{\sum_{i=1}^m w_i (Y_i - \mathbf{x}_i^T \beta)^2}{\sum_{i=1}^m w_i} .    (4.6)

This can be expanded as

T = \frac{\sum_{i=1}^m w_i \left( Y_i^2 - 2 Y_i \beta^T \mathbf{x}_i + \beta^T \mathbf{x}_i \mathbf{x}_i^T \beta \right)}{\sum_{i=1}^m w_i} .

Using the properties of the trace,

T = \frac{\sum_{i=1}^m w_i \left( Y_i^2 - 2 Y_i \mathrm{Tr}(\beta^T \mathbf{x}_i) + \mathrm{Tr}(\beta^T \mathbf{x}_i \mathbf{x}_i^T \beta) \right)}{\sum_{i=1}^m w_i}

so the differential with respect to \beta, using the results in Appendix B.1, is

\nabla_\beta T = \frac{-2 \sum_{i=1}^m w_i Y_i \mathbf{x}_i + 2 \sum_{i=1}^m w_i \mathbf{x}_i \mathbf{x}_i^T \beta}{\sum_{i=1}^m w_i} .    (4.7)

Setting the differential to 0, the value of \beta which minimises the objective is

\hat{\beta} = \left[ \sum_{i=1}^m w_i \mathbf{x}_i \mathbf{x}_i^T \right]^{-1} \sum_{i=1}^m w_i \mathbf{x}_i Y_i .    (4.8)

Assuming Y_1, Y_2, ..., Y_m are independent and have constant variance \sigma_\varepsilon^2, the covariance of \hat{\beta} is

\Sigma_\beta = \mathrm{Cov}[\hat{\beta}] = \sigma_\varepsilon^2 \left[ \sum_{i=1}^m w_i \mathbf{x}_i \mathbf{x}_i^T \right]^{-1} \left[ \sum_{i=1}^m w_i^2 \mathbf{x}_i \mathbf{x}_i^T \right] \left[ \sum_{i=1}^m w_i \mathbf{x}_i \mathbf{x}_i^T \right]^{-1} .    (4.9)

\sigma_\varepsilon^2 can be estimated using the weighted mean squared error

\hat{\sigma}_\varepsilon^2 = \frac{\sum_{i=1}^m w_i \left[ Y_i - \mathbf{x}_i^T \hat{\beta} \right]^2}{\sum_{i=1}^m w_i} .    (4.10)

After estimating \beta and \sigma_\varepsilon^2 and given a data point \mathbf{x}_\mathrm{test}, the predicted grey value variance is

\hat{Y} = \mathbf{x}_\mathrm{test}^T \hat{\beta} + \varepsilon_\mathrm{test}    (4.11)

with expectation

E[\hat{Y}] = \mathbf{x}_\mathrm{test}^T \hat{\beta}    (4.12)

and variance

\mathrm{Var}[\hat{Y}] = \mathbf{x}_\mathrm{test}^T \Sigma_\beta \mathbf{x}_\mathrm{test} + \sigma_\varepsilon^2    (4.13)

where \hat{\sigma}_\varepsilon^2 can be used in the estimation of \mathrm{Var}[\hat{Y}].
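The closed-form estimator (4.8), together with the error estimate (4.10), translates directly into a few lines of linear algebra. The sketch below is illustrative only (the project's actual source code is listed in Appendix A); the variable names and the simplified feature scaling are assumptions.

    # Sketch: weighted least squares with polynomial features (equations 4.3, 4.8 and 4.10).
    import numpy as np

    def weighted_least_squares(sample_mean, sample_var, order):
        w = 1.0 / sample_var**2              # weights proportional to the precision of S_i^2
        w = w / w.max()                      # normalise the weights to a maximum of 1
        z = (sample_mean - sample_mean.mean()) / sample_mean.std()
        X = np.vander(z, order + 1, increasing=True)      # columns 1, z, z^2, ..., z^order
        WX = w[:, None] * X
        beta = np.linalg.solve(X.T @ WX, WX.T @ sample_var)          # equation (4.8)
        residuals = sample_var - X @ beta
        sigma2_eps = np.sum(w * residuals**2) / np.sum(w)            # equation (4.10)
        return beta, sigma2_eps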

4.1.2 Methods

Polynomial features for the sample mean were constructed up to order k = 6. The weighted linear regression was then fitted. Before fitting the linear regression, the weights were normalized to have a maximum of 1. Furthermore, each feature was normalized to have mean 0 and standard deviation 1. Afterwards, a constant of 1 was appended to each feature vector to model the intercept of the linear regression.

In addition, the BIC [19, p. 233] was calculated, given as

BIC = m\left( \ln 2\pi + \ln \hat{\sigma}_\varepsilon^2 + 1 \right) + (k + 1) \ln m    (4.14)

for each polynomial order, and this was repeated for 40 different bootstrapped samples. A bootstrapped sample was obtained by randomly resampling m times with replacement from the original dataset [20].
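The BIC comparison over polynomial orders with bootstrap resampling can then be sketched as below, reusing the weighted_least_squares helper from the previous sketch; sample_mean and sample_var are assumed to be arrays holding the per-pixel sample means and variances.

    # Sketch: BIC of equation (4.14) over polynomial orders, on 40 bootstrapped samples.
    import numpy as np

    def bic(sigma2_eps, m, k):
        return m * (np.log(2 * np.pi) + np.log(sigma2_eps) + 1) + (k + 1) * np.log(m)

    rng = np.random.default_rng(0)
    m = sample_mean.size
    for k in range(1, 7):
        scores = []
        for _ in range(40):
            idx = rng.choice(m, size=m, replace=True)    # bootstrap resample
            _, s2e = weighted_least_squares(sample_mean[idx], sample_var[idx], k)
            scores.append(bic(s2e, m, k))
        print(k, np.mean(scores), np.std(scores))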

4.1.3 Results

The weighted least squares fits with polynomial features were obtained successfully, as shown in Figure 4.3. The fits appeared successful because the regressions go through the majority of the mean and variance pairs. It was unusual to note that the prediction variance for the 6th order polynomial blew up; further investigation would be needed to explain such behaviour.

The BIC, as shown in Figure 4.4, favoured high order polynomials. Because the BIC favoured more and more complicated models, this suggested that fitting polynomial features was not a good model: the BIC kept seeking additional polynomial terms so that the model became more similar to the true model.


Figure 4.3: Frequency density of the sample mean-variance grey value pairs. Weighted least squares with different orders of polynomial features were fitted; panels (a)-(f) show orders 1 to 6. The (red) dotted lines represent the 95% prediction interval.


Figure 4.4: BIC, for 40 different bootstrapped samples, for fitting weighted least squares with polynomial features (orders 1 to 6) on the sample mean-variance pairs.

4.2 Resampling Weighted Least Squares

4.2.1 Methods

From the previous chapter, it was clear there were 3 materials, the 3D printed sample, the background and the foam, which contributed to the peaks in the histogram of the grey values.

The sample mean-variance pairs were resampled from each of the 3 materials, as shown in Figure 4.5. A 1st and a 2nd order weighted linear regression were fitted for each material. In addition, the BIC was calculated when fitting such weighted regressions on 40 bootstrapped samples [20], for each material.

4.2.2 Results

It was clear the mean-variance relationship is material dependent; this can be seen from the different gradients of the separate regressions in Figure 4.6. Table 4.1 shows that for the 3D printed sample and the foam, the BIC favoured the higher order model.


Figure 4.5: Highlighted by a square, the resampled pixels from each material. Left: sample from the background. Middle: sample from the 3D printed sample. Right: sample from the foam.

              BIC, order 1                  BIC, order 2
Sample        (4.7421 ± 0.0003) × 10^7      (4.7413 ± 0.0003) × 10^7
Background    (4.177 ± 0.001) × 10^6        (4.176 ± 0.001) × 10^6
Foam          (1.5418 ± 0.0003) × 10^6      (1.5403 ± 0.0003) × 10^6

Table 4.1: The mean ± standard deviation of the BIC when fitting weighted least squares with polynomial features on 40 bootstrap samples of the resampled mean-variance pairs for each material.

However for the background, when taking the uncertainty into consideration, the BIC can favour the 1st order model. It was unusual to see the prediction variance blow up for the 2nd order model in the background and foam resampled data.

4.3 Mixture of Linear Regression

In order to take into consideration the different mean-variance relationships for the different materials, a mixture of linear regressions was fitted to see if it captured the different behaviours.


Figure 4.6: Frequency density of the sample mean-variance grey value pairs for each set of resampled pixels. Weighted linear regression with polynomial features was fitted for each material; panels: (a) sample, order 1; (b) sample, order 2; (c) background, order 1; (d) background, order 2; (e) foam, order 1; (f) foam, order 2. The (red) dotted lines represent the 95% prediction interval.


4.3.1 Theory

A mixture of linear regressions was used, of the form

Y_i \mid S_i = r \sim N\left( \mathbf{x}_i^T \beta_r, \sigma_r^2 \right), \quad r = 1, 2, ..., q    (4.15)

where S_i is a latent variable with P(S_i = r) = \pi_r for i = 1, 2, ..., m and r = 1, 2, ..., q.

The likelihood is

L = \prod_{i=1}^m \sum_{j=1}^q p_{Y|S}(y_i \mid S_i = j) \, \pi_j

thus the log likelihood is

\ln L = \sum_{i=1}^m \ln \left[ \sum_{j=1}^q \frac{\pi_j}{\sqrt{2\pi}\sigma_j} \exp\left( -\frac{1}{2} \left( \frac{y_i - \mathbf{x}_i^T \beta_j}{\sigma_j} \right)^2 \right) \right] .    (4.16)

Such a log likelihood can be maximised using the EM algorithm [21], treating S_i as a random latent variable [22, pp. 260-261]. The EM algorithm is an iterative algorithm alternating an E step and an M step until some convergence condition is met.

In the E step, the posterior distribution is used to predict the latent variables S_i for i = 1, 2, ..., m given estimates of the parameters \beta_j, \sigma_j^2 and \pi_j for j = 1, 2, ..., q. The posterior distribution is given by

r_{i,j} = P(S_i = j \mid Y_i = y_i) = \frac{ \dfrac{\pi_j}{\sqrt{2\pi}\sigma_j} \exp\left( -\frac{1}{2} \left( \frac{y_i - \mathbf{x}_i^T \beta_j}{\sigma_j} \right)^2 \right) }{ \sum_{j'=1}^q \dfrac{\pi_{j'}}{\sqrt{2\pi}\sigma_{j'}} \exp\left( -\frac{1}{2} \left( \frac{y_i - \mathbf{x}_i^T \beta_{j'}}{\sigma_{j'}} \right)^2 \right) }    (4.17)

and such a quantity is called the responsibility of the ith data point belonging to the jth model.

In the M step, the estimates of the parameters \beta_j, \sigma_j^2 and \pi_j for j = 1, 2, ..., q are updated, by maximising the log likelihood, given the responsibilities r_{i,j} for i = 1, 2, ..., m. The regression parameters are updated by

\beta_j = \left[ \sum_{i=1}^m r_{i,j} \mathbf{x}_i \mathbf{x}_i^T \right]^{-1} \left[ \sum_{i=1}^m r_{i,j} y_i \mathbf{x}_i \right]    (4.18)

which is a weighted least squares fit where the weights are the responsibilities for model j. For the remaining parameters,

\sigma_j^2 = \frac{ \sum_{i=1}^m r_{i,j} \left( y_i - \mathbf{x}_i^T \beta_j \right)^2 }{ \sum_{i=1}^m r_{i,j} }    (4.19)

which is the weighted mean squared error, and finally

\pi_j = \frac{1}{m} \sum_{i=1}^m r_{i,j}    (4.20)

which is the average responsibility. The proofs for the M step can be found in Appendix B.2.
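The E and M steps above can be put together in a short routine. The sketch below is a straightforward transcription of Equations (4.17)-(4.20) with illustrative names and a random-responsibility initialisation; it is not the report's own code.

    # Sketch: EM for a mixture of q first-order linear regressions (equations 4.17-4.20).
    import numpy as np

    def mixture_regression_em(X, y, q=3, n_steps=50, seed=0):
        rng = np.random.default_rng(seed)
        m, p = X.shape
        resp = rng.dirichlet(np.ones(q), size=m)          # random initial responsibilities
        beta = np.zeros((q, p))
        sigma2 = np.ones(q)
        pi = np.full(q, 1.0 / q)
        for _ in range(n_steps):
            # M step: responsibility-weighted least squares for each component.
            for j in range(q):
                w = resp[:, j]
                beta[j] = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))   # (4.18)
                sigma2[j] = np.sum(w * (y - X @ beta[j])**2) / np.sum(w)            # (4.19)
                pi[j] = np.mean(w)                                                  # (4.20)
            # E step: responsibilities from the component log densities (4.17).
            log_p = (np.log(pi) - 0.5 * np.log(2 * np.pi * sigma2)
                     - 0.5 * (y[:, None] - X @ beta.T)**2 / sigma2)
            log_p -= log_p.max(axis=1, keepdims=True)
            resp = np.exp(log_p)
            resp /= resp.sum(axis=1, keepdims=True)
        return beta, sigma2, pi, resp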

4.3.2 Methods

Only a mixture of 1st order linear regressions was considered. The EM algorithm was initialised by assigning random responsibilities to all models and data points in the E step. The number of mixture components was set to q = 3 to see if this model captured the 3 materials. 50 EM steps were used.

To make the EM algorithm run without numerical problems, a random sample of 10 000 sample mean-variance pairs was used in fitting the mixture model.


4.3.3 Results

It was verified that the EM algorithm always increased the log likelihood at every EM step, as can be seen in Figure 4.7.

In Figure 4.8, the 3 regressions did not capture the 3 different materials in the way the resampling weighted least squares did. This was because the mixture probabilities π1, π2, π3 did not depend on the sample mean grey value and so did not represent the fact that certain materials have certain grey values; for example, low grey values were more likely to be from the 3D printed sample.

4.4 Distance Weighted Least Squares

4.4.1 Methods

Weighted least squares was used where the weights depended on the distance from a particular grey value. k = 3 independent weighted least squares regressions were fitted with weights

w_i = \exp\left[ -\frac{1}{2} \left( \frac{x_i - \mu_j}{\sigma} \right)^2 \right]    (4.21)

where \mu_j was found using k-means [22, p. 446] in the sample mean grey value space and \sigma is a predefined constant, called the Gaussian width.
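A sketch of this procedure, combining k-means centres with the Gaussian weights of Equation (4.21); the use of scikit-learn's KMeans is an assumption, since the report only cites k-means itself.

    # Sketch: k = 3 Gaussian distance-weighted least squares fits (equation 4.21).
    import numpy as np
    from sklearn.cluster import KMeans

    def distance_weighted_fits(sample_mean, sample_var, k=3, width=1e3):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(sample_mean.reshape(-1, 1))
        centres = km.cluster_centers_.ravel()
        X = np.column_stack([np.ones_like(sample_mean), sample_mean])   # first-order features
        fits = []
        for mu_j in centres:
            w = np.exp(-0.5 * ((sample_mean - mu_j) / width)**2)        # Gaussian weights
            WX = w[:, None] * X
            beta = np.linalg.solve(X.T @ WX, WX.T @ sample_var)
            fits.append((mu_j, beta))
        return fits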

4.4.2 Results

The fitted model captured the 3D printed sample and the foam well, because the regressions followed the shape of the histogram, as shown in Figure 4.9. However, for the background, the regression had a negative gradient as a result of being influenced by lower sample mean grey values from the foam and the 3D printed sample. This did not capture the background in the way the resampled weighted least squares did.

For higher Gaussian widths, the regression became an ordinary least squares regression, because all the data points had approximately equal weights.


Figure 4.7: The log likelihood of the mixture of linear regressions at every EM step, for 10 different initial values.

Figure 4.8: Fitting a mixture of 3 linear regressions on a sample of 10 000 sample mean-variance data points. The (red) dotted lines represent the 95% prediction interval.


Figure 4.9: Three independent Gaussian weighted least squares regressions. The weights depended on the distance from the sample mean to a k-means centre in sample mean grey value space, and on a width σ; panels: (a) σ = 10^2, (b) σ = 10^3, (c) σ = 10^4, (d) σ = 10^5. The (red) dotted lines represent the 95% prediction interval.


Chapter 5

Latent Variable Model

Probabilistic latent variable models were used to model the grey values for each pixel. These latent variables can be used to describe the underlying structure, and perhaps the sources of variance. In this chapter, principal component analysis and factor analysis were conducted on the 100 images. At the end, the compound Poisson model is presented with a toy example.

5.1 Principal Component Analysis

Each pixel has a grey value which was treated as a random variable. By estimating the covariance matrix, sources of (orthogonal) variance were estimated. This was done by investigating the eigendecomposition of the sample covariance matrix. Such a technique is called principal component analysis [22, pp. 329-330].

5.1.1 Theory

Suppose the m × m covariance matrix \Sigma has eigenvalues \lambda_i and eigenvectors \phi_i for i = 1, 2, ..., m. Then the eigenvectors and eigenvalues are related by

\Sigma \phi_i = \lambda_i \phi_i .    (5.1)

Pre-multiplying both sides by \phi_j^T gives

\phi_j^T \Sigma \phi_i = \lambda_i \phi_j^T \phi_i
\left( \Sigma^T \phi_j \right)^T \phi_i = \lambda_i \phi_j^T \phi_i .

But given the covariance matrix is symmetric, \Sigma = \Sigma^T, and \lambda_j is the eigenvalue corresponding to the eigenvector \phi_j, then

\left( \lambda_j \phi_j \right)^T \phi_i = \lambda_i \phi_j^T \phi_i
\lambda_j \phi_j^T \phi_i = \lambda_i \phi_j^T \phi_i .

As long as \lambda_i \neq \lambda_j for i \neq j, then \phi_j^T \phi_i = 0. Therefore the eigenvectors of the covariance matrix are orthogonal to each other.

Equation (5.1) can be extended to include all eigenvectors and eigenvalues. Writing \Phi = [\phi_1 \; \phi_2 \; \cdots \; \phi_m] and \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, ..., \lambda_m),

\Sigma \Phi = \Phi \Lambda .    (5.2)

Because all eigenvectors are orthogonal to each other (and of unit length), \Phi^T = \Phi^{-1}, so \Sigma = \Phi \Lambda \Phi^T. This is a weighted sum of the outer products of the eigenvectors, therefore the eigendecomposition is

\Sigma = \sum_{i=1}^m \lambda_i \phi_i \phi_i^T .    (5.3)

Because this is a weighted sum, the eigenvectors with the biggest eigenvalues are most responsible for the covariance matrix. The \phi_i are called the principal components.


5.1.2 Methods

The 100 images were shrunk, with averaging, to a size of 100 × 100 pixels using ImageJ [23]. The sample covariance matrix was calculated using the maximum likelihood estimator, and the 6 eigenvectors with the biggest eigenvalues were obtained. Using 1 600 bootstrapped samples [20], the sampling distribution of the 6 largest eigenvalues was sampled.
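A sketch of this computation is given below; the ImageJ shrinking is assumed to have been done already (a random stand-in stack is used here), and taking the SVD of the centred data is an implementation choice made here to avoid forming the full 10 000 × 10 000 covariance matrix.

    # Sketch: leading principal components of the shrunk 100 x 100 pixel images.
    import numpy as np

    rng = np.random.default_rng(0)
    shrunk = rng.normal(3.85e4, 1e2, size=(100, 100, 100))   # stand-in for the shrunk stack

    X = shrunk.reshape(100, -1)            # 100 observations, 10 000 pixels
    Xc = X - X.mean(axis=0)                # centre each pixel

    # The SVD of the centred data gives the eigenvectors of the maximum likelihood
    # covariance estimate Xc.T @ Xc / n without forming the 10 000 x 10 000 matrix.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigenvalues = s**2 / X.shape[0]
    components = Vt[:6]                    # the six leading principal components

    # Normalised variance maps as in Figure 5.1: the diagonal of phi_i phi_i^T.
    variance_maps = (components**2).reshape(6, 100, 100)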

5.1.3 Results

Figure 5.1 shows, for each of the first 6 principal components, the diagonal of the outer product of that principal component with itself; these represent the normalised main sources of variance. It was observed that in the first and second principal components there were sources of variance in the bottom right and top right of the image. This could imply that the sources of variance in the background were not uniformly distributed in space.

The sampling distributions of the eigenvalues are shown in Figure 5.2. Quite clearly there were 2 significant sources of variance, because the eigenvalues of the first 2 principal components were much larger than zero when considering their interquartile ranges.

5.2 Factor Analysis

In factor analysis, the grey values for each pixel were modelled as Normally distributed using Normally distributed latent variables. Sources of variance from this model were then isolated and investigated.

5.2.1 Theory

Let \mathbf{Y} be a random vector containing k latent variables, called factors, with

\mathbf{Y} \sim N(\mathbf{0}, \mathbf{1}) .    (5.4)

Let \mathbf{e} be a random p-vector modelling the uncorrelated intrinsic noise such that

\mathbf{e} \sim N(\mathbf{0}, \Psi)    (5.5)

where \Psi is a diagonal matrix.

Figure 5.1: Normalised variance due to the principal components (1 to 6). The sum of all the values in each panel equals 1.

Figure 5.2: Eigenvalues of the sample covariance matrix; 1 600 bootstrap samples were used.


Given a p × k loading matrix \Lambda, the observable data \mathbf{X} can be modelled through

\mathbf{X} = \Lambda \mathbf{Y} + \mathbf{e}    (5.6)

thus

\mathbf{X} \sim N\left( \mathbf{0}, \Lambda\Lambda^T + \Psi \right) .    (5.7)

As a result there are two sources of variance: factor noise from the \Lambda\Lambda^T component and the intrinsic noise \Psi. Such a model is called factor analysis [22, pp. 462-469]. The model can easily be extended to a non-zero mean by adding an arbitrary constant. \mathbf{X} was used to represent the grey values of all pixels in an image.

The conditional distribution of \mathbf{X} given \mathbf{Y} is

\mathbf{X} \mid \mathbf{Y} \sim N(\Lambda\mathbf{Y}, \Psi) .    (5.8)

Using Bayes' theorem, the conditional distribution of \mathbf{Y} given \mathbf{X} can be calculated:

p_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x}) \propto p_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y}) \, p_{\mathbf{Y}}(\mathbf{y})
\propto \exp\left[ -\frac{1}{2} (\mathbf{x} - \Lambda\mathbf{y})^T \Psi^{-1} (\mathbf{x} - \Lambda\mathbf{y}) \right] \exp\left[ -\frac{1}{2} \mathbf{y}^T\mathbf{y} \right]
\propto \exp\left[ -\frac{1}{2} \left( \mathbf{y}^T \left( \Lambda^T\Psi^{-1}\Lambda + \mathbf{1} \right) \mathbf{y} - 2 \mathbf{y}^T \Lambda^T \Psi^{-1} \mathbf{x} \right) \right]

and comparing this with the probability density function of a general multivariate Normal distribution implies

\mathbf{Y} \mid \mathbf{X} \sim N\left( \left( \Lambda^T\Psi^{-1}\Lambda + \mathbf{1} \right)^{-1} \Lambda^T\Psi^{-1}\mathbf{X}, \; \left( \Lambda^T\Psi^{-1}\Lambda + \mathbf{1} \right)^{-1} \right) .    (5.9)

Suppose \mathbf{X} = \mathbf{x}^{(i)} was observed for i = 1, 2, ..., n; then the likelihood is

L = \prod_{i=1}^n \frac{1}{\| 2\pi \left( \Lambda\Lambda^T + \Psi \right) \|^{1/2}} \exp\left[ -\frac{1}{2} \left( \mathbf{x}^{(i)} \right)^T \left( \Lambda\Lambda^T + \Psi \right)^{-1} \mathbf{x}^{(i)} \right] .    (5.10)

The log likelihood is then

\ln L = -\frac{n}{2} \ln \| 2\pi \left( \Lambda\Lambda^T + \Psi \right) \| - \frac{1}{2} \sum_{i=1}^n \left( \mathbf{x}^{(i)} \right)^T \left( \Lambda\Lambda^T + \Psi \right)^{-1} \mathbf{x}^{(i)}

and by using the trace and its properties

\ln L = -\frac{n}{2} \ln \| 2\pi \left( \Lambda\Lambda^T + \Psi \right) \| - \frac{1}{2} \mathrm{Tr}\left[ \left( \Lambda\Lambda^T + \Psi \right)^{-1} \sum_{i=1}^n \mathbf{x}^{(i)} \left( \mathbf{x}^{(i)} \right)^T \right] .    (5.11)

The loading matrix \Lambda and intrinsic noise \Psi can be estimated by maximising the log likelihood. This was done using the EM algorithm [22, pp. 260-261].

The E step estimates the values of the latent variables \mathbf{Y}^{(1)}, \mathbf{Y}^{(2)}, ..., \mathbf{Y}^{(n)} using their conditional expectation given the observed data \mathbf{x}^{(1)}, \mathbf{x}^{(2)}, ..., \mathbf{x}^{(n)}. The parameters \Lambda and \Psi are assumed known and fixed in this step. Using Equation (5.9), each latent variable is predicted by the conditional expectation

E\left[ \mathbf{Y}^{(i)} \mid \mathbf{X}^{(i)} = \mathbf{x}^{(i)} \right] = \mathbf{y}^{(i)} = \left( \Lambda^T\Psi^{-1}\Lambda + \mathbf{1} \right)^{-1} \Lambda^T\Psi^{-1}\mathbf{x}^{(i)}    (5.12)

for i = 1, 2, ..., n. The conditional covariance matrix of the latent variables is estimated by

\mathrm{Cov}\left[ \mathbf{Y}^{(i)} \mid \mathbf{X}^{(i)} = \mathbf{x}^{(i)} \right] = \Sigma_{\mathbf{Y}} = \left( \Lambda^T\Psi^{-1}\Lambda + \mathbf{1} \right)^{-1} .    (5.13)

In the M step, the parameters \Lambda and \Psi are varied to maximise the log likelihood, assuming the latent variables are known and fixed. The objective to maximise is the conditional expectation of the log joint likelihood, that is

H = \sum_{i=1}^n E\left[ \ln p_{\mathbf{X},\mathbf{Y}}\left( \mathbf{x}^{(i)}, \mathbf{Y}^{(i)} \right) \mid \mathbf{X}^{(i)} = \mathbf{x}^{(i)} \right] .    (5.14)

It is given that the \Lambda and \Psi which maximise H are

\Lambda = \left( \sum_{i=1}^n \mathbf{x}^{(i)} \left( \mathbf{y}^{(i)} \right)^T \right) \left( n\Sigma_{\mathbf{Y}} + \sum_{i=1}^n \mathbf{y}^{(i)} \left( \mathbf{y}^{(i)} \right)^T \right)^{-1}    (5.15)

and

\Psi = \mathrm{diag}\left[ \frac{1}{n} \sum_{i=1}^n \left( \mathbf{x}^{(i)} - \Lambda\mathbf{y}^{(i)} \right) \left( \mathbf{x}^{(i)} - \Lambda\mathbf{y}^{(i)} \right)^T + \Lambda\Sigma_{\mathbf{Y}}\Lambda^T \right] .    (5.16)

The proofs for the M step can be found in Appendix B.3.
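The E step (5.12)-(5.13) and the M step (5.15)-(5.16) can be sketched as below; the initialisation, the small jitter on Ψ and the array layout (one row per image, one column per pixel) are illustrative assumptions rather than the report's implementation.

    # Sketch: EM for factor analysis (equations 5.12, 5.13, 5.15 and 5.16).
    import numpy as np

    def factor_analysis_em(X, k, n_steps=100, seed=0):
        """X: (n, p) centred data, one row per image and one column per pixel."""
        rng = np.random.default_rng(seed)
        n, p = X.shape
        Lam = rng.normal(size=(p, k))                 # loading matrix
        Psi = X.var(axis=0) + 1e-6                    # diagonal of Psi, stored as a vector
        for _ in range(n_steps):
            # E step: conditional means and covariance of the factors.
            LtPinv = Lam.T / Psi                      # Lambda^T Psi^-1, shape (k, p)
            Sigma_Y = np.linalg.inv(LtPinv @ Lam + np.eye(k))        # equation (5.13)
            Y = X @ (LtPinv.T @ Sigma_Y)                             # equation (5.12)
            # M step: update the loading matrix and the intrinsic noise.
            A = n * Sigma_Y + Y.T @ Y
            Lam = np.linalg.solve(A.T, (X.T @ Y).T).T                # equation (5.15)
            resid = X - Y @ Lam.T
            Psi = (resid**2).mean(axis=0) \
                  + np.einsum('ij,jk,ik->i', Lam, Sigma_Y, Lam)      # equation (5.16)
        return Lam, Psi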

5.2.2 Methods

The 100 images were shrunk, with averaging, to a size of 100 × 100 pixels using ImageJ [23]. The data was centred so that the mean of each pixel's grey value was zero. By doing this, factor analysis focused on modelling the variance of the pixels' grey values.

For a given number of factors k, the EM algorithm was initialized by randomly assigning each latent variable to be N(0, 1) distributed and ΣY to be the estimated covariance of the latent variables.

The EM algorithm was run many times with different initial values. Because evaluating the log likelihood was expensive, the stopping condition was met when a certain number of E and M steps had been taken. The parameters with the highest log likelihood when this condition was met were recorded. The log likelihood at each step was not recorded.

The BIC [19, p. 233] was used to select the best k. The BIC is given as

BIC = -2 \ln L + 100^2 (k + 1) \ln n .    (5.17)

The data was bootstrapped [20] to assess the uncertainty in the BIC.

5.2.3 Results

k = 1, 2, ..., 15 were investigated. The EM algorithm was repeated 80 times for each k with a stopping condition of 100 EM steps. The BIC for each k is shown in Figure 5.3. The number of factors with the lowest BIC was found to be 8. The 8 factor variances were isolated and are shown in Figure 5.4, together with the intrinsic variance shown in Figure 5.5. It was interesting to note that in Figure 5.4 there were clusters of high variance in most of the factors. This suggests that clusters of high variance can appear in the data; thus the variances of the pixels' grey values can depend on each other spatially. In Figure 5.5, the shape of the 3D printed sample can be seen, represented by the low intrinsic variance in the centre.


Figure 5.3: BIC for fitting different numbers of factors onto the data. 80 different initialisations were used for each k, with a stopping condition of 100 EM steps.

This was evidence that there may well be a mean and variance relationship, because the 3D printed sample has lower sample mean grey values.

The uncertainty of the model selection was assessed by investigating how the BIC and the optimal k varied for different bootstrapped samples. The BIC for the 100 bootstrapped samples is shown in Figure 5.6. For a given k and bootstrapped sample, the EM algorithm was initialised once, to capture the uncertainty in whether the EM algorithm converges to a global maximum or not. From the figure, quite clearly there was a lot of variability in the BIC, which can cause imprecise decisions on model selection. Using all 100 bootstrapped samples, the mean ± standard deviation of the optimal k was found to be 11 ± 3 factors. The main source of error was the small sample size.

5.3 Compound Poisson

Consider photons emitted from an X-ray tube as a Poisson process. Poisson latent variable models have been developed to describe such behaviour.


Figure 5.4: The 8 individual factor variances with the highest log likelihood found. The units are arbitrary but comparable with the intrinsic variance.

Figure 5.5: The intrinsic variance with the highest log likelihood found. The units are arbitrary but comparable with the factor variances.


Figure 5.6: BIC for fitting different numbers of factors onto the data. 100 bootstrapped samples were used. The EM algorithm was initialised only once and a stopping condition of 100 EM steps was used.

For example, the number of photons detected can be modelled as Y ∼ Poisson(λ) and the grey value as X|Y ∼ N(αY, β) [24]. Such a model implies a linear relationship between the mean and the variance of the grey value [24]. However, it does not consider that the photons have a broad energy spectrum as a result of bremsstrahlung and characteristic radiation. It can be shown that a compound Poisson model performs better [25] because it accounts for the broad energy spectrum of the photons. However, fitting the model using the EM algorithm is very difficult and approximations have to be made [26].

5.3.1 Theory

Suppose an image has m pixels and each pixel has a corresponding grey value X_i for i = 1, 2, ..., m. Let Y_i be the number of photons detected in pixel i, where each photon has energy randomly distributed with mean \mu_i and variance \sigma_i^2. The number of photons detected can be modelled as a Poisson process such that

Y_i \sim \mathrm{Poisson}(\nu_i \tau)    (5.18)

where \nu_i is the post-attenuation rate and \tau is the exposure time. The total amount of energy measured, U_i, is then the sum of the energies of Y_i photons. For large Y_i, the central limit theorem can be used so that U_i is approximately Normally distributed:

U_i \mid Y_i \sim N\left( Y_i \mu_i, Y_i \sigma_i^2 \right) .    (5.19)

Assuming the grey value models the intensity, the rate of energy per unit area, X_i can be modelled using

X_i \mid U_i \sim N\left( \alpha U_i / \tau, \beta_i^2 \right)    (5.20)

where \alpha is some positive constant and \beta_i^2 is the variance of some additive Normal noise.

The full joint probability density function is

p_{X_i,U_i,Y_i}(x_i, u_i, y_i) = p_{X_i|U_i}(x_i|u_i) \, p_{U_i|Y_i}(u_i|y_i) \, P(Y_i = y_i)
= \frac{1}{\sqrt{2\pi}\beta_i} \exp\left[ -\frac{1}{2} \left( \frac{x_i - \alpha u_i/\tau}{\beta_i} \right)^2 \right] \frac{1}{\sqrt{2\pi y_i \sigma_i^2}} \exp\left[ -\frac{1}{2} \left( \frac{u_i - y_i\mu_i}{\sqrt{y_i \sigma_i^2}} \right)^2 \right] \frac{e^{-\nu_i\tau} (\nu_i\tau)^{y_i}}{y_i!}    (5.21)

for any real x_i, u_i and y_i = 0, 1, 2, ... . By marginalising out U_i, a one-layer latent variable model can be obtained, so the EM algorithm [22, pp. 260-261] can be used to estimate the unknown parameters \nu_i, \mu_i, \sigma_i^2 and \beta_i^2. The one-layer latent variable model is given by

Y_i \sim \mathrm{Poisson}(\nu_i\tau)    (5.22)

X_i \mid Y_i \sim N\left( \frac{\alpha\mu_i}{\tau} Y_i, \; \frac{\alpha^2\sigma_i^2}{\tau^2} Y_i + \beta_i^2 \right) .    (5.23)

The proof of this can be found in Appendix B.4.

The marginal expectation and variance [7, pp. 149-151] can be calculated:

E[X_i] = E[E[X_i|Y_i]] = E\left[ \frac{\alpha\mu_i}{\tau} Y_i \right] = \alpha\mu_i\nu_i    (5.24)

and

\mathrm{Var}[X_i] = E[\mathrm{Var}[X_i|Y_i]] + \mathrm{Var}[E[X_i|Y_i]]
= E\left[ \frac{\alpha^2\sigma_i^2}{\tau^2} Y_i + \beta_i^2 \right] + \mathrm{Var}\left[ \frac{\alpha\mu_i}{\tau} Y_i \right]
= \frac{\alpha^2\sigma_i^2}{\tau^2} \nu_i\tau + \beta_i^2 + \frac{\alpha^2\mu_i^2}{\tau^2} \nu_i\tau
= \frac{\alpha^2\nu_i}{\tau} \left( \mu_i^2 + \sigma_i^2 \right) + \beta_i^2 .    (5.25)

In this model there are three sources of variance, from the \mu_i^2, \sigma_i^2 and \beta_i^2 terms.

This model can be made simpler by setting \beta_i = 0, because then the joint probability density function is in the form of an exponential family with sufficient statistics X_i^2/Y_i, X_i and Y_i. This is proven in Appendix B.5. However, by doing so the joint probability density function must consider the special case Y_i = 0, because such a case drives the variance of X_i|Y_i to zero. This can be dealt with by using the Dirac delta function [6, pp. 439-443]:

p_{X_i,Y_i}(x_i, y_i) = p_{X_i|Y_i}(x_i|y_i) \, P(Y_i = y_i)

p_{X_i,Y_i}(x_i, y_i) = e^{-\nu_i\tau} \delta(x_i)   for y_i = 0,

p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau} (\nu_i\tau)^{y_i} \tau}{y_i! \sqrt{2\pi y_i}\, \alpha\sigma_i} \exp\left[ -\frac{1}{2} \frac{(x_i - y_i\mu_i\alpha/\tau)^2}{\alpha^2 y_i \sigma_i^2/\tau^2} \right]   for y_i = 1, 2, ... .    (5.26)

The conditional probability mass function can be obtained by

P(Y_i = y_i \mid X_i = x_i) = \frac{p_{X_i,Y_i}(x_i, y_i)}{p_{X_i}(x_i)}

where p_{X_i}(x_i) is the normalisation constant. For the special case x_i = 0,

P(Y_i = 0 \mid X_i = 0) \propto \delta(0)
P(Y_i = y_i \mid X_i = 0) \propto \frac{(\nu_i\tau)^{y_i}}{y_i! \sqrt{y_i}} \exp\left[ -\frac{1}{2} \frac{(x_i - y_i\mu_i\alpha/\tau)^2}{\alpha^2 y_i \sigma_i^2/\tau^2} \right]   for y_i = 1, 2, ... .

But because \delta(0) is infinite,

P(Y_i = y_i \mid X_i = 0) = 1 for y_i = 0 and 0 for y_i = 1, 2, ...,    (5.27)

and

E[Y_i \mid X_i = 0] = 0 .    (5.28)

E[1/Y_i \mid X_i = 0] cannot be defined in this special case. Another special case is

P(Y_i = 0 \mid X_i = x) \propto \delta(x)   for x \neq 0,

thus

P(Y_i = 0 \mid X_i = x) = 0   for x \neq 0 .    (5.29)

For the general case y_i = 1, 2, ... and x_i \neq 0,

P(Y_i = y_i \mid X_i = x_i) = \frac{1}{p_{X_i}(x_i)} \frac{e^{-\nu_i\tau} (\nu_i\tau)^{y_i} \tau}{y_i! \sqrt{2\pi y_i}\, \alpha\sigma_i} \exp\left[ -\frac{1}{2} \frac{(x_i - y_i\mu_i\alpha/\tau)^2}{\alpha^2 y_i \sigma_i^2/\tau^2} \right]    (5.30)

where

p_{X_i}(x_i) = \sum_{y_i=0}^{\infty} p_{X_i,Y_i}(x_i, y_i) = \sum_{y_i=1}^{\infty} \frac{e^{-\nu_i\tau} (\nu_i\tau)^{y_i} \tau}{y_i! \sqrt{2\pi y_i}\, \alpha\sigma_i} \exp\left[ -\frac{1}{2} \frac{(x_i - y_i\mu_i\alpha/\tau)^2}{\alpha^2 y_i \sigma_i^2/\tau^2} \right] .    (5.31)

The expectation of the sufficient statistics are then

E [Yi|Xi = xi] =1

pXi(xi)

∞∑yi=1

e−νiτ (νiτ)yiτ

yi!√

2πασiy1/2i exp

[−1

2

(xi − yiµiα/τ)2

α2yiσ2i /τ

2

](5.32)

and

E [1/Yi| Xi = xi] =1

pXi(xi)

∞∑yi=1

e−νiτ (νiτ)yiτ

yi!√

2πασiy−3/2i exp

[−1

2

(xi − yiµiα/τ)2

α2yiσ2i /τ

2

]. (5.33)

41

Page 45: Inside-Out: Characterisation of Computed Tomography Noise ...sip/miniProject1/report_oxwasp1.pdf · Computed Tomography Noise in ... Mini-Project 1 Report Sherman Ip ... photons behave

Unfortunately such a sum cannot be simplified. The fastest approximate way to get a value for these expectations is to truncate the sum. Another way to obtain the conditional expectations is to use the sample mean of samples drawn from the conditional distribution; however, this could be slower depending on the sampling algorithm.
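As an illustration of the truncated-sum approach, the MATLAB sketch below evaluates (5.32) and (5.33) for a single grey value; the parameter values and truncation point are illustrative only, and the weights are computed on the log scale to avoid underflow.

```matlab
% Truncated-sum evaluation of the conditional expectations (5.32) and (5.33).
% Illustrative parameter values; the truncation point yMax is arbitrary.
alpha = 1; tau = 1; nu = 5; mu = 3; sigma2 = 0.1;
x = 14;                       % a single observed grey value
yMax = 100;                   % number of terms kept in the truncated sum
y = (1:yMax)';
% log of the unnormalised joint density p(x, y) from Equation (5.26);
% constant factors are dropped because they cancel in the ratio
logW = y*log(nu*tau) - gammaln(y + 1) - 0.5*log(y) ...
       - 0.5*(x - y*mu*alpha/tau).^2 ./ (alpha^2*y*sigma2/tau^2);
w = exp(logW - max(logW));
condExpY    = sum(y .* w) / sum(w);    % approximates Equation (5.32)
condExpInvY = sum(w ./ y) / sum(w);    % approximates Equation (5.33)
```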

Because the conditional probability mass function is known up to a constant, rejection sampling can be used to draw conditional samples. Rejection sampling samples exactly, but a proposal distribution is needed and this choice influences the rejection rate, in other words the efficiency. One possible proposal is the Poisson distribution, and there exist adaptive methods which change the proposal on the fly [27]. Approximate Bayesian computation (ABC) [28] can be used instead because $Y_i$ and $X_i|Y_i$ can be simulated. ABC draws approximate samples without a proposal but can suffer from high rejection rates.
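A minimal rejection-sampling sketch with a plain Poisson proposal (not the adaptive scheme of [27]) is shown below; it assumes the $\beta_i = 0$ model with illustrative parameter values, and `poissrnd`/`normpdf` require the Statistics Toolbox.

```matlab
% Rejection sampling from P(Y | X = x) with a Poisson(nu*tau) proposal.
% The target is proportional to the proposal times the Normal density of x
% given y, which is bounded above by M for y >= 1 (and is zero at y = 0).
alpha = 1; tau = 1; nu = 5; mu = 3; sigma2 = 0.1;   % illustrative values
x = 14;
M = tau / (sqrt(2*pi)*alpha*sqrt(sigma2));          % bound on the Normal factor
nSample = 1e4;
samples = zeros(nSample, 1);
for s = 1:nSample
    accepted = false;
    while ~accepted
        y = poissrnd(nu*tau);                       % propose from the prior
        if y > 0
            f = normpdf(x, y*mu*alpha/tau, sqrt(alpha^2*y*sigma2/tau^2));
            accepted = rand < f/M;                  % accept with probability f/M
        end
    end
    samples(s) = y;
end
mean(samples)    % Monte Carlo estimate of E[Y | X = x]
```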

Suppose for pixel $i$ the following grey values have been observed: $x_i^{(1)}, x_i^{(2)}, \cdots, x_i^{(n)}$. For each corresponding observable let $Y_i^{(1)}, Y_i^{(2)}, \cdots, Y_i^{(n)}$ be the Poisson latent variables. For the E step, $Y_i^{(j)}$ and $1/Y_i^{(j)}$ are estimated, given all the parameters, using the conditional expectations for $j = 1, 2, \cdots, n$; that is,

$$y_i^{(j)} = \operatorname{E}\left[Y_i^{(j)} \,\middle|\, X_i^{(j)} = x_i^{(j)}\right] \quad\text{and}\quad \zeta_i^{(j)} = \operatorname{E}\left[1/Y_i^{(j)} \,\middle|\, X_i^{(j)} = x_i^{(j)}\right]$$

respectively.

In the M step the parameters $\nu_i$, $\mu_i$, $\sigma_i^2$ are estimated given the latent variables. This is done by maximising the conditional expectation of the log joint likelihood function, that is

$$H_i = \sum_{j=1}^{n}\operatorname{E}\left[\ln p_{X_i,Y_i}\left(x_i^{(j)}, Y_i^{(j)}\right) \,\middle|\, X_i^{(j)} = x_i^{(j)}\right] . \tag{5.34}$$

It is given that the following M step estimators can be obtained:

$$\nu_i = \frac{\sum_{j=1}^{n} y_i^{(j)}}{n\tau} \tag{5.35}$$

$$\mu_i = \frac{\tau\sum_{j=1}^{n} x_i^{(j)}}{\alpha\sum_{j=1}^{n} y_i^{(j)}} \tag{5.36}$$

$$\sigma_i^2 = \frac{1}{n}\left[\frac{\tau^2}{\alpha^2}\sum_{j=1}^{n}\left(x_i^{(j)}\right)^2\zeta_i^{(j)} - \frac{2\tau\mu_i}{\alpha}\sum_{j=1}^{n} x_i^{(j)} + \mu_i^2\sum_{j=1}^{n} y_i^{(j)}\right] \tag{5.37}$$

where $\left(x_i^{(j)}\right)^2\zeta_i^{(j)} = 0$ if $\left(x_i^{(j)}\right)^2 = 0$. This is derived in Appendix B.6. It is also given that the Cramer-Rao lower bounds [29] are

$$\operatorname{Var}[\nu_i] = \frac{\nu_i}{\tau n} \tag{5.38}$$

$$\operatorname{Var}[\mu_i] = \frac{\sigma_i^2}{\nu_i\tau n} \tag{5.39}$$

and

$$\operatorname{Var}\left[\sigma_i^2\right] = \frac{2\sigma_i^4}{n} . \tag{5.40}$$

This is derived in Appendix B.7.

The constant $\alpha$ only sets the scale of $\mu_i$ and $\sigma_i^2$ but can be calibrated so that it has units of keV. From Chapter 3, the following properties of the X-ray tube were stated or calculated:

• Voltage = V = 100 kV

• Power = P = 33.0 W

• Exposure time = τ = 500 ms

• Efficiency = η = 8.1 × 10⁻³

• Target material is tungsten.

Assume, very crudely, that the X-ray beam spreads out and targets only the detector of $1996^2$ pixels and nothing else; then the total energy detected in one pixel is

$$U = \frac{\eta P \tau}{1996^2} = 2.1\times 10^{8}\ \text{keV}\cdot\text{pixel}^{-1} . \tag{5.41}$$

The images were calibrated so that the background has a grey value of 60 000. Assuming the air has an attenuation coefficient of zero, $\alpha$ can then be estimated by

$$\alpha = \frac{60\,000\,\tau}{U} = 1.4\times 10^{-4}\ \text{s}\cdot\text{keV}^{-1}\cdot\text{pixel} . \tag{5.42}$$

This calculation could be improved if the solid angle of the X-ray beam andthe distance between the X-ray tube and X-ray detector were given.
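The arithmetic behind (5.41) and (5.42) can be reproduced with a few lines of MATLAB; the only quantity introduced here that is not stated in the text is the standard conversion 1 keV ≈ 1.602 × 10⁻¹⁶ J.

```matlab
% Reproducing the crude calibration in Equations (5.41) and (5.42).
eta = 8.1e-3;            % efficiency
P = 33.0;                % power in W
tau = 0.5;               % exposure time in s
nPixel = 1996^2;         % number of detector pixels
joulePerKeV = 1.602e-16; % 1 keV expressed in joules
U = eta*P*tau / (nPixel*joulePerKeV)   % approx 2.1e8 keV per pixel, (5.41)
alpha = 60000*tau / U                  % approx 1.4e-4 s keV^-1 pixel, (5.42)
```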


5.3.2 Toy Example

A dataset was simulated, with $n = 100$ samples, using the following parameters with arbitrary units: $\alpha = 1$, $\tau = 1$, $\nu = 5$, $\mu = 3$ and $\sigma^2 = 0.1$. The EM algorithm was initialised by assigning each latent variable independently such that $Y^{(j)} \sim \text{Poisson}(\phi)$ for $j = 1, 2, \cdots, n$. The sums, used to work out the expectations and the likelihood, were truncated to 100 terms.
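A minimal MATLAB sketch of this simulation and fit is given below; it is an illustration written for this report (not the project code on GitHub), it uses the truncated-sum E step of Section 5.3.1 with $\beta_i = 0$, it initialises the parameters with crude guesses rather than the latent-variable draws described above, and `poissrnd` requires the Statistics Toolbox.

```matlab
% Toy example: simulate n = 100 samples with alpha = tau = 1 and run EM.
rng(1);
n = 100; nuTrue = 5; muTrue = 3; sigma2True = 0.1;
yTrue = poissrnd(nuTrue, n, 1);
x = muTrue*yTrue + sqrt(sigma2True*yTrue) .* randn(n, 1);   % x = 0 when y = 0
nu = 4; mu = mean(x)/4; sigma2 = 1;       % crude initial parameter guesses
yGrid = (1:100)';                         % 100 terms in the truncated sums
for step = 1:300
    % E step: yHat = E[Y|X = x], zeta = E[1/Y|X = x] for each observation
    yHat = zeros(n, 1); zeta = zeros(n, 1);
    for j = 1:n
        if x(j) == 0, continue; end       % E[Y|X=0] = 0 and zeta is set to 0
        logW = yGrid*log(nu) - gammaln(yGrid + 1) - 0.5*log(yGrid) ...
               - 0.5*(x(j) - mu*yGrid).^2 ./ (sigma2*yGrid);
        w = exp(logW - max(logW)); w = w/sum(w);
        yHat(j) = sum(yGrid .* w);
        zeta(j) = sum(w ./ yGrid);
    end
    % M step: Equations (5.35) to (5.37) with alpha = tau = 1
    nu = sum(yHat)/n;
    mu = sum(x)/sum(yHat);
    sigma2 = (sum(x.^2 .* zeta) - 2*mu*sum(x) + mu^2*sum(yHat)) / n;
end
[nu, mu, sigma2]    % should end up close to the true values 5, 3 and 0.1
```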

Figure 5.7 showed that the EM algorithm worked for different initial values, $\phi = 1, 5, 9$, because the parameters converged towards the true values and the log likelihood always increased. However, for $\phi = 9$ the EM algorithm appeared to get stuck in a local maximum and needed at least 200 steps to converge to the global maximum. This demonstrated that the performance of the EM algorithm depended heavily on the initial values and on how many steps were taken.

The EM algorithm was used on 50 different simulated datasets with initial value $\phi = 4$. Figure 5.8 verified that the parameters obtained from the EM algorithm converged to within the standard error of the true parameters for this particular toy model.

For high $\nu\tau$, problems arise because more and more terms are needed in the truncated sum, and too many terms slow down the E step considerably.


[Figure 5.7: four panels showing the mean estimate, variance estimate, rate estimate (arb. unit) and log likelihood (nats) plotted against the number of EM steps.]

Figure 5.7: The parameters and log likelihood at each step of EM. The EM algorithm was used to estimate the parameters of a single simulated $n = 100$ dataset. The latent variables were initialised Poisson($\phi$) 50 times. Blue: $\phi = 1$, Red: $\phi = 5$, Green: $\phi = 9$. Dotted line: true value.


[Figure 5.8: four panels showing the mean estimate, variance estimate, rate estimate (arb. unit) and log likelihood (nats) plotted against the number of EM steps.]

Figure 5.8: The parameters and log likelihood at each step of EM. The EM algorithm was used to estimate the parameters of 50 different simulated $n = 100$ datasets. The latent variables were initialised Poisson(4). Dotted line: true value ± standard error.


Chapter 6

Conclusion

6.1 Mean and Variance Relationship

The relationship between the sample mean and sample variance grey value was modelled by fitting a weighted linear regression. Less weight was given to larger sample variances as a simple method for robust regression. It was found that the mean and variance relationship depended on the different materials: the 3D printed sample, the background and the foam holder. To take into account the different mean and variance relationships, a mixture of linear regressions was fitted, but it failed to capture the different behaviours of the different materials. A weighted linear regression, where the weights depended on the difference between the sample mean grey value and some k-means sample mean grey value, was fitted and captured only 2 of the 3 material mean-variance relationships.

There is further work to do on the mean and variance relationship. Firstly, it should be investigated whether transforming the variance would produce a more linear model; for example, the log variance or the standard deviation could be used. Furthermore, more features than just polynomials of the sample mean grey values should be used. Lastly, it was established that the sample variance has a scaled $\chi^2$ distribution. Because the $\chi^2$ distribution is in the exponential family, generalised linear models [30] should be used to open up more possibilities for modelling the relationship between the mean and the variance.


6.2 Latent Variable Model

In order to find sources of variance, latent variable models were used as a way to develop explanatory models. PCA and factor analysis were used on a shrunken-down dataset, and it was found that the 3D printed sample does have a lower sample variance. It was also found that the background variance was not uniform and there were patches of high variance. However, shrinking the images down could be a problem because this could have affected the variance of the dataset. In order to improve the data analysis, the computational algorithms should be scaled up so that PCA and factor analysis can be done directly on the full-size dataset.

Another explanatory model, developed in the light of the physics, was the compound Poisson distribution. While the compound Poisson distribution has been used in the medical physics community [25], fitting the distribution was extremely difficult because the conditional and marginal distributions cannot be written down in closed form. A toy example was provided in this report. In order for the fitting to work computationally, approximations which are cheap to run must be made.


Appendix A

Source Code

MATLAB code can be found on the GitHub account https://github.com/shermanip/oxwasp1.


Appendix B

Proofs and Identities

Sections B.1, B.2 and B.3 were heavily influenced by the lectures in probabilistic and unsupervised learning at University College London [31].

B.1 Matrix Calculus

Let $\mathbf{A}, \mathbf{B}, \mathbf{C}$ be matrices with elements

$$\mathbf{A} = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{pmatrix}$$

and similarly for $\mathbf{B}$ and $\mathbf{C}$.

Let $\operatorname{D}_{\mathbf{A}}$ be the gradient operator such that

$$\operatorname{D}_{\mathbf{A}} = \begin{pmatrix} \partial/\partial a_{1,1} & \partial/\partial a_{1,2} & \cdots & \partial/\partial a_{1,n} \\ \partial/\partial a_{2,1} & \partial/\partial a_{2,2} & \cdots & \partial/\partial a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ \partial/\partial a_{m,1} & \partial/\partial a_{m,2} & \cdots & \partial/\partial a_{m,n} \end{pmatrix}$$

then it can be shown that [32]

$$\operatorname{D}_{\mathbf{A}} \operatorname{Tr}\left(\mathbf{A}^T\mathbf{B}\right) = \mathbf{B} \tag{B.1}$$

$$\operatorname{D}_{\mathbf{A}} \operatorname{Tr}\left(\mathbf{A}^T\mathbf{B}\mathbf{A}\mathbf{C}\right) = \mathbf{B}\mathbf{A}\mathbf{C} + \mathbf{B}^T\mathbf{A}\mathbf{C}^T \tag{B.2}$$

$$\operatorname{D}_{\mathbf{A}} \ln \|\mathbf{A}\| = \left(\mathbf{A}^{-1}\right)^T . \tag{B.3}$$

Because a vector is a special case of a matrix, these results can be extended to vectors.
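As a quick numerical sanity check of these identities (a sketch written for this appendix, with arbitrary matrix sizes), identity (B.2) can be verified by central finite differences:

```matlab
% Finite-difference check of identity (B.2): D_A Tr(A'*B*A*C) = B*A*C + B'*A*C'.
rng(0);
m = 4; n = 3;
A = randn(m, n); B = randn(m, m); C = randn(n, n);
f = @(A) trace(A'*B*A*C);
G = zeros(m, n); h = 1e-6;
for r = 1:m
    for c = 1:n
        E = zeros(m, n); E(r, c) = h;
        G(r, c) = (f(A + E) - f(A - E)) / (2*h);   % central difference
    end
end
max(max(abs(G - (B*A*C + B'*A*C'))))   % should be of order 1e-9 or smaller
```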

B.2 M Step for Mixture of Linear Regression

The log likelihood for a general mixture model is

$$\ln L = \sum_{i=1}^{m}\ln\left[\sum_{j=1}^{q} p_{Y|S}(y_i|S_i = j)\,\pi_j\right] . \tag{B.4}$$

In the M step, the parameters $\beta_j$, $\sigma_j^2$ and $\pi_j$ for $j = 1, 2, \cdots, q$ are updated given the posterior distribution of $S_i$ for $i = 1, 2, \cdots, m$. Consider some arbitrary parameter $\theta_j$ which model $j$ depends on. Then

$$\frac{\partial \ln L}{\partial\theta_j} = \sum_{i=1}^{m}\frac{\pi_j}{\sum_{j'=1}^{q} p_{Y|S}(y_i|S_i = j')\,\pi_{j'}}\,\frac{\partial p_{Y|S}(y_i|S_i = j)}{\partial\theta_j}$$

$$\frac{\partial \ln L}{\partial\theta_j} = \sum_{i=1}^{m}\frac{p_{Y|S}(y_i|S_i = j)\,\pi_j}{\sum_{j'=1}^{q} p_{Y|S}(y_i|S_i = j')\,\pi_{j'}}\,\frac{\partial \ln\left[p_{Y|S}(y_i|S_i = j)\right]}{\partial\theta_j}$$

which is just

$$\frac{\partial \ln L}{\partial\theta_j} = \sum_{i=1}^{m} r_{i,j}\,\frac{\partial \ln\left[p_{Y|S}(y_i|S_i = j)\right]}{\partial\theta_j} . \tag{B.5}$$

Using the above result and the log likelihood in Equation (4.16),

$$\nabla_{\boldsymbol{\beta}_j}\ln L = \sum_{i=1}^{m} r_{i,j}\,\nabla_{\boldsymbol{\beta}_j}\ln\left[\frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left(-\frac{1}{2}\left(\frac{y_i - (\mathbf{x}_i)^T\boldsymbol{\beta}_j}{\sigma_j}\right)^2\right)\right]$$

$$\nabla_{\boldsymbol{\beta}_j}\ln L = \sum_{i=1}^{m} r_{i,j}\,\nabla_{\boldsymbol{\beta}_j}\left[-\frac{1}{2}\ln(2\pi) - \frac{1}{2}\ln\left(\sigma_j^2\right) - \frac{1}{2}\left(\frac{y_i - (\mathbf{x}_i)^T\boldsymbol{\beta}_j}{\sigma_j}\right)^2\right] .$$

Expanding it out, where $c_1$ is some constant,

$$\nabla_{\boldsymbol{\beta}_j}\ln L = \sum_{i=1}^{m} r_{i,j}\,\nabla_{\boldsymbol{\beta}_j}\left[-\frac{1}{2\sigma_j^2}\left(-2y_i\boldsymbol{\beta}_j^T\mathbf{x}_i + \boldsymbol{\beta}_j^T\mathbf{x}_i(\mathbf{x}_i)^T\boldsymbol{\beta}_j\right) + c_1\right]$$

and using the properties of the trace

$$\nabla_{\boldsymbol{\beta}_j}\ln L = \sum_{i=1}^{m} r_{i,j}\,\nabla_{\boldsymbol{\beta}_j}\left[-\frac{1}{2\sigma_j^2}\left(-2y_i\operatorname{Tr}\left(\boldsymbol{\beta}_j^T\mathbf{x}_i\right) + \operatorname{Tr}\left(\boldsymbol{\beta}_j^T\mathbf{x}_i(\mathbf{x}_i)^T\boldsymbol{\beta}_j\right)\right) + c_1\right]$$

together with the results in Appendix B.1

$$\nabla_{\boldsymbol{\beta}_j}\ln L = \sum_{i=1}^{m} -\frac{r_{i,j}}{2\sigma_j^2}\left(-2y_i\mathbf{x}_i + 2\mathbf{x}_i(\mathbf{x}_i)^T\boldsymbol{\beta}_j\right) . \tag{B.6}$$

Setting this to zero,

$$\sum_{i=1}^{m} r_{i,j}\,y_i\mathbf{x}_i = \sum_{i=1}^{m} r_{i,j}\,\mathbf{x}_i(\mathbf{x}_i)^T\boldsymbol{\beta}_j$$

$$\boldsymbol{\beta}_j = \left[\sum_{i=1}^{m} r_{i,j}\,\mathbf{x}_i(\mathbf{x}_i)^T\right]^{-1}\left[\sum_{i=1}^{m} r_{i,j}\,y_i\mathbf{x}_i\right] \tag{B.7}$$

and this is weighted least squares where the weights are the responsibilities on model $j$.

By re-parametrising $1/\sigma_j^2 = \tau_j$,

$$\frac{\partial\ln L}{\partial\tau_j} = \sum_{i=1}^{m} r_{i,j}\,\frac{\partial}{\partial\tau_j}\left[-\frac{1}{2}\ln(2\pi) + \frac{1}{2}\ln(\tau_j) - \frac{\tau_j}{2}\left(y_i - (\mathbf{x}_i)^T\boldsymbol{\beta}_j\right)^2\right]$$

$$\frac{\partial\ln L}{\partial\tau_j} = \sum_{i=1}^{m} r_{i,j}\left[\frac{1}{2\tau_j} - \frac{1}{2}\left(y_i - (\mathbf{x}_i)^T\boldsymbol{\beta}_j\right)^2\right] . \tag{B.8}$$

Setting this to zero,

$$\sigma_j^2 = \frac{\sum_{i=1}^{m} r_{i,j}\left(y_i - (\mathbf{x}_i)^T\boldsymbol{\beta}_j\right)^2}{\sum_{i=1}^{m} r_{i,j}} \tag{B.9}$$

and this is the weighted mean squared error.

To obtain the maximum log likelihood estimator for $\pi_j$, a Lagrange multiplier must be used to enforce the constraint $\sum_{j=1}^{q}\pi_j = 1$. The objective is then

$$T = \ln L + \lambda\left(\sum_{j=1}^{q}\pi_j - 1\right)$$


$$\frac{\partial T}{\partial\pi_j} = \frac{\partial}{\partial\pi_j}\sum_{i=1}^{m}\ln\left[\sum_{j'=1}^{q} p_{Y|S}(y_i|S_i = j')\,\pi_{j'}\right] + \lambda$$

$$\frac{\partial T}{\partial\pi_j} = \sum_{i=1}^{m}\frac{p_{Y|S}(y_i|S_i = j)}{\sum_{j'=1}^{q} p_{Y|S}(y_i|S_i = j')\,\pi_{j'}} + \lambda$$

$$\frac{\partial T}{\partial\pi_j} = \sum_{i=1}^{m}\frac{r_{i,j}}{\pi_j} + \lambda . \tag{B.10}$$

Setting this to zero,

$$\pi_j = -\sum_{i=1}^{m}\frac{r_{i,j}}{\lambda} . \tag{B.11}$$

$\lambda$ can be determined by putting a constraint on the estimators of $\pi_j$ for $j = 1, 2, \cdots, q$:

$$-\sum_{j=1}^{q}\sum_{i=1}^{m}\frac{r_{i,j}}{\lambda} = 1$$

$$-\sum_{i=1}^{m}\sum_{j=1}^{q} r_{i,j} = \lambda$$

$$-\sum_{i=1}^{m} 1 = \lambda$$

$$-m = \lambda \tag{B.12}$$

therefore

$$\pi_j = \sum_{i=1}^{m}\frac{r_{i,j}}{m} . \tag{B.13}$$

This is the average responsibility.
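The three M step updates can be collected into a short MATLAB function; the sketch below is illustrative (the function name and variable names are not from the project code) and assumes the responsibilities r have already been computed in the E step.

```matlab
function [beta, sigma2, piWeight] = mStepMixture(X, y, r)
% One M step for the mixture of linear regressions.
% X: m-by-p design matrix, y: m-by-1 response, r: m-by-q responsibilities.
[m, q] = size(r);
p = size(X, 2);
beta = zeros(p, q);
sigma2 = zeros(1, q);
for j = 1:q
    W = diag(r(:, j));                               % responsibility weights
    beta(:, j) = (X'*W*X) \ (X'*W*y);                % Equation (B.7)
    residual = y - X*beta(:, j);
    sigma2(j) = sum(r(:, j).*residual.^2) / sum(r(:, j));   % Equation (B.9)
end
piWeight = sum(r, 1) / m;                            % Equation (B.13)
end
```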

B.3 M Step for Factor Analysis

The objective is to maximise the conditional expectation of the log joint likelihood, that is

$$H = \sum_{i=1}^{n}\operatorname{E}\left[\ln p_{\mathbf{X},\mathbf{Y}}\left(\mathbf{x}^{(i)}, \mathbf{Y}^{(i)}\right)\,\middle|\,\mathbf{X}^{(i)} = \mathbf{x}^{(i)}\right] .$$


This can be expressed as

$$H = \sum_{i=1}^{n}\operatorname{E}\left[\left(\ln p_{\mathbf{X}|\mathbf{Y}}\left(\mathbf{x}^{(i)}|\mathbf{Y}^{(i)}\right) + \ln p_{\mathbf{Y}}\left(\mathbf{Y}^{(i)}\right)\right)\,\middle|\,\mathbf{X}^{(i)} = \mathbf{x}^{(i)}\right] .$$

Because the marginal distribution of the latent variables does not depend on $\Lambda$ and $\Psi$, the last term is not of interest. Using Equation (5.8), where $c_1$ is a constant,

$$H = \sum_{i=1}^{n}\operatorname{E}\left[\ln\left(\frac{1}{\|2\pi\Psi\|^{1/2}}\exp\left(-\frac{1}{2}\left(\mathbf{x}^{(i)} - \Lambda\mathbf{Y}^{(i)}\right)^T\Psi^{-1}\left(\mathbf{x}^{(i)} - \Lambda\mathbf{Y}^{(i)}\right)\right)\right)\,\middle|\,\mathbf{X}^{(i)} = \mathbf{x}^{(i)}\right] + c_1$$

$$H = \sum_{i=1}^{n}\operatorname{E}\left[\left(-\frac{1}{2}\ln\|2\pi\Psi\| - \frac{1}{2}\left(\mathbf{x}^{(i)} - \Lambda\mathbf{Y}^{(i)}\right)^T\Psi^{-1}\left(\mathbf{x}^{(i)} - \Lambda\mathbf{Y}^{(i)}\right)\right)\,\middle|\,\mathbf{X}^{(i)} = \mathbf{x}^{(i)}\right] + c_1$$

$$H = -\frac{1}{2}\sum_{i=1}^{n}\operatorname{E}\left[\left(\ln\|2\pi\Psi\| + \left(\mathbf{x}^{(i)}\right)^T\Psi^{-1}\mathbf{x}^{(i)} - 2\left(\mathbf{x}^{(i)}\right)^T\Psi^{-1}\Lambda\mathbf{Y}^{(i)} + \left(\mathbf{Y}^{(i)}\right)^T\Lambda^T\Psi^{-1}\Lambda\mathbf{Y}^{(i)}\right)\,\middle|\,\mathbf{X}^{(i)} = \mathbf{x}^{(i)}\right] + c_1$$

By defining

$$\operatorname{E}\left[\mathbf{Y}^{(i)}|\mathbf{X}^{(i)} = \mathbf{x}^{(i)}\right] = \mathbf{y}^{(i)} \tag{B.14}$$

and

$$\operatorname{Cov}\left[\mathbf{Y}^{(i)}|\mathbf{X}^{(i)} = \mathbf{x}^{(i)}\right] = \Sigma_{\mathbf{Y}} , \tag{B.15}$$

which are obtained from the E step, then, taking the conditional expectation,

$$H = -\frac{1}{2}\sum_{i=1}^{n}\left[\ln\|2\pi\Psi\| + \left(\mathbf{x}^{(i)}\right)^T\Psi^{-1}\mathbf{x}^{(i)} - 2\left(\mathbf{x}^{(i)}\right)^T\Psi^{-1}\Lambda\mathbf{y}^{(i)} + \operatorname{Tr}\left(\Lambda^T\Psi^{-1}\Lambda\Sigma_{\mathbf{Y}}\right) + \left(\mathbf{y}^{(i)}\right)^T\Lambda^T\Psi^{-1}\Lambda\mathbf{y}^{(i)}\right] + c_1$$

$$H = -\frac{1}{2}\sum_{i=1}^{n}\left[\ln\|2\pi\Psi\| + \left(\mathbf{x}^{(i)}\right)^T\Psi^{-1}\mathbf{x}^{(i)} - 2\left(\mathbf{x}^{(i)}\right)^T\Psi^{-1}\Lambda\mathbf{y}^{(i)} + \operatorname{Tr}\left(\Lambda^T\Psi^{-1}\Lambda\Sigma_{\mathbf{Y}}\right) + \operatorname{Tr}\left(\Lambda^T\Psi^{-1}\Lambda\mathbf{y}^{(i)}\left(\mathbf{y}^{(i)}\right)^T\right)\right] + c_1$$

$$H = -\frac{1}{2}\sum_{i=1}^{n}\left[\ln\|2\pi\Psi\| + \left(\mathbf{x}^{(i)}\right)^T\Psi^{-1}\mathbf{x}^{(i)} - 2\left(\mathbf{x}^{(i)}\right)^T\Psi^{-1}\Lambda\mathbf{y}^{(i)} + \operatorname{Tr}\left(\Lambda^T\Psi^{-1}\Lambda\left(\Sigma_{\mathbf{Y}} + \mathbf{y}^{(i)}\left(\mathbf{y}^{(i)}\right)^T\right)\right)\right] + c_1$$

$$H = -\frac{n}{2}\ln\|\Psi\| - \frac{1}{2}\sum_{i=1}^{n}\left(\mathbf{x}^{(i)}\right)^T\Psi^{-1}\mathbf{x}^{(i)} + \sum_{i=1}^{n}\left(\mathbf{x}^{(i)}\right)^T\Psi^{-1}\Lambda\mathbf{y}^{(i)} - \frac{1}{2}\sum_{i=1}^{n}\operatorname{Tr}\left(\Lambda^T\Psi^{-1}\Lambda\left(\Sigma_{\mathbf{Y}} + \mathbf{y}^{(i)}\left(\mathbf{y}^{(i)}\right)^T\right)\right) + c_2 ,$$

where $c_2$ is some constant.

Taking the derivative with respect to $\Lambda$, using the results from Appendix B.1,

$$\operatorname{D}_{\Lambda}H = \operatorname{D}_{\Lambda}\left[\sum_{i=1}^{n}\left(\mathbf{x}^{(i)}\right)^T\Psi^{-1}\Lambda\mathbf{y}^{(i)} - \frac{1}{2}\sum_{i=1}^{n}\operatorname{Tr}\left(\Lambda^T\Psi^{-1}\Lambda\left(\Sigma_{\mathbf{Y}} + \mathbf{y}^{(i)}\left(\mathbf{y}^{(i)}\right)^T\right)\right)\right]$$

$$\operatorname{D}_{\Lambda}H = \operatorname{D}_{\Lambda}\left[\sum_{i=1}^{n}\operatorname{Tr}\left(\left(\mathbf{y}^{(i)}\right)^T\Lambda^T\Psi^{-1}\mathbf{x}^{(i)}\right) - \frac{1}{2}\sum_{i=1}^{n}\operatorname{Tr}\left(\Lambda^T\Psi^{-1}\Lambda\left(\Sigma_{\mathbf{Y}} + \mathbf{y}^{(i)}\left(\mathbf{y}^{(i)}\right)^T\right)\right)\right]$$

$$\operatorname{D}_{\Lambda}H = \operatorname{D}_{\Lambda}\left[\sum_{i=1}^{n}\operatorname{Tr}\left(\Lambda^T\Psi^{-1}\mathbf{x}^{(i)}\left(\mathbf{y}^{(i)}\right)^T\right) - \frac{1}{2}\sum_{i=1}^{n}\operatorname{Tr}\left(\Lambda^T\Psi^{-1}\Lambda\left(\Sigma_{\mathbf{Y}} + \mathbf{y}^{(i)}\left(\mathbf{y}^{(i)}\right)^T\right)\right)\right]$$

$$\operatorname{D}_{\Lambda}H = \sum_{i=1}^{n}\Psi^{-1}\mathbf{x}^{(i)}\left(\mathbf{y}^{(i)}\right)^T - \sum_{i=1}^{n}\Psi^{-1}\Lambda\left(\Sigma_{\mathbf{Y}} + \mathbf{y}^{(i)}\left(\mathbf{y}^{(i)}\right)^T\right) .$$

Setting the derivative to zero,

$$\mathbf{0} = \sum_{i=1}^{n}\Psi^{-1}\mathbf{x}^{(i)}\left(\mathbf{y}^{(i)}\right)^T - \sum_{i=1}^{n}\Psi^{-1}\Lambda\left(\Sigma_{\mathbf{Y}} + \mathbf{y}^{(i)}\left(\mathbf{y}^{(i)}\right)^T\right)$$

$$\mathbf{0} = \sum_{i=1}^{n}\mathbf{x}^{(i)}\left(\mathbf{y}^{(i)}\right)^T - \Lambda\left(n\Sigma_{\mathbf{Y}} + \sum_{i=1}^{n}\mathbf{y}^{(i)}\left(\mathbf{y}^{(i)}\right)^T\right)$$

$$\Lambda = \left(\sum_{i=1}^{n}\mathbf{x}^{(i)}\left(\mathbf{y}^{(i)}\right)^T\right)\left(n\Sigma_{\mathbf{Y}} + \sum_{i=1}^{n}\mathbf{y}^{(i)}\left(\mathbf{y}^{(i)}\right)^T\right)^{-1} . \tag{B.16}$$

The objective can be re-parametrised using $\mathbf{T} = \Psi^{-1}$ so that

$$H = -\frac{n}{2}\ln\|\mathbf{T}^{-1}\| - \frac{1}{2}\sum_{i=1}^{n}\left(\mathbf{x}^{(i)}\right)^T\mathbf{T}\mathbf{x}^{(i)} + \sum_{i=1}^{n}\left(\mathbf{x}^{(i)}\right)^T\mathbf{T}\Lambda\mathbf{y}^{(i)} - \frac{1}{2}\sum_{i=1}^{n}\operatorname{Tr}\left(\Lambda^T\mathbf{T}\Lambda\left(\Sigma_{\mathbf{Y}} + \mathbf{y}^{(i)}\left(\mathbf{y}^{(i)}\right)^T\right)\right) + c_2 .$$

Since $\|\mathbf{T}^{-1}\| = 1/\|\mathbf{T}\|$ and using the properties of the trace,

$$H = \frac{n}{2}\ln\|\mathbf{T}\| - \frac{1}{2}\sum_{i=1}^{n}\operatorname{Tr}\left(\mathbf{T}\mathbf{x}^{(i)}\left(\mathbf{x}^{(i)}\right)^T\right) + \sum_{i=1}^{n}\operatorname{Tr}\left(\mathbf{T}\Lambda\mathbf{y}^{(i)}\left(\mathbf{x}^{(i)}\right)^T\right) - \frac{1}{2}\sum_{i=1}^{n}\operatorname{Tr}\left(\mathbf{T}\Lambda\left(\Sigma_{\mathbf{Y}} + \mathbf{y}^{(i)}\left(\mathbf{y}^{(i)}\right)^T\right)\Lambda^T\right) + c_2 .$$

Taking the derivative with respect to $\mathbf{T}$, using the results from Appendix B.1,

$$\operatorname{D}_{\mathbf{T}}H = \frac{n}{2}\mathbf{T}^{-1} - \frac{1}{2}\sum_{i=1}^{n}\mathbf{x}^{(i)}\left(\mathbf{x}^{(i)}\right)^T + \sum_{i=1}^{n}\Lambda\mathbf{y}^{(i)}\left(\mathbf{x}^{(i)}\right)^T - \frac{1}{2}\sum_{i=1}^{n}\Lambda\left(\Sigma_{\mathbf{Y}} + \mathbf{y}^{(i)}\left(\mathbf{y}^{(i)}\right)^T\right)\Lambda^T .$$


Setting the derivative to zero,

$$\mathbf{0} = \frac{n}{2}\Psi - \frac{1}{2}\sum_{i=1}^{n}\mathbf{x}^{(i)}\left(\mathbf{x}^{(i)}\right)^T + \sum_{i=1}^{n}\Lambda\mathbf{y}^{(i)}\left(\mathbf{x}^{(i)}\right)^T - \frac{n}{2}\Lambda\Sigma_{\mathbf{Y}}\Lambda^T - \frac{1}{2}\sum_{i=1}^{n}\Lambda\mathbf{y}^{(i)}\left(\mathbf{y}^{(i)}\right)^T\Lambda^T$$

$$\mathbf{0} = \frac{n}{2}\Psi - \frac{1}{2}\sum_{i=1}^{n}\left(\left(\mathbf{x}^{(i)} - \Lambda\mathbf{y}^{(i)}\right)\left(\mathbf{x}^{(i)} - \Lambda\mathbf{y}^{(i)}\right)^T\right) - \frac{n}{2}\Lambda\Sigma_{\mathbf{Y}}\Lambda^T$$

$$\Psi = \frac{1}{n}\sum_{i=1}^{n}\left(\left(\mathbf{x}^{(i)} - \Lambda\mathbf{y}^{(i)}\right)\left(\mathbf{x}^{(i)} - \Lambda\mathbf{y}^{(i)}\right)^T\right) + \Lambda\Sigma_{\mathbf{Y}}\Lambda^T . \tag{B.17}$$

Because Ψ is a diagonal matrix, only the diagonal entries need to be updated.
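The two updates can be written compactly in MATLAB; the sketch below is illustrative (the function and variable names are not from the project code) and assumes the E step has already produced the posterior means yHat and the shared posterior covariance SigmaY.

```matlab
function [Lambda, Psi] = mStepFactorAnalysis(X, yHat, SigmaY)
% One M step for factor analysis.
% X: d-by-n data matrix (columns are x^(i)), yHat: k-by-n posterior means,
% SigmaY: k-by-k posterior covariance of the latent factors.
n = size(X, 2);
Sxy = X*yHat';                            % sum over i of x^(i) (y^(i))'
Syy = n*SigmaY + yHat*yHat';              % sum over i of SigmaY + y^(i) (y^(i))'
Lambda = Sxy / Syy;                       % Equation (B.16)
residual = X - Lambda*yHat;               % x^(i) - Lambda y^(i)
Psi = diag(diag(residual*residual'/n + Lambda*SigmaY*Lambda'));  % Equation (B.17)
end
```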

B.4 Obtaining a One-Layer Compound Poisson Model

The full joint probability density function is

$$p_{X_i,U_i,Y_i}(x_i, u_i, y_i) = p_{X_i|U_i}(x_i|u_i)\, p_{U_i|Y_i}(u_i|y_i)\, \operatorname{P}(Y_i = y_i)$$

$$p_{X_i,U_i,Y_i}(x_i, u_i, y_i) = \frac{1}{\sqrt{2\pi}\,\beta_i}\exp\left[-\frac{1}{2}\left(\frac{x_i - \alpha u_i/\tau}{\beta_i}\right)^2\right] \frac{1}{\sqrt{2\pi}\sqrt{y_i\sigma_i^2}}\exp\left[-\frac{1}{2}\left(\frac{u_i - y_i\mu_i}{\sqrt{y_i\sigma_i^2}}\right)^2\right] \frac{e^{-\nu_i\tau}(\nu_i\tau)^{y_i}}{y_i!} \tag{B.18}$$

for any real $x_i$, $u_i$ and $y_i = 0, 1, 2, \cdots$.

By marginalising out $U_i$, a one-layer latent variable model can be obtained and the unknown parameters $\nu_i$, $\mu_i$, $\sigma_i^2$, $\beta_i^2$ can be estimated using the EM algorithm. This can be shown by integrating the full joint probability density function with respect to $u_i$:

$$p_{X_i,Y_i}(x_i, y_i) = \int_{u_i=-\infty}^{u_i=\infty} \frac{1}{\sqrt{2\pi}\,\beta_i}\exp\left[-\frac{1}{2}\left(\frac{x_i - \alpha u_i/\tau}{\beta_i}\right)^2\right] \frac{1}{\sqrt{2\pi}\sqrt{y_i\sigma_i^2}}\exp\left[-\frac{1}{2}\left(\frac{u_i - y_i\mu_i}{\sqrt{y_i\sigma_i^2}}\right)^2\right] \frac{e^{-\nu_i\tau}(\nu_i\tau)^{y_i}}{y_i!}\, du_i$$

$$p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau}(\nu_i\tau)^{y_i}}{2\pi\beta_i y_i!\sqrt{y_i}\,\sigma_i}\int_{u_i=-\infty}^{u_i=\infty}\exp\left[-\frac{1}{2}\left(\frac{x_i - \alpha u_i/\tau}{\beta_i}\right)^2 - \frac{1}{2}\left(\frac{u_i - y_i\mu_i}{\sqrt{y_i\sigma_i^2}}\right)^2\right] du_i$$

$$p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau}(\nu_i\tau)^{y_i}}{2\pi\beta_i y_i!\sqrt{y_i}\,\sigma_i}\int_{u_i=-\infty}^{u_i=\infty}\exp\left[-\frac{x_i^2 - 2\alpha x_i u_i/\tau + \alpha^2 u_i^2/\tau^2}{2\beta_i^2} - \frac{u_i^2 - 2y_i\mu_i u_i + y_i^2\mu_i^2}{2y_i\sigma_i^2}\right] du_i$$

$$p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau}(\nu_i\tau)^{y_i}}{2\pi\beta_i y_i!\sqrt{y_i}\,\sigma_i}\int_{u_i=-\infty}^{u_i=\infty}\exp\left[-\frac{1}{2y_i\sigma_i^2\beta_i^2}\left(x_i^2 y_i\sigma_i^2 - 2\alpha x_i u_i y_i\sigma_i^2/\tau + \alpha^2 u_i^2 y_i\sigma_i^2/\tau^2 + u_i^2\beta_i^2 - 2y_i\mu_i u_i\beta_i^2 + y_i^2\mu_i^2\beta_i^2\right)\right] du_i$$

$$p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau}(\nu_i\tau)^{y_i}}{2\pi\beta_i y_i!\sqrt{y_i}\,\sigma_i}\int_{u_i=-\infty}^{u_i=\infty}\exp\left[-\frac{1}{2y_i\sigma_i^2\beta_i^2}\left(u_i^2\left(\beta_i^2 + \alpha^2 y_i\sigma_i^2/\tau^2\right) - 2u_i\left(\alpha x_i y_i\sigma_i^2/\tau + y_i\mu_i\beta_i^2\right) + y_i^2\mu_i^2\beta_i^2 + x_i^2 y_i\sigma_i^2\right)\right] du_i .$$


The term inside the exponential is a quadratic in $u_i$. The following substitutions are made to make completing the square easier:

$$a_1 = \beta_i^2 + \alpha^2 y_i\sigma_i^2/\tau^2 \tag{B.19}$$

$$b_1 = \alpha x_i y_i\sigma_i^2/\tau + y_i\mu_i\beta_i^2 \tag{B.20}$$

$$c_1 = y_i^2\mu_i^2\beta_i^2 + x_i^2 y_i\sigma_i^2 \tag{B.21}$$

so that

$$p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau}(\nu_i\tau)^{y_i}}{2\pi\beta_i y_i!\sqrt{y_i}\,\sigma_i}\int_{u_i=-\infty}^{u_i=\infty}\exp\left[-\frac{u_i^2 a_1 - 2u_i b_1 + c_1}{2y_i\sigma_i^2\beta_i^2}\right] du_i .$$

Completing the square is done by

$$p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau}(\nu_i\tau)^{y_i}}{2\pi\beta_i y_i!\sqrt{y_i}\,\sigma_i}\int_{u_i=-\infty}^{u_i=\infty}\exp\left[-\frac{u_i^2 - 2u_i b_1/a_1 + c_1/a_1 + b_1^2/a_1^2 - b_1^2/a_1^2}{2y_i\sigma_i^2\beta_i^2/a_1}\right] du_i$$

$$p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau}(\nu_i\tau)^{y_i}}{2\pi\beta_i y_i!\sqrt{y_i}\,\sigma_i}\int_{u_i=-\infty}^{u_i=\infty}\exp\left[-\frac{(u_i - b_1/a_1)^2 - b_1^2/a_1^2 + c_1/a_1}{2y_i\sigma_i^2\beta_i^2/a_1}\right] du_i$$

$$p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau}(\nu_i\tau)^{y_i}}{2\pi\beta_i y_i!\sqrt{y_i}\,\sigma_i}\exp\left[-\frac{-b_1^2/a_1^2 + c_1/a_1}{2y_i\sigma_i^2\beta_i^2/a_1}\right]\int_{u_i=-\infty}^{u_i=\infty}\exp\left[-\frac{(u_i - b_1/a_1)^2}{2y_i\sigma_i^2\beta_i^2/a_1}\right] du_i .$$

The integral is now a Gaussian integral which can be evaluated:

$$p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau}(\nu_i\tau)^{y_i}}{2\pi\beta_i y_i!\sqrt{y_i}\,\sigma_i}\exp\left[-\frac{-b_1^2/a_1^2 + c_1/a_1}{2y_i\sigma_i^2\beta_i^2/a_1}\right]\sqrt{\frac{2\pi y_i\sigma_i^2\beta_i^2}{a_1}}$$

$$p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau}(\nu_i\tau)^{y_i}}{y_i!\sqrt{2\pi}\sqrt{a_1}}\exp\left[-\frac{1}{2a_1}\left(\frac{-b_1^2 + a_1 c_1}{y_i\sigma_i^2\beta_i^2}\right)\right] .$$


Substituting in Equations (B.19), (B.20) and (B.21),

$$p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau}(\nu_i\tau)^{y_i}}{y_i!\sqrt{2\pi}\sqrt{a_1}}\exp\left[-\frac{1}{2a_1}\left(\frac{-\left(\alpha x_i y_i\sigma_i^2/\tau + y_i\mu_i\beta_i^2\right)^2 + \left(\beta_i^2 + \alpha^2 y_i\sigma_i^2/\tau^2\right)\left(y_i^2\mu_i^2\beta_i^2 + x_i^2 y_i\sigma_i^2\right)}{y_i\sigma_i^2\beta_i^2}\right)\right]$$

$$p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau}(\nu_i\tau)^{y_i}}{y_i!\sqrt{2\pi}\sqrt{a_1}}\exp\left[-\frac{1}{2a_1}\left(\frac{-\alpha^2 x_i^2 y_i^2\sigma_i^4/\tau^2 - 2\alpha x_i y_i^2\sigma_i^2\mu_i\beta_i^2/\tau - y_i^2\mu_i^2\beta_i^4 + y_i^2\mu_i^2\beta_i^4 + x_i^2 y_i\sigma_i^2\beta_i^2 + \alpha^2 y_i^3\sigma_i^2\mu_i^2\beta_i^2/\tau^2 + \alpha^2 x_i^2 y_i^2\sigma_i^4/\tau^2}{y_i\sigma_i^2\beta_i^2}\right)\right]$$

$$p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau}(\nu_i\tau)^{y_i}}{y_i!\sqrt{2\pi}\sqrt{a_1}}\exp\left[-\frac{1}{2a_1}\left(\frac{-2\alpha x_i y_i^2\sigma_i^2\mu_i\beta_i^2/\tau + x_i^2 y_i\sigma_i^2\beta_i^2 + \alpha^2 y_i^3\sigma_i^2\mu_i^2\beta_i^2/\tau^2}{y_i\sigma_i^2\beta_i^2}\right)\right] .$$

As before, completing the square can be done by making the following substitutions:

$$a_2 = y_i\sigma_i^2\beta_i^2 \tag{B.22}$$

$$b_2 = \alpha y_i^2\sigma_i^2\mu_i\beta_i^2/\tau \tag{B.23}$$

$$c_2 = \alpha^2 y_i^3\sigma_i^2\mu_i^2\beta_i^2/\tau^2 \tag{B.24}$$

so that

$$p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau}(\nu_i\tau)^{y_i}}{y_i!\sqrt{2\pi}\sqrt{a_1}}\exp\left[-\frac{1}{2a_1}\left(\frac{-2b_2 x_i + a_2 x_i^2 + c_2}{a_2}\right)\right]$$

$$p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau}(\nu_i\tau)^{y_i}}{y_i!\sqrt{2\pi}\sqrt{a_1}}\exp\left[-\frac{1}{2a_1}\left(x_i^2 - 2b_2 x_i/a_2 + b_2^2/a_2^2 - b_2^2/a_2^2 + c_2/a_2\right)\right]$$


$$p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau}(\nu_i\tau)^{y_i}}{y_i!\sqrt{2\pi}\sqrt{a_1}}\exp\left[-\frac{-b_2^2/a_2^2 + c_2/a_2}{2a_1}\right]\exp\left[-\frac{(x_i - b_2/a_2)^2}{2a_1}\right] .$$

Next, note that $-b_2^2/a_2^2 + c_2/a_2 = 0$; this can be shown by

$$-b_2^2/a_2^2 + c_2/a_2 = -\frac{\alpha^2 y_i^4\sigma_i^4\mu_i^2\beta_i^4/\tau^2}{y_i^2\sigma_i^4\beta_i^4} + \frac{\alpha^2 y_i^3\sigma_i^2\mu_i^2\beta_i^2/\tau^2}{y_i\sigma_i^2\beta_i^2}$$

$$-b_2^2/a_2^2 + c_2/a_2 = -\frac{\alpha^2 y_i^2\mu_i^2}{\tau^2} + \frac{\alpha^2 y_i^2\mu_i^2}{\tau^2}$$

$$-b_2^2/a_2^2 + c_2/a_2 = 0 . \tag{B.25}$$

In addition, the following equation can be obtained as well:

$$b_2/a_2 = \frac{\alpha y_i\mu_i}{\tau} . \tag{B.26}$$

As a result, by substituting back in Equation (B.19), the following can be obtained:

$$p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau}(\nu_i\tau)^{y_i}}{y_i!}\,\frac{1}{\sqrt{2\pi}\sqrt{\beta_i^2 + \alpha^2 y_i\sigma_i^2/\tau^2}}\exp\left[-\frac{1}{2}\left(\frac{x_i - y_i\mu_i\alpha/\tau}{\sqrt{\beta_i^2 + \alpha^2 y_i\sigma_i^2/\tau^2}}\right)^2\right] . \tag{B.27}$$

This is the product of a Poisson probability mass function and a Normal probability density function. Finally, by inspection, the one-layer latent model is

$$Y_i \sim \text{Poisson}(\nu_i\tau) \tag{B.28}$$

$$X_i|Y_i \sim \operatorname{N}\left(\frac{\alpha\mu_i}{\tau}Y_i,\ \frac{\alpha^2\sigma_i^2}{\tau^2}Y_i + \beta_i^2\right). \tag{B.29}$$
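The marginalisation can also be checked numerically by integrating (B.18) over $u$ and comparing with (B.27); the MATLAB sketch below uses arbitrary parameter values, and `normpdf` requires the Statistics Toolbox.

```matlab
% Numerical check of (B.27) against direct integration of (B.18) over u.
alpha = 2; tau = 0.5; nu = 4; mu = 3; sigma2 = 0.5; beta2 = 1.5;  % arbitrary
x = 35; y = 3;
poissonPmf = exp(-nu*tau) * (nu*tau)^y / factorial(y);
integrand = @(u) normpdf(x, alpha*u/tau, sqrt(beta2)) ...
             .* normpdf(u, y*mu, sqrt(y*sigma2)) * poissonPmf;
numerical  = integral(integrand, -Inf, Inf);
closedForm = poissonPmf * normpdf(x, y*mu*alpha/tau, ...
                                  sqrt(beta2 + alpha^2*y*sigma2/tau^2));
[numerical, closedForm]    % the two values should agree
```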

B.5 Joint Compound Poisson is in the Exponential Family

For the $y_i = 1, 2, \cdots$ case, the joint probability density function can be rewritten as

$$p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau}(\nu_i\tau)^{y_i}\,\tau}{y_i!\sqrt{2\pi y_i}\,\alpha\sigma_i}\exp\left[-\frac{1}{2}\frac{(x_i - y_i\mu_i\alpha/\tau)^2}{\alpha^2 y_i\sigma_i^2/\tau^2}\right]$$


$$p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau}\,\tau}{\sqrt{2\pi}\,\alpha\sigma_i}\exp\left[-\frac{1}{2}\frac{(x_i - y_i\mu_i\alpha/\tau)^2}{\alpha^2 y_i\sigma_i^2/\tau^2} + y_i\ln(\nu_i\tau) - \frac{1}{2}\ln y_i - \ln y_i!\right]$$

$$p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau}\,\tau}{\sqrt{2\pi}\,\alpha\sigma_i}\exp\left[-\frac{\tau^2}{2\alpha^2\sigma_i^2}\,\frac{x_i^2 - 2\alpha\mu_i x_i y_i/\tau + \alpha^2\mu_i^2 y_i^2/\tau^2}{y_i} + y_i\ln(\nu_i\tau) - \frac{1}{2}\ln y_i - \ln y_i!\right]$$

$$p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau}\,\tau}{\sqrt{2\pi}\,\alpha\sigma_i}\exp\left[-\frac{\tau^2}{2\alpha^2\sigma_i^2}\left(\frac{x_i^2}{y_i} - \frac{2\alpha\mu_i x_i}{\tau} + \frac{\alpha^2\mu_i^2 y_i}{\tau^2}\right) + y_i\ln(\nu_i\tau) - \frac{1}{2}\ln y_i - \ln y_i!\right]$$

$$p_{X_i,Y_i}(x_i, y_i) = \frac{e^{-\nu_i\tau}\,\tau}{\sqrt{2\pi}\,\alpha\sigma_i}\exp\left[-\frac{\tau^2}{2\alpha^2\sigma_i^2}\,\frac{x_i^2}{y_i} + \frac{\tau\mu_i}{\alpha\sigma_i^2}\,x_i - \left(\frac{\mu_i^2}{2\sigma_i^2} - \ln(\nu_i\tau)\right)y_i - \frac{1}{2}\ln y_i - \ln y_i!\right] .$$

This shows that the joint probability density function is in the form of an exponential family where the sufficient statistics are $X_i^2/Y_i$, $X_i$ and $Y_i$.

B.6 M Step for Compound Poisson

In the M step the parameters $\nu_i$, $\mu_i$, $\sigma_i^2$ are estimated given the latent variables. This is done by maximising the conditional expectation of the log joint likelihood function, that is

$$H_i = \sum_{j=1}^{n}\operatorname{E}\left[\ln p_{X_i,Y_i}\left(x_i^{(j)}, Y_i^{(j)}\right)\,\middle|\,X_i^{(j)} = x_i^{(j)}\right] .$$


Substituting in Equation (5.26), where $c_1$ and $c_2$ are constants,

$$H_i = \sum_{j=1}^{n}\operatorname{E}\left[-\nu_i\tau + Y_i^{(j)}\ln(\nu_i\tau) - \frac{1}{2}\ln\left(\frac{\alpha^2\sigma_i^2}{\tau^2}Y_i^{(j)}\right) - \frac{1}{2}\frac{\left(x_i^{(j)} - Y_i^{(j)}\mu_i\alpha/\tau\right)^2}{\alpha^2 Y_i^{(j)}\sigma_i^2/\tau^2}\,\middle|\,X_i^{(j)} = x_i^{(j)}\right] + c_1$$

$$H_i = -n\nu_i\tau + \ln(\nu_i\tau)\sum_{j=1}^{n}\operatorname{E}\left[Y_i^{(j)}\,\middle|\,X_i^{(j)} = x_i^{(j)}\right] - \frac{n}{2}\ln\left(\frac{\alpha^2\sigma_i^2}{\tau^2}\right) - \frac{1}{2}\sum_{j=1}^{n}\operatorname{E}\left[\frac{\left(x_i^{(j)} - Y_i^{(j)}\mu_i\alpha/\tau\right)^2}{\alpha^2 Y_i^{(j)}\sigma_i^2/\tau^2}\,\middle|\,X_i^{(j)} = x_i^{(j)}\right] + c_2$$

$$H_i = -n\nu_i\tau + \ln(\nu_i\tau)\sum_{j=1}^{n}\operatorname{E}\left[Y_i^{(j)}\,\middle|\,X_i^{(j)} = x_i^{(j)}\right] - \frac{n}{2}\ln\left(\frac{\alpha^2\sigma_i^2}{\tau^2}\right) - \frac{1}{2}\sum_{j=1}^{n}\operatorname{E}\left[\frac{\left(x_i^{(j)}\right)^2 - 2x_i^{(j)}Y_i^{(j)}\mu_i\alpha/\tau + \left(Y_i^{(j)}\mu_i\alpha/\tau\right)^2}{\alpha^2 Y_i^{(j)}\sigma_i^2/\tau^2}\,\middle|\,X_i^{(j)} = x_i^{(j)}\right] + c_2$$

$$H_i = -n\nu_i\tau + \ln(\nu_i\tau)\sum_{j=1}^{n}\operatorname{E}\left[Y_i^{(j)}\,\middle|\,X_i^{(j)} = x_i^{(j)}\right] - \frac{n}{2}\ln\left(\frac{\alpha^2\sigma_i^2}{\tau^2}\right) - \frac{\tau^2}{2\alpha^2\sigma_i^2}\sum_{j=1}^{n}\left(x_i^{(j)}\right)^2\operatorname{E}\left[1/Y_i^{(j)}\,\middle|\,X_i^{(j)} = x_i^{(j)}\right] + \frac{\tau\mu_i}{\alpha\sigma_i^2}\sum_{j=1}^{n}x_i^{(j)} - \frac{\mu_i^2}{2\sigma_i^2}\sum_{j=1}^{n}\operatorname{E}\left[Y_i^{(j)}\,\middle|\,X_i^{(j)} = x_i^{(j)}\right] . \tag{B.30}$$

Let $y_i^{(j)} = \operatorname{E}\left[Y_i^{(j)}\,\middle|\,X_i^{(j)} = x_i^{(j)}\right]$ and $\zeta_i^{(j)} = \operatorname{E}\left[1/Y_i^{(j)}\,\middle|\,X_i^{(j)} = x_i^{(j)}\right]$, which were obtained from the E step. Then

$$H_i = -n\nu_i\tau + \ln(\nu_i\tau)\sum_{j=1}^{n}y_i^{(j)} - \frac{n}{2}\ln\left(\frac{\alpha^2\sigma_i^2}{\tau^2}\right) - \frac{\tau^2}{2\alpha^2\sigma_i^2}\sum_{j=1}^{n}\left(x_i^{(j)}\right)^2\zeta_i^{(j)} + \frac{\tau\mu_i}{\alpha\sigma_i^2}\sum_{j=1}^{n}x_i^{(j)} - \frac{\mu_i^2}{2\sigma_i^2}\sum_{j=1}^{n}y_i^{(j)} . \tag{B.31}$$

This can be maximised by setting the partial derivatives with respect to $\nu_i$, $\mu_i$ and $\sigma_i^2$ to zero.

Taking the partial derivative with respect to $\nu_i$,

$$\frac{\partial H_i}{\partial\nu_i} = -n\tau + \frac{\sum_{j=1}^{n}y_i^{(j)}}{\nu_i} . \tag{B.32}$$

Setting this to zero, the estimator for $\nu_i$ is

$$\nu_i = \frac{\sum_{j=1}^{n}y_i^{(j)}}{n\tau} . \tag{B.33}$$

Taking the partial derivative with respect to $\mu_i$,

$$\frac{\partial H_i}{\partial\mu_i} = \frac{\tau}{\alpha\sigma_i^2}\sum_{j=1}^{n}x_i^{(j)} - \frac{\mu_i}{\sigma_i^2}\sum_{j=1}^{n}y_i^{(j)} . \tag{B.34}$$

Setting this to zero, the estimator for $\mu_i$ is

$$\mu_i = \frac{\tau}{\alpha}\,\frac{\sum_{j=1}^{n}x_i^{(j)}}{\sum_{j=1}^{n}y_i^{(j)}} . \tag{B.35}$$

Finally, taking the partial derivative with respect to $\sigma_i^2$,

$$\frac{\partial H_i}{\partial\sigma_i^2} = -\frac{n}{2\sigma_i^2} + \frac{\tau^2}{2\alpha^2\sigma_i^4}\sum_{j=1}^{n}\left(x_i^{(j)}\right)^2\zeta_i^{(j)} - \frac{\tau\mu_i}{\alpha\sigma_i^4}\sum_{j=1}^{n}x_i^{(j)} + \frac{\mu_i^2}{2\sigma_i^4}\sum_{j=1}^{n}y_i^{(j)} . \tag{B.36}$$

Setting this to zero,

$$-\frac{n\sigma_i^2}{2} + \frac{\tau^2}{2\alpha^2}\sum_{j=1}^{n}\left(x_i^{(j)}\right)^2\zeta_i^{(j)} - \frac{\tau\mu_i}{\alpha}\sum_{j=1}^{n}x_i^{(j)} + \frac{\mu_i^2}{2}\sum_{j=1}^{n}y_i^{(j)} = 0$$


and the estimator for $\sigma_i^2$ is

$$\sigma_i^2 = \frac{1}{n}\left[\frac{\tau^2}{\alpha^2}\sum_{j=1}^{n}\left(x_i^{(j)}\right)^2\zeta_i^{(j)} - \frac{2\tau\mu_i}{\alpha}\sum_{j=1}^{n}x_i^{(j)} + \mu_i^2\sum_{j=1}^{n}y_i^{(j)}\right] . \tag{B.37}$$

Care should be taken because, for $y_i^{(j)} = 0$, $\zeta_i^{(j)}$ is undefined. For this special case it is easiest to just set $\zeta_i^{(j)} = 0$, as though that data point was removed.

B.7 Cramer-Rao Lower Bounds for the Compound Poisson

Suppose the latent variables are known. The log joint likelihood, as a function of the random variables, is

$$\ln L = -n\nu_i\tau + \ln(\nu_i\tau)\sum_{j=1}^{n}Y_i^{(j)} - \frac{n}{2}\ln\left(\frac{\alpha^2\sigma_i^2}{\tau^2}\right) - \frac{\tau^2}{2\alpha^2\sigma_i^2}\sum_{j=1}^{n}\left(X_i^{(j)}\right)^2/Y_i^{(j)} + \frac{\tau\mu_i}{\alpha\sigma_i^2}\sum_{j=1}^{n}X_i^{(j)} - \frac{\mu_i^2}{2\sigma_i^2}\sum_{j=1}^{n}Y_i^{(j)} . \tag{B.38}$$

The Fisher information matrix is

$$\mathbf{I} = -\operatorname{E}\begin{pmatrix} \dfrac{\partial^2\ln L}{\partial\nu_i^2} & \dfrac{\partial^2\ln L}{\partial\nu_i\partial\mu_i} & \dfrac{\partial^2\ln L}{\partial\nu_i\partial\sigma_i^2} \\[2ex] \dfrac{\partial^2\ln L}{\partial\nu_i\partial\mu_i} & \dfrac{\partial^2\ln L}{\partial\mu_i^2} & \dfrac{\partial^2\ln L}{\partial\mu_i\partial\sigma_i^2} \\[2ex] \dfrac{\partial^2\ln L}{\partial\nu_i\partial\sigma_i^2} & \dfrac{\partial^2\ln L}{\partial\mu_i\partial\sigma_i^2} & \dfrac{\partial^2\ln L}{\partial\left(\sigma_i^2\right)^2} \end{pmatrix} . \tag{B.39}$$


By substituting in the log joint likelihood, it can be shown that

$$\mathbf{I} = \operatorname{E}\begin{pmatrix} \dfrac{\sum_{j=1}^{n}Y_i^{(j)}}{\nu_i^2} & 0 & 0 \\[2ex] 0 & \dfrac{\sum_{j=1}^{n}Y_i^{(j)}}{\sigma_i^2} & \dfrac{\tau\sum_{j=1}^{n}X_i^{(j)}}{\alpha\sigma_i^4} - \dfrac{\mu_i}{\sigma_i^4}\sum_{j=1}^{n}Y_i^{(j)} \\[2ex] 0 & \dfrac{\tau\sum_{j=1}^{n}X_i^{(j)}}{\alpha\sigma_i^4} - \dfrac{\mu_i}{\sigma_i^4}\sum_{j=1}^{n}Y_i^{(j)} & -\dfrac{n}{2\sigma_i^4} + \dfrac{\tau^2}{\alpha^2\sigma_i^6}\sum_{j=1}^{n}\left(X_i^{(j)}\right)^2/Y_i^{(j)} - \dfrac{2\tau\mu_i}{\alpha\sigma_i^6}\sum_{j=1}^{n}X_i^{(j)} + \dfrac{\mu_i^2}{\sigma_i^6}\sum_{j=1}^{n}Y_i^{(j)} \end{pmatrix} . \tag{B.40}$$

Using the results that $\operatorname{E}\left[Y_i^{(j)}\right] = \nu_i\tau$ and $\operatorname{E}\left[X_i^{(j)}\right] = \alpha\mu_i\nu_i$, the following can be obtained:

$$\operatorname{E}\left[\sum_{j=1}^{n}Y_i^{(j)}\right] = n\nu_i\tau \tag{B.41}$$

$$\operatorname{E}\left[\sum_{j=1}^{n}X_i^{(j)}\right] = n\alpha\mu_i\nu_i . \tag{B.42}$$

Also assuming the E step estimators are unbiased, then

$$n\sigma_i^2 = \operatorname{E}\left[\frac{\tau^2}{\alpha^2}\sum_{j=1}^{n}\left(X_i^{(j)}\right)^2/Y_i^{(j)} - \frac{2\tau\mu_i}{\alpha}\sum_{j=1}^{n}X_i^{(j)} + \mu_i^2\sum_{j=1}^{n}Y_i^{(j)}\right] . \tag{B.43}$$

Using these expectations, the Fisher information matrix can be calculated to be

$$\mathbf{I} = \begin{pmatrix} \dfrac{n\tau}{\nu_i} & 0 & 0 \\[1ex] 0 & \dfrac{n\nu_i\tau}{\sigma_i^2} & 0 \\[1ex] 0 & 0 & \dfrac{n}{2\sigma_i^4} \end{pmatrix} . \tag{B.44}$$


The Cramer-Rao bound is

$$\operatorname{Var}\begin{pmatrix}\nu_i \\ \mu_i \\ \sigma_i^2\end{pmatrix} \geq \operatorname{diag}\left[\mathbf{I}^{-1}\right] \tag{B.45}$$

$$\operatorname{Var}\begin{pmatrix}\nu_i \\ \mu_i \\ \sigma_i^2\end{pmatrix} \geq \begin{pmatrix}\dfrac{\nu_i}{n\tau} \\[1ex] \dfrac{\sigma_i^2}{n\nu_i\tau} \\[1ex] \dfrac{2\sigma_i^4}{n}\end{pmatrix} . \tag{B.46}$$


Bibliography

[1] Kaufui V Wong and Aldo Hernandez. A review of additive manufacturing. ISRN Mechanical Engineering, 2012, 2012.

[2] Hideo Kodama. Automatic method for fabricating a three-dimensional plastic model with photo-hardening polymer. Review of Scientific Instruments, 52(11):1770–1773, 1981.

[3] Hyun-Wook Kang, Sang Jin Lee, In Kap Ko, Carlos Kengla, James J Yoo, and Anthony Atala. A 3D bioprinting system to produce human-scale tissue constructs with structural integrity. Nature Biotechnology, 34(3):312–319, 2016.

[4] Audrey Kueh, Wilfrid S Kendall, and Thomas E Nichols. Modelling the penumbra in computed tomography. 2014.

[5] Julia Brettschneider, John Albert Thornby, Thomas E Nichols, and Wilfrid S Kendall. Spatial analysis of dead pixels. 2014.

[6] Kenneth Franklin Riley, Michael Paul Hobson, and Stephen John Bence. Mathematical Methods for Physics and Engineering. Cambridge University Press, 3rd edition, 2006.

[7] John Rice. Mathematical Statistics and Data Analysis. Brooks/Cole Cengage Learning, 3rd edition, 2009.

[8] Angela Cantatore and Pavel Muller. Introduction to computed tomography. Technical report, DTU Mechanical Engineering, 2011.

[9] Godfrey N Hounsfield. Computed medical imaging. Journal of Computer Assisted Tomography, 4(5):665–674, 1980.

[10] Greg Michael. X-ray computed tomography. Physics Education, 36(6):442, 2001.

[11] Frank Welkenhuyzen, Kim Kiekens, Mieke Pierlet, Wim Dewulf, Philip Bleys, Jean-Pierre Kruth, and Andre Voet. Industrial computer tomography for dimensional metrology: overview of influence factors and improvement strategies. In Proceedings of the 4th International Conference on Optical Measurement Techniques for Structures and Systems: Optimess2009, pages 401–410, 2009.

[12] Robert Andrews Millikan. A direct photoelectric determination of Planck's "h". Physical Review, 7(3):355, 1916.

[13] JH Hubbell. Electron–positron pair production by photons: A historical overview. Radiation Physics and Chemistry, 75(6):614–623, 2006.

[14] Arthur H Compton. A quantum theory of the scattering of X-rays by light elements. Physical Review, 21(5):483, 1923.

[15] Idris A Elbakri and Jeffrey A Fessler. Statistical image reconstruction for polyenergetic X-ray computed tomography. IEEE Transactions on Medical Imaging, 21(2):89–99, 2002.

[16] M Bartscher, U Hilpert, J Goebbels, and G Weidemann. Enhancement and proof of accuracy of industrial computed tomography (CT) measurements. CIRP Annals-Manufacturing Technology, 56(1):495–498, 2007.

[17] Rodney A Brooks and Giovanni Di Chiro. Principles of computer assisted tomography (CAT) in radiographic and radioisotopic imaging. Physics in Medicine and Biology, 21(5):689, 1976.

[18] Eric W Weisstein. Bonferroni correction. http://mathworld.wolfram.com/BonferroniCorrection.html, 2004. From MathWorld–A Wolfram Web Resource. Accessed: 18/04/2016.

[19] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The Elements of Statistical Learning. Springer Series in Statistics, Springer, Berlin, second edition, 2009.

[20] Bradley Efron. Bootstrap methods: another look at the jackknife. Springer, 1992.

[21] Arthur P Dempster, Nan M Laird, and Donald B Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), pages 1–38, 1977.

[22] David Barber. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.

[23] Johannes Schindelin, Ignacio Arganda-Carreras, Erwin Frise, Verena Kaynig, Mark Longair, Tobias Pietzsch, Stephan Preibisch, Curtis Rueden, Stephan Saalfeld, Benjamin Schmid, et al. Fiji: an open-source platform for biological-image analysis. Nature Methods, 9(7):676–682, 2012.

[24] Suchen Jin. Investigating the random variation of computed tomography data. Master's thesis, University of Warwick, Department of Statistics, 2014.

[25] Bruce R Whiting, Parinaz Massoumzadeh, Orville A Earl, Joseph A O'Sullivan, Donald L Snyder, and Jeffrey F Williamson. Properties of preprocessed sinogram data in X-ray computed tomography. Medical Physics, 33(9):3290–3303, 2006.

[26] Liangjun Xie. X-ray Computed Tomography Image Reconstruction: Energy-integrating Detector and Performance Analysis. PhD thesis, Washington University, Department of Electrical and Systems Engineering, 2008.

[27] George Casella, Christian P Robert, and Martin T Wells. Generalized accept-reject sampling schemes. Lecture Notes-Monograph Series, pages 342–347, 2004.

[28] Jean-Michel Marin, Pierre Pudlo, Christian P Robert, and Robin J Ryder. Approximate Bayesian computational methods. Statistics and Computing, 22(6):1167–1180, 2012.

[29] Harald Cramer. Mathematical Methods of Statistics, volume 9. Princeton University Press, 1945.

[30] Peter McCullagh and John A Nelder. Generalized Linear Models, volume 37. CRC Press, 1989.

[31] Maneesh Sahani. Probabilistic and unsupervised learning, approximate inference and learning in probabilistic models. http://www.gatsby.ucl.ac.uk/teaching/courses/ml1-2014.html, 2014. Accessed: 05/06/2016.

[32] Kaare Brandt Petersen, Michael Syskind Pedersen, et al. The Matrix Cookbook. Technical University of Denmark, 7:15, 2008.

71