
TU/e Technische Universiteit Eindhoven

Bachelor thesis Applied mathematics

Improving stochastic optical reconstruction microscopy using probability theory

Modeling a Poisson point process.

Author: R.A.J. [email protected]

Supervisors: prof.dr. R.W. van der Hofstad

G. Bet MSc

In cooperation with: dr. L. Albertazzi

D. van der Zwaag MSc

July 21, 2015

Abstract

In this thesis the stochastic optical reconstruction microscopy technique is described and analyzed. A Markov model is proposed to model the dynamics involved in the super-resolution fluorescence microscopy technique. The model is validated based on raw data and describes part of the dynamics. The dataset should still be improved by the specialist, after which this conclusion should be revised. The Markov model is implemented in Java to simulate datasets comparable to the images produced through the dSTORM technique. For these simulated datasets the exact number and positions of the fluorescent dyes are known. The developed algorithms are tested on these simulated sets. Subsequently the algorithms are used to analyse a first preliminary dSTORM dataset. The first conclusion of this thesis concerns the distribution of the dyes over the surface of the sphere. The general assumption in nanoparticle literature is that dyes attach randomly to beads. Based on a test with a type-I error of 5% and a power of 94.0% we reject the null hypothesis that this assumption is true. More research should be done on this complex problem. An estimator for the number of dyes that are not bleached at the equilibrium time is presented. This builds upon only a small part of the Markov model and does not rely on the homogeneity assumption. The proposed estimator (MSE = 46.0) performs better than the unbiased estimator that divides the number of localizations observed by the expected blinks per dye (MSE = 79.6). Furthermore an algorithm that solves the inverse problem of estimating the real positions of a given number of dyes based on observed localizations has been implemented. In the future this algorithm can be adjusted to perform better in this particular setting than the most commonly used k-means algorithm. Together the estimator and the algorithm can solve overcounting. Finally, a real dSTORM image is processed using the estimator and the algorithm presented.


Contents

1 Introduction

2 dSTORM
2.1 Sample preparation
2.2 Fixed bead and chemical reaction
2.3 Microscopy
2.4 Fitting procedure
2.5 Equilibrium
2.6 Overcounting

3 Mathematical model
3.1 Poisson point process
3.2 Markov processes
3.3 Distribution emitters
3.4 Behavior one emitter
3.5 ON-state
3.6 Distribution of the localizations

4 Estimating model parameters
4.1 Transition rates
4.1.1 Number of blinks
4.1.2 Time in active state
4.1.3 Time in dark state(s)
4.1.4 Computing transition rates
4.2 Number of dyes per bead

5 Simulation
5.1 Generate emitters
5.2 Generate localizations

6 Model validation
6.1 Qualitative validation
6.2 Testing the homogeneity assumption
6.2.1 Choice of test statistic
6.2.2 Chi-squared type test
6.2.3 Inhomogeneity
6.2.4 Power analysis

7 Estimating number of emitters using compound Poisson properties
7.1 Number of non-bleached dyes at equilibrium
7.1.1 Quality of the estimator
7.2 Number of dyes on bead

8 Determining exact locations
8.1 General statements
8.2 Normal mixtures
8.3 Extra constraints
8.4 Implementation algorithm
8.4.1 k-means
8.5 Quality of the algorithm
8.6 Taking the number of blinks per emitter into account
8.7 Different approach for estimating the number of emitters

9 Adapt dSTORM images

10 Conclusion, discussion and recommendations

A Goodness-of-fit plots
A.1 Number of blinks original datasets
A.2 Number of blinks outliers removed
A.3 On-time

B Real and simulated images

C Real and estimated positions emitters

D Calculations $p_{A,t}$


1 Introduction

Microscopes play an important role in understanding phenomena and structures that are not visible to the human eye. In fields as diverse as biology, chemistry and physics, microscopes are essential, and for this reason there is a lot of development in this field. Due to the wave characteristics of light there is a limit to the optical resolution of microscopes. This diffraction-limited resolution was defined by E. Abbe and L. Rayleigh and is therefore better known as Abbe's diffraction limit, which equals

\[ R_{x,y} = \frac{\lambda}{2 n \sin\theta}, \qquad R_z = \frac{2\lambda}{(2 n \sin\theta)^2}. \]

Here $\lambda$ is the wavelength of light, $n$ is the refractive index of the imaging medium and $\theta$ equals the aperture angle, see [1]. As a result, conventional light microscopes can only distinguish (cellular) structures and objects that are at least 200-350 nm apart. In fluorescence microscopy the objects of interest are labeled with fluorescent markers. These can emit photons which can be captured by the microscope. When very small objects are imaged a problem occurs with this technique. The fluorescence signals of the different markers diffract and as a result it is impossible to distinguish different sources from each other. Last year the Royal Swedish Academy of Sciences awarded E. Betzig, S.W. Hell and W.E. Moerner the Nobel Prize in Chemistry 2014 for the development of so-called super-resolution fluorescence microscopy (SRFM). The new nanoscopy allows scientists to produce nano-scale (20-50 nm) resolution images of dynamic structures.

The SRFM techniques can be divided roughly into two groups: near-field and far-field microscopy. In near-field microscopy the microscope is much closer to the object of interest (distance much smaller than the wavelength) than in far-field microscopy. This thesis focuses on the far-field techniques, which can further be divided into two groups. Firstly, stimulated emission depletion (STED) reduces the size of the excited fluorescent region, which results in less overlap of the signals.

Secondly, some techniques, which are only slightly different from each other, depend on different principles than STED. Examples of techniques in this group are photo activated localization microscopy (PALM) and stochastic optical reconstruction microscopy (STORM). The rough idea of these two techniques is to suppress most of the fluorescent dyes, the light emitters. Only a very small fraction of the dyes is active during imaging, reducing interference. This strongly reduces the probability of two dyes being active within the diffraction limit. Thus, the individually emitting molecules can be localized with high accuracy by fitting the Point Spread Function (PSF). At every time frame the fluorescence emission is monitored and at the end the computer can aggregate these different images to obtain a detailed image of the object, see [1] and [2].

The development of SRFM makes it possible to image the surface of a nanoparticle (NP). NPs are already extensively used for drug delivery, for example in treatments against cancer, see [3] and [4]. NPs used in therapies can reach their target (e.g. a tumor cell) and attach to it with their side groups, which are for this reason called functional side groups. With SRFM these side groups can be imaged, counted and evaluated. This approach is applied by the Institute for Complex Molecular Systems (ICMS) in Eindhoven, see [13].

Currently the research effort of ICMS is directed towards nano-scale beads that are imaged with direct Stochastic Optical Reconstruction Microscopy (dSTORM), which will be described in detail in the next section. A common implicit assumption in NP literature is random functionalization. This means that dyes are generally assumed to attach randomly to the beads. In this interdisciplinary project the main goal is to develop a simple mathematical model to understand the dynamics of the dyes and to simulate data comparable to real dSTORM images. For this the distribution of emitters, or dyes, over the surface of a bead will be investigated and some of the related problems (especially overcounting) will be tackled in order to improve super-resolution microscopy. Furthermore possibilities to extend the model and opportunities for future research will be presented. Since a thorough understanding of the imaging technique is necessary before starting to model the process, the dSTORM technique will be described first.

2 dSTORM

As described in the introduction, with the dSTORM technique only a very small fraction of the dyes is active during imaging, which reduces interference. In this section the dSTORM technique will be described in more detail. During one measurement an image is made of a solution containing the fluorescent beads. The dye used is the cyanine Alexa 647, which is recommended by Dempsey et al., see [8], as a top choice. For simplicity the situation with one bead in a solution is described.

2.1 Sample preparation

A solution containing the bead and both an excess of a certain thiol and chemically adapted dyes is used. Fluorescent dyes are covalently attached to the surface of the bead through chemically reactive groups. Due to the excess of dyes as many dyes as possible will attach until the beads are saturated. The dyes that do not attach to the bead do not pose a problem for imaging.

2.2 Fixed bead and chemical reaction

After the sample is prepared it is installed on a plate in the microscope. Due to the hydrophobic character of the bead it attaches to the surface of the plate. As a result the bead is immobilized. At the beginning of the experiment all dyes are on. During a single ON-state period, the dye absorbs a high-energy photon and emits a low-energy photon thousands of times. After a certain time the dye enters a non-fluorescent, or dark, state. This is the result of a chemical reaction which occurs due to the laser and the presence of the thiol. Using a laser beam of another wavelength (and spontaneous activations) the dye in the dark state can be activated again. The event that a photoswitchable dye goes from the OFF-state to the ON-state, and back again, is referred to as a blink.


2.3 Microscopy

After a certain time the reaction reaches an equilibrium state and the recording with a CCD camera starts, see [5]. A very small part of the fluorescent dyes gets irreversibly damaged after being active, an effect which is referred to as photobleaching. After bleaching, these dyes no longer blink. In Figure 1 the complete reaction is visualized.

Figure 1: Left: Photo bleaching, Right: Chemical (equilibrium) reaction during dSTORM.

The blinking dyes emit photons for as long as they are active. The digital camera captures images of the photons emitted by the individual molecules. Due to diffraction of the photon output it is not immediately clear what the location of the emitter is. In Figure 2, a maximum signal intensity at the center of the image and less signal further away from the center can be observed. This diffraction pattern is a realization of the point spread function (PSF) of the signal. It is important to note that the exact location of the emitter is unknown due to this diffraction. For this reason this real location is modelled as a random variable.


Figure 2: Zoom of original STORM movie

2.4 Fitting procedure

From now on an estimate of the location of a dye will be referred to as a localization. Since the exact point spread function is unknown, a Gaussian intensity function is used as an approximation, see [16]. In more detail,

\[ I(x, y) = N \cdot \frac{4 \log 2}{\pi w^2} \cdot \exp\left(-4 \log 2 \cdot \left(\frac{(x - \mu_x)^2}{w^2} + \frac{(y - \mu_y)^2}{w^2}\right)\right). \]

Here $w$ is the full-width-at-half-maximum (FWHM), which equals $|x_2 - x_1|$ where $I(x_1) = I(x_2) = \frac{1}{2} I(\mu)$ with $\mu$ the mean of the Gaussian. Furthermore $\mu_x$ and $\mu_y$ represent the x and y coordinates of the emitter and $N$ equals the total number of photons detected. By fitting the Gaussian a point estimate for the position of the emitter can be found. The positional accuracy is estimated as the Cramér-Rao bound of this point estimate, see [17].
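As a minimal sketch of the intensity function above, the following Python snippet evaluates the Gaussian and checks the FWHM property numerically. The function name and the numerical values (FWHM of 300 nm, 1000 photons) are illustrative assumptions, not values from the dataset.

```python
import math

def gaussian_psf(x, y, mu_x, mu_y, w, n_photons):
    """Symmetric 2D Gaussian intensity parameterised by its FWHM w."""
    c = 4.0 * math.log(2.0)
    r2 = (x - mu_x) ** 2 + (y - mu_y) ** 2
    return n_photons * c / (math.pi * w ** 2) * math.exp(-c * r2 / w ** 2)

# Hypothetical values: emitter at the origin, FWHM of 300 nm, 1000 photons.
peak = gaussian_psf(0.0, 0.0, 0.0, 0.0, 300.0, 1000)
half = gaussian_psf(150.0, 0.0, 0.0, 0.0, 300.0, 1000)

# At a distance w/2 from the centre the intensity drops to half of the peak.
ratio = half / peak
```

The ratio equals one half because the exponent at distance $w/2$ is $-\log 2$, which is exactly what the definition of the FWHM requires.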

The above approach was used for two-dimensional STORM imaging. In this thesis we focus on three-dimensional imaging. The z, or vertical, location of an emitter can be determined by different techniques, see [14]. We focus on the astigmatism method, which is known to be the most cost-effective technique, see [16]. $\sigma_x$ and $\sigma_y$ are defined as the FWHM of the intensity in the x and y direction. Astigmatism is the result of a weak cylindrical lens. The focal planes for the x and y directions become slightly different ($\sigma_x \neq \sigma_y$). As a result the captured image is different for emitters at different z positions, see [7]. This is shown in Figure 3. A dye that is active in the average focal plane, that is at $z = 0$, results in a round pattern, while an emitter with a higher z position results in a more ellipsoidal image (long axis along the x direction). The same effect on the image would occur for a lower z position (this time with the long axis along the y direction), see [18]. Huang et al. experimentally generate a calibration curve to determine the z position based on $\sigma_x$ and $\sigma_y$, see Figure 3.


Figure 3: A: Principle of astigmatism, B: Calibration curve of $\sigma_x^2$ and $\sigma_y^2$ against z position [7].

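The intensity distribution $I(x, y)$ should be adapted for use in a 3D setting. The intensity distribution for a point emitter including astigmatism is described by Holtzer, see [17]. This leads to

\[ I(x, y) = N \cdot \frac{4 \log 2}{\pi \sigma_r^2} \cdot \exp\left(-4 \log 2 \cdot \left(\frac{\varepsilon^2 (x - \mu_x)^2}{\sigma_r^2} + \frac{(y - \mu_y)^2}{\sigma_r^2 \cdot \varepsilon^2}\right)\right). \tag{1} \]

Here $\varepsilon = \sqrt{\sigma_y / \sigma_x}$ is called the ellipticity and $\sigma_r^2 = \sqrt{\sigma_x^2 \cdot \sigma_y^2}$ represents a generalized width, so that $\sigma_r^2 / \varepsilon^2 = \sigma_x^2$ and $\sigma_r^2 \varepsilon^2 = \sigma_y^2$.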

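By fitting this intensity equation estimates for $\mu_x$, $\mu_y$ can be obtained. To estimate $\mu_z$ Holtzer used a formula found by Schutz et al., see [19]. This formula is

\[ z = \pm \frac{z_r}{\sigma_0} \cdot \sqrt{\sigma^2 - \sigma_0^2}, \]

in which $z_r$ is the focal depth and $\sigma_0$ the diffraction-limited FWHM for a point source in focus, see [15]. Holtzer substitutes $z$ with $z + \gamma$ and $z - \gamma$, where $\gamma$ equals the amount of astigmatism, in the formula of Schutz, which results in:

\[ z = \begin{cases} \dfrac{z_r}{\sigma_0} \cdot \sqrt{\dfrac{\sigma_r^2}{\varepsilon^2} - \sigma_0^2} - \gamma & \text{if } \varepsilon < 1, \\[2ex] -\dfrac{z_r}{\sigma_0} \cdot \sqrt{\sigma_r^2 \cdot \varepsilon^2 - \sigma_0^2} + \gamma & \text{if } \varepsilon > 1. \end{cases} \tag{2} \]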
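The use of equation (2) can be illustrated with a small round-trip check in Python. The forward model `widths_from_z` is an assumption made for this sketch: it applies the formula of Schutz with $z$ shifted by $+\gamma$ in the x direction and by $-\gamma$ in the y direction, with $\gamma > 0$. All numerical values are hypothetical.

```python
import math

def widths_from_z(z, sigma_0, z_r, gamma):
    """Assumed forward model: Schutz's formula with z shifted by +gamma (x) and -gamma (y)."""
    sigma_x = sigma_0 * math.sqrt(1.0 + ((z + gamma) / z_r) ** 2)
    sigma_y = sigma_0 * math.sqrt(1.0 + ((z - gamma) / z_r) ** 2)
    return sigma_x, sigma_y

def z_from_widths(sigma_x, sigma_y, sigma_0, z_r, gamma):
    """Equation (2): recover z from the ellipticity and the generalized width."""
    eps2 = sigma_y / sigma_x        # ellipticity squared
    sigma_r2 = sigma_x * sigma_y    # sqrt(sigma_x^2 * sigma_y^2)
    if eps2 < 1.0:
        return z_r / sigma_0 * math.sqrt(sigma_r2 / eps2 - sigma_0 ** 2) - gamma
    if eps2 > 1.0:
        return -z_r / sigma_0 * math.sqrt(sigma_r2 * eps2 - sigma_0 ** 2) + gamma
    return 0.0  # equal widths: emitter in the average focal plane

# Hypothetical microscope parameters (nm).
SIGMA_0, Z_R, GAMMA = 300.0, 400.0, 250.0
recovered = [
    z_from_widths(*widths_from_z(z, SIGMA_0, Z_R, GAMMA), SIGMA_0, Z_R, GAMMA)
    for z in (-300.0, -50.0, 0.0, 120.0, 400.0)
]
```

Under this forward model a positive z gives $\sigma_x > \sigma_y$, hence $\varepsilon < 1$, and the first branch of equation (2) returns the original z. A negative z selects the second branch.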

The derivation of the Cramér-Rao bounds of the estimators for $\mu_x$, $\mu_y$ and $z$ can be found in the thesis of Holtzer [17]. The values of the Cramér-Rao bounds are presented as

\[ s_{\mu_x}^2 = \frac{1}{N} \frac{\sigma_r^2}{\varepsilon^2 \, 8 \log 2}, \qquad s_{\mu_y}^2 = \frac{1}{N} \frac{\sigma_r^2 \varepsilon^2}{8 \log 2}, \qquad s_z^2 = \frac{1}{N} \left( \frac{\sqrt{5} \, z_r^2}{4 (z \pm \gamma)} + \frac{\sqrt{5}}{4} (z \pm \gamma) \right)^2, \quad \text{if } \varepsilon \lessgtr 1. \tag{3} \]

When estimating the parameter of interest we make use of the fitted Gaussian. This Normal distribution is a distribution from the exponential family and for this reason the Cramér-Rao bounds are not only the lower bounds for the variances, but they equal the variances, [33, Theorem 2.3].

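Holtzer validated the theoretical values by simulations. As a result of camera readout noise, photon-counting statistics and pixelation he found a scaling factor between simulations and theory, see [15]. For this reason the expressions for the variances in the different directions derived by Holtzer are not realistic for our model. In his thesis, Wang included pixelation and background noise and found more reliable expressions for the variances of the estimator of the location [18]. The details regarding the derivation of the different variances are not presented here, and we limit ourselves to noting that some approximations are used. The uncertainty values of interest are expressed in (4)-(7), where the first and the last equations represent the variances of interest:

\[ \operatorname{Var}(\mu_x) \approx \frac{\sigma_x^2 + \frac{a^2}{12}}{N} + \frac{8 \pi \sigma_x^3 \sigma_y b^2}{a^2 N^2}. \tag{4} \]

\[ \operatorname{Var}(\mu_y) \approx \frac{\sigma_y^2 + \frac{a^2}{12}}{N} + \frac{8 \pi \sigma_y^3 \sigma_x b^2}{a^2 N^2}. \]

\[ \operatorname{Var}(\sigma_x) \approx \frac{\sigma_x^2 + \frac{a^2}{12}}{2N} + \frac{16 \pi \sigma_x^3 \sigma_y b^2}{3 a^2 N^2}. \tag{5} \]

\[ \operatorname{Var}(\sigma_y) \approx \frac{\sigma_y^2 + \frac{a^2}{12}}{2N} + \frac{16 \pi \sigma_y^3 \sigma_x b^2}{3 a^2 N^2}. \]

\[ \sigma_x(z) = \sigma_0 \sqrt{1 + \left(\frac{z + c}{d}\right)^2}. \tag{6} \]

\[ \sigma_y(z) = \sigma_0 \sqrt{1 + \left(\frac{z - c}{d}\right)^2}. \]

\[ \operatorname{Var}(\mu_z) \approx \frac{d^2 \sigma_x^2 \operatorname{Var}(\sigma_x)}{\left(1 + \frac{\sigma_y^2 - \sigma_0^2}{\sigma_x^2 - \sigma_0^2} \left(\frac{\sigma_y}{\sigma_x}\right)^{2k-4}\right)^2 (\sigma_x^2 - \sigma_0^2) \, \sigma_0^2} + \frac{d^2 \sigma_y^2 \operatorname{Var}(\sigma_y)}{\left(1 + \frac{\sigma_x^2 - \sigma_0^2}{\sigma_y^2 - \sigma_0^2} \left(\frac{\sigma_x}{\sigma_y}\right)^{2k-4}\right)^2 (\sigma_y^2 - \sigma_0^2) \, \sigma_0^2}. \tag{7} \]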

Here $\sigma_0$ is the standard deviation of the signal peak without astigmatism, $c$ is the offset of the x or y focal plane from the average focal plane ($z = 0$), $d$ equals the focal depth of the microscope, $a$ equals the pixel size and $b^2$ is the number of background photons per pixel, see [18]. As a result of the settings of the microscope $k = 2$ is used, see [18]. All the other variables used in (4) and (7) can be obtained from the observed signals or from certain characteristics of the microscope and the imaging software. In the rare occasion when two (or more) dyes are active in the same time frame, the Gaussian cannot be fitted and the signals are discarded. We will refer to this shortcoming as double blinking.
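To give a feeling for how the variance in equation (4) depends on the number of photons, the following Python sketch evaluates it for two photon counts. The function name and all numerical values are hypothetical and are not taken from the dataset or from [18].

```python
import math

def var_mu_x(sigma_x, sigma_y, n_photons, a, b2):
    """Equation (4): approximate variance of the x-localization.

    sigma_x, sigma_y: widths of the signal, a: pixel size,
    b2: background photons per pixel (the symbol b^2 in the text).
    """
    shot_and_pixel = (sigma_x ** 2 + a ** 2 / 12.0) / n_photons
    background = 8.0 * math.pi * sigma_x ** 3 * sigma_y * b2 / (a ** 2 * n_photons ** 2)
    return shot_and_pixel + background

# Hypothetical values (nm and photon counts).
v_few = var_mu_x(300.0, 320.0, n_photons=500, a=100.0, b2=2.0)
v_many = var_mu_x(300.0, 320.0, n_photons=5000, a=100.0, b2=2.0)
v_no_background = var_mu_x(300.0, 320.0, n_photons=1000, a=100.0, b2=0.0)
```

The first term decreases as $1/N$ and the background term as $1/N^2$, so a dye that stays active longer and emits more photons is localized more precisely.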


2.5 Equilibrium

The duty cycle is defined as the ratio between the time a dye is in the blinking (active) state and the time the dye is in the dark (inactive) state, see [8]. From the moment the equilibrium is reached the duty cycle stays almost constant and data is collected. If one dye is imaged during $N$ time frames and signal is measured during $k$ time frames, $k, N \in \mathbb{N}$, the duty cycle would approximately equal $\frac{k}{N}$.

As a result of the excess of thiol present in solution (here beta-mercaptoethylamine was used) and the excessive amount of light, the dyes do not influence other dyes in their close proximity. All dyes behave independently of each other. That is, at every moment a single dye on a bead has the same probability to be activated as another dye that lies on the surface of a bead with n dyes. Moreover, the probability that a dye is activated does not depend on the number of times this dye has been blinking during the imaging.

2.6 Overcounting

The uncertainty in the x, y direction is roughly 20-30 nm, see [5], while we would like to know the exact position of the emitters on the bead surface. In the final image there are multiple localizations whose uncertainty areas overlap. In Figure 4 each blue dot represents a localization and the uncertainty areas are of the size of the red bin. The uncertainty area can be compared with a 95% confidence interval for the exact location of the emitter.

Figure 4: STORM localizations output for one bead where the red bin gives an indication of the average uncertainty area.

In the case that the uncertainty regions of two localizations overlap there are three possibilities: the two localizations come from two different emitters, the two localizations come from one emitter that has been activated and measured twice, or the two localizations come from one emitter that has been measured twice during its active period. Of course these possibilities for two localizations can be generalized to n localizations whose uncertainty regions overlap.

The imaging software automatically mitigates the third effect. Indeed, if a certain dye is active during both measurements in time frames k and k + 1, the localization of that emitter will be based on both signals, so that the number of emitters is not overestimated. This situation is sketched in Figure 5.

Figure 5: Sketch of signal during two time frames.

Unfortunately, the remaining two effects are more challenging and the imaging software does not deal with them. Thus, one emitter can result in more than one localization, see Figure 6. As a result it is unclear whether a cloud of localizations comes from the same emitter. This is called overcounting and is shown in Figure 7.

Figure 6: Localizations of one emitter.

If the uncertainty regions of two localizations do not overlap it can be assumed that they do not come from the same emitter. To solve overcounting, both the number of dyes on a bead and the real positions of these dyes should be determined.


Figure 7: Problem: 2 or more emitters?

3 Mathematical model

In the last section the real dynamics were described. To deal with the problem of overcounting and to check the homogeneity of the distribution of the emitters a mathematical model should be developed. In this section a model based on a Poisson point process and a Markov process will be presented. Before the model is explained some mathematical definitions and properties will be presented.

3.1 Poisson point process

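The nanoparticle, the sphere of interest, that contains the dyes is referred to as $S$. Let $A$ be a subset of $\mathbb{R}^3$, such that $S \subseteq A$ and $L_i \in A$ for all $i$. $N = (N(A))$, $A \subseteq \mathbb{R}^3$, is a Poisson Point Process (PPP) if:

1. $\mathbb{P}(N(A) = 0) = e^{-\int_A \Lambda(x)\,dx}$, where $\int_A \Lambda(x)\,dx = \Lambda(A) = \mathbb{E}[N(A)]$, with intensity $\Lambda$.

2. $B, C \subseteq A$ and $B \cap C = \emptyset \Rightarrow N(B) \perp N(C) \Leftrightarrow \mathbb{P}(N(B) = 0, N(C) = 0) = \mathbb{P}(N(B) = 0)\,\mathbb{P}(N(C) = 0)$.

As a result the probability distribution of a PPP equals

\[ \mathbb{P}(N(A) = k) = \frac{\Lambda(A)^k \exp(-\Lambda(A))}{k!}, \]

see [36].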

There are two concepts involved in this process. First, the number of points in $A$, $N(A)$, is a random variable that follows a Poisson distribution with parameter $\Lambda(A)$. Second, the points are distributed over $A$ proportionally to the measure of the subspaces of $A$. If the intensity in the set $A$ is equal for all its subsets, the PPP on $A$ is referred to as a homogeneous PPP. For this reason the points that are the realization of the homogeneous PPP are randomly distributed over $A$. In other words, conditioned on the number of points, the locations of the points are uniformly distributed over $A$. PPPs possess important properties. For example, one can prove that if all the points of a PPP are displaced by a displacement $D$ from a known distribution, then the resulting process is again a PPP whose intensity equals the convolution of the intensity of the original PPP with the probability distribution of the displacement, see [37, Example 3.3]. When $N(A)$ is a PPP another important property can be derived. Let $N(B) \sim \text{Bin}(N(A), p)$; then $N(B)$ is again a PPP with intensity $p\Lambda(A)$, see [38, Example 5.1.27].
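The two ingredients of a homogeneous PPP (a Poisson number of points, placed uniformly given that number) can be sketched in Python as follows. This is only an illustration under assumptions: the function names, the total intensity of 50 and the radius of 100 nm are hypothetical, and it is not the Java implementation used in this thesis.

```python
import math
import random

def sample_poisson(mean, rng):
    """Poisson sample: count unit-rate exponential arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def sample_homogeneous_ppp_on_sphere(total_intensity, radius, rng):
    """Draw N ~ Poisson(Lambda(S)) points and place them uniformly on the sphere surface."""
    points = []
    for _ in range(sample_poisson(total_intensity, rng)):
        # A normalised standard normal vector is uniformly distributed on the unit sphere.
        g = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in g))
        points.append(tuple(radius * c / norm for c in g))
    return points

rng = random.Random(2015)
emitters = sample_homogeneous_ppp_on_sphere(total_intensity=50.0, radius=100.0, rng=rng)
```

Normalising a Gaussian vector is chosen here because it avoids the clustering at the poles that sampling the two spherical angles uniformly would produce.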

3.2 Markov processes

A Markov process is also referred to as a continuous-time Markov chain. The chain moves on a finite or countable state space, and the time spent in each state is exponentially distributed, see [39, Chapter 6]. The Markov property states that the future behavior of the model only depends on the current state of the model. The rate matrix $Q$ is defined as $Q_{i,j} = \lambda_{i,j}$ if $i \neq j$ and $Q_{i,i} = -\sum_{j \neq i} \lambda_{i,j}$, with transition rates $\lambda_{i,j}$. $P(t)$ is the matrix whose entry in row $i$ and column $j$ gives the probability to be in state $j$ at time $t$ if the process starts in state $i$. The Kolmogorov forward equation can be written as $P'(t) = P(t)Q$, while the backward equation results in $P'(t) = QP(t)$, see [38, Paragraph 6.9]. If $Q$ is an $n \times n$ matrix with $n$ eigenvectors it can be written as $Q = UDU^{-1}$, where $U$ is the matrix of eigenvectors of $Q$ and $D$ the diagonal matrix of eigenvalues. The matrix differential equations are now both solved by $P(t) = Ue^{tD}U^{-1}$, see Appendix D. Note that $e^{tD}$ equals a diagonal matrix with entries $e^{t\mu_i}$, where $\mu_i$ is the $i$th eigenvalue of $Q$, $i \in \{1, \ldots, n\}$.
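The matrix $P(t)$ can also be approximated numerically. The sketch below does not use the eigendecomposition described above. Instead it evaluates $e^{tQ}$ with a scaled Taylor series followed by repeated squaring, which needs no linear algebra library. The three states are ordered as active, dark, bleached, and the rates are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def transition_matrix(q, t, squarings=10, terms=20):
    """Approximate P(t) = exp(tQ): Taylor series of exp(tQ / 2^s), then s squarings."""
    n = len(q)
    scaled = [[q[i][j] * t / 2 ** squarings for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term now holds (tQ / 2^s)^k / k!
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

# Hypothetical rates; rows and columns ordered as active, dark, bleached.
rate_activate, rate_bleach, rate_dark = 0.5, 0.2, 2.0
Q = [[-(rate_bleach + rate_dark), rate_dark, rate_bleach],
     [rate_activate, -rate_activate, 0.0],
     [0.0, 0.0, 0.0]]
P_short = transition_matrix(Q, t=1.5)
P_long = transition_matrix(Q, t=3.0)
```

Two sanity checks follow from the theory: every row of $P(t)$ sums to one, and $P(3) = P(1.5)P(1.5)$ by the Markov property.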

3.3 Distribution emitters

We assume that the dyes on the sphere can be modeled as an inhomogeneous PPP. We define the exact position of emitter $i$ as $E_i$ and the total number of emitters as $X_{\text{start}}$. An important practical question is whether the PPP is homogeneous. Since a common implicit assumption in NP literature is this homogeneity we assume this as well for now. Nevertheless this assumption will be tested later on.

3.4 Behavior one emitter

In 2012 Lee et al. published a promising approach to solve the overcounting problem in PALM imaging, see [11]. This method is based upon the finding that the fluorescent markers used in PALM imaging photobleach quickly after their initial activation. In contrast, the dyes used in dSTORM show a much slower photobleaching. Nevertheless, the way the authors modeled the behavior of one emitter can be used in our case as well. Every dye on the sphere follows the process depicted in Figure 1, mathematically defined as the Markov process shown in Figure 8.


Figure 8: Markov model for the behavior of one emitter, $\lambda_A$, $\lambda_B$, $\lambda_D$ denote the transition rates.

The process starts with all the dyes emitting photons in the ON-state (A). From this A state a dye can go to the dark (D) state or the dye can be irreversibly photobleached (B). Since almost all dyes undergo photobleaching during imaging, the number of times an emitter is in the ON-state before it is photobleached is approximately geometrically distributed:

\[ N_{\text{dark}} \sim \text{Geo}\left(\frac{\lambda_B}{\lambda_B + \lambda_D}\right). \tag{8} \]

Note $N_{\text{dark}} \in \mathbb{N}$. As a result the number of blinks per dye is geometrically distributed as well,

\[ Bl \sim \text{Geo}\left(\frac{\lambda_B}{\lambda_B + \lambda_D}\right). \tag{9} \]

Note $Bl \in \mathbb{N} \setminus \{0\}$. We define $T_{ij}$ as the transition time between state $i$ and state $j$, $i, j \in \{A, B, D\}$ and $i \neq j$. In addition, $T_{\text{on}}$ is the time that an emitter spends in the ON-state.

\[ T_{AB} \sim \exp(\lambda_B). \tag{10} \]

\[ T_{AD} \sim \exp(\lambda_D). \tag{11} \]

\[ T_{DA} \sim \exp(\lambda_A). \tag{12} \]

\[ T_{\text{on}} \sim \exp(\lambda_B + \lambda_D). \tag{13} \]

Here we write $X \sim \exp(\lambda)$ when $X$ has an exponential distribution with parameter $\lambda$.
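The behavior of one emitter described by (9)-(13) can be simulated directly. The following Python sketch is an illustration with hypothetical rates and is not the Java implementation used in this thesis. It uses the fact that the time in the ON-state is exponential with rate $\lambda_B + \lambda_D$ and that the emitter bleaches with probability $\lambda_B / (\lambda_B + \lambda_D)$ when it leaves that state.

```python
import random

def simulate_emitter(lam_a, lam_b, lam_d, rng):
    """Simulate one dye until it bleaches; return (blinks, ON-times, dark times)."""
    on_times, dark_times = [], []
    while True:
        on_times.append(rng.expovariate(lam_b + lam_d))  # T_on ~ exp(lam_B + lam_D)
        if rng.random() < lam_b / (lam_b + lam_d):       # transition A -> B ends the process
            return len(on_times), on_times, dark_times
        dark_times.append(rng.expovariate(lam_a))        # T_DA ~ exp(lam_A)

rng = random.Random(1)
lam_a, lam_b, lam_d = 0.5, 0.2, 1.0  # hypothetical rates
runs = [simulate_emitter(lam_a, lam_b, lam_d, rng) for _ in range(20000)]
mean_blinks = sum(r[0] for r in runs) / len(runs)
expected_blinks = (lam_b + lam_d) / lam_b  # mean of the geometric distribution in (9)
```

With these rates the average number of blinks over many simulated dyes should be close to the mean of the geometric distribution in (9).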

3.5 ON-state

After entering the photobleached state (B) the dye will never leave this state again. If the dye enters the dark state (D) it will return to the active state (A) after an exponential time. In the ON-state (A) an emitter is activated and starts emitting photons. The number of photons emitted increases linearly with the time the dye is active, $N_{\text{photons}} = \nu T_{\text{on}}$, where $\nu$ represents the number of photons per second. That is, the longer the dye blinks the more photons are emitted and captured. As discussed in Section 2.4 the number of photons determines the precision of the localization of the dye. We assume that the position of the $i$th localization, $L_i = (x, y, z)$, follows a multivariate normal distribution whose expectation is the exact position of the emitter $E_i$ and whose covariance matrix has zeros everywhere except on the diagonal, which consists of the theoretical lower limits discussed in Section 2.4: $L_i \sim N(E_i, I \cdot \sigma)$, where $\sigma = (\operatorname{Var}(\mu_x), \operatorname{Var}(\mu_y), \operatorname{Var}(\mu_z))$ and $I$ is the $3 \times 3$ identity matrix. To determine these lower limits the number of photons observed should be known. We assume that all photons released by the emitter are observed. The number of released photons is proportional to the time an emitter is active, so $N_{\text{photons}} \sim \exp\left(\frac{\lambda_D + \lambda_B}{\nu}\right)$.

3.6 Distribution of the localizations

When a dye enters the active state it starts emitting photons. The observed photons are used to estimate the location of the dye. A dye can be active more than once and for this reason there could be more than one localization for a dye. The distribution, and especially the expectation, of the number of localizations in a subspace will be very important when testing the homogeneity assumption. We define $Y_i(A)$ as the number of localizations of emitter $i$ in subspace $A$ around the sphere $S$, and compute that

\[ \mathbb{P}(Y_i(A) = l) = \sum_{k=l}^{\infty} \mathbb{P}(B_i = k) \cdot \binom{k}{l} \mathbb{P}(L_i \in A)^l \cdot \mathbb{P}(L_i \notin A)^{k-l}. \tag{14} \]
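Equation (14) can be evaluated numerically by truncating the infinite sum. The Python sketch below assumes that the number of blinks $B_i$ follows the geometric distribution of (9) on $\{1, 2, \ldots\}$. The bleach probability 0.2, the probability 0.3 of a localization falling in $A$ and the truncation point are hypothetical choices.

```python
import math

def prob_localizations(l, p_bleach, p_in_a, k_max=400):
    """Equation (14) with B_i ~ Geo(p_bleach) on {1, 2, ...}, truncated at k_max blinks."""
    total = 0.0
    for k in range(max(l, 1), k_max + 1):
        p_blinks = (1.0 - p_bleach) ** (k - 1) * p_bleach
        total += p_blinks * math.comb(k, l) * p_in_a ** l * (1.0 - p_in_a) ** (k - l)
    return total

# Hypothetical parameters.
P_BLEACH, P_IN_A = 0.2, 0.3
pmf = [prob_localizations(l, P_BLEACH, P_IN_A) for l in range(401)]
total_mass = sum(pmf)
mean_localizations = sum(l * p for l, p in enumerate(pmf))
```

The probabilities sum to one up to the truncation error, and the mean equals $\mathbb{P}(L_i \in A)$ times the expected number of blinks, here $0.3 \cdot 5 = 1.5$.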

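Next, the probability of $\sum_{i=1}^{N} Y_i(A) = z$ should be evaluated, where $N$ is again the number of emitters. We define this sum as $N_L(A)$, i.e. $N_L(A) = \sum_{i=1}^{N} Y_i(A)$. We compute

\[ \mathbb{P}(N_L(A) = z) = \sum_{N=0}^{\infty} \mathbb{P}(X(A) = N) \sum_{x_1=1}^{z} \mathbb{P}(Y_1(A) = x_1)\, \mathbb{P}\left(\sum_{i=2}^{N} Y_i(A) = z - x_1\right) \tag{15} \]

\[ = \sum_{N=0}^{\infty} \mathbb{P}(X(A) = N) \sum_{x_1=1}^{z} \sum_{x_2=1}^{x_1} \cdots \sum_{x_{N-1}=1}^{x_{N-2}} \prod_{j=1}^{N-1} \mathbb{P}(Y_j(A) = x_j) \cdot \mathbb{P}\left(Y_N(A) = z - \sum_{j=1}^{N-1} x_j\right). \]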

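Here $X(A)$ represents the total number of dyes on $A$, where $A$ is a part of the surface of the bead. This expression is analytically intractable. To determine the expected number of localizations in the subspace $A$ around the bead of interest we use Wald's identity:

\[ \mathbb{E}[N_L(A)] = \mathbb{E}[X(A)]\,\mathbb{E}[Bl] = \frac{\text{area}(A \cap S)}{\text{area}(S)}\, \Lambda(S)\, \frac{\lambda_B + \lambda_D}{\lambda_B}, \tag{16} \]

where the last factor is the mean of the geometric distribution in (9).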

The expected position of a localization equals the position of the emitter that is the source of this localization. As a result the expected total number of blinks of all the emitters in a certain subspace equals the expected number of localizations. Moreover the number of blinks is independent of the location of an emitter.
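Wald's identity can be checked with a small Monte Carlo experiment. The Python sketch below draws a Poisson number of dyes and a geometric number of blinks per dye, and compares the average total number of blinks with the product of the two expectations. The expected number of dyes (12) and the bleach probability (0.25) are hypothetical.

```python
import random

def sample_poisson(mean, rng):
    """Poisson sample: count unit-rate exponential arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def sample_blinks(p_bleach, rng):
    """Geometric number of blinks on {1, 2, ...}: blink until the dye bleaches."""
    blinks = 1
    while rng.random() >= p_bleach:
        blinks += 1
    return blinks

rng = random.Random(7)
mean_dyes, p_bleach = 12.0, 0.25  # hypothetical E[X(A)] and bleach probability
totals = [
    sum(sample_blinks(p_bleach, rng) for _ in range(sample_poisson(mean_dyes, rng)))
    for _ in range(20000)
]
monte_carlo_mean = sum(totals) / len(totals)
wald_mean = mean_dyes * (1.0 / p_bleach)  # E[X(A)] * E[Bl]
```

The identity applies here because the number of blinks of each dye is independent of the number of dyes.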

4 Estimating model parameters

Before the mathematical model as described in the previous section can be implemented and used to simulate data sets, the transition rates of the Markov chain as well as the Poisson parameter of the number of dyes on a bead should be estimated.

4.1 Transition rates

At the moment of writing only a small dataset is available. This dataset will be evaluated, improved and extended in the upcoming months. All the steps that are performed in this thesis can be repeated in exactly the same way when a more representative dataset is available.

To estimate the transition rates an analysis similar to that of Lee et al., see [11], is used. First of all beads that contain only one dye are synthesized. An excess of beads is added to a solution with a small number of dyes. As a result approximately 200 beads with one or more dyes can be produced. Roughly 1% of these beads are expected to contain more than one dye. If the different dyes are localized within a range that is smaller than the super-resolution, this bead cannot be distinguished from a bead that contains one dye. Only 5% of these beads are indistinguishable, so only 0.05% of the beads that are diagnosed as 'one dye' beads contain more than one dye. The beads with one dye have the property that the number of localizations on that bead equals the number of blinks per dye. For these localizations the time an emitter was in the active state is recorded. In addition, the time between two successive localizations is a realization of the time a dye is in the dark state. In our model the number of blinks is geometrically distributed with parameter $\frac{\lambda_B}{\lambda_D + \lambda_B}$, while the time an emitter is on is exponentially distributed with parameter $\lambda_D + \lambda_B$. If these parameters are known the transition rates $\lambda_D$ and $\lambda_B$ can easily be computed. Furthermore $\lambda_A$ can be estimated by fitting an exponential distribution to the off-time data.


4.1.1 Number of blinks

The dataset with the number of blinks per dye consists of 315 numeric values in the range 0 to 89. Unfortunately the observations that are smaller than 4 cannot be distinguished from noise. As a result this analysis is based on the remaining 212 observations. Let $Bl \sim \text{Geo}(p)$, then the conditional probability mass function can be determined as:

\[ \mathbb{P}(Bl = k \mid Bl > 3) = \frac{(1 - p)^{k-1} p}{\sum_{i=4}^{\infty} (1 - p)^{i-1} p} = (1 - p)^{k-4} p, \quad k \geq 4. \tag{17} \]

Lee et al. fitted the probability function to the cumulative counts of the data based on the least squares principle. We prefer an estimator that maximizes the likelihood function for the observed values. The log-likelihood becomes:

$$\ln(1-p)\left(\sum_{i=1}^{n} x_i - 4n\right) + n\ln(p). \tag{18}$$

Remark that this is logical, since conditional on $Bl > 3$, $Bl - 3$ is geometrically distributed. The root of the derivative of (18) with respect to $p$ equals the maximum likelihood estimator (MLE) for the probability to bleach after an emitter was active. Our final result is

$$p_{bleach} = \frac{n}{\sum_{i=1}^{n} x_i - 3n} = \frac{212}{1296} \approx 0.1636. \tag{19}$$
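The closed form in (19) can be checked numerically. The following is a minimal Python sketch and not the code used for this thesis; the function name `truncated_geometric_mle` and the example counts are assumptions made for illustration. It assumes, as above, that only observations of at least 4 blinks are kept.

```python
def truncated_geometric_mle(blinks, minimum=4):
    """MLE of p for Bl ~ Geo(p) when only Bl >= minimum is observed.

    Conditional on Bl >= minimum, Bl - (minimum - 1) is again Geo(p),
    so the estimator equals n / (sum(x_i) - (minimum - 1) * n).
    """
    kept = [x for x in blinks if x >= minimum]
    n = len(kept)
    if n == 0:
        raise ValueError("no observations at or above the threshold")
    return n / (sum(kept) - (minimum - 1) * n)


# Hypothetical counts; values below 4 are treated as noise and dropped.
example = [1, 3, 4, 4, 6, 9, 15, 2, 5, 7]
p_hat = truncated_geometric_mle(example)
```

With $n = 212$ retained observations whose shifted sum equals 1296, the same function returns the ratio $212/1296$ of (19).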

The probability distribution function (PDF) can be plotted in one picture together with the frequency of the observations. To investigate the goodness-of-fit (GoF) a geometric probability plot is used. Here the theoretical quantile values (inverse of the cumulative distribution function) are plotted against the sample quantile values. In Figure 9 it looks like the PDF fits the observations well. From the geometric probability plot it becomes clear that the observations with 23 or more blinks do not fit the geometric distribution based on the MLE. This is even more apparent when the observed and expected numbers of dyes with a number of blinks in a specific range are compared, see Table 1. In Appendix A one can find zoomed plots and additional information.

Number of blinks   4    [5,6]  [7,9]  [10,15]  [16,22]  >22
Expected           35   53     51     48       18       7
Observed           51   64     44     30       14       9

Table 1: Expected and observed number of dyes with number of blinks in specific interval

A Pearson Chi-square GoF test can be performed; this test statistic will be explained in more detail later in this thesis. The data given in Table 1 result in a test statistic of 20.48, which should be tested against a Chi-square distribution with 5 degrees of freedom. This implies that the null hypothesis, that the number of blinks data come from this fitted geometric distribution, would be rejected (p = 0.001). If a comparable result would be found for the improved dataset the geometric distribution is not appropriate to fit the number of blinks per dye. Furthermore the Pearson Chi-square test is not very conservative. If the future improved dataset seems to fit this distribution better and $H_0$ could not be rejected based on the Chi-square statistic, it is important to perform a more distribution-specific GoF test, for possible tests see [35].

Figure 9: PDF and geometric probability plot of the MLE geometric distribution and frequency counts (the blue and red lines present the fitted distribution). Axes: number of blinks versus probability; theoretical values versus empirical quantiles.

Table 1 shows that the problem particularly occurs in the tail of the observations. In this thesis no attention is paid to possible reactivation of bleached dyes. Reactivated dyes are expected to blink twice as long as dyes that are not reactivated. For this reason these reactivated dyes would especially influence the number of dyes with a high number of blinks. Further research on the possibility of reactivation is necessary. It makes sense to compare these problems with the analysis of Lee et al. They conclude that the geometric distribution fits well, because an $R^2$ of 0.98 is observed. The $R^2$ is better known as the coefficient of determination and is especially used in GoF of regression models. Lee et al. fit the probability mass function to the data based on the least squares principle, for which $R^2$ is a measure of success as well. The $R^2$ can be computed as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - f_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2},$$

here $n$ is the number of observations, $y_i$ represents the $i$th observation, $\bar{y}$ is the mean of the observations and $f_i$ is the fitted value of $y_i$. In our setting this measure of fit is very inappropriate. To illustrate this to the reader the $R^2$ is computed for our model and this equals 0.88, which does not indicate a misfit. The geometric distribution can at least qualitatively be used to model the number of blinks. For our purpose of generating datasets which can be compared with datasets produced during dSTORM measurements this is enough. At this moment it is not possible to conclude whether the distribution does not fit, since the data is not yet in the right form.

4.1.2 Time in active state

The dataset consists of 1109 observations of the time different emitters were emitting photons. Since a shot with the CCD camera takes 22 ms this data is limited as well. Let $T_{on,i}$ be the on-time of the $i$th observation; if $T_{on,i} = 0.022$ is recorded this implies $0.007 \leq T_{on,i} \leq 0.022$. The lower bound is a result of the photon threshold which will be specified later. To create a dataset that is less discrete the data is smoothed. Since the difference in on-time within one time frame is very small it seems reasonable to treat the times in one time frame as if the exact on-times are distributed uniformly over the interval (even though in reality an exponential decay seems more appropriate, see [11]). If there are $k$ observations with $T_{on,i} = 0.044$, then these measurements are replaced by $\frac{k}{22}$ observations with $T_{on} = 0.022 + 0.001i$ for each $i \in \{1, 2, \ldots, 22\}$. Next, the parameter of the exponential distribution is determined using the MLE for an exponential distribution, $\mu_{on} = \frac{n}{\sum_{i=1}^{n} T_{on,i}}$, where $n$ is the total number of observations. The smoothed dataset results in $\mu_{on} = \frac{1121}{29.545} \approx 37.445$. In Figure 10 one can observe the histogram of the on-time data and the probability density function (green). Furthermore an exponential probability plot can be found and it becomes clear that this exponential does not fit the probability for longer on-times. In Table 2 one can observe that the number of on-times smaller than 1 time frame is underestimated. As a result the Chi-square test statistic equals 67.17 in this case. Compared to a Chi-square distribution with 5 degrees of freedom the null hypothesis that this exponential distribution fits the data well is rejected ($p = 4.0 \cdot 10^{-12}$). To compare our situation with that of Lee et al. an $R^2$ is computed and equals 0.79; again this could be misleading. In case the data would seem to fit better it is highly recommended to perform a more distribution-specific GoF test. Furthermore a discretized distribution should be used or the photon distribution should be fitted instead of the on-time distribution.

A possible explanation for this lack of fit could be the possibility of reactivation. The reactivated dyes will more often emit for a short time than for a long time. As a result the difference between the number of ‘long’ activations and ‘short’ activations will grow and cannot be fitted by the exponential distribution.
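The smoothing step and the exponential MLE can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the analysis code of this thesis: the names `smooth_on_times` and `exponential_rate_mle` and the example data are hypothetical, each recorded on-time is assumed to be a multiple of the 22 ms frame length, and each frame is split into 22 steps of 1 ms. For simplicity the lower bound of 0.007 s in the first frame is ignored.

```python
FRAME = 0.022  # camera frame length in seconds
STEPS = 22     # sub-intervals per frame, i.e. steps of 1 ms


def smooth_on_times(recorded):
    """Spread each frame-quantized on-time uniformly over its frame.

    Returns (time, weight) pairs; each recorded value contributes
    STEPS pairs with weight 1/STEPS, so the total weight is preserved.
    """
    smoothed = []
    for t in recorded:
        frame_start = t - FRAME
        for i in range(1, STEPS + 1):
            smoothed.append((frame_start + i * FRAME / STEPS, 1.0 / STEPS))
    return smoothed


def exponential_rate_mle(weighted_times):
    """MLE of the rate: total weight divided by the weighted sum of times."""
    total_weight = sum(w for _, w in weighted_times)
    total_time = sum(t * w for t, w in weighted_times)
    return total_weight / total_time


# Hypothetical recorded on-times (multiples of the frame length).
recorded = [0.022, 0.022, 0.044, 0.066]
rate = exponential_rate_mle(smooth_on_times(recorded))
```

Using fractional weights instead of duplicating observations keeps the total number of observations equal to the number of recorded values, which is one possible reading of the replacement by $\frac{k}{22}$ observations.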

Number of time frames   (0,1]  (1,2]  (2,3]  (3,4]  (4,5]  >5
Expected                679    298    131    57     25     20
Observed                819    220    88     44     22     17

Table 2: Expected and observed number of dyes with on-time in specific interval
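The Pearson Chi-square statistic for the on-time data can be recomputed directly from the counts in Table 2. The snippet below is a small Python sketch; the function name `pearson_chi_square` is an assumption for illustration, and the p-value is not computed here.

```python
def pearson_chi_square(observed, expected):
    """Sum over all intervals of (observed - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Counts per interval of time frames, taken from Table 2.
expected = [679, 298, 131, 57, 25, 20]
observed = [819, 220, 88, 44, 22, 17]
q = pearson_chi_square(observed, expected)
```

For these counts the value of `q` agrees with the test statistic of 67.17 reported above; the largest contributions come from the first two intervals.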

Figure 10: PDF and exponential probability plot of the MLE exponential distribution and frequency counts. Axes: time on versus probability density; theoretical values versus empirical quantiles.

4.1.3 Time in dark state(s)

The last dataset contains 1025 measurements of the time a dye is in the dark state. Again the data is discrete in number of time frames, but this seems not to be a problem since the off-times are clearly longer than the on-times. Even though there is physical evidence for more than one dark state, only one dark state is assumed here for simplicity. The MLE of an exponential distribution is used again. Due to the huge variance (the sample standard deviation equals 47.65) the MLE would not result in an appropriate model for this data. Nevertheless this estimate is used in the rest of the thesis. The MLE equals $\mu_{off} = 0.0725$. In Figure 11 the exponential probability plot is presented. It is clear that this distribution has the worst fit of the three datasets analyzed.

Figure 12 presents the exponential probability plots of two exponentials fitted (using the MLE) to two subgroups ($T_{off} < 0.5$ and $T_{off} > 0.5$). The fit becomes a bit better, but remains worse than the fit found for the on-time and the number of blinks distributions. The presence of two dark states, as presented in [11], seems not to be true in the STORM case. The number of dark states, the characteristics of these states as well as the reactivation should be investigated further.

Figure 11: CDF of MLE exponential distribution and frequency counts. Axes: theoretical values versus empirical quantiles.

Figure 12: CDF of MLE exponential distribution and frequency counts for two dark states. Axes: theoretical values versus empirical quantiles.

4.1.4 Computing transition rates

First of all note that $\mu_{off}$ is our estimate for $\lambda_A$, so $\lambda_A = 0.0725$. The bleaching probability equals $\frac{\lambda_B}{\lambda_B+\lambda_D}$ while $\mu_{on}$ is an estimate for $\lambda_B + \lambda_D$. Hence $\lambda_B = p_{bleach} \cdot \mu_{on} \approx 0.1636 \cdot 37.445$ and $\lambda_D = \mu_{on} - \lambda_B$. The $\lambda_B$ that will be used equals 6.125 and $\lambda_D = 31.320$. The estimators and algorithms presented here do not depend on the complete Markov model. For example $\lambda_A$ will determine how fast all the emitters reach the bleached (sink) state, but will not influence the number of blinks or the number of photons emitted by the different dyes.
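The computation of the three rates from the fitted quantities is short enough to write out. The following Python sketch only restates the relations above; the function name `transition_rates` is an assumption for illustration.

```python
def transition_rates(p_bleach, mu_on, mu_off):
    """Return (lambda_A, lambda_B, lambda_D) from the three fitted quantities."""
    lambda_b = p_bleach * mu_on  # p_bleach = lambda_B / (lambda_B + lambda_D)
    lambda_d = mu_on - lambda_b  # mu_on = lambda_B + lambda_D
    lambda_a = mu_off            # rate of leaving the dark state
    return lambda_a, lambda_b, lambda_d


# Estimates from Sections 4.1.1 to 4.1.3.
lambda_a, lambda_b, lambda_d = transition_rates(212 / 1296, 37.445, 0.0725)
```

Rounded to three decimals this gives the values 6.125 and 31.320 used in the rest of the thesis.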

4.2 Number of dyes per bead

Empirically the number of time frames that are discarded due to double blinking is very small compared to the total number of frames, so the effect is assumed to be negligible. Let $Bl_i$ be the number of blinks of emitter $i$ during the imaging process. Define a new compound Poisson process as follows:

$$B_j = \sum_{i=1}^{X_j} Bl_i. \tag{20}$$

Here $B_j$ equals the total number of blinks of bead $j$ and $X_j$ equals the number of non-bleached emitters at equilibrium time on the bead. The number of dyes that are not bleached when the equilibrium is reached can be seen as a binomial random variable, $X_j \sim \mathrm{Bin}(X_{start}, 1 - p_{bleach})$. Here $X_{start}$ represents the total number of dyes on a bead and $p_{bleach}$ is the probability that a dye is in the bleached state when the equilibrium is reached. Since $X_{start}$ is modeled as a Poisson random variable $X_j$ is also Poisson distributed, see Section 3.1. Wald’s theorem can be used to derive the expectation of $B_j$. This expectation can be simplified to

$$\mathbb{E}[B_j] = \mathbb{E}[X]\,\mathbb{E}[Bl] = \Lambda(S)\,\frac{\lambda_B + \lambda_D}{\lambda_B}. \tag{21}$$

In (21) $S$ represents the surface of the bead. It is important to note that even when the PPP would be inhomogeneous the number of dyes on the bead is Poisson distributed. Consider data of $N_b$ beads; $\Lambda(S)$ could be estimated as:

$$\Lambda(S) = \frac{1}{N_b}\,\frac{\lambda_B}{\lambda_D + \lambda_B}\sum_{j=1}^{N_b} b_j. \tag{22}$$

Data of 47 beads is available. Again this data is very rough and will be evaluated and adjusted in the upcoming months. Here the data of 46 beads is used, because the variance of the number of localizations, var(b), equals 7970 without one particular bead and 18745 with this bead. This single bead is therefore declared an outlier and removed from the analysis. This results in

$$\Lambda(S) = \frac{1}{46} \cdot 0.1635802 \cdot 17926 \approx 63.75. \tag{23}$$

To evaluate whether enough data is available the variance of the estimator can be computed.

$$\mathrm{Var}(\Lambda(S)) = \mathrm{Var}\left(\frac{1}{N_b}\,\frac{\lambda_B}{\lambda_D+\lambda_B}\sum_{j=1}^{N_b} b_j\right) \tag{24}$$

$$= \frac{1}{N_b}\left(\frac{\lambda_B}{\lambda_D+\lambda_B}\right)^2 \mathrm{Var}(b). \tag{25}$$

$\mathrm{Var}(b)$ is estimated by the sample variance of the data, $s_B^2 = 7970.616$. The variance of the estimator for the PPP intensity equals approximately 0.1008. From the central limit theorem it follows that if the estimation is repeated the probability that $\Lambda(S) \in (63.13, 64.37)$ equals 0.95. We conclude that there is already enough data available to estimate this parameter correctly.

If this estimate of the intensity is used as an input parameter of the simulation, which will be described in the next section, a problem occurs. The duration of a blink should be taken into account as well. Signals consisting of fewer than 140 photons cannot be distinguished from the background noise. The constant $\nu$ discussed in Section 3.5 is experimentally estimated by the specialists from the data used in the previous subsections, $\nu \approx 1.9 \cdot 10^4$ photons per second. As a result the time threshold equals $140/\nu \approx 0.007$ seconds. The data used to compute the estimate for the intensity consist of the number of blinks per bead that have been visible for a time longer than this threshold, $Bl'$. For this reason the compound Poisson process should be:

$$B_j = \sum_{i=1}^{X_j} Bl'_i. \tag{26}$$

Here $Bl'_i \sim \mathrm{Bin}(Bl_i, \mathbb{P}(T_{on} > 0.007))$. As a result $\mathbb{E}[B] = \Lambda(S)\,\mathbb{E}[Bl] \cdot \mathbb{P}(T_{on} > 0.007)$, where $\mathbb{P}(T_{on} > 0.007) \approx 0.759$ if the transition rates specified in the previous subsections are used. For simplicity this will not be used in this thesis, but this should be taken into account to find a suitable estimate for the intensity that will be used as input for the simulation. The intensity used in the simulation is

$$\Lambda_{simulation}(S) = \frac{1}{N_b}\,\frac{1}{0.759}\,\frac{\lambda_B}{\lambda_D+\lambda_B}\sum_{j=1}^{N_b} b_j = 84.0. \tag{27}$$

5 Simulation

The mathematical model described in this thesis is implemented in Java. With the input of the estimated transition rates and the estimate for the Poisson intensity this simulation can generate datasets comparable to the output of dSTORM imaging. For these generated datasets it is known from which emitter a localization comes, as well as the number of emitters and their exact positions. For this reason these datasets are appropriate to validate estimation procedures for the number of emitters and their localizations. In this section the implementation is described. First of all the emitters should be generated, after which the imaging process can start and run until all emitters are bleached or the end time is reached. During imaging the emitters will change states following the Markov chain. Every time an emitter goes to the active state a localization is generated based on a predetermined on-time and the position of the emitter. The upcoming subsections will give a more detailed explanation of the way emitters and their localizations are generated. In Table 3 the input parameters of this simulation can be found. Note that the last five input parameters were already defined in Section 2.4 and are needed to compute the variances as derived by Wang. For these quantities the same values as used in the experimental setting are taken, see [18]; these are used as default settings for dSTORM imaging.

Input parameter   Description
nruns             The number of datasets that should be generated.
Tend              The imaging time in seconds, default is 1200 s.
Tthreshold        The minimum on-time for a localization to be distinguished from the background photons, equals 140/ν.
λD                The transition rate between the active and the dark state, here the estimate of λD will be used.
λB                The transition rate between the active and the bleached state, here the estimate of λB will be used.
λA                The transition rate between the dark and the active state, here the estimate of λA will be used.
ΛS                The intensity of the PPP, here Λsimulation(S) will be used.
r                 The radius of the sphere (nm), in this thesis beads with a radius of 165 nm are investigated.
ν                 The photon/s constant, as mentioned before the experimentally estimated quantity used in this setting equals 1.9 · 10^4.
σ0                The standard deviation of the signal peak without astigmatism, equals 160.
a                 The pixel size, here 167 nm.
b                 b^2 represents the background photons per pixel, b^2 = 36.
c                 The focal-plane offset, equals 200 nm.
d                 The focal depth of the microscope, 360 nm.

Table 3: Input parameters dSTORM data simulation


5.1 Generate emitters

First of all a realization of the Poisson variable, $n$, is determined based on an extension of the random number generator available in Java. Next, $n$ emitter objects will be created. An emitter object consists of an identity number (the ID of this emitter), the number of blinks of this emitter and its $x$, $y$ and $z$ coordinates. Initially the number of blinks equals 0. The position coordinates are determined by generating 3 random numbers, $x'$, $y'$, $z'$, from a standard normal distribution (again an extension of the random number generator). Now $(x, y, z) = (x', y', z') \cdot \frac{r}{\sqrt{x'^2 + y'^2 + z'^2}}$, see [40]. After the coordinates are determined an event is created for the emitter of interest. In this event the information of the emitter is stored and extended with the state of the emitter, the start time of this state and the time the emitter will be in this state. In the initialization all the events that are created have state D (dark), since this is close to the equilibrium state. The end time is generated from an exponential random generator (parameter $\lambda_A$). The events are stored in a sorted future event array based on their time in the dark state. At the end of this initialization phase the time, $t$, is still 0.
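The position step can be illustrated in a few lines. The sketch below is written in Python for brevity and is not the Java implementation of this thesis; the function name `random_point_on_sphere` and the seed are assumptions for illustration.

```python
import math
import random


def random_point_on_sphere(radius, rng):
    """Uniform point on a sphere: normalize a standard normal vector."""
    while True:
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0.0:  # guard against the (practically impossible) zero vector
            scale = radius / norm
            return (x * scale, y * scale, z * scale)


rng = random.Random(2015)  # arbitrary seed, for reproducibility only
emitters = [random_point_on_sphere(165.0, rng) for _ in range(1000)]
```

Because the standard normal distribution in three dimensions is rotation invariant, the normalized vector has no preferred direction, which is what makes the resulting points uniform on the sphere.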

5.2 Generate localizations

Next $t$ is increased to the time of the first event in the future event array, which equals the minimum of all the off-times generated during the initialization. This first event is deleted from the array. A new event is created with state A and an end time equal to the sum of the current $t$ and an on-time generated from an exponential random generator with parameter $\lambda_B + \lambda_D$. The new event is placed at position $i$ in the future event array, where the end time of the event on position $i-1$ is smaller than the end time of the event of interest and the event on position $i$ has an end time that is later. The events that were on position $i$ up to the end of the future event array are moved one position further in the array. Hereafter $t$ is increased to the end time of the event that is now in the first position of the future event array. This process continues until there is an event with state A at the first position of the future event array. In this case a 0 or a 1 is generated as the realization of a Bernoulli experiment with probability $\frac{\lambda_B}{\lambda_B+\lambda_D}$. If the outcome is 0 a new event with state D is created in the same way as has been done in the initialization phase. If the outcome is 1 a new event with state B is created. Before the time increases again a localization (which is the result of the stay of the emitter in the active state) is generated. A localization stores the ID number of the emitter from which it is generated, the start time of emitting and the total emitting time. Furthermore the location of the localization should be determined. A localization is only realized if the on-time is longer than $T_{threshold}$. The realization of the $L_x$ coordinate equals a number generated from a normal distribution with expectation $x$, the $x$ position of the emitter, and a variance based on the formula of Wang and the time the emitter was on, see (4). $L_y$ and $L_z$ are generated in an equivalent way. The variances in the different directions are stored as well. For the verification of the implementation of the variance the on-time was fixed and compared with the example in [18]. The number of blinks of the emitter that generated the localization is increased by one. After this the process continues with the next event in the future event array. When the next event has state B no new event is created and the value of a count variable, $k$, is increased. The simulation ends when $k$ equals $n$ or when $t \geq T_{end}$.

The final result is a dataset consisting of localizations for which it is known to which emitter they belong and for which the number of emitters that were present on the sphere is known. All the properties of the emitters and the localizations are stored in (two different) text files. These text files can be imported in R for further analysis.
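The event loop of this section can be summarized in a compact sketch. It is written in Python rather than Java and differs from the implementation described above in two labeled ways: a binary heap (`heapq`) replaces the sorted future event array, and the localization coordinates and their variances are omitted, so only the blink times are produced. The function name `simulate_bead` and the seed are assumptions for illustration.

```python
import heapq
import random


def simulate_bead(n_emitters, lam_a, lam_b, lam_d, t_end, t_threshold, rng):
    """Return (emitter_id, start_time, on_time) for every realized localization."""
    p_bleach = lam_b / (lam_b + lam_d)
    # Future event list of (end_time, emitter_id, state, start_time);
    # every emitter starts in the dark state D at time 0.
    events = [(rng.expovariate(lam_a), i, "D", 0.0) for i in range(n_emitters)]
    heapq.heapify(events)
    localizations = []
    while events:
        t, emitter, state, start = heapq.heappop(events)
        if t >= t_end:
            break  # events are popped in time order, so all others are later
        if state == "D":
            # Dark period ends: the emitter becomes active.
            on_time = rng.expovariate(lam_b + lam_d)
            heapq.heappush(events, (t + on_time, emitter, "A", t))
        else:
            # Active period ends: record the blink if it was long enough.
            on_time = t - start
            if on_time > t_threshold:
                localizations.append((emitter, start, on_time))
            if rng.random() >= p_bleach:
                off_time = rng.expovariate(lam_a)
                heapq.heappush(events, (t + off_time, emitter, "D", t))
            # Otherwise the emitter is bleached and gets no further events.
    return localizations


rng = random.Random(1)  # arbitrary seed
locs = simulate_bead(84, 0.0725, 6.125, 31.320, 1200.0, 140 / 1.9e4, rng)
```

A heap gives the same processing order as the sorted array while avoiding the shifting of array elements on every insertion; the simulation ends when the heap is empty (all emitters bleached) or when the next event lies beyond the end time.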

6 Model validation

In this section the different parts of our model will be validated. The model can be used for different purposes. It is very important to note that no algorithms or estimation procedures will rely on the full model. First of all the simulation results can be used to generate data that is comparable to dSTORM output and for which the positions and the number of emitters are known. For this purpose it is important that the final result of the simulation, the localizations, can be compared to the real images. Here it is not important to know whether this result was reached within 3 seconds or 20 minutes. On the other hand the model could be used to analyze the dynamics of the emitters through time. In this thesis this will not be done, but a method to validate whether the model is appropriate for such purposes will be presented. Last of all the model could be used to validate the homogeneity assumption of the distribution of the emitters over the sphere.

6.1 Qualitative validation

Later in this thesis a method to estimate the positions of the emitters will be presented. To validate this algorithm, simulated data for which the positions of the emitters are known is needed. For this reason data from the simulation is compared to real dSTORM data to validate the output of the simulation. The parameters used in the simulation are based on the real data. For this reason the numbers of emitters and localizations are the same in expectation. In Figure 13 four dSTORM images are presented. The first two pictures are generated by the simulation and the second two are real dSTORM pictures. In Appendix B one can find these plots in more detail. We conclude that the final output of our simulation seems comparable to dSTORM images. It should be remarked that the localizations in the real dSTORM images lie a little bit closer to the surface of the bead than those in the simulated images. This is the result of both the fact that the variance approximated by Wang is smaller than in our real images and some mistakes in the on-time data.


Figure 13: Simulated images (left two) versus real dSTORM images (right two).

6.2 Testing the homogeneity assumption

In the previous subsection the final output of the algorithm was validated. Our general model also consists of the way dyes are distributed over the sphere and the way localizations are determined. The latter is equivalent to determining the resolution of the super-resolution technique and, as described earlier, has been investigated extensively. If the validation in the previous subsection is satisfying, performing a goodness-of-fit test for our localization distribution is equivalent to testing whether the dyes follow a homogeneous Poisson process on the sphere, which is the only assumption that has not yet been validated.

6.2.1 Choice of test statistic

Kulldorff proposed a general framework for testing whether a spatial point pattern is randomly generated (not clustered) based on all available spatial test statistics at this moment, see [23]. Nevertheless, the situation in which one wants to test whether the sources of clustered data, where the distribution of the signals is known, are clustered as well has not been investigated until now. The spatial point process of the localizations on its own is clearly not homogeneous. An emitter will have more than one localization, and this results in a clustered pattern. For this reason one would definitely be able to reject that this point process is a homogeneous Poisson process. A more appropriate test statistic is needed.

Goodness-of-fit tests can roughly be divided into two groups. First of all the sum of all the differences between the observed data and the data expected based on the null hypothesis could be used, for example Chi-squared type tests (see [34] for more explanation). On the other hand the test can be based on a maximal difference between the observed and the expected data. One could think of the parallel with functional analysis where one often has to decide between the use of the $L^\infty$ and the $L^p$ norm. It is not immediately clear which concept is better to test the homogeneity assumption. The most commonly used test statistic based on the $L^\infty$ norm is the Kolmogorov-Smirnov (KS) test, $D_n = \sup_t |F_n(t) - F_0(t)|$. Under $H_0$ the Glivenko-Cantelli theorem tells us $\sup_t |F_n(t) - F_0(t)| \to 0$ almost surely, see [34]. The KS test can be seen as a scan statistic, since it scans over all possible intervals of $t$. A spatial scan statistic is also based on the supremum norm, designed to identify hot spots or areas of increased intensity. This statistic was designed by Kulldorff in 1997, [31]. The statistic identifies potential cluster areas. Even though the statistic is based on a generalized likelihood ratio test statistic, which is in the limit equivalent to a Chi-square statistic, see [34], a supremum is taken over these statistics and for this reason it is based on a supremum norm. The Chi-square statistic is given by $Q = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$. Here $o_i$ represents the observed number of counts in subset $I_i$ of $S$, such that $\bigcup_i I_i = S$ and $I_1 \cap I_2 = \emptyset$ if $I_1 \neq I_2$, where $S$ is the space of interest. $e_i$ equals the expected number of counts in subset $i$ based on $H_0$. The distribution of the number of localizations (see Section 3.6) is discrete. For discrete data the Chi-square test is often seen as an obvious choice. The application of the Chi-square test statistic in spatial (mostly geographical, two-dimensional) point processes has been discussed by researchers of different fields, see [24], [25] and [26]. The statistic is slightly adjusted for the detection of clusters of diseases, see [27], [28] and [29].

In 2003 Kulldorff, Tango and Park compared the power of different clustering tests, see [32]. They used, among other things, the spatial scan statistic of Kulldorff, see [23], and the adjusted Chi-square statistic of Tango, see [29]. The results of this study show that different clustering tests are applicable for different goals. If one is interested in finding the location of a cluster the scan statistic is preferred. The same statistic performs better than the Chi-square type statistic for hot-spot clustering (position of clusters). On the other hand the Chi-square type statistic is better at detecting global type clustering (detecting the presence of clustering). Testing the homogeneity assumption is a global type of clustering problem. We conclude that a Chi-square statistic is more appropriate than a scan statistic in our setting.

6.2.2 Chi-squared type test

The expected value of the number of localizations in a subspace $A$ is determined by the expected position of these localizations, that is the positions of the emitters. For this reason the expected number of localizations equals the expected number of emitters times the expected number of blinks per emitter, see (16). We see that testing the GoF of our model using a Chi-square statistic would be equivalent to testing whether the data fits a homogeneous Poisson point process with intensity $\lambda \cdot \mathbb{E}[Bl]$, where $Bl$ denotes the number of blinks. It has already been mentioned that our localizations are clustered. As a result the null hypothesis will be rejected if our regions of interest are comparable to the accuracy of the microscope. The Chi-squared test should be adapted in such a way that it becomes appropriate. If the regions of interest are big then the Chi-square statistic will not be able to detect the clustered pattern on the resolution level, see Figure 14. In this case the size of the clusters is small compared to the region of interest. The probability that a cluster falls into two segments is small due to the size of the region of interest. Performing the Chi-square test would be appropriate when testing if the clusters are clustered.


Figure 14: Clustered pattern which would not be detected by a Chi-square statistic if the regions of interest are the four quadrants

First of all the number of subspaces and a way to create these should be determined. Adapting the number and sizes of subspaces while testing the data would result in unreliable results, so this should be determined beforehand. As described earlier the resolution of the dSTORM technique is roughly 20 nm × 20 nm × 50 nm. For the reasons just discussed the subspaces should be bigger than this resolution volume. To create subspaces the spherical coordinate system is used, defined as

$$\begin{aligned} x &= r\sin(\theta)\cos(\phi), \\ y &= r\sin(\theta)\sin(\phi), \\ z &= r\cos(\theta). \end{aligned} \tag{28}$$

Note $\theta \in (-\tfrac{1}{2}\pi, \tfrac{1}{2}\pi)$ and $\phi \in (0, 2\pi)$. Due to the difference in accuracy for the axial and lateral directions it seems appropriate to have fewer subspaces in the axial direction compared to the lateral directions. A simple idea is dividing $\theta$ in three parts and $\phi$ in six parts, without further constraints on the $r$ coordinate. As a result the space $\mathbb{R}^3$ is divided into 18 regions. It is important to note that not all 18 regions have the same area and thus not the same amount of expected localizations. The upper and lower six areas each consist of $\frac{1}{24}$ of the area of the complete sphere. For this reason $\frac{1}{24} \cdot \Lambda_{simulation}(S) \cdot \frac{1}{p_{bleach}}$ localizations are expected in each of these areas. The area of each of the other six equals $\frac{1}{12}$ of the area of the whole sphere. Simulated data enables computing the type-I (and type-II) error for these regions of interest. Based on these results it is possible to adapt the number of regions.
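The area fractions and the resulting expected counts can be verified with a short computation. The Python sketch below rests on an assumption that is not stated in this form in the text: $\theta$ is treated as the polar angle measured from the $z$-axis, running from 0 to $\pi$, as the relation $z = r\cos(\theta)$ in (28) suggests, and it is split into three equal angular parts. The function names are hypothetical.

```python
import math


def band_fraction(theta_low, theta_high):
    """Fraction of a sphere's area with polar angle between the two bounds."""
    return (math.cos(theta_low) - math.cos(theta_high)) / 2.0


def expected_localizations(intensity, p_bleach, n_theta=3, n_phi=6):
    """Expected localizations per region for an equal-angle grid."""
    expected = []
    for i in range(n_theta):
        band = band_fraction(i * math.pi / n_theta, (i + 1) * math.pi / n_theta)
        region = band / n_phi  # the phi slices split each band evenly
        expected.extend([region * intensity / p_bleach] * n_phi)
    return expected


# Intensity from (27) and bleaching probability from (19).
counts = expected_localizations(84.0, 0.1636)
```

Under this assumption the first and last six entries of `counts` belong to regions covering $\frac{1}{24}$ of the sphere each and the middle six to regions covering $\frac{1}{12}$ each, in line with the fractions given above.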

The information of different beads cannot be merged to perform a more reliable test. It is impossible to sum the number of counts in a specific subspace of different beads, since there is no rotation invariance. Since there is no reference point on the beads the orientation of the bead is unknown. As a result the homogeneity assumption can only be tested for a single bead. Due to the fact that one bead does not contain much data it is reasonable to expect that the Chi-square statistic cannot be compared to a Chi-square distribution. It is hard to find the distribution of this test statistic in closed analytic form due to the form of the distribution of the number of localizations, see Section 3.6. For this reason a Monte Carlo simulation will be used, as has been done in the research of Kulldorff et al. [32]. The test statistics for the experimental dataset as well as for $K - 1$ simulated datasets where $H_0$ is true are computed. If the value of the test statistic of the real data belongs to the $100\alpha_1\%$ highest values, $H_0$ is rejected and the alternative hypothesis is accepted for the bead. This procedure was originally proposed by Dwass in 1957, see [30].
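The decision rule of the Monte Carlo test can be written down compactly. The following Python sketch shows only the ranking step and assumes the test statistics have already been computed; the function name `monte_carlo_rejects` and the handling of ties (ties count against the observed value) are assumptions for illustration.

```python
def monte_carlo_rejects(observed, simulated, alpha_1):
    """Rank-based Monte Carlo test.

    Rejects H0 when the observed statistic is among the 100*alpha_1 %
    largest of all K values (the observed one plus K - 1 simulated ones).
    """
    k = len(simulated) + 1
    rank_from_top = 1 + sum(1 for s in simulated if s >= observed)
    return rank_from_top <= alpha_1 * k


# Hypothetical statistics of 99 datasets simulated under H0.
simulated = [float(i) for i in range(99)]
reject_large = monte_carlo_rejects(200.0, simulated, 0.05)
reject_typical = monte_carlo_rejects(50.0, simulated, 0.05)
```

No distributional form of the test statistic is needed here; only its rank among statistics generated under the null hypothesis matters.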

Information of $n_{beads}$ beads is available, so $n_{beads}$ outcomes of our bead-specific test can be generated. These outcomes can be seen as realizations of $n_{beads}$ Bernoulli experiments. Under the assumption that $H_0$ is true, the success probability $p$ equals $\alpha_1$. The total number of rejections, $T$, of the single bead test can be seen as a binomially distributed random variable. Testing the homogeneity assumption at a significance level of $\alpha$ would imply that $\mathbb{P}(T > T_\alpha) \leq \alpha$. Designing a test that satisfies this property is equivalent to determining $T_\alpha$ for which this property is satisfied. When data of 1000 beads would be available and $\alpha_1 = \alpha = 0.05$, $T_\alpha = 62$; when data of the localizations of 10000 beads would be available $T_\alpha = 536$. This final procedure will test if the homogeneity assumption is acceptable.
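Determining $T_\alpha$ amounts to walking through the binomial distribution until the upper tail drops to $\alpha$ or below. The Python sketch below is one way to do this and is not taken from the thesis code; the function name `critical_value` is an assumption, and the probabilities are accumulated in floating point, so results for very large numbers of beads should be treated with some care.

```python
def critical_value(n_beads, alpha_1, alpha):
    """Smallest t with P(T > t) <= alpha for T ~ Bin(n_beads, alpha_1)."""
    pmf = (1.0 - alpha_1) ** n_beads  # P(T = 0)
    tail = 1.0 - pmf                  # P(T > 0)
    t = 0
    while tail > alpha and t < n_beads:
        # Recursion P(T = t + 1) = P(T = t) * (n - t) / (t + 1) * p / (1 - p).
        pmf *= (n_beads - t) / (t + 1) * alpha_1 / (1.0 - alpha_1)
        t += 1
        tail -= pmf
    return t


t_alpha = critical_value(46, 0.05, 0.05)
```

The homogeneity assumption is then rejected when the number of rejected beads exceeds the returned value.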

6.2.3 Inhomogeneity

At the moment of writing only data on 46 beads is available. If $\alpha_1 = \alpha = 0.05$, $T_\alpha$ equals 5. The test statistic is computed and equals 6. The null hypothesis that the dyes are homogeneously distributed over the sphere is rejected. At this moment the bead data is very raw and still requires processing. Nevertheless it becomes clear that this is an important point for future research. The general assumption that dyes will attach randomly to the beads should be investigated more, and some first evidence that this will not be the case is found.

6.2.4 Power analysis

The test is constructed in such a way that the type-I error equals 0.05. To judge whether the test is appropriate to detect inhomogeneity, the type-II error should be calculated. Section 5.1 explained how the positions of the emitters are determined. Now the positions of emitters are generated by drawing a number from the uniform distribution on [0, 2π], which is assigned to the θ value of the emitter. Next, a random number from the interval [0, π] is drawn, which is assigned to the φ value of the emitter. Since all emitters should be on the sphere, r equals 165. Finally, the coordinates of this emitter are computed according to (28). Since the area element dΩ = sin(φ)dθdφ is a function of φ, the emitters will be concentrated near the poles. In this way 200 inhomogeneous datasets, each consisting of 46 beads, are generated and the designed test is performed on them. The total number of rejections is divided by 200 to compute the power of this test (for this type of inhomogeneous dataset). Using the central limit theorem to approximate the 95% confidence interval of the power proportion, the standard error (SE) satisfies:

$$
SE = 1.96 \cdot \sqrt{\frac{p_{power} \cdot (1 - p_{power})}{n}} \le 1.96 \sqrt{\frac{0.5 \cdot 0.5}{n}}.
$$

The total number of generated datasets, n, should be chosen in such a way that this standard error is small. It seems sufficient to generate 46 · 200 datasets to calculate the power of the different tests. If n = 200 the standard error is less than or equal to 0.07. The power equals 94.0%, which strengthens the rejection of the homogeneity assumption.
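The concentration near the poles and the bound on the standard error can both be checked numerically. The sketch below is illustrative only: the spherical-to-Cartesian convention is an assumption (equation (28) is not reproduced here), and the area-uniform sampler is a standard alternative added for comparison, not taken from the thesis.

```python
import math
import random

R_BEAD = 165.0  # bead radius in nm, as stated in the text

def inhomogeneous_angles(rng):
    """Scheme of the power analysis: theta and phi both drawn uniformly."""
    return rng.uniform(0.0, 2.0 * math.pi), rng.uniform(0.0, math.pi)

def homogeneous_angles(rng):
    """Area-uniform alternative (assumed here): cos(phi) uniform on [-1, 1]."""
    return rng.uniform(0.0, 2.0 * math.pi), math.acos(1.0 - 2.0 * rng.random())

def to_cartesian(theta, phi, r=R_BEAD):
    """Assumed convention: phi is the polar angle measured from the z-axis."""
    return (r * math.sin(phi) * math.cos(theta),
            r * math.sin(phi) * math.sin(theta),
            r * math.cos(phi))

def polar_fraction(sampler, n, rng, cap_angle=math.pi / 6.0):
    """Fraction of n sampled emitters within cap_angle of either pole."""
    z_cut = R_BEAD * math.cos(cap_angle)
    hits = 0
    for _ in range(n):
        _, _, z = to_cartesian(*sampler(rng))
        hits += abs(z) > z_cut
    return hits / n

def ci_half_width_bound(n):
    """Upper bound 1.96 * sqrt(0.5 * 0.5 / n) from the display above."""
    return 1.96 * math.sqrt(0.25 / n)
```

With uniformly drawn φ, about one third of the emitters fall in the two polar caps of half-angle π/6, whereas an area-uniform distribution would place roughly 13% there.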

7 Estimating number of emitters using compound Poisson properties

In this section an estimation method to start tackling the overcounting problem, see Section 2.6, is presented. Before the real locations of emitters in a localization cloud can be determined, it should be known how many dyes gave rise to this cloud.

7.1 Number of non-bleached dyes at equilibrium

It is very important to note that this method is based on the fit of the geometric distribution to the number-of-blinks data. In the data analysis it became clear that the current dataset does not seem very realistic, and this could be the reason for the fitting problems of the geometric distribution. This method does not use the distributions of the on time and the off time. In Section 4 an estimate for Λsimulation(S) has been computed. A probability distribution for the number of emitters (at equilibrium time) of a certain bead, given the observed blinks, can be obtained. The probability generating functions of Bl and Ndark equal

$$
G_{Bl}(z) = \frac{zp}{1 - z(1-p)} \quad \text{and} \quad G_{N_{dark}}(z) = \frac{p}{1 - z(1-p)},
$$

where p equals pbleach. The probability generating function of X, where X ∼ NB(p, r), is

$$
G_X(z) = \left(\frac{1-p}{1-zp}\right)^{r}.
$$

In other words, a sum of k independent Ndark random variables satisfies $\sum_{i=1}^{k} N_{dark,i} \sim NB(1 - p_{bleach}, k)$. The reader should be reminded of the fact that Bl = Ndark + 1. Using these facts the following can be derived:

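$$
\begin{aligned}
P\left(X_j = k \,\middle|\, \sum_{i=1}^{X_j} Bl'_i = b_j\right)
&= P\left(X_j = k \,\middle|\, \sum_{i=1}^{X_j} N_{dark,i} = \left\lfloor \frac{b_j}{P(T_{ON} > 0.007)} \right\rfloor - X_j\right) \\
&= \frac{P\left(X = k,\ \sum_{i=1}^{X} N_{dark,i} = \left\lfloor \frac{b_j}{0.759} \right\rfloor - X\right)}{P\left(\sum_{i=1}^{X} N_{dark,i} = \left\lfloor \frac{b_j}{0.759} \right\rfloor - X\right)} \\
&= \frac{P(X = k) \cdot P\left(\sum_{i=1}^{k} N_{dark,i} = \left\lfloor \frac{b_j}{0.759} \right\rfloor - k\right)}{P\left(\sum_{i=1}^{X} N_{dark,i} = \left\lfloor \frac{b_j}{0.759} \right\rfloor - X\right)} \\
&= \frac{P(X = k) \cdot P\left(\sum_{i=1}^{k} N_{dark,i} = \left\lfloor \frac{b_j}{0.759} \right\rfloor - k\right)}{\sum_{l=0}^{\infty} P\left(\sum_{m=1}^{l} N_{dark,m} = \left\lfloor \frac{b_j}{0.759} \right\rfloor - l\right) \cdot P(X = l)}.
\end{aligned}
\tag{29}
$$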

The estimator $\hat{k}$ that maximizes the probability $P(X_j = k \mid \sum_{i=1}^{X_j} Bl'_i = b_j)$ equals the k that maximizes $P(X = k, \sum_{i=1}^{X} Bl'_i = b_j)$, as well as the log of this expression. Distribution (29) is a unimodal probability distribution which first increases (if k starts at 1) and, after reaching the maximum, decreases. As a result $\hat{k} = \max\{k : f(k)/f(k-1) \ge 1\}$, where f(k) refers to Equation (29). This ratio of probabilities equals:

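$$
\frac{f(k)}{f(k-1)} =
\frac{\dfrac{e^{-\Lambda_{simulation}(S)} \Lambda_{simulation}(S)^{k}}{k!} \dbinom{\left\lfloor \frac{b_j}{0.759} \right\rfloor - 1}{k} \left(\dfrac{\lambda_D}{\lambda_D + \lambda_B}\right)^{\left\lfloor \frac{b_j}{0.759} \right\rfloor - k} \left(\dfrac{\lambda_B}{\lambda_B + \lambda_D}\right)^{k}}
{\dfrac{e^{-\Lambda_{simulation}(S)} \Lambda_{simulation}(S)^{k-1}}{(k-1)!} \dbinom{\left\lfloor \frac{b_j}{0.759} \right\rfloor - 1}{k-1} \left(\dfrac{\lambda_D}{\lambda_D + \lambda_B}\right)^{\left\lfloor \frac{b_j}{0.759} \right\rfloor - k + 1} \left(\dfrac{\lambda_B}{\lambda_B + \lambda_D}\right)^{k-1}}
= \frac{\Lambda_{simulation}(S)}{k^{2}} \frac{\lambda_B}{\lambda_D} \left(\left\lfloor \frac{b_j}{0.759} \right\rfloor - k\right).
\tag{30}
$$

To determine $\hat{k}$, Equation (30) is equated to 1:

$$
\frac{\Lambda_{simulation}(S)}{k^{2}} \frac{\lambda_B}{\lambda_D} \left(\left\lfloor \frac{b_j}{0.759} \right\rfloor - k\right) = 1
\tag{31}
$$

$$
\Rightarrow \quad k^{2} + k\,\Lambda_{simulation}(S) \frac{\lambda_B}{\lambda_D} - \left\lfloor \frac{b_j}{0.759} \right\rfloor \Lambda_{simulation}(S) \frac{\lambda_B}{\lambda_D} = 0
$$

$$
\Rightarrow \quad k = \frac{1}{2} \left( -\Lambda_{simulation}(S) \frac{\lambda_B}{\lambda_D} \pm \sqrt{\left(\Lambda_{simulation}(S) \frac{\lambda_B}{\lambda_D}\right)^{2} + 4 \Lambda_{simulation}(S) \frac{\lambda_B}{\lambda_D} \left\lfloor \frac{b_j}{0.759} \right\rfloor} \right).
$$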

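Since k ≥ 1 and $\hat{k} = \max\{k : f(k)/f(k-1) \ge 1\}$ we conclude that:

$$
\hat{k} = \left\| \frac{1}{2} \left( -\Lambda_{simulation}(S) \frac{\lambda_B}{\lambda_D} + \sqrt{\left(\Lambda_{simulation}(S) \frac{\lambda_B}{\lambda_D}\right)^{2} + 4 \Lambda_{simulation}(S) \frac{\lambda_B}{\lambda_D} \left\lfloor \frac{b_j}{0.759} \right\rfloor} \right) \right\|.
\tag{32}
$$

Here ‖x‖ means the nearest integer to x. Since $k \in \{1, ..., \lfloor b_j / 0.759 \rfloor\}$, the probabilities for all possible k can be computed in R, and in this way $\hat{k}$ can also be determined. Furthermore, a 95% confidence interval can be given.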
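The closed form (32) is straightforward to evaluate. The sketch below is a hedged illustration in Python rather than the R script mentioned above; the function names and the numerical values used in the example call (Λ = 84, λB = 1, λD = 3.64, b = 296) are hypothetical placeholders, not fitted parameters from the thesis.

```python
import math

def ratio(k, lam_sim, lam_b, lam_d, b_j, p_on=0.759):
    """Left-hand side of (31), i.e. f(k) / f(k - 1)."""
    m = math.floor(b_j / p_on)           # corrected blink count floor(b_j / 0.759)
    return lam_sim / k ** 2 * lam_b / lam_d * (m - k)

def k_hat(lam_sim, lam_b, lam_d, b_j, p_on=0.759):
    """Positive root of (31) and its nearest integer, as in (32)."""
    m = math.floor(b_j / p_on)
    a = lam_sim * lam_b / lam_d          # Lambda_simulation(S) * lambda_B / lambda_D
    root = 0.5 * (-a + math.sqrt(a * a + 4.0 * a * m))
    return root, int(math.floor(root + 0.5))

if __name__ == "__main__":
    # Hypothetical inputs, for illustration only
    root, k = k_hat(84.0, 1.0, 3.64, 296)
    print(root, k)
```

Because the ratio in (31) is decreasing in k on the admissible range, values of k below the real root give a ratio above 1 and values above it give a ratio below 1.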

7.1.1 Quality of the estimator

The mean squared error (MSE) of an estimator equals $\frac{1}{n} \sum_{i=1}^{n} (\hat{Y}_i - Y_i)^2$. In general, estimators can be compared based on their MSE, the estimator with the smallest MSE being the best estimator. Since the MLE cannot be written in closed form, it is impossible to find a closed expression for the MSE as well. Nevertheless, we can approximate the MSE by computing the MSE of 1000 simulated datasets and taking the mean of these MSEs. Figure 15 shows a plot of the residuals. If we assume that the residuals are normally distributed, this estimator might be biased, but more simulations should be done to conclude this. The mean MSE equals 46.0. Another possible estimate for the number of dyes could be $\frac{b}{E(Bl)} = \frac{b(\lambda_B + \lambda_D)}{\lambda_D}$. The MSE of this unbiased estimator equals 76.1. For this reason the estimator described in this section is a better estimator than the alternative just presented.

Figure 15: Frequency versus error based on 1000 simulations

In this section an estimator for the number of non-bleached emitters (at equilibrium time) of a certain bead, given the observed blinks, has been proposed.

7.2 Number of dyes on bead

The estimate found in the previous subsections gives an estimate for the number of non-bleached dyes at the equilibrium time. Of course this does not equal the real number of dyes on a bead. In this subsection a method is presented to compute this real number of dyes based on the previous estimate. Unfortunately, the quality of this estimator will depend strongly on the goodness-of-fit of the distribution of the time in the dark state. In Section 4.1.3 it was concluded that this distribution does not fit at all. In the rest of the thesis this distribution is not used.

In Section 3.2 a method was presented to compute the probability for an emitter to be in a certain state at a given moment in time, given the initial state of the emitter. A Mathematica script was written to determine the eigenvectors and eigenvalues of the rate matrix Q and to compute the final probability matrix. Given different dSTORM measurements, an expected duty cycle can be computed. The duty cycle was defined as the total signal in a period divided by the total duration of this period. In general the duty cycle is estimated by splitting up the total measurement time; in each interval the total period of signal is divided by the interval length. This is done for multiple bead measurements, and the mean of these duty cycles represents the expected duty cycle, see [8].

Working with the Markov chain, the duty cycle can also be seen as the probability that at least one emitter is in the active state. In dSTORM imaging all the emitters start in the active state. We define pA,t as the probability for an emitter to be in the active state at time t when it starts in the active state. pA,t can be computed using the Mathematica script; this complete calculation has been done for the data in this thesis and can be found in Appendix D.

This $p_{A,t}$ can be used to compute the expected duty cycle of the system. Let $N_{A,t}$ be the number of emitters in the active state at time t. All emitters behave independently and identically, so $N_{A,t} \sim Bin(n_{emitters}, p_{A,t})$. The duty cycle at time t equals $P(N_{A,t} > 0) = 1 - P(N_{A,t} = 0) = 1 - (1 - p_{A,t})^{n_{total\ dyes}}$. The equilibrium time equals the time at which the duty cycle drops below 0.1 for the first time. With this equilibrium time, $p_{A,t_{equilibrium}}$ can be computed. Next, $n_{emitters}$, the real number of dyes on a bead, can be computed by dividing the number of visible dyes at equilibrium by $p_{A,t_{equilibrium}}$. Lastly, this procedure brings a new opportunity to validate the time span of the Markov model. The equilibrium time can be found by solving $1 - (1 - p_{A,t})^{n_{total\ dyes}} = 0.1$. This solution can be compared to $t_{equilibrium}$, after which the difference can be evaluated. At this point the equilibrium time is not yet known. As a result we cannot give an example of this calculation. This procedure will be important if we want to improve the mathematical model to describe all the time-dependent dynamics.
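The relations in this paragraph can be written down directly. The following is a minimal sketch with illustrative function names; it only encodes the duty-cycle formula and its inversion, and does not compute $p_{A,t}$ from the rate matrix (that step is done by the Mathematica script and is not reproduced here).

```python
def duty_cycle(p_active, n_dyes):
    """P(N_{A,t} > 0) = 1 - (1 - p_{A,t})^n for n independent emitters."""
    return 1.0 - (1.0 - p_active) ** n_dyes

def p_active_at_equilibrium(n_dyes, threshold=0.1):
    """Value of p_{A,t} at which the duty cycle equals the threshold."""
    return 1.0 - (1.0 - threshold) ** (1.0 / n_dyes)

def total_dyes(n_visible_at_equilibrium, p_active_eq):
    """Real number of dyes: visible dyes at equilibrium divided by p_{A,t_equilibrium}."""
    return n_visible_at_equilibrium / p_active_eq
```

Given a curve $t \mapsto p_{A,t}$ from the Markov model, the equilibrium time would be the first t at which $p_{A,t}$ drops below the value returned by `p_active_at_equilibrium`.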

8 Determining exact locations

8.1 General statements

In Section 7 an estimator for the number of emitters for a specific image has been determined. Next, the labels of the localizations, that is, the emitter each localization belongs to, should be determined. In our setting, the positions of the emitters are not known. We refer to this as mixtures. This type of analysis is often referred to as ‘clustering’, see for example [21, Chapter 10] and [22, Paragraph 14.3]. This section starts with some results from the book of Duda [21]. For this, the following assumptions are introduced:

(A1) The localizations come from a known number N of emitters.

(A2) The localization labels Hj are unknown.

(A3) The positions for the N emitters (parameter vectors) θ1, ...,θN , are unknown.

(A4) The prior probabilities, P(Hj = i) for each emitter i are known.

(A5) The form for the class-conditional probability densities fY|Hj ,θ(yj) is known.

In (A2), Hj = i means that localization j comes from emitter i. Let us focus on the prior probability of localization j coming from emitter i, j = 1, ..., n, i = 1, ..., N. Emitters are identical and independent of one another. Since the positions of the different emitters are unknown, every emitter is equally likely to be the source of a certain localization. The prior probability is uniform over all emitters, that is, P(Hj = i) = 1/N for all i.

The probability density function of the position of a localization, given the positions of the emitters, can be derived as

$$
f_{Y_j|\theta}(y_j) = \sum_{i=1}^{N} f_{Y_j|H_j=i,\theta}(y_j)\, P(H_j = i).
\tag{33}
$$

The likelihood given the n localizations is

$$
L(\theta|y_1, y_2, ..., y_n) = \prod_{j=1}^{n} f_{Y_j|\theta}(y_j) = \prod_{j=1}^{n} \sum_{i=1}^{N} f_{Y_j|H_j=i,\theta_i}(y_j)\, P(H_j = i).
\tag{34}
$$

The maximum-likelihood estimate $\hat{\theta}$ is the maximizer of (34), as well as of the natural logarithm of (34). It is convenient to define the log-likelihood as:

$$
l(\theta|y_1, y_2, ..., y_n) = \ln L(\theta|y_1, y_2, ..., y_n) = \sum_{j=1}^{n} \ln f_{Y_j|\theta}(y_j).
\tag{35}
$$

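The derivatives with respect to each component of θ should equal zero for the estimate of θ that maximizes (35). The derivative equals

$$
\frac{\partial}{\partial \theta_i} l(\theta|y_1, y_2, ..., y_n) = \sum_{j=1}^{n} \frac{1}{f_{Y_j|\theta}(y_j)} \cdot \frac{d}{d\theta_i} f_{Y_j|\theta}(y_j) = \sum_{j=1}^{n} \frac{1}{f_{Y_j|\theta}(y_j)} \cdot \frac{d}{d\theta_i} \sum_{k=1}^{N} f_{Y_j|H_j=k,\theta_k}(y_j) \cdot P(H_j = k).
\tag{36}
$$

Before continuing with maximizing the log-likelihood, we take a closer look at the probability for a certain localization to come from a specific emitter, given the location of this emitter:

$$
P(H_j = i|\theta) = \frac{f_{Y_j|H_j=i,\theta}(y_j) \cdot P(H_j = i)}{\sum_{k=1}^{N} f_{Y_j|H_j=k,\theta}(y_j) \cdot P(H_j = k)} = \frac{f_{Y_j|H_j=i,\theta}(y_j) \cdot P(H_j = i)}{f_{Y_j|\theta}(y_j)}.
\tag{37}
$$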

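The derivative with respect to θi is evaluated in (36). The densities that are not functions of θi do not contribute, so that

$$
\frac{d}{d\theta_i} l(\theta|y_1, y_2, ..., y_n) = \sum_{j=1}^{n} \frac{1}{f_{Y_j|\theta}(y_j)} \cdot \frac{d}{d\theta_i} f_{Y_j|H_j=i,\theta_i}(y_j) \cdot P(H_j = i)
= \sum_{j=1}^{n} \frac{P(H_j = i)}{f_{Y_j|\theta}(y_j)} \cdot \frac{d}{d\theta_i} f_{Y_j|H_j=i,\theta_i}(y_j).
\tag{38}
$$

Now Equation (37) is used to find

$$
\frac{d}{d\theta_i} l(\theta|y_1, y_2, ..., y_n) = \sum_{j=1}^{n} \frac{P(H_j = i|\theta)}{f_{Y_j|H_j=i,\theta}(y_j)} \cdot \frac{d}{d\theta_i} f_{Y_j|H_j=i,\theta_i}(y_j)
= \sum_{j=1}^{n} P(H_j = i|\theta)\, \frac{d}{d\theta_i} \ln f_{Y_j|H_j=i,\theta_i}(y_j).
\tag{39}
$$

In other words, the maximum-likelihood estimate for θi must satisfy:

$$
\sum_{j=1}^{n} P(H_j = i|\hat{\theta})\, \frac{d}{d\theta_i} \ln f_{Y_j|H_j=i,\hat{\theta}_i}(y_j) = 0.
\tag{40}
$$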

8.2 Normal mixtures

As described in Section 3.5, the position of a localization, given the position of the emitter, follows a trivariate normal distribution. For this reason, θ = (µ1, µ2, ..., µN), where µi ∈ R³. The class-conditional probability density equals:

$$
f_{Y_j|H_j,\theta}(y_j) = (2\pi)^{-\frac{3}{2}} |\Sigma_j|^{-\frac{1}{2}} \exp\left(-\frac{1}{2} (Y_j - \mu_{H_j})' \Sigma_j^{-1} (Y_j - \mu_{H_j})\right).
\tag{41}
$$

Therefore

$$
\ln f_{Y_j|H_j=i,\theta}(y_j) = -\frac{3}{2} \ln 2\pi - \frac{1}{2} \ln |\Sigma_j| - \frac{1}{2} (Y_j - \mu_i)' \Sigma_j^{-1} (Y_j - \mu_i),
\tag{42}
$$

and

$$
\frac{d}{d\mu_i} \ln f_{Y_j|H_j=i,\theta}(y_j) = \Sigma_j^{-1} (Y_j - \mu_i).
\tag{43}
$$

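The result presented in (40) can be used to find that the maximum-likelihood estimator $\hat{\mu}_i$ for µi must satisfy

$$
\begin{aligned}
& \sum_{j=1}^{n} P(H_j = i|\hat{\theta})\, \Sigma_j^{-1} (Y_j - \hat{\mu}_i) = 0 \\
\Leftrightarrow\ & \sum_{j=1}^{n} P(H_j = i|\hat{\theta})\, \Sigma_j^{-1} Y_j = \sum_{j=1}^{n} P(H_j = i|\hat{\theta})\, \Sigma_j^{-1} \hat{\mu}_i \\
\Leftrightarrow\ & \hat{\mu}_i = \left(\sum_{j=1}^{n} P(H_j = i|\hat{\theta})\, \Sigma_j^{-1}\right)^{-1} \cdot \sum_{j=1}^{n} P(H_j = i|\hat{\theta})\, \Sigma_j^{-1} Y_j.
\end{aligned}
\tag{44}
$$

The solution for each component of $\hat{\mu}_i$ equals:

$$
\hat{\mu}_{i,x} = \frac{\sum_{j=1}^{n} P(H_j = i|\hat{\theta}) \frac{y_{j,x}}{\sigma_{j,x}}}{\sum_{j=1}^{n} \frac{P(H_j = i|\hat{\theta})}{\sigma_{j,x}}}, \quad
\hat{\mu}_{i,y} = \frac{\sum_{j=1}^{n} P(H_j = i|\hat{\theta}) \frac{y_{j,y}}{\sigma_{j,y}}}{\sum_{j=1}^{n} \frac{P(H_j = i|\hat{\theta})}{\sigma_{j,y}}}, \quad
\hat{\mu}_{i,z} = \frac{\sum_{j=1}^{n} P(H_j = i|\hat{\theta}) \frac{y_{j,z}}{\sigma_{j,z}}}{\sum_{j=1}^{n} \frac{P(H_j = i|\hat{\theta})}{\sigma_{j,z}}},
\tag{45}
$$

with

$$
P(H_j = i|\hat{\theta}) = \frac{f_{Y_j|H_j=i,\hat{\theta}}(y_j) \cdot P(H_j = i)}{\sum_{k=1}^{N} f_{Y_j|H_j=k,\hat{\theta}}(y_j) \cdot P(H_j = k)} = \frac{f_{Y_j|H_j=i,\hat{\theta}}(y_j)}{\sum_{k=1}^{N} f_{Y_j|H_j=k,\hat{\theta}}(y_j)}.
\tag{46}
$$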

Evaluating (45), one may observe that the estimate for µi can be seen as a weighted average of the samples. The weight can be seen as the probability that a certain localization belongs to a specific emitter; note that P(Hj = i|θ) is the probability that a localization comes from emitter i given that the positions of all emitters are known and equal θ.

If some good initial estimates, µi(0), are known, the iterative scheme shown in (47) can be used to improve the estimates, see [21]. Here Duda makes use of the expectation-maximization (EM) algorithm, which is explained extensively in [22, Paragraph 8.5].

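$$
\hat{\mu}_{i,c}(k+1) = \frac{\sum_{j=1}^{n} \frac{P(H_j = i|\hat{\mu}_{i,c}(k))}{\sigma_{j,c}}\, y_{j,c}}{\sum_{j=1}^{n} \frac{P(H_j = i|\hat{\mu}_{i,c}(k))}{\sigma_{j,c}}}, \quad c \in \{x, y, z\}.
\tag{47}
$$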
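One iteration of scheme (47) can be sketched as follows. This is an illustrative Python version, not the R implementation of the thesis. It assumes diagonal covariance matrices Σj whose diagonal entries are passed as `variances[j]` (playing the role of σj,c in (45)) and uniform priors 1/N, which cancel in (46). Numerical underflow for localizations far from every emitter is not handled.

```python
import math

def gauss_diag(y, mu, var):
    """Trivariate normal density (41) with diagonal covariance diag(var)."""
    quad = sum((yc - mc) ** 2 / vc for yc, mc, vc in zip(y, mu, var))
    det = var[0] * var[1] * var[2]
    return math.exp(-0.5 * quad) / math.sqrt((2.0 * math.pi) ** 3 * det)

def responsibilities(ys, variances, mus):
    """P(H_j = i | theta) as in (46); uniform priors cancel out."""
    resp = []
    for y, var in zip(ys, variances):
        dens = [gauss_diag(y, mu, var) for mu in mus]
        total = sum(dens)
        resp.append([d / total for d in dens])
    return resp

def em_step(ys, variances, mus):
    """One update (47) of all emitter positions."""
    resp = responsibilities(ys, variances, mus)
    new_mus = []
    for i in range(len(mus)):
        mu_i = []
        for c in range(3):
            num = sum(resp[j][i] * ys[j][c] / variances[j][c] for j in range(len(ys)))
            den = sum(resp[j][i] / variances[j][c] for j in range(len(ys)))
            mu_i.append(num / den)
        new_mus.append(tuple(mu_i))
    return new_mus
```

Repeated calls to `em_step`, starting from the initial estimates µi(0), give the iteration described above.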

8.3 Extra constraints

In reality all the emitters should be on the sphere, in other words (µi − m)′ · (µi − m) = r², where m equals the midpoint of the bead of interest and r equals the radius. We include this in our problem by introducing a Lagrange multiplier λi in (35). This gives N extra constraints of the form ⟨(µi − m), (µi − m)⟩ − r² = 0. Here ⟨x, y⟩ is the inner product of the vectors x and y. The new problem is

$$
l(\theta, \lambda|y_1, y_2, ..., y_n) = \sum_{j=1}^{n} \ln f_{Y_j|\theta}(y_j) - \sum_{i=1}^{N} \lambda_i (\mu_i^{T} \mu_i - r^{2}).
\tag{48}
$$

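The system of equations to be solved is

$$
\begin{cases}
\sum_{j=1}^{n} P(H_j = i|\theta)\, \Sigma_j^{-1} (Y_j - \mu_i) - \lambda_i \mu_i = 0 \\
\mu_i^{T} \mu_i - r^{2} = 0,
\end{cases}
\tag{49}
$$

or equivalently

$$
\begin{cases}
\sum_{j=1}^{n} P(H_j = i|\theta) \frac{(y_{j,x} - \mu_{i,x})}{\sigma_{j,x}} - \lambda_i \mu_{i,x} = 0 \\
\sum_{j=1}^{n} P(H_j = i|\theta) \frac{(y_{j,y} - \mu_{i,y})}{\sigma_{j,y}} - \lambda_i \mu_{i,y} = 0 \\
\sum_{j=1}^{n} P(H_j = i|\theta) \frac{(y_{j,z} - \mu_{i,z})}{\sigma_{j,z}} - \lambda_i \mu_{i,z} = 0 \\
\mu_{i,x}^{2} + \mu_{i,y}^{2} + \mu_{i,z}^{2} - r^{2} = 0.
\end{cases}
\tag{50}
$$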

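The different µ can be expressed as a function of λ:

$$
\begin{cases}
\mu_{i,x} = \dfrac{\sum_{j=1}^{n} \frac{P(H_j = i|\theta)}{\sigma_{j,x}}\, y_{j,x}}{\sum_{j=1}^{n} \frac{P(H_j = i|\theta)}{\sigma_{j,x}} + \lambda_i} \\[2ex]
\mu_{i,y} = \dfrac{\sum_{j=1}^{n} \frac{P(H_j = i|\theta)}{\sigma_{j,y}}\, y_{j,y}}{\sum_{j=1}^{n} \frac{P(H_j = i|\theta)}{\sigma_{j,y}} + \lambda_i} \\[2ex]
\mu_{i,z} = \dfrac{\sum_{j=1}^{n} \frac{P(H_j = i|\theta)}{\sigma_{j,z}}\, y_{j,z}}{\sum_{j=1}^{n} \frac{P(H_j = i|\theta)}{\sigma_{j,z}} + \lambda_i} \\[2ex]
\mu_{i,x}^{2} + \mu_{i,y}^{2} + \mu_{i,z}^{2} - r^{2} = 0.
\end{cases}
\tag{51}
$$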

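The first three equations can be substituted into the last one to find:

$$
\frac{\left(\sum_{j=1}^{n} P(H_j = i|\theta) \frac{y_{j,x}}{\sigma_{j,x}}\right)^{2}}{\left(\sum_{j=1}^{n} P(H_j = i|\theta) \frac{1}{\sigma_{j,x}} + \lambda_i\right)^{2}}
+ \frac{\left(\sum_{j=1}^{n} P(H_j = i|\theta) \frac{y_{j,y}}{\sigma_{j,y}}\right)^{2}}{\left(\sum_{j=1}^{n} P(H_j = i|\theta) \frac{1}{\sigma_{j,y}} + \lambda_i\right)^{2}}
+ \frac{\left(\sum_{j=1}^{n} P(H_j = i|\theta) \frac{y_{j,z}}{\sigma_{j,z}}\right)^{2}}{\left(\sum_{j=1}^{n} P(H_j = i|\theta) \frac{1}{\sigma_{j,z}} + \lambda_i\right)^{2}}
- r^{2} = 0.
\tag{52}
$$

After clearing the denominators, Equation (52) is a polynomial equation of degree 6 in λi and therefore has at most 6 solutions. In practice it turns out that most of the time there are 2 or 3 unique solutions. All the roots of (52) can be found, and for each solution λi the corresponding µi can be computed and substituted into the (log) likelihood. The λi that maximizes this likelihood, (48), is the best candidate.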
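Roots of (52) can be located numerically. The sketch below is an illustration under simplifying assumptions, not the thesis implementation: it writes a_c for the numerator sums and b_c for the denominator sums in (51), assumes all a_c are non-zero, and uses plain bisection to find only the two roots that lie outside all poles λ = −b_c. Any further roots between the poles are not searched here.

```python
import math

def constraint_gap(lam, a, b, r):
    """Left-hand side of (52): |mu(lambda)|^2 - r^2."""
    return sum((ac / (bc + lam)) ** 2 for ac, bc in zip(a, b)) - r * r

def mu_of(lam, a, b):
    """Emitter position from (51) for a given multiplier lambda."""
    return tuple(ac / (bc + lam) for ac, bc in zip(a, b))

def bisect(f, lo, hi, iters=200):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def outer_roots(a, b, r):
    """The two roots of (52) to the right and to the left of all poles."""
    f = lambda lam: constraint_gap(lam, a, b, r)
    norm_a = math.sqrt(sum(ac * ac for ac in a))
    c_min = min(range(3), key=lambda c: b[c])
    c_max = max(range(3), key=lambda c: b[c])
    # Right of the largest pole: the gap falls from +infinity to -r^2.
    right = bisect(f, -b[c_min] + 0.5 * abs(a[c_min]) / r, -b[c_min] + 2.0 * norm_a / r)
    # Left of the smallest pole: the gap rises from -r^2 to +infinity.
    left = bisect(f, -b[c_max] - 2.0 * norm_a / r, -b[c_max] - 0.5 * abs(a[c_max]) / r)
    return left, right
```

Each returned λ yields a candidate position `mu_of(lam, a, b)` on the sphere; as described above, the candidate with the highest likelihood would then be kept.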

8.4 Implementation algorithm

This algorithm to estimate the position of the emitters is implemented in the freely available statistical software R. The implementation is summarized in Algorithm 1.

Data: y1, y2, ..., yn, N, θstart, ε
Result: θend
θprev = 0; θnew = θstart; θintermediate = θstart; count = 0;
while count ≠ N do
    count = 0;
    for i in 1:N do
        if ‖θintermediate,i − θprev,i‖ < ε then
            count = count + 1;
        else
            roots = (λi,1, ..., λi,k);
            for j in 1:k do
                θcandidate = θnew;
                θcandidate,i = (µi,x(λi,j), µi,y(λi,j), µi,z(λi,j));
                if L(θcandidate, y1, ..., yn) > L(θnew, y1, ..., yn) then
                    θnew = θcandidate;
                end
            end
        end
    end
    θprev = θintermediate; θintermediate = θnew;
end
θend = θnew;

Algorithm 1: Clustering algorithm

Due to the analytical properties of the likelihood, the final clustering depends considerably on the initial clustering. The algorithm finds a local maximum which can be reached starting from our initial clustering. It is unclear whether this local maximum is close to the global maximum or not. The likelihood can be computed for n emitters, where n equals the total number of localizations. By projecting each of these localizations on the sphere, an upper bound for the maximum of the likelihood can be computed. The choice of the initial clustering is an interesting problem and is left for further investigation. In this thesis only one initial estimate is evaluated.

8.4.1 k-means

The most widely used clustering algorithm is the k-means algorithm, see e.g. [21], [22]. In this algorithm, first N points are placed randomly in the space of the observations of interest. The data and N are the only inputs of the algorithm. Next, the Euclidean distance from each observation to each of the N points is calculated, and an observation is said to belong to the point that is closest to it. From this, N clusters are obtained, and within each cluster one computes the mean of all the coordinates of the members of that cluster. The result is a set of N new points, and starting from these points the steps just described can be performed again. This process continues until the points of the clusters do not change anymore. The k-means algorithm is available in the software R. The output of the algorithm can be used as an initial estimate for the estimators. The biggest issue of the k-means algorithm is that it only takes the position of each localization into account and not its uncertainty (number of photons collected).
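The steps just described can be sketched in a few lines. This is an illustrative Python version, not the R routine referred to above; the initial centers are passed in explicitly (for example N randomly chosen points), and an empty cluster simply keeps its previous center, which is a choice made here for simplicity.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Assign each point to its nearest center, then recompute cluster means."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else old
            for members, old in zip(clusters, centers)
        ]
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    return centers
```

Note that only the coordinates enter the computation; the localization uncertainties play no role, which is the limitation pointed out above.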

8.5 Quality of the algorithm

In Table 4 the mean of the log likelihood of the outcome of the algorithm and the mean of the log likelihood using the real positions are presented. Furthermore, one can find the upper bound of the log likelihood, that is, the log likelihood using the projection of all signals on the sphere. The values in Table 4 are averages over 277 simulated datasets.

Upper bound      Real positions      Estimated positions
-9589.816        -13485.14           -13402.46

Table 4: Different values of the log likelihood for simulated datasets.

The log likelihood of the estimated positions is slightly higher than the log likelihood of the real positions. As a result one would expect that the local maximum found by the algorithm is comparable to the global maximum. In Appendix C a few examples of real positions of emitters and estimated positions can be found. Since 84 points are expected on the sphere, it is hard to determine how well the algorithm works based on these pictures. Furthermore, it is hard to come up with an MSE variant for this problem. The positions of the emitters should be compared one by one. This matching problem can be modeled as a weighted complete bipartite graph. The vertices can be divided into two equal disjoint sets R and E. Here R consists of the real locations of the dyes and E is the set of estimated locations. The weight of the edge between r and e, r ∈ R and e ∈ E, equals the Euclidean distance between these two points. Next, the Hungarian algorithm, see [41, Paragraph 3.5], can be used to find a minimum-weight (perfect) matching for this weighted graph. This weight divided by the number of dyes can be seen as the average distance between the estimated and the real point for a specific dye. D is defined as this measure and can be used as a quality measure for the algorithm. The average distance of our algorithm equals D = 38.87 nm. The average distance of the k-means algorithm equals 39.10 nm. Recall that these values are reached when the exact number of emitters is used as input for the algorithm. This difference is smaller than expected. After evaluating the implemented algorithm again, it turns out that the certainty of the localizations is only used in the labeling, that is, in determining the probability that a specific localization belongs to a specific dye, and is not used in determining the new position of the emitter. The implementation should be adjusted so that the certainty of a localization influences the weight of this localization when determining the exact position of the dyes.
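The quality measure D can be illustrated on a small example. The sketch below deliberately replaces the Hungarian algorithm by an exhaustive search over all permutations, which gives the same minimum-weight matching but is only feasible for a handful of points; the function name is illustrative.

```python
import itertools
import math

def average_matching_distance(real, estimated):
    """D: weight of the minimum-weight perfect matching divided by the number of dyes.

    Exhaustive search over permutations (small sets only), used here as a
    stand-in for the Hungarian algorithm.
    """
    if len(real) != len(estimated):
        raise ValueError("both sets must have the same size")
    best = min(
        sum(math.dist(r, estimated[j]) for r, j in zip(real, perm))
        for perm in itertools.permutations(range(len(estimated)))
    )
    return best / len(real)
```

For beads with on the order of 84 dyes, a polynomial-time assignment solver such as the Hungarian algorithm is needed instead of this brute-force search.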

The point estimate found in Section 7 can be used for the number of dyes of a specific bead, and the algorithm presented (or alternatively the k-means algorithm) can estimate the exact positions of these dyes. Together these algorithms present a first mathematically founded solution to the overcounting problem.

8.6 Taking the number of blinks per emitter into account

In [21] no assumptions are made on the behavior of each cluster. In our situation a cluster represents an emitter, and the number of members of a cluster represents the number of blinks of that emitter. Therefore, in this setting, the distribution of the number of members per cluster is known. It is not immediately clear whether this information could improve the estimation procedure. Let us investigate what the effects of this extra information would be on the estimation procedure. Given a certain θ, (39) can be calculated, which can be interpreted as an expectation of the number of blinks per emitter. This information should be incorporated in the likelihood. For now we assume that the positions of the localizations and the number of blinks per emitter are independent. We define the joint probability density function of the positions of the localizations and the number of blinks per emitter as

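$$
\begin{aligned}
L(\theta|y_1, y_2, ..., y_n, B_1, B_2, ..., B_N) &= \prod_{j=1}^{n} f_{Y_j|\theta}(y_j) \cdot \prod_{i=1}^{N} f_{B_i|\theta} \\
&= \prod_{j=1}^{n} \sum_{i=1}^{N} f_{Y_j|H_j=i,\theta_i}(y_j) \cdot P(H_j = i) \prod_{i=1}^{N} P\left(B_i = \sum_{j=1}^{n} P(H_j = i|\theta)\right) \\
&= \prod_{j=1}^{n} \sum_{i=1}^{N} f_{Y_j|H_j=i,\theta_i}(y_j) \cdot P(H_j = i) \cdot \left(\frac{\lambda_d}{\lambda_b + \lambda_d}\right)^{N} \left(\frac{\lambda_d}{\lambda_b + \lambda_d}\right)^{n}.
\end{aligned}
\tag{53}
$$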

Here $\sum_{j=1}^{n} P(H_j = i|\theta)$ should be rounded to the nearest integer. As it turns out, the information about the distribution of the number of blinks does not add any value to our estimator of the positions of the emitters. The real number of blinks per emitter can be compared to the vector of values $\sum_{j=1}^{n} P(H_j = i|\theta)$. The real number of blinks per emitter and the estimated ones do not show a very different pattern, which is what was expected.

8.7 Different approach for estimating the number of emitters

Until this section the designed algorithm was only based on the real number of emitters, which is known for the simulated datasets. In practice the estimator derived in Section 7 could be used as the number of emitters. In that section a 95% confidence interval for the number of emitters was derived as well. The algorithm described in this section could be executed for all values in this confidence interval. Next, the likelihood could be extended with the probability given in (29), and we could estimate k with the k that maximizes the likelihood. The log likelihood used here is

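$$
l(\theta, \lambda|y_1, y_2, ..., y_n) = \sum_{j=1}^{n} \ln f_{Y_j|\theta}(y_j) - \sum_{i=1}^{N} \lambda_i (\mu_i^{T} \mu_i - r^{2}) - \ln\left(P\left(X_j = k \,\middle|\, \sum_{i=1}^{X_j} Bl_i = b_j\right)\right),
\tag{54}
$$

where $f_{Y_j|\theta}(y_j) = (2\pi)^{-\frac{3}{2}} |\Sigma_j|^{-\frac{1}{2}} \exp\left(-\frac{1}{2} (Y_j - \mu_{H_j})^{T} \Sigma_j^{-1} (Y_j - \mu_{H_j})\right)$.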

To compare this estimator with the estimator from Section 7, the MSEs should be compared. The MSE for this estimator is determined in the same way as in Section 7. Since the EM algorithm is computationally intensive, this will be done in the upcoming months. Recall that this estimator would only depend on the way the uncertainties of the localizations are modeled and would not depend on any part of the Markov model.

9 Adapt dSTORM images

At this moment a method to estimate the number of (non-bleached) dyes at equilibrium is available. In addition, an algorithm to estimate the original positions of emitters has been introduced. In this section the dSTORM imaging will be extended to end up with an image of the locations of the emitters. It should be remarked that the number of photons for the real dSTORM data did not seem to be appropriate. If these numbers were used, there would be localizations that could not lie on the bead's surface. To solve this problem the number of photons was divided by a factor such that the mean number of photons equals the expected number of photons based on our Markov model.

First of all, the number of dyes is estimated. In Figure 16 the probability distribution for the number of dyes can be found.

Figure 16: Probability distribution for the number of dyes.

The maximum likelihood estimator equals 78 and the 95% confidence interval equals (67, 88). Using this point estimate, the algorithm for estimating positions can be used. The upper bound for the log likelihood (based on the projections of all the points on the sphere) equals −28682.69. The algorithm finds an emitter distribution with log likelihood −36033.89. In Figure 17 one can observe the localizations (the original dSTORM image). In Figure 18 one can find the estimated positions of the emitters.

Figure 17: dSTORM image.

Figure 18: Processed dSTORM image.

10 Conclusion, discussion and recommendations

In this thesis the dSTORM technique was described and a Markov chain model to describe the dynamics involved in the imaging process was presented. Next, an approach to estimate the model parameters as well as a method to pre-validate the different parts of the model (GoF test) were proposed. The mathematical model was implemented in Java and the statistical software R to construct datasets comparable to the output of dSTORM imaging. This model can help in understanding the dynamics, and the generated datasets can be used to make statements about the quality of estimators for the number of dyes on a bead and the localizations of these dyes. In Section 6.1 the final output of the simulation was compared to the real dSTORM images, from which we conclude that the Markov simulation generates comparable images.

Mathematical Model The model presented in this thesis is the first mathematical model to describe and simulate the dSTORM imaging process. From the data analysis in Section 4 it became clear that the model is too simple to describe the dynamics. The data was still being gathered at the moment of writing. If this dataset is treated as the final one, then the data suggests that reactivation of bleached dyes may exist. The Markov chain model could be extended with a rate from the bleached state to the active state. The estimation of the transition rates will become more complex, but remains possible. Furthermore, the state probabilities can be determined in the same way as before, see Appendix D. Besides possible reactivation, one should think of two dark states. To include this a hidden Markov model should be used, so the model will become more complex.

Homogeneity assumption In Section 6.2 the homogeneity assumption was validated. Testing whether the sources of clustered data are clustered as well turned out to be a complex problem. The test statistic presented has been shown to have a power of 94.0% for a specific type of inhomogeneous dataset. It is important to note that, next to the homogeneity assumption, the whole Markov model was tested. We concluded that the homogeneity assumption should be rejected and that the real datasets cannot be the result of homogeneously distributed dyes. Since the Markov model did not fit the data perfectly, it is unclear whether the rejection is due to the lack of fit of the Markov model or to the homogeneity assumption not being appropriate. It is important to repeat that the dataset is still being gathered and that the test should be repeated with the improved dataset. We conclude that the random functionalization assumption, which is common in NP literature, should be investigated intensively in the future. Specialists should think about the most probable inhomogeneous distribution of the dyes, for which the power of the test can then be computed.

Number of dyes on a bead The estimators presented in this thesis do not explicitly use the homogeneity assumption and could also be valuable if the dyes are inhomogeneously distributed. The overcounting problem is tackled by the development of an estimator for the number of (non-bleached) emitters at the equilibrium time based on the total number of localizations of that bead. A script to determine the most likely point estimate as well as a 95% confidence interval for the number of emitters has been written in R. Based on the MSE (46.0) we concluded that this estimator performs better than the unbiased estimator that divides the total number of localizations by the expected number of blinks (79.6). We conclude that it might prove very hard to find a better estimator based on this underlying model. It is important to note that this estimator only depends on the distribution of the number of blinks per emitter (here the geometric distribution was used). Based on the data analysis, see Section 4.1.1, the null hypothesis that the number of blinks is geometrically distributed should be rejected. The estimator is based on this distribution and will not be appropriate if the geometric distribution does not fit. Nevertheless, the data available at this point has not been evaluated yet and will be adapted by the specialists. As pointed out in Section 4.2, the data from the dSTORM imaging only consists of those blinks that emitted more photons than the threshold for the software. For this reason Bl ∼ Bin(Nblinks, P(TON > 0.007)) and is not expected to follow a geometric distribution. Besides, based on the geometric probability plot we conclude that a large portion of the data does fit the distribution. To solve the overcounting problem, the exact positions of the estimated number of dyes should be estimated as well.

A solution for overcounting An EM algorithm has been presented to solve the inverse problem of determining the real positions of the emitters (non-bleached at the equilibrium time). This works independently of the Markov model and only uses a point estimate for the number of emitters. Now dSTORM images can be adapted to produce images that really show the locations of the dyes, and one can observe the distribution of the emitters over the NP. The presented algorithm depends on the initial positions of the emitters that serve as input for the algorithm. In this thesis the outcome of the k-means algorithm was used as an initial estimate. In the future, other initial estimates, especially estimates that do take into account the accuracy of each point, could be evaluated. The algorithm can only find a local maximum, and it is hard to say whether it is near the global maximum. To determine the quality of the algorithm, simulated datasets, for which the locations of the dyes are known, were used. In Section 8.5 the log likelihood value of the final estimate (−13402.46) and the value of the real positions of the emitters (−13485.14) were compared. The log likelihood value of the estimates was slightly higher than the value of the real positions, and for this reason we concluded that using the outcome of the k-means algorithm as an initial estimate results in a local maximum that is close to the global maximum value. The quality measure of the algorithm equals the weight of the minimum-weight matching in a weighted complete bipartite graph. Here the Euclidean distance between an estimated and a real position is used as the weight of an edge. Using this measure we conclude that the proposed algorithm is slightly more appropriate in this setting than a standard k-means algorithm. The difference was not satisfying, and a problem in the algorithm was discovered. The implemented algorithm should use the certainty of a localization when the localizations are weighted to determine the position of the real emitters. At this moment the certainty is only used in the labeling part of the implemented algorithm. Together with the point estimate for the number of dyes on a bead, this algorithm presents a first mathematically founded solution to the overcounting problem.

By including the number of photons as an unknown random variable in the likelihood used in Section 8.3, another opportunity to estimate the number of non-bleached dyes at equilibrium was noticed. With this estimate for the number of dyes, all the estimators are completely independent of the Markov model. The estimate only depends on the likelihood, which includes a Poisson probability (regarding the number of dyes on the whole sphere) and the different trivariate normal distributions for the positions of the localizations.

Here the estimator for the number of dyes and the algorithm to estimate the position of these dyes are evaluated separately. Eventually one would be interested in the performance of the image constructed using both procedures. This is yet another challenge for the future.

Double blinking The effect of double blinking was not incorporated in our model, because it would highly complicate the distributions used in this thesis. Given the data used in this thesis, on average 66 of the expected 390 blinks would be double blinks. The off-time data used in this thesis did not seem appropriate for the process. As a result the whole process took place in a very short time period, which results in a large number of double blinks. When this dataset has been evaluated and improved, better conclusions can be drawn. Even though the specialists expect that this effect will be negligible, the reader should be aware of this problem.

Real number of dyes on a bead In Section 7.2 a method to estimate the real number of dyes on a bead is presented. In that section an extra opportunity to validate the time span of the Markov model was also presented. The data used in this thesis showed a lack of fit for the off-time distribution, for which reason this extra opportunity is inappropriate. Since no data are available on dyes that are bleached before the equilibrium time has been reached, we cannot do better than estimate the number of bleached dyes this way, and no information about their positions can be given at all.

Variances of Wang In Section 2.4 the results of the thesis of W. Wang were presented. From experiments it is known that the expected number of emitted photons per activation, per dye, is 4000. Wang's formulas would suggest that the resolution of the dSTORM imaging technique is even more precise than 10 nm in the x and y directions and 20 nm in the z direction. This is much smaller than the 20 nm and 50 nm that are measured. In the upcoming months a calibration curve between the number of emitted photons and the certainty should be constructed. In addition, the measurement of the number of emitted photons should be checked. The simulation presented in this thesis can very easily be adapted to another variance measure. All the other estimators presented here can be used in the same way as was shown.


Direct application of the thesis The practical applications of the estimators and the algorithm described in this thesis can be found in Section 9. There a real dSTORM image is processed to solve the overcounting problem and to show how the dyes on this bead are distributed. The results presented in this thesis are based on one specific bead size (radius of 165 nm). The parameters and the quality measures of the estimates could be determined for different bead sizes in the future. Next we could search for relations between the size of the beads, the functionalization level and the transition parameters. The quality measures of the different algorithms presented can also be determined for the different functionalization levels.


References

[1] Ehrenberg, M.: Super-resolved fluorescence microscopy; Scientific background on the Nobel Prize in Chemistry 2014, The Royal Swedish Academy of Sciences, accessed 18-02-2015: http://www.nobelprize.org/nobel_prizes/chemistry/laureates/2014/advanced-chemistryprize2014.pdf.

[2] Schermelleh, L., Heintzmann, R., Leonhardt, H.: A guide to super-resolution fluorescence microscopy, Journal of Cell Biology 190 (2), pp. 165-175, 2010.

[3] Farokhzad, O.C., Langer, R.: Impact of nanotechnology on drug delivery, ACS Nano 3 (1), pp. 16-20, 2009.

[4] Lin, G., Zhang, H., Huang, L.: Smart polymeric nanoparticles for cancer gene delivery, Molecular Pharmaceutics 12 (2), pp. 314-321, 2015.

[5] Silfies, J.S., Schwartz, S.A., Davidson, M.W.: Stochastic Optical Reconstruction Microscopy (STORM), accessed 18-02-2015: http://www.microscopyu.com/articles/superresolution/stormintro.html.

[6] Rust, M.J., Bates, M., Zhuang, X.: Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM), Nature Methods 3 (10), pp. 793-795, 2006.

[7] Huang, B., Wang, W., Bates, M., Zhuang, X.: Three-dimensional super-resolution imaging by stochastic optical reconstruction microscopy, Science 319 (5864), pp. 810-813, 2008.

[8] Dempsey, G.T., Vaughan, J.C., Chen, K.H., Bates, M., Zhuang, X.: Evaluation of fluorophores for optimal performance in localization-based super-resolution imaging, Nature Methods 8 (12), pp. 1027-1040, 2011.

[9] Sengupta, P., Jovanovic-Talisman, T., Skoko, D., Veatch, S.L., Lippincott-Schwartz, J.: Probing protein heterogeneity in the plasma membrane using PALM and pair correlation analysis, Nature Methods 8 (11), pp. 969-975, 2011.

[10] Zhao, Z.W., Roy, R., Gebhardt, J.C.M., Suter, D.M., Chapman, A.R., Xie, X.S.: Spatial organization of RNA polymerase II inside a mammalian cell nucleus revealed by reflected light-sheet superresolution microscopy, Proceedings of the National Academy of Sciences 111 (2), pp. 681-686, 2014.

[11] Lee, S.-H., Shin, J.Y., Lee, A., Bustamante, C.: Counting single photoactivatable fluorescent molecules by photoactivated localization microscopy (PALM), Proceedings of the National Academy of Sciences of the United States of America 109 (43), pp. 17436-17441, 2012.


[12] Rollins, G.C., Shin, J.Y., Bustamante, C., Pressé, S.: Stochastic approach to the molecular counting problem in superresolution microscopy, Proceedings of the National Academy of Sciences 112 (2), pp. 110-118, 2015.

[13] Albertazzi, L., Van Der Zwaag, D., Leenders, C.M.A., Van Der Hofstad, R.W., Meijer, E.W.: Probing exchange pathways in one-dimensional aggregates with super-resolution microscopy, Science 344 (6183), pp. 491-495, 2014.

[14] Thompson, M.A., Lew, M.D., Moerner, W.E.: Extending microscopic resolution with single-molecule imaging and active control, Annual Review of Biophysics 41 (1), pp. 321-342, 2012.

[15] Holtzer, L., Meckel, T., Schmidt, T.: Nanometric three-dimensional tracking of individual quantum dots in cells, Applied Physics Letters 90 (5), 2007.

[16] Holtzer, L., Schmidt, T.: Single particle tracking and single molecule energy transfer, Chapter 2: The Tracking of Individual Molecules in Cells and Tissues, pp. 25-42, Wiley-VCH, 2010.

[17] Holtzer, L.: Chapter 2, Nanometric three-dimensional tracking of quantum dots in living cells, 2009.

[18] Wang, W.: Structures and Dynamics in Live Bacteria Revealed by Super-Resolution Fluorescence Microscopy, Harvard University, Cambridge, Massachusetts, 2012.

[19] Schutz, G.J., Pastushenko, V.Ph., Gruber, H.J., Pragl, B., Schindler, H.: 3D Imaging of Individual Ion Channels in Live Cells at 40 nm Resolution, Single Molecules 1 (1), pp. 25-31, 2000.

[20] Anderson, C.M., Georgiou, G.N., Morrison, I.E.G., Stevenson, G.V.W., Cherry, R.J.: Tracking of cell surface receptors by fluorescence digital imaging microscopy using a charge-coupled device camera. Low-density lipoprotein and influenza virus receptor mobility at 4 °C, Journal of Cell Science 101 (2), pp. 415-425, 1992.

[21] Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, John Wiley & Sons, 2001.

[22] Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2009.

[23] Kulldorff, M.: Tests of spatial randomness adjusted for an inhomogeneity: A general framework, Journal of the American Statistical Association 101 (475), pp. 1289-1305, 2006.

[24] Voinov, V.G., Nikulin, M.S.: Chi-square goodness-of-fit test for one- and multidimensional discrete distributions, Journal of Mathematical Sciences 68 (4), pp. 438-450, 1994.


[25] Feng, J., Ip, H.H.S.: Chi-square goodness-of-fit test of 3D point correspondence for model similarity measure and analysis, Lecture Notes in Computer Science 3568, pp. 445-453, 2005.

[26] Voinov, V., Nikulin, M., Balakrishnan, N.: Chi-Squared Goodness of Fit Tests with Applications, Academic Press, 2013.

[27] Rogerson, P.A.: The detection of clusters using a spatial version of the chi-square goodness-of-fit statistic, Geographical Analysis 31 (2), pp. 130-147, 1999.

[28] Oden, N.: Adjusting Moran’s I for population density, Statistics in Medicine 14 (1), pp. 17-26, 1995.

[29] Tango, T.: A class of tests for detecting ’general’ and ’focused’ clustering of rare diseases, Statistics in Medicine 14 (21-22), pp. 2323-2334, 1995.

[30] Dwass, M.: Modified Randomization Tests for Nonparametric Hypotheses, Ann. Math. Statist. 28 (1), pp. 181-187, 1957.

[31] Kulldorff, M.: A spatial scan statistic, Communications in Statistics - Theory and Methods 26 (6), pp. 1481-1496, 1997.

[32] Kulldorff, M., Tango, T., Park, P.J.: Power comparisons for disease clustering tests, Computational Statistics and Data Analysis 42 (4), pp. 665-684, 2003.

[33] Abramovich, F.: Statistical theory; A concise introduction, CRC Press, 2013.

[34] Castro, R.: Lecture 2 and 3 - Goodness-of-Fit (GoF) Tests, Lecture notes of Applied Statistics (MasterMath), 2013.

[35] D’Agostino, R.B., Stephens, M.A.: Goodness-of-fit Techniques, Volume 68 of Statistics: Textbooks and Monographs, Dekker, New York, 1986.

[36] Streit, R.L.: Poisson Point Processes: Imaging, Tracking, and Sensing, Springer-Verlag, New York, 2010.

[37] Resnick, S.: Extreme Values, Regular Variation, and Point Processes, Volume 4 of Applied Probability. A Series of the Applied Probability Trust, Springer-Verlag, New York, 1987.

[38] Grimmett, G., Stirzaker, D.: Probability and Random Processes, Oxford University Press, New York, 2001.

[39] Ross, S.M.: Introduction to Probability Models, Ninth Edition, Academic Press, Inc., Orlando, 2006.

[40] Marsaglia, G.: Choosing a Point from the Surface of a Sphere, Ann. Math. Stat. 43, pp. 645-646, 1972.

[41] Schrijver, A.: A Course in Combinatorial Optimization, CWI Amsterdam, 2012.


A Goodness-of-fit plots

A.1 Number of blinks original datasets

[Plot not reproduced; horizontal axis: number of blinks, vertical axis: probability.]

Figure 19: PDF geometric distribution and frequency counts

[Plot not reproduced; horizontal axis: theoretical values, vertical axis: empirical quantiles.]

Figure 20: Geometric probability plot for MLE geometric distribution and frequency counts


A.2 Number of blinks outliers removed

[Plot not reproduced; horizontal axis: number of blinks, vertical axis: probability.]

Figure 21: PDF geometric distribution and frequency counts

[Plot not reproduced; horizontal axis: theoretical values, vertical axis: empirical quantiles.]

Figure 22: Geometric probability plot for MLE geometric distribution and frequency counts


A.3 On-time

[Plot not reproduced; horizontal axis: on time, vertical axis: probability density.]

Figure 23: PDF exponential distribution and frequency counts

[Plot not reproduced; horizontal axis: theoretical values, vertical axis: empirical quantiles.]

Figure 24: Exponential probability plot and frequency counts


B Real and simulated images

Figure 25: Real STORM image.

Figure 26: Real STORM image.

Figure 27: Simulated image.

Figure 28: Simulated image.

C Real and estimated positions emitters

Figure 29: Real positions emitters (green) versus estimated positions (red), D = 28.01nm.


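D Calculations $p_{A,t}$

First of all the $Q$ matrix was determined,

\[
Q = \begin{pmatrix}
-0.0725 & 0.0725 & 0 \\
31.32 & -37.445 & 6.125 \\
0 & 0 & 0
\end{pmatrix}. \tag{55}
\]

Next the diagonal matrix $D$ with the eigenvalues as elements and the matrix $U$ consisting of the eigenvectors can be determined,

\[
D = \begin{pmatrix}
-37.5057 & 0 & 0 \\
0 & -0.0118399 & 0 \\
0 & 0 & 0
\end{pmatrix}, \tag{56}
\]

\[
U = \begin{pmatrix}
-0.00193678 & 0.766953 & 0.57735 \\
0.999998 & 0.641703 & 0.57735 \\
0 & 0 & 0.57735
\end{pmatrix}. \tag{57}
\]

Since $p_{A,t} = U e^{tD} U^{-1}$, the probability of an emitter being in the different states at time $t$, given that it started in the active state, equals

\[
p_{A,t} =
\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
+ e^{-37.5057 t}
\begin{pmatrix} 0.00161787 \\ -0.00193365 \\ 0.000315782 \end{pmatrix}
+ e^{-0.0118399 t}
\begin{pmatrix} 0.998382 \\ 0.00193365 \\ -1.00032 \end{pmatrix}. \tag{58}
\]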
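The closed form in (58) can be cross-checked numerically. The following is a minimal sketch in Python, not the thesis implementation; the helper names `matmul`, `expm` and `p_closed_form` are illustrative. It assumes that $p_{A,t}$ is the first row of $e^{tQ}$, which is consistent with (58) evaluating to approximately $(1, 0, 0)$ at $t = 0$. The matrix exponential is approximated by a truncated Taylor series combined with repeated squaring, so the comparison holds only up to the rounding of the coefficients printed in (58).

```python
import math

# Generator matrix Q from equation (55).
Q = [
    [-0.0725, 0.0725, 0.0],
    [31.32, -37.445, 6.125],
    [0.0, 0.0, 0.0],
]


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(q, t, squarings=20, terms=12):
    """Approximate exp(t*q) via Taylor series plus scaling and squaring."""
    n = len(q)
    scale = t / 2 ** squarings
    step = [[x * scale for x in row] for row in q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes step^k / k!
        term = [[x / k for x in row] for row in matmul(term, step)]
        result = [[r + s for r, s in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def p_closed_form(t):
    """State probabilities at time t according to equation (58)."""
    fast = math.exp(-37.5057 * t)
    slow = math.exp(-0.0118399 * t)
    return [
        0.00161787 * fast + 0.998382 * slow,
        -0.00193365 * fast + 0.00193365 * slow,
        1.0 + 0.000315782 * fast - 1.00032 * slow,
    ]


for t in (0.0, 1.0, 100.0):
    print(t, expm(Q, t)[0], p_closed_form(t))
```

Both computations should agree to within the precision of the rounded coefficients, and the three probabilities should sum to approximately one for every $t$.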
