

Automatic Cell Detection in Bright-Field Microscope Images Using SIFT, Random Forests, and Hierarchical Clustering

Firas Mualla¹, Simon Schöll¹,²,³, Björn Sommerfeldt⁴, Andreas Maier¹,³, and Joachim Hornegger, Member, IEEE¹,³

¹Pattern Recognition Lab, Friedrich-Alexander University Erlangen-Nuremberg
²ASTRUM IT GmbH, Erlangen
³SAOT Graduate School in Advanced Optical Technologies
⁴Institute of Bioprocess Engineering, Friedrich-Alexander University Erlangen-Nuremberg

Abstract—We present a novel machine learning-based system for unstained cell detection in bright-field microscope images. The system is fully automatic since it requires no manual parameter tuning. It is also highly invariant with respect to illumination conditions and to the size and orientation of cells. Images from two adherent cell lines and one suspension cell line were used in the evaluation for a total number of more than 3500 cells. Besides real images, simulated images were also used in the evaluation. The detection error was between approximately zero and 15.5%, which is a significantly superior performance compared to baseline approaches.

1. INTRODUCTION

Though fluorescence microscopy is widely used, many applications require the use of unstained cell imaging. One of the motivating factors is that the signal obtained by fluorescence microscopy cannot be guaranteed to cover the entire cell, and therefore it is not suited for cell boundary segmentation [2]. Such shape information is important when studying cellular processes like mitosis and apoptosis [35]. Furthermore, the fluorescent dyes are sometimes considered non-neutral or even toxic. For instance, cell-tracing dyes used in cell viability identification and cytotoxicity tests dramatically change the cell stiffness and the cell-to-probe adhesion [24]. Phase-contrast microscopy and bright-field microscopy can be used for unstained cell imaging. Compared to phase-contrast, bright-field imaging does not need specialized hardware and is thus cheaper and easier to implement [9].

Unstained cell recognition in bright-field images is a challenging problem [20], [21], [36], [38], [42]. Cells exhibit a great diversity in shape and size. In addition, a simple microscopy technique like bright-field does not always offer sufficient contrast between cells and background [8], [36]. In fact, adherent cells are almost invisible at focus [1], [2], [4]. The contrast can be improved by defocusing the microscope. An image is considered positively defocused when the objective approaches the object and negatively defocused otherwise [1].

Copyright (c) 2013 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].

In [4], a positively defocused image was segmented by applying the watershed algorithm on the distance transform of a thresholded image. An in-focus image was processed with the self-complementary top-hat followed by thresholding and watershed. A negatively defocused image was preprocessed with a Canny edge detector before being analyzed by anisotropic contour completion [10]. These three results were then combined in order to select micro-injection points inside the cells.

In [3], a positively defocused image was subtracted from a negatively defocused one. The difference image was thresholded and the result was post-processed with an appropriately chosen size filter. Each connected component of the resulting mask was used to initialize a level-set evolution.

The aforementioned techniques are based on heuristics. In contrast, there is a family of methods for cell detection that are learning-based. In these approaches, a classifier labels each pixel as either a cell or a non-cell pixel. The output of the pixel-wise classifier delivers a confidence map. The maxima of the latter correspond to cell centers.

For training the classifier, features are extracted from a fixed-size patch sampled in the neighborhood of the pixel under investigation. Several papers share this common strategy despite the differences in the classifier model and the image modality. For example, [31] used principal component analysis (PCA) features extracted from 15 × 15 patches, which were then analyzed by an artificial neural network. In [20], Fisher discriminant analysis was used instead of PCA. In [21], a support vector machine (SVM) replaced the neural network. In [41], a bag of local Bayesian classifiers was used, each of them trained on a cluster of the training data.

Most of these learning-based approaches require parameter tuning. These parameters are related to thresholding the confidence map, applying morphological operators on the thresholding result, and/or searching for the maxima of the confidence map. In all mentioned learning-based approaches, the optimal size of the patch should conform to the mean cell size, which is not always known a priori. In addition, the square neighborhood used in these methods does not fit non-circular cells.



[Fig. 1 pipeline diagram: SIFT keypoints → keypoint classifier (cell/background) → profile classifier (inner/cross) → hierarchical clustering → weighted average.]

(a) A positively defocused input image. The contrast was improved for clarity.

(b) Each circle represents a keypoint. The radius lengths and angles correspond to the keypoint scale and orientation.

(c) Each keypoint is classified as either a cell keypoint or a background keypoint.

(d) A profile between each two nearby keypoints is classified as either "inner" or "cross" profile.

(e) The result of the agglomerative hierarchical clustering.

(f) The detection result. Each sign marks a cell.

Fig. 1. A general overview of our automatic cell detection approach.

We think that a good cell detection system should fulfill the following criteria: a) It should be invariant to cell size and orientation. b) It should be invariant to illumination conditions. c) It should be automatic, i.e. it should not need manual parameter tuning. d) If tracking is required after detection, then the detected cells should be equipped with trackable features. e) Lastly, the degradation of the detection rate because of the previous invariance requirements should be minimal. With these criteria in mind, we developed a learning-based system for automatic unstained cell detection in bright-field images.

We tested the system on three cell lines: adherent CHO, adherent L929, and Sf21 in suspension. The adherent cells had very low contrast and were almost invisible at focus. In addition, a cell simulation software [17] was used to generate synthetic images with noise and illumination artifacts. All analyses on the real and the simulated images were done without any manual parameter tuning. The only difference between these experiments was in the training data.

A. System overview

A cell is expressed in bright-field images as one or more blobs in intensity. A blob is a maximum of the normalized Laplacian in scale space [19]. It is also called an interest point or a keypoint. SIFT [22] is a class of local image features which characterize the neighborhood of each keypoint in a scale- and orientation-independent way.

Our proposed cell detection algorithm is sketched in Figure 1. The starting point is the extraction of SIFT keypoints from a defocused image. These keypoints are classified into cell keypoints and background keypoints. For each cell keypoint, an intensity profile to each nearby cell keypoint is extracted and classified into either inner or cross profile. A profile between two keypoints is called inner if these two keypoints belong to the same cell. Otherwise, it is called a cross profile. The output of the profile classifier is probabilistic. The probability that a profile between two keypoints is an inner profile can be seen as a similarity measure between the keypoints. Based on this similarity measure, an agglomerative hierarchical clustering of the keypoints with a customized linkage method is applied. A weighted mean of the keypoint coordinates inside each cluster marks the detected cell center.

No manual tuning of any parameters is required during training or detection. The system learns its parameters automatically in a scale- and orientation-invariant manner.

The rest of the paper is organized as follows: Section 2 describes the features and the classifier model of the keypoint classifier. Section 3 discusses the different aspects of the profile learning. In Section 4, the hierarchical clustering step is explained. Section 5 provides details of the training phase. The detection quality measures are defined in Section 6. Section 7 contains the results. The paper ends with a discussion in Section 8 and a conclusion in Section 9.

2. KEYPOINT LEARNING

In order to determine whether the cells in the training data tend to have a bright or a dark appearance, we perform a calibration step before the actual cell detection procedure. In the remainder of this paper, we call a set of keypoints one-sided if all the keypoints in the set have positive difference-of-Gaussians (DOG) values or all of them have negative DOG values. In order to automatically detect the keypoint type which fits the training data, the system uses the following measure:


$$\mathrm{PositiveFit} = \frac{\sum_{i=1}^{N_C} s(p_i)\,|\mathrm{DOG}(p_i)|\,H(\mathrm{DOG}(p_i))}{\sum_{i=1}^{N_C} s(p_i)\,|\mathrm{DOG}(p_i)|} \tag{1}$$

where pi, i = 1..NC, are the cell keypoints in the training data, NC is their number, s is the scale, and H is the Heaviside step function. If PositiveFit evaluates to more than 0.5, the system ignores keypoints with negative DOG values during training and detection. Otherwise, it ignores keypoints with positive DOG values.

After that, the maximum |DOG| in each training cell is computed and the first percentile of all the previous maxima is considered a SIFT threshold. During training and detection, SIFT keypoints which have a lower |DOG| than this threshold will be discarded.
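To make the calibration concrete, the two steps above fit in a few lines. The following is a minimal sketch, assuming the cell keypoints' scales, DOG values, and the per-cell maxima of |DOG| are already available as NumPy arrays; the function names are ours, not the paper's.

```python
import numpy as np

def positive_fit(scales, dogs):
    """Eq. (1): scale- and |DOG|-weighted fraction of cell keypoints
    whose DOG value is positive."""
    w = scales * np.abs(dogs)            # s(p_i) * |DOG(p_i)|
    return w[dogs > 0].sum() / w.sum()   # Heaviside keeps the positive-DOG terms

def sift_dog_threshold(per_cell_max_abs_dog):
    """First percentile of the per-cell maxima of |DOG| (Section 2)."""
    return np.percentile(per_cell_max_abs_dog, 1)

# Keypoints whose |DOG| falls below the threshold are discarded; the sign
# of the DOG values kept is decided by positive_fit(...) > 0.5.
```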

In order to separate cell keypoints from non-cell keypoints, we used several sets of features from the literature and adjusted them. We utilized SIFT to make these features scale- and orientation-invariant. We also made them invariant to local shift of intensity. The importance of invariance to the local shift of intensity is discussed in Section 8.

Fig. 2. Radial intensity stencil. The stencil is aligned with the keypoint orientation. The distance between two successive nodes is 0.3 s, where s is the keypoint scale.

Fig. 3. Ray features adjusted to keypoints. This figure is an adapted version of the pixel-based ray features in [34].

More specifically, the following set of features was extracted for each keypoint:

1) Intensity stencil: We computed intensity stencils [12], [26] around each keypoint. As shown in Figure 2, we align the stencil with the keypoint orientation and measure the distance between the sampling points in units of scale instead of pixels. The result is a scale- and orientation-invariant stencil. In order to make the stencil invariant to the local shift of intensity, we subtract the mean intensity of the stencil from all stencil nodes. The intensity values at the stencil nodes after the aforementioned subtraction form the stencil feature set (see the sketch after this list).

2) Ray features: In order to compute ray features [34] at a keypoint p (cf. Figure 3) in an image I, the closest edge point p′ along a direction θk is found. Ray features according to [34] are four sets of features. The first three are: the distance Rd(p, θk) between p and p′, the gradient norm Rn(p, θk) at p′, and the gradient angle α = Ra(p, θk) at p′. If eight values of the angle θ are used, i.e. k = 1..8, we obtain 24 features. The fourth set is the distance difference Rdd(p, θk, θk′) = |Rd(p, θk) − Rd(p, θk′)|. Therefore, for 8 angles, the fourth feature set contains 8 · 7/2 = 28 features.

Ray features are well-designed but are sensitive to scale and orientation. In order to make them orientation-invariant, we define all angles, i.e. the eight θk angles and the gradient angle feature Ra(p, θk), with respect to the keypoint orientation. In order to make the ray features scale-invariant, we measure the distances Rd(p, θk) and Rdd(p, θk, θk′) in terms of scale. Furthermore, we compute the gradient using the following equation for its x component:

$$\frac{\partial I(p')}{\partial x} = I(p'_x + \tau s(p),\ p'_y) - I(p'_x,\ p'_y) \tag{2}$$

where s(p) is the scale of the keypoint at p, and τ is a constant that we set to 1. A similar equation is used for the y component. Before applying Eq. (2), the image is smoothed using a Gaussian kernel with a standard deviation equal to 1. The edges are obtained using Canny edge detection [6]. The thresholds were set to the default values in the Matlab implementation of Canny.

3) Variance map: Based on the variance map [40], we created a keypoint-based scale-invariant version by taking a variable-size neighborhood with a size proportional to the keypoint scale. For each keypoint p, we extract three variance map features VMap(p, 2), VMap(p, 4), and VMap(p, 6), which correspond to the variance map in a square neighborhood of side length 2s(p), 4s(p), and 6s(p), respectively. The map is by construction invariant to the local shift of intensity.

4) SIFT descriptors: SIFT features were used according to the original publication [22], as they are inherently scale- and orientation-invariant and partially illumination-invariant.

5) Other features: The values of the DOG and the principal curvatures ratio (PCR) at each keypoint were also obtained from SIFT.
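To make the stencil and variance map constructions above concrete (items 1 and 3), here is a minimal sketch, assuming a keypoint is given as an (x, y, scale, orientation) tuple and the image as a 2-D NumPy array; the single-ray stencil layout and all helper names are illustrative assumptions on our part, not the authors' code.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def stencil_features(img, kp, n_nodes=8, step=0.3):
    """Scale- and orientation-invariant intensity stencil (cf. Fig. 2):
    nodes along the keypoint orientation, spaced 0.3*s apart, with the
    stencil mean subtracted for invariance to a local intensity shift."""
    x, y, s, theta = kp
    d = step * s * np.arange(1, n_nodes + 1)          # distances in units of scale
    xs, ys = x + d * np.cos(theta), y + d * np.sin(theta)
    vals = map_coordinates(img, [ys, xs], order=1)    # bilinear sampling
    return vals - vals.mean()

def variance_map_features(img, kp, factors=(2, 4, 6)):
    """VMap(p, k): intensity variance in a square window of side k*s(p)."""
    x, y, s, _ = kp
    feats = []
    for k in factors:
        h = max(1, int(round(k * s / 2)))             # half window side
        win = img[max(0, int(y) - h): int(y) + h + 1,
                  max(0, int(x) - h): int(x) + h + 1]
        feats.append(win.var())
    return np.array(feats)
```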

Page 4: Automatic Cell Detection in Bright-Field Microscope Images ...€¦ · A cell is expressed in bright-field images as one or more blobs in intensity. A blob is a maximum of the normalized

4

We chose random forest (RF) [5] as the cell/background classifier. It is an ensemble of classifiers, each of which is a decision tree trained on a bootstrap sample of the training data. During training, splits are based on a feature at each node. In contrast to conventional decision trees, only a random subset of the features is considered at each node of an RF tree. In order to be able to deal with imbalanced class labels, we chose a balanced random forest (BRF) [7]. Similar to [14], we set the number of trees to 500 and the number of randomly selected features to FN/5, where FN is the number of features.
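A close stand-in for this configuration can be set up with scikit-learn. Note that the 'balanced_subsample' class weighting only approximates the per-bootstrap down-sampling of the BRF in [7]; this is a hedged sketch, not the authors' setup.

```python
from sklearn.ensemble import RandomForestClassifier

def make_keypoint_classifier(n_features):
    """500 trees, FN/5 features tried per split, class balancing per
    bootstrap sample (approximating the balanced random forest of [7])."""
    return RandomForestClassifier(
        n_estimators=500,
        max_features=max(1, n_features // 5),
        class_weight="balanced_subsample",
    )
```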

3. PROFILE LEARNING

The keypoints which were classified as background by the keypoint classifier are discarded. The remaining ones are cell keypoints. In order to resolve whether two cell keypoints belong to the same cell, we extract an intensity profile f(v) between them, where v = 1..NF are the sampling points along the profile, and NF is the number of points. We then extract profile features and use them to classify the profile into either an inner or a cross profile.

The following features are extracted for each intensity profile f(v):

• The standard deviation, the skewness, and the kurtosis of the profile.
• The standard deviation, the skewness, the kurtosis, the maximum, the minimum, and the mean of the first derivative df/dv and the second derivative d²f/dv² of the profile.
• Two other features:

$$V_1 = \max(f) - \min(f) \tag{3}$$

$$V_2 = f(1) - 2\max(f) + f(N_F) \tag{4}$$

Fig. 4. Profile sets between different keypoints.

The profiles are sampled using a fixed number of points NF = 50. The derivatives are Gaussian derivatives with σ = 0.1 NF.
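The 17-dimensional profile feature vector can then be computed along the following lines; a minimal sketch using SciPy's Gaussian derivative filters under the stated NF = 50 and σ = 0.1 NF (the function name is ours).

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.stats import skew, kurtosis

N_F = 50  # fixed number of sampling points per profile

def profile_features(f):
    """Feature vector of one intensity profile f(v), v = 1..N_F.
    Derivatives are Gaussian derivatives with sigma = 0.1 * N_F."""
    sigma = 0.1 * N_F
    d1 = gaussian_filter1d(f, sigma, order=1)   # df/dv
    d2 = gaussian_filter1d(f, sigma, order=2)   # d2f/dv2
    feats = [np.std(f), skew(f), kurtosis(f)]
    for d in (d1, d2):
        feats += [np.std(d), skew(d), kurtosis(d), d.max(), d.min(), d.mean()]
    feats.append(f.max() - f.min())             # V1, Eq. (3)
    feats.append(f[0] - 2 * f.max() + f[-1])    # V2, Eq. (4)
    return np.array(feats)
```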

As a classifier model, we again use a BRF.

For the training of the profile classifier and during the detection phase, the algorithm extracts a profile between two keypoints only if they are nearby. In order to achieve this goal in a scale-invariant manner, the algorithm learns the maximum inner profile length from the training data.

Naturally, it is inappropriate to measure the profile length in pixels. We use the average scale U(I) of all cell keypoints inside the image as a profile length unit for this image.

During training, the maximum inner profile length Lk in each training image Ik is computed in terms of U(Ik). Then the maximum of all the Lk values is considered the maximum inner profile length in the training data, L = max(Lk), k = 1..Nt, where Nt is the number of training images.

Given a cell keypoint p1 in an image I, another cell keypoint p2 in I is considered nearby if the Euclidean distance between the two keypoints, in units of U(I), is smaller than ζL. ζ is a safety parameter that we set to 2.
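In code, the neighborhood test reduces to a normalized distance check; a minimal sketch, assuming keypoint positions are given as 2-D coordinates (names are ours).

```python
import numpy as np

ZETA = 2  # safety parameter from the paper

def profile_length_unit(cell_keypoint_scales):
    """U(I): average scale of all cell keypoints inside image I."""
    return float(np.mean(cell_keypoint_scales))

def is_nearby(p1, p2, U, L):
    """True if the distance between p1 and p2, in units of U(I),
    is smaller than ZETA * L (L learned from the training data)."""
    return np.linalg.norm(np.subtract(p1, p2)) / U < ZETA * L
```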

The image area which is sampled by one intensity profile is actually very small. It is thus plausible to expect an improvement in the detection accuracy when the profile captures information from a wider image area.

Instead of extracting one profile between the two considered keypoints, one could extract a set of profiles, i.e. several parallel profiles as demonstrated in Figure 4. The geometry of the set can be described, for instance, in terms of the maximum scale of the two keypoints. In Figure 4, the profile set contains five parallel profiles, one profile every $0.75\,s_{ij}^{\max}$, where $s_{ij}^{\max}$ is the maximum scale of the two considered keypoints pi and pj.

Alternatively, smoothing is a well-known, fast, and simple means of information consolidation. We make smoothing scale-adaptive by setting the standard deviation of the Gaussian kernel which is used to smooth an image I to the mean scale of the one-sided keypoints of I. We call this process scale-adaptive smoothing (SAS). A third approach is to combine the previous two. In our algorithm, SAS was used.
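SAS itself is essentially a one-liner; a minimal sketch, assuming the scales of the one-sided keypoints of the image have already been collected.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_adaptive_smoothing(img, one_sided_keypoint_scales):
    """SAS: Gaussian smoothing whose standard deviation equals the mean
    scale of the one-sided keypoints of the image."""
    return gaussian_filter(img, sigma=float(np.mean(one_sided_keypoint_scales)))
```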

If the number of inner profiles in the training data is too small, the algorithm extracts artificial inner profiles until the number of inner profiles in the training data is at least Ninner. If the number of cross profiles in the training data is too small, the algorithm generates artificial cross profiles until the number of cross profiles in the training data is at least Ncross. They are generated by shifting the considered training image I by βU(I) and then overlaying the original and the shifted version. β was set to 6. Both Ninner and Ncross were set to 15 in our experiments.

4. HIERARCHICAL CLUSTERING

In this section, we address the problem of combining the results of the keypoint learning and the profile learning in order to detect cells. One could employ a graph-based approach: Assume a graph G with the cell keypoints as nodes. Two nodes are connected if the profile between them is an inner profile. The nodes of G are obtained from the keypoint classification and the edges are obtained from the profile classification. Intuitively, each connected component in G can be seen as a detected cell. This technique will be referred to later in this paper as the connected components (CC).

The utilization of the available information using CC is suboptimal. Alternatively, one can think of clustering the keypoints in an agglomerative manner starting from the most reliable ones: the agglomerative hierarchical clustering (AHC) of the keypoints. The AHC is characterized by a similarity measure and a linkage method. Two cell keypoints pi and pj are similar if they belong to the same cell. Thus, it is plausible to define the similarity between pi and pj as the probability that the profile between them is an inner profile, Pinner(pi, pj). This is equivalent to defining Pcross(pi, pj) = 1 − Pinner(pi, pj) as a dissimilarity measure between the two cell keypoints.



A. Linkage method

One plausible choice for the linkage method is the group average link. In order to incorporate more information, we use a customized group average linkage instead of the traditional one. According to this customized linkage, the dissimilarity between two clusters A and B is:

$$\Phi(A,B) = \sum_{i=1}^{N_A} \sum_{j=1}^{N_B} \frac{\omega_{ij}}{\Omega_{AB}}\, \varphi(p_i, p_j) \tag{5}$$

$$\varphi(p_i, p_j) = P_{cross}(p_i, p_j) \tag{6}$$

$$\omega_{ij} = \frac{|\mathrm{DOG}(p_i)|\, s(p_i)\, |\mathrm{DOG}(p_j)|\, s(p_j)}{\|p_i - p_j\|^2} \tag{7}$$

$$\Omega_{AB} = \sum_{i=1}^{N_A} \sum_{j=1}^{N_B} \omega_{ij} \tag{8}$$

where pi is a keypoint in the cluster A, pj is a keypoint in the cluster B, and NA and NB are the cardinalities of A and B, respectively. ωij is the weight of the dissimilarity between pi and pj. This weight is proportional to the scale and the absolute DOG of the two points and inversely proportional to the squared Euclidean norm of the profile ‖pi − pj‖². Indeed, ωij is scale-invariant because the scale and the distance terms in the numerator and the denominator have the same order. ΩAB is the normalization factor of the weights.

The dissimilarity obtained from this customized linkage is always in the range [0, 1]. The final clusters are obtained by cutting the dendrogram at a cutoff equal to 0.5.
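Eqs. (5)-(8) translate directly into code. The sketch below assumes keypoints are objects with hypothetical .xy, .scale, and .dog attributes, and that p_cross(pi, pj) is the cross-profile probability returned by the profile classifier; none of these names come from the paper.

```python
import numpy as np

def pair_weight(pi, pj):
    """Eq. (7): w_ij, proportional to scale and |DOG| of both keypoints
    and inversely proportional to their squared distance."""
    num = abs(pi.dog) * pi.scale * abs(pj.dog) * pj.scale
    return num / np.sum((np.asarray(pi.xy) - np.asarray(pj.xy)) ** 2)

def cluster_dissimilarity(A, B, p_cross):
    """Eqs. (5), (6), (8): weighted group-average dissimilarity Phi(A, B)."""
    w = np.array([[pair_weight(pi, pj) for pj in B] for pi in A])
    phi = np.array([[p_cross(pi, pj) for pj in B] for pi in A])
    return float((w * phi).sum() / w.sum())   # always in [0, 1]
```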

In the so-called combinatorial clustering strategies, the new dissimilarities can be computed from the old ones using the Lance-Williams dissimilarity update formula [15], [16]: if A and B are merged into one cluster AB, then the dissimilarity between any other cluster C and the cluster AB is given by:

$$\Pi(AB, C) = \psi\,\Pi(A,C) + \gamma\,\Pi(B,C) + \nu\,\Pi(A,B) + \mu\,|\Pi(A,C) - \Pi(B,C)| \tag{9}$$

where ψ, γ, ν, and µ are the coefficients of this linear model. The values of the coefficients are known for the standard linkage methods like single, complete, average, median, centroid, Ward, and McQuitty [30]. As we use a customized linkage, we need to find the corresponding coefficient values. It can be shown that the model coefficients for our customized linkage are:

$$\psi = \frac{\Omega_{AC}}{\Omega_{AC} + \Omega_{BC}} \tag{10}$$

$$\gamma = \frac{\Omega_{BC}}{\Omega_{AC} + \Omega_{BC}} \tag{11}$$

$$\nu = \mu = 0 \tag{12}$$

It is desirable to have monotonic clustering strategies, as the reversals caused by non-monotonic strategies are inconvenient and hard to interpret [27], [29]. When µ is zero, the linear model Π(AB, C) is monotonic under the condition ψ + γ + ν ≥ 1 [16]. Obviously, this is fulfilled by our customized linkage, and the trees built by Φ in Eq. (5) are thus monotonic.
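One merge step of the agglomeration can be expressed with the Lance-Williams update alone; the sketch below assumes clusters are stored as frozensets of keypoints and that the Pi and Omega dictionaries hold the pairwise Π and Ω values of Eqs. (5) and (8) for the current clusters (the data layout is our assumption, not the authors').

```python
def pair_key(X, Y):
    """Order-independent dictionary key for a pair of clusters."""
    return frozenset((X, Y))

def merge_update(Pi, Omega, A, B, clusters):
    """One Lance-Williams step (Eqs. 9-12): merge clusters A and B into AB
    and update Pi and Omega against every other cluster C. With nu = mu = 0,
    the update is an Omega-weighted average of the old dissimilarities."""
    AB = A | B
    for C in clusters:
        if C in (A, B):
            continue
        wAC, wBC = Omega[pair_key(A, C)], Omega[pair_key(B, C)]
        psi, gamma = wAC / (wAC + wBC), wBC / (wAC + wBC)  # Eqs. (10), (11)
        Pi[pair_key(AB, C)] = psi * Pi[pair_key(A, C)] + gamma * Pi[pair_key(B, C)]
        Omega[pair_key(AB, C)] = wAC + wBC
    return AB  # the caller replaces A and B by AB and repeats until the
               # smallest remaining dissimilarity exceeds the 0.5 cutoff
```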

B. Finding the hit-point

Applying the hierarchical clustering will result in different clusters of keypoints. Each cluster represents one cell. In order to determine a single hit-point, we use the following equation:

$$p_h = \frac{1}{\sum_{j=1}^{N_L} \lambda_j} \sum_{i=1}^{N_L} \lambda_i\, p_i \tag{13}$$

where NL is the number of keypoints inside the considered cluster, pi is the keypoint i in the cluster, and λi is a reliability weight that we set to λi = |DOG(pi)|.
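In NumPy, Eq. (13) is a one-line weighted mean; a minimal sketch (array names are ours).

```python
import numpy as np

def hit_point(cluster_xy, cluster_dog):
    """Eq. (13): |DOG|-weighted mean of the keypoint coordinates of one
    cluster. cluster_xy has shape (N_L, 2); cluster_dog has shape (N_L,)."""
    lam = np.abs(cluster_dog)
    return (lam[:, None] * cluster_xy).sum(axis=0) / lam.sum()
```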

5. SYSTEM TRAINING

Equipped with the previously described methods, one is now able to train the proposed system from a given set of images and their cell segmentation (ground truth). First, the system is calibrated to detect bright or dark cells according to Eq. (1). Depending on the result of this step, the keypoints with positive or negative DOG values will be discarded. After that, a SIFT threshold is learned from the training data and then applied as described in Section 2. Using the remaining set of keypoints, the mean image scale is computed and SAS is applied. Next, keypoints are detected in all training images and their respective features are extracted. Based on the segmentation, the class of each keypoint can be determined. Using this information, a BRF is trained on the keypoint features. These steps enable the system to automatically extract keypoints and to determine whether they belong to a cell or the background.

Based on the cell keypoints, the profile learning process can then be started. As a first step, we calibrate the maximum inner profile length L from the training images and their segmentation. Next, we train a BRF using the profile features. As classes, we use inner profile and cross profile. This yields a system that is automatically able to distinguish these two types of profiles.

For the hierarchical clustering, no training is required. The method can be applied directly on the output of the profile BRF.

6. EVALUATION MEASURES

We use the following measures to evaluate the detection quality: precision, recall, detection error, and centeredness error. Precision is strictly defined. Therefore, if several hit-points (cf. Eq. (13)) lie inside the mask of a cell, one of them is considered correct and the others are false positives.

The detection error is the arithmetic average of the precision loss and the recall loss:

$$\text{Detection error}(I) = \frac{1}{2}\left(\frac{Z_U}{Z} + \frac{H_B + H_O}{H}\right) \tag{14}$$


where H is the total number of hit-points, HB is the number of hit-points in the background, and HO is the number of over-detected hit-points. For instance, if five hit-points were detected in one cell, then one of them is considered correct and the other four are considered over-detected hit-points. Z is the total number of cells in the image I, and ZU is the number of undetected cells, i.e. the cells which contain no hit-points.
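Eq. (14) with a small worked example (the numbers are ours, for illustration only):

```python
def detection_error(Z, Z_U, H, H_B, H_O):
    """Eq. (14): mean of the recall loss Z_U/Z and the precision loss
    (H_B + H_O)/H."""
    return 0.5 * (Z_U / Z + (H_B + H_O) / H)

# Example: 100 cells of which 5 are undetected, 110 hit-points of which
# 8 lie in the background and 7 are over-detected:
# detection_error(100, 5, 110, 8, 7) ≈ 0.093
```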

Since a hit-point can lie anywhere inside the cell mask, another measure is needed in order to evaluate the centeredness of the hit-point inside the detected cell. For this purpose, we define a new measure:

$$\text{Centeredness error}(I) = \frac{1}{Z_C} \sum_{i=1}^{Z_C} \frac{\|p_h^i - p_m^i\|}{\chi_i} \tag{15}$$

where ZC is the number of cells which were detected by only one hit-point; the over-detected cells are therefore not considered by this measure. The numerator is the Euclidean distance between p_h^i, the hit-point inside the cell i, and p_m^i, the center of mass of this cell. The denominator χi is the major axis length of the ellipse which represents the covariance matrix of the binary mask of the cell i. This normalization is important in order to make the centeredness error independent of the cell size.
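A sketch of Eq. (15); the 4·sqrt(largest eigenvalue) convention for the major axis length of the covariance ellipse is our assumption, as the paper does not spell out this constant.

```python
import numpy as np

def major_axis_length(mask):
    """chi_i: major axis of the ellipse representing the covariance matrix
    of the binary cell mask (axis-length convention assumed, see above)."""
    ys, xs = np.nonzero(mask)
    cov = np.cov(np.vstack([xs, ys]))
    return 4.0 * np.sqrt(np.linalg.eigvalsh(cov).max())

def centeredness_error(hit_points, mass_centers, masks):
    """Eq. (15): mean distance between each single hit-point and the center
    of mass of its cell, normalized by the cell's major axis length."""
    errs = [np.linalg.norm(np.subtract(h, m)) / major_axis_length(msk)
            for h, m, msk in zip(hit_points, mass_centers, masks)]
    return float(np.mean(errs))
```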

7. RESULTS

A. Materials

We evaluated the system on both real and simulated images. Table 1 shows a summary of the cell lines used for the evaluation.

The first three rows of the table are real cell lines: CHO adherent cells (cf. Figure 5(a)), L929 adherent cells (cf. Figure 5(b)), and Sf21 suspension cells (cf. Figure 5(c)). By the term adherent cell line, we mean that almost all cells were adherent. Due to biological reasons, it was not possible to force all cells to adhere. Figure 5(b) exemplifies this case, where some cells in the adherent cell line L929 are in suspension.

Two bioprocess engineering experts manually labeled the cells in the three real cell lines using the LabelMe annotation framework [33]. The total number of manually labeled cells is 3510. The cell culturing process of these cell lines is explained in Section 7-A1, while the image acquisition details are described in Section 7-A2.

The last two rows of Table 1 are our simulated images. SIMCEP [17] was used to simulate two cell lines. The first (cf. Figure 5(d)) is simulated with high SNR, while the second (cf. Figure 5(e)) is simulated under severe Gaussian noise conditions. The simulation details are explained in Section 7-A3.

1) Cell culturing: CHO-K1 epithelial-like cells and L929 murine fibroblast cells were pre-cultured and maintained in exponential growth phase in T-25 polystyrene culture flasks (Sarstedt 8318.10) using DMEM/Ham's F-12 (1:1) (Invitrogen 21331-046) with 10% fetal calf serum (PAA A15-102) and 4 mM glutamine (Sigma-Aldrich G7513-100ML) at 37°C in an atmosphere containing 7% CO2. For image acquisition, cells were detached from the T-flask using Accutase (Sigma-Aldrich A6964-100ML) on the day before and seeded out in 24-well plate format using a working volume of 600 µl of the mentioned medium composition. Cells were allowed to attach and spread out in the well-plate for at least 18 hours but not more than 30 hours.

Cell line      Description                        Images   Cells
CHO            Real CHO adherent cells            6        1431
L929           Real L929 adherent cells           5        1078
Sf21           Real Sf21 cells in suspension      5        1001
Simulated A    Simulated cells with SNR ≈ 63      100      15000
Simulated B    Simulated cells with SNR ≈ 0.07    100      15000

TABLE 1: THE CELL LINES USED IN THE EVALUATION

(a) CHO adherent. (b) L929 adherent. (c) Sf21 suspension. (d) Simulated A, SNR ≈ 63. (e) Simulated B, SNR ≈ 0.07.

Fig. 5. Examples cut from the evaluation images. The defocus distance in (a), (b), and (c) is +30 µm, +30 µm, and +15 µm, respectively.


Sf21 insect cells were maintained in exponential growth phase in silicon-solution-treated shaker flasks in Ex-Cell 420 medium (Sigma-Aldrich: 24420C) at 27°C and normal air CO2 level. On the day of investigation, an aliquot of this culture was transferred to a 24-well plate in a final volume of 600 µl and cells were allowed to sediment before image acquisition.

2) Real image acquisition: The images of the three real cell lines in Table 1 were manually taken with an inverted Nikon Eclipse TE2000U microscope using Nikon's USB camera. Cells were illuminated by a halogen light bulb for standard bright-field microscopy. The microscope's objective has 20x magnification, 0.45 numerical aperture, and 7.4 mm working distance. The image resolution is 1280 × 960 pixels at 0.49 µm/pixel.

The most important acquisition parameter is probably the defocus distance. This distance was empirically set to +30 µm for the adherent cell lines and +15 µm for the suspension cell line.


For our approach, one focus level is sufficient. However, we compare our system with other approaches which need images at several focus levels. Therefore, in addition to the previous focus level, we acquired images at levels 0 and −30 µm for the adherent cell lines and at levels 0 and −15 µm for the suspension cell line.

3) Image simulation: As mentioned above, the software in [17] was used to simulate the two artificial cell lines in Table 1. It is important to point out that this software was designed for fluorescence microscopy. Nonetheless, we chose to use it for the following reasons: Firstly, to the best of our knowledge, there is no simulation software available for bright-field microscopy. Secondly, if the cytoplasm is excluded from the simulation, the cell nuclei resemble the negatively defocused bright-field images, even though this resemblance is partial due to differences at the cell borders.

Cells were generated according to a shape model described in [17] with a dynamic range equal to 0.3 of the allowed number of image bins. Then an illumination field with scale 10 was added to each image. This scale is the ratio between the illumination energy (sum of squared values) and the ideal image energy. Afterwards, white Gaussian noise was added so that the signal-to-noise ratio is approximately 63 in the first cell line (cf. Figure 5(d)) and 0.07 in the second (cf. Figure 5(e)). The SNR is the ratio between the ideal image energy and the noise energy. The illumination energy does not contribute to the SNR.

In Section 7-B, all cell lines of Table 1 were used to evaluate the detection accuracy. The real cell lines were used in Section 7-C to assess the contribution of the different components of our algorithm to the overall detection accuracy. In Section 7-D, we compare our system with two other approaches. For this comparison, the CHO cell line was used first as is, and then perturbed with illumination and scale changes. In Section 7-E, we perturb the three real cell lines with orientation, scale, and illumination changes in order to assess the system's invariance to these factors. Detection time is evaluated in Section 7-F. Finally, the ability of the algorithm to learn to detect cells when the training data contains several cell lines is evaluated in Section 7-G.

B. Evaluation of the overall detection accuracy

We evaluated the detection accuracy of the proposed system on the five cell lines given in Table 1. For each cell line, only one image was used for training and the others were used for testing. For the real cell lines, this was repeated for each image. For the simulated cell lines, this was repeated five times. Table 2 shows the results of this cross validation. The detection quality measures are defined in Section 6.

Table 2 shows that the error was close to zero for the high SNR simulated images. Even under severe Gaussian noise conditions with SNR ≈ 0.07, the error was only 4%. It is also clear from the table that the system achieved higher detection rates with suspension cells compared to adherent cells. This is plausible, as the latter have considerably lower contrast than suspension cells.

C. Evaluation of the system components

We also evaluated the contribution of specific components of the system to the detection accuracy. Tables 3, 4, and 5 summarize this evaluation on the L929, CHO, and Sf21 cell lines, respectively. In the first column, the outputs of the two random forests were combined using the connected components as described in Section 4. The same was done in the second column, but the keypoints were thresholded according to Section 2. This thresholding is also used in columns 3 to 8. In the third column, the agglomerative hierarchical clustering with the group average linkage was used instead of the connected components. In the fourth one, our customized linkage method was used instead. This customized linkage is also used in columns 5 to 8. In the fifth column, Eq. (13) was used to find the hit-points instead of the simple arithmetic average of the coordinates of the keypoints inside each cluster. Eq. (13) is also used in columns 6 to 8. In the profile expansion columns, three strategies were tested: The first is using parallel profile sets. The second is using SAS. The third is combining both of them.

The estimates in Tables 3, 4, and 5 are cross validation estimates, where one image per cell line is used for training and the remaining ones are used for testing.

An important result of these tables is that SAS achieves higher detection scores than the profile set. On the other hand, using SAS together with the profile set delivers a slightly higher detection rate than using SAS alone. Nevertheless, we sacrifice this small improvement in the detection rate in favor of better detection times. All other experiments in this paper were done with SAS alone. Therefore, the SAS columns in Tables 3, 4, and 5 correspond to the estimates in Table 2.

As shown in Figure 5, simulated cells tend to form negative DOG blobs, whereas real cells tend to form positive DOG blobs in the positively defocused images. The system was able to automatically detect the right type of blobs in each case using Eq. (1).

We did not conduct a thorough analysis of the feature importance. However, the random forest has an internal mechanism to rank its features: the mean classification rate increase [5]. A random image from each cell line was used to train the system and the feature with the highest rank was recorded. Table 8 shows the result. The last column displays the highest ranked feature when the previous five images are used together to train the system. According to this table, at least one keypoint feature from each of the five keypoint feature sets was ranked the best. This does not imply that all feature sets are necessary to obtain the detection rate reported in Table 2. However, it gives an indication that all keypoint feature sets are informative. Regarding profile features, Table 8 shows that the mean value of the second derivative and V2 (cf. Eq. (4)) are the two highest ranked features.

D. Comparison with other approaches

We compared our system with [4] and [3] (cf. Section 1). Table 6 summarizes the required input for each approach. In [3], there is a well-developed segmentation approach. It is, however, worth pointing out that only the detection part is used in the comparison. [4] utilizes three algorithms at three different focus levels and combines the results. In our evaluation, instead of combining the results of these three algorithms, we select the one which has the minimum error. This strategy gave better results on our images.


                     CHO          L929         Sf21         Simulated A   Simulated B
Precision            77.7±8.0     82.8±4.4     97.3±0.9     98.9±0.7      95.4±2.1
Recall               92.9±3.0     92.6±2.9     96.4±3.2     99.4±0.7      96.5±1.6
Detection error      0.147±0.03   0.123±0.01   0.031±0.01   0.008±0.005   0.040±0.01
Centeredness error   0.477±0.13   0.377±0.07   0.164±0.02   0.173±0.10    0.222±0.11

TABLE 2: CELL DETECTION ACCURACY ON DIFFERENT CELL LINES.

                                                                                        Profile expansion
                     CC           SIFT Threshold   AHC average   AHC custom   Weighted avg.   Profile sets   SAS          Both
Precision            67.1±4.2     76.8±4.2         72.6±5.1      76.0±4.5     76.9±4.4        78.9±5.1       82.8±4.4     83.4±4.0
Recall               66.2±5.9     73.2±3.4         91.0±4.5      90.2±4.4     91.0±4.6        91.7±4.0       92.6±2.9     92.6±3.0
Detection error      0.334±0.04   0.250±0.01       0.182±0.01    0.169±0.01   0.161±0.01      0.147±0.02     0.123±0.01   0.120±0.01
Centeredness error   0.528±0.08   0.381±0.09       0.484±0.11    0.406±0.08   0.382±0.07      0.382±0.07     0.377±0.07   0.369±0.07

TABLE 3: THE CONTRIBUTION OF THE DIFFERENT SYSTEM COMPONENTS TO THE DETECTION ACCURACY (L929)

                                                                                        Profile expansion
                     CC           SIFT Threshold   AHC average   AHC custom   Weighted avg.   Profile sets   SAS          Both
Precision            70.5±5.3     76.8±5.9         68.5±10.7     73.4±8.8     74.0±8.6        76.4±8.4       77.7±8.0     78.8±7.8
Recall               58.7±6.4     65.5±4.7         91.4±3.9      90.1±4.4     90.7±4.8        91.9±4.3       92.9±3.0     93.1±3.2
Detection error      0.354±0.05   0.288±0.04       0.201±0.03    0.183±0.02   0.177±0.02      0.159±0.02     0.147±0.03   0.141±0.03
Centeredness error   0.472±0.20   0.432±0.07       0.652±0.24    0.503±0.14   0.501±0.14      0.455±0.10     0.477±0.13   0.446±0.11

TABLE 4: THE CONTRIBUTION OF THE DIFFERENT SYSTEM COMPONENTS TO THE DETECTION ACCURACY (CHO)

                                                                                        Profile expansion
                     CC           SIFT Threshold   AHC average   AHC custom   Weighted avg.   Profile sets   SAS          Both
Precision            89.0±2.2     93.1±1.9         90.9±1.6      94.1±0.9     94.3±0.8        95.0±1.5       97.3±0.9     97.4±0.9
Recall               78.6±2.4     83.3±4.1         95.8±0.9      95.1±0.9     95.2±1.2        96.3±1.1       96.4±3.2     96.8±3.1
Detection error      0.162±0.02   0.118±0.03       0.066±0.01    0.054±0.00   0.053±0.01      0.043±0.01     0.031±0.01   0.029±0.01
Centeredness error   0.132±0.00   0.180±0.01       0.216±0.01    0.189±0.02   0.163±0.01      0.157±0.01     0.164±0.02   0.162±0.02

TABLE 5: THE CONTRIBUTION OF THE DIFFERENT SYSTEM COMPONENTS TO THE DETECTION ACCURACY (SF21)


Due to the difficulty of the manual parameter tuning, this comparative evaluation was performed using only one cell line, CHO, and without cross validation. One image was randomly chosen and used to train our system. The same image was used for parameter tuning of [4] and [3]. The rest of the images were used for testing.

The software of [4] was obtained from its authors, while we implemented the cell detection part of [3] ourselves. The optimal value of the single parameter of [3] was found by scanning the parameter domain and selecting the value which minimizes the detection error. On the other hand, we optimized the parameters of [4] manually. The result of the comparison is shown on the left-hand side of Table 7.

In order to investigate how the three approaches perform under illumination and scale change, we did the following experiment: The comparison was applied on the CHO cell line, but after perturbing the images. An illumination field was applied on all CHO images. The same field was applied on all testing and training images. In addition, the testing images were resampled using the following scales: 0.5, 0.75, 1, 1.25, and 1.5. The results are shown on the right-hand side of Table 7.

                             [3]   [4]   Our approach
Required number of images    2     3     1
Manually tuned parameters    1     >9    0

TABLE 6: INPUT REQUIREMENTS FOR [3], [4], AND OUR APPROACH

E. Evaluation of illumination, orientation, and scale invariance

The simulation software in [17] can generate illumination artifacts on simulated images. In order to make the experiment more realistic, we applied the simulated illumination field on our real cell lines. Figure 6 shows how the detection error changes with the illumination scale, i.e. the ratio between the illumination energy and the image energy. Figure 7 shows the difference between an image at illumination scale zero and another one at illumination scale 100. The detection error change between these two extreme cases, as shown in Figure 6, was 8% in the worst case.

Only one image in each cell line was used for training. It is an image at illumination scale zero. The other images in the cell line were used for testing at each of the following illumination scales: 0, 20, 40, 60, 80, and 100.


                     CHO                                            CHO with illumination and scale change
                     Our approach   [3]           [4]               Our approach   [3]           [4]
Precision            88.1±2.3       56.1±11.1     80.9±3.2          80.8±3.7       81.0±12.7     43.0±16.2
Recall               87.6±4.2       91.8±3.5      61.3±9.3          91.4±2.5       36.5±18.9     23.4±10.6
Detection error      0.122±0.024    0.260±0.067   0.288±0.042       0.138±0.021    0.412±0.060   0.668±0.041
Centeredness error   0.373±0.342    0.552±0.396   0.495±0.581       0.469±0.276    0.665±0.729   1.350±2.03

TABLE 7: COMPARISON WITH OTHER APPROACHES

                    CHO             L929                      Sf21                      Simulated A      Simulated B   All
Keypoint features   Rn(p, π)        SIFT descriptor feature   SIFT descriptor feature   Stencil feature   VMap(p, 6)    DOG
Profile features    mean(d²f/dv²)   V2                        V2                        mean(d²f/dv²)     V2            mean(d²f/dv²)

TABLE 8: HIGHEST RANKED FEATURES ACCORDING TO THE RANDOM FOREST MEAN CLASSIFICATION RATE INCREASE.


Figure 8 depicts the system's invariance to image scale. In this experiment, the system was trained using one image per cell line and tested on other up- or down-sampled images from the same cell line. The figure shows that the detection error change is in the range of 4%, excluding a sudden increase in the detection error of Sf21 at scale 0.5. This indicates that scale invariance is limited from below. The reason is that downsampling reduces the number of SIFT keypoints due to structure degradation.

Finally, Figure 9 shows the degree of invariance with respect to cell orientation. Rotating the images in order to test the system invariance to the orientation change is not the best choice because of the diversity of cell orientations in each image. Therefore, we simulated cell images so that all cells inside the same image have the same orientation. The shape model in the simulation software [17] cannot generate elongated cells with a dominant orientation. Therefore, for this experiment, we replaced its shape model with an elliptical one. The system was trained using one image at orientation zero and tested on five images at each of the following orientations: 0, 30, 60, 90, 120, and 150 degrees. Each image contained 150 simulated cells.

F. Evaluation of the detection time

In order to investigate the feasibility of the algorithm, we measured the detection time for all cell lines. The evaluation was done on a Dell laptop with 8 GB RAM and an Intel Core i7-2720QM processor with a clock speed of 2.20 GHz. The implementation details are as follows: The feature extraction was implemented in Matlab, the classification was done using the R package randomForest [18], the clustering was implemented in Java, and SIFT features were obtained from VLFeat [39]. All modules were put together in a single Matlab application.

The system was trained and tested as described in Section 7-B. The detection times are reported in the upper part of Table 9. The lower part of the table shows the results when the same experiment was applied on subsampled images (subsampling factor 0.5). As can be seen in the table, the detection time is approximately in the range [30, 46] seconds per image and drops to the range [5, 14] seconds per image after subsampling.

Fig. 6. Detection error vs. illumination scale for CHO, L929, and Sf21. Illumination invariance: An image at illumination scale 0 was used for training and the other images were used for testing at different illumination scales.


G. Evaluation of the generalization on multiple cell lines

In Section 7-B, the system was trained separately for each cell line. In fact, it is more challenging to learn to detect cells when images from different cell lines are used in the training. In this experiment, one image from each cell line was randomly chosen. The five chosen images were used to train the system. The rest of the images were used for testing. This process was repeated five times. As the images are of different dynamic ranges and/or modalities, each image was normalized to [0, 1]. The simulated images were also inverted in order to have one-sided cell keypoints in the training data. Table 10 shows the results. Comparing Table 10 to Table 2, one can see a relatively considerable increase of the detection error for Sf21 and Simulated B. Nevertheless, the maximum detection error is still 15.5%.

In order to investigate whether similar cells are more suited for joint training, we conducted two additional experiments. Table 11 shows the results of the same process described above, but training and testing were applied only on the adherent cell lines. The same applies for Table 12, but for the simulated cell lines. The detection error in both tables is very close to the detection error in Table 2.


                                                      CHO           L929         Sf21         Simulated A   Simulated B
Detection time (seconds/image)                        45.88±13.60   36.69±5.59   40.65±7.13   30.47±0.33    31.00±1.88
Detection error                                       0.147±0.03    0.123±0.01   0.031±0.01   0.008±0.005   0.040±0.01
Detection time on subsampled images (seconds/image)   13.75±2.11    12.68±0.93   8.97±1.07    7.91±0.34     4.93±0.26
Detection error on subsampled images                  0.128±0.01    0.112±0.01   0.071±0.02   0.009±0.001   0.043±0.004

TABLE 9: EVALUATION OF THE DETECTION TIME. RESOLUTION OF CHO, L929, AND SF21 IMAGES IS 1280 × 960 PIXELS. RESOLUTION OF SIMULATED A AND B IS 1200 × 1200 PIXELS. ALL RESOLUTIONS ARE GIVEN BEFORE SUBSAMPLING.

                     CHO          L929         Sf21         Simulated A   Simulated B
Precision            77.7±7.2     79.6±8.0     73.2±2.4     98.9±0.1      81.8±5.7
Recall               92.2±4.3     89.4±6.6     99.7±0.1     98.3±0.4      97.6±0.3
Detection error      0.150±0.02   0.155±0.01   0.136±0.01   0.014±0.00    0.103±0.03
Centeredness error   0.498±0.08   0.438±0.12   0.179±0.04   0.177±0.00    0.256±0.01

TABLE 10: JOINT TRAINING: FIVE IMAGES WERE RANDOMLY CHOSEN, ONE FROM EACH CELL LINE. THEY WERE USED TO TRAIN THE SYSTEM AND THE REST WERE USED FOR TESTING. THIS PROCESS WAS REPEATED FIVE TIMES.

(a) CHO image at illumination scale = 0. (b) CHO image at illumination scale = 100.

Fig. 7. Illumination invariance example: The upper image was used for training, and the lower for testing.


Fig. 8. Detection error vs. scale for CHO, L929, and Sf21. Scale invariance: An image at scale 1 was used for training and the other images were used for testing at different scales.

Fig. 9. Detection error vs. orientation (degrees). Orientation invariance: An image at orientation 0 was used for training. Five images at each orientation were used for testing. Each image contains 150 cells simulated under an elliptical shape model.


                     CHO          L929
Precision            76.8±5.0     80.7±5.1
Recall               94.3±2.2     92.0±3.6
Detection error      0.145±0.01   0.137±0.01
Centeredness error   0.497±0.13   0.462±0.06

TABLE 11: JOINT TRAINING: TWO IMAGES WERE RANDOMLY CHOSEN, ONE FROM CHO AND ANOTHER ONE FROM L929. THEY WERE USED TO TRAIN THE SYSTEM AND THE REST WERE USED FOR TESTING. THIS PROCESS WAS REPEATED FIVE TIMES.

                     Simulated A   Simulated B
Precision            98.8±0.2      93.3±1.0
Recall               98.1±0.4      97.0±0.3
Detection error      0.015±0.00    0.048±0.00
Centeredness error   0.173±0.00    0.229±0.01

TABLE 12: JOINT TRAINING: TWO IMAGES WERE RANDOMLY CHOSEN, ONE FROM SIMULATED A AND ANOTHER ONE FROM SIMULATED B. THEY WERE USED TO TRAIN THE SYSTEM AND THE REST WERE USED FOR TESTING. THIS PROCESS WAS REPEATED FIVE TIMES.

8. DISCUSSION

Several approaches [2], [3], [4], [28] utilize, though in different ways, images at two or three focus levels to improve the contrast. This is appealing, as it is possible to get considerably higher contrast from the intensity change with the defocus distance. The drawback is that at least two images are needed. As this is not always available, our system was developed to work with a single defocused image. In fact, using one image has another important advantage: It facilitates extending the algorithm in the future to more image modalities.

We found that the approach in [32] on phase-contrast microscopy has a partial conceptual similarity with our approach. This similarity lies in the use of two classifiers which have goals analogous to the goals of our keypoint and profile classifiers. However, our features were carefully designed and heavily tested for rotation, scale, and illumination invariance. Another difference is that the whole system is fully automatic and its internal details, e.g. learning the maximum inner profile length, were designed for automation and invariance. In addition, the use of hierarchical clustering in order to optimally aggregate the second classifier results is a novel contribution which proved to be effective, especially using our customized linkage method. Lastly, the system is adapted to low contrast bright-field microscopy. In fact, compared to the bright-field approaches in [4] and [3], which need both multiple images and manual parameter tuning, our approach delivered higher detection accuracy and a higher degree of invariance to illumination and scale changes.

Before using the classifiers, the system must be trained. For the training, a set of images with ground truth is required. Due to the invariance of the extracted features and the use of random forests, the system can learn from a relatively small amount of training data. Only one image per cell line was used for training in our experiments.

The ground truth is a segmentation that may have been performed with any manual, semi-automatic, or automatic method. For this study, we chose a manual segmentation, as we did not want to introduce a bias into the system performance caused by suboptimal training data. The images should be defocused, and the same defocus distance should be used for training and detection. This defocus can be either in the positive or the negative direction. The defocus is important for at least two reasons: Firstly, as already mentioned, some adherent cells are totally invisible at focus. Secondly, the defocus smooths out the tiny details which degrade the system performance in terms of time and detection accuracy.

SIFT keypoints in bright-field images result from cell structures, but also from debris, noise in the background, and other image artifacts. It is possible to eliminate some irrelevant keypoints by imposing a high DOG threshold or a low PCR threshold [23]. However, some cell keypoints may have a high principal curvatures ratio because of the elongation of the cells. Other cell keypoints may have a low DOG value due to cell adherence or insufficient contrast of the considered image modality. Consequently, it is not always possible to separate cell keypoints from non-cell keypoints using these two features. Nevertheless, it is possible to reduce the number of irrelevant keypoints by computing the maximum |DOG| in each training cell, then considering the minimum of these maxima as a threshold. Therefore, in our experiments, we set the DOG threshold to the previous value and the PCR threshold to infinity.

Cell keypoints have either positive or negative DOG values when they form valley-like or mountain-like structures, respectively. We have found in preliminary experiments (data not shown) that using one-sided keypoints leads to better profile learning. For this reason, our system considers either the positive DOG keypoints or the negative ones, but not both.

In [12], the intensity is sampled using a stencil instead of a patch and used for neuron detection in electron microscopy. A similar idea was used in a completely different field [26], where a radial sampling pattern was employed to sample 3D vessels. Both methods used fixed-size stencils. In our case, the scale and orientation of the keypoint deliver additional information. This can be used to make the stencil scale- and orientation-invariant. In order to make it invariant to the local shift of intensity, one can subtract the minimum or the mean intensity of the stencil from all stencil nodes. We chose the mean because it is less sensitive to outliers.
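A sketch of such a mean-centered, keypoint-adapted stencil might look as follows; the radial node layout and parameter names are illustrative, not the exact stencil of [12] or [26].

```python
import numpy as np
from scipy.ndimage import map_coordinates

def sample_stencil(image, x, y, scale, orientation,
                   n_rays=8, n_steps=4):
    """Sample intensities on a radial stencil around a keypoint.

    Node positions are scaled by the keypoint scale and rotated by
    its orientation, so the pattern follows the keypoint. The mean
    of the sampled values is subtracted, which makes the feature
    invariant to a local additive shift of intensity.
    """
    img = np.asarray(image, dtype=float)
    angles = orientation + np.arange(n_rays) * 2.0 * np.pi / n_rays
    radii = scale * np.arange(1, n_steps + 1)
    xs = x + np.outer(radii, np.cos(angles)).ravel()
    ys = y + np.outer(radii, np.sin(angles)).ravel()
    values = map_coordinates(img, [ys, xs], order=1)  # bilinear sampling
    return values - values.mean()
```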

In [34], it was shown that ray features are better than Haar-like features for cell recognition. Thus, we used ray features in our work. Our contribution to this feature set is that we made it scale- and orientation-invariant. In order to achieve scale-invariance for such features, computing the distances with respect to scale is insufficient. In fact, the gradient computation has to be performed with respect to scale as well. If the gradient is computed using conventional kernels like Sobel or Prewitt, its norm R_n(m, θ_k) will be scale-dependent. We use a gradient computation that handles this issue correctly.
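One way to obtain a scale-independent gradient norm is to use gamma-normalized Gaussian derivatives in the sense of Lindeberg [19], with the derivative scale tied to the keypoint scale; the coupling sigma = keypoint scale below is an assumed choice for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_normalized_gradient(image, keypoint_scale, gamma=1.0):
    """Gradient whose norm is comparable across keypoint scales.

    The derivative is taken as a Gaussian derivative at a sigma tied
    to the keypoint scale and multiplied by sigma**gamma [19].
    """
    img = np.asarray(image, dtype=float)
    sigma = keypoint_scale
    gy = gaussian_filter(img, sigma, order=(1, 0))  # derivative along y
    gx = gaussian_filter(img, sigma, order=(0, 1))  # derivative along x
    norm = sigma ** gamma  # scale-normalization factor
    return norm * gx, norm * gy
```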

In [40], it was shown that variance maps can distinguish cells from background. The variance map value at a pixel is simply the variance of the intensities in a neighborhood centered at this pixel. The neighborhood size is fixed and based on the cell size [40]. We expanded this concept and made it scale-invariant.
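The fixed-size variant of [40] can be sketched with two box filters, as below; in our scale-invariant variant the window would be tied to the keypoint scale instead of being a fixed pixel count.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def variance_map(image, window):
    """Variance of intensities in a window centered at each pixel,
    computed with box filters via Var[X] = E[X^2] - (E[X])^2."""
    img = np.asarray(image, dtype=float)
    mean = uniform_filter(img, window)
    mean_sq = uniform_filter(img * img, window)
    return np.maximum(mean_sq - mean * mean, 0.0)  # clip float error
```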

The number of keypoints inside each cell depends on the noise level, the level of cellular detail, the defocus distance, the cell shape, and the SIFT parameters (the PCR and the DOG value). The goal of the profile learning was to connect the keypoints which belong to the same cell so that they are recognized as one cell.

We mentioned that the derivatives used in the profile features are Gaussian derivatives with σ = 0.1·N_F. As the scale of the derivative, i.e. the previous sigma, is measured in units of points, not in pixels, the derivative signal is to a large extent scale-invariant. Note that all profile features are also invariant to the local shift of intensity, and that the mean, the maximum, and the minimum of the profile do not belong to the profile feature set because they are sensitive to this shift.
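A sketch of such a derivative, with σ expressed as a fraction of the number of profile points rather than in pixels:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def profile_derivative(profile):
    """First Gaussian derivative of an intensity profile with
    sigma = 0.1 * N_F, a fixed fraction of the number of profile
    points, so the smoothing is defined in points rather than
    pixels and the result is largely scale-invariant."""
    signal = np.asarray(profile, dtype=float)
    sigma = 0.1 * len(signal)  # sigma in units of points
    return gaussian_filter1d(signal, sigma, order=1)
```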

If cells are circular and noise-free, we may get almost one keypoint per cell. Therefore, in such a case, we will obtain very few or even zero inner profiles for training. If the cells are too far from each other, we may obtain very few or even zero cross profiles for training. This occurs because the algorithm extracts profiles between nearby keypoints only. Both cases occurred in some preliminary experiments on data which is not shown in this study. The algorithm handles these two cases, which makes the training robust against such odd situations in the training data.

A main disadvantage of using a set of profiles is the computational cost. Furthermore, it turned out that SAS leads to a higher detection rate, as shown in the results section. This is probably due to the fact that SAS also serves as a preprocessing step. For instance, we noticed that SAS improved the edges drastically. Consequently, the edge-based features, like the ray features, gained higher discriminative power.

In our experiments, we achieved robustness to low-frequency changes in the illumination by using features that are locally invariant to an offset in intensity. In a small region of the image, this local shift of intensity can be interpreted as a constant. We regard this as an important feature of our system, as many real-world images suffer from inhomogeneous illumination. It is worth pointing out that images with illumination artifacts should be used without normalization, because the image measures usually used in normalization, such as the mean, standard deviation, maximum, and minimum of the image intensity, depend on the illumination information.

If the distance between the two keypoints of a profile is large, then the illumination artifacts along this profile cannot be approximated by a constant. In other words, the invariance to the local intensity shift is beneficial only if the profiles are extracted between nearby keypoints. Therefore, the system should avoid extracting profiles between far-apart keypoints. In order to achieve this goal, the algorithm learns the maximum inner profile length L in a scale-independent manner from the training data. This contributes to the partial illumination-invariance of the profile learning and reduces the detection and training times. In fact, learning L is particularly useful because this length is a cell characteristic, whereas learning, for instance, the cross profile length would not make sense because it depends on the distribution of the cells in the cell culture.
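A minimal sketch of learning L, assuming each training profile is available as a (length, keypoint scale) pair; the scale normalization shown is an assumption about how the scale-independence is realized.

```python
def learn_max_inner_length(training_profiles):
    """Learn the maximum inner profile length L from training data.

    Each element of training_profiles is assumed to be a
    (length, keypoint_scale) pair; dividing by the keypoint scale
    makes the learned bound scale-independent.
    """
    return max(length / scale for length, scale in training_profiles)

def accept_profile(length, scale, L):
    """Extract a candidate profile only if its scale-normalized
    length does not exceed the learned bound L."""
    return length / scale <= L
```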

We used RF as the classifier model for the keypoint and the profile learning. The RF does not need parameter tuning, which was one of our design goals. It is also inherently a multi-class classifier, which makes extending the keypoint learning in further research to debris and agglomeration detection easier. These two points can be considered advantages of the RF over some other state-of-the-art classifiers such as the SVM. In fact, some empirical studies [13], [14] showed that the RF outperformed the SVM in terms of area under the ROC curve (AUC) on imbalanced data, even though that was not always the case [25]. Moreover, the immunity of the RF against overfitting grants the system the ability to learn from small training data sizes. This last point makes it favorable over classifiers like the probabilistic boosting trees [37].

There is, in general, no guarantee that the cell and the background classes are balanced. The same applies to the two classes in the profile learning. The imbalance problem can be solved by using a balanced random forest (BRF) or a weighted random forest (WRF) [7]. Both have similar ROC curves but, according to the same reference, the BRF is computationally more efficient and less vulnerable to noise.
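A minimal BRF sketch in the sense of [7], where each tree is grown on a bootstrap sample drawn equally from both classes; the implementation details are illustrative, not the exact variant used in our system.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class BalancedRandomForest:
    """Each tree sees a bootstrap sample with as many draws from
    every class as the minority class has members [7]."""

    def __init__(self, n_trees=100, seed=0):
        self.n_trees = n_trees
        self.rng = np.random.default_rng(seed)
        self.trees = []

    def fit(self, X, y):
        classes = np.unique(y)
        n = min(np.sum(y == c) for c in classes)  # minority class size
        for _ in range(self.n_trees):
            idx = np.concatenate([
                self.rng.choice(np.flatnonzero(y == c), n, replace=True)
                for c in classes])
            tree = DecisionTreeClassifier(max_features="sqrt")
            self.trees.append(tree.fit(X[idx], y[idx]))
        return self

    def predict_proba(self, X):
        # Average the probabilistic votes of all trees.
        return np.mean([t.predict_proba(X) for t in self.trees], axis=0)
```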

The output of the two classifiers can be seen as a graph whose nodes are the cell keypoints and whose edges are the inner profiles. Therefore, the connected components of this graph can be regarded as the detected cells. However, our results show that higher detection rates can be achieved when the probabilistic output of the profile classifier is used as a similarity measure in an AHC step. Moreover, the detection rate was further improved by using our customized linkage method, where more application-specific information is involved. In fact, it is plausible to expect that the AHC is more robust than the CC because it starts by aggregating the most reliable cases. Furthermore, the decision about the less reliable cases does not depend on a single classification but on the average of several profile classifications.
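A sketch of this clustering step, using standard average linkage in place of our customized linkage method; the similarity matrix layout is an assumption.

```python
import numpy as np
from scipy.cluster.hierarchy import average, fcluster
from scipy.spatial.distance import squareform

def cluster_keypoints(same_cell_prob, cut=0.5):
    """Group cell keypoints by average-linkage AHC.

    same_cell_prob is an (n x n) symmetric matrix holding the profile
    classifier's probability that keypoints i and j belong to the
    same cell; 1 - p serves as the distance.
    """
    dist = 1.0 - np.asarray(same_cell_prob, dtype=float)
    np.fill_diagonal(dist, 0.0)
    Z = average(squareform(dist, checks=False))
    # Cut the dendrogram where merged clusters stop being similar enough.
    return fcluster(Z, t=1.0 - cut, criterion="distance")
```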

We pointed out in the introduction that a good cell detection approach should facilitate tracking when the latter is required. The output of our system after the clustering stage is a set of keypoints where each one is assigned to a cell and equipped with a set of features. A subgroup of these features has been used in tracking applications; see, for example, the use of the SIFT descriptor for tracking in [11].

We evaluated the system with respect to runtime on our image database and on a subsampled version of it. We noticed that, on our data, the detection time of the subsampled images was three to four times shorter than the detection time of the original images. This suggests that the detection time is proportional to the number of pixels in the considered image. Moreover, with the exception of Sf21, the detection error was not increased by subsampling; in fact, for the adherent cell lines, it decreased. This is probably caused by the sampling-induced smoothing of the cellular details which could otherwise mislead the detection algorithm and degrade its precision. However, this observation cannot be generalized without taking the original image resolution into account. In our experiments, the real cell line images before subsampling had a resolution of 1280 × 960 pixels with an objective magnification of 20×.

The system was also tested for its generalization ability. The results show that it can learn to detect cells even when several cell lines of different visual appearance are used for training. However, the detection error was smaller when only similar cell lines were used for training. Therefore, we propose to use different classifiers for different cell lines.

9. CONCLUSION

We have presented a novel system for cell detection in bright-field microscope images. The system is fully automatic, as no manual parameter tuning is needed in training or in detection.

For the evaluation, three real cell lines were used: CHO, L929, and Sf21. CHO and L929 were adherent cells, while Sf21 were cells in suspension. The adherent cells had very low contrast and were almost invisible at focus. In addition, simulated cells with both high and very low SNR were used in the evaluation. In total, more than 3500 real cells and 30000 simulated cells were used.

The system was designed with special care for robustness to illumination artifacts and invariance to cell size and orientation. It was trained under specific image scale, orientation, and illumination conditions and then tested on images with different scale, orientation, and illumination conditions. Our results show that the system yields a high detection accuracy and high invariance scores in reasonable computation time.

All experiments on real and simulated images were performed without any manual parameter tuning. Because of this generality, we are currently investigating the system's ability to detect molecules in scanning tunneling microscope images.

ACKNOWLEDGMENT

We highly appreciate the efforts of our partners at the Institute of Bioprocess Engineering, Friedrich-Alexander University Erlangen-Nuremberg. These efforts include cell culturing, image acquisition, and labeling. The authors would also like to thank the Bavarian Research Foundation (BFS) for funding the project COSIR under contract number AZ-917-10, and our industrial partners for the fruitful collaboration. We gratefully acknowledge funding of the Erlangen Graduate School in Advanced Optical Technologies (SAOT) by the German Research Foundation (DFG) in the framework of the German excellence initiative. Special thanks go to Dr. Gabriele Becattini for providing us with the software and the user manual of his cell detection method. Great thanks go to Dr. Elli Angelopoulou for proofreading the article.

REFERENCES

[1] U. Agero, C. H. Monken, C. Ropert, R. T. Gazzinelli, and O. N. Mesquita, "Cell surface fluctuations studied with defocusing microscopy," Physical Review E, vol. 67, no. 5, p. 051904, May 2003.

[2] R. Ali, M. Gooding, M. Christlieb, and M. Brady, "Phase-based segmentation of cells from brightfield microscopy," in Proceedings of the IEEE International Symposium on Biomedical Imaging: From Nano to Macro, April 2007, pp. 57–60.

[3] R. Ali, M. Gooding, T. Szilágyi, B. Vojnovic, M. Christlieb, and M. Brady, "Automatic segmentation of adherent biological cell boundaries and nuclei from brightfield microscopy images," Machine Vision and Applications, vol. 23, no. 4, pp. 607–621, 2012.

[4] G. Becattini, L. Mattos, and D. Caldwell, "A novel framework for automated targeting of unstained living cells in bright field microscopy," in Proceedings of the IEEE International Symposium on Biomedical Imaging: From Nano to Macro, April 2011, pp. 195–198.

[5] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[6] J. Canny, "A computational approach to edge detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, pp. 679–698, November 1986.

[7] C. Chen, A. Liaw, and L. Breiman, "Using random forest to learn imbalanced data," University of California, Berkeley, Tech. Rep., 2004.

[8] C. Curl, T. Harris, P. Harris, B. Allman, C. Bellair, A. Stewart, and L. Delbridge, "Quantitative phase microscopy: a new tool for measurement of cell culture growth and confluency in situ," Pflügers Archiv European Journal of Physiology, vol. 448, no. 4, pp. 462–468, 2004.

[9] K. Ghatak, Techniques and Methods in Biology. PHI Learning Pvt. Ltd., 2010.

[10] D. Gil, P. Radeva, and F. Vilariño, "Anisotropic contour completion," in Proceedings of the International Conference on Image Processing, vol. 1, September 2003, pp. 869–872.

[11] R. Jiang, D. Crookes, N. Luo, and M. Davidson, "Live-cell tracking using SIFT features in DIC microscopic videos," IEEE Transactions on Biomedical Engineering, vol. 57, no. 9, pp. 2219–2228, September 2010.

[12] E. Jurrus, A. Paiva, S. Watanabe, J. Anderson, B. Jones, R. Whitaker, E. Jorgensen, R. Marc, and T. Tasdizen, "Detection of neuron membranes in electron microscopy images using a serial neural network architecture," Medical Image Analysis, vol. 14, no. 6, pp. 770–783, December 2010.

[13] M. Khalilia, S. Chakraborty, and M. Popescu, "Predicting disease risks from highly imbalanced data using random forest," BMC Medical Informatics and Decision Making, vol. 11, no. 1, p. 51, 2011.

[14] T. Khoshgoftaar, M. Golawala, and J. Van Hulse, "An empirical study of learning from imbalanced data using random forest," in Proceedings of the IEEE International Conference on Tools with Artificial Intelligence, no. 2, October 2007, pp. 310–317.

[15] G. N. Lance and W. T. Williams, "A generalized sorting strategy for computer classifications," Nature, vol. 212, p. 218, October 1966.

[16] ——, "A general theory of classificatory sorting strategies: 1. hierarchical systems," The Computer Journal, vol. 9, no. 4, pp. 373–380, 1967.

[17] A. Lehmussola, P. Ruusuvuori, J. Selinummi, H. Huttunen, and O. Yli-Harja, "Computational framework for simulating fluorescence microscope images with cell populations," IEEE Transactions on Medical Imaging, vol. 26, no. 7, pp. 1010–1016, July 2007.

[18] A. Liaw and M. Wiener, "Classification and regression by randomForest," R News, vol. 2, no. 3, pp. 18–22, 2002. [Online]. Available: http://CRAN.R-project.org/doc/Rnews/

[19] T. Lindeberg, "Feature detection with automatic scale selection," International Journal of Computer Vision, vol. 30, no. 2, pp. 79–116, 1998.

[20] X. Long, W. Cleveland, and Y. Yao, "A new preprocessing approach for cell recognition," IEEE Transactions on Information Technology in Biomedicine, vol. 9, no. 3, pp. 407–412, 2005.

[21] ——, "Automatic detection of unstained viable cells in bright field images using a support vector machine with an improved training procedure," Computers in Biology and Medicine, vol. 36, no. 4, pp. 339–362, 2006.

[22] D. Lowe, "Object recognition from local scale-invariant features," in Proceedings of the IEEE International Conference on Computer Vision, no. 2, 1999, pp. 1150–1157.

[23] ——, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, November 2004.

[24] V. Lulevich, Y.-P. Shih, S. H. Lo, and G.-Y. Liu, "Cell tracing dyes significantly change single cell mechanics," The Journal of Physical Chemistry B, vol. 113, no. 18, pp. 6511–9, 2009.


[25] D. Meyer, F. Leisch, and K. Hornik, "The support vector machine under test," Neurocomputing, vol. 55, no. 1-2, pp. 169–186, 2003.

[26] S. Mittal, Y. Zheng, B. Georgescu, F. Vega-Higuera, S. Zhou, P. Meer, and D. Comaniciu, "Fast automatic detection of calcified coronary lesions in 3D cardiac CT images," in Machine Learning in Medical Imaging, ser. Lecture Notes in Computer Science. Springer, 2010, vol. 6357, pp. 1–9.

[27] B. Morgan and A. Ray, "Non-uniqueness and inversions in cluster analysis," Journal of the Royal Statistical Society, Series C (Applied Statistics), vol. 44, no. 1, pp. 117–134, 1995.

[28] F. Mualla, S. Schöll, B. Sommerfeldt, and J. Hornegger, "Using the Monogenic Signal for Cell-Background Classification in Bright-Field Microscope Images," in Proceedings des Workshops Bildverarbeitung für die Medizin 2013, 2013, pp. 170–174.

[29] F. Murtagh, "Multidimensional clustering algorithms," Compstat Lectures, vol. 4. Vienna: Physika Verlag, 1985.

[30] F. Murtagh and P. Contreras, "Algorithms for hierarchical clustering: an overview," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 2, no. 1, pp. 86–97, 2012.

[31] T. Nattkemper, H. Ritter, and W. Schubert, "Extracting patterns of lymphocyte fluorescence from digital microscope images," Intelligent Data Analysis in Medicine and Pharmacology, vol. 99, pp. 79–88, 1999.

[32] J. Pan, T. Kanade, and M. Chen, "Learning to detect different types of cells under phase contrast microscopy," in Microscopic Image Analysis with Applications in Biology, September 2009.

[33] B. Russell, A. Torralba, K. Murphy, and W. Freeman, "LabelMe: A database and web-based tool for image annotation," International Journal of Computer Vision, vol. 77, no. 1, pp. 157–173, May 2008.

[34] K. Smith, A. Carleton, and V. Lepetit, "Fast ray features for learning irregular shapes," in Proceedings of the IEEE International Conference on Computer Vision, October 2009, pp. 397–404.

[35] D. J. Stephens and V. J. Allan, "Light microscopy techniques for live cell imaging," Science, vol. 300, no. 5616, pp. 82–86, 2003.

[36] M. Tscherepanow, F. Zöllner, M. Hillebrand, and F. Kummert, "Automatic segmentation of unstained living cells in bright-field microscope images," in Advances in Mass Data Analysis of Images and Signals in Medicine, Biotechnology, Chemistry and Food Industry, ser. Lecture Notes in Computer Science. Springer, 2008, vol. 5108, pp. 158–172.

[37] Z. Tu, "Probabilistic boosting-tree: Learning discriminative models for classification, recognition, and clustering," in Proceedings of the IEEE International Conference on Computer Vision, vol. 2, October 2005, pp. 1589–1596.

[38] W. van Opstal, C. Ranger, O. Lejeune, P. Forgez, H. Boudin, J. Bisconte, and W. Rostene, "Automated image analyzing system for the quantitative study of living cells in culture," Microscopy Research and Technique, vol. 28, no. 5, pp. 440–447, 1994.

[39] A. Vedaldi and B. Fulkerson, "VLFeat: An open and portable library of computer vision algorithms," http://www.vlfeat.org/, 2008.

[40] K. Wu, D. Gauthier, and M. Levine, "Live cell image segmentation," IEEE Transactions on Biomedical Engineering, vol. 42, no. 1, pp. 1–12, January 1995.

[41] Z. Yin, R. Bise, M. Chen, and T. Kanade, "Cell segmentation in microscopy imagery using a bag of local Bayesian classifiers," in The IEEE International Symposium on Biomedical Imaging (ISBI) 2010, April 2010.

[42] A. Zaritsky, S. Natan, J. Horev, I. Hecht, L. Wolf, E. Ben-Jacob, and I. Tsarfaty, "Cell motility dynamics: A novel segmentation algorithm to quantify multi-cellular bright field microscopy images," PLoS ONE, vol. 6, no. 11, p. e27593, November 2011.