shape, colour and texture in mitosis detectionpersonal.maths.surrey.ac.uk/t.decampos/papers/... ·...


Shape, Colour and Texture in Mitosis Detection

Violet Snell

Supervisors:

Prof. J. Kittler & Dr. W. Christmas

Centre for Vision, Speech and Signal Processing

University of Surrey

Guildford, UK


Pipeline

Colour Matching

Colour-based likelihood

Candidate Locations

Grey-scale Conversion

Patch Extraction & Segmentation

Patch Image & Object(s) Mask

Feature Extraction

Classifier


Pre-processing: stain variation

[Figure: Green histograms, per patient]

[Figure: Red histograms, per patient]


Pre-processing: White holes

Strong bias on histograms

Not clinically significant

Mask based on Green Thresholding

Threshold found as dip in Green histogram

[Figure: Green Histogram for single HPF (vertical axis in %), with Threshold = 209 marked]
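
The dip search on this slide could be sketched as follows. This is a minimal illustration, not the author's code: the function names, the moving-average smoothing, and the `split` level used to separate the tissue peak from the white peak are all assumptions.

```python
def white_hole_threshold(hist, split=192, smooth=5):
    """Return the bin at the dip between the tissue peak and the white peak.

    hist   : list of 256 counts for the green channel of one HPF
    split  : ASSUMED level; the white peak is searched at or above it
    smooth : ASSUMED width of the moving average used to suppress noise
    """
    half = smooth // 2
    n = len(hist)
    smoothed = []
    for i in range(n):
        window = hist[max(0, i - half):min(n, i + half + 1)]
        smoothed.append(sum(window) / len(window))
    tissue_peak = max(range(split), key=lambda i: smoothed[i])
    white_peak = max(range(split, n), key=lambda i: smoothed[i])
    # The dip is the lowest smoothed bin between the two peaks.
    return min(range(tissue_peak, white_peak + 1), key=lambda i: smoothed[i])


def white_hole_mask(green, threshold):
    """Mark pixels brighter than the threshold as white holes."""
    return [[g > threshold for g in row] for row in green]
```

Masked pixels would then be left out of the per-patient histograms, which is what removes the bias noted above.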


White Hole Exclusion

[Figure: Red histograms, per patient]

[Figure: Blue histograms, per patient]


Histogram Matching
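
The slide gives no detail on the matching itself, so the following is only a sketch of standard CDF-based histogram matching for one colour channel. Applying it per channel, and the choice of reference histogram, are assumptions; all function names are hypothetical.

```python
def cdf(hist):
    """Cumulative distribution of a histogram, normalised to end at 1."""
    total = float(sum(hist))
    running, out = 0.0, []
    for count in hist:
        running += count
        out.append(running / total)
    return out


def matching_lut(source_hist, reference_hist):
    """Look-up table mapping source levels so their CDF follows the reference."""
    src_cdf, ref_cdf = cdf(source_hist), cdf(reference_hist)
    lut, j = [], 0
    for value in src_cdf:
        # Advance to the first reference level whose CDF reaches this value.
        while j < len(ref_cdf) - 1 and ref_cdf[j] < value:
            j += 1
        lut.append(j)
    return lut


def apply_lut(channel, lut):
    """Remap every pixel of a 2D channel through the look-up table."""
    return [[lut[v] for v in row] for row in channel]
```

In this pipeline the source histogram would presumably be computed per patient after white-hole exclusion, so that the white areas do not distort the mapping.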


Colour-based Likelihood

Projections of 3D (R,G,B) histograms, 64 bins in each dimension

All pixels vs Mitotic nuclei

10 pixel radius around ground-truth marked locations

Ratio gives a basic likelihood for each pixel based on its colour alone
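
A rough sketch of the ratio described above, using sparse 3D histograms with 64 bins per channel. It assumes 8-bit RGB pixels given as `(r, g, b)` tuples; the function names are hypothetical, and the mitotic pixels are assumed to be the subset within 10 pixels of a ground-truth location.

```python
from collections import Counter

BINS = 64              # bins per channel, as stated on the slide
LEVELS = 256 // BINS   # 8-bit levels falling into each bin


def colour_bin(pixel):
    """Map an (r, g, b) pixel to its 3D histogram bin."""
    r, g, b = pixel
    return (r // LEVELS, g // LEVELS, b // LEVELS)


def normalised_histogram(pixels):
    """Sparse 3D colour histogram whose values sum to 1."""
    counts = Counter(colour_bin(p) for p in pixels)
    total = float(sum(counts.values()))
    return {k: c / total for k, c in counts.items()}


def likelihood_ratio(all_pixels, mitotic_pixels):
    """Per-bin ratio of mitotic colour frequency to overall colour frequency."""
    p_all = normalised_histogram(all_pixels)
    p_mit = normalised_histogram(mitotic_pixels)
    return {k: p_mit.get(k, 0.0) / p_all[k] for k in p_all}


def pixel_likelihood(pixel, ratio):
    """Colour-only likelihood of one pixel; colours never seen score 0."""
    return ratio.get(colour_bin(pixel), 0.0)
```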


Likelihood to Candidate Locations

Low-pass filter to provide some spatial coherence

5x5 box

Threshold

Closed-contour search

Centre of each contour is initial location
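
The steps above can be sketched as below. Note one substitution: instead of a closed-contour search, this illustration labels 4-connected regions above the threshold and takes each region's centroid, which is an assumption rather than the method used in the talk. Function names are hypothetical.

```python
def box_filter(img, size=5):
    """Mean filter with a size x size box, truncated at the image border."""
    half = size // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - half), min(h, y + half + 1))
                    for i in range(max(0, x - half), min(w, x + half + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out


def candidate_centres(likelihood, threshold, size=5):
    """Centroids (row, col) of connected regions above the threshold."""
    smooth = box_filter(likelihood, size)
    h, w = len(smooth), len(smooth[0])
    seen = [[False] * w for _ in range(h)]
    centres = []
    for y0 in range(h):
        for x0 in range(w):
            if seen[y0][x0] or smooth[y0][x0] <= threshold:
                continue
            # Flood-fill one region, collecting its pixels.
            stack, region = [(y0, x0)], []
            seen[y0][x0] = True
            while stack:
                y, x = stack.pop()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and smooth[ny][nx] > threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            cy = sum(p[0] for p in region) / len(region)
            cx = sum(p[1] for p in region) / len(region)
            centres.append((cy, cx))
    return centres
```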


Conversion to Grey-scale

PCA of pixels within Mitotic nuclei (10 pix radius)

Principal axis projection
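
A small sketch of the projection, assuming the PCA is fitted on RGB triples sampled within 10 pixels of the marked mitoses and then applied to every pixel. Power iteration stands in for a full eigen-decomposition, and the function names are hypothetical. The sign of the axis and any rescaling to an 8-bit range are left open here.

```python
def principal_axis(pixels, iterations=100):
    """Mean colour and first principal axis of a list of (r, g, b) pixels."""
    n = float(len(pixels))
    mean = [sum(p[c] for p in pixels) / n for c in range(3)]
    cov = [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in pixels) / n
            for b in range(3)] for a in range(3)]
    v = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        # Power iteration: repeatedly apply the covariance and renormalise.
        w = [sum(cov[a][b] * v[b] for b in range(3)) for a in range(3)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v


def to_grey(pixel, mean, axis):
    """Project one (r, g, b) pixel onto the principal axis."""
    return sum((pixel[c] - mean[c]) * axis[c] for c in range(3))
```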



Segmentation

70x70 pixel patches

Grey-level threshold search, range: 40-145

Optimised for 2 objectives:

High average gradient across resulting boundary

Low variance within the object

Weighted according to each parameter's standard deviation

Minimum Area Limit

Contrast between background and foreground

Location refined to centroid of segmented object
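
One possible reading of the threshold search is sketched below. Several points are assumptions: the object is taken to be darker than the background, all below-threshold pixels are treated as a single object, "weighted according to standard deviation" is interpreted as dividing each objective by its standard deviation over the candidate thresholds, and the default `min_area` is arbitrary. Function names are hypothetical.

```python
def evaluate(patch, t):
    """Object pixels below threshold t and grey-level steps across its boundary."""
    h, w = len(patch), len(patch[0])
    inside = [patch[y][x] for y in range(h) for x in range(w) if patch[y][x] < t]
    steps = []
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < h and nx < w and (patch[y][x] < t) != (patch[ny][nx] < t):
                    steps.append(abs(patch[y][x] - patch[ny][nx]))
    return inside, steps


def std(values):
    """Population standard deviation."""
    m = sum(values) / len(values)
    return (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5


def best_threshold(patch, lo=40, hi=145, min_area=4):
    """Threshold with high boundary gradient and low internal variance."""
    rows = []
    for t in range(lo, hi + 1):
        inside, steps = evaluate(patch, t)
        if len(inside) < min_area or not steps:
            continue  # enforce the minimum area limit
        mean = sum(inside) / len(inside)
        var = sum((v - mean) ** 2 for v in inside) / len(inside)
        rows.append((t, sum(steps) / len(steps), var))
    if not rows:
        return None
    g_sd = std([r[1] for r in rows]) or 1.0
    v_sd = std([r[2] for r in rows]) or 1.0
    # Reward gradient, penalise variance, each scaled by its own spread.
    return max(rows, key=lambda r: r[1] / g_sd - r[2] / v_sd)[0]
```

The candidate location would then be moved to the centroid of the pixels selected by the chosen threshold.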


Segmentation

Telophase pairs

2nd object present in patch

Comparable areas

Comparable contrast

101K patches in training set

180:1 class imbalance

145:1 for single objects

800:1 for pairs

Questions?


Feature Extraction

Area

Circularity = Perimeter²/Area

Convex Hull Area relative to object Area

Elongation of minimum area enclosing rectangle

Boundary Radial profile

High-pass filter

Fourier Shape Descriptors

Normalised magnitudes

1-5 as separate features

Sum of the higher terms
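
Two of the shape features above can be illustrated as follows. The radial profile is assumed to be the centroid-to-boundary distance sampled at equally spaced angles, and the magnitudes are assumed to be normalised by the DC term; neither detail is stated on the slide. Function names are hypothetical.

```python
import cmath
import math


def circularity(perimeter, area):
    """Perimeter squared over area, as defined on the slide (4*pi for a circle)."""
    return perimeter ** 2 / area


def fourier_shape_descriptors(radial_profile, n_separate=5):
    """Normalised DFT magnitudes of the boundary radial profile.

    Returns the first n_separate terms individually and the sum of the
    remaining terms. Magnitudes do not change when the profile is rotated.
    """
    n = len(radial_profile)
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(r * cmath.exp(-2j * math.pi * k * i / n)
                    for i, r in enumerate(radial_profile))
        mags.append(abs(coeff))
    dc = mags[0]  # ASSUMED normalisation: divide by the mean-radius term
    normalised = [m / dc for m in mags[1:]]
    return normalised[:n_separate], sum(normalised[n_separate:])
```

For a perfect circle every term is zero; an elongated object puts most of its energy into the second term.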


Feature Extraction

Contrast

Background excluding white holes

Segmentation level relative to object's mean intensity

Average Gradient across segmentation boundary

Average Edge steepness

Contrast-independent

Morphology at intermediate thresholds

1/3 and 2/3 of [min..max] interval

Average object area at each one


Feature Extraction

Local Variance

Inside object 7x7

Background 5x5

Low-pass filtered background variance

High-pass filtered background variance

High-pass filtered foreground variance

Object internal variance


Classifier A: SVM

23 rotation-invariant features, normalised to unit variance

RBF kernel

Class imbalance 145:1 & Training set size 75K

Model averaging combined with random sub-sampling

All the positive examples

A different random portion of negative examples

Class weights

Parameter optimisation

Cross-validation is patient-based

Standard grid search fails for F1
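
The model-averaging scheme above might look roughly like this. The `train` argument is a hypothetical callable that takes positive and negative examples and returns a scoring function; in the talk that role is played by an RBF-kernel SVM. The number of models and the default 10:1 negative-to-positive ratio are assumptions.

```python
import random


def train_ensemble(positives, negatives, train, n_models=10,
                   neg_per_model=None, seed=0):
    """Train several models, each on all positives and a random negative subset."""
    rng = random.Random(seed)
    if neg_per_model is None:
        # ASSUMED ratio; the slide does not say how many negatives are drawn.
        neg_per_model = min(len(negatives), 10 * len(positives))
    models = []
    for _ in range(n_models):
        subset = rng.sample(negatives, neg_per_model)
        models.append(train(positives, subset))
    return models


def ensemble_score(models, x):
    """Average the individual model scores for one sample."""
    return sum(m(x) for m in models) / len(models)
```

Each model sees a much milder imbalance than 145:1, while the average still draws on a large share of the negative examples.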


Pairs

Features describe a single object

Telophase pairs need special treatment

Class imbalance over 800:1

Each object assessed separately first

At least one has a high enough prediction

Reduce class imbalance to 30:1, training set size ~1K

Features that assess balance of constituent objects' attributes

Average and ratio for a subset of single-object features

Total of the two objects' prediction scores


Classifier A: Results

Cross-validation F-score ~45%

Recall slightly higher than precision

Submission #1 optimised for overall F-score

Submission #2 optimised for weighted average of patient F-scores

Weighted by number of images – still over-representing high grade

Submission #3 biased in favour of Recall

Submission   Precision   Recall   F1
#1           41.2%       26.5%    32.2%
#2           38.2%       28.0%    32.3%
#3           35.7%       33.2%    34.4%
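
As a quick consistency check, the F1 column can be recomputed from the rounded precision and recall using the standard harmonic-mean formula F1 = 2PR/(P+R). The snippet below is only that arithmetic; small differences are expected because the inputs are already rounded.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from the results table
submissions = {
    "#1": (41.2, 26.5, 32.2),
    "#2": (38.2, 28.0, 32.3),
    "#3": (35.7, 33.2, 34.4),
}

for name, (p, r, reported) in submissions.items():
    print(name, "recomputed:", round(f1(p, r), 2), "reported:", reported)
```

All three recomputed values agree with the reported ones to within 0.1 percentage points.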


Results Analysis

Big gap between cross-validation and test

Recall consistently lower than precision in test

Training set still not large enough to cover patient and tissue variation

Total number of detections too low on all submissions

Good correlation to expected number of detections for each patient

Correlation coefficient 0.82

Patients with fewer mitoses are “harder”

Under-represented in training set

Questions?


Classifier B: GP-LVM

Joint submission

Sheffield Institute for Translational Neuroscience, Sheffield University, UK

Teo de Campos

GP-LVM

Gaussian Process Latent Variable Model

Probabilistic

Generative

Non-linear dimensionality reduction

Used for data-driven human motion synthesis


Latent Variable Models

Mapping from latent space X, related to underlying physical processes, to observed variables Y, e.g. pixel intensity values

Gaussian Process used to optimise the model's fit to example points through kernel parameters θ

Estimate of noise level, i.e. parts of signal not explained by the model

Computational complexity

O(N³) in number of training data points

Test samples require an iterative optimisation to estimate their position in latent space, and hence likelihood
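
For reference, the standard GP-LVM objective from the literature is the marginal log-likelihood below; it is given here as background and is not taken from the slides. N is the number of training points, D the dimension of the observed space, and K the N×N kernel matrix evaluated on the latent positions, with the noise variance σ² on its diagonal.

```latex
\log p(Y \mid X, \theta)
  = -\frac{DN}{2}\log 2\pi
    - \frac{D}{2}\log\lvert K\rvert
    - \frac{1}{2}\operatorname{tr}\!\left(K^{-1} Y Y^{\top}\right),
\qquad
K = K_{\theta}(X, X) + \sigma^{2} I
```

Evaluating the determinant and inverse of K requires a factorisation of an N×N matrix, which is where the O(N³) cost comes from.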


Pipeline

Colour Matching

Colour-based likelihood

Candidate Locations

Grey-scale Conversion

Patch Extraction, Segmentation & Rotation

Patch Images

Classifier


GP-LVM Experiments

Separate models for Positive and Negative classes

Spatial pyramid to provide connections between pixels

6555-dimensional observed space

Negatives for training selected by clustering

Equal number of Positive and Negative training points

Very high levels of noise (2dB SNR) compared to other applications (~30dB)

Negative model produces much higher likelihoods than Positive model

Weights based on average density of mitotic figures

Very slow in test as well as training

May benefit from GPU acceleration


GP-LVM Results

[Figure: Positive Model, Dimensions 2 & 18]

[Figure: Positive Model, Dimensions 8 & 17]

F1=11.3%


Summary

Histogram matching to cancel stain variations

Colour-based likelihood as 1st phase of detection

Two very different approaches to classification

Traditional feature extraction and SVM

Balance of shape, intensity and texture attributes

Latent Variable Models with Gaussian Process

Future work

Deep GP-LVM

Direct to grade, or a mitotic count bracket, rather than locations

Requires much larger data sets, but a lot less labelling


Questions?


Likelihood Threshold

Threshold controls trade-off between

Number of missed mitoses

Number of candidate locations requiring further assessment

And therefore the class imbalance

Err on the side of caution

14 of 550 training locations missed (2.5%)

115K potentials to check
