

Automated Image Analysis of Hodgkin Lymphoma

References

[1] M.-L. Hansmann and K. Willenbrock. Die WHO-Klassifikation des Hodgkin-Lymphoms und ihre molekularpathologische Relevanz. Der Pathologe, 23:207–218, 2002.

[2] R. Küppers. The biology of Hodgkin’s lymphoma. Nature Reviews Cancer, 9:15–27, 2009.

[3] A.E. Carpenter, T.R. Jones, M.R. Lamprecht, C. Clarke, I.H. Kang, O. Friman, D.A. Guertin, J.H. Chang, R.A. Lindquist, J. Moffat, P. Golland, D.M. Sabatini. CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biology, 7:R100, 2006.

Alexander Schmitz, Hendrik Schäfer, Tim Schäfer, Jörg Ackermann, Norbert Dichter, Sylvia Hartmann, Martin-Leo Hansmann, Ina Koch

Institute of Computer Science, Department of Molecular Bioinformatics, Goethe-University, Frankfurt a. M.

Image data (input: tissue sections)

● Digitized with an Aperio ScanScope scanning device
● Precision: 0.23 µm per pixel
● ~30 GB uncompressed data per image

Pre-processing (input: input images)

● Hierarchical clustering:
  ● Distinguish between tissue and background
  ● Filter image for region of interest (CD30 positive areas)
● Many tiles of the original image can be ignored in further processing:
  ● Non-tissue area: ~25-50 %
  ● Non-CD30 area: ~50-75 %

Cell recognition (input: region of interest)

● All image tiles of the high resolution image belonging to the ROI are considered
● Primary object detection and the calculation of cell shape descriptors are done using CellProfiler
● Detected cells are labeled with one or multiple tags depending on their shape:
  ● Large, Elongated, Cut


Fig. 1: Example images for the two cHL subtypes. A, nodular sclerosis; B, mixed cellularity. The images are double stained with hematoxylin (blue) and an immunostain (red) that targets CD30, a tumor necrosis factor receptor.

Fig. 2: In our study we use three image sets as input: the two cHL subtypes nodular sclerosis and mixed cellularity and, in addition, a non-lymphoma group containing lymphadenitis cases. Panels: nodular sclerosis, mixed cellularity, lymphadenitis.

Fig. 3: Hierarchical clustering of pixels in different layers (Layer 3, Layer 2, Layer 1; resolution from low to high). Classes shown in the diagram: tissue, background; potential CD30, hematoxylin; CD30, non CD30 red, nucleus, unstained; low intensity.

Fig. 4: An example of a ROI defined by CD30. Top, original image; bottom, detected ROI.

Results Pre-Processing:

Can the relative amounts of the tissue pixel classes serve as a feature to distinguish the three image sets?

Fig. 5: Detection and labeling of CD30 positive cells. In the original image (left), primary objects are detected (center). After removing small objects (green outlines), the cells remain and are labeled according to their shape (right).

Fig. 6: Overview of the modules used in the CellProfiler pipeline:

● UnmixColors: Split stainings into separate images
● IdentifyPrimaryObjects: Identify primary objects by applying a threshold in the CD30 image
● MeasureObjectSizeShape: Calculate area shape descriptors for all detected primary objects
● FilterObjects: Filter out small cell fragments
● ExportToDatabase: Export to MySQL database
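The first two pipeline steps (splitting the stains and thresholding the CD30 image) can be sketched with NumPy as follows. This is only a minimal illustration of color deconvolution, not CellProfiler's implementation. The stain vectors, the function names `unmix` and `cd30_mask`, and the threshold value are assumptions chosen for the example.

```python
import numpy as np

# Illustrative optical-density vectors (R, G, B) for the two stains.
# These numbers are assumptions for the sketch, not values from the poster.
HEMATOXYLIN = np.array([0.65, 0.70, 0.29])
RED_CHROMOGEN = np.array([0.21, 0.85, 0.48])


def unmix(rgb_image):
    """Split an (H, W, 3) uint8 image into hematoxylin and red-stain maps."""
    stains = np.stack([HEMATOXYLIN, RED_CHROMOGEN])
    stains = stains / np.linalg.norm(stains, axis=1, keepdims=True)
    # Optical density is assumed to be linear in stain concentration.
    od = -np.log10((rgb_image.astype(float) + 1.0) / 256.0)
    # Least-squares solution of od = conc @ stains for every pixel.
    conc = od.reshape(-1, 3) @ np.linalg.pinv(stains)
    conc = conc.reshape(rgb_image.shape[:2] + (2,))
    return conc[..., 0], conc[..., 1]


def cd30_mask(rgb_image, threshold=0.3):
    """Mark pixels whose red-stain concentration exceeds a hypothetical threshold."""
    _, red = unmix(rgb_image)
    return red > threshold
```

In this sketch a white pixel and a blue, hematoxylin-like pixel fall below the threshold, while a red-stained pixel is kept as a CD30 candidate.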

Fig. 8: Relative amount of the eight possible labels (cell, cell_L, cell_E, cell_L_E, cell_C, cell_L_C, cell_E_C, cell_L_E_C; L = Large, E = Elongated, C = Cut). Data for six example images are depicted, two from each image set: NS-1 and NS-2 (NS cHL), MC-1 and MC-2 (MC cHL), Lymph-1 and Lymph-2 (lymphadenitis). The value axis runs from 0 to 0.6.
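The eight label names follow from combining the three tags. A minimal sketch of how such labels and their relative amounts could be computed is given below. The thresholds and the function names are hypothetical, and because the poster does not state the criterion for the Cut tag, it is passed in as a ready-made flag.

```python
from collections import Counter

# Hypothetical thresholds; the poster does not state the criteria behind each tag.
LARGE_MIN_AREA = 2000               # area in pixels
ELONGATED_MIN_ECCENTRICITY = 0.9


def cell_label(area, eccentricity, is_cut):
    """Build a label such as 'cell_L_C' from the shape tags of one cell."""
    label = "cell"
    if area >= LARGE_MIN_AREA:
        label += "_L"
    if eccentricity >= ELONGATED_MIN_ECCENTRICITY:
        label += "_E"
    if is_cut:
        label += "_C"
    return label


def relative_label_amounts(cells):
    """Return the fraction of cells per label for (area, eccentricity, is_cut) tuples."""
    counts = Counter(cell_label(*cell) for cell in cells)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

Applied per image, `relative_label_amounts` yields one bar group of the kind shown in Fig. 8.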

Hodgkin lymphoma is an unusual type of lymphoma [1], arising from malignant B-cells [2]. Morphological and immunohistochemical features of malignant cells and their distribution differ from other cancer types. Based on systematic tissue image analysis, computer-aided exploration can provide new insights into Hodgkin lymphoma pathology.

Here, we report results from an image analysis of CD30 immunostained classical Hodgkin lymphoma (cHL) tissue section images. We have implemented an automatic procedure to handle and explore image data in Aperio's SVS format. We use pre-processing approaches to separate the image objects from the background, then select regions of interest and split the large images into tiles. Then, we use a CellProfiler [3] pipeline to detect primary objects. For this purpose, the images are split into their color stains using a color deconvolution approach. By setting a threshold in the CD30 stain image, we identify CD30 positive cells and compute their shape descriptors. We label the cells based on size, elongation and compactness. We present results for a small set of nodular sclerosis, mixed cellularity and non-lymphoma images.

Pre-Processing

● Pixel-based classification
● Minimum distance to mean clustering
● Six pixel classes
  ● Non-tissue: Background, Low intensity
  ● Tissue: CD30, non CD30 red, Nucleus, Unstained
● Hierarchical clustering
● Pixel descriptors: Mean pixel value, Saturation, Brightness
● Only tiles containing a class of interest (tissue, potential CD30) are considered in higher resolution
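A minimum distance to mean classifier assigns each pixel to the class whose mean descriptor vector is closest. The sketch below illustrates the idea with the three descriptors named above. The reference colors, the use of the HSV model for saturation and brightness, and the function names are assumptions for illustration. In practice the class means would be estimated from example pixels.

```python
import colorsys
import math


def pixel_descriptors(rgb):
    """Return (mean pixel value, saturation, brightness) for an RGB pixel in 0..255."""
    r, g, b = (channel / 255.0 for channel in rgb)
    # Assumption: saturation and brightness are taken from the HSV color model.
    _, saturation, value = colorsys.rgb_to_hsv(r, g, b)
    return ((r + g + b) / 3.0, saturation, value)


# Hypothetical reference colors for three of the six pixel classes.
CLASS_REFERENCE_RGB = {
    "background": (245, 245, 245),
    "CD30": (170, 40, 60),
    "nucleus": (60, 60, 150),
}
CLASS_MEANS = {name: pixel_descriptors(rgb) for name, rgb in CLASS_REFERENCE_RGB.items()}


def classify_pixel(rgb):
    """Assign the pixel to the class with the nearest mean descriptor vector."""
    descriptors = pixel_descriptors(rgb)
    return min(CLASS_MEANS, key=lambda name: math.dist(descriptors, CLASS_MEANS[name]))
```

A hierarchical variant would first apply such a classifier to a low resolution layer and then revisit only the tiles of interest at higher resolution.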

Image Data

● Captured with Aperio ScanScope scanning device
● Precision 0.23 µm / pixel in high resolution layer
● ~30 GB uncompressed data per image
● Three image sets, ~150 images total:
  ● Nodular sclerosis (NS cHL)
  ● Mixed cellularity (MC cHL)
  ● Lymphadenitis (non-lymphoma)
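To get a feeling for these numbers, the following back-of-the-envelope calculation estimates the image dimensions. It assumes 3 bytes per pixel (8-bit RGB) and a square scan area, neither of which is stated on the poster.

```python
import math

MICRONS_PER_PIXEL = 0.23     # from the poster
UNCOMPRESSED_BYTES = 30e9    # ~30 GB, from the poster
BYTES_PER_PIXEL = 3          # assumption: 8-bit RGB without alpha channel

pixel_count = UNCOMPRESSED_BYTES / BYTES_PER_PIXEL       # about 1e10 pixels
side_pixels = math.sqrt(pixel_count)                      # about 100,000 pixels if square
side_mm = side_pixels * MICRONS_PER_PIXEL / 1000.0        # about 23 mm

print(f"{pixel_count:.1e} pixels, {side_pixels:.0f} px per side, {side_mm:.1f} mm per side")
```

Under these assumptions a single image covers roughly 23 mm × 23 mm of tissue, which is why tiling and discarding irrelevant tiles matters for processing time.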

● Minimum distance to mean clustering for all images
● Descriptors: relative amount of the four tissue pixel classes
● 60 % of the images are correctly classified, but MC cHL overlaps strongly with the other image sets
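At image level, the descriptor is the vector of relative amounts of the four tissue pixel classes, and each image is assigned to the image set with the nearest mean descriptor. A minimal sketch under these assumptions follows. The function names and the example data in the usage note are hypothetical.

```python
import math
from collections import Counter

TISSUE_CLASSES = ("CD30", "non CD30 red", "nucleus", "unstained")


def tissue_fractions(pixel_labels):
    """Relative amount of each tissue class among the tissue pixels of one image."""
    counts = Counter(label for label in pixel_labels if label in TISSUE_CLASSES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("image contains no tissue pixels")
    return tuple(counts[name] / total for name in TISSUE_CLASSES)


def set_means(training):
    """Average the descriptors of each image set; training maps set name to descriptors."""
    return {
        name: tuple(sum(column) / len(column) for column in zip(*descriptors))
        for name, descriptors in training.items()
    }


def nearest_image_set(descriptor, means):
    """Return the image set whose mean descriptor is closest (Euclidean distance)."""
    return min(means, key=lambda name: math.dist(descriptor, means[name]))
```

For example, `nearest_image_set(tissue_fractions(labels), set_means(training))` classifies one image once `training` holds descriptors for the NS cHL, MC cHL and lymphadenitis sets.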

Outlook

● Cell recognition for complete image database

● Density and distribution for labeled cells

● Graphs for detected cells based on neighborhood and comparison of graph topology

● Additional immunohistological images for the microenvironment

● 25-50 % non-tissue tiles can be discarded

● Up to 50 % of the remaining tiles contain no CD30 positive pixels and can be ignored in further processing
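The tile filtering step can be sketched as follows: given a low resolution map of pixel classes, only tiles that contain at least one pixel of the class of interest are kept for further processing. The function name, the integer class encoding, and the tile size are assumptions for illustration.

```python
import numpy as np


def tiles_with_class(class_map, tile_size, class_id):
    """Yield (row, col) origins of tiles containing at least one pixel of class_id."""
    height, width = class_map.shape
    for row in range(0, height, tile_size):
        for col in range(0, width, tile_size):
            tile = class_map[row:row + tile_size, col:col + tile_size]
            if np.any(tile == class_id):
                yield row, col
```

Running the filter twice, first for tissue and then for potential CD30 pixels, mirrors the two reductions listed above.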

NS cHL tissue sections contain a higher amount of cells labeled as Large and Cut. These cells seem to be a specific feature of NS cHL.

Fig. 7: Labels for detected cells based on cell shape. Each dot represents a cell object. The coloring is based on the labeling: Each RGB channel encodes one of the labels. Mixed colors are used when several labels are assigned to a single cell (e.g., magenta = Large and Cut). Examples for both cHL subtypes are depicted. The amount of large cells is much higher in NS cHL than in MC cHL.
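The color coding of Fig. 7 can be illustrated with a small sketch. Since magenta (red plus blue) stands for Large and Cut, Elongated must use the green channel. Which of Large and Cut maps to red and which to blue is not stated, so the assignment below is an assumption, as is the function name.

```python
def label_color(large, elongated, cut):
    """Encode the three shape tags of a cell as one RGB color."""
    # Assumption: Large = red and Cut = blue; the caption only fixes their mixture.
    return (255 if large else 0, 255 if elongated else 0, 255 if cut else 0)
```

With this mapping, `label_color(True, False, True)` gives `(255, 0, 255)`, i.e. magenta, matching the example in the caption.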
