Faithful Sampling for Spectral Clustering to Analyze High-Throughput Flow Cytometry Data
Parisa Shooshtari
School of Computing Science, Simon Fraser University, Burnaby
Brinkman’s Lab, Terry Fox Laboratory, BC Cancer Agency, Vancouver
Outline:
• Flow Cytometry (FCM) Data
• Clustering of FCM Data
• Spectral Clustering
• Faithful Sampling for Spectral Clustering
• Result
• Summary
Basics of Flow Cytometry Technique
[Figure: a sample; plots of intensity versus wavelength; markers MHC-II and CD-11c; data columns CD-11c, MHC-II, Int-1, Int-2]
Cell Population Identification in Flow Cytometry (FCM)
[Figure: scatter plots over Parameter 1, Parameter 2, Parameter 3 and Parameter 4, with a gated population labelled X%]
Adapted from the Science Creative Quarterly (2)
Importance of FCM Data Clustering
• Manual gating is
– Subjective
– Error-prone
– Time-consuming
– Blind to the multivariate nature of the data
• Analyzing large FCM data sets (with up to 19 dimensions and 1,000,000 points) is impractical without the aid of automated techniques
Which Clustering Algorithm Is Suitable?
• Model-based algorithms like flowClust, flowMerge and FLAME are not suitable for clusters with non-elliptical shapes.
[Figure: panels "flowMerge" and "A Good Clustering"; axis label: GFP]
Our Motivation for Using Spectral Clustering
• Spectral clustering does not require any a priori assumption about cluster size, shape or distribution
• It is not sensitive to outliers, noise or the shape of clusters
Spectral Clustering in One Slide
Represent data sets by a similarity graph
Construct the graph:
• Vertices: data points p1, p2, …, pn
• Weights of edges: similarity values Si,j, defined as [equation image not recovered]
Clustering: find a cut through the graph
• Define a cut objective function
• Solve it
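The graph-and-cut recipe above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the speaker's implementation: the slide's similarity formula is not recoverable, so a Gaussian kernel with a hypothetical bandwidth `sigma` stands in for it, and the normalized cut (relaxed through the second eigenvector of the normalized Laplacian) stands in for the unspecified cut objective. The function names are illustrative.

```python
import numpy as np

def gaussian_similarity(points, sigma):
    """Dense similarity matrix S[i, j] = exp(-||p_i - p_j||^2 / (2 sigma^2)).

    The Gaussian kernel is an assumed stand-in for the slide's formula.
    """
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def spectral_bipartition(similarity):
    """Split the graph in two using the relaxed normalized-cut objective."""
    degree = similarity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} S D^{-1/2}
    normalized = d_inv_sqrt[:, None] * similarity * d_inv_sqrt[None, :]
    laplacian = np.eye(len(similarity)) - normalized
    # eigh returns eigenvalues in ascending order; column 1 belongs to the
    # second-smallest eigenvalue.
    _, eigenvectors = np.linalg.eigh(laplacian)
    indicator = eigenvectors[:, 1] * d_inv_sqrt
    # The sign of the relaxed indicator gives the two sides of the cut.
    return (indicator > 0).astype(int)

# Two unit squares, four points each, placed well apart.
points = np.array(
    [[0, 0], [0, 1], [1, 0], [1, 1], [5, 0], [5, 1], [6, 0], [6, 1]],
    dtype=float,
)
labels = spectral_bipartition(gaussian_similarity(points, sigma=1.0))
print(labels)  # the two squares get different labels; which one is 0 is arbitrary
```

Both the n-by-n similarity matrix and the full eigendecomposition in this sketch are dense, which is where the quadratic memory and cubic time costs of the method come from.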
The Bottleneck of Spectral Clustering
• Serious empirical barriers when applying this algorithm to large datasets
• Time complexity: O(n³) → 2 years for 300,000 data points (cells)
• Required memory: O(n²) → 5 terabytes for 300,000 data points (cells)
Faithful Sampling: Our Solution for Applying Spectral Clustering to Large Data
• Uniform sampling: low-density populations close to dense ones may not remain distinguishable
• Faithful sampling: tends to choose more samples from non-dense parts of the data
How Does Our Faithful Sampling Preserve Information?
1. Space-uniform sampling: it preserves low-density parts of the data by selecting more samples from them than uniform sampling would.
2. Keeping the list of points in the neighbourhood of each sample: this list is later used to define similarities between communities.
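The two ideas above can be illustrated with a short Python sketch. The slide does not state the selection rule, so everything specific here is an assumption made for illustration and not necessarily SamSPECTRAL's exact procedure: the greedy rule (pick a random uncovered point, attach every uncovered point within a fixed `radius` to it), the Gaussian kernel with bandwidth `sigma`, and the choice to define community similarity as the sum of pairwise similarities between members. All function and parameter names are hypothetical.

```python
import numpy as np

def faithful_sample(points, radius, seed=0):
    """Greedy space-uniform sampling (assumed rule, see lead-in).

    Returns the indices of the chosen representatives and, for each one,
    the indices of the points in its neighbourhood (its "community").
    """
    rng = np.random.default_rng(seed)
    uncovered = np.ones(len(points), dtype=bool)
    representatives, communities = [], []
    while uncovered.any():
        candidates = np.flatnonzero(uncovered)
        rep = rng.choice(candidates)
        dists = np.linalg.norm(points[candidates] - points[rep], axis=1)
        # The representative is at distance 0 from itself, so every round
        # covers at least one point and the loop terminates.
        members = candidates[dists <= radius]
        uncovered[members] = False
        representatives.append(int(rep))
        communities.append(members)
    return representatives, communities

def community_similarity(points, members_a, members_b, sigma):
    """Sum of Gaussian similarities over all cross-community pairs
    (an assumed definition of similarity between communities)."""
    diffs = points[members_a][:, None, :] - points[members_b][None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return float(np.exp(-sq_dists / (2.0 * sigma ** 2)).sum())

# A dense clump of 50 coincident points plus one isolated point.
demo_points = np.vstack([np.zeros((50, 2)), [[10.0, 10.0]]])
demo_reps, demo_communities = faithful_sample(demo_points, radius=1.0)
print(len(demo_reps), sorted(len(c) for c in demo_communities))
```

Because each representative covers a ball of fixed radius, a dense region and a sparse region of similar extent receive a similar number of representatives, so sparse regions are over-represented relative to uniform sampling. The stored member lists let the reduced graph weight its edges by how many original points each representative stands for.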
Clustering Result
• Low-density populations surrounded by dense ones
Clustering Result
• Populations with non-elliptical shapes
• Subpopulations of a major population
[Figure: panels comparing SamSPECTRAL, flowMerge and FLAME]
Summary
• Spectral clustering can now be applied to large data sets through our proposed faithful (information-preserving) sampling.
• This sampling method can be combined with other graph-based clustering algorithms that use different objective functions, to reduce the size of the data.
• We have shown that SamSPECTRAL has an advantage over model-based clustering methods in identifying
– Cell populations with non-elliptical shapes
– Low-density populations surrounded by dense ones
– Subpopulations of a major population
Acknowledgement
• Committee:
– Dr. Arvind Gupta
– Dr. Ryan Brinkman
– Dr. Tobias Kollman
• Co-authors on SamSPECTRAL
– Habil Zare
• Data Providers
– Connie Eaves
– Peter Landsdrop
– Keith Humphries
Thanks for Your Attention!