Algorithms for Smoothing Array CGH Data
DESCRIPTION
Algorithms for Smoothing Array CGH Data. Kees Jong (VU, CS and Mathematics), Elena Marchiori (VU, Computer Science), Aad van der Vaart (VU, Mathematics), Gerrit Meijer (VUMC), Bauke Ylstra (VUMC), Marjan Weiss (VUMC).
TRANSCRIPT
![Page 1: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/1.jpg)
1
Algorithms for Smoothing Array CGH Data
Kees Jong (VU, CS and Mathematics)
Elena Marchiori (VU, Computer Science)
Aad van der Vaart (VU, Mathematics)
Gerrit Meijer (VUMC)
Bauke Ylstra (VUMC)
Marjan Weiss (VUMC)
![Page 2: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/2.jpg)
2
Tumor Cell
Chromosomes of tumor cell:
![Page 3: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/3.jpg)
3
CGH Data
[Plot: copy number (Copy#) against clones, grouped by chromosome]
![Page 4: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/4.jpg)
4
Naïve Smoothing
![Page 5: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/5.jpg)
5
“Discrete” Smoothing
Copy numbers are integers
![Page 6: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/6.jpg)
6
Why Smoothing?
• Noise reduction
• Detection of Loss, Normal, Gain, Amplification
• Breakpoint analysis
Recurrent (over tumors) aberrations may indicate:
– an oncogene, or
– a tumor suppressor gene
![Page 7: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/7.jpg)
7
Is Smoothing Easy?
Measurements are relative to a reference sample
Printing, labeling and hybridization may be uneven
Tumor sample is inhomogeneous
• the vertical scale is relative
• expect only a few levels
![Page 8: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/8.jpg)
8
Smoothing: example
![Page 9: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/9.jpg)
9
Problem Formalization
A smoothing can be described by
• a number of breakpoints
• corresponding levels
A fitness function scores each smoothing according to how well it fits the data.
An algorithm finds the smoothing with the highest fitness score.
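As a minimal sketch of this representation, a smoothing can be stored as a list of breakpoint positions, with each level taken to be the mean of its segment. The function names `segments` and `smooth` are hypothetical, and using the segment mean as the level is an assumption made here for illustration.

```python
from statistics import fmean

def segments(n, breakpoints):
    """Split positions 0..n-1 into half-open (start, end) segments.

    A breakpoint b (1 <= b <= n-1) means a new segment starts at position b.
    """
    bounds = [0, *sorted(breakpoints), n]
    return list(zip(bounds[:-1], bounds[1:]))

def smooth(values, breakpoints):
    """Replace every value by the level (here: the mean) of its segment."""
    out = []
    for start, end in segments(len(values), breakpoints):
        level = fmean(values[start:end])
        out.extend([level] * (end - start))
    return out

if __name__ == "__main__":
    data = [0.0, 0.5, 2.0, 3.0]
    print(segments(len(data), [2]))
    print(smooth(data, [2]))
```

With one breakpoint at position 2, the four values are split into two segments of two values each, and each value is replaced by its segment mean.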
![Page 10: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/10.jpg)
10
Smoothing
[Figure labels: breakpoints, levels, variance]
![Page 11: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/11.jpg)
11
Fitness Function
We assume the data are a realization of a Gaussian noise process and use the maximum likelihood criterion, adjusted with a penalty term that accounts for model complexity
Better models could be used given insight into tumor pathogenesis
![Page 12: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/12.jpg)
12
Fitness Function (2)
CGH values: x1, ..., xn
breakpoints: 0 < y1 < … < yN < n
levels:
error variances:
likelihood:
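The formulas for the levels, error variances, and likelihood are missing from this transcript. A reconstruction consistent with the Gaussian assumption on the previous slide is sketched below. It assumes one level and one variance per segment, and the conventions $y_0 = 0$ and $y_{N+1} = n$ are introduced here for convenience; the original slide may have used a different parameterization.

```latex
\begin{align*}
\text{levels:} \quad & \mu_1, \dots, \mu_{N+1} \\
\text{error variances:} \quad & \sigma_1^2, \dots, \sigma_{N+1}^2 \\
\text{likelihood:} \quad & L = \prod_{i=1}^{N+1} \; \prod_{j = y_{i-1}+1}^{y_i}
  \frac{1}{\sqrt{2\pi\sigma_i^2}}
  \exp\!\left( -\frac{(x_j - \mu_i)^2}{2\sigma_i^2} \right),
  \qquad y_0 = 0, \; y_{N+1} = n
\end{align*}
```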
![Page 13: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/13.jpg)
13
Fitness Function (3)
Maximum likelihood estimators of μ and σ² can be found explicitly
Need to add a penalty to the log likelihood to control the number N of breakpoints
penalty
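The sketch below shows how such a penalized fitness could be computed under stated assumptions. The per-segment maximum likelihood estimates are the segment mean and the (biased) segment variance. The penalty formula is not preserved in this transcript, so a linear penalty `penalty_per_breakpoint * N` is used here purely as an assumption. The names `fitness`, `penalty_per_breakpoint`, and `var_floor` are hypothetical.

```python
import math

def fitness(values, breakpoints, penalty_per_breakpoint=1.0, var_floor=1e-6):
    """Penalized Gaussian log-likelihood of a smoothing (higher is better).

    breakpoints: distinct positions in 1..len(values)-1; a new segment
    starts at each breakpoint, so every segment is non-empty.
    """
    bounds = [0, *sorted(breakpoints), len(values)]
    log_lik = 0.0
    for start, end in zip(bounds[:-1], bounds[1:]):
        seg = values[start:end]
        m = len(seg)
        mu = sum(seg) / m                          # MLE of the segment level
        ss = sum((x - mu) ** 2 for x in seg)       # residual sum of squares
        # MLE of the segment variance; the floor is an assumption that
        # keeps the logarithm finite for constant or single-point segments.
        var = max(ss / m, var_floor)
        log_lik += -0.5 * m * math.log(2 * math.pi * var) - ss / (2 * var)
    # Assumed penalty: linear in the number of breakpoints.
    return log_lik - penalty_per_breakpoint * len(breakpoints)

if __name__ == "__main__":
    data = [0.0, 0.1, -0.1, 0.05, 1.0, 1.1, 0.9, 1.05]
    print(fitness(data, []), fitness(data, [4]))
```

On data with one clear jump, placing a breakpoint at the jump should score higher than using no breakpoint, while a large enough penalty reverses that preference.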
![Page 14: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/14.jpg)
14
Algorithms
Maximizing the fitness is computationally hard
Use a genetic algorithm + local search to approximate the optimum
![Page 15: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/15.jpg)
15
Algorithms: Local Search
choose N breakpoints at random
while (improvement)
- randomly select a breakpoint
- move the breakpoint one position to the left or to the right
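The loop above can be sketched as follows. This is one possible reading, not the authors' implementation: breakpoints are tried in random order, a move is kept only if it strictly improves the score, and the search stops when no single-step move helps. The stand-in score `neg_sse` (negative squared error around segment means) and the names `local_search`, `score`, and `rng` are assumptions for illustration; any fitness function with the same signature could be passed instead.

```python
import random

def neg_sse(values, breakpoints):
    """Stand-in score: minus the squared error around each segment mean."""
    bounds = [0, *sorted(breakpoints), len(values)]
    total = 0.0
    for start, end in zip(bounds[:-1], bounds[1:]):
        seg = values[start:end]
        mu = sum(seg) / len(seg)
        total += sum((x - mu) ** 2 for x in seg)
    return -total

def local_search(values, n_breakpoints, score=neg_sse, rng=None):
    """Hill-climb from random breakpoints using one-position moves."""
    rng = rng or random.Random()
    n = len(values)
    # Breakpoints are cut positions 1..n-1, so every segment is non-empty.
    current = set(rng.sample(range(1, n), n_breakpoints))
    best = score(values, current)
    improved = True
    while improved:
        improved = False
        # Visit breakpoints in random order; accept the first improving move.
        for bp in rng.sample(sorted(current), len(current)):
            for step in rng.sample([-1, 1], 2):
                moved = bp + step
                if moved < 1 or moved > n - 1 or moved in current:
                    continue
                candidate = (current - {bp}) | {moved}
                cand_score = score(values, candidate)
                if cand_score > best:
                    current, best, improved = candidate, cand_score, True
                    break
            if improved:
                break
    return sorted(current), best

if __name__ == "__main__":
    data = [0.0] * 5 + [1.0] * 5
    print(local_search(data, 1, rng=random.Random(0)))
```

Because every accepted move strictly increases the score and there are finitely many breakpoint sets, the loop always terminates, though in general only at a local optimum.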
![Page 16: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/16.jpg)
16
Genetic Algorithm
Given a “population” of candidate smoothings, create a new smoothing by:
- select two “parents” at random from the population
- generate “offspring” by combining the parents (e.g. “uniform crossover” or “union”)
- apply mutation to each offspring
- apply local search to each offspring
- replace the two worst individuals with the offspring
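The recombination and mutation steps can be illustrated on breakpoint sets. The sketch below assumes a candidate smoothing is a set of cut positions in 1..n-1; the function names and the per-position mutation rate are hypothetical, and the slides do not specify how mutation is defined, so the bit-flip mutation here is only one plausible choice.

```python
import random

def uniform_crossover(parent_a, parent_b, n, rng):
    """Each position inherits breakpoint/no-breakpoint from a random parent."""
    child = set()
    for pos in range(1, n):
        source = parent_a if rng.random() < 0.5 else parent_b
        if pos in source:
            child.add(pos)
    return child

def union_crossover(parent_a, parent_b):
    """The child has a breakpoint wherever either parent has one."""
    return set(parent_a) | set(parent_b)

def mutate(breakpoints, n, rng, rate=0.05):
    """Assumed mutation: flip each position independently with probability rate."""
    child = set(breakpoints)
    for pos in range(1, n):
        if rng.random() < rate:
            child ^= {pos}   # add the breakpoint if absent, remove it if present
    return child

if __name__ == "__main__":
    rng = random.Random(1)
    a, b = {2, 5}, {5, 8}
    print(uniform_crossover(a, b, 10, rng))
    print(union_crossover(a, b))
    print(mutate(a, 10, rng))
```

Each call to `uniform_crossover` yields one child, so it would be called twice to produce two offspring. Breakpoints shared by both parents always survive uniform crossover, and the child never contains a breakpoint that neither parent has.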
![Page 17: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/17.jpg)
17
Experiments
• Comparison of
– GLS
– GLSo
– Multi Start Local Search (mLS)
– Multi Start Simulated Annealing (mSA)
• GLS is significantly better than the other algorithms.
![Page 18: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/18.jpg)
18
Comparison to Expert
[Figure panels: expert, algorithm]
![Page 19: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/19.jpg)
19
Relating to Gene Expression
![Page 20: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/20.jpg)
20
Relating to Gene Expression
![Page 21: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/21.jpg)
21
Algorithms for Smoothing Array CGH Data
Kees Jong (VU, CS and Mathematics)
Elena Marchiori (VU, CS)
Aad van der Vaart (VU, Mathematics)
Gerrit Meijer (VUMC)
Bauke Ylstra (VUMC)
Marjan Weiss (VUMC)
![Page 22: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/22.jpg)
22
![Page 23: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/23.jpg)
23
Conclusion
• Breakpoint identification is cast as model fitting: search for the most likely model given the data
• Genetic algorithms + local search perform well
• Results comparable to those produced by hand by the local expert
• Future work:
– Analyse the relationship between chromosomal aberrations and gene expression
![Page 24: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/24.jpg)
24
Example of a-CGH Tumor
[Plot: measured value against clones, grouped by chromosome]
![Page 25: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/25.jpg)
25
a-CGH vs. Expression
a-CGH
• DNA
– In Nucleus
– Same for every cell
• DNA on slide
• Measure Copy Number Variation
Expression
• RNA
– In Cytoplasm
– Different per cell
• cDNA on slide
• Measure Gene Expression
![Page 26: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/26.jpg)
26
Breakpoint Detection
• Identify possibly damaged genes:
– These genes will not be expressed anymore
• Identify recurrent breakpoint locations:
– Indicates fragile pieces of the chromosome
• Accuracy is important:
– Important genes may be located in a region with (recurrent) breakpoints
![Page 27: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/27.jpg)
27
Experiments
• Both GAs are robust:
– Over different randomly initialized runs, breakpoints are (mostly) placed at the same locations
• Both GAs converge:
– The “individuals” in the pool are very similar
• The final result closely matches the smoothing conducted by the local expert (mean error = 0.0513)
![Page 28: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/28.jpg)
28
Genetic Algorithm 1 (GLS)
initialize population of candidate solutions randomly
while (termination criterion not satisfied)
- select two parents using roulette wheel
- generate offspring using uniform crossover
- apply mutation to each offspring
- apply local search to each offspring
- replace the two worst individuals with the offspring
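Two steps of this loop, roulette wheel selection and replacement of the two worst individuals, can be sketched as below. Roulette wheel selection needs non-negative weights, while a log-likelihood based fitness can be negative, so the scores are shifted first; that shift, the small offset, and the names `roulette_select` and `replace_worst` are assumptions for illustration rather than details taken from the slides.

```python
import random

def roulette_select(population, scores, rng, k=2):
    """Pick k individuals with probability proportional to shifted fitness.

    Selection is with replacement, so the same parent may be drawn twice.
    """
    low = min(scores)
    spread = max(scores) - low
    # Shift so all weights are positive; the offset keeps the worst
    # individual selectable and handles the case of equal scores.
    weights = [s - low + 0.01 * spread + 1e-12 for s in scores]
    return rng.choices(population, weights=weights, k=k)

def replace_worst(population, scores, offspring, offspring_scores):
    """Overwrite the lowest-scoring individuals with the offspring, in place."""
    worst_first = sorted(range(len(population)), key=scores.__getitem__)
    for idx, child, child_score in zip(worst_first, offspring, offspring_scores):
        population[idx] = child
        scores[idx] = child_score

if __name__ == "__main__":
    rng = random.Random(0)
    pop = [{3}, {5}, {7}]
    fit = [-4.0, 2.5, -1.0]
    print(roulette_select(pop, fit, rng))
    replace_worst(pop, fit, [{4}, {6}], [1.0, 0.5])
    print(pop, fit)
```

As on the slide, replacement here is unconditional: the offspring take the place of the two worst individuals even when they score lower.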
![Page 29: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/29.jpg)
29
Genetic Algorithm 2 (GLSo)
initialize population of candidate solutions randomly
while (termination criterion not satisfied)
- select 2 parents using roulette wheel
- generate offspring using OR crossover
- apply local search to offspring
- apply “join” to offspring
- replace worst individual with offspring
![Page 30: Algorithms for Smoothing Array CGH data](https://reader035.vdocuments.site/reader035/viewer/2022062410/568159a5550346895dc7038e/html5/thumbnails/30.jpg)
30
Fitness Function (2)
CGH values: x1, ..., xn
breakpoints: 0 < y1 < … < yN < n
likelihood:
levels:
error variances: