msc. thesis presentation - aalto · 2009-09-03 · bioinformatics i bioinformatics analyses...
![Page 1: MSc. thesis presentation - Aalto · 2009-09-03 · Bioinformatics I Bioinformatics analyses observations from biological organisms I Analysis is performed using computational and](https://reader034.vdocuments.site/reader034/viewer/2022042101/5e7d460ef4e63e1b0a0087d2/html5/thumbnails/1.jpg)
MSc. thesis presentation
Tommi Suvitaival
3.9.2009
- Title: Bayesian Two-Way Analysis of High-Dimensional Collinear Metabolomics Data
- Instructor: MSc. Ilkka Huopaniemi
- Supervisor: Prof. Samuel Kaski
Contents
- Introduction to the analysis of high-throughput biological data
- The focus is on metabolomics and multi-way analysis
- A new method is proposed and applied to biological data
Bioinformatics
- Bioinformatics analyses observations from biological organisms
- Analysis is performed using computational and statistical methods
- Different lines of bioinformatics study the genome, gene activity, protein concentrations and metabolite concentrations
- The aim is to gain new knowledge about the functioning of the biological system
- Often motivated by an interest in finding an explanation for a disease
Metabolomics
- A line of bioinformatics that studies concentrations of small molecules, called metabolites
- A metabolite is a substrate or product of a biological process that is catalysed by proteins
- Lipids are a sub-group of metabolites
- Lipids take part in many important biological processes, such as cell signaling
- Changes in lipid concentrations are related to many metabolic diseases, such as diabetes
Experiment setup in bioinformatics
- High-throughput measurements produce observations from large numbers of features
- n < p problem: fewer samples than features in the data
- The number of samples is low due to high financial and ethical costs
- In metabolomic data, one feature corresponds to the concentration of one metabolite
- One sample is a vector of features measured from one patient on one occasion
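The layout described above can be sketched as a small simulated data matrix. This is only an illustration: the sizes, the two binary covariates (`disease`, `treatment`) and the effect on the first ten features are hypothetical choices, not values from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only so that n < p.
n_samples, n_features = 12, 50

# Two binary covariates per sample (a two-way design), assumed for illustration.
disease = np.repeat([0, 1], n_samples // 2)
treatment = np.tile([0, 1], n_samples // 2)

# Rows are samples (one patient on one occasion), columns are metabolites.
X = rng.normal(size=(n_samples, n_features))

# Hypothetical effect: disease raises the first 10 metabolites by one unit.
X[disease == 1, :10] += 1.0

# Number of samples in each of the four (disease, treatment) cells.
cell_sizes = {
    (d, t): int(((disease == d) & (treatment == t)).sum())
    for d in (0, 1)
    for t in (0, 1)
}
print(X.shape, cell_sizes)
```

Each cell of the design holds only three samples here, while every sample carries 50 measurements, which is the n < p situation in miniature.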
A metabolomic data set (1)
Figure: An example data matrix, where patients have two treatments.
A metabolomic data set (2)
Figure: Simulated data. Can you identify treatment effects?
Traditional solutions
- ANOVA (analysis of variance): univariate method handling one feature at a time
- MANOVA (multivariate analysis of variance): multivariate, but does not work for n < p data
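A minimal sketch of both points, on simulated data with hypothetical sizes: a one-way ANOVA F statistic is computed separately for each feature, and the pooled within-group scatter matrix that MANOVA test statistics rely on is shown to be rank-deficient when n < p, so it cannot be inverted. The helper `one_way_f` is written here for illustration and is not taken from the thesis.

```python
import numpy as np

def one_way_f(x, groups):
    """One-way ANOVA F statistic for a single feature vector x."""
    labels = np.unique(groups)
    grand_mean = x.mean()
    ss_between = sum(
        (groups == g).sum() * (x[groups == g].mean() - grand_mean) ** 2
        for g in labels
    )
    ss_within = sum(
        ((x[groups == g] - x[groups == g].mean()) ** 2).sum() for g in labels
    )
    df_between = len(labels) - 1
    df_within = len(x) - len(labels)
    return (ss_between / df_between) / (ss_within / df_within)

rng = np.random.default_rng(1)
n, p = 12, 50                       # hypothetical sizes with n < p
groups = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[groups == 1, 0] += 3.0            # hypothetical effect on the first feature

# ANOVA: one independent test per feature, ignoring correlations between features.
f_stats = np.array([one_way_f(X[:, j], groups) for j in range(p)])

# MANOVA: needs the p x p within-group scatter matrix to be invertible.
centered = X.copy()
for g in np.unique(groups):
    centered[groups == g] -= X[groups == g].mean(axis=0)
W = centered.T @ centered
rank_W = np.linalg.matrix_rank(W)
print(rank_W, "<", p)
```

The rank of `W` is at most n minus the number of groups, so with 12 samples and 50 features it is far below 50 and the matrix is singular.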
Bayesian method: justification
- To deal with the n < p problem
- To estimate uncertainty of the model
- To bring prior knowledge into the model
Bayesian method: clustering and multi-way analysis
- Features are clustered according to similarity
- Common treatment effects for each cluster are estimated
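The two steps above can be illustrated with a deliberately simplified, non-Bayesian stand-in: features are grouped by a greedy correlation threshold, and one treatment effect per cluster is estimated as a difference of group means of the cluster average. The proposed method is a Bayesian model, so this sketch only conveys the idea of pooling collinear features; the simulated data, the threshold and both helper functions are assumptions made for illustration.

```python
import numpy as np

def cluster_by_correlation(X, threshold=0.7):
    """Greedy clustering: each unassigned feature seeds a cluster that
    collects all unassigned features correlated with it above threshold."""
    corr = np.corrcoef(X, rowvar=False)
    labels = -np.ones(X.shape[1], dtype=int)
    next_label = 0
    for j in range(X.shape[1]):
        if labels[j] >= 0:
            continue
        members = (labels < 0) & (corr[j] >= threshold)
        labels[members] = next_label
        next_label += 1
    return labels

def cluster_effects(X, labels, treatment):
    """Difference of treated and control means of each cluster's average profile."""
    effects = {}
    for c in np.unique(labels):
        profile = X[:, labels == c].mean(axis=1)
        effects[int(c)] = float(
            profile[treatment == 1].mean() - profile[treatment == 0].mean()
        )
    return effects

# Simulated collinear data: 3 groups of 10 features, each driven by one latent variable.
rng = np.random.default_rng(2)
n, p, k = 20, 30, 3
treatment = np.repeat([0, 1], n // 2)
true_cluster = np.repeat(np.arange(k), p // k)
true_effect = np.array([2.0, 0.0, 0.0])   # only the first group responds

latent = true_effect * treatment[:, None] + rng.normal(scale=0.5, size=(n, k))
X = latent[:, true_cluster] + rng.normal(scale=0.1, size=(n, p))

labels = cluster_by_correlation(X)
effects = cluster_effects(X, labels, treatment)
print(effects)
```

Averaging over the features of a cluster reduces measurement noise, which is why a shared effect can be estimated from few samples when the features are strongly collinear.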
Bayesian method vs. a traditional approach
Figure: The usual process of high-throughput data analysis: data → normalization → dimensionality reduction → multi-way analysis → knowledge
- The proposed model includes all three steps
- Instead of performing the steps sequentially, they are done simultaneously within the model
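One way such a joint model could be written is sketched below. The slides do not give the equations, so every symbol here is an illustrative assumption rather than the thesis notation: $\mathbf{x}_j$ is the feature vector of sample $j$ observed at levels $a$ and $b$ of the two covariates, $\boldsymbol{\mu}$ and the diagonal $\boldsymbol{\Lambda}$ play the role of normalization, the matrix $\mathbf{V}$ assigns each feature to one cluster (dimensionality reduction), and the two-way effects act on the low-dimensional cluster variables $\mathbf{z}_j$.

```latex
\begin{align}
  \mathbf{x}_j &= \boldsymbol{\mu} + \mathbf{V}\,\mathbf{z}_j + \boldsymbol{\epsilon}_j,
    & \boldsymbol{\epsilon}_j &\sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Lambda}), \\
  \mathbf{z}_j &= \boldsymbol{\alpha}_{a} + \boldsymbol{\beta}_{b}
    + (\boldsymbol{\alpha\beta})_{ab} + \boldsymbol{\eta}_j,
    & \boldsymbol{\eta}_j &\sim \mathcal{N}(\mathbf{0}, \mathbf{I}).
\end{align}
```

Because all quantities are estimated jointly, uncertainty about the clustering carries over to the estimated effects, which a sequential pipeline would not capture.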
Bayesian method: the plate graph
Figure: The plate graph
Type 1 diabetes study (1)
- Finnish children were screened for type 1 diabetes
- The children were monitored 1 to 4 times a year
- Certain antibody levels in blood were measured
- These antibodies are useful in indicating the onset of the disease
- By the time the antibodies emerge, it is already too late to prevent the disease
Type 1 diabetes study (2)
- Could the disease be detected earlier from the metabolic profile?
- Around 100 children took part in a more detailed study, where lipid profiles were measured from blood serum
- 53 lipids were identified
- Only 54 patients were included in the analysis due to missing time points
- The Bayesian method was used to find possible predictors of the disease
Results with a lipidomic data set (1)
Figure: Estimated treatment effects of a two-way data set
Results with a lipidomic data set (2)
Figure: Estimated time and time-disease interaction effects of a time-series data set
Results with simulated data
Figure: Estimated treatment effects as a function of sample size