Analysis of Drug-Gene Interaction Data

Florian Ganglberger, Sebastian Nijman Lab

Post on 21-Dec-2015

Nijman Lab

• Working on specialised, target-oriented cancer therapies
• Cancer = cell mutation

[Diagram: drugs and mutations]

Motivation

• Testing various drugs on various mutated cells
• 100 drugs vs 100 mutations = 10,000 interactions
• Analyse the generated data to find new treatments

Overview

• Background
  – Biological Background
  – Technical Procedure
  – Initial State
  – Special Aspects
  – Previous Approach
• Analysis
  – Explorative Data Analysis
  – Drug Noisiness
• Hit detection
  – Statistical Methods
  – Filtering Methods
  – The Algorithm
  – Evaluation of the result

[Slide annotation: Data generation]

Biological Background

• Idea behind cancer treatment
  – Kill cancer cells while leaving normal cells alive
• Common chemotherapies
  – Kill cells with a higher division rate
  – Problem: mouth, throat and bowel mucosa and hair cells are hit as well
  – Patients feel sick, lose their hair, etc.

Biological Background

• Synthetic lethality approach
  – Some biochemical processes that are necessary for cell growth are redundant
  – e.g. DNA repair
  – Biochemical processes are chained = “protein pathway”

Protein pathways

[Diagram: Protein A, Protein B, Protein C → Cell growth; labels: Drug, Gene]

Synthetic lethality

• Choose a cancer that has a mutation of a gene in one of those pathways
• Find a drug that inhibits the other pathway
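The selection logic on this slide can be illustrated with a toy model. The sketch assumes exactly two redundant pathways (the names pathway A and pathway B are hypothetical) and treats a cell as viable as long as at least one of them still works. Real pathways are not this binary, so this is only meant to show why the drug should hit the mutated cancer cell and spare the normal cell.

```python
from itertools import product


def cell_survives(pathway_a_mutated: bool, pathway_b_inhibited: bool) -> bool:
    """Toy model: growth continues while at least one redundant pathway works."""
    pathway_a_works = not pathway_a_mutated      # the mutation knocks out pathway A
    pathway_b_works = not pathway_b_inhibited    # the drug blocks pathway B
    return pathway_a_works or pathway_b_works


# Enumerate all four combinations of cell type and treatment.
for mutated, drugged in product([False, True], repeat=2):
    cell = "cancer cell" if mutated else "normal cell"
    treatment = "with drug" if drugged else "no drug"
    outcome = "survives" if cell_survives(mutated, drugged) else "dies"
    print(f"{cell:11s} {treatment:9s} -> {outcome}")
```

Only the combination of mutation and drug is lethal in this model, which is the pattern the screen is looking for.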

Synthetic lethality

• Produce cells with mutations that are normally present in cancer
• Find a drug
• It is possible that this will work in real cancer
  – Tumours have more than one mutation, and these can influence each other

Technical Procedure

• Standard dataset consists of 38,400 interactions
• 96 drugs × 100 mutations × 4 replicates
• Testing each interaction separately would be inefficient

Technical Procedure

• Idea: Testing different cell lines in one well

[Figure: plate with 384 wells]
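The numbers on these two slides fit together if the "x 4" is read as four replicates per drug, which is an assumption here (the slides later speak of "the 4 replicates"). A short calculation shows how pooling all cell lines into one well reduces the work to one 384-well plate; control wells are ignored in this sketch.

```python
N_DRUGS = 96        # from the slides
N_MUTATIONS = 100   # mutated cell lines per dataset, from the slides
N_REPLICATES = 4    # assumed meaning of the "x 4" on the slide

# Every drug/mutation pair is measured once per replicate.
interactions = N_DRUGS * N_MUTATIONS * N_REPLICATES

# Without pooling, each measurement would need its own well.
wells_if_separate = interactions

# With pooling, all cell lines share a well; only drug and replicate vary.
wells_if_pooled = N_DRUGS * N_REPLICATES

print(interactions)                          # 38400
print(wells_if_pooled)                       # 384
print(wells_if_separate // wells_if_pooled)  # 100
```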

Before the experiment

After the experiment

• Copy the barcodes of the cells by a polymerase chain reaction (PCR) → amplifies the signal
• Attach a vitamin to the barcode that can stick to a dye-containing protein
• The amount of barcode correlates with the amount of remaining cells

Allocation

• Red and infrared emitted light → barcode → mutation
• Green reflected light → cell amount
  – Arbitrary unit that correlates with the cell amount
  – Called “Reporter”
• Drug → known from the well that was used

Initial state

• Because the drugs are dissolved in a solvent, wells without drugs can be used as controls

Back to statistics...

Special Aspects

• Biological and technical factors cause noisy data that are not directly usable → inter- and intraindividual variability

Interindividual Variability

• Variability between observation units
• Cells with the same mutation = one observation unit = “one virtual cancer patient”
• Variation among different mutated cells
• Reasons
  – Mutations can be toxic themselves
  – Characteristics of the technical process

Interindividual Variability

• Average amount of remaining mutations

Variability of Technical Procedure

• Limited precision
  – Precision of drug dosing
  – Precision of cell amount
  – Quality of the measurement equipment
• Decreased sensitivity at lower signal
  – Detection limit
  – Killed cells don’t give a zero signal → background noise with different variability

Variability of Technical Procedure

• Amplification problems
  – Copying the barcodes by PCR needs material
  – If some cell lines are completely killed → more material for the other cell lines → higher amplification of the surviving cells
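A toy model can make the amplification problem concrete. The sketch assumes that the PCR produces a fixed total amount of barcode signal per well, shared between cell lines in proportion to their remaining cells. The fixed total of 1000 units, the function name and the cell-line names are all hypothetical; the real reaction is more complicated.

```python
def measured_signal(remaining_cells: dict[str, float],
                    total_output: float = 1000.0) -> dict[str, float]:
    """Share a fixed PCR output between cell lines in proportion to cell count."""
    total = sum(remaining_cells.values())
    if total == 0:
        raise ValueError("no cells left in the well")
    return {line: total_output * n / total for line, n in remaining_cells.items()}


# Control well: all four cell lines survive equally.
control = {"line_a": 100.0, "line_b": 100.0, "line_c": 100.0, "line_d": 100.0}
# Treated well: the drug kills two lines completely, the others are unaffected.
treated = {"line_a": 100.0, "line_b": 100.0, "line_c": 0.0, "line_d": 0.0}

control_signal = measured_signal(control)
treated_signal = measured_signal(treated)

for line in control:
    print(line, control_signal[line], treated_signal[line])
```

In this model line_a has the same number of cells in both wells, yet its signal doubles in the treated well, so it would look resistant although the drug did nothing to it.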

Amplification Problems

Previous Approach

• Visual method, based on scatter plots
• Identify outliers visually

Previous Approach

1. Calculating the effect
   1. Median normalization of drugs
   2. Calculate a relative ratio
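The slides do not give the formulas for these two steps, so the following is only one plausible reading. It assumes that each well (one drug or the control) is divided by its own median reporter value, and that the relative ratio is the normalized drug value divided by the normalized control value of the same mutation. All names and numbers are made up for illustration.

```python
from statistics import median


def median_normalize(well: dict[str, float]) -> dict[str, float]:
    """Divide every reporter value in a well by the median of that well."""
    m = median(well.values())
    return {mutation: value / m for mutation, value in well.items()}


def relative_ratios(drug_well: dict[str, float],
                    control_well: dict[str, float]) -> dict[str, float]:
    """Ratio of normalized drug signal to normalized control signal per mutation."""
    drug_norm = median_normalize(drug_well)
    control_norm = median_normalize(control_well)
    return {mut: drug_norm[mut] / control_norm[mut] for mut in drug_well}


# Hypothetical reporter values: the drug halves every signal,
# except for mut_1, which is hit much harder.
control = {"mut_1": 1000.0, "mut_2": 2000.0, "mut_3": 4000.0,
           "mut_4": 3000.0, "mut_5": 1500.0}
drug = {"mut_1": 100.0, "mut_2": 1000.0, "mut_3": 2000.0,
        "mut_4": 1500.0, "mut_5": 750.0}

ratios = relative_ratios(drug, control)
for mutation, ratio in ratios.items():
    print(mutation, round(ratio, 3))
```

Under these assumptions the general halving of the signal cancels out in the normalization, and only mut_1 ends up with a ratio well below 1.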

Previous Approach

• Plotting the ratio against the median of a mutation

There are some problems...

• If two lines overlap, hits can be obscured
• No comparable value that estimates the significance of outliers
• Intraindividual variability between replicates is ignored
• Human error → outlier detection is subjective
• Slow method that cannot be automated

Overview

• Background
  – Biological Background
  – Technical Procedure
  – Initial State
  – Special Aspects
  – Previous Approach
• Analysis
  – Explorative Data Analysis
  – Drug Noisiness

Explorative Data Analysis

• Necessary for hit detection
• Analysis of the behaviour of the data
• Closer look at
  – Distribution of mutations
  – Variability of mutations and replicates
  – Skewness of mutations
  – Noisiness of drugs

Distribution of Mutations

• Choosing the right statistical test
• The test will be applied to mutations to see which drug works best
• The effect is the point of interest → matrix of relative ratios

Variability of Mutations

• Decreased sensitivity at lower signal
• Maybe a detection limit
• Spread vs. level plot
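The points of a spread vs. level plot can be computed as follows. The slides do not say which measures were used, so this sketch assumes the median of a mutation as its level and the interquartile range as its spread. The function name and the reporter values are hypothetical.

```python
from statistics import median, quantiles


def spread_vs_level(signals: dict[str, list[float]]) -> list[tuple[str, float, float]]:
    """Return (mutation, level, spread) tuples, ordered by increasing level."""
    points = []
    for mutation, values in signals.items():
        q1, _, q3 = quantiles(values, n=4)   # quartiles of the reporter values
        points.append((mutation, median(values), q3 - q1))
    return sorted(points, key=lambda p: p[1])


# Made-up reporter values of two mutations across several drugs.
signals = {
    "mut_high": [3000.0, 3100.0, 2900.0, 3050.0, 2950.0],
    "mut_low": [200.0, 260.0, 310.0, 180.0, 420.0],
}

points = spread_vs_level(signals)
for mutation, level, spread in points:
    print(mutation, level, spread, round(spread / level, 3))
```

Plotting spread (or spread relative to level) against level for all mutations shows whether the variability changes at low signal, which would point to a detection limit.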

Replicate Variability

• An important factor is the repeated testing of cells with the same drugs.
• It indicates the accuracy and reproducibility of the technical procedure.

Skewness of Mutations

• Another indicator for different behaviour below the threshold
• Right-skewed distributions because of background noise at lower signal
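Skewness can be quantified with the standardized third moment. The slides do not state which estimator was used, so the moment-based sample skewness below is an assumption, and the example values are made up.

```python
def skewness(values: list[float]) -> float:
    """Moment-based sample skewness: third central moment / variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # all values identical: no asymmetry to measure
    return m3 / m2 ** 1.5


symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 1.0, 1.0, 2.0, 10.0]  # a long tail towards high values

print(skewness(symmetric))
print(skewness(right_skewed))
```

A clearly positive value indicates a right-skewed distribution, which the slide associates with background noise at low signal.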

Drug Noisiness

• Nothing to do with background noise
• Caused by the technical procedure
  – Overdosing of cells or drugs
  – Toxicity (“Dosis facit venenum”)
• Different effect
  – Strong resistance
  – Strong sensitivity

Amplification Problems

Strong Noisiness

• Easy to identify
• Distinct outliers
• High number of false-positive hits
• Idea: Noisiness causes weak correlation to the control

Weak Noisiness

• Also numerous differences in sensitivity or resistance
• Contrast to normal drugs is not well defined
• Visual methods failed
• Also a lot of false-positive hits


Strong Noisiness vs Weak Noisiness

Overview

• Hit detection
  – Statistical Methods
  – Filtering Methods
  – The Algorithm
  – Evaluation of the result

Hit detection

• Definition of a hit
  – Indicates synthetic lethality
  – Resistance is also interesting from a biological point of view
  – Not noisy
• 2 stages:
  1. Finding potential hits
  2. Filtering false-positive hits and incomparable data

Statistical Test

• Mutations are not normally distributed
• Compare the 4 replicates to their mutation
• Mann-Whitney U test
  – Compares two medians
  – Needs approximately identical distribution shapes of the random variables X and Y
  – No symmetry or normal distribution needed

Statistical Test

• Disadvantages
  – Rank-sum tests are based on the order, not on the magnitudes
  – Weakly outlying interactions get the same p-values as strong outliers
  – P-values are not comparable between individuals, but the significance is an indicator for it.
  – Strongly noisy drugs are usually extreme outliers → reduce the significance
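The first two disadvantages can be seen in a small hand-written version of the test. The sketch assumes that the 4 replicates of one drug are compared against the remaining values of the same mutation, and it uses a plain normal approximation without tie or continuity correction, which is rough for samples this small. All numbers are made up.

```python
from math import erfc, sqrt


def mann_whitney_u(x: list[float], y: list[float]) -> float:
    """U statistic of x: number of (x, y) pairs with x > y; ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def approx_p_value(x: list[float], y: list[float]) -> float:
    """Two-sided p-value from the normal approximation (no corrections)."""
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2.0
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (mann_whitney_u(x, y) - mean_u) / sd_u
    return erfc(abs(z) / sqrt(2.0))


# Relative ratios of one mutation under all other drugs (hypothetical).
rest = [0.92, 0.95, 0.97, 0.99, 1.00, 1.01, 1.02, 1.03, 1.05, 1.06, 1.08, 1.10]
weak_hit = [0.88, 0.89, 0.90, 0.91]    # replicates just below the rest
strong_hit = [0.10, 0.12, 0.15, 0.11]  # replicates far below the rest

print(mann_whitney_u(weak_hit, rest), approx_p_value(weak_hit, rest))
print(mann_whitney_u(strong_hit, rest), approx_p_value(strong_hit, rest))
```

Both sets of replicates lie completely below the other values, so they get the same U statistic and the same p-value even though the strong hit is far more extreme.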

Multiple testing

• Multiple testing of interactions against their mutations
• Increases the error
• 100 different interactions = [equation not preserved in the transcript]

Multiple testing

• Bonferroni correction needed
• How to achieve significant results?
  – Calculate the median of replicates
  – Test just the upper and lower 10% of the data
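The effect of multiple testing and of the Bonferroni correction can be worked through numerically. The per-test level of 5% is an assumption (the slides do not state it), and the family-wise error formula assumes independent tests.

```python
ALPHA = 0.05  # assumed per-test significance level, not stated in the slides


def family_wise_error(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test level that keeps the family-wise error at or below alpha."""
    return alpha / n_tests


# 100 tests = every interaction of a mutation,
# 20 tests = only the upper and lower 10% of 100 interactions.
for n_tests in (1, 20, 100):
    print(n_tests,
          round(family_wise_error(ALPHA, n_tests), 4),
          bonferroni_threshold(ALPHA, n_tests))
```

With 100 uncorrected tests, at least one false positive is almost certain under these assumptions. Testing only the upper and lower 10% reduces the number of tests to about 20, so the corrected threshold is five times less strict, which is one way to read the suggestion on the slide.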

Filtering Drugs

• Filtering strongly noisy drugs by correlation coefficient
• Filter before the test to increase the significance
• Note: Drugs shouldn’t be filtered automatically, just identified. Whether a drug is toxic or not is the decision of a biologist.
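A correlation-based check could look like the sketch below. It assumes the Pearson coefficient between the signal profile of a drug and the control profile across all mutations. The cutoff of 0.8, the function names and the data are hypothetical. In line with the note on the slide, the function only reports candidates and removes nothing.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def flag_noisy_drugs(drug_profiles: dict[str, list[float]],
                     control: list[float],
                     min_r: float = 0.8) -> list[str]:
    """Names of drugs whose correlation to the control is below min_r."""
    return sorted(name for name, profile in drug_profiles.items()
                  if pearson(profile, control) < min_r)


# Made-up reporter values per mutation.
control = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
drug_profiles = {
    "drug_ok": [520.0, 980.0, 1490.0, 2100.0, 2450.0],
    "drug_noisy": [3000.0, 200.0, 2500.0, 100.0, 1800.0],
}

flagged = flag_noisy_drugs(drug_profiles, control)
print(flagged)  # candidates for review by a biologist
```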

Filtering strong noisy drugs

Filtering weak noisy drugs

• Much harder to identify
• Idea: Weakly noisy drugs produce many false-positive hits with high significance
  – Calculate p-values
  – Order by significance
  – Frequency of drugs in the top hits is an indicator for weak noisiness
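The three steps can be sketched in a few lines. The hit list is represented as (drug, mutation, p-value) tuples; this layout, the function name and all values are hypothetical, and the slides do not say how many top hits were inspected.

```python
from collections import Counter


def drug_frequency_in_top_hits(hits: list[tuple[str, str, float]],
                               top_n: int) -> Counter:
    """Count how often each drug appears among the top_n most significant hits."""
    ranked = sorted(hits, key=lambda hit: hit[2])  # smallest p-value first
    return Counter(drug for drug, _mutation, _p in ranked[:top_n])


# Made-up hit list: (drug, mutation, p-value).
hits = [
    ("drug_a", "mut_1", 0.001),
    ("drug_b", "mut_2", 0.0004),
    ("drug_b", "mut_5", 0.0002),
    ("drug_b", "mut_7", 0.0009),
    ("drug_c", "mut_3", 0.03),
    ("drug_a", "mut_4", 0.2),
]

frequency = drug_frequency_in_top_hits(hits, top_n=4)
for drug, count in frequency.most_common():
    print(drug, count)
```

A drug that dominates the top of the list, like drug_b here, would be a candidate for weak noisiness and worth a closer look.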

Top Drugs

Filter Mutations

Filter data below a detection limit

Ideas
• Filter by threshold: 30% of the data
  – Threshold of about 250, based on just one dataset → no universal validity
• Filter by skewness: 17% of the data
• Filter by variation coefficient: 12% of the data

Threshold Estimation

• Idea: Modification of the skewness filter method
• Outliers of skewness are below the threshold
• The last non-outliers above the skewness outliers are normal data
• The threshold should be approximately in the middle between these points
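One way to turn these four bullets into a procedure is sketched below. It is an interpretation, not the author's algorithm: skewness outliers are defined with the common boxplot rule (above Q3 + 1.5 × IQR), and the threshold is placed halfway between the signal level of the highest skewness outlier and the level of the next mutation above it. The input format and all values are made up.

```python
from statistics import quantiles
from typing import Optional


def estimate_threshold(points: list[tuple[float, float]]) -> Optional[float]:
    """Estimate a detection limit from (signal level, skewness) pairs.

    Returns None when no skewness outlier or no normal data above it exists.
    """
    skews = [skew for _level, skew in points]
    q1, _, q3 = quantiles(skews, n=4)
    fence = q3 + 1.5 * (q3 - q1)  # assumed outlier rule (boxplot fence)

    outlier_levels = [level for level, skew in points if skew > fence]
    if not outlier_levels:
        return None
    highest_outlier = max(outlier_levels)

    # Non-outlying mutations with a higher signal count as normal data.
    normal_above = [level for level, skew in points
                    if skew <= fence and level > highest_outlier]
    if not normal_above:
        return None
    return (highest_outlier + min(normal_above)) / 2.0


# Hypothetical mutations: the two lowest signal levels are strongly right-skewed.
points = [(120.0, 2.9), (200.0, 3.4), (300.0, 0.3), (450.0, 0.1),
          (700.0, -0.2), (1000.0, 0.2), (1500.0, 0.0), (2200.0, -0.1),
          (3000.0, 0.1), (4000.0, 0.2), (5200.0, -0.3), (6500.0, 0.0)]

print(estimate_threshold(points))
```

Because the threshold is derived from the data themselves, it does not depend on a fixed value taken from a single dataset.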

The Algorithm

• R-Demo

Results