Correction of experimental results in the high-throughput screening process
1
Correction of experimental results in the high-throughput screening process
Vladimir Makarenkov
Université du Québec à Montréal (UQAM)
Montréal, Canada
2
Talk outline
What is HTS?
Hit selection
Random and systematic error
Statistical methods to correct HTS measurements:
  Removal of the evaluated background
  Well correction
Comparison of methods for HTS data correction and hit selection
Conclusions
3
What is HTS?
High-throughput screening (HTS) is a large-scale process to screen thousands of chemical compounds to identify potential drug candidates (i.e. hits).
Malo, N. et al., Nature Biotechnol. 24, no. 2, 167-175 (2006).
[Diagram: a large library of chemical compounds and a biological assay (specific target & reagents) enter the primary screen, which yields hits; the secondary screen & counter screen yield confirmed hits; structure-activity relationships (SAR) & medicinal chemistry yield leads; clinical trials (phase 1, 2, 3) yield a drug.]
4
What is HTS?
An HTS procedure consists of running chemical compounds arranged in 2-dimensional plates through an automated screening system that makes experimental measurements.
5
What is HTS?
Samples are located in wells
The plates are processed in sequence
Screened samples can be divided into active (i.e. hits) and inactive ones
Most of the samples are inactive
The measured values for the active samples are substantially different from the inactive ones
6
HTS technology
Automated plate handling system and plates of SSI Robotics
7
Classical hit selection
Compute the mean value μ and standard deviation σ of the observed plate (or of the whole assay)
Identify samples whose values differ from the mean μ by at least cσ (where c is a constant chosen in advance)
For example, in the case of an inhibition assay, by choosing c = 3, we select samples with values lower than μ - 3σ
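The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the talk: the function name `select_hits`, the toy plate values, and the use of the population standard deviation are assumptions made for the example.

```python
import statistics


def select_hits(values, c=3.0):
    """Return indices of wells whose value is below mean - c * SD (inhibition assay)."""
    mu = statistics.mean(values)
    # Population SD is assumed here; the sample SD is an equally plausible choice.
    sigma = statistics.pstdev(values)
    threshold = mu - c * sigma
    return [i for i, v in enumerate(values) if v < threshold]


# Toy plate (hypothetical values): 19 inactive wells near 100 and one inhibited well.
plate = [100, 101, 99, 102, 98, 100, 101, 99, 100, 100,
         101, 99, 100, 102, 98, 100, 101, 99, 100, 40]
hits = select_hits(plate, c=3.0)
print(hits)
```

Passing the values of a single plate gives a plate-wise selection; passing all assay values at once gives an assay-wise selection.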
8
Random and systematic error
Random error: variability that randomly shifts the measured values away from their “true” levels. It results from random influences in each particular well on a particular plate (~ random noise).
Usually, it cannot be detected or compensated for.
Systematic error: systematic variability of the measured values along all plates of the assay.
It can be detected and its effect can be removed from the data.
9
Sources of systematic error
Various sources of systematic errors can affect experimental HTS data, and thus introduce a bias into the hit selection process (Heuer et al. 2003), including:
Systematic errors caused by ageing, reagent evaporation or cell decay. They can be recognized as smooth trends in the plate means/medians.
Errors in liquid handling and malfunction of pipettes can also generate localized deviations from expected values.
Variation in incubation time, time drift in measuring different wells or different plates, and reader effects.
10
Random and systematic error
Examples of systematic error (computed and plotted by HTS Corrector):
HTS assay, Chemistry Department, Princeton University: inhibition of the glycosyltransferase MurG function of E. coli, 164 plates, 16 rows and 22 columns.
[Figure: mean number of hits per well (vertical axis, 20 to 40) plotted against (a) row number (ticks 1 to 16) and (b) column number (ticks 1 to 21).]
Hit distribution by rows and columns of Assay 1 computed for 164 plates. Hits were selected with the threshold -1.
11
Hit distribution surface of Assay 1
(a) raw data (b) approximated data
[Figure: two surfaces showing the number of hits (vertical axis, 10 to 55) by column (1 to 22) and row (1 to 16).]
12
Raw and evaluated background: an example
Computed and plotted by HTS Corrector
13
Systematic error detection tests comparison
Simulation 1. Plate size: 96 wells. Cohen’s kappa vs. error size (from Dragiev et al., 2011). Systematic error size: 10% (at most 2 columns and 2 rows affected). First column, panels (a) - (c): α = 0.01; second column, panels (d) - (f): α = 0.1. Systematic error detection tests: t-test and K-S test.
14
Systematic error detection tests comparison
Simulation 2. Plate size: 96 wells. Cohen’s kappa vs. hit percentage (from Dragiev et al., 2011). Systematic error size: 10% (at most 2 columns and 2 rows affected). First column, panels (a) - (b): α = 0.01; second column, panels (c) - (d): α = 0.1. Systematic error detection tests: t-test, K-S test and χ² goodness-of-fit test.
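Both simulations report Cohen’s kappa. As a reminder of what that statistic measures, here is a minimal sketch that computes kappa for two binary labelings. It assumes kappa is used to compare true and detected labels; the function name `cohens_kappa` and the toy labels are hypothetical and do not come from Dragiev et al. (2011).

```python
def cohens_kappa(truth, predicted):
    """Cohen's kappa for two equally long sequences of binary (0/1) labels."""
    n = len(truth)
    if n == 0 or n != len(predicted):
        raise ValueError("label sequences must be non-empty and of equal length")
    # Observed agreement: fraction of positions where both labelings agree.
    observed = sum(t == p for t, p in zip(truth, predicted)) / n
    # Agreement expected by chance, from the marginal rates of label 1.
    p_true = sum(truth) / n
    p_pred = sum(predicted) / n
    expected = p_true * p_pred + (1 - p_true) * (1 - p_pred)
    if expected == 1.0:
        return 1.0  # convention assumed here: both labelings are constant and identical
    return (observed - expected) / (1 - expected)


# Toy example (hypothetical): 10 items, 2 truly positive, 1 of them detected.
truth = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
kappa = cohens_kappa(truth, predicted)
print(round(kappa, 3))
```

Kappa equals 1 for perfect agreement and is close to 0 when the agreement is no better than chance.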
15
Hit distribution surface of Assay 1
(a) raw data (b) approximated data
[Figure: two surfaces showing the number of hits (vertical axis, 10 to 55) by column (1 to 22) and row (1 to 16).]
16
Data normalization (called z-score normalization)
Applying the following formula, we can normalize the elements of the input data:
$$x'_i = \frac{x_i - \mu}{\sigma}$$
where x_i is the input element, x'_i the normalized output element, μ the mean value, σ the standard deviation, and n the total number of elements in the plate.
The normalized data then satisfy μ_{x'} = 0 and σ_{x'} = 1.
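A minimal sketch of this normalization for a single plate follows. The function name `z_score_normalize` and the toy values are illustrative, and the population standard deviation is an assumption, since the slide does not say which estimator is used.

```python
import statistics


def z_score_normalize(plate):
    """Normalize one plate (a flat list of well values) to mean 0 and SD 1."""
    mu = statistics.mean(plate)
    sigma = statistics.pstdev(plate)  # population SD (assumed)
    if sigma == 0:
        raise ValueError("all wells are identical; z-scores are undefined")
    return [(x - mu) / sigma for x in plate]


# Toy plate (hypothetical values) with mean 5 and population SD 2.
normalized = z_score_normalize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(normalized)
```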
17
Evaluated background
An assay background can be defined as the mean of the normalized plate measurements, i.e.
$$z_i = \frac{1}{N} \sum_{j=1}^{N} x'_{i,j}$$
where:
x'_{i,j} - normalized value in well i of plate j,
z_i - background value in well i,
N - total number of plates.
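The background definition amounts to averaging each well position over all plates. A small sketch, assuming each plate is stored as a flat list with wells in the same order (the function name and the toy values are hypothetical):

```python
def evaluated_background(normalized_plates):
    """Per-well mean over all plates: z_i = (1/N) * sum over j of x'_{i,j}."""
    n_plates = len(normalized_plates)
    n_wells = len(normalized_plates[0])
    return [sum(plate[i] for plate in normalized_plates) / n_plates
            for i in range(n_wells)]


# Three toy "plates" of three wells each (illustrative values only).
plates = [
    [0.5, -1.0, 0.5],
    [1.5, -1.0, -0.5],
    [1.0, -1.0, 0.0],
]
background = evaluated_background(plates)
print(background)
```

A well whose background value is far from zero is consistently high or low across the assay, which points to a positional effect.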
18
Removing the evaluated background: main steps
Normalization of the HTS data by plate
Elimination of the outliers
Computation of the evaluated background
Elimination of systematic errors by subtracting the evaluated background from the normalized data
Selection of hits in the corrected data
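The five steps above can be strung together as in the following sketch. It is an illustration under stated assumptions rather than the HTS Corrector implementation: the outlier rule (ignore normalized values with |z| above `outlier_cut` when averaging), the per-plate threshold in the last step, and all function and parameter names are hypothetical.

```python
import statistics


def zscores(values):
    """Plate normalization: zero mean, unit SD (population SD assumed)."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("constant plate; z-scores are undefined")
    return [(v - mu) / sigma for v in values]


def remove_background(plates, outlier_cut=3.0, c=3.0):
    """plates: list of plates, each a flat list of well values in the same order."""
    # Step 1: normalize each plate.
    normalized = [zscores(p) for p in plates]
    n_wells = len(normalized[0])
    # Steps 2 and 3: per-well mean over plates, ignoring outliers.
    # The |z| > outlier_cut rule is an assumption, not taken from the talk.
    background = []
    for i in range(n_wells):
        kept = [p[i] for p in normalized if abs(p[i]) <= outlier_cut]
        background.append(statistics.mean(kept) if kept else 0.0)
    # Step 4: subtract the background from every normalized plate.
    corrected = [[p[i] - background[i] for i in range(n_wells)]
                 for p in normalized]
    # Step 5: hits are (plate, well) pairs below mean - c * SD of the corrected plate.
    hits = []
    for j, p in enumerate(corrected):
        mu, sigma = statistics.mean(p), statistics.pstdev(p)
        hits.extend((j, i) for i, v in enumerate(p) if v < mu - c * sigma)
    return corrected, background, hits
```

The function returns the corrected plates, the evaluated background, and the selected hits, so the background can also be inspected as a surface.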
19
Removing the evaluated background: hit distribution surface before and after the correction
20
Hit distribution after the correction
HTS assay, Chemistry Department, Princeton University: inhibition of the glycosyltransferase MurG function of E. coli, 164 plates, 16 rows and 22 columns.
Hit distribution by rows in Assay 1 (164 plates): (a) hits selected with the threshold -1 ; (b) hits selected with the threshold -2
[Figure: mean number of hits per well against row number (1 to 16) for three series: no correction, evaluated background removed, and approximated background removed. Panel (a): vertical axis 20 to 40; panel (b): vertical axis 0 to 8.]
21
Well correction method: main ideas
Once the data are plate normalized, it is possible to analyze their values in each particular well along the entire assay.
If there is no systematic error and the compounds are randomly distributed along a fixed well, the distribution of inactive measurements along this well should be centered at zero.
Values along each well may have ascending and descending trends that can be detected via a polynomial approximation.
22
Example of systematic error in the McMaster dataset
[Figure: surface showing the number of hits (vertical axis, 0 to 300) by column (1 to 10) and row (1 to 8).]
Hit distribution surface for a McMaster dataset (1250 plates; a screen of compounds that inhibit the Escherichia coli dihydrofolate reductase). Values deviating from the plate means by more than 1 standard deviation (SD) were taken into account during the computation. Well, row and column positional effects are shown.
23
Well correction method: an example
McMaster University HTS laboratory screen of compounds to inhibit the Escherichia coli dihydrofolate reductase. 1250 plates with 8 rows and 10 columns.
[Figure: panels (a) and (b) plot the variation of values (vertical axis, -3 to 3) against plate number (0 to 1200), each showing an experimental trend and an evaluated trend.]
24
Well correction method: main steps
Normalize the data within each plate (plate normalization)
Compute the trend along each well using polynomial approximation
Subtract the obtained trend from the plate normalized values
Normalize the data along each well (well normalization)
Reexamine the hit distribution surface
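The first four steps can be sketched as follows. This is a simplified illustration, not the published method: it fits a straight line (a degree-1 polynomial) by least squares along each well, whereas the talk only says "polynomial approximation", and all function names are hypothetical.

```python
import statistics


def zscores(values):
    """Zero mean, unit SD (population SD assumed)."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("constant series; z-scores are undefined")
    return [(v - mu) / sigma for v in values]


def linear_trend(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...; returns fitted values."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = statistics.mean(ys)
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return [y_mean + slope * (x - x_mean) for x in range(n)]


def well_correction(plates):
    """plates: list of plates in processing order, each a flat list of well values."""
    normalized = [zscores(p) for p in plates]                # plate normalization
    n_plates, n_wells = len(normalized), len(normalized[0])
    corrected = [[0.0] * n_wells for _ in range(n_plates)]
    for i in range(n_wells):
        series = [normalized[j][i] for j in range(n_plates)]
        trend = linear_trend(series)                         # trend along the well
        detrended = [y - t for y, t in zip(series, trend)]   # subtract the trend
        for j, v in enumerate(zscores(detrended)):           # well normalization
            corrected[j][i] = v
    return corrected
```

After this correction, the values of every well have zero mean and unit standard deviation across the assay, so a single threshold can be applied to all wells when the hit distribution surface is reexamined.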
25
Hit distribution for different thresholds for the raw and corrected McMaster data
[Figure: six surfaces, panels (a) to (f), showing the number of hits by column (1 to 10) and row (1 to 8); vertical axis 0 to 300 in (a) and (b), 0 to 120 in (c) and (d), 0 to 50 in (e) and (f).]
Hit distributions for the raw (a, c, and e) and well-corrected (b, d, and f) McMaster datasets (a screen of compounds that inhibit the Escherichia coli dihydrofolate reductase; 1250 plates of size 8x10) obtained for the thresholds (μ: mean, σ: standard deviation):
(a and b): μ - σ; (c and d): μ - 1.5σ; (e and f): μ - 2σ
26
Simulations with random data: constant noise
Comparison of the hit selection methods
[Figure: two panels plotted against the noise coefficient: hit detection rate (in %, axis 10 to 100) and false positive rate (in %, axis 8 to 14).]
Methods compared: classical hit selection, background subtraction method (O) and well correction method (∆)
27
Simulations with random data: variable noise
Comparison of the hit selection methods
[Figure: (a) true positive rate (in %, axis 10 to 100) and (b) false positive rate (in %, axis 8 to 20) plotted against the systematic error.]
Methods compared: classical hit selection, background subtraction method (O) and well correction method (∆)
28
Comparison of methods for data correction in HTS (1)
Method 1. Classical hit selection using the value μ - cσ as the hit selection threshold, where the mean value μ and standard deviation σ are computed separately for each plate and c is a constant chosen in advance (usually c = 3).
Method 2. Classical hit selection using the value μ - cσ as the hit selection threshold, where the mean value μ and standard deviation σ are computed over all the assay values, and c is a constant chosen in advance. This method can be chosen when we are certain that all plates of the given assay were processed under the “same testing conditions”.
29
Comparison of methods for data correction in HTS (2)
Method 3. Median polish procedure (Tukey 1977) can be used to remove the impact of systematic error. Median polish works by alternately removing the row and column medians. In our study, Method 2 was applied to the values of the matrix of the obtained residuals in order to select hits.
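A minimal sketch of median polish on one plate is shown below. The fixed number of sweeps (`n_iter`) and the function name are choices made for the example; an implementation could equally stop when the medians become negligible.

```python
import statistics


def median_polish(plate, n_iter=10):
    """Return the residuals after alternately removing row and column medians.

    plate is a list of rows, each row a list of well values.
    """
    resid = [list(row) for row in plate]
    n_rows, n_cols = len(resid), len(resid[0])
    for _ in range(n_iter):
        # Remove the median of every row.
        for r in range(n_rows):
            m = statistics.median(resid[r])
            resid[r] = [v - m for v in resid[r]]
        # Remove the median of every column.
        for c in range(n_cols):
            m = statistics.median(resid[r][c] for r in range(n_rows))
            for r in range(n_rows):
                resid[r][c] -= m
    return resid


# Toy plate (hypothetical) built only from row effects plus column effects.
row_effects = [0.0, 1.0, 5.0]
col_effects = [0.0, 2.0, 3.0, 10.0]
plate = [[a + b for b in col_effects] for a in row_effects]
residuals = median_polish(plate)
print(residuals)
```

For a plate that contains nothing but additive row and column effects, as in the toy example, the residuals vanish; measurements that stand out from their row and column keep large residuals.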
Method 4 (designed at Merck Frosst). The B score normalization procedure (Brideau et al., 2003) is designed to remove row/column biases in HTS (Malo et al., 2006). The residual r_ijp of the measurement for row i and column j on the p-th plate is obtained by median polish. The B score is calculated as follows:
$$\text{B score} = \frac{r_{ijp}}{1.4826 \times \mathrm{MAD}_p}$$
where MAD_p = median{|r_ijp – median(r_ijp)|}. To select hits, this computation was followed by Method 2, applied to the B score matrix.
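Given the residuals of one plate, the B score is a rescaling by the plate's median absolute deviation. The sketch below assumes the residual matrix has already been obtained by median polish; the function name and the toy residuals are hypothetical.

```python
import statistics


def b_scores(residuals):
    """B score of every well of one plate: r_ijp / (1.4826 * MAD_p)."""
    flat = [v for row in residuals for v in row]
    center = statistics.median(flat)
    # MAD_p: median absolute deviation of the plate's residuals.
    mad = statistics.median(abs(v - center) for v in flat)
    if mad == 0:
        raise ValueError("MAD is zero; B scores are undefined")
    scale = 1.4826 * mad
    return [[v / scale for v in row] for row in residuals]


# Toy residual matrix (illustrative values); the last well is a candidate hit.
residuals = [
    [-1.0, 0.0, 1.0],
    [2.0, -2.0, 0.0],
    [0.5, -0.5, -6.0],
]
scores = b_scores(residuals)
print(scores[2][2])
```

The constant 1.4826 makes the MAD comparable to a standard deviation for normally distributed data, so B scores can be thresholded in the same way as z-scores.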
30
Comparison of methods for data correction in HTS (3)
Method 5. Well correction procedure followed by Method 2 applied to the well-corrected data. The well correction method consists of two main steps:
1. Least-squares approximation of the data carried out separately for each well of the assay.
2. Z score normalization of the data within each well across all plates of the assay
31
Simulation design
Random standard normal datasets N(0, 1) were generated.
Different percentages of hits (1% to 5%) were added to them.
Systematic error following a normal distribution N(0, c), where c equals 0, 0.6, 1.2, 1.8, 2.4, or 3, was added to them.
Two types of systematic error were considered:
A. Systematic errors stemming from row x column interactions: Different constant values were applied to each row and each column of the first plate. The same constants were added to the corresponding rows and columns of all other plates.
B. Systematic error stemming from changing row x column interactions: As in (A) but with the values of the row and column constants varying across plates.
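The design can be mimicked with a short generator. This is a sketch under several assumptions that the slide does not state: c is treated as the standard deviation of the row and column constants (parameter `error_sd`), hits are produced by shifting a well by a fixed amount (`hit_shift`), and the function and parameter names are hypothetical.

```python
import random


def simulate_assay(n_plates, n_rows, n_cols, hit_fraction, error_sd,
                   varying, seed=0, hit_shift=-5.0):
    """Return (plates, hit_mask); each plate is a flat, row-major list of values."""
    rng = random.Random(seed)

    def draw_effects():
        # One constant per row and per column, drawn from N(0, error_sd).
        rows = [rng.gauss(0.0, error_sd) for _ in range(n_rows)]
        cols = [rng.gauss(0.0, error_sd) for _ in range(n_cols)]
        return rows, cols

    row_eff, col_eff = draw_effects()  # type A: reused for every plate
    plates, hit_mask = [], []
    for _ in range(n_plates):
        if varying:                    # type B: redrawn for every plate
            row_eff, col_eff = draw_effects()
        plate, mask = [], []
        for r in range(n_rows):
            for k in range(n_cols):
                is_hit = rng.random() < hit_fraction
                value = rng.gauss(0.0, 1.0) + (hit_shift if is_hit else 0.0)
                plate.append(value + row_eff[r] + col_eff[k])
                mask.append(is_hit)
        plates.append(plate)
        hit_mask.append(mask)
    return plates, hit_mask


# Example: five 96-well plates, 1% hits, systematic error of 1.2 SD, type A.
plates, mask = simulate_assay(5, 8, 12, 0.01, 1.2, varying=False, seed=42)
```

Keeping the hit mask alongside the data makes it possible to count true positives, false positives and false negatives for any hit selection method.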
32
Systematic error stemming from constant row x column interactions
A. Systematic errors stemming from row x column interactions: Different constant values were applied to each row and each column of the first plate. The same constants were added to the corresponding rows and columns of all other plates.
33
Results for systematic error stemming from row x column interactions which are constant across plates (SD = )
[Figure: four panels. (a) Detection rate (%) against systematic error (0.0 SD to 3.0 SD); (b) detection rate (%) against hit percentage (0.5% to 3.0%); (c) FP + FN against systematic error (0.0 SD to 3.0 SD); (d) FP + FN against hit percentage (0.5% to 3.0%).]
The results were obtained with the methods using plate parameters (Method 1, ◊), assay parameters (Method 2, □), median polish (x), B score (○), and well correction. The abscissa indicates the noise factor (panels a and c, with the hit percentage fixed at 1%) and the percentage of added hits (panels b and d, with the error size fixed at 1.2 SD).
34
Systematic error stemming from varying row x column interactions
B. Systematic error stemming from changing row x column interactions: As in (A) but with the values of the row and column constants varying across plates.
35
Results for systematic error stemming from row x column interactions which are varying across plates (SD = )
The results were obtained with the methods using plate parameters (Method 1, ◊), assay parameters (Method 2, □), median polish (x), B score (○), and well correction. The abscissa indicates the noise factor (panels a and c, with the hit percentage fixed at 1%) and the percentage of added hits (panels b and d, with the error size fixed at 1.2 SD).
[Figure: four panels. (a) Detection rate (%) against systematic error (0.0 SD to 3.0 SD); (b) detection rate (%) against hit percentage (0.5% to 3.0%); (c) FP + FN against systematic error (0.0 SD to 3.0 SD); (d) FP + FN against hit percentage (0.5% to 3.0%).]
36
HTS Corrector Software
Main features:
Visualization of HTS assays
Data partitioning using k-means
Evaluation of the background surface
Correction of experimental datasets
Hit selection using different methods
Chi-square analysis of hit distribution
Contact: [email protected]
Download from: http://www.info2.uqam.ca/~makarenv/HTS/home.php
37
Conclusions
The proposed well correction method (Makarenkov et al., 2007) rectifies the distribution of assay measurements by normalizing data within each considered well across all assay plates.
Simulations suggest that the well correction procedure is a robust method that should be used prior to the hit selection process.
When neither hits nor systematic errors were present in the data, the well correction method performed similarly to the traditional methods of hit selection.
Well correction generally outperformed the Median polish and B score methods as well as the classical hit selection procedure.
The simulation study confirmed that Method 2 based on the assay parameters was more accurate than Method 1 based on the plates’ parameters. Therefore, in case of identical testing conditions for all plates of the given assay, all assay measurements should be treated as a single batch.
38
Future research
Application of clustering methods for hit selection in HTS
Incorporation of information about the chemical properties of the tested compounds into the method
Possible combinations with the virtual HTS screening (vHTS) information and combinatorial chemistry models
39
References
Brideau C., Gunter B., Pikounis W., Pajni N. and Liaw A. (2003) Improved statistical methods for hit selection in HTS. J. Biomol. Screen., 8, 634-647.
Dragiev P., Nadon R. and Makarenkov V. (2011) Systematic error detection in experimental high-throughput screening, BMC Bioinformatics, 12:25.
Heyse S. (2002) Comprehensive analysis of high-throughput screening data. In Proc. of SPIE 2002, 4626, 535-547.
Kevorkov D. and Makarenkov V. (2005) Statistical analysis of systematic errors in HTS. J. Biomol. Screen., 10, 557-567.
Makarenkov V., Kevorkov D., Zentilli P., Gagarin A., Malo N. and Nadon R. (2006) HTS-Corrector: new application for statistical analysis and correction of experimental data, Bioinformatics, 22, 1408-1409.
Makarenkov V., Zentilli P., Kevorkov D., Gagarin A., Malo N. and Nadon R. (2007) An efficient method for the detection and elimination of systematic error in HTS, Bioinformatics, 23, 1648-1657.
Malo N., Hanley J.A., Cerquozzi S., Pelletier J. and Nadon R. (2006) Statistical practice in HTS data analysis. Nature Biotechnol., 24, 167-175.
Tukey J.W. (1977) Exploratory Data Analysis. Cambridge, MA: Addison-Wesley.