Real-Time Quantitative PCR Data Analysis
TECNOLOGÍAS DE ALTO RENDIMIENTO EN GENÓMICA. Curso 2012
Josep Lluís Mosquera
UNITAT D’ESTADÍSTICA I BIOINFORMÀTICA
OUTLINE
● Recapitulation
● Normalization
  ○ Absolute Quantification
  ○ Relative Quantification
● Data Analysis Pipeline
● Software
RECAPITULATION (1): Basic Concepts
● RT-qPCR is a method for determining the amount of nucleic acid present in a sample.
● ∆Rn: increment of fluorescent signal at each time point.
● Baseline: cycles in which a signal is accumulating but is beneath the limits of detection.
● Threshold: arbitrary level of fluorescence chosen on the basis of the baseline variability.
● Ct: the fractional PCR cycle number at which the fluorescence first crosses the threshold.
RECAPITULATION (2): Basic Equations
● Target reporter fluorescence is determined by
$$R_{Ct} = R_0 \cdot (1 + E)^{Ct}$$
● Amplification efficiency (at threshold), where $s$ is the slope of the standard curve
$$E = 10^{-1/s} - 1$$
● Fluorescence increase is proportional to the amount of target DNA
$$I = k \cdot R_{Ct}$$
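The two basic equations can be tried out with a few lines of Python. This is a minimal sketch, assuming that s is the slope of a standard curve of Ct against log10 of the starting quantity; the function names are illustrative and not part of any package.

```python
import math


def efficiency_from_slope(slope: float) -> float:
    """Amplification efficiency E = 10**(-1/s) - 1 (E = 1.0 means 100 %)."""
    if slope >= 0:
        # Ct falls as the starting quantity rises, so the slope is negative.
        raise ValueError("expected a negative standard-curve slope")
    return 10 ** (-1.0 / slope) - 1.0


def fluorescence_at_cycle(r0: float, efficiency: float, cycle: float) -> float:
    """Reporter fluorescence R_Ct = R_0 * (1 + E)**Ct."""
    return r0 * (1.0 + efficiency) ** cycle


# A slope of about -3.32 corresponds to a perfect doubling per cycle.
perfect_slope = -1.0 / math.log10(2.0)
print(round(perfect_slope, 2), efficiency_from_slope(perfect_slope))

# With E = 1 the signal doubles each cycle: 10 cycles give a 1024-fold increase.
print(fluorescence_at_cycle(1.0, 1.0, 10))
```

Slopes shallower or steeper than about -3.32 give efficiencies below or above 100 %, which is why the slope is usually inspected before choosing a quantification method.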
PIPELINE OF RT-qPCR DATA ANALYSIS
1. Quality assessment
2. Normalisation
3. Data visualisation
4. Testing for statistical significance
5. Annotation/Mapping features
QUALITY ASSESSMENT
NORMALIZATION
● When analyzing the results of RT-qPCR assays you are faced with several uncontrolled variables, which can lead to misinterpretation of the results.
● Uncontrolled variation:
  ○ The amount of starting material
  ○ Enzymatic efficiencies
  ○ Differences between tissues, individuals, experimental conditions
  ○ …
● To correct systematic variation, BUT NOT biological variation ⇒ NORMALIZATION
NORMALIZATION: Methods
● The most commonly used methods of normalization:
  ○ Normalization to the original number of cells
  ○ Normalization to the total RNA mass
  ○ Normalization to one or more housekeeping genes
  ○ Normalization to an internal or external calibrator
NORMALIZATION: Quantification Methods
BIOLOGICAL QUESTIONS:
If I’d like to know…
1) the number of viral particles in a given amount of blood, or
2) the fold change of p53 mRNA in an “equivalent amount” of cancerous vs. normal tissue
… what can I do?
ANALYSIS METHODS:
1) Absolute Quantification, or
2) Relative Quantification
… are commonly used to address these two scenarios
NORMALIZATION: Biological Meaning and Quantification Methods
ABSOLUTE QUANTIFICATION
● Absolute quantification requires a standard curve of known copy numbers
● It can be constructed using several standards
Figure: Most frequently used quantification standards. From the Nucleic Acid Research Group (NARG) survey 2007, http://www.abrf.org/NARG/
● Absolute quantification is achieved by comparing the Ct values of each sample to a standard curve
● The standard curve is obtained by
  ○ using different known concentrations,
  ○ for which Ct values are calculated
  ○ and plotted vs. the (log) known quantity
DATA ANALYSIS: Absolute Quantification. Standard Curve
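The construction and use of a standard curve can be sketched in Python as follows. This is only an illustration under stated assumptions: the dilution series and its Ct values are hypothetical, the curve is fitted by ordinary least squares, and the helper names are invented for this sketch. Unknowns are only converted when their Ct lies inside the range covered by the standards, since the relation may not be linear outside the limits tested.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept, ct_min, ct_max):
    """Interpolate a quantity from a Ct value; refuse to extrapolate."""
    if not ct_min <= ct <= ct_max:
        raise ValueError("Ct outside the range of the standards")
    return 10 ** ((ct - intercept) / slope)


# Hypothetical ten-fold dilution series (copies) and measured Ct values.
standard_copies = [1e2, 1e3, 1e4, 1e5, 1e6]
standard_cts = [30.0, 26.7, 23.4, 20.1, 16.8]

slope, intercept = fit_standard_curve(standard_copies, standard_cts)
unknown_copies = copies_from_ct(
    25.0, slope, intercept, min(standard_cts), max(standard_cts)
)
print(slope, intercept, unknown_copies)
```

In practice each standard and each unknown is run in replicates, and the interpolated copy numbers are then averaged per sample.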
EXAMPLE:
● Determining Absolute Copy Number from Absolute Quantification
● The standard curve is used only for interpolation, not for extrapolation (the relation may not be linear outside the limits tested)
SAMPLE   REPLICATE   Ct      COPIES
A        1           18.61   204.577
A        2           18.41   234.115
A        3           19.87   172.300
A        Average             203.664 ± 30.917
B        1           17.06   564.789
B        2           17.07   563.823
B        3           17.00   591.173
B        Average             574.928 ± 14.381
DATA ANALYSIS: Absolute Quantification. Standard Calibration Curve
RELATIVE QUANTIFICATION (1)
● Relative quantification is the most widely used technique.
● Gene expression levels are calculated as the ratio between the amount of the target gene and that of an endogenous reference gene, which is present in all samples.
● The reference gene has to be chosen so that its expression does not change under the experimental conditions or between different tissues (Cook NL et al., 2008).
● There are simple and more complex methods for relative quantification, depending on the PCR efficiency, and the number of reference genes used.
● The most common approaches are
  ○ Livak or ∆∆Ct method
  ○ Pfaffl method
  ○ Relative Standard Curve Method
RELATIVE QUANTIFICATION (2)
RELATIVE QUANTIFICATION: Delta delta Ct (∆∆Ct) method
● The simplest one: a direct comparison of Ct values, target gene vs. reference gene.
● PCR efficiencies of both should
  ○ be close to 100 % and
  ○ not differ by more than 10 %.
● Involves the choice of a calibrator sample:
  ○ the untreated sample,
  ○ the time = 0 sample, or
  ○ any sample you want to compare your unknown to.
RELATIVE QUANTIFICATION: Delta delta Ct (∆∆Ct) method
1) Normalize the Ct of the target gene to the Ct of the reference gene, for each sample
$\Delta Ct = Ct_{target} - Ct_{reference}$
2) Normalize the ∆Ct of the test sample to the ∆Ct of the calibrator
$\Delta\Delta Ct = (Ct_{target} - Ct_{reference})_{test} - (Ct_{target} - Ct_{reference})_{calibrator}$
3) Calculate the fold difference in expression
$2^{-\Delta\Delta Ct}$ = normalized expression ratio
RELATIVE QUANTIFICATION: Delta delta Ct (∆∆Ct) method
EXAMPLE:
SAMPLE                 Ct p53 (target)   Ct GAPDH (reference)
Control (calibrator)   15.0              16.5
Tumor (test)           12.0              15.9

1) $\Delta Ct_{calibrator} = 15.0 - 16.5 = -1.5$ and $\Delta Ct_{test} = 12.0 - 15.9 = -3.9$
2) $\Delta\Delta Ct = \Delta Ct_{test} - \Delta Ct_{calibrator} = -3.9 - (-1.5) = -2.4$
3) $2^{-\Delta\Delta Ct} = 2^{-(-2.4)} \approx 5.3$
Tumor cells express p53 at a 5.3-fold higher level than control cells
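The three steps of the ∆∆Ct method can be written as one small Python function. This is a minimal sketch; the function and argument names are illustrative, and it assumes, as the method itself does, that both genes amplify with an efficiency close to 100 %. The call uses the Ct values of the p53/GAPDH example.

```python
def delta_delta_ct_fold_change(ct_target_test, ct_ref_test,
                               ct_target_cal, ct_ref_cal):
    """Fold change of the target gene in the test sample vs. the calibrator."""
    # Step 1: normalize the target gene to the reference gene in each sample.
    delta_ct_test = ct_target_test - ct_ref_test
    delta_ct_cal = ct_target_cal - ct_ref_cal
    # Step 2: normalize the test sample to the calibrator.
    delta_delta_ct = delta_ct_test - delta_ct_cal
    # Step 3: convert the cycle difference into an expression ratio.
    return 2.0 ** (-delta_delta_ct)


fold = delta_delta_ct_fold_change(
    ct_target_test=12.0, ct_ref_test=15.9,   # tumor: p53, GAPDH
    ct_target_cal=15.0, ct_ref_cal=16.5,     # control: p53, GAPDH
)
print(round(fold, 1))  # prints 5.3
```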
RELATIVE QUANTIFICATION: Pfaffl Method
● If the PCR efficiencies of the reference gene and the target gene differ by more than 10 %, the ∆∆Ct method is inaccurate
● In that case the normalized expression ratio (RQ) is calculated with the Pfaffl method
$$RQ = \frac{E_{target}^{\,\Delta Ct_{target}(calibrator - test)}}{E_{reference}^{\,\Delta Ct_{reference}(calibrator - test)}}$$
where $E_{gene}$ is the amplification efficiency of the gene (gene = target or reference), expressed as the amplification factor per cycle ($E = 2$ at 100 % efficiency), and
$$\Delta Ct_{target}(calibrator - test) = Ct_{target}(calibrator) - Ct_{target}(test)$$
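A short Python sketch of the Pfaffl ratio follows. It assumes that each efficiency is given as the amplification factor per cycle (2.0 for a perfect doubling); the function name and the efficiency of 1.9 used in the second call are hypothetical. With both factors equal to 2 the result reduces to the ∆∆Ct value.

```python
def pfaffl_ratio(e_target, e_reference,
                 ct_target_cal, ct_target_test,
                 ct_ref_cal, ct_ref_test):
    """Efficiency-corrected expression ratio (test vs. calibrator)."""
    # Cycle differences are taken as calibrator minus test, per gene.
    delta_ct_target = ct_target_cal - ct_target_test
    delta_ct_reference = ct_ref_cal - ct_ref_test
    return (e_target ** delta_ct_target) / (e_reference ** delta_ct_reference)


# Equal, perfect efficiencies: same answer as the delta-delta-Ct method.
same_efficiency = pfaffl_ratio(2.0, 2.0, 15.0, 12.0, 16.5, 15.9)

# Hypothetical case where the target amplifies less efficiently.
lower_target_efficiency = pfaffl_ratio(1.9, 2.0, 15.0, 12.0, 16.5, 15.9)

print(same_efficiency, lower_target_efficiency)
```

The second call gives a smaller ratio than the first, which is the kind of bias the efficiency correction is meant to remove.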
RELATIVE QUANTIFICATION: Relative Standard Curve Method
● It is used to determine changes in the amount of a given sample relative to another, internal, control sample
● Does NOT require standards with known concentrations
1) Normalize the target gene to the reference gene
$$\text{Test Sample} = \frac{Qty_{target}(test)}{Qty_{reference}(test)} \qquad \text{Calibrator} = \frac{Qty_{target}(calibrator)}{Qty_{reference}(calibrator)}$$
2) Normalize the test sample to the calibrator
$$RQ = \frac{Qty_{target}(test) \, / \, Qty_{reference}(test)}{Qty_{target}(calibrator) \, / \, Qty_{reference}(calibrator)}$$
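The two normalization steps of the relative standard curve method can be sketched in Python. The sketch assumes that each gene has its own standard curve of the form Ct = slope * log10(quantity) + intercept, built from a dilution series in arbitrary units; all curve parameters and Ct values below are hypothetical, as are the function names.

```python
def quantity_from_ct(ct, slope, intercept):
    """Relative quantity read off a standard curve (arbitrary units)."""
    return 10 ** ((ct - intercept) / slope)


def relative_quantity(qty_target_test, qty_ref_test,
                      qty_target_cal, qty_ref_cal):
    """RQ = (target/reference in test) / (target/reference in calibrator)."""
    normalized_test = qty_target_test / qty_ref_test   # step 1, test sample
    normalized_cal = qty_target_cal / qty_ref_cal      # step 1, calibrator
    return normalized_test / normalized_cal            # step 2


# Hypothetical standard curves: (slope, intercept) per gene.
target_curve = (-3.4, 35.0)
reference_curve = (-3.3, 33.0)

rq = relative_quantity(
    quantity_from_ct(24.0, *target_curve),
    quantity_from_ct(20.0, *reference_curve),
    quantity_from_ct(26.0, *target_curve),
    quantity_from_ct(20.5, *reference_curve),
)
print(rq)
```

Because only ratios of quantities enter the result, the units of the dilution series cancel, which is why no standards of known concentration are needed.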
DATA ANALYSIS: Relative Standard Curve Method
Example:
NORMALIZATION: Other Methods
● There are many other normalization methods, among them:
  ○ Geometric mean calculates the average Ct value for each sample, and scales all Ct values according to the ratio of these mean Ct values across samples.
  ○ Scale rank invariant computes the pairwise rank-invariant features, but then takes only the features found in a certain number of samples, and uses the average Ct value of those as a scaling factor for correcting all Ct values.
  ○ Normal rank invariant computes all rank-invariant sets of features between pairwise comparisons of each sample against a reference, such as a pseudo-mean. The rank-invariant features are used as a reference for generating a smoothing curve, which is then applied to the entire sample.
  ○ Quantile makes the distribution of Ct values more or less identical across samples.
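As an illustration of the last method, quantile normalization can be sketched in plain Python. This is a simplified version under stated assumptions: every sample has a Ct value for the same set of features, ties are broken by position rather than averaged, and the function name and toy Ct values are hypothetical. Dedicated packages handle missing values and ties more carefully.

```python
def quantile_normalize(samples):
    """Give every sample the same Ct distribution.

    samples: list of equal-length lists, one list of Ct values per sample.
    """
    n_features = len(samples[0])
    if any(len(sample) != n_features for sample in samples):
        raise ValueError("all samples need the same number of features")

    # Mean Ct at each rank, taken across the sorted samples.
    sorted_samples = [sorted(sample) for sample in samples]
    rank_means = [
        sum(sample[rank] for sample in sorted_samples) / len(samples)
        for rank in range(n_features)
    ]

    # Replace each value by the mean of its rank, keeping feature order.
    normalized = []
    for sample in samples:
        order = sorted(range(n_features), key=lambda i: sample[i])
        result = [0.0] * n_features
        for rank, feature_index in enumerate(order):
            result[feature_index] = rank_means[rank]
        normalized.append(result)
    return normalized


# Toy data: two samples, three genes each.
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
print(normalized)
```

After normalization both samples contain exactly the same set of values; only the assignment of values to genes differs.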
● Two main types of analyses
● Comparative analyses:
  ○ Relatively rigorous
  ○ Check a predefined hypothesis
  ○ Rely on statistical testing
● Expression profiling:
  ○ Search for trends and patterns in the data
  ○ Exploratory, hypothesis-generating approach
  ○ Less rigorous
  ○ Cluster analysis or PCA
STATISTICAL ANALYSIS
● Statistical analysis of RT-qPCR data relies on three assumptions
  ○ One gene at a time
  ○ We are sampling from two different (unknown) independent populations
  ○ There exist unknown mechanisms that contribute to variability
STATISTICAL ANALYSIS : Basic Premises
● Use random sampling and randomization to obtain independent and representative samples
● Apply experimental design principles to minimize confounding variability
● Perform statistical testing
● DO NOT FORGET about multiple testing adjustments
● Standard statistical approach:
  ○ Confirmatory study
  ○ Reject or accept a predefined hypothesis
STATISTICAL ANALYSIS: From Assumptions to Strategies
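Testing one gene at a time produces one p-value per gene, so the p-values need adjusting. The slides do not say which adjustment to use; the sketch below shows one common choice, the Benjamini-Hochberg procedure for controlling the false discovery rate, in plain Python. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

A gene is then called significant when its adjusted p-value is below the chosen false discovery rate, for example 0.05.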
STATISTICAL ANALYSIS: Comparing Two Groups
STATISTICAL ANALYSIS: Comparing More Than Two Groups
SOFTWARE
SOFTWARE                                    SOURCE
StatMiner                                   Integromics
HTqPCR, ddCt,…                              Bioconductor
GenEx                                       bioMCC
REST – Relative Expression Software Tool    Biogazelle
DataAssist GeneExpression                   ABI
UEB CAN HELP YOU…
To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.
Sir Ronald A. Fisher¹
¹ Father of modern mathematical statistics and developer of experimental design and ANOVA.
REMEMBER!!!!