statistics pres 10 27 2015 roy sabo
TRANSCRIPT
ADLT 673: TEACHING AS SCHOLARSHIP IN MEDICAL EDUCATION
TUESDAY, OCTOBER 27, 2015
An Overview of Quantitative Data Analysis
Outline of Today’s Class
Brief Description of Statistical Thinking
Analytic Methods
  Summary Measures
  Appropriate Research Questions
  Determining the Appropriate Statistical Methodology
  Group Discussion
Designing a Data Collection Plan
  Sources, Capture and Storage
Sample Size Determination
  Group Discussion
Additional Resources
Statistical Thinking
Population
  All possible subjects
  EX: All US patients
Sample
  Subjects you observe
  EX: Patients seen at VCU
Sampling from Population
  Sample should be a microcosm of the population
  Are other samples different?
  Small samples: vulnerable to rare events
  Larger samples are better
[Figure: a Sample drawn from a Population]
Statistical Thinking
If this is the population…
Does sample look like this…
…or this…
Statistical Thinking (Example)
Sample: Experimental drug to reduce side effects from surgical procedure
  Historical rate: 10% experience no side effects
  New Trial: 33 successes (no side effects) in 200 patients
  Sample percentage: 33/200 = 16.5%
What does evidence imply about population?
  Does new drug show improvement?
  What happens if we run this experiment again?
  Is this sample “big” enough to represent population?
  If historical rate is truly 10%, what would samples look like?
Statistical Thinking (Example)
# of Successes  Frequency    Proportion  |  # of Successes  Frequency    Proportion
out of 200      out of 1000              |  out of 200      out of 1000
 5                1          0.001       |  21               75          0.075
 6                0          0.000       |  22               78          0.078
 7                1          0.001       |  23               71          0.071
 8                0          0.000       |  24               54          0.054
 9                2          0.002       |  25               33          0.033
10                8          0.008       |  26               25          0.025
11                9          0.009       |  27               27          0.027
12               20          0.020       |  28               11          0.011
13               25          0.025       |  29                8          0.008
14               35          0.035       |  30               11          0.011
15               48          0.048       |  31                4          0.004
16               65          0.065       |  32                4          0.004
17               69          0.069       |  33                1          0.001
18               99          0.099       |  34                1          0.001
19               94          0.094       |  35                1          0.001
20              119          0.119       |  36                1          0.001
Simulation Study of Samples with 200 Dichotomous Observations with Known 10% Success Rate
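A simulation of this kind can be sketched in a few lines of Python. This is a minimal sketch and not the code behind the slide: the function name, the seed, and the use of Python's standard random module are all assumptions, so the individual frequencies will not match the table exactly.

```python
import random
from collections import Counter


def simulate_success_counts(n_samples=1000, n_patients=200, p=0.10, seed=2015):
    """Draw n_samples samples of n_patients dichotomous outcomes.

    Returns a Counter mapping '# of successes in a sample' to how many
    of the simulated samples produced that count.
    """
    rng = random.Random(seed)  # seed is arbitrary, chosen only for repeatability
    counts = Counter()
    for _ in range(n_samples):
        successes = sum(rng.random() < p for _ in range(n_patients))
        counts[successes] += 1
    return counts


counts = simulate_success_counts()
print("successes  frequency  proportion")
for k in sorted(counts):
    print(f"{k:>9}  {counts[k]:>9}  {counts[k] / 1000:.3f}")
```

The printed rows have the same three columns as the slide: number of successes, frequency out of 1000, and proportion.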
Statistical Thinking (Example)
Histogram of 1,000 Simulated Samples of 200 Dichotomous Outcomes with Assumed p = 10% Success Rate
[Figure: histogram; horizontal axis: sample proportion, 0.025 to 0.180; vertical axis: frequency, 0 to 140]
Statistical Thinking (Example)
If the true proportion was really p = 10%…
  Then our event (33 successes) would be observed about 1 in every 1000 trials
  Estimated Probability: 1/1000 = 0.001
Two possible explanations for our sample:
  Rate really is 10% and we observed a rare event
  Our assumption (p = 10%) was incorrect
Revised Statistical Thinking:
  If observed event is likely given our assumptions, then our assumptions are probably correct
  If observed event is unlikely given our assumptions, then our assumptions are probably NOT correct
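The 1-in-1000 figure is an estimate read off the simulation. As a rough cross-check, the same probability can be computed exactly from the binomial distribution. The sketch below assumes independent patients with a common 10% success rate; the helper name binomial_pmf is illustrative. It also reports the probability of 33 or more successes, which is the quantity a formal test would usually use.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p0, observed = 200, 0.10, 33

# Probability of exactly 33 successes if the true rate is 10%
prob_exactly_33 = binomial_pmf(observed, n, p0)

# Probability of 33 or more successes (upper tail)
prob_33_or_more = sum(binomial_pmf(k, n, p0) for k in range(observed, n + 1))

print(f"P(X = 33)  = {prob_exactly_33:.5f}")
print(f"P(X >= 33) = {prob_33_or_more:.5f}")
```

Both probabilities are small, which supports the same reasoning as the slide: either a rare event occurred or the 10% assumption is wrong.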
Analytic Methods: Summary Measures
Representative Measures
  Reflect the most “typical” or “average” data value
  Continuous Measurements: Mean (Average), Median and Mode
  Categorical Measurements: Frequencies and Proportions
Measures of Variability
  Reflect how much subjects differ from one another
  Continuous Measurements: Standard deviation, range, interquartile range
  Categorical Measurements: None that are meaningful (sorry!)
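The summary measures above can all be computed with Python's standard library. The data here are hypothetical values made up for illustration (ages for a continuous measure, a yes/no screening flag for a categorical one); the variable names are assumptions as well.

```python
import statistics
from collections import Counter

# Hypothetical data, for illustration only
ages = [34, 41, 41, 52, 58, 63, 70]                      # continuous
screened = ["yes", "no", "yes", "yes", "no", "yes", "yes"]  # categorical

# Representative measures for a continuous measurement
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
mode_age = statistics.mode(ages)

# Measures of variability for a continuous measurement
sd_age = statistics.stdev(ages)               # sample standard deviation
range_age = max(ages) - min(ages)
q1, _, q3 = statistics.quantiles(ages, n=4)   # quartile cut points
iqr_age = q3 - q1

# Representative measures for a categorical measurement
frequencies = Counter(screened)
proportions = {level: count / len(screened) for level, count in frequencies.items()}

print(mean_age, median_age, mode_age, sd_age, range_age, iqr_age)
print(frequencies, proportions)
```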
Analytic Methods: Research Question
Translating Research Question into Testable Hypotheses
  Research question must be in a form that allows a statistical method to be assigned
  Three components: # of groups, measurement type, # of measures
1. Subjects
  Identify population under consideration
  Determine # of groups (and what distinguishes them)
2. Measurements
  Identify the measurement
  Type (continuous or categorical)
  Determine the summary measure (e.g., mean or proportion)
  # of Times Measured (Once, twice or greater?)
3. Statement (i.e., what are you trying to do?)
  Estimation (is something simply being measured?)
  Change (is something being tracked over time? Before and after an event?)
  Comparison (is something compared between groups?)
Analytic Methods: Research Question
Who is under consideration?
  Identify population under consideration
  Determine # of groups (and what distinguishes them)
Ex.: Is the proportion of patients up to date on their colon cancer screening greater for those who receive an email reminder from their physician than for those who do not receive such email messages?
  Population?
  Number of Groups?
Analytic Methods: Research Question
Is it measurable?
  Identify the measurement
  Type (continuous or categorical)
  Determine the summary measure (e.g., mean or proportion)
  # of Times Measured (Once, twice or greater?)
Ex.: Is the proportion of patients up to date on their colon cancer screening greater for those who receive an email reminder from their physician than for those who do not receive such email messages?
  Measurement Type?
  Summary Measures?
  Number of Times Measured?
Analytic Methods: Research Question
Is the “question” clear?
  Estimation (is something simply being measured?)
  Change (is something being tracked over time? Before and after an event?)
  Comparison (is something compared between groups?)
Ex.: Is the proportion of patients up to date on their colon cancer screening greater for those who receive an email reminder from their physician than for those who do not receive such email messages?
  Statement: Estimation, Change or Comparison?
  Combination?
Analytic Methods: Continuous Data
                 # of Measurements
# of Samples     Single               Pre/Post         Repeated Measures
1 Sample         t-test               Paired t-test    Repeated Measures ANOVA (RMA) / Linear Mixed Model (LMM)*
2 Samples        Two-sample t-test    RMA / LMM*       RMA / LMM*
“k” Samples      Analysis of          RMA / LMM*       RMA / LMM*
                 Variance (ANOVA)
Adjusting for Covariates: Multiple Linear Regression*, Analysis of Covariance (ANCOVA)*, Linear Mixed Models*
*Will likely require statistical assistance
Analytic Methods: Categorical Data
                 # of Measurements
# of Samples     Single             Pre/Post          Repeated Measures
1 Sample         z-test             McNemar’s Test    Generalized Linear Mixed Models (GLMM)*
2 Samples        Chi-square Test    GLMM*             GLMM*
“k” Samples      Chi-square Test    GLMM*             GLMM*
Adjusting for Covariates: Multiple Logistic Regression*, Generalized Linear Mixed Models*
*Will likely require statistical assistance
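As an illustration of the one-sample, single-measurement case (the z-test), the earlier drug example can be tested directly: 33 successes in 200 patients against the historical 10% rate. This is a minimal sketch using the normal approximation and a one-sided alternative (the new rate is higher); the function name and the choice of a one-sided test are assumptions, not something stated on the slides.

```python
import math


def one_sample_proportion_z(successes, n, p0):
    """One-sample z-test for a proportion against a hypothesized rate p0.

    Returns the sample proportion, the z statistic, and the one-sided
    (upper-tail) p-value from the standard normal distribution.
    """
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)   # SE under the null hypothesis
    z = (p_hat - p0) / standard_error
    p_value = 0.5 * math.erfc(z / math.sqrt(2))     # P(Z > z)
    return p_hat, z, p_value


p_hat, z, p_value = one_sample_proportion_z(successes=33, n=200, p0=0.10)
print(f"sample proportion = {p_hat:.3f}, z = {z:.2f}, one-sided p = {p_value:.4f}")
```

Here z works out to roughly 3.06, so the observed 16.5% is about three standard errors above the historical rate.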
Analytical Methods: Examples
Research Question: What are CD3 cell counts in BMT recipients 60 days after transplantation?
Research Question: Are CD3 cell counts in BMT recipients 60 days after transplantation larger than counts at baseline (day 0)?
Research Question: Are CD3 cell counts in BMT recipients receiving a 5.1-unit ATG dose as big as the counts in recipients receiving a 7.5-unit dose?
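The two method tables can be read as a lookup keyed on measurement type, number of groups, and number of measurements. The sketch below encodes them that way; the function and dictionary names are illustrative, and the mapping of the three CD3 questions to table cells is one plausible reading, not an answer given on the slides.

```python
# Keys: (measurement type, # of groups as "1"/"2"/"k", # of measurements)
METHODS = {
    ("continuous", "1", "single"): "One-sample t-test",
    ("continuous", "1", "pre/post"): "Paired t-test",
    ("continuous", "2", "single"): "Two-sample t-test",
    ("continuous", "k", "single"): "Analysis of Variance (ANOVA)",
    ("categorical", "1", "single"): "z-test",
    ("categorical", "1", "pre/post"): "McNemar's test",
    ("categorical", "2", "single"): "Chi-square test",
    ("categorical", "k", "single"): "Chi-square test",
}

# Every remaining cell in the tables needs the more advanced (starred) methods
ADVANCED = {
    "continuous": "Repeated Measures ANOVA / Linear Mixed Model (statistical assistance likely needed)",
    "categorical": "Generalized Linear Mixed Model (statistical assistance likely needed)",
}


def suggest_method(measurement_type, n_groups, n_measurements):
    """Look up a candidate method from the two slide tables."""
    group_key = "1" if n_groups == 1 else "2" if n_groups == 2 else "k"
    key = (measurement_type, group_key, n_measurements)
    return METHODS.get(key, ADVANCED[measurement_type])


# One reading of the three CD3 questions (cell counts treated as continuous)
print(suggest_method("continuous", 1, "single"))    # counts at day 60
print(suggest_method("continuous", 1, "pre/post"))  # day 60 vs. baseline
print(suggest_method("continuous", 2, "single"))    # 5.1-unit vs. 7.5-unit dose
```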
Group Discussions
Please break into groups by table
For the next 10-15 minutes, take turns discussing what analytic approaches are appropriate for your proposed study
  Is your outcome continuous or categorical?
  How many groups are you investigating?
  How many measurements are you taking?
  What statistical methodology should you use?
If your study is qualitative, discuss how statistical methodologies could be used (e.g. data summary, association)
Data Collection Plan: Sources
What information do you need to answer your research question?
  Electronic Health Records (EHR): CERNER, ONCORE
  Integrated Personal Health Record (IPHR): MyPreventiveCare
  Chart reviews
  Surveys
  Prospective biological measurements
Need to know:
  Who will physically obtain/collect data?
  How often will it be done?
  If prospective biological measures: how will it be done?
Data Collection Plan: Capture
How will you obtain the necessary information?
  EHR or IPHR extraction
  Chart audits
  Surveys: In-Person, mail, email or online
  Prospective measurements
Need to know:
  Who will do this?
  How often will it be done?
  If prospective biological measures: how will it be done?
Data Collection Plan: Storage
Where will your data be stored?
  Paper records: No
  Microsoft Excel or Access
  REDCAP: Collects and stores survey data…and much more
  SAS database (or SPSS, R, etc.): Work with your statistician to create dataset
Need to know:
  How often will it be updated?
  Is it secure?
  Is it IRB/HIPAA compliant?
Data Collection Plan
Helpful Suggestions:
  Consult a statistician or database manager before you start collecting data
    Preferably the person who will be analyzing your data
  If you are collecting and storing data yourself:
    Record it directly into storage unit as you collect it (e.g., Microsoft Excel)
    Record it as it will be analyzed
      One row per subject per time point (new row for each additional time point)
      One column per measurement
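The "one row per subject per time point, one column per measurement" layout can be sketched as a small CSV file. Everything in this example is hypothetical: the column names, the subjects, and the values are made up purely to show the shape of the data.

```python
import csv
import io

# Hypothetical records: two subjects, each measured at two time points.
# Each dictionary is one row; each key is one column (one measurement).
records = [
    {"subject_id": 1, "time_point": "baseline", "weight_kg": 81.2, "screened": "no"},
    {"subject_id": 1, "time_point": "month_6", "weight_kg": 79.8, "screened": "yes"},
    {"subject_id": 2, "time_point": "baseline", "weight_kg": 67.5, "screened": "yes"},
    {"subject_id": 2, "time_point": "month_6", "weight_kg": 68.1, "screened": "yes"},
]

fieldnames = ["subject_id", "time_point", "weight_kg", "screened"]

# Write to an in-memory buffer; a real project would write to a file instead
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(records)

csv_text = buffer.getvalue()
print(csv_text)
```

A new time point for subject 1 would be added as a new row, not as extra columns on an existing row.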
Data Collection Plan: Example
Sample Size Determination
As a general rule, larger sample sizes:
  Lead to more representative samples
  Lead to better estimation of parameters (e.g., representative measures)
  Provide estimators with lower variability
[Figure: panels for N=9, N=36 and N=100]
Sample Size Determination
Averages over 10,000 Simulations
Sample Size   Sample Mean   Sample Std. Dev.   Standard Error*
   9          204.4         36.5               12.3
  16          204.3         37.1                9.5
  25          204.2         37.2                7.8
  36          204.1         37.5                6.5
  49          204.1         37.6                5.5
  64          204.2         37.7                4.9
  81          204.1         37.7                4.2
 100          204.1         37.7                3.9
1000          204.1         37.7                1.2
*SE: explains variability in estimator; not the sample data
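The pattern in the table (mean and standard deviation stay put while the standard error shrinks as the sample grows) can be reproduced with a small simulation. This is a sketch under stated assumptions: a normal population with mean 204 and standard deviation 38, chosen only to loosely resemble the table, and 2,000 simulations per sample size instead of 10,000 to keep it quick. The standard error here is estimated as the standard deviation of the simulated sample means.

```python
import random
import statistics


def simulate(n, n_sims=2000, mu=204.0, sigma=38.0, seed=0):
    """Return (average sample mean, average sample SD, SD of the sample means)."""
    rng = random.Random(seed)
    means, sds = [], []
    for _ in range(n_sims):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        sds.append(statistics.stdev(sample))
    # The spread of the sample means is the (empirical) standard error
    return statistics.fmean(means), statistics.fmean(sds), statistics.stdev(means)


results = {n: simulate(n) for n in (9, 36, 100)}
print("n     mean    sd     se")
for n, (avg_mean, avg_sd, se) in results.items():
    print(f"{n:<5} {avg_mean:6.1f} {avg_sd:6.1f} {se:6.1f}")
```

The standard error falls roughly in proportion to one over the square root of the sample size, which is why quadrupling the sample from 9 to 36 about halves it.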
Sample Size Determination
Possible Decisions
Type I Error: find difference where there shouldn’t be one
Type II Error: fail to find difference where there should be one
Power = 1 - β
                      True State
Your Decision         H0 is “True”         HA is True
Reject H0             Type I Error (α)     Correct Decision
Fail to Reject H0     Correct Decision     Type II Error (β)
Sample Size Determination
Determinants of Required Sample Size
Significance Level (α): probability of rejecting H0 when it’s true
Power (1-β): probability of rejecting H0 when it’s false
These values are selected during design phase
  α = 5%
  1-β = 80% (sometimes 90%)
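To make α and power concrete, the drug example can be reused: 200 patients, H0 rate of 10%, and a one-sided test at α = 5%. The sketch below finds the smallest success count that rejects H0, then computes the actual Type I error rate and the power. The alternative rate of 16.5% is a hypothetical choice (the observed rate treated as if it were the truth), and the function names are illustrative.

```python
from math import comb


def upper_tail(c, n, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


def critical_count(n, p0, alpha):
    """Smallest count c such that rejecting H0 when X >= c keeps P(Type I) <= alpha."""
    for c in range(n + 2):
        if upper_tail(c, n, p0) <= alpha:
            return c


n, p0, alpha = 200, 0.10, 0.05
p1 = 0.165  # hypothetical true rate under HA (assumption)

c = critical_count(n, p0, alpha)
type_one_rate = upper_tail(c, n, p0)   # probability of rejecting H0 when it is true
power = upper_tail(c, n, p1)           # probability of rejecting H0 when HA is true
beta = 1 - power                       # Type II error rate

print(f"reject H0 if successes >= {c}")
print(f"actual alpha = {type_one_rate:.3f}, power = {power:.3f}, beta = {beta:.3f}")
```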
Sample Size Determination
Determinants of Required Sample Size
Measure of variability (usually standard deviation) inherent in study population
  As measurement becomes more variable…
  Standard error of test statistic increases…
  p-value increases…
  Ability to reject H0 decreases…
  Power decreases
Controlling variability:
  Better measurement methodology
  Homogeneous samples
Sample Size Determination
Determinants of Required Sample Size
Effect Size: smallest difference or change in outcome you hope to find
  As difference you want to observe decreases…
  Test statistic decreases…
  p-value increases…
  Ability to reject H0 decreases…
  Power decreases
Considerations:
  Clinical significance
  Clinical possibility
Large differences are easier to detect statistically but harder to come by in practice
Sample Size Determination
Calculating Required Sample Size
  Equations exist (involving α, β, variability and effect size) for simple analytic methods (t-test, chi-square, etc.)
  Advanced methods require professional assistance
Where do you find variability and effect size?
  Previous literature of similar populations
  Retrospective Study / Chart Audits
  Pilot study
  Guesstimate!
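One such equation, for comparing the means of two equal-sized groups, is the common normal-approximation formula n per group = 2(z_{1-α/2} + z_{1-β})² σ² / δ², where σ is the standard deviation and δ is the effect size. The slides do not say which equations they have in mind, so treat this as one illustrative example; the function name and the inputs in the usage lines are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_means(sigma, delta, alpha=0.05, power=0.80):
    """Approximate subjects needed per group for a two-sided two-sample comparison of means.

    sigma: assumed standard deviation of the outcome (same in both groups)
    delta: smallest difference in means worth detecting (effect size)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)            # quantile matching the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma**2 / delta**2)


# Hypothetical inputs: SD of 1 unit, difference of half a unit
print(n_per_group_two_means(sigma=1.0, delta=0.5))               # 80% power
print(n_per_group_two_means(sigma=1.0, delta=0.5, power=0.90))   # 90% power
print(n_per_group_two_means(sigma=1.0, delta=0.25))              # half the effect size
```

The usage lines show the two determinants at work: asking for more power raises the required sample size, and halving the effect size roughly quadruples it.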
Sample Size Determination
What if required sample size is too large?
  Consider a different outcome
    Continuous measures generally require smaller sample sizes than categorical measures
  Consider fewer groups or add multiple sites
    Fewer Groups: More subjects per group
    Multiple Sites: Larger subject pool (maybe more representative…)
    Will require more sophisticated analytic methods
  Reconfigure study as a “pilot”
    Switch emphasis from “hypothesis testing” to “estimation”
    Goal: data summaries and confidence intervals
    Use to power larger study
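For an estimation-focused pilot, the main output is an estimate with a confidence interval instead of a hypothesis test. As a sketch, the drug example (33 successes in 200 patients) gives the interval below. The simple Wald (normal-approximation) interval is used here as an assumption; other interval methods exist and may be preferable for small samples or rates near 0 or 1.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


lower, upper = wald_interval(successes=33, n=200)
print(f"estimate = {33 / 200:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The interval runs from roughly 0.114 to 0.216, and its width gives a sense of how much variability a larger follow-up study would need to plan for.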
Group Discussion
Please break into groups by table
For the next 10-15 minutes, take turns discussing:
  Data Management:
    Where will you get your data?
    How will you capture it?
    How will you store it?
  Sample Size Determination:
    Are you able to power your study?
    Where will (did) you find information for your power analysis?
Additional Resources
VCU Department of Biostatistics
  15 full-time faculty
  Can assist with: study design, sample size determination, interim and final analyses, dissemination
  Grant funding (or prospects of funding) usually required
BIOS 516 Biostatistical Consulting: graduate students available for FREE consultations
  Contact Russ Boyle ([email protected])
  Provide a protocol
  Offer co-authorship
Additional Resources
VCU Center for Clinical and Translational Research
Research Incubator: study design, sample size determination, and other resources (e.g. grant writing)
  Contact: Pam Dillon ([email protected])
Biomedical Informatics: data management and storage (e.g. REDCAP)
  Support requested online: (http://www.cctr.vcu.edu/informatics/index.html)
Additional Resources
Textbook (i.e., shameless plug):
  Statistical Research Methods: A Guide for Non-Statisticians
  Sabo and Boone, Springer, 2013
  Available on the web: