statistics pres 10 27 2015 roy sabo

ADLT 673: TEACHING AS SCHOLARSHIP IN MEDICAL EDUCATION

TUESDAY, OCTOBER 27, 2015

An Overview of Quantitative Data Analysis

Outline of Today’s Class

Brief Description of Statistical Thinking

Analytic Methods
  Summary Measures
  Appropriate Research Questions
  Determining the Appropriate Statistical Methodology
  Group Discussion

Designing a Data Collection Plan
  Sources, Capture and Storage

Sample Size Determination
  Group Discussion

Additional Resources

Statistical Thinking

Population
  All possible subjects
  EX: All US patients

Sample
  Subjects you observe
  EX: Patients seen at VCU

Sampling from Population
  Sample should be a microcosm of the population
  Are other samples different?
  Small samples are more easily distorted by rare events
  Larger samples are better

[Figure: a Sample drawn from a Population]

Statistical Thinking

If this is the population…

Does sample look like this…

…or this…

Statistical Thinking (Example)

Sample: Experimental drug to reduce side effects from a surgical procedure
  Historical rate: 10% experience no side effects
  New Trial: 33 successes (no side effects) in 200 patients
  Sample percentage: 33/200 = 16.5%

What does the evidence imply about the population?
  Does the new drug show improvement?
  What happens if we run this experiment again?
  Is this sample “big” enough to represent the population?
  If the historical rate is truly 10%, what would samples look like?


Simulation Study of Samples with 200 Dichotomous Observations with Known 10% Success Rate

# of Successes   Frequency                  # of Successes   Frequency
out of 200       out of 1000   Proportion   out of 200       out of 1000   Proportion
 5                 1           0.001        21                75           0.075
 6                 0           0.000        22                78           0.078
 7                 1           0.001        23                71           0.071
 8                 0           0.000        24                54           0.054
 9                 2           0.002        25                33           0.033
10                 8           0.008        26                25           0.025
11                 9           0.009        27                27           0.027
12                20           0.020        28                11           0.011
13                25           0.025        29                 8           0.008
14                35           0.035        30                11           0.011
15                48           0.048        31                 4           0.004
16                65           0.065        32                 4           0.004
17                69           0.069        33                 1           0.001
18                99           0.099        34                 1           0.001
19                94           0.094        35                 1           0.001
20               119           0.119        36                 1           0.001
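A simulation of this kind can be sketched in a few lines of Python. This is only an illustration of the idea, not the code behind the table above: the function name, the seed, and the use of Python's standard random module are all assumptions, and the exact frequencies will differ from the slide because the random draws differ.

```python
import random
from collections import Counter

def simulate_success_counts(n_samples=1000, n_patients=200, p=0.10, seed=2015):
    """Return the number of successes in each simulated sample."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so the run is repeatable
    return [
        sum(rng.random() < p for _ in range(n_patients))
        for _ in range(n_samples)
    ]

counts = simulate_success_counts()
frequency = Counter(counts)

# One line per observed count: successes, frequency, proportion of samples.
for successes in sorted(frequency):
    proportion = frequency[successes] / len(counts)
    print(f"{successes:>3} {frequency[successes]:>5} {proportion:.3f}")

# Share of simulated samples with at least as many successes as the trial (33).
at_least_33 = sum(c >= 33 for c in counts) / len(counts)
print(f"33 or more successes: {at_least_33:.3f}")
```

Most simulated samples should land near 20 successes (10% of 200), with counts as high as 33 appearing only rarely.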


[Figure: Histogram of 1,000 Simulated Samples of 200 Dichotomous Outcomes with Assumed p = 10% Success Rate. Horizontal axis ticks run from 0.025 to 0.180; vertical axis ticks run from 0 to 140.]


If the true proportion was really p = 10%…
  Then our event (33 successes) would be observed about 1 in every 1000 trials
  Estimated Probability: 1/1000 = 0.001

Two possible explanations for our sample:
  The rate really is 10% and we observed a rare event
  Our assumption (p = 10%) was incorrect

Revised Statistical Thinking:
  If the observed event is likely given our assumptions, then our assumptions are probably correct
  If the observed event is unlikely given our assumptions, then our assumptions are probably NOT correct
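The simulated estimate can be checked against an exact binomial calculation. The sketch below assumes the 200 patients are independent and each has the same 10% chance of success; the helper names are hypothetical. It reports two related quantities: the chance of exactly 33 successes, which is what the simulated 1/1000 estimates, and the chance of 33 or more, which is what a one-sided test would normally use.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def upper_tail(k, n, p):
    """Probability of k or more successes in n independent trials."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

n, p_null, observed = 200, 0.10, 33

prob_exactly_33 = binomial_pmf(observed, n, p_null)
prob_33_or_more = upper_tail(observed, n, p_null)

print(f"P(exactly 33 successes | p = 10%) = {prob_exactly_33:.4f}")
print(f"P(33 or more successes | p = 10%) = {prob_33_or_more:.4f}")
```

Both probabilities are small, which is the sense in which the observed result is "unlikely given our assumptions."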

Analytic Methods: Summary Measures

Representative Measures
  Reflect the most “typical” or “average” data value
  Continuous Measurements: Mean (Average), Median and Mode
  Categorical Measurements: Frequencies and Proportions

Measures of Variability
  Reflect how much subjects differ from one another
  Continuous Measurements: Standard deviation, range, interquartile range
  Categorical Measurements: None that are meaningful (sorry!)
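As a rough illustration, the summary measures listed above can be computed with Python's standard library. The data values below are hypothetical and chosen only to make the arithmetic easy to follow; the interquartile range uses the default quartile method of statistics.quantiles, and other software may use a slightly different rule.

```python
import statistics
from collections import Counter

# Hypothetical continuous measurements (illustrative values only).
values = [2, 4, 4, 5, 7, 8, 12]

mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)
std_dev = statistics.stdev(values)             # sample standard deviation
value_range = max(values) - min(values)
q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (default method)
iqr = q3 - q1

print(f"mean={mean}, median={median}, mode={mode}")
print(f"SD={std_dev:.2f}, range={value_range}, IQR={iqr}")

# Hypothetical categorical measurements: frequencies and proportions.
screened = ["yes", "no", "yes", "yes", "no"]
frequencies = Counter(screened)
proportions = {level: count / len(screened) for level, count in frequencies.items()}
print(frequencies, proportions)
```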

Analytic Methods: Research Question

Translating Research Question into Testable Hypotheses
  Research question must be in a form that allows a statistical method to be assigned
  Three components: # of groups, measurement type, # of measures

1. Subjects
  Identify population under consideration
  Determine # of groups (and what distinguishes them)

2. Measurements
  Identify the measurement
  Type (continuous or categorical)
  Determine the summary measure (e.g., mean or proportion)
  # of Times Measured (Once, twice or more?)

3. Statement (i.e., what are you trying to do?)
  Estimation (is something simply being measured?)
  Change (is something being tracked over time? Before and after an event?)
  Comparison (is something compared between groups?)


Who is under consideration?
  Identify population under consideration
  Determine # of groups (and what distinguishes them)

Ex.: Is the proportion of patients up to date on their colon cancer screening greater for those who receive an email reminder from their physician than for those who do not receive such email messages?
  Population?
  Number of Groups?


Is it measurable?
  Identify the measurement
  Type (continuous or categorical)
  Determine the summary measure (e.g., mean or proportion)
  # of Times Measured (Once, twice or more?)

Ex.: Is the proportion of patients up to date on their colon cancer screening greater for those who receive an email reminder from their physician than for those who do not receive such email messages?
  Measurement Type?
  Summary Measures?
  Number of Times Measured?

Analytic Methods: Research Question

Is the “question” clear?
  Estimation (is something simply being measured?)
  Change (is something being tracked over time? Before and after an event?)
  Comparison (is something compared between groups?)

Ex.: Is the proportion of patients up to date on their colon cancer screening greater for those who receive an email reminder from their physician than for those who do not receive such email messages?
  Statement: Estimation, Change or Comparison?
  Combination?

Analytic Methods: Continuous Data

               # of Measurements
# of Samples   Single                         Pre/Post        Repeated Measures
1 Sample       t-test                         Paired t-test   Repeated Measures ANOVA (RMA) / Linear Mixed Model (LMM)*
2 Samples      Two-sample t-test              RMA / LMM*      RMA / LMM*
“k” Samples    Analysis of Variance (ANOVA)   RMA / LMM*      RMA / LMM*

Adjusting for Covariates: Multiple Linear Regression*, Analysis of Covariance (ANCOVA)*, Linear Mixed Models*

*Will likely require statistical assistance

Analytic Methods: Categorical Data

               # of Measurements
# of Samples   Single            Pre/Post         Repeated Measures
1 Sample       z-test            McNemar’s Test   Generalized Linear Mixed Models (GLMM)*
2 Samples      Chi-square Test   GLMM*            GLMM*
“k” Samples    Chi-square Test   GLMM*            GLMM*

Adjusting for Covariates: Multiple Logistic Regression*, Generalized Linear Mixed Models*

*Will likely require statistical assistance
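The two method tables above can be read as a simple lookup on three answers: measurement type, number of groups, and number of measurements. The sketch below encodes them that way; the dictionary layout and the function name suggest_method are hypothetical, and the covariate-adjusted methods are left out for brevity.

```python
# Lookup built from the two tables above; "*" marks methods that
# will likely require statistical assistance.
METHODS = {
    "continuous": {
        "1": {"single": "t-test", "pre/post": "Paired t-test", "repeated": "RMA / LMM*"},
        "2": {"single": "Two-sample t-test", "pre/post": "RMA / LMM*", "repeated": "RMA / LMM*"},
        "k": {"single": "ANOVA", "pre/post": "RMA / LMM*", "repeated": "RMA / LMM*"},
    },
    "categorical": {
        "1": {"single": "z-test", "pre/post": "McNemar's Test", "repeated": "GLMM*"},
        "2": {"single": "Chi-square Test", "pre/post": "GLMM*", "repeated": "GLMM*"},
        "k": {"single": "Chi-square Test", "pre/post": "GLMM*", "repeated": "GLMM*"},
    },
}

def suggest_method(data_type, n_groups, measurements):
    """Return the tabled method for a measurement type, group count, and design."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    group_key = str(n_groups) if n_groups in (1, 2) else "k"
    return METHODS[data_type][group_key][measurements]

# Email-reminder example: a proportion (categorical), two groups, measured once.
print(suggest_method("categorical", 2, "single"))
```

For the colon cancer screening question, the lookup points to a chi-square test comparing the two proportions.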

Analytic Methods: Examples

Research Question: What are CD3 cell counts in BMT recipients 60 days after transplantation?

Research Question: Are CD3 cell counts in BMT recipients 60 days after transplantation larger than counts at baseline (day 0)?

Research Question: Are CD3 cell counts in BMT recipients receiving a 5.1-unit ATG dose as large as the counts in recipients receiving a 7.5-unit dose?

Group Discussions

Please break into groups by table

For the next 10-15 minutes, take turns discussing what analytic approaches are appropriate for your proposed study
  Is your outcome continuous or categorical?
  How many groups are you investigating?
  How many measurements are you taking?
  What statistical methodology should you use?

If your study is qualitative, discuss how statistical methodologies could be used (e.g. data summary, association)

Data Collection Plan: Sources

What information do you need to answer your research question?
  Electronic Health Records (EHR): CERNER, ONCORE
  Integrated Personal Health Record (IPHR): MyPreventiveCare
  Chart reviews
  Surveys
  Prospective biological measurements

Need to know:
  Who will physically obtain/collect data?
  How often will it be done?
  If prospective biological measures: how will it be done?

Data Collection Plan: Capture

How will you obtain the necessary information?
  EHR or IPHR extraction
  Chart audits
  Surveys: in-person, mail, email or online
  Prospective measurements

Need to know:
  Who will do this?
  How often will it be done?
  If prospective biological measures: how will it be done?

Data Collection Plan: Storage

Where will your data be stored?
  Paper records: No
  Microsoft Excel or Access
  REDCAP: collects and stores survey data…and much more
  SAS database (or SPSS, R, etc.): work with your statistician to create the dataset

Need to know:
  How often will it be updated?
  Is it secure?
  Is it IRB/HIPAA compliant?

Data Collection Plan

Helpful Suggestions:
  Consult a statistician or database manager before you start collecting data
    Preferably the person who will be analyzing your data
  If you are collecting and storing data yourself:
    Record it directly into storage unit as you collect it
      E.g., Microsoft Excel
    Record it as it will be analyzed
      One row per subject per time point
        New row for each additional time point
      One column per measurement
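The "one row per subject per time point, one column per measurement" layout might look like the sketch below. The column names and all values are hypothetical, made up purely to show the shape of the file, and the CSV format stands in for whatever storage tool is actually used.

```python
import csv
import io

FIELDS = ["subject_id", "time_point", "cd3_count", "atg_dose"]

# Hypothetical values for illustration only: two subjects, two time points each.
rows = [
    {"subject_id": 1, "time_point": 0, "cd3_count": 120, "atg_dose": 5.1},
    {"subject_id": 1, "time_point": 60, "cd3_count": 310, "atg_dose": 5.1},
    {"subject_id": 2, "time_point": 0, "cd3_count": 95, "atg_dose": 7.5},
    {"subject_id": 2, "time_point": 60, "cd3_count": 240, "atg_dose": 7.5},
]

# Write the rows to an in-memory CSV so the layout can be inspected.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())

# Basic check: each (subject, time point) pair appears on exactly one row.
keys = [(row["subject_id"], row["time_point"]) for row in rows]
no_duplicates = len(keys) == len(set(keys))
print("One row per subject per time point:", no_duplicates)
```

A new time point for a subject becomes a new row rather than a new column, which is the shape most analysis software expects for repeated measures.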

Data Collection Plan: Example

Sample Size Determination

As a general rule, larger sample sizes:
  Lead to more representative samples
  Lead to better estimation of parameters (e.g., representative measures)
  Provide estimators with lower variability

[Figure: three panels, N=9, N=36 and N=100]


Averages over 10,000 Simulations

Sample Size   Sample Mean   Sample Std. Dev.   Standard Error*
     9        204.4         36.5               12.3
    16        204.3         37.1                9.5
    25        204.2         37.2                7.8
    36        204.1         37.5                6.5
    49        204.1         37.6                5.5
    64        204.2         37.7                4.9
    81        204.1         37.7                4.2
   100        204.1         37.7                3.9
  1000        204.1         37.7                1.2

*SE: describes variability in the estimator, not in the sample data
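The pattern in the table above, where the standard error shrinks as the sample size grows while the sample standard deviation stays put, can be reproduced with a small simulation. The sketch below assumes a normally distributed population with mean 204 and standard deviation 38; those values are guesses chosen to resemble the table, not the settings actually used for the slide, and it runs 2,000 simulations per sample size rather than 10,000 to keep it quick.

```python
import math
import random
import statistics

MU, SIGMA = 204.0, 38.0  # assumed population mean and SD (illustrative only)

def simulate_standard_error(n, n_simulations=2000, seed=1):
    """Empirical SD of the sample mean across repeated samples of size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(n)])
        for _ in range(n_simulations)
    ]
    return statistics.stdev(means)

results = {}
for n in (9, 36, 100):
    empirical = simulate_standard_error(n)
    theoretical = SIGMA / math.sqrt(n)  # textbook standard error of the mean
    results[n] = (empirical, theoretical)
    print(f"n={n:>4}  simulated SE={empirical:5.2f}  sigma/sqrt(n)={theoretical:5.2f}")
```

The simulated values should track sigma divided by the square root of n, so quadrupling the sample size roughly halves the standard error.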


Possible Decisions

Type I Error: find a difference where there shouldn’t be one

Type II Error: fail to find a difference where there should be one
Power = 1 - β

                    True State
Your Decision       H0 is “True”       HA is True
Reject H0           Type I Error (α)   Correct Decision
Fail to Reject H0   Correct Decision   Type II Error (β)


Determinants of Required Sample Size

Significance Level (α): probability of rejecting H0 when it’s true

Power (1-β): probability of rejecting H0 when it’s false

These values are selected during the design phase
  α = 5%
  1-β = 80% (sometimes 90%)



Measure of variability (usually standard deviation) inherent in study population
  As measurement becomes more variable…
  Standard error of test statistic increases…
  p-value increases…
  Ability to reject H0 decreases…
  Power decreases

Controlling variability:
  Better measurement methodology
  Homogeneous samples



Effect Size: smallest difference or change in outcome you hope to find
  As difference you want to observe decreases…
  Test statistic decreases…
  p-value increases…
  Ability to reject H0 decreases…
  Power decreases

Considerations:
  Clinical significance
  Clinical possibility
  Large differences are easier to detect but harder to find
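The way variability and effect size drive power can be illustrated with a normal approximation for a two-sided comparison of two group means. This is a sketch under stated assumptions (equal group sizes, a common standard deviation, and the negligible opposite tail ignored), and the function name and input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def approximate_power(effect_size, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means."""
    z = NormalDist()
    standard_error = sd * sqrt(2 / n_per_group)  # SE of the difference in means
    critical = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size / standard_error - critical)

# More variable measurements lower the power (effect size and n held fixed).
for sd in (20, 30, 40):
    print(f"sd={sd}: power={approximate_power(10, sd, 50):.2f}")

# Smaller effect sizes lower the power (variability and n held fixed).
for effect in (10, 7.5, 5):
    print(f"effect={effect}: power={approximate_power(effect, 20, 50):.2f}")
```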


Calculating Required Sample Size
  Equations exist (involving α, β, variability and effect size) for simple analytic methods (t-test, chi-square, etc.)
  Advanced methods require professional assistance

Where do you find variability and effect size?
  Previous literature of similar populations
  Retrospective Study / Chart Audits
  Pilot study
  Guesstimate!
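One such equation, for comparing the means of two equal-sized groups, is n per group = 2(z for α + z for power)² × SD² / (effect size)². The sketch below implements that normal-approximation formula; it is one common textbook version rather than necessarily the one intended on the slide, the function name is hypothetical, and calculations based on the t distribution give slightly larger answers.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect_size, sd, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = z.inv_cdf(power)           # value corresponding to the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / effect_size ** 2)

# Hypothetical inputs: detect a difference of half a standard deviation.
print(sample_size_per_group(effect_size=0.5, sd=1.0))              # 80% power
print(sample_size_per_group(effect_size=0.5, sd=1.0, power=0.90))  # 90% power
print(sample_size_per_group(effect_size=0.25, sd=1.0))             # smaller effect
```

Halving the effect size roughly quadruples the required sample size, which is why the choice of a clinically meaningful difference matters so much.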


What if required sample size is too large?

Consider a different outcome
  Continuous measures generally require smaller sample sizes than categorical measures

Consider fewer groups or add multiple sites
  Fewer Groups: more subjects per group
  Multiple Sites: larger subject pool (maybe more representative…)
  Will require more sophisticated analytic methods

Reconfigure study as a “pilot”
  Switch emphasis from “hypothesis testing” to “estimation”
  Goal: data summaries and confidence intervals
  Use to power larger study
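For an estimation-focused pilot, the main output is an estimate with a confidence interval rather than a p-value. The sketch below computes a simple Wald (normal-approximation) 95% interval for a proportion, using the 33-out-of-200 drug trial from earlier in the lecture as input. The Wald method is chosen here only for brevity; other interval methods behave better for small samples or extreme proportions.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    # Clip to the valid range for a proportion.
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)

low, high = wald_interval(33, 200)
print(f"Estimate: {33 / 200:.3f}, 95% CI: ({low:.3f}, {high:.3f})")
```

The width of an interval like this is also a useful input when powering the larger study.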

Group Discussion

Please break into groups by table

For the next 10-15 minutes, take turns discussing:
  Data Management:
    Where will you get your data?
    How will you capture it?
    How will you store it?
  Sample Size Determination:
    Are you able to power your study?
    Where will (did) you find information for your power analysis?


VCU Department of Biostatistics
  15 full-time faculty
  Can assist with: study design, sample size determination, interim and final analyses, dissemination
  Grant funding (or prospects of funding) usually required

BIOS 516 Biostatistical Consulting: graduate students available for FREE consultations
  Contact Russ Boyle ([email protected])
  Provide a protocol
  Offer co-authorship



VCU Center for Clinical and Translational Research

Research Incubator: study design, sample size determination, and other resources (e.g. grant writing)
  Contact: Pam Dillon ([email protected])

Biomedical Informatics: data management and storage (e.g. REDCAP)
  Support requested online: http://www.cctr.vcu.edu/informatics/index.html



Textbook (i.e., shameless plug):
  Statistical Research Methods: A Guide for Non-Statisticians
  Sabo and Boone, Springer, 2013
  Available on the web:
