Data Analysis Workshop
Chuck Spiekerman (cspieker@u)
Karl Kaiyala (kkaiyala@u)
Course Outline
• February 20
– How to describe your study
– Choosing an analysis method
• March 13
– Student presentations of study designs and data-analysis plans
• March 20
– Student presentations of data analyses
Describing your study
• Next session (3/13) we are asking you to present a description of your planned study
• The next few slides give an outline of suggested components of this description
• Attention to all these components should help you (and/or a consultant) decide on appropriate methods of statistical analysis
Study Design Description
• Specific Aims (what?)
• Background (why?)
• Previous work (who?) *
• Study methods (how?)
– several components
*optional for student presentations
Specific Aims
• Describe the scientific question(s)
• Be specific and precise
• Stick to the study at hand
Background and Motivation
• Relevance of this research
– Existing knowledge
– Identify gap this research will fill
• Relate to specific aims
• If part of a larger study, where does this study fit?
Study Methods Components
• Primary outcomes
• Study population
• Methods and procedures *
• Data analysis plan
*optional for student presentations
Primary Outcomes
• Precise definition of key measurement (individual data item) of interest
• Justify why this outcome and not something else.
– Relate to specific aim
• Details of collection can be left to methods and procedures section
Study population
• How were the subjects selected?
– Exclusion and inclusion criteria
– Group classification?
– Matching?
– Randomization?
Data analysis plan
• Outline data analysis for each specific aim
• Make clear which procedures are being used toward which aim
• Usually some simple tables and plots should be sufficient
• Keep it simple
Forming an analysis plan
Two important questions
1. What do you want to do/show?
2. What kind of data …
i. … will answer your question best?
ii. … can you get?
iii. … do you have?
Types of data
• Continuous
– Differences between values have meaning, and are interpretable independent of the values themselves
– E.g. difference between 8 and 9 basically the same as difference between 1 and 2.
• Ordinal
– Values have an order, but differences are not easily interpretable (e.g. good, fair, poor)
Types of data (cont.)
• Categorical
– Values are descriptive but do not have any obvious ordering. E.g. tx A, tx B, tx C.
• Binary, Dichotomous
– Fancy names for categorical variables with only two possible values.
Types of data (sampling)
• one-sample
– Refers to situation when values of interest all come from one group and will be compared to a known quantity (e.g. “change greater than zero”)
• two-sample
– When data are divided/sampled in two groups and observed values compared between groups.
What do you want to do?
• Show evidence of differences
• Estimate population parameters
• Demonstrate equivalence
• Show evidence of association
• Create/validate a predictive model
• Assess agreement or reliability
• Other?
Showing evidence of differences
• Standard hypothesis testing procedures, usually comparing means or proportions
• Which test will depend on type of data. Usual suspects (YMMV)
– T-test or ANOVA for Continuous data
– Chi-square test for Categorical data
– Rank-based tests (e.g. Wilcoxon) for Ordinal data
• Use Rosner flowchart for guidance
• Supplement p-value with estimate of difference (with confidence interval)
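As a rough sketch of the chi-square and "supplement the p-value" bullets for categorical data, the Python below runs a Pearson chi-square test on a 2x2 table and reports the estimated difference in proportions with a Wald 95% confidence interval. The counts, the function names, and the choice of the Wald interval are illustrative assumptions, not part of the workshop material. Only the standard library is used.

```python
import math
from statistics import NormalDist

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]: rows are groups, columns are success / failure."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > stat) equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

def difference_in_proportions(a, b, c, d, level=0.95):
    """Estimated difference in success proportions (row 1 minus row 2)
    with a Wald confidence interval (large-sample approximation)."""
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p1 - p2
    return diff, diff - z * se, diff + z * se

# Hypothetical counts: tx A 30 successes / 20 failures, tx B 18 / 32.
stat, p_value = chi_square_2x2(30, 20, 18, 32)
diff, lower, upper = difference_in_proportions(30, 20, 18, 32)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
print(f"difference = {diff:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

Reporting the interval alongside the p-value shows how large the difference might plausibly be, not only whether it is detectable.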
Estimate Population Parameters
• P-values and hypothesis tests aren’t always necessary
• Sometimes you don’t really want to compare things but only estimate values
• Estimate parameters of interest and supplement with confidence intervals (IMPORTANT!).
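A minimal sketch of estimation without a hypothesis test: the Python below estimates a single proportion and attaches a Wilson score confidence interval. The Wilson interval is one reasonable choice among several and is an assumption here, as are the counts and the function name.

```python
import math
from statistics import NormalDist

def wilson_interval(successes, n, level=0.95):
    """Point estimate and Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return p_hat, centre - half_width, centre + half_width

# Hypothetical data: 12 of 40 subjects show the condition.
p_hat, lower, upper = wilson_interval(12, 40)
print(f"estimate = {p_hat:.2f}, 95% CI ({lower:.3f}, {upper:.3f})")
```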
Demonstrate equivalence
• In some instances the goal is to show equivalence of, say, two treatments.
• Failing to show a difference using a standard hypothesis test is usually not sufficient evidence of equivalence
• Two strategies
– Estimate difference and show ‘worst cases’ with confidence interval
– Compute a standard hypothesis test with very good power (> 95%)
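The first strategy can be sketched as follows: compute a confidence interval for the difference and check whether even its 'worst case' ends stay inside a margin that would be considered clinically negligible. The margins (0.15 and 0.10), the counts, and the function names below are hypothetical, and the interval is a large-sample Wald approximation.

```python
import math
from statistics import NormalDist

def proportion_difference_ci(x1, n1, x2, n2, level=0.95):
    """Wald confidence interval for the difference p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

def within_margin(lower, upper, margin):
    """True when the whole interval lies strictly inside (-margin, +margin)."""
    return -margin < lower and upper < margin

# Hypothetical success counts: treatment 1 = 82/100, treatment 2 = 80/100.
lower, upper = proportion_difference_ci(82, 100, 80, 100)
equivalent_15 = within_margin(lower, upper, 0.15)
equivalent_10 = within_margin(lower, upper, 0.10)
print(f"95% CI for difference: ({lower:.3f}, {upper:.3f})")
print("within +/-0.15:", equivalent_15, " within +/-0.10:", equivalent_10)
```

With these made-up counts the same data support equivalence under the wider margin but not the narrower one, so the margin should be chosen and justified before looking at the data.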
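Demonstrate associations
Methods by type of independent variable and type of outcome variable
• Categorical independent variable
– Dichotomous outcome: Chi-square, Logistic regression
– Continuous outcome: T-test/ANOVA, Linear regression
• Continuous independent variable
– Dichotomous outcome: Logistic regression, T-test/ANOVA (backwards)
– Continuous outcome: Correlation, Linear regression, Scatterplots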
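For the continuous-by-continuous case, a minimal sketch of simple linear regression and the Pearson correlation, written out from the usual sums of squares. The data and the function name are made up for illustration.

```python
import math
from statistics import mean

def simple_linear_regression(x, y):
    """Least-squares slope and intercept for y on x, plus Pearson's r."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r = simple_linear_regression(x, y)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r = {r:.3f}")
```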
Prediction
• Dichotomous outcome
– Logistic regression*
– Sensitivities, specificities†
– ROC curves† (continuous predictor)
• Continuous outcome
– Linear regression*
– “Leave one out” statistics or cross validation†
* Predictive model building
† Assessing predictive model
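To illustrate the assessment tools for a dichotomous outcome, the sketch below computes sensitivity and specificity at one cutoff and the area under the ROC curve for a continuous predictor. The scores, labels, cutoff, and function names are hypothetical. The AUC is computed as the proportion of (positive, negative) pairs in which the positive case scores higher, with ties counted as one half.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as positive; labels are 1 (positive) or 0."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predictor values for 4 positive and 5 negative subjects.
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
auc = roc_auc(scores, labels)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}, AUC = {auc:.2f}")
```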
Reliability/Agreement
• Kappa statistic is commonly used for categorical data and two raters.
• Intra-class correlation coefficient for multiple raters
• If you have a ‘gold standard’ it makes the most sense to tabulate percent correct or average distance from correct.
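A minimal sketch of Cohen's kappa for two raters and categorical ratings: observed agreement is compared with the agreement expected by chance from each rater's marginal frequencies. The ratings and the function name are made up for illustration.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters rating the same items."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    counts1, counts2 = Counter(rater1), Counter(rater2)
    categories = set(rater1) | set(rater2)
    # Chance agreement: product of the two raters' marginal proportions.
    expected = sum(counts1[k] * counts2[k] for k in categories) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical yes/no ratings of 10 items by two raters.
rater1 = ["y", "y", "n", "n", "y", "n", "y", "n", "y", "y"]
rater2 = ["y", "n", "n", "n", "y", "n", "y", "y", "y", "y"]
kappa = cohens_kappa(rater1, rater2)
print(f"kappa = {kappa:.3f}")
```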
more Reliability/Agreement
• If trying to demonstrate agreement between two continuous measures, the correlation coefficient is tangential at best
• Better to tabulate statistics related to mean pairwise differences between judges
• See Bland JM, Altman DG. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, i, 307-310.
– Available at http://www-users.york.ac.uk/~mb55/meas//ba.htm
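In the spirit of the Bland and Altman approach, the sketch below summarizes pairwise differences between two continuous measurements by their mean (bias) and by limits of agreement, taken here as bias plus or minus 1.96 standard deviations of the differences. The measurements and the function name are hypothetical.

```python
from statistics import mean, stdev

def limits_of_agreement(method_a, method_b, multiplier=1.96):
    """Mean pairwise difference (bias) and limits of agreement."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - multiplier * sd, bias + multiplier * sd

# Hypothetical measurements of the same 5 subjects by two methods.
method_a = [10.2, 11.5, 9.8, 12.1, 10.9]
method_b = [10.0, 11.9, 9.5, 12.4, 10.6]
bias, lower, upper = limits_of_agreement(method_a, method_b)
print(f"bias = {bias:.2f}, limits of agreement ({lower:.2f}, {upper:.2f})")
```

Whether limits this wide are acceptable is a clinical judgment, not a statistical one.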
Other?
• Time-to-event data
– Kaplan-Meier survival estimate
– Cox regression
• Other other?
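For time-to-event data, a minimal sketch of the Kaplan-Meier estimate: at each observed event time the survival probability is multiplied by the fraction of at-risk subjects who did not have the event. The follow-up times, the event indicators, and the function name are made up. Subjects censored at an event time are treated as still at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.
    events[i] is 1 if the event was observed at times[i], 0 if censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve

# Hypothetical follow-up times (months) and event indicators.
times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.3f}")
```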
Correlated Data Issues
• Data consist of “clusters” of correlated observations. This is common in dental studies (many teeth from same mouth)
• Common Solutions?
– Collapse data to independent units (patient-level averages)
– Adjust for correlation using generalized estimating equations (GEE) or mixed model regression approaches
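The first solution can be sketched in a few lines: group the tooth-level measurements by patient and replace each cluster with its average, so that each patient contributes one independent value to the analysis. The patient IDs, the measurements, and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def patient_level_means(records):
    """Collapse (patient_id, value) records to one mean per patient."""
    by_patient = defaultdict(list)
    for patient_id, value in records:
        by_patient[patient_id].append(value)
    return {pid: mean(values) for pid, values in by_patient.items()}

# Hypothetical tooth-level measurements, several teeth per patient.
records = [
    ("p1", 2.0), ("p1", 3.0),
    ("p2", 4.0), ("p2", 4.5), ("p2", 5.0),
    ("p3", 1.0),
]
means = patient_level_means(records)
print(means)
```

The collapsed values can then go into the standard tests for independent observations. This is simple, but it discards within-patient information that GEE or mixed models would use.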
Homework for Feb. 29
• Following the guidelines presented in class today, present a concise description of your study and planned data analysis to the class.
• Plan to keep your talk under ____ minutes
• Limited office hours will be available with me and Dr. Kaiyala to help. Call or email us for appointments.