BASIC BIOSTATISTICS
Diane Flynn, LTC, MC
Colin Greene, LTC, MC
Objectives
• Overview of Biostatistical Terms and Concepts
• Application of Statistical Tests
Why Use Statistics?
Descriptive Statistics
• identify patterns
• lead to hypothesis generation
Inferential Statistics
• distinguish true differences from random variation
• allow hypothesis testing
Why Use Statistics?
Cardiovascular Mortality in Males
[Figure: SMR by decade, '35-'44 through '75-'84, for Bangor and Roseto; y-axis from 0 to 1.2]
AJPH 1992
Types of Data
Numerical
• Continuous
• Discrete
Categorical
• Ordinal
• Nominal
Descriptive Statistics
• Identifies patterns in the data
• Identifies outliers
• Guides choice of statistical test
Percentage of Specimens Testing Positive for RSV
Region     Jul Aug Sep Oct Nov Dec Jan Feb Mar Apr May Jun
South        2   2   5   7  20  30  15  20  15   8   4   3
Northeast    2   3   5   3  12  28  22  28  22  20  10   9
West         2   2   3   3   5   8  25  27  25  22  15  12
Midwest      2   2   3   2   4  12  12  12  10  19  15   8
Descriptive Statistics
Percentage of Specimens Testing Positive for RSV 1998-99
[Figure: percent positive (0 to 35) by month, Jul through Jul, with one line each for South, Northeast, West, and Midwest]
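As a small illustration of how descriptive statistics surface a pattern, the sketch below finds the peak month(s) for each region in the RSV table. The values are copied from the table; the names `rsv`, `months`, and `peak_months` are made up for this example.

```python
# Percent of specimens testing positive for RSV, by region and month
# (values copied from the table in these slides).
months = "Jul Aug Sep Oct Nov Dec Jan Feb Mar Apr May Jun".split()
rsv = {
    "South":     [2, 2, 5, 7, 20, 30, 15, 20, 15, 8, 4, 3],
    "Northeast": [2, 3, 5, 3, 12, 28, 22, 28, 22, 20, 10, 9],
    "West":      [2, 2, 3, 3, 5, 8, 25, 27, 25, 22, 15, 12],
    "Midwest":   [2, 2, 3, 2, 4, 12, 12, 12, 10, 19, 15, 8],
}

def peak_months(values):
    """Return the highest percentage and every month that reaches it."""
    top = max(values)
    return top, [m for m, v in zip(months, values) if v == top]

for region, values in rsv.items():
    top, when = peak_months(values)
    print(f"{region}: peak {top}% in {', '.join(when)}")
```

In this table the peak falls in December in the South and not until April in the Midwest, which is the kind of pattern that can prompt a hypothesis.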
Describing the Data with Numbers
Measures of Central Tendency
• MEAN -- average
• MEDIAN -- middle value
• MODE -- most frequently observed value(s)
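A minimal sketch of the three measures, using Python's standard `statistics` module (Python 3.8 or later for `multimode`). The data are the South row of the RSV table; that choice is only for illustration.

```python
from statistics import mean, median, multimode

# South row of the RSV table (percent positive, Jul through Jun).
south = [2, 2, 5, 7, 20, 30, 15, 20, 15, 8, 4, 3]

print("mean:  ", round(mean(south), 2))  # sum of values / number of values
print("median:", median(south))          # middle value of the sorted data
print("modes: ", multimode(south))       # every value tied for most frequent
```

Here the mean (about 10.9) sits above the median (7.5) because a few high winter months pull the average up, and three values tie for the mode, which is why the slide says "value(s)".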
Distribution of Course Grades
[Figure: bar chart of number of students (0 to 14) by grade: A, A-, B+, B, B-, C+, C, C-, D+, D, D-, F]
Describing the Data with Numbers
Measures of Dispersion
• RANGE
• STANDARD DEVIATION
• SKEWNESS
Measures of Dispersion
• RANGE
  • highest to lowest values
• STANDARD DEVIATION
  • how closely do values cluster around the mean value
• SKEWNESS
  • refers to symmetry of curve
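The three measures above can be sketched in a few lines. The two small data sets are invented for illustration, and `skewness` here is the moment-based coefficient (third central moment divided by the 1.5 power of the second), which is one common definition among several.

```python
from statistics import mean, stdev

def value_range(values):
    """Distance from the lowest to the highest value."""
    return max(values) - min(values)

def skewness(values):
    """Moment-based skewness: 0 if symmetric, negative if the tail is on the left."""
    n = len(values)
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data sets, not taken from the slides.
symmetric = [1, 2, 3, 4, 5]
left_tailed = [2, 8, 9, 9, 10]

for name, data in [("symmetric", symmetric), ("left-tailed", left_tailed)]:
    print(name, "range:", value_range(data),
          "SD:", round(stdev(data), 2),
          "skew:", round(skewness(data), 2))
```

The symmetric set has a skewness of zero, while the single low value in the second set produces a negative skew.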
Standard Deviation
[Figure: two distribution curves, labeled Curve A and Curve B]
Skewness
[Figure: Curve A and Curve B; negative skew, with the mode, median, and mean marked]
The Normal Distribution
• Mean = median = mode
• Skew is zero
• 68% of values fall within 1 SD of the mean
• 95% of values fall within 2 SDs of the mean
[Figure: normal curve with the mean, median, and mode at the center and 1 and 2 SDs marked]
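The 68% and 95% figures can be checked against the standard normal curve. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is made up for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

print(f"within 1 SD:  {share_within(1):.3f}")   # about 0.683
print(f"within 2 SDs: {share_within(2):.3f}")   # about 0.954
```

The slide's 68% and 95% are rounded values; exactly 95% corresponds to about 1.96 SDs.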
Inferential Statistics
Used to determine the likelihood that a conclusion based on data from a sample is true
Terms
p value: the probability that a difference at least as large as the one observed could have occurred by chance alone, that is, if there were truly no difference
Hypertension Trial
Drug   Baseline mean SBP   Follow-up mean SBP
A      150                 130
B      150                 125
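One way to see what "by chance" means is an exact permutation test: if the drug labels did not matter, how often would shuffling them produce a gap at least as large as the 5 mmHg seen here? The slide gives only group means, so the patient-level readings below are hypothetical, chosen only so that they average 130 and 125. The sketch does not reflect how the trial was actually analyzed.

```python
from itertools import combinations
from statistics import mean

# HYPOTHETICAL follow-up SBP readings (mmHg). Only the group means
# (130 for drug A, 125 for drug B) come from the slide.
drug_a = [128, 135, 126, 133, 131, 127]
drug_b = [122, 129, 124, 127, 121, 127]

def exact_permutation_p(group_1, group_2):
    """Two-sided exact permutation p value for a difference in means."""
    pooled = group_1 + group_2
    n1, n2 = len(group_1), len(group_2)
    total = sum(pooled)
    observed = abs(mean(group_1) - mean(group_2))
    extreme = 0
    count = 0
    # Enumerate every way of relabeling n1 of the pooled readings as "group 1".
    for idx in combinations(range(len(pooled)), n1):
        s1 = sum(pooled[i] for i in idx)
        gap = abs(s1 / n1 - (total - s1) / n2)
        if gap >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        count += 1
    return extreme / count

print("observed difference:", mean(drug_a) - mean(drug_b))
print("p value:", round(exact_permutation_p(drug_a, drug_b), 3))
```

With different hypothetical readings the same 5 mmHg gap could give a very different p value, because the result depends on sample size and spread as well as on the difference in means.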
Terms
confidence interval:
The range of values we can be reasonably certain includes the true value.
30 Day % Mortality
Study      IC STK   Control   p      N
Khaja       5.0      10.0     0.55    40
Anderson    4.2      15.4     0.19    50
Kennedy     3.7      11.2     0.02   250
95% Confidence Intervals
[Figure: 95% confidence intervals plotted on an axis from -.40 to .20 for Khaja (n=40), Anderson (n=50), and Kennedy (n=250)]
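The intervals for the three trials can be approximated from the 30-day mortality table. This is a rough sketch with two assumptions that are not in the slides: each trial's N is split evenly between the IC STK and control arms, and the interval is the simple normal-approximation (Wald) interval for a difference in proportions. The original analyses may have used other methods.

```python
from math import sqrt
from statistics import NormalDist

def diff_ci(p_treat, p_ctrl, n_treat, n_ctrl, level=0.95):
    """Wald confidence interval for p_treat - p_ctrl (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    diff = p_treat - p_ctrl
    se = sqrt(p_treat * (1 - p_treat) / n_treat
              + p_ctrl * (1 - p_ctrl) / n_ctrl)
    return diff - z * se, diff + z * se

# (IC STK mortality, control mortality, total N) from the table.
studies = {
    "Khaja":    (0.050, 0.100, 40),
    "Anderson": (0.042, 0.154, 50),
    "Kennedy":  (0.037, 0.112, 250),
}

for name, (p_t, p_c, n) in studies.items():
    # ASSUMPTION: equal arm sizes; the slide reports only the total N.
    low, high = diff_ci(p_t, p_c, n // 2, n // 2)
    print(f"{name:9s} difference {p_t - p_c:+.3f}, 95% CI ({low:+.3f}, {high:+.3f})")
```

Under these assumptions only the Kennedy interval excludes zero, which agrees with it being the only trial with p below 0.05. The two smaller trials show similar or larger point differences but much wider intervals.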
Types of Errors
                              Truth
Conclusion       No difference       Difference
No difference    correct             TYPE II ERROR (β)
Difference       TYPE I ERROR (α)    correct
Power = 1 - β
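Power is one minus the Type II error rate, so it rises as the sample grows. The sketch below uses a common normal-approximation formula for comparing two proportions. The function name, the equal-arm design, and the treatment of 5% versus 10% mortality as the true rates are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Normal approximation with equal group sizes; the negligible chance of
    rejecting in the wrong direction is ignored.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)       # if no true difference
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)  # if rates truly differ
    return std_normal.cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)

# HYPOTHETICAL scenario: true mortality of 5% vs 10%.
for n in (20, 125, 500):
    power = power_two_proportions(0.05, 0.10, n)
    print(f"n per group = {n:3d}: power = {power:.2f}, beta = {1 - power:.2f}")
```

A small trial can miss a real difference simply because its power is low, which is a Type II error.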
What Test Do I Use?
1. What type of data?
2. How many samples?
3. Are the data normally distributed?
4. What is the sample size?