![Page 1: Statistics: Data Presentation & Analysis Fr Clinic I](https://reader035.vdocuments.site/reader035/viewer/2022081513/56649d595503460f94a39103/html5/thumbnails/1.jpg)
Statistics: Data Presentation & Analysis
Fr Clinic I
![Page 2: Statistics: Data Presentation & Analysis Fr Clinic I](https://reader035.vdocuments.site/reader035/viewer/2022081513/56649d595503460f94a39103/html5/thumbnails/2.jpg)
Overview
• Tables & Graphs
• Populations & Samples
• Mean, Median, & Variance
• Error Bars
  – Standard Deviation, Standard Error & 95% Confidence Interval (CI)
• Comparing Means of Two Populations
• Linear Regression (LR)
![Page 3: Statistics: Data Presentation & Analysis Fr Clinic I](https://reader035.vdocuments.site/reader035/viewer/2022081513/56649d595503460f94a39103/html5/thumbnails/3.jpg)
Warning
• Statistics is a huge field; I've simplified considerably here. For example:
  – Mean, Median, and Standard Deviation
    • There are alternative formulas
  – 95% Confidence Interval
    • There are other ways to calculate CIs (e.g., z statistic instead of t; difference between two means rather than a single mean…)
  – Error Bars
    • Don't go beyond the interpretations I give here!
  – Comparing Means of Two Data Sets
    • We cover only the t test for two means when the variances are unknown but equal; there are other tests
  – Linear Regression
    • We look only at simple LR and calculate only the intercept, slope, and R². There is much more to LR!
![Page 4: Statistics: Data Presentation & Analysis Fr Clinic I](https://reader035.vdocuments.site/reader035/viewer/2022081513/56649d595503460f94a39103/html5/thumbnails/4.jpg)
Tables

| Water (1) | Turbidity (NTU) (2) | True Color (Pt-Co) (3) | Apparent Color (Pt-Co) (4) |
|---|---|---|---|
| Pond Water | 10 | 13 | 30 |
| Sweetwater | 4 | 5 | 12 |
| Hiker | 3 | 8 | 11 |

Table 1: Average Turbidity and Color of Water Treated by Portable Water Filters

• Consistent Format, Title, Units, Big Fonts
• Differentiate Headings, Number Columns
![Page 5: Statistics: Data Presentation & Analysis Fr Clinic I](https://reader035.vdocuments.site/reader035/viewer/2022081513/56649d595503460f94a39103/html5/thumbnails/5.jpg)
Figures

[Figure 1: Turbidity of Pond Water, Treated and Untreated. Bar chart; y-axis: Turbidity (NTU), 0 to 25; x-axis: Filter (Pond Water, Sweetwater, Miniworks, Hiker, Pioneer, Voyager).]

• Consistent Format, Title, Units
• Good Axis Titles, Big Fonts
![Page 6: Statistics: Data Presentation & Analysis Fr Clinic I](https://reader035.vdocuments.site/reader035/viewer/2022081513/56649d595503460f94a39103/html5/thumbnails/6.jpg)
Populations and Samples
• Population
  – All possible outcomes of an experiment or observation
    • US population
    • Particular type of steel beam
• Sample
  – Finite number of outcomes measured or observations made
    • 1000 US citizens
    • 5 beams
• Use samples to estimate population properties
  – Mean, Variance
    • E.g., heights of 1000 US citizens used to estimate the mean height of the US population
![Page 7: Statistics: Data Presentation & Analysis Fr Clinic I](https://reader035.vdocuments.site/reader035/viewer/2022081513/56649d595503460f94a39103/html5/thumbnails/7.jpg)
Central Tendency
• Mean and Median

Data (NTU): 1, 3, 3, 6, 8, 10

Mean = xbar = sum of values divided by sample size
= (1+3+3+6+8+10)/6 = 5.2 NTU

Median = m = middle number
Rank: 1, 2, 3, 4, 5, 6
Number: 1, 3, 3, 6, 8, 10
For an even number of sample points, average the middle two
= (3+6)/2 = 4.5 NTU

Excel: Mean – AVERAGE; Median – MEDIAN
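
As a cross-check outside Excel, the mean and median above can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative and not from the slides.

```python
import statistics

# Turbidity readings from the slide example, in NTU
turbidity_ntu = [1, 3, 3, 6, 8, 10]

# Mean: sum of values divided by sample size
mean_ntu = statistics.mean(turbidity_ntu)

# Median: middle value; for an even sample size the middle two are averaged
median_ntu = statistics.median(turbidity_ntu)

print(f"mean = {mean_ntu:.1f} NTU")
print(f"median = {median_ntu:.1f} NTU")
```

`statistics.mean` and `statistics.median` play the same role here as Excel's AVERAGE and MEDIAN.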
![Page 8: Statistics: Data Presentation & Analysis Fr Clinic I](https://reader035.vdocuments.site/reader035/viewer/2022081513/56649d595503460f94a39103/html5/thumbnails/8.jpg)
Variability
• Variance, s²
  – Sum of the squared deviations about the mean, divided by the degrees of freedom
  – $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n-1)$
  – where $x_i$ = a data point, $\bar{x}$ = xbar (the sample mean), and n = number of data points
• Example (cont.)
  – $s^2 = [(1-5.2)^2 + (3-5.2)^2 + (3-5.2)^2 + (6-5.2)^2 + (8-5.2)^2 + (10-5.2)^2]/(6-1) = 11.8\ \text{NTU}^2$

Excel: Variance – VAR
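
The variance example can be checked step by step in Python. The sketch below computes the sum of squared deviations by hand and compares it with `statistics.variance`, which also divides by n − 1. Variable names are illustrative.

```python
import statistics

# Turbidity readings from the slide example, in NTU
turbidity_ntu = [1, 3, 3, 6, 8, 10]

n = len(turbidity_ntu)
xbar = sum(turbidity_ntu) / n

# Sum of squared deviations about the mean, divided by degrees of freedom (n - 1)
s2_manual = sum((x - xbar) ** 2 for x in turbidity_ntu) / (n - 1)

# Library sample variance, which uses the same n - 1 denominator
s2_library = statistics.variance(turbidity_ntu)

print(f"s^2 (manual)  = {s2_manual:.1f} NTU^2")
print(f"s^2 (library) = {s2_library:.1f} NTU^2")
```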
![Page 9: Statistics: Data Presentation & Analysis Fr Clinic I](https://reader035.vdocuments.site/reader035/viewer/2022081513/56649d595503460f94a39103/html5/thumbnails/9.jpg)
Error Bars
• Show data variability on a plot of mean values
• Types of error bars include:
  – Max/min, ± Standard Deviation, ± Standard Error, ± 95% CI

[Figure: bar chart of Turbidity (NTU), 0 to 10, by Filter Type (Filter 1, Filter 2, Filter 3).]
![Page 10: Statistics: Data Presentation & Analysis Fr Clinic I](https://reader035.vdocuments.site/reader035/viewer/2022081513/56649d595503460f94a39103/html5/thumbnails/10.jpg)
Standard Deviation, s
• Square root of the variance
• If a phenomenon follows a Normal Distribution (bell curve), 95% of the population lies within 1.96 standard deviations of the mean
• Error bar is s above & below the mean

[Figure: Normal Distribution curve; x-axis: Standard Deviations from Mean, −4 to 4; the central 95% of the area lies between −1.96 and 1.96.]

Excel: standard deviation – STDEV
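
The 1.96 figure can be verified numerically. The sketch below uses `statistics.NormalDist` (available in Python 3.8 and later) to compute the fraction of a Normal population within ±1.96 standard deviations, and the reverse lookup. Variable names are illustrative.

```python
from statistics import NormalDist

# Standard Normal: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Fraction of the population within +/- 1.96 standard deviations of the mean
fraction_within = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

# Number of standard deviations that bounds the central 95% (2.5% in each tail)
z_95 = standard_normal.inv_cdf(0.975)

print(f"fraction within 1.96 standard deviations: {fraction_within:.3f}")
print(f"z for the central 95%: {z_95:.2f}")
```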
![Page 11: Statistics: Data Presentation & Analysis Fr Clinic I](https://reader035.vdocuments.site/reader035/viewer/2022081513/56649d595503460f94a39103/html5/thumbnails/11.jpg)
Standard Error of Mean
• Also called St-Err or $s_{\bar{x}}$
• For a sample of size n taken from a population with standard deviation estimated as s:
  $s_{\bar{x}} = s / \sqrt{n}$
• As n ↑, $s_{\bar{x}}$ ↓, i.e., the estimate of the population mean improves
• Error bar is St-Err above & below the mean
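
A short sketch of the standard error calculation, reusing the six turbidity readings from the Central Tendency example. The second calculation is hypothetical: it assumes the same s with four times as many points, to show how the standard error shrinks as n grows.

```python
import math
import statistics

# Turbidity readings from the earlier slide example, in NTU
turbidity_ntu = [1, 3, 3, 6, 8, 10]

s = statistics.stdev(turbidity_ntu)  # sample standard deviation (n - 1 denominator)
n = len(turbidity_ntu)

# Standard error of the mean: s divided by the square root of n
standard_error = s / math.sqrt(n)

# Hypothetical: same s but 4x the sample size, so the standard error halves
standard_error_4n = s / math.sqrt(4 * n)

print(f"St-Err with n = {n}: {standard_error:.2f} NTU")
print(f"St-Err with n = {4 * n}: {standard_error_4n:.2f} NTU")
```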
![Page 12: Statistics: Data Presentation & Analysis Fr Clinic I](https://reader035.vdocuments.site/reader035/viewer/2022081513/56649d595503460f94a39103/html5/thumbnails/12.jpg)
95% Confidence Interval (CI) for Mean
• A 95% Confidence Interval is expected to contain the population mean 95% of the time (i.e., of the 95% CIs computed from 100 samples, about 95 are expected to contain the population mean)
• t95%,n-1 is a statistic for the 95% CI from a sample of size n
  – t95%,n-1 = TINV(0.05, n-1)
  – If n ≥ 30, t95%,n-1 ≈ 1.96 (Normal Distribution)
• 95% CI: $\bar{x} \pm t_{95\%,n-1}\, s_{\bar{x}}$
• Error bar is $t_{95\%,n-1}\, s_{\bar{x}}$ above & below the mean
![Page 13: Statistics: Data Presentation & Analysis Fr Clinic I](https://reader035.vdocuments.site/reader035/viewer/2022081513/56649d595503460f94a39103/html5/thumbnails/13.jpg)
Using Error Bars to compare data
• Standard Deviation
  – Demonstrates data variability, but no comparison possible
• Standard Error
  – If bars overlap, any difference in means is not statistically significant
  – If bars do not overlap, indicates nothing!
• 95% Confidence Interval
  – If bars overlap, indicates nothing!
  – If bars do not overlap, difference is statistically significant
• We'll use 95% CI in this class
  – Any time you have 3 or more data points, determine the mean, standard deviation, standard error, and t95%,n-1, then plot the mean with error bars showing the 95% confidence interval
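
The 95% CI overlap rule can be written as a small check. The sketch below is one way to do it; the function names are illustrative, and the means and half-widths are the rounded values tabulated in this deck's turbidity example (Example 1).

```python
def ci_bounds(mean, half_width):
    """Return (low, high) for an error bar of mean +/- half_width."""
    return mean - half_width, mean + half_width


def bars_overlap(a, b):
    """True if two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


def interpret_95ci(a, b):
    """Apply the 95% CI rule: overlap says nothing, no overlap is significant."""
    if bars_overlap(a, b):
        return "indicates nothing"
    return "difference is statistically significant"


# Mean and t*St-Err (NTU) from the turbidity example table
filter_1 = ci_bounds(2.1, 0.14)
filter_2 = ci_bounds(4.2, 2.28)
filter_3 = ci_bounds(4.3, 0.38)

print("Filter 1 vs Filter 2:", interpret_95ci(filter_1, filter_2))
print("Filter 1 vs Filter 3:", interpret_95ci(filter_1, filter_3))
```

Filter 2's wide interval overlaps Filter 1's, so that comparison is inconclusive, while Filter 1 and Filter 3 do not overlap.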
![Page 14: Statistics: Data Presentation & Analysis Fr Clinic I](https://reader035.vdocuments.site/reader035/viewer/2022081513/56649d595503460f94a39103/html5/thumbnails/14.jpg)
Adding Error Bars to an Excel Graph
• Create Graph
  – Column, scatter,…
• Select Data Series
• In Layout Tab-Analysis Group, select Error Bars
• Select More Error Bar Options
• Select Custom and Specify Values, then select the cells containing the values of $t_{95\%,n-1}\, s_{\bar{x}}$
![Page 15: Statistics: Data Presentation & Analysis Fr Clinic I](https://reader035.vdocuments.site/reader035/viewer/2022081513/56649d595503460f94a39103/html5/thumbnails/15.jpg)
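Example 1: 95% CI

| Filter | 1 (NTU) | 2 (NTU) | 3 (NTU) | Mean (NTU) | St Dev (NTU) | n | St-Err (NTU) | t95%,2 | t95%,2 × St-Err (NTU) |
|---|---|---|---|---|---|---|---|---|---|
| Filter 1 | 2.1 | 2.1 | 2.2 | 2.1 | 0.06 | 3 | 0.03 | 4.30 | 0.14 |
| Filter 2 | 3.2 | 4.4 | 5 | 4.2 | 0.92 | 3 | 0.53 | 4.30 | 2.28 |
| Filter 3 | 4.3 | 4.2 | 4.5 | 4.3 | 0.15 | 3 | 0.09 | 4.30 | 0.38 |

[Figure: Turbidity Data +/- 95% CI. Bar chart of mean Turbidity (NTU), 0.0 to 7.0, by Portable Water Filter: Filter 1 = 2.1, Filter 2 = 4.2, Filter 3 = 4.3.]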
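
The Example 1 columns can be recomputed from the three readings per filter. This is a minimal sketch using only the standard library; since Python's standard library has no t distribution, the value t95%,2 = 4.303 (what TINV(0.05, 2) gives, shown as 4.30 on the slide) is entered as a constant. Names such as `readings_ntu` are illustrative.

```python
import math
import statistics

# t statistic for a 95% CI with n - 1 = 2 degrees of freedom (slide shows 4.30)
T_95_DF2 = 4.303

# Three turbidity readings per filter, in NTU, from Example 1
readings_ntu = {
    "Filter 1": [2.1, 2.1, 2.2],
    "Filter 2": [3.2, 4.4, 5.0],
    "Filter 3": [4.3, 4.2, 4.5],
}

results = {}
for name, values in readings_ntu.items():
    n = len(values)
    mean = statistics.mean(values)
    st_dev = statistics.stdev(values)   # sample standard deviation
    st_err = st_dev / math.sqrt(n)      # standard error of the mean
    half_width = T_95_DF2 * st_err      # error bar length above and below the mean
    results[name] = (mean, st_dev, st_err, half_width)
    print(f"{name}: mean={mean:.1f}  st_dev={st_dev:.2f}  "
          f"st_err={st_err:.2f}  95% CI = +/-{half_width:.2f} NTU")
```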
![Page 16: Statistics: Data Presentation & Analysis Fr Clinic I](https://reader035.vdocuments.site/reader035/viewer/2022081513/56649d595503460f94a39103/html5/thumbnails/16.jpg)
What can we do?
• Lift the weight multiple times using different solar panel combinations (or hydroturbines, or gear boxes) and plot the mean with 95% confidence interval error bars.
  – If error bars overlap between two different test conditions, indicates nothing!
  – If error bars do not overlap, difference is statistically significant
![Page 17: Statistics: Data Presentation & Analysis Fr Clinic I](https://reader035.vdocuments.site/reader035/viewer/2022081513/56649d595503460f94a39103/html5/thumbnails/17.jpg)
T Test
• A more sophisticated way to compare means
• Use the t test to determine if the means of two populations are different
  – E.g., lift times with different solar panel combinations or turbines or…
![Page 18: Statistics: Data Presentation & Analysis Fr Clinic I](https://reader035.vdocuments.site/reader035/viewer/2022081513/56649d595503460f94a39103/html5/thumbnails/18.jpg)
Comparing Two Data Sets using the t test
• Example: You lift the weight with two panels in series and with two panels in parallel.
  – Series: Mean = 2 min, s = 0.5 min, n = 20
  – Parallel: Mean = 3 min, s = 0.6 min, n = 20
• You ask the question: Do the different panel combinations result in different lift times?
  – Different in a statistically significant way
![Page 19: Statistics: Data Presentation & Analysis Fr Clinic I](https://reader035.vdocuments.site/reader035/viewer/2022081513/56649d595503460f94a39103/html5/thumbnails/19.jpg)
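Are the Lift Times Different?
• Use TTEST (Excel)
• TTEST returns the fractional probability of being wrong if you claim the two populations are different
  – We'll say they are significantly different if the probability is ≤ 0.05

| Series (min) | Parallel (min) |
|---|---|
| 1.5 | 3 |
| 2 | 2.4 |
| 2.2 | 2.2 |
| 1.8 | 2.6 |
| 3 | 3.4 |
| 1.6 | 3.6 |
| 1.2 | 3.8 |
| 2.1 | 3.5 |
| 1.9 | 2.7 |
| 2.2 | 2.4 |
| 2.6 | 3.5 |
| 1.7 | 3.8 |
| 1.8 | 2.1 |
| 1.5 | 2.5 |
| 2.4 | 3.4 |
| 2.5 | 3.3 |
| 2.7 | 2.4 |
| 1.4 | 3.6 |
| 1.5 | 2.3 |
| 2.6 | 3.7 |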
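
A rough equivalent of the TTEST decision can be sketched in Python. Excel's TTEST returns a probability; Python's standard library has no t distribution, so this sketch instead computes the equal-variance (pooled) two-sample t statistic and compares it with the two-tailed 5% critical value for 38 degrees of freedom, about 2.024 from standard t tables (a value not given on the slides). The two lists assume the flattened slide data reads as Series/Parallel pairs; function and variable names are illustrative.

```python
import math
import statistics

# Lift times in minutes, read from the slide as (Series, Parallel) pairs
series = [1.5, 2.0, 2.2, 1.8, 3.0, 1.6, 1.2, 2.1, 1.9, 2.2,
          2.6, 1.7, 1.8, 1.5, 2.4, 2.5, 2.7, 1.4, 1.5, 2.6]
parallel = [3.0, 2.4, 2.2, 2.6, 3.4, 3.6, 3.8, 3.5, 2.7, 2.4,
            3.5, 3.8, 2.1, 2.5, 3.4, 3.3, 2.4, 3.6, 2.3, 3.7]


def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming unknown but equal variances."""
    n_a, n_b = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    dof = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / dof
    std_err_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(a) - statistics.mean(b)) / std_err_diff
    return t, dof


# Two-tailed 5% critical value for 38 degrees of freedom (from standard t tables)
T_CRIT_DF38 = 2.024

t_stat, dof = pooled_t_statistic(series, parallel)
significant = abs(t_stat) > T_CRIT_DF38

print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
print("significantly different at the 0.05 level:", significant)
```

Checking |t| against the critical value gives the same yes/no answer as checking whether the TTEST probability is ≤ 0.05 for a two-tailed, equal-variance test.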
![Page 20: Statistics: Data Presentation & Analysis Fr Clinic I](https://reader035.vdocuments.site/reader035/viewer/2022081513/56649d595503460f94a39103/html5/thumbnails/20.jpg)
Marbles
![Page 21: Statistics: Data Presentation & Analysis Fr Clinic I](https://reader035.vdocuments.site/reader035/viewer/2022081513/56649d595503460f94a39103/html5/thumbnails/21.jpg)
Linear Regression
• Fit the best straight line to a data set

[Figure: scatter plot with fitted trendline; y-axis: Grade Point Average, 0 to 25; x-axis: Height (m), 0 to 12. Trendline: y = 1.897x + 0.8667, R² = 0.9762.]

Right-click on a data point and select “trendline”. Select the options to show the equation and R².
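
For readers who want to see what the trendline does, here is a minimal least-squares sketch for the slope and intercept of a simple linear regression. The slide's underlying data points are not given, so the four (x, y) pairs below are hypothetical, and `fit_line` is an illustrative name.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical data, not taken from the slide's plot
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 8]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.3f}x + {intercept:.3f}")
```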
![Page 22: Statistics: Data Presentation & Analysis Fr Clinic I](https://reader035.vdocuments.site/reader035/viewer/2022081513/56649d595503460f94a39103/html5/thumbnails/22.jpg)
R² - Coefficient of Multiple Determination
• $R^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 / \sum_{i=1}^{n} (y_i - \bar{y})^2$
  – $\hat{y}_i$ = predicted y values, from the regression equation
  – $y_i$ = observed y values
  – $\bar{y}$ = mean of y
• R² = fraction of variance explained by the regression
  – R² = 1 if the data lie along a straight line
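
The R² definition can be sketched directly from the sums in the formula. The code below fits a least-squares line, then divides the explained sum of squares by the total sum of squares. The data sets are hypothetical (the slide's points are not given), and `r_squared` is an illustrative name.

```python
def r_squared(xs, ys):
    """R^2 of a simple least-squares line: explained variation / total variation."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Least-squares slope and intercept
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    # Predicted y values from the regression equation
    predicted = [intercept + slope * x for x in xs]
    explained = sum((y_hat - y_mean) ** 2 for y_hat in predicted)
    total = sum((y - y_mean) ** 2 for y in ys)
    return explained / total


# Hypothetical data: one scattered set, one lying exactly on y = 2x + 1
r2_scattered = r_squared([0, 1, 2, 3], [1, 3, 5, 8])
r2_on_line = r_squared([0, 1, 2, 3], [1, 3, 5, 7])

print(f"R^2 (scattered) = {r2_scattered:.4f}")
print(f"R^2 (on a line) = {r2_on_line:.4f}")
```

The second data set lies exactly on a straight line, which illustrates the R² = 1 case.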