The Green Lab - [07-A] Data Analysis
It starts with an idea
Data Analysis: Descriptive Statistics and EDA
Giuseppe Procaccianti
Vrije Universiteit Amsterdam
Giuseppe Procaccianti / S2 group / The Green Lab
Quick Recap
Idea
Experiment scoping
Experiment planning
Experiment operation
Analysis & interpretation
Presentation & package
Analysis and Interpretation
● Understanding the data
○ descriptive statistics
○ exploratory data analysis (EDA, e.g. boxplots, scatter plots)
● (Optional) data reduction
● Hypothesis testing
● Results interpretation
Descriptive Statistics
● Goal: get a ‘feeling’ for how the data is distributed
● Properties:
○ Central Tendency (e.g. Mean, Median)
○ Dispersion (e.g. Frequency, Standard Deviation)
○ Dependency (e.g. Correlation)
Parameter vs. statistic
● Parameter: feature of the population
○ μ: mean
○ σ: standard deviation
● Statistic: feature of the sample
○ x̄: mean
○ s: standard deviation
● Statistics are estimates of parameters
Central Tendency
● Arithmetic mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
● Geometric mean: $\bar{x}_g = \left( \prod_{i=1}^{n} x_i \right)^{1/n}$
Central Tendency: example
● Average of scores:
6 - 7 - 8 - 9 - 10
● Arithmetic mean: 8
● Geometric mean: ~7.87
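The two averages for the scores above can be checked with a few lines of Python. This is a minimal sketch that assumes Python 3.8 or later (for `math.prod` and `statistics.geometric_mean`); the variable names are illustrative.

```python
import math
import statistics

scores = [6, 7, 8, 9, 10]

# Arithmetic mean: sum of the values divided by their count.
arithmetic = sum(scores) / len(scores)

# Geometric mean: n-th root of the product of the values.
geometric = math.prod(scores) ** (1 / len(scores))

# The standard library offers the same two measures.
assert math.isclose(arithmetic, statistics.mean(scores))
assert math.isclose(geometric, statistics.geometric_mean(scores))

print(arithmetic)           # 8.0
print(round(geometric, 2))  # 7.87
```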
Central Tendency: example
● Average of returns of investments:
90% ; 10% ; 20% ; 30% ; -90%
● Arithmetic mean:
(90+10+20+30-90)/5= 12%
● Geometric mean:
[(1.9 x 1.1 x 1.2 x 1.3 x 0.1) ^ (1/5)] - 1 = -0.2008 = -20.08%
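A short sketch of why the geometric mean is the appropriate average for returns: compounding it over all periods reproduces the actual final value, whereas the arithmetic mean suggests a gain that never happened. The assumption here (not stated on the slide) is that the five returns apply to consecutive periods on the same investment.

```python
import math

returns = [0.90, 0.10, 0.20, 0.30, -0.90]  # per-period returns from the slide

# Turn each return into a growth factor (e.g. +90% -> 1.9, -90% -> 0.1).
factors = [1 + r for r in returns]

arithmetic = sum(returns) / len(returns)
geometric = math.prod(factors) ** (1 / len(factors)) - 1

# Value of 1 unit invested after all five periods.
final_value = math.prod(factors)

# Compounding the geometric mean over five periods gives the same final value.
assert math.isclose((1 + geometric) ** len(returns), final_value)

print(f"{arithmetic:.2%}")     # 12.00%
print(f"{geometric:.2%}")      # -20.08%
print(round(final_value, 5))   # 0.32604
```

The investment ends at roughly a third of its starting value, which matches an average loss of about 20% per period rather than a 12% gain.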
Central Tendency
● Median (or 50th percentile): the middle value separating the greater and lesser halves of a data set
X = [13, 18, 13, 14, 13, 16, 14, 21, 13]
Xsort = [13, 13, 13, 13, 14, 14, 16, 18, 21]
Median = 14
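The sort-then-pick-the-middle procedure can be sketched as follows. The handling of an even number of values (averaging the two middle values) is the usual convention and an assumption here, since the slide only shows an odd-sized sample.

```python
import statistics

X = [13, 18, 13, 14, 13, 16, 14, 21, 13]

def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(sorted(X))   # [13, 13, 13, 13, 14, 14, 16, 18, 21]
print(median(X))   # 14

# Cross-check against the standard library.
assert median(X) == statistics.median(X)
```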
Central Tendency
● Mode: most frequent value in data set
X = [13, 18, 13, 14, 13, 16, 14, 21, 13]
Mode(X) = 13
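One possible way to find the mode is to count how often each value occurs. The sketch below uses `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

X = [13, 18, 13, 14, 13, 16, 14, 21, 13]

counts = Counter(X)                          # value -> number of occurrences
mode, frequency = counts.most_common(1)[0]   # most frequent value and its count

print(mode, frequency)   # 13 4
```

If several values share the highest count, `most_common(1)` returns only one of them, so multimodal data needs an explicit check of the counts.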
Central Tendency - Skewness
Dispersion
● Sample variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
● Standard deviation: $s = \sqrt{s^2}$
● The standard deviation has the same dimension (unit) as the data
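A minimal sketch of the sample variance and standard deviation, written out by hand and cross-checked against Python's `statistics` module. The three-value dataset is only an illustration.

```python
import math
import statistics

data = [90, 100, 110]
n = len(data)
mean = sum(data) / n

# Sample variance: squared deviations from the mean, divided by n - 1.
sample_variance = sum((x - mean) ** 2 for x in data) / (n - 1)

# Standard deviation: square root of the variance, in the unit of the data.
sample_std = math.sqrt(sample_variance)

# statistics.variance and statistics.stdev also use the n - 1 denominator.
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(sample_std, statistics.stdev(data))

print(sample_variance, sample_std)   # 100.0 10.0
```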
Dispersion - three-sigma-rule
"Empirical Rule" by Dan Kernler - Own work. Licensed under CC BY-SA 4.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Empirical_Rule.PNG#/media/File:Empirical_Rule.PNG
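The percentages behind the three-sigma rule can be recomputed from the normal cumulative distribution function. This sketch assumes normally distributed data and Python 3.8 or later (for `statistics.NormalDist`).

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Probability mass between mu - k*sigma and mu + k*sigma for k = 1, 2, 3.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, probability in coverage.items():
    print(f"within {k} sigma: {probability:.2%}")
# within 1 sigma: 68.27%
# within 2 sigma: 95.45%
# within 3 sigma: 99.73%
```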
Dispersion - range and coefficient of variation
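● Range: $x_{\max} - x_{\min}$
● Coefficient of variation (in percentage of the mean): $CV = \frac{s}{\bar{x}} \cdot 100\%$
● The coefficient of variation is only meaningful when all values are positive and measured on a ratio scale; it is not meaningful on an interval scale (e.g. temperatures in °C)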
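The scale caveat can be illustrated with temperatures: the same three readings give very different coefficients of variation in Celsius (interval scale, arbitrary zero) and in Kelvin (ratio scale). The readings below are hypothetical values chosen for the illustration.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a fraction of the mean."""
    return statistics.stdev(values) / statistics.mean(values)

celsius = [10.0, 20.0, 30.0]               # hypothetical readings
kelvin = [c + 273.15 for c in celsius]     # same temperatures, shifted zero

print(f"{coefficient_of_variation(celsius):.1%}")   # 50.0%
print(f"{coefficient_of_variation(kelvin):.1%}")    # 3.4%
```

The spread of the data is identical in both cases; only the position of the zero point changes, which is why the ratio is not interpretable on an interval scale.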
Dispersion - example
● Dataset: [100, 100, 100]
● Mean: 100
● Variance: 0
● Standard Deviation: 0
● Coeff. Variation: 0
● Range: 0
Dispersion - example
● Dataset: [90, 100, 110]
● Mean: 100
● Sample Variance: 100
● Standard Deviation: 10
● Coeff. Variation: 10%
● Range: 20
Dispersion - example
● Dataset: [1, 5, 6, 8, 10, 40, 65, 88]
● Mean: 27.875
● Sample Variance: 1082.70
● Standard Deviation: 32.9
● Coeff. Variation: 118%
● Range: 87
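The three dispersion examples can be reproduced with one small helper. This is a sketch using Python's `statistics` module; the function name `dispersion` and the dictionary keys are illustrative choices.

```python
import statistics

def dispersion(values):
    """Return the dispersion measures used in the examples for one dataset."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)   # sample standard deviation (n - 1)
    return {
        "mean": mean,
        "variance": statistics.variance(values),
        "std": std,
        "cv_percent": 100 * std / mean,
        "range": max(values) - min(values),
    }

for dataset in ([100, 100, 100], [90, 100, 110], [1, 5, 6, 8, 10, 40, 65, 88]):
    summary = dispersion(dataset)
    print(dataset, {name: round(value, 2) for name, value in summary.items()})
```

For the last dataset the standard deviation exceeds the mean, so the coefficient of variation is above 100%.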
Basic visualizations
Box Plot
[Figure: box plot annotated with the 1st quartile, the median, and the 3rd quartile]
Basic visualizations
Box Plot
By Gbdivers (Own work) [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
[Figure: box plots illustrating outliers and positive skewness]
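The numbers a box plot draws can be computed without any plotting library. The sketch below reuses the dataset from the median example and assumes the common convention of flagging points beyond 1.5 times the interquartile range as outliers; that rule and the "inclusive" quartile method are assumptions, since plotting tools differ in both.

```python
import statistics

X = [13, 18, 13, 14, 13, 16, 14, 21, 13]

# Quartiles; the "inclusive" method interpolates between the observed values.
q1, median, q3 = statistics.quantiles(X, n=4, method="inclusive")
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the box (assumed convention).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in X if x < lower_fence or x > upper_fence]

print(q1, median, q3)   # 13.0 14.0 16.0
print(outliers)         # [21]
```

The median sits closer to the 1st quartile than to the 3rd and the only outlier lies on the high side, which is the pattern of a positively skewed sample.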
Dependency: correlation
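● Sample correlation coefficient (Pearson): $r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$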
● Meaningful when comparing paired values/datasets
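A minimal sketch of the sample Pearson coefficient for paired values. The function name and the small dataset are hypothetical and serve only as an illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two paired sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must be paired (same length)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the two sums of squared deviations.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)

# Hypothetical paired measurements, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson_r(xs, ys), 4))   # 0.7746
```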
Dependency: correlation
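● Spearman’s rank correlation coefficient: $\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$, where $d_i$ is the difference between the ranks of $x_i$ and $y_i$ (no tied ranks)
● Kendall’s rank correlation coefficient: $\tau = \frac{n_c - n_d}{n(n-1)/2}$, where $n_c$ and $n_d$ are the numbers of concordant and discordant pairs
○ usually smaller in absolute value than Spearman’s ρ
○ more accurate on small samples
● Pearson correlation coefficient assumes normally distributed data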
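Both rank coefficients can be sketched in plain Python. The versions below use the textbook formulas that assume no tied values; the helper names and the five-point dataset are hypothetical.

```python
from itertools import combinations

def ranks(values):
    """Rank 1 for the smallest value; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rho via the rank-difference formula (valid without ties)."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

def kendall_tau(xs, ys):
    """Kendall's tau: (concordant - discordant) pairs over all pairs (no ties)."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if (x1 - x2) * (y1 - y2) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical paired data without ties, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 2, 5]
print(spearman_rho(xs, ys))            # 0.5
print(round(kendall_tau(xs, ys), 2))   # 0.4
```

On this small sample Kendall's coefficient comes out smaller than Spearman's, as noted in the list above. Real data with ties needs the tie-corrected variants.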
Dependency: example
Age vs. body fat %
● Pearson: r = 0.7921
● Spearman: ρ = 0.7539
● Kendall: τ = 0.5762
Basic Visualizations
Scatter Plot
Basic Visualizations
Image Source: http://www.cqeacademy.com/cqe-body-of-knowledge/continuous-improvement/quality-control-tools/the-scatter-plot-linear-regression/
Scatter plots for different values of r
Correlation does NOT imply causation!
● Spurious Correlations: http://tylervigen.com/
Thank you!