measures of centrality
![Page 1: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/1.jpg)
MEASURES OF CENTRALITY
![Page 2: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/2.jpg)
Last lecture summary
• Which graphs did we meet?
  • scatter plot (bodový graf)
  • bar chart (sloupcový graf)
  • histogram
  • pie chart (koláčový graf)
• How do they work, what are their advantages and/or disadvantages?
![Page 3: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/3.jpg)
SDA women – histogram of heights 2014
n = 48 or N = 48
bin size = 3.8
![Page 4: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/4.jpg)
Distributions
negatively skewed (skewed to the left), e.g., life expectancy
symmetric, e.g., body height
positively skewed (skewed to the right), e.g., income
http://turnthewheel.org/free-textbooks/street-smart-stats/
![Page 5: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/5.jpg)
STATISTICS IS BEAUTIFUL
new stuff
![Page 6: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/6.jpg)
Life expectancy data
• Watch the TED talk by Hans Rosling, Gapminder Foundation: http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html
![Page 7: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/7.jpg)
STATISTICS IS DEEP
![Page 8: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/8.jpg)
Simpson’s paradox
UC Berkeley
Though the data are fake, the paradox is the same.
www.udacity.com – Introduction to statistics
![Page 9: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/9.jpg)
Male

| | Applied | Admitted | Rate [%] |
|---|---|---|---|
| MAJOR A | 900 | 450 | |
| MAJOR B | 100 | 10 | |
![Page 10: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/10.jpg)
Male

| | Applied | Admitted | Rate [%] |
|---|---|---|---|
| MAJOR A | 900 | 450 | 50 |
| MAJOR B | 100 | 10 | 10 |
![Page 11: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/11.jpg)
Female

| | Applied | Admitted | Rate [%] |
|---|---|---|---|
| MAJOR A | 100 | 80 | |
| MAJOR B | 900 | 180 | |
![Page 12: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/12.jpg)
Female

| | Applied | Admitted | Rate [%] |
|---|---|---|---|
| MAJOR A | 100 | 80 | 80 |
| MAJOR B | 900 | 180 | 20 |
![Page 13: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/13.jpg)
Gender bias
What do you think, is there a gender bias?
Who do you think is favored? Male or female?
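Male

| | Applied | Admitted | Rate [%] |
|---|---|---|---|
| MAJOR A | 900 | 450 | 50 |
| MAJOR B | 100 | 10 | 10 |

Female

| | Applied | Admitted | Rate [%] |
|---|---|---|---|
| MAJOR A | 100 | 80 | 80 |
| MAJOR B | 900 | 180 | 20 |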
![Page 14: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/14.jpg)
Gender bias
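Male

| | Applied | Admitted | Rate [%] |
|---|---|---|---|
| MAJOR A | 900 | 450 | 50 |
| MAJOR B | 100 | 10 | 10 |
| Both | 1000 | 460 | 46 |

Female

| | Applied | Admitted | Rate [%] |
|---|---|---|---|
| MAJOR A | 100 | 80 | 80 |
| MAJOR B | 900 | 180 | 20 |
| Both | 1000 | 260 | 26 |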
![Page 15: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/15.jpg)
Gender bias
| Rate [%] | Male | Female |
|---|---|---|
| MAJOR A | 50 | 80 |
| MAJOR B | 10 | 20 |
| Both | 46 | 26 |
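The rates in the tables above can be recomputed in a few lines. The following is a minimal Python sketch; the dictionary layout and the function names `rate` and `admission_rates` are illustrative choices, not part of the lecture.

```python
# Admission counts copied from the tables above: (applied, admitted).
applications = {
    "male": {"MAJOR A": (900, 450), "MAJOR B": (100, 10)},
    "female": {"MAJOR A": (100, 80), "MAJOR B": (900, 180)},
}


def rate(applied, admitted):
    """Admission rate in percent."""
    return 100 * admitted / applied


def admission_rates(majors):
    """Per-major rates plus the pooled 'Both' rate for one group."""
    rates = {major: rate(*counts) for major, counts in majors.items()}
    total_applied = sum(applied for applied, _ in majors.values())
    total_admitted = sum(admitted for _, admitted in majors.values())
    rates["Both"] = rate(total_applied, total_admitted)
    return rates


rates = {group: admission_rates(majors) for group, majors in applications.items()}
for group, group_rates in rates.items():
    print(group, group_rates)
```

Women have the higher admission rate in each major, yet the lower pooled rate, because most women applied to the major that admits few applicants.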
![Page 16: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/16.jpg)
Statistics is ambiguous
• This example illustrates how ambiguous statistics can be.
• How you choose to graph your data may strongly influence what people believe to be the case.
“I never believe in statistics I didn’t doctor myself.”
(Czech: “Nikdy nevěřím statistice, kterou si sám nezfalšuji.”)
Who said that?
Winston Churchill (attributed)
![Page 17: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/17.jpg)
What is statistics?
• Statistics – the science of collecting, organizing, summarizing, analyzing and interpreting data
• Goal – use imperfect information (our data) to infer facts, make predictions, and make decisions
• Descriptive statistics – describing and summarizing data with numbers or pictures
• Inferential statistics – making conclusions or decisions based on data
![Page 18: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/18.jpg)
Variables
• variable – a value or characteristic that can vary from individual to individual
  • example: favorite color, age
• How are variables classified?
• quantitative variable – numerical values, often with units of measurement, answering a "how much" or "how many" question; example: age, annual income, number of children
  • continuous (spojitá proměnná), example: height, weight
  • discrete (diskrétní proměnná), example: number of children
• continuous variables can be discretized
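Discretizing a continuous variable means replacing each measured value with the category it falls into. A minimal Python sketch follows; the function name, the cut points (160 and 175 cm) and the sample heights are all made up for illustration.

```python
def discretize_height(height_cm):
    """Map a continuous height to one of three ordered categories.

    The cut points 160 and 175 cm are arbitrary, chosen only for illustration.
    """
    if height_cm < 160:
        return "short"
    if height_cm < 175:
        return "medium"
    return "tall"


heights_cm = [158.2, 163.5, 171.0, 176.4, 181.9]  # made-up example values
categories = [discretize_height(h) for h in heights_cm]
print(categories)
```

The result is an ordinal variable: the categories are not numbers, but they keep a natural order.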
![Page 19: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/19.jpg)
Variables
• categorical (qualitative) variables
  • categories that have no particular order
  • example: favorite color, gender, nationality
  • ordinal
    • they are not numerical but their values have a natural order
    • example: temperature low/medium/high
![Page 20: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/20.jpg)
Variables
• variable (proměnná)
  • quantitative (kvantitativní)
    • continuous (spojitá)
    • discrete (diskrétní)
  • categorical (kategorická)
    • ordinal (ordinální)
![Page 21: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/21.jpg)
Choosing a profession
Chemistry: 50 000 – 60 000
Geography: 40 000 – 55 000
www.udacity.com – Statistics
![Page 22: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/22.jpg)
Choosing a profession
• We made an interval estimate.
• But ideally we want one number that describes the entire dataset. This allows us to quickly summarize all our data.
![Page 23: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/23.jpg)
Choosing a profession
1. The value at which frequency is highest.
2. The value where frequency is lowest.
3. Value in the middle.
4. Biggest value of x-axis.
5. Mean
Chemistry Geography
![Page 24: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/24.jpg)
Three big M’s
• The value at which frequency is highest is called the mode, i.e., the most common value.
• The value in the middle of the distribution is called the median.
• The mean is the arithmetic average ("average" is a synonym).
Chemistry Geography
![Page 25: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/25.jpg)
Quick quiz
• What is the mode in our data?
2 5 6 5 2 6 9 8 5 2 3 5
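One way to check the quiz answer is to count how often each value occurs. The sketch below uses Python's standard library (`collections.Counter` and `statistics.multimode`, the latter available from Python 3.8); the variable names are illustrative.

```python
from collections import Counter
from statistics import multimode

data = [2, 5, 6, 5, 2, 6, 9, 8, 5, 2, 3, 5]  # the quiz data

counts = Counter(data)   # value -> how many times it occurs
modes = multimode(data)  # list of all most frequent values

print(counts.most_common())
print(modes)
```

`multimode` returns a list because a dataset can have more than one mode, and it works for categorical values (e.g., strings) as well as for numbers.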
![Page 26: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/26.jpg)
Mode in negatively skewed distribution
![Page 27: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/27.jpg)
Mode in uniform distribution
![Page 28: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/28.jpg)
Multimodal distribution
![Page 29: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/29.jpg)
Mode in categorical data
![Page 30: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/30.jpg)
More on mode
True or False?
1. The mode can be used to describe any type of data we have, whether it’s numerical or categorical.
2. All scores in the dataset affect the mode.
3. If we take a lot of samples from the same population, the mode will be the same in each sample.
4. There is an equation for the mode.
• Ad 3.
  • http://onlinestatbook.com/stat_sim/sampling_dist/
  • http://www.shodor.org/interactivate/activities/Histogram/ - the mode changes as you change the bin size.
• Because 3. is not true, we can’t use the mode to learn something about our population. The mode depends on how you present the data.
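The dependence of the mode on the bin size can be shown with a small sketch. The data below are made up for illustration, and `modal_bin` is a hypothetical helper that assumes bins start at 0 and are half-open.

```python
from collections import Counter


def modal_bin(values, bin_width):
    """Return (left_edge, right_edge) of the most populated histogram bin.

    Bins start at 0 and are half-open: [left_edge, left_edge + bin_width).
    """
    bin_counts = Counter(int(v // bin_width) for v in values)
    index, _ = bin_counts.most_common(1)[0]
    return index * bin_width, (index + 1) * bin_width


values = [1, 2, 2, 3, 3, 4, 4, 5, 8, 8, 8]  # made-up data for illustration

print(modal_bin(values, 1))  # narrow bins
print(modal_bin(values, 5))  # wide bins
```

With bins of width 1 the tallest bar sits at 8; with bins of width 5 it moves to the bin from 0 to 5, although the data are the same.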
![Page 31: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/31.jpg)
Life expectancy data
www.coursera.org – Statistics: Making Sense of Data
![Page 32: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/32.jpg)
Minimum
Sierra Leone
minimum = 47.8
![Page 33: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/33.jpg)
Maximum
Japan
maximum = 83.4
![Page 34: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/34.jpg)
Life expectancy data
all countries
![Page 35: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/35.jpg)
Life expectancy data
Countries ranked from 1 to 197: Egypt is number 99, the middle of the list.
median = 73.2 (half of the countries are larger, half smaller)
![Page 36: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/36.jpg)
Life expectancy data
Minimum = 47.8
Maximum = 83.4
Median = 73.2
![Page 37: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/37.jpg)
Q1
Countries ranked from 1 to 197: Sao Tomé & Príncipe is number 50 (¼ of the way).
1st quartile = 64.7
![Page 38: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/38.jpg)
Q1
¼ smaller, ¾ larger
1st quartile = 64.7
![Page 39: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/39.jpg)
Q3
Countries ranked from 1 to 197: Netherlands Antilles is number 148 (¾ of the way).
3rd quartile = 76.7
![Page 40: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/40.jpg)
Q3
3rd quartile = 76.7
¾ smaller ¼ larger
![Page 41: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/41.jpg)
Life expectancy data
Minimum = 47.8
Maximum = 83.4
Median = 73.2
1st quartile = 64.7
3rd quartile = 76.7
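The ranks 50, 99 and 148 used on the preceding slides are consistent with one common convention for locating a quantile in a sorted list: rank = 1 + p · (n − 1). The sketch below assumes that convention because it reproduces those ranks; `rank_of_quantile` is an illustrative name, and other conventions exist.

```python
def rank_of_quantile(p, n):
    """1-based rank of the p-quantile among n sorted values.

    Uses rank = 1 + p * (n - 1); a fractional result would mean
    interpolating between two neighbouring values.
    """
    return 1 + p * (n - 1)


n_countries = 197
ranks = {name: rank_of_quantile(p, n_countries)
         for name, p in [("Q1", 0.25), ("median", 0.5), ("Q3", 0.75)]}
print(ranks)
```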
![Page 42: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/42.jpg)
Box Plot
![Page 43: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/43.jpg)
Box plot
minimum
1st quartile
median
3rd quartile
maximum
![Page 44: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/44.jpg)
Modified box plot
IQR = interquartile range
1.5 × IQR
outliers
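In a modified box plot, points lying more than 1.5 × IQR below the 1st quartile or above the 3rd quartile are drawn individually as outliers. The sketch below applies this rule to the life expectancy summary from the earlier slides; `outlier_fences` is an illustrative name.

```python
def outlier_fences(q1, q3, k=1.5):
    """Lower and upper limits beyond which points count as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Summary values of the life expectancy data from the earlier slides.
minimum, q1, q3, maximum = 47.8, 64.7, 76.7, 83.4

lower, upper = outlier_fences(q1, q3)
has_low_outliers = minimum < lower
has_high_outliers = maximum > upper
print(round(lower, 1), round(upper, 1), has_low_outliers, has_high_outliers)
```

Here the IQR is 12.0, so the limits are about 46.7 and 94.7. Both the minimum and the maximum lie inside them, so under this rule the summary values indicate no outliers.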
![Page 45: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/45.jpg)
Quartiles, median – how to do it?
79, 68, 88, 69, 90, 74, 87, 93, 76
Find min, max, median, Q1, Q3 in these data. Then, draw the box plot.
![Page 46: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/46.jpg)
![Page 47: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/47.jpg)
Another example
78, 93, 68, 84, 90, 74

| Min. | 1st Qu. | Median | 3rd Qu. | Max. |
|---|---|---|---|---|
| 68.00 | 75.00 | 81.00 | 88.50 | 93.00 |
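Both datasets can be summarized with Python's standard library. The sketch below assumes quartiles are found by linear interpolation (`method="inclusive"` in `statistics.quantiles`, Python 3.8+), since that reproduces the values 75.00 and 88.50 in the table above; `five_number_summary` is an illustrative name.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using linear interpolation."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


exercise = [79, 68, 88, 69, 90, 74, 87, 93, 76]
another = [78, 93, 68, 84, 90, 74]

print(five_number_summary(exercise))
print(five_number_summary(another))
```

Quartile conventions differ. For the nine-value exercise, this method gives Q1 = 74 and Q3 = 88, whereas taking the median of each half (leaving out the overall median) gives 71.5 and 89.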
![Page 48: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/48.jpg)
Percentiles
age [years]
http://www.rustovyhormon.cz/on-line-rustove-grafy
![Page 49: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/49.jpg)
3rd M – Mean
• Mathematical notation:
  • Σ … Greek letter capital sigma
  • means SUM in mathematics
• Another measure of the center of the data: mean (average)
• Data values:
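In symbols, assuming the data values are written $x_1, x_2, \dots, x_n$ and the mean is written $\bar{x}$ (notation chosen here, not necessarily the lecture's), the mean can be expressed with the sigma sign as:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \dots + x_n}{n}
```

For example, for the values 2, 4 and 9 the mean is $(2 + 4 + 9)/3 = 5$.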
![Page 50: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/50.jpg)
Salaries of 25 players of the New York Red Bulls (an American soccer team) in 2012.
33 750
33 750
33 750
33 750
44 000
44 000
44 000
44 000
45 566
65 000
95 000
103 500
112 495
138 188
141 666
181 500
185 000
190 000
194 375
195 000
205 000
292 500
301 999
4 600 000
5 600 000
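median = 112 495
mean = 518 311
Robust statistic
• The mean is not a robust statistic: the two largest salaries pull it far above what a typical player earns.
• The median is a robust statistic: it is hardly affected by extreme values.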
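A quick way to see the difference in robustness is to recompute both statistics after dropping the two largest salaries. The following is a minimal Python sketch using the standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# Salaries from the slide above, already sorted.
salaries = [
    33_750, 33_750, 33_750, 33_750, 44_000, 44_000, 44_000, 44_000,
    45_566, 65_000, 95_000, 103_500, 112_495, 138_188, 141_666,
    181_500, 185_000, 190_000, 194_375, 195_000, 205_000, 292_500,
    301_999, 4_600_000, 5_600_000,
]

without_top_two = salaries[:-2]  # drop the two multimillion salaries

print(mean(salaries), median(salaries))
print(mean(without_top_two), median(without_top_two))
```

Removing just two of the 25 values drops the mean from about 518 000 to about 120 000, while the median only moves from 112 495 to 103 500.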
![Page 51: measures of centrality](https://reader036.vdocuments.site/reader036/viewer/2022062321/56813a48550346895da23c04/html5/thumbnails/51.jpg)
Trimmed mean
10% trimmed mean … eliminate the upper and lower 10% of the data
(salary data as on the previous slide)
median = 112 495
mean = 518 311
10% trimmed mean = 128 109
The trimmed mean is more robust than the mean.
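The trimmed mean can be sketched as follows. With 25 values, 10% is 2.5 values per end; this sketch assumes the count is rounded down to 2, which reproduces the 128 109 quoted above. `trimmed_mean` is an illustrative helper, not a standard-library function.

```python
def trimmed_mean(values, fraction):
    """Mean after dropping `fraction` of the values from each end.

    The number dropped per end is rounded down (an assumption).
    """
    ordered = sorted(values)
    k = int(len(ordered) * fraction)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Salaries of the 25 players from the slides above.
salaries = [
    33_750, 33_750, 33_750, 33_750, 44_000, 44_000, 44_000, 44_000,
    45_566, 65_000, 95_000, 103_500, 112_495, 138_188, 141_666,
    181_500, 185_000, 190_000, 194_375, 195_000, 205_000, 292_500,
    301_999, 4_600_000, 5_600_000,
]

print(trimmed_mean(salaries, 0.10))
```

Dropping the two lowest and the two highest salaries leaves 21 values, whose mean is much closer to the median than the ordinary mean is.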