Statistics
Math 416
Game Plan
- Introduction
- Census / Poll / Survey
- Population, Sample, Bias
- Sample Proportion
- Mean, Median, Mode
- Box and Whisker Plot
- Box and Whisker Interpretation
Stats Intro
"There are lies, there are damn lies and then there are statistics." - Mark Twain
The goal is to use numbers to describe a characteristic of a population.
The idea is to win your argument by providing facts, and too many people consider statistics to be absolute facts.
Stats Intro
In general, most people do not understand statistics.
Hypothesis: Student A has a school average of 10%.
Conclusion: Student A is a bad person.
The statistic does not measure the person's goodness or badness.
What does that statistic mean? If all their marks were the same in all courses, each mark would be 10%.
Statistics
Life is a continual battle to get your ideas across while other people try to get their ideas across to you.
You are constantly being bombarded by arguments and statistics.
1. Commercials
2. Teachers
To understand the world around you, you need to be aware of what statistics mean and how reliable they are.
Where do statistics come from?
Population
First we establish the population.
Population: the complete group that we are investigating.
Characteristic: a particular identifying feature exhibited by the population, e.g. hair colour, favorite colour, math knowledge, political opinion, etc.
Population
The next problem is working out how to measure a characteristic and obtain the data.
Obtaining the Data: Three Methods
Census
Method #1: Ask the whole population
- Called a census
- Problems: hard to do, depending on the population
Poll
Method #2: Ask a representative "sample" of the population
- Called a poll
- Problems: making the sample representative may be tricky
Survey
Method #3: Ask only experts of the population
- Called a survey
- Problems: who is an expert?
Representative sample?
Bias
If data is obtained or presented in an unfair manner, then the conclusions drawn from it are not reliable. The results are said to be biased (or unfair).
In collection
How you ask and who you ask are the main sources of bias.
There are 4 types of bias: bad sampling, non-pertinence, wording of question, and attitude of pollster.
Bias
- Bad sampling: e.g. asking 5-year-olds their favorite beer
- Non-pertinence: e.g. "Do you like to play an instrument?" (to find favorite color)
- Wording of questions: e.g. "Man, I hate Bush. Are you in favor of war?"
- Attitude of pollster: e.g. a policeman asking "Were you speeding?"
Presentation Bias
In presentation, imagine you disregard a grade level and claim that it does not matter in a school's decision.
"I need to prove my product is the best. How can I get these numbers to show that?"
Buy This Stock!
[Figure: stock price chart; vertical axis from $0 to $500, horizontal axis by month from Jan to April]
Not!
A statistical presentation is always biased
Stencil #1-3
Representative Sample
Creating a representative sample can be an art form in itself. Ideally the sample would match the population in all the same proportions, which is an impossibility.
You must focus on the characteristics that the poll or survey is focusing on!
Representative Sample
Consider a school that has 50 boys and 25 girls, where a representative sample of 10 needs to be created.
We note the population is described in terms of boys and girls, hence we will need to create our sample on that basis.
Three steps
Representative Sample
1) Relative (by percent), n = 75
Boys: 50/75 = 67%
Girls: 25/75 = 33%
2) Theory, sample = 10
Boys: 10 × 0.67 = 6.7
Girls: 10 × 0.33 = 3.3
Difficult to get 0.7 or 0.3 of a person!
3) Reality
Boys: 7, Girls: 3. Total of 10, and the rounding has added bias.
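The three steps (relative, theory, reality) can be sketched in a few lines of Python. This is only an illustration of the boys/girls example above; the function name `proportional_sample` and the round-half-up choice are assumptions, not part of the course material.

```python
def proportional_sample(group_sizes, sample_size):
    """Return (relative, theory, reality) for each group in the population."""
    n = sum(group_sizes.values())            # population size, 75 in the example
    result = {}
    for name, size in group_sizes.items():
        relative = size / n                  # step 1: share of the population
        theory = relative * sample_size      # step 2: ideal (fractional) head count
        reality = int(theory + 0.5)          # step 3: round half up to whole people
        result[name] = (relative, theory, reality)
    return result

school = proportional_sample({"boys": 50, "girls": 25}, 10)
for name, (relative, theory, reality) in school.items():
    print(f"{name}: {relative:.0%} -> {theory:.1f} -> {reality}")
# boys: 67% -> 6.7 -> 7
# girls: 33% -> 3.3 -> 3
```

Rounding each group on its own happens to give a total of 10 here; with other numbers the rounded counts may not add up to the sample size, so the total should always be checked.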
Some Rules
- If it starts at zero, it stays at zero
- If it appears to be zero, be careful!
- Make decisions on a category, not overall
Creating a Sample
1) Given the following, create a sample of 10 (n = 109)
                   Hudson   Non-Hudson
Young (Y)             0          1
Middle Aged (MA)     20         24
Old (O)              31         33
Relative
                   Hudson   Non-Hudson
Y                     0          0       Is it really 0 people?
MA                   18%        22%
O                    28%        30%
Creating a Sample
Theory (sample = 10)
                   Hudson   Non-Hudson
Y                     0          0
MA                   1.8        2.2
O                    2.8         3
Reality
                   Hudson   Non-Hudson
Y                     0          0
MA                    2          2
O                     3          3
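As a rough check of the Hudson table, the sketch below recomputes theory and reality for every cell. It is an assumption-laden illustration: it works from the exact fractions (count / 109) rather than the rounded percentages on the slide, and the variable names are made up for the example.

```python
# Population counts from the slide, keyed by (age group, town).
counts = {
    ("Young", "Hudson"): 0, ("Young", "Non-Hudson"): 1,
    ("Middle Aged", "Hudson"): 20, ("Middle Aged", "Non-Hudson"): 24,
    ("Old", "Hudson"): 31, ("Old", "Non-Hudson"): 33,
}
sample_size = 10
n = sum(counts.values())  # 109

# Theory: the ideal fractional head count for each cell.
theory = {cell: sample_size * count / n for cell, count in counts.items()}
# Reality: round half up to whole people.
reality = {cell: int(t + 0.5) for cell, t in theory.items()}

for cell in counts:
    print(cell, f"theory={theory[cell]:.2f}", f"reality={reality[cell]}")
```

The young non-Hudson cell shows why "appears to be zero" needs care: one real person exists in the population, the theoretical share is small but not zero, and rounding removes that person from the sample.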
Stencil #4, 5, 6: Do relative, theory and reality for #4; in #5 and #6, put theory and relative together.
Statistics: Central Tendency - Mean
Mean means the average. Symbol: $\bar{x}$
Found by dividing the sum $\sum x_i$ by the number of elements $n$, i.e. $\bar{x} = \frac{\sum x_i}{n}$
It is the value that all the values would be equal to if they were all the same.
e.g. (5, 9, 3, 6): $\bar{x} = \frac{\sum x_i}{n} = \frac{5+9+3+6}{4} = 5.75$
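The same calculation as a minimal Python sketch; the function name `mean` is chosen for the example, and it assumes a non-empty list of numbers.

```python
def mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)

print(mean([5, 9, 3, 6]))  # 5.75
```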
Mode
Symbol: M
It is the number that appears the most.
It is possible not to have any mode, or to have more than one mode.
e.g. (1, 2, 5): no mode (nothing repeats)
e.g. (1, 6, 6, 8): M = 6
e.g. (1, 3, 3, 4, 4, 8): M = 3 & 4
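A small sketch of the mode rule in Python, following the convention above that a sample where nothing repeats has no mode. The function name `modes` is hypothetical, and a non-empty sample is assumed.

```python
from collections import Counter

def modes(values):
    """Return every value that appears most often, or [] if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                      # every value appears once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 5]))           # []
print(modes([1, 6, 6, 8]))        # [6]
print(modes([1, 3, 3, 4, 4, 8]))  # [3, 4]
```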
Median
Symbol: M
The median is the middle value.
Note the sample must be in order!
There are two possibilities (odd & even).
Odd: consider (1, 5, 7), n = 3. There is only 1 middle; M = 5.
Even: consider (1, 5, 7, 8), n = 4. You must find the mean of both middles: (5 + 7)/2 = 6.
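The odd and even cases can be written out as a short Python sketch. The function name `median` is chosen for the example; it sorts first because the sample must be in order, and it assumes a non-empty sample.

```python
def median(values):
    """Middle value of the ordered sample; mean of the two middles if n is even."""
    ordered = sorted(values)           # the sample must be in order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]         # odd: only one middle value
    return (ordered[middle - 1] + ordered[middle]) / 2  # even: mean of both middles

print(median([1, 5, 7]))     # 5
print(median([1, 5, 7, 8]))  # 6.0
```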
Do #7
Box & Whisker Plot
Sample: (2, 5, 1, 6, 9, 8)
The Construction
1) Make sure your sample is in order: (1, 2, 5, 6, 8, 9)
2) Find the min, max & median: min = 1; max = 9; median = 5.5 = Q2
These three points will serve as part of the box and whisker diagram. Draw it on board…
Box & Whisker Plot
[Number line from -1 to 11]
3) Create a number line with a vertical line at each of the three points (hinges)
4) Find the median between min and Q2, called Q1. It is 2; make another hinge.
5) Find the median between Q2 and max, called Q3. It is 8; make another hinge.
Sample: (1, 2, 5, 6, 8, 9)
Complete it!
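The five hinges from the construction can be computed with a short Python sketch. It follows the "median of each half" method used in the steps above; the function names are hypothetical, and leaving the middle value out of both halves when n is odd is an assumption, since the slides only show an even-sized sample.

```python
def median(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]          # values between min and Q2
    upper = ordered[(n + 1) // 2 :]    # values between Q2 and max
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

print(five_number_summary([2, 5, 1, 6, 9, 8]))  # (1, 2, 5.5, 8, 9)
```

Other quartile conventions exist, and software packages may give slightly different values for Q1 and Q3 on the same data.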
Words & Facts
We have broken the data into four parts called quartiles.
[Diagram: box and whisker plot labelled Min, Q1, Q2, Q3 and Max; the box runs from Q1 to Q3, with a whisker on each side out to Min and Max]
Interquartile range = Q3 - Q1
Words & Facts
- Each quartile should hold about ¼ of the data, BUT you cannot be sure
- You cannot tell the mean or the mode
- Do not jump to conclusions!
- A box and whisker plot gives you an idea about the spread, concentration or dispersion of the data
Example #1
[Box and whisker plot drawn on a number line from 1 to 12]
A General View
This data is very close together below 4. There is more of a spread between 4 and 11, and the data is close together once again between 11 and 12.
Some Questions…
What is the mean? No idea.
Questions
What is the mode? No idea.
What is the median? Q2 = 4
What is the interquartile range? Q3 - Q1 = 11 - 3 = 8
How many are below 11? 75%, but no idea of the number.
Where does the lowest concentration of numbers lie (lowest concentration vs. highest concentration)? Between 4 and 11.
Example #2
[Two box and whisker plots on a scale of marks from 30 to 100: Class A (n = 20) and Class B (n = 40)]
a) Which class did better? Hard to tell, but Class A.
b) What are the means? No idea.
c) All together, approximately how many were over 60%? ¾ × 20 + 2/4 × 40 = 35
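The estimate in (c) relies on each quartile holding about a quarter of a class. A minimal sketch of that arithmetic, assuming (as the answer implies) that three of Class A's quartiles and two of Class B's quartiles lie above 60%; the function name `estimate_above` is made up for the example.

```python
from fractions import Fraction

def estimate_above(class_size, quartiles_above):
    """Estimate a head count, assuming each quartile holds 1/4 of the class."""
    return class_size * Fraction(quartiles_above, 4)

# Class A: n = 20, three quartiles above 60%; Class B: n = 40, two quartiles above 60%.
total = estimate_above(20, 3) + estimate_above(40, 2)
print(total)  # 35
```

The result is only approximate, because a quartile should hold about a quarter of the data but the plot alone cannot confirm the exact count.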
Example #2: More Questions
Which class had the highest mark, and what was it? Class B, at approximately 97%.
Which class has the lowest range?
Class A: 87 - 55 = 32
Class B: 97 - 40 = 57
Answer: Class A
Finish Stencil