TRANSCRIPT
1
HRIR 8011
• “Statistics is a collection of procedures and principles for gaining and processing information in order to make decisions when faced with uncertainty.” (Utts, p. 3)
• Objective of HRIR 8011: learning to use information to make good (not lousy) decisions, which requires
  • Collecting information (data)
  • Analyzing data
  • Interpreting the results of the analyses
2
Consider…
• Employees who are dissatisfied with their job are more likely to vote for a union than employees who are satisfied (HRIR 8071)
• Structured interviews are better than open-ended interviews when selecting new employees (HRIR 8031)
• An HR manager asks: what is the market rate of pay?
• An HR manager asks: what can I do to reduce absenteeism?
• If low-paid workers are absent more, do you raise wages?
3
The Focus of HRIR 8011
• Our focus…the procedures and principles of using information correctly
• When Professor Tubre says that you should use a cognitive ability test, question it! How do we know we should use it?
• What information is this conclusion based on?
• How were the data collected? Does that seem applicable to my situation?
• How were the data analyzed? Was that appropriate? What did they miss?
• Are the conclusions justified based on the data and the results?
4
Index Numbers
• Index value = 100 × (current value / base period value)
• Price Index Example: if the current cost is $3,300 and the base period cost is $2,400, then
• Price Index = 100 × (3,300 / 2,400) = 137.5
• Interpretation: the current period cost is 37.5% higher than the base period cost
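The index number arithmetic can be checked with a few lines of Python. This is only a sketch: the function name `index_number` is made up for illustration, and the inputs are the $3,300 and $2,400 figures from the example.

```python
def index_number(current_value: float, base_value: float) -> float:
    """Return 100 * (current value / base period value)."""
    if base_value == 0:
        raise ValueError("base period value must be nonzero")
    return 100 * current_value / base_value


# Figures from the price index example on this slide.
price_index = index_number(3300, 2400)

# An index of 100 means "same as the base period", so the percent
# change relative to the base period is the index minus 100.
percent_change = price_index - 100

print(price_index, percent_change)  # prints 137.5 37.5
```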
5
Time Series
[Figure: time series plot; vertical axis from 5.0 to 8.5, horizontal axis Year 1 to Year 3]
6
Measurement
7
Measurement
• Validity
• Reliability
• Bias
8
Seven Measurement Pitfalls
• Deliberate bias
• Unintentional bias
• Desire to please
• Asking the uninformed
• Unnecessary complexity
• Ordering of questions
• Confidentiality and anonymity
• Source: Jessica M. Utts, Seeing Through Statistics, 2nd ed. (Pacific Grove, CA: Duxbury, 1999), p. 32.
9
Cumulative Frequency
• Recall Eggs R Us
Race             |  Freq.   Percent    Cumul.
-----------------+---------------------------
African American |     87     15.10     15.10
Asian American   |      6      1.04     16.15
Hispanic         |     25      4.34     20.49
white            |    458     79.51    100.00
-----------------+---------------------------
Total            |    576    100.00
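As a rough illustration of how the Percent and Cumul. columns arise, the sketch below rebuilds them from the four frequency counts in the table. The variable names are assumptions made for this example; only the counts come from the slide.

```python
from itertools import accumulate

# Frequency counts copied from the Eggs R Us table.
counts = {
    "African American": 87,
    "Asian American": 6,
    "Hispanic": 25,
    "white": 458,
}

total = sum(counts.values())

# Running total of the counts, in table order.
cumulative_counts = list(accumulate(counts.values()))

# Each row: (group, frequency, percent, cumulative percent).
rows = [
    (group, n, 100 * n / total, 100 * running / total)
    for (group, n), running in zip(counts.items(), cumulative_counts)
]

for group, n, pct, cum_pct in rows:
    print(f"{group:<18}{n:>6}{pct:>10.2f}{cum_pct:>10.2f}")
```

The cumulative percent in each row is the share of all observations in that row and every row above it, which is why the last row reaches 100.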
10
Percentiles
• The pth percentile of a sample is the value for which at most p% of the measurements are less than that value and at most (100-p)% of the measurements are greater than that value
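• Median (the 50th percentile)
• Quartiles (the 25th, 50th, and 75th percentiles)
• Deciles (the 10th, 20th, ..., 90th percentiles)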
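The percentile definition can be turned into a direct check. The sketch below assumes a helper named `is_pth_percentile` (a made-up name) and borrows the eight-value data set from the cdf example later in these slides. Note that under this definition more than one value can qualify as the pth percentile.

```python
def is_pth_percentile(data, value, p):
    """Check the slide's definition: at most p% of the measurements are
    less than `value` and at most (100 - p)% are greater than `value`."""
    n = len(data)
    pct_below = 100 * sum(x < value for x in data) / n
    pct_above = 100 * sum(x > value for x in data) / n
    return pct_below <= p and pct_above <= 100 - p


# Data set taken from the cdf example later in the slides.
data = [1, 4, 6, 4, 10, 9, 3, 5]

print(is_pth_percentile(data, 4.5, 50))  # True: 50% below, 50% above
print(is_pth_percentile(data, 9, 50))    # False: 75% of values are below 9
print(is_pth_percentile(data, 3, 25))    # True: 12.5% below, 75% above
```

Statistical packages usually pick one value by interpolation, so their reported percentiles can differ slightly from package to package while still satisfying this definition.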
11
Box Plot
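[Figure: box plot annotated with the smallest value, lower quartile, median, upper quartile, and largest value; the box between the lower and upper quartiles covers the middle half of the data]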
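The five values a box plot draws can be computed as in the sketch below. It is an illustration under assumptions: the data set is borrowed from the cdf example later in these slides, the function name `five_number_summary` is made up, and the quartiles use the default interpolation rule of Python's `statistics.quantiles` (Python 3.8+), which is only one of several conventions in use.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (smallest, lower quartile, median, upper quartile, largest)."""
    q1, q2, q3 = quantiles(data, n=4)  # default method is "exclusive"
    return (min(data), q1, q2, q3, max(data))


# Data set taken from the cdf example later in the slides.
data = [1, 4, 6, 4, 10, 9, 3, 5]

smallest, lower_q, median, upper_q, largest = five_number_summary(data)

# The box spans lower_q to upper_q (the middle half of the data);
# the whiskers in the basic box plot run out to smallest and largest.
print(smallest, lower_q, median, upper_q, largest)
```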
12
Box Plot
[Figure: two box plots. Age, axis from 0 to 40; Tenure, axis from 0 to 10]
13
A Box Plot in Labor
Source: Alan B. Krueger and Alexandre Mas, “Strikes, Scabs, and Tread Separations: Labor Strife and the Production of Defective Bridgestone/Firestone Tires,” Journal of Political Economy 112 (April 2004), pp. 252-289 at 274.
14
A Simple cdf Example
Consider the simple data set:
1, 4, 6, 4, 10, 9, 3, 5
This yields the following relative and cumulative frequencies:
x     Relative Frequency    Cumulative Frequency
1     0.125                 0.125
3     0.125                 0.250
4     0.250                 0.500
5     0.125                 0.625
6     0.125                 0.750
9     0.125                 0.875
10    0.125                 1.000
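The relative and cumulative frequencies for this data set can be reproduced with a short Python sketch. The variable names are chosen for illustration only.

```python
from collections import Counter
from itertools import accumulate

data = [1, 4, 6, 4, 10, 9, 3, 5]

# Count how often each distinct value occurs.
counts = Counter(data)
xs = sorted(counts)

# Relative frequency = count / number of observations.
relative = [counts[x] / len(data) for x in xs]

# Cumulative frequency = running sum of the relative frequencies.
cumulative = list(accumulate(relative))

for x, rel, cum in zip(xs, relative, cumulative):
    print(f"{x:>2}  {rel:.3f}  {cum:.3f}")
```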
15
A Simple cdf Example
[Figure: cdf plot; horizontal axis x from 0 to 10, vertical axis cumulative frequency from 0.000 to 1.000, shown beside the same frequency table]
1. To make the cdf, start at zero and move to the right along the x-axis until you come to the first value of x (that is, x=1)
16
A Simple cdf Example
[Figure: cdf plot; horizontal axis x from 0 to 10, vertical axis cumulative frequency from 0.000 to 1.000, shown beside the same frequency table]
2. The value x=1 has a relative frequency of 0.125, so the cdf jumps from 0 up to 0.125.
17
A Simple cdf Example
[Figure: cdf plot; horizontal axis x from 0 to 10, vertical axis cumulative frequency from 0.000 to 1.000, shown beside the same frequency table]
3. Now continue to the right until you get to the next value (x=3) at which point the cdf jumps up another 0.125 to 0.250.
18
A Simple cdf Example
[Figure: cdf plot; horizontal axis x from 0 to 10, vertical axis cumulative frequency from 0.000 to 1.000, shown beside the same frequency table]
4. At x=4, note that the relative frequency is 0.25 (recall that there were two occurrences of 4 in the data set) so the cdf jumps 0.25 to 0.50.
19
A Simple cdf Example
[Figure: cdf plot; horizontal axis x from 0 to 10, vertical axis cumulative frequency from 0.000 to 1.000, shown beside the same frequency table]
5. Continuing for the remaining x values yields the completed cdf.
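The finished cdf is a step function: flat between data values, with a jump at each one. A minimal sketch of that function follows, assuming the name `ecdf` (made up here) and the same eight-value data set.

```python
from bisect import bisect_right

data = [1, 4, 6, 4, 10, 9, 3, 5]
sorted_data = sorted(data)


def ecdf(x):
    """Fraction of observations less than or equal to x."""
    # bisect_right counts how many sorted values are <= x.
    return bisect_right(sorted_data, x) / len(sorted_data)


print(ecdf(0))   # 0.0   (left of the first value, x=1)
print(ecdf(1))   # 0.125 (first jump)
print(ecdf(2))   # 0.125 (flat until the next value, x=3)
print(ecdf(4))   # 0.5   (jump of 0.25 because 4 occurs twice)
print(ecdf(10))  # 1.0   (all observations accounted for)
```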
20
Birth of a Distribution
< 5 5 to 9 > 9
3 Bins
21
Birth of a Distribution
<2 2-4 4-6 6-8 8-10 10-12 >12
7 Bins
22
Birth of a Distribution
15 Bins
23
Birth of a Distribution
33 Bins
24
Birth of a Distribution
1000 Bins
25
Different Distributions
26
Even More Distributions
27
Symmetrical Distributions
28
Symmetrical Distribution
29
Positively Skewed
30
Negatively Skewed
31
Symmetrical Distribution
Bell-shaped, symmetrical distribution
Will be very important for statistical inference
32
Additional Variance Example
[Figure: two histograms, each with horizontal axis from 11 to 17]
Cyberland (1st 10 obs): X̄ = 13.9, variability measure = 1.97
Contrived Sample: X̄ = 13.9, variability measure = 0.30
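The point of this slide, two samples with the same mean but very different spread, can be illustrated in Python. The two samples below are hypothetical values invented for this sketch so that both average 13.9; they are not the Cyberland or contrived data from the slide, so the spread figures they produce will not match the slide's 1.97 and 0.30.

```python
from statistics import mean, stdev, variance

# Hypothetical samples (NOT the slide's data), both with mean 13.9.
spread_out = [11, 12, 13, 14, 14, 14, 15, 15, 15, 16]
clustered = [13, 14, 14, 14, 14, 14, 14, 14, 14, 14]

mean_a, mean_b = mean(spread_out), mean(clustered)

# Sample variance divides the sum of squared deviations by n - 1;
# the standard deviation is its square root.
var_a, var_b = variance(spread_out), variance(clustered)
sd_a, sd_b = stdev(spread_out), stdev(clustered)

print(f"spread_out: mean={mean_a:.1f} variance={var_a:.2f} sd={sd_a:.2f}")
print(f"clustered:  mean={mean_b:.1f} variance={var_b:.2f} sd={sd_b:.2f}")
```

The mean alone cannot tell the two samples apart; the variance and standard deviation are what capture how tightly the values cluster around it.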