Business and Economic Statistics Tutorial 1: Describing Categorical Data (Ch 4)


Post on 14-Jan-2016


Page 1: Business and Economic Statistics Tutorial 1: Describing Categorical Data ( Ch  4)

Business and Economic Statistics
Tutorial 1: Describing Categorical Data (Ch 4)

Tutor: Sam Capurso
E-mail: ...

1st


1. Why Statistics?

Statistics

Initiates policy / decisions

Evaluates and informs policy / decisions

Accountants work in an economy (in fact, everyone does)

[Diagram labels: i, P, E.R., Confidence (Business, Consumer), More...]

2. Prac set up

Task                                          Minutes
Attendance, hand back work                    5
Summary for this week                         5 - 10
Individual written work (4 in the semester)   10
Individual MCQ test                           10 - 15
Group MCQ scratchy test                       10 (or until finished)
Group WAQ                                     Approx 1 hour
Worked Example                                10 - 15

3. First prac (only)

* Introduction
* “House keeping”
* Arrange groups
* Work out team names and take attendance
* Prac work

4. Things to note

• Need to attend lectures and read the text BEFORE the prac.

• Assessment for pracs = Indiv MCQ (5%)^ + Team MCQ (5%)^ + Team WAQ (10%)^^

• ^ Hand in at the prac.

• ^^ Hand in by due date: … in hand-in box: names, ID numbers, time, day, tutor.


5. Add previous prac’s results

Building a House: Group activity

Roles

• Architect – design, framework, ideas
• Tradesperson – technical, 'expert' in field
• Superintendent – leader, knowledge of different areas
• Decorator – finer details, user-friendliness
• Real estate agent – communication, 'sells the product'
• General contractor – follows direction, able to learn how to perform different roles

Task

Questions:
1. Why did you choose this role?
2. What types of skills / experiences are related to this role?
3. What are the ways in which someone in your role can work with someone from (choose a different role)?
4. How can you relate this activity to working in your BES team?

Sampling:

Population

Sampling frame (list)

Target sample

Actual sample (respondents)

Sources of bias:

• Convenience sampling
• Undercoverage
• Non-response bias
• Voluntary response bias
• Response bias

Note:

• n↑ ≠ ↓bias
• n↑ ⇒ ↓sampling error (error due to randomness)
• Need to improve survey design to ↓ bias
• If ↑ n with a biased design, we are just asking more people the wrong question!
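The note on sample size and bias can be illustrated with a small simulation. This is only a sketch under assumed numbers: the true proportion of 0.5, the response bias of 0.1 and the function name `survey_estimates` are all hypothetical and are not taken from the tutorial.

```python
import random
import statistics

TRUE_P = 0.5          # hypothetical true population proportion
RESPONSE_SHIFT = 0.1  # hypothetical response bias from a badly worded question

def survey_estimates(n, reps=200, seed=1):
    """Run `reps` biased surveys of size n; return (mean, SD) of the estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # Each respondent answers "yes" with the biased probability.
        yes = sum(rng.random() < TRUE_P + RESPONSE_SHIFT for _ in range(n))
        estimates.append(yes / n)
    return statistics.mean(estimates), statistics.stdev(estimates)

for n in (50, 2000):
    centre, spread = survey_estimates(n)
    # The spread (sampling error) shrinks as n grows, but the centre of the
    # estimates stays near 0.6 rather than the true value of 0.5.
    print(f"n={n}: mean estimate={centre:.3f}, SD of estimates={spread:.3f}")
```

A larger n tightens the estimates around the wrong value; only a better survey design moves them towards the true one.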

Simpson’s Paradox

2nd E.g.

School     Girls   Boys   Total
School A   273     77     350
School B   289     61     350
Total      562     138    700

Which school had the higher proportion of girls?

School     % girls
School A   78%
School B   83%

School B has the higher proportion of girls overall.

School A   Girls   Boys   Total
Yr 11      81      6      87
Yr 12      192     71     263
Total      273     77     350

School B   Girls   Boys   Total
Yr 11      234     36     270
Yr 12      55      25     80
Total      289     61     350

Percentage of girls by school broken into year levels

School     Yr 11   Yr 12
School A   93%     73%
School B   87%     69%

School A has the higher proportion of girls in each year level.

So something must be going on with the year levels when we add them up to get the earlier overall results.

Percentage of girls in each year level (both schools combined)

Year level   % girls
Yr 11        88%
Yr 12        72%

% Yr 11 in each school

School     % Yr 11
School A   25%
School B   77%

So the proportion of girls is exaggerated in School B, because:
* Year 11 students are more likely to be girls, and
* School B has a higher proportion of Year 11 students.
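The reversal can be checked directly from the counts in the school tables. The sketch below is illustrative only; the dictionary layout and the helper name `pct_girls` are assumptions made for this example.

```python
# Counts (girls, boys) copied from the School A and School B tables.
counts = {
    "School A": {"Yr 11": (81, 6), "Yr 12": (192, 71)},
    "School B": {"Yr 11": (234, 36), "Yr 12": (55, 25)},
}

def pct_girls(girls, boys):
    """Percentage of girls, rounded to the nearest whole percent."""
    return round(100 * girls / (girls + boys))

overall = {}
by_year = {}
for school, years in counts.items():
    girls = sum(g for g, _ in years.values())
    boys = sum(b for _, b in years.values())
    overall[school] = pct_girls(girls, boys)          # year levels summed
    by_year[school] = {yr: pct_girls(g, b) for yr, (g, b) in years.items()}

print("Overall % girls:", overall)
print("% girls by year level:", by_year)
```

School B leads once the year levels are summed, yet School A leads within each year level, which is the paradox.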

[Diagram: girls/boys by year level (Yr 11, Yr 12) for School A and School B; labels: Characteristic, Category summed, Group]


Displaying and Describing Quantitative Data

3rd Note


• Construct a box-and-whisker plot for the following data: 3, 8, 1, 5, 3, -2, 3

• Solution:
  • Ordered: -2, 1, 3, 3, 3, 5, 8
  • Median: 3
  • Q1: 2
  • Q3: 4
  • IQR: 4 – 2 = 2
  • 1.5 * IQR = 3
  • LF (lower fence) = Q1 – 3 = -1
  • UF (upper fence) = Q3 + 3 = 7
  • So, whiskers at 1 and 5; outliers are -2 and 8
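The same steps can be sketched in code. Quartile conventions differ between textbooks and software; this sketch assumes the median is included in both halves when n is odd, since that convention reproduces Q1 = 2 and Q3 = 4 from the solution. The function name `box_plot_summary` is hypothetical.

```python
import statistics

def box_plot_summary(data):
    """Five-number-style summary with fences, whiskers and outliers."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2  # assumption: median is counted in both halves if n is odd
    q1 = statistics.median(xs[:half])
    q3 = statistics.median(xs[n - half:])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    return {
        "median": statistics.median(xs),
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        "whiskers": (min(inside), max(inside)),  # most extreme non-outliers
        "outliers": [x for x in xs if x < lower_fence or x > upper_fence],
    }

summary = box_plot_summary([3, 8, 1, 5, 3, -2, 3])
print(summary)
```

Other packages may report slightly different quartiles for the same data because they interpolate differently.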

3rd E.g.

Interpretation of slope coefficient

Clip: http://www.youtube.com/watch?v=BgCoGYXwD4w&list=UUZFQ2rSVMR2ahKAzBto5P7w

4th Note

Correlation and Linear Regression

• The difference between r (correlation coefficient) and R² (the coefficient of determination)…

• The difference between interpreting r and commenting on a scatter plot…

• Question – True or false? Two variables which are strongly related will always have a high correlation coefficient. Explain…

• Is this point unusual? What to do…
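One way to explore the true/false question is to compute r for a relationship that is strong but not linear. The data below are hypothetical (a perfect line and a perfect parabola), and `pearson_r` is a hand-written helper for illustration, not a library call.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient from the definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]  # perfectly linear relationship
curved = [x ** 2 for x in xs]     # perfectly determined by x, but not linear

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)
print("linear: r =", r_linear, " R^2 =", r_linear ** 2)
print("curved: r =", r_curved)
```

Because r only measures linear association, the curved data give an r of essentially zero even though y is completely determined by x. In simple regression, R² is the square of r.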

4th E.g.

Probability and Expected Values

E.g.

Be aware of the following:

* Var[X + c] ≠ Var[X] + c (adding a constant does not change the variance: Var[X + c] = Var[X])

* SD[X + Y] ≠ SD[X] + SD[Y]; instead, for independent X and Y, $SD[X + Y] = \sqrt{Var[X] + Var[Y]}$

* where X, Y are random variables, c is a constant.

* Note the two tests for independence…

* Interpretation of expected value: we expect ….(include units)… in the long run, on average.
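The variance and standard deviation rules can be checked exactly on a small example. The sketch assumes two independent fair six-sided dice, which is a hypothetical example chosen for illustration; the helper names are also assumptions.

```python
import math
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # outcomes of one fair die, all equally likely

def mean(values):
    values = list(values)
    return Fraction(sum(values), len(values))

def variance(values):
    """Population variance of equally likely outcomes (exact fractions)."""
    values = list(values)
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

var_x = variance(faces)
var_shifted = variance(x + 10 for x in faces)               # Var[X + c]
var_sum = variance(x + y for x, y in product(faces, faces))  # Var[X + Y]

sd_x = math.sqrt(var_x)
sd_sum = math.sqrt(var_sum)

print("Var[X] =", var_x, " Var[X + 10] =", var_shifted)
print("Var[X + Y] =", var_sum, " Var[X] + Var[Y] =", 2 * var_x)
print("SD[X + Y] =", sd_sum, " SD[X] + SD[Y] =", 2 * sd_x)
```

Variances of independent variables add; standard deviations do not.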

5th


Probability and Expected Values

E.g.

Questions:

1. Find the formula for P(A or B) if A and B are: independent; not independent.

2. Find the formula for P(A and B) if A and B are: disjoint; not disjoint.

3. Consider disjoint events A and B, which both have non-zero probabilities. Can A and B ever be independent? Explain in words or using formulae.

4. Complete the following: E[aX + bY + c]; Var[aX + bY + c], where a, b are constants, and X, Y are independent random variables

5th

Probability and Expected Values

E.g.

Consider a single trial with two outcomes, success (which we will represent by a 1) or failure (0).

Let the probability of success be p.

a) What is the probability of failure? Hint: you need to make sure the probability model is valid.
b) Write down the formula for calculating the expected value.
c) Use this to work out E(y) in terms of p.
d) Write down the formula for calculating variance.
e) Use this to show Var(y) = p(1-p).

y       0   1
Pr(y)   ?   p

5th

Solutions
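One possible line of working for parts (a) to (e) is sketched below. It uses the standard definitions of expected value and variance for a discrete random variable; the original handwritten solution is not reproduced here and may be set out differently.

```latex
\begin{align*}
P(y = 0) &= 1 - p \quad \text{(probabilities must sum to 1)} \\
E(y) &= \sum_y y \, P(y) = 0 \cdot (1 - p) + 1 \cdot p = p \\
Var(y) &= \sum_y \bigl(y - E(y)\bigr)^2 P(y)
        = (0 - p)^2 (1 - p) + (1 - p)^2 \, p \\
       &= p(1 - p)\,\bigl[p + (1 - p)\bigr] = p(1 - p)
\end{align*}
```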

Normal and sampling distributions

• The four types of normal probability questions:
  1. P(X < A)
  2. P(A < X < B) = P(X < B) – P(X < A)
  3. P(X > B) = 1 – P(X < B); for the standard normal Z, also P(Z > b) = P(Z < –b)
  4. Given the probability, what are the boundaries?

Proportions

Shape         Model: Normal
Centre        Mean:
Spread        Variance:
Assumptions   1. 2.
Conditions    1. 2. 3.

Means

Shape         Model: Normal
Centre        Mean:
Spread        Variance:
Assumptions   1. 2.
Conditions    1. 2. 3.

http://www.youtube.com/watch?v=ddBdqqtXiao&feature=c4-overview&list=UUZFQ2rSVMR2ahKAzBto5P7w

6th Note

Note: P(X > B) is found as 1 – P(X < B) because Z tables only give "<" probabilities.

Normal distribution

6th E.g.

The length, X cm, of members of a certain species of fish is normally distributed with mean 40 and standard deviation 5.

a. Find the probability that a fish is longer than 45 cm.

b. Find the probability that a fish is between 35 cm and 50 cm long.

c. Describe the longest 10% of this species of fish.

Solutions
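The three parts can be checked numerically with Python's standard library instead of Z tables. This is a sketch for checking answers, not the original worked solution; the variable names are assumptions.

```python
from statistics import NormalDist

length = NormalDist(mu=40, sigma=5)  # fish length in cm, as given in the question

# a. P(X > 45) = 1 - P(X < 45), the same complement step used with Z tables
p_longer_45 = 1 - length.cdf(45)

# b. P(35 < X < 50) = P(X < 50) - P(X < 35)
p_between = length.cdf(50) - length.cdf(35)

# c. The longest 10% lie above the 90th percentile
cutoff_top_10 = length.inv_cdf(0.90)

print(f"a. {p_longer_45:.4f}")
print(f"b. {p_between:.4f}")
print(f"c. longer than {cutoff_top_10:.1f} cm")
```

The results are roughly 0.159, 0.819 and 46.4 cm, so the longest 10% of fish are longer than about 46.4 cm.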

Confidence intervals and hypothesis tests

Proportions

• Confidence interval for a proportion: $\hat{p} \pm z^* \sqrt{\hat{p}(1 - \hat{p})/n}$

• Remember to check conditions

• Interpretation: we are 95% confident the population proportion lies between [lower bound] and [upper bound]

• Sample size for a desired margin of error ME: $n = (z^*)^2 \, \hat{p}(1 - \hat{p}) / ME^2$

7th Note

CI   90%     95%    99%
z    1.645   1.96   2.576
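A short sketch of the confidence interval calculation for a proportion follows. It uses the usual one-proportion z-interval, and for data it borrows the 93 supporters out of 150 respondents from the hypothesis test example later in this tutorial. The function name `proportion_ci` is an assumption.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """One-proportion z-interval: p-hat +/- z* x sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = successes / n
    # Critical value, e.g. about 1.96 for 95% confidence.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(93, 150)
print(f"95% CI for the population proportion: ({low:.3f}, {high:.3f})")
```

The interval is read as: we are 95% confident the population proportion lies between the two bounds, provided the conditions are met.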

Confidence intervals and hypothesis tests

Means

• CI: $\bar{y} \pm t^*_{n-1} \, s/\sqrt{n}$, where s = sample standard deviation and t has df = n – 1

• Remember to check conditions

• Similar interpretation…

7th Note

Demo – finding t from tables

Confidence intervals and hypothesis tests

Hypothesis tests of one proportion

• Hypothesis test: one-tailed (< or >) or two-tailed
• Conditions
• State model (z or t)
• Standardised statistic
• P-value (or… learn other way this week, ‘critical value’ approach)
• Conclusion

7th Note

Hypothesis test: 1 proportion

Historically, 53% of the population supported the ruling political party. A recent survey, in which the 150 respondents were selected randomly, showed that 93 of them supported the party. A two-tailed z-test at the 0.05 level of significance is to be used to determine whether or not the population proportion has significantly changed.

a. State the null hypothesis and the alternative hypothesis.
b. Check the conditions that justify inference in this context.
c. Determine whether or not the null hypothesis should be rejected, and make a conclusion based on your finding.

7th E.g.

Handwritten solution
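The handwritten solution is not reproduced in this transcript. As a rough check, the calculation can be sketched as follows, assuming H0: p = 0.53 against HA: p ≠ 0.53 and the common textbook success/failure condition of at least 10 expected successes and failures (that threshold is an assumption here).

```python
import math
from statistics import NormalDist

p0 = 0.53       # hypothesised population proportion under H0
n = 150         # respondents
successes = 93  # supporters in the sample
alpha = 0.05    # significance level

# Success/failure condition (threshold of 10 is an assumed convention).
conditions_ok = n * p0 >= 10 and n * (1 - p0) >= 10

p_hat = successes / n
se = math.sqrt(p0 * (1 - p0) / n)  # standard deviation of p-hat under H0
z = (p_hat - p0) / se              # standardised statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed
reject_h0 = p_value < alpha

print(f"conditions ok: {conditions_ok}")
print(f"p-hat = {p_hat:.2f}, z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

With z of about 2.21 and a p-value of about 0.027, H0 is rejected at the 0.05 level: there is evidence that the population proportion has changed.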


Inference so far… reviewing the p-value

8th Note


Inference so far…

hypothesis tests for counts

8th Note


Hypothesis test: 1 mean

• Previous research has shown that the average IQ of Australians was 110. In 2012, a random sample of 40 Australians revealed an average IQ of 100 with standard deviation 15. The researcher wants to test, at a 1% level of significance, whether the average IQ of Australians has indeed decreased.

• (Fictional data)

E.g.

Handwritten solution

8th
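The handwritten solution is not included in this transcript. A sketch of the one-tailed t-test calculation is given below, assuming H0: μ = 110 against HA: μ < 110. Python's standard library has no t distribution, so the critical value is entered by hand; about -2.426 for df = 39 at the 1% level is an assumed table value and should be checked against your own tables.

```python
import math

mu0 = 110          # hypothesised population mean under H0
n = 40             # sample size
sample_mean = 100  # sample average IQ
s = 15             # sample standard deviation
t_critical = -2.426  # assumed table value: df = 39, one-tailed, 1% level

df = n - 1
se = s / math.sqrt(n)             # standard error of the sample mean
t_stat = (sample_mean - mu0) / se
reject_h0 = t_stat < t_critical   # lower-tailed test

print(f"df = {df}, SE = {se:.3f}, t = {t_stat:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

The test statistic of about -4.22 is well below the critical value, so on these (fictional) data H0 is rejected at the 1% level.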

Excel Output

9th Note

Inference in regression

9th Note

Inference in regression

We are estimating the relationship between bwght (birth weight of newborn baby in pounds) and cigs (packets of cigarettes smoked per week by mother prior to birth). Consider the Excel output below and answer the following questions.

Regression Statistics

Multiple R          -0.1507
R Square            0.0227
Adjusted R Square   0.022
Standard Error      1.258
Observations        1388

ANOVA

             df     SS           MS           F       Significance F
Regression   1      51.0172632   51.0172632   32.24   0
Residual     1386   2193.55977   1.58265495
Total        1387   2244.57703   1.61829634

            Coefficients   S. Error    t stat   P-value   Lower 95%    Upper 95%
Intercept   7.485744       0.0357713   209.27   0         7.415572     7.555915
cigs        -0.0321108     0.0056557   -5.68    0         -0.0432054   -0.0210161

9th E.g.
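Several entries in the Excel output can be recomputed from one another, which helps when reading such tables. The sketch below uses only numbers printed in the output; the variable names are assumptions.

```python
import math

# Values copied from the ANOVA and coefficient tables.
ss_regression = 51.0172632
ss_residual = 2193.55977
df_regression = 1
df_residual = 1386
slope = -0.0321108
slope_se = 0.0056557

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total     # R Square
ms_residual = ss_residual / df_residual  # MS for Residual
standard_error = math.sqrt(ms_residual)  # "Standard Error" of the regression
f_stat = (ss_regression / df_regression) / ms_residual
t_slope = slope / slope_se               # t stat for cigs
# In simple regression r carries the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)

print(f"R^2 = {r_squared:.4f}, r = {r:.4f}")
print(f"Standard Error = {standard_error:.3f}")
print(f"F = {f_stat:.2f}, t = {t_slope:.2f}, t^2 = {t_slope ** 2:.2f}")
```

With a single explanatory variable, the F statistic equals the square of the slope's t statistic, so the two tests agree.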

a. Which do you think is the explanatory variable and which is the response variable?
b. Write down and interpret the correlation coefficient.
c. Write down and interpret R² (the coefficient of determination).
d. Interpret the slope and the intercept.
e. Are the signs and sizes of the slope and intercept reasonable? Explain.
f. Write down and interpret the 95% confidence interval for the slope.
g. Do the same for the 90% confidence interval. Explain how this differs from the 95% confidence interval.
h. Formulate a null and alternative hypothesis for the slope, using economic or general theory.
i. Conduct this hypothesis test using a 5% level of significance and make a conclusion.
j. Test whether the slope is significantly different from -0.05 at a 1% level of significance.
k. Suppose a hypothesis test for the slope had hypotheses H0: β1 = 0, and HA: β1 ≠ 0. Explain the purpose of conducting this test in terms of assessing whether the current regression model should be used.

9th E.g.
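For parts f, g and j, the arithmetic can be sketched as below, using the slope and its standard error from the Excel output. One simplification is assumed: with 1386 degrees of freedom the t distribution is very close to the standard normal, so normal critical values stand in for t values. The names `slope_ci` and `hypothesised` are hypothetical.

```python
from statistics import NormalDist

slope = -0.0321108    # coefficient on cigs
slope_se = 0.0056557  # its standard error

def slope_ci(confidence):
    """Approximate CI for the slope using a normal critical value (assumption)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * slope_se
    return slope - margin, slope + margin

ci_95 = slope_ci(0.95)
ci_90 = slope_ci(0.90)  # narrower: less confidence, smaller critical value

# Part j: H0: beta1 = -0.05 against HA: beta1 != -0.05 at the 1% level.
hypothesised = -0.05
test_stat = (slope - hypothesised) / slope_se
p_value = 2 * (1 - NormalDist().cdf(abs(test_stat)))
reject_at_1_percent = p_value < 0.01

print("95% CI:", ci_95)
print("90% CI:", ci_90)
print(f"test statistic = {test_stat:.2f}, p-value = {p_value:.4f}")
```

Note that the test statistic for part j subtracts the hypothesised value of -0.05, so it differs from the t stat printed by Excel, which tests against zero.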

Notation - recap:

• μ – Population mean
• $\bar{y}$ – Sample mean
• σ – Population standard deviation (variability of individual observations)
• s – Sample standard deviation
• $SD(\bar{y}) = \sigma/\sqrt{n}$ (or $SE(\bar{y}) = s/\sqrt{n}$ for estimate) – Standard deviation of sample means
• n – Sample size
• N – Population size
• P – Population proportion
• $\hat{p}$ – Sample proportion
• p-value – See definition…
• b0, b1 – Sample coefficient on intercept/slope in regression
• β0, β1 – Population coefficient on intercept/slope in regression

10th Note

Multiple Linear Regression; Dummy Variables; Time Series – some things to note

Multiple linear regression

• Interpretation of slope coefficient: we estimate that for every [one unit] increase in [explanatory variable], the [response variable] [increases/decreases] by [… units], on average, holding all other explanatory variables fixed.

• Inference on the whole equation
  • H0: β1 = β2 = … = 0
    (no linear relationship between Y and X1, X2, …)
  • HA: β1 ≠ 0 and/or β2 ≠ 0
    (at least one of the slopes is significant; there is a significant relationship between the response variable and the explanatory variables as a group)
  • Use p-value from Excel “Significance-F”

10th Note

Dummy variables

• Interpretation of dummy variables… see example.
• The dummy variable trap…
• Testing the significance of a dummy variable is the same as testing whether there is a significant difference between the means of the two categories.
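The link between a dummy variable and a difference in means can be seen on a tiny example. The incomes below are made up for illustration, and the regression has the dummy as its only explanatory variable.

```python
import statistics

# Hypothetical incomes (in $000) for the two categories of a dummy variable.
group0 = [50, 55, 60, 65]  # dummy = 0
group1 = [58, 62, 70, 74]  # dummy = 1

x = [0] * len(group0) + [1] * len(group1)
y = group0 + group1

# Least-squares slope and intercept for y = b0 + b1 * dummy.
x_bar = statistics.mean(x)
y_bar = statistics.mean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

print("intercept =", intercept, " (mean of dummy = 0 group:", statistics.mean(group0), ")")
print("slope =", slope, " (difference in group means:",
      statistics.mean(group1) - statistics.mean(group0), ")")
```

The intercept is the mean of the category coded 0 and the slope is the difference between the two category means, which is why testing the dummy's coefficient amounts to testing that difference.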

Time Series

• Interpretation of trend line, trend = a + bt

• Trend is [a units] at [origin] and [increases / decreases] by [b units] each [time period, t].

10th Note

Components of a classical time series model:

• Trend
• Cyclical
• Seasonal
• Irregular

Dummy Variables

1. Consider the following equation:

   Income = β0 + β1 experience + β2 gender + ε

   where gender = 1 if male, 0 if female.

   a. State what you expect the sign of β1 and β2 to be. Explain why.
   b. Interpret the following:
      i. The slope coefficient on gender.
      ii. The slope coefficient on experience.
   c. Redefine gender to be 1 if female, 0 if male. What happens to β2?

2. Suppose that we want to examine the level of crime in different regions of Adelaide: north, south, east and west. In other words, in our regression model, crime level is the response variable, and region is the explanatory variable. Create a dummy variable for the region.

Solutions – for 2

10th E.g.
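One possible coding for question 2 is sketched below; the original solution slide is not reproduced in this transcript. Four categories need only three dummies, because including a fourth would fall into the dummy variable trap. The choice of "north" as the baseline and the names `is_south`, `is_east`, `is_west` are arbitrary assumptions.

```python
REGIONS = ["north", "south", "east", "west"]
BASELINE = "north"  # hypothetical reference category (all dummies = 0)

def region_dummies(region):
    """Return the three 0/1 dummy variables for one observation's region."""
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region}")
    return {f"is_{r}": int(region == r) for r in REGIONS if r != BASELINE}

for region in REGIONS:
    print(region, region_dummies(region))
```

Each dummy's coefficient is then interpreted relative to the baseline region.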

Time Series and Price Indices

• Price relative = $100 \times p_t / p_0$

• Be careful about the difference between a percentage increase and a percentage point increase.

Year          Base year   A   B
Price index   100         a   b

Assume a, b > 100

• Interpretation: a price index of a means prices are (a – 100)% higher in Year A than in the base year / there has been an (a – 100)% increase.

• The increase in the index number from Year A to Year B is (b – a) percentage points, or $\frac{b - a}{a} \times 100$ %.

• Note: you could do the same using prices, instead of price indices.

• Interpretation of average price relatives: on average, the price of the … goods increased by …% between … and … (*)

• Could do the same for expenditure … … but of little use.

• Same interpretation, but instead of “price” use “cost”.

11th Note
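The difference between a percentage point change and a percentage change in an index is easy to see with numbers. The index values below (120 and 150) are hypothetical.

```python
a = 120.0  # hypothetical price index in Year A (base year = 100)
b = 150.0  # hypothetical price index in Year B

rise_since_base_pct = a - 100       # prices are this many % above the base year
point_change = b - a                # change in the index, in percentage points
percent_change = (b - a) / a * 100  # percentage change from Year A to Year B

print(f"Year A prices are {rise_since_base_pct:.0f}% above the base year")
print(f"Index rose by {point_change:.0f} percentage points from A to B")
print(f"Prices rose by {percent_change:.0f}% from A to B")
```

Here the index rises by 30 percentage points, but prices rise by 25%, because the percentage change is measured relative to Year A rather than the base year.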

Time Series and Price Indices

• Laspeyres Price Index = $\frac{\sum p_t q_0}{\sum p_0 q_0} \times 100$. This is the increase in the cost of the time 0 basket of goods in time t relative to what they cost in time 0.

• Paasche Price Index = $\frac{\sum p_t q_t}{\sum p_0 q_t} \times 100$ = $\frac{\sum p_{2010} q_{2010}}{\sum p_{2008} q_{2010}} \times 100$ (with t = 2010 and base year 2008). This is the increase in the cost of the time t basket of goods in 2010 relative to what they would have cost in 2008.

• Same interpretation as (*)

• Note:
  • Why the Laspeyres and Paasche Indices differ.
  • How to shift the base, and chain series.
  • Nominal = in current prices. Real = in constant (base year) prices.
  • Real prices = $\frac{\text{nominal price}}{\text{price index}} \times 100$ (if price index base = 100)
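A small numerical sketch may help show why the two indices differ. The two-item basket, its prices and quantities are all hypothetical, as are the helper names `cost` and `real_value`.

```python
# Hypothetical basket: item -> (price, quantity) in each period.
base = {"bread": (2.00, 100), "milk": (1.00, 200)}    # time 0
current = {"bread": (3.00, 80), "milk": (1.10, 220)}  # time t

def cost(prices, quantities):
    """Cost of the `quantities` basket valued at the `prices` period's prices."""
    return sum(prices[item][0] * quantities[item][1] for item in prices)

# Laspeyres uses the time 0 basket; Paasche uses the time t basket.
laspeyres = 100 * cost(current, base) / cost(base, base)
paasche = 100 * cost(current, current) / cost(base, current)

def real_value(nominal, price_index):
    """Convert a nominal value to base-year prices (index base = 100)."""
    return nominal / price_index * 100

print(f"Laspeyres = {laspeyres:.1f}, Paasche = {paasche:.1f}")
print(f"Real value of $130 at index 130: ${real_value(130, 130):.0f}")
```

In this example shoppers buy less bread after its price rises sharply, so the fixed time 0 basket (Laspeyres) shows a larger increase than the time t basket (Paasche).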

11th Note

Time Series and Price Indices

Discussion question – what are the limitations of the CPI?

• Overestimates the price index because it is a type of Laspeyres index
• What items are included in the goods basket? (Can’t include all of them!)
• Only surveys metropolitan households
• Data taken from survey – potential sources of sampling bias
• Does not account for change in quality in goods with same / lower price (e.g. computers)
• How do you include new technology that didn’t exist in the previous period?
• What prices do you take? CPI doesn’t take into account sales / specials

11th Note