

Intro to Statistics for the Social Sciences

Spring, 2016, Dr. Suzanne Delaney Homework #27

You have been hired as a statistical consultant by Donald, a used car dealer, to help him understand his business better. You will complete six types of analyses for Donald: confidence intervals, a t-test, a one-way ANOVA, correlation, simple regression, and multiple regression. The database can be found on our class website: “Donald’s used car data”

Finding “Descriptive Statistics”

First we will focus on the mileage (miles on odometer) data. Use the data analysis package in Excel to find the “Descriptive Statistics” for Mileage.

Step 1: Open database
Step 2: Open ‘Data Analysis’ menu
Step 3: Choose “Descriptive Statistics”

Be sure to click the box for “Summary statistics”

The average mileage = _______________
The standard deviation for mileage = _______________
The highest (max) mileage = _______________
The lowest (min) mileage = _______________
The number of cars (count) = _______________
The standard error of the mean for mileage = _______________

Calculating “Confidence Intervals”

Find the standard error of the mean by dividing the standard deviation by the square root of the number of cars:

Show your work here

Did you find the same value as what is listed for Standard Error? _______
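If you want to double-check the Standard Error by hand, the same arithmetic can be sketched in Python. This is a minimal sketch, not part of the Excel assignment: the function name `describe` and the mileage values are hypothetical (they are not Donald’s data), and it assumes Excel’s “Standard Deviation” is the sample standard deviation (dividing by n - 1), which is what `statistics.stdev` computes.

```python
import math
import statistics

def describe(values):
    """Summary statistics similar to Excel's Descriptive Statistics table."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return {
        "mean": statistics.mean(values),
        "sd": sd,
        "max": max(values),
        "min": min(values),
        "count": n,
        "sem": sd / math.sqrt(n),  # standard error of the mean = SD / sqrt(N)
    }

# Hypothetical odometer readings, NOT Donald's data
mileage = [10000, 20000, 30000, 40000, 50000]
print(describe(mileage))
```

For these five made-up values the standard deviation is about 15,811 and the standard error is about 7,071 (15,811 divided by the square root of 5).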

Help Donald find the confidence intervals for car mileage. Remember, a confidence interval lets you estimate the mean of the population from the mean of a sample. Because we estimate a range rather than a single number, we are more likely to be correct (even though our answer is a range rather than one specific value).

For this problem you’ll need the following descriptive statistics from above:

The average mileage = _______________
The standard error of the mean for mileage = _______________
Critical z for 95% Confidence interval = _______________ (Same as critical z for a two-tailed test with alpha = .05 – see table to right)
Critical z for 99% Confidence interval = _______________ (Same as critical z for a two-tailed test with alpha = .01 – see table to right)

Step 3: Find the scores that border the middle 95%

$X = \bar{X} \pm z\,\sigma_{\bar{X}}$, also written as: mean ± (z score)(standard error of the mean)

Step 4: Find the scores that border the middle 99%

$X = \bar{X} \pm z\,\sigma_{\bar{X}}$, also written as: mean ± (z score)(standard error of the mean)

95% Confidence Interval lower boundary raw score is _________
95% Confidence Interval upper boundary raw score is _________

(Please input values into drawing on the right.)

99% Confidence Interval lower boundary raw score is _________
99% Confidence Interval upper boundary raw score is _________

(Please input values into drawing on the right.)
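The boundary calculations in Steps 3 and 4 can be checked with a short Python sketch. It assumes the z-based formula used in this worksheet; `critical_z` and `z_confidence_interval` are hypothetical helper names, the summary values are made up, and Python’s `statistics.NormalDist().inv_cdf` stands in for the z table (it gives about 1.96 and 2.58).

```python
from statistics import NormalDist

def critical_z(confidence):
    """Two-tailed critical z for a confidence level such as 0.95."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_confidence_interval(mean, sem, confidence):
    """Lower and upper boundaries: mean - z * SEM and mean + z * SEM."""
    margin = critical_z(confidence) * sem
    return mean - margin, mean + margin

# Hypothetical summary values, NOT Donald's data
sample_mean, sample_sem = 30000.0, 2000.0
for level in (0.95, 0.99):
    low, high = z_confidence_interval(sample_mean, sample_sem, level)
    print(f"{level:.0%} CI: {low:,.0f} to {high:,.0f}")
```

Notice that the 99% interval is wider than the 95% interval: being more confident requires guessing a wider range.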


Name: _____________________________

Lab Session: __________

CID Number: _________


Creating a Histogram

Donald wants to see the actual shape of the distribution of his data, so let’s create a histogram and print it out for him. Again we will use the mileage (miles on the odometer) data.

Step 1: Open database - Choose tab on bottom called: “Mileage and Bins”

Step 2: Open ‘Data Analysis’ menu
Step 3: Choose “Histogram”

For the “Input Range” select all of the data for “Mileage”
For the “Bin Range” select the data for “Mileage Bin”
Be sure to click the box for “Chart Output”

Step 4: Clean up your “Histogram” chart
Delete the label that reads “Frequency” (click and delete)
Delete the last row in the table “More 0”
Select and then right-click on the histogram bars
Choose “Format Data Series” – set “Gap Width” to zero

Step 5: Adjust the labels and print the graph. It should look like this:
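Behind the chart, Excel’s Histogram tool is counting how many cars fall in each mileage bin. Here is a minimal Python sketch of that counting, assuming Excel’s convention that a value goes into the first bin whose boundary is greater than or equal to it (the same rule as Excel’s FREQUENCY function), with anything above the last boundary going to “More”. The function name and the data are hypothetical.

```python
from bisect import bisect_left

def histogram_counts(values, bin_boundaries):
    """Frequency per bin; the extra final slot plays the role of Excel's 'More' row."""
    bins = sorted(bin_boundaries)
    counts = [0] * (len(bins) + 1)
    for value in values:
        # Index of the first boundary >= value (len(bins) if above every boundary)
        counts[bisect_left(bins, value)] += 1
    return counts

# Hypothetical mileage values and bin boundaries, NOT Donald's data
mileage = [5000, 12000, 20000, 20001, 38000, 61000]
mileage_bins = [10000, 20000, 30000, 40000, 50000]
print(histogram_counts(mileage, mileage_bins))  # [1, 2, 1, 1, 0, 1]
```

A value exactly on a boundary (20,000 here) is counted in that boundary’s bin. When the last slot is 0, it corresponds to the “More 0” row that Step 4 tells you to delete.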


Completing a “t-test hypothesis test”

Donald wants to know whether car price is affected by the number of doors a car has, so he compares car price across the number of doors. There are only two choices (2-door and 4-door), so there are only two levels of the independent variable. You therefore decide to complete a t-test with an alpha of 0.05. {tea for two and two for tea}

Independent variable (IV): _______________________

Number of levels of IV (what are they?): ___ ___________________

Quasi or True experiment: _______________________
Dependent variable: _______________________
Level of measurement of DV: _______________________
Between or within participant design: _______________________
One or Two-tailed test: _______________________

Step 1: Open database - Choose tab on bottom called: “t-test doors & price”

Step 2: Open ‘Data Analysis’ menu - Choose “t-Test: Two-Sample Assuming Equal Variances”
Be sure the data are sorted by “Doors” so that all of the 2-door cars are listed before all of the 4-door cars.
(Careful, this next bit can be tricky: be sure that you select the data for “Price”, because price is our DV.)
For the “Variable 1 Range” select the data for “Price” but just for those cars with 2 doors (should be about 190 cars)
For the “Variable 2 Range” select the data for “Price” but just for those cars with 4 doors (should be about 614 cars)
(Notice that you are entering 2-door first, then 4-door, so 2-door will appear first on the Excel output.)

Step 3: Interpret output
Average price for 2-door cars & 4-door cars: ________ _________
State the alpha level: _______________________

Value for the observed t-statistic (called t Stat): _______________________

Value for the critical t-statistic: _______________________
Value for the degrees of freedom: _______________________
What is the p value: _______________________

Was it a significant difference: yes no

Should he reject the null hypothesis? yes no

Should he report the p < 0.05? yes no

Report finding in proper form: _______________________________________________________________ ____________________________________________________________________________________________
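The t Stat and degrees of freedom in Excel’s “t-Test: Two-Sample Assuming Equal Variances” output come from the pooled-variance formula. Below is a minimal Python sketch of that formula, assuming equal variances as the Excel tool does; the function name and the prices (in thousands of dollars) are hypothetical, and the critical t still comes from your t table.

```python
import math
import statistics

def pooled_t_test(group1, group2):
    """Independent-samples t-test assuming equal variances.

    Returns (t statistic, degrees of freedom, pooled variance).
    """
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its own degrees of freedom
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the difference
    t_stat = (statistics.mean(group1) - statistics.mean(group2)) / se_diff
    return t_stat, df, pooled_var

# Hypothetical prices in $1,000s, NOT Donald's data (2-door entered first, as in Step 2)
two_door = [10, 12, 14]
four_door = [14, 16, 18, 20]
t_stat, df, pooled_var = pooled_t_test(two_door, four_door)
print(f"t = {t_stat:.3f}, df = {df}, pooled variance = {pooled_var:.2f}")
```

A negative t just means the first group (2-door) has the lower mean; compare the absolute value of t with the critical t. If SciPy is installed, `scipy.stats.ttest_ind(two_door, four_door, equal_var=True)` should report the same t along with a two-tailed p value.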

Step 4: Let’s draw a bar graph of the two means. It should look like this:


Completing an “ANOVA hypothesis test”

Donald wants to know whether car price is affected by the size of the engine (4, 6, or 8 cylinders), so he compares car price across engine sizes. There are three levels of the independent variable (4, 6, and 8 cylinders), so you decide to complete an ANOVA with an alpha of 0.05.

Independent variable (IV): _______________________

Number of levels of IV (what are they?): ___ ___________________

Quasi or True experiment: _______________________
Dependent variable: _______________________
Level of measurement of DV: _______________________
Between or within participant design: _______________________

Step 1: Open database - Choose tab on bottom called: “Engine size and price”

Step 2: Open ‘Data Analysis’ menu
Step 3: Choose “ANOVA: Single Factor”

We have three columns, one for each level of the independent variable. The data have been rearranged so that Excel can complete the ANOVA. For “Input Range” select all three columns. Some cells will be blank because we have a different number of cars in each category; that’s okay. Remember to include the column labels in your selection and check the “Labels in first row” box.

Step 4: Interpret output
Average price for 4, 6 and 8-cylinder cars: ________ ________ ________
State the alpha level: _______________________
Value for the observed F-statistic: _______________________
Value for the critical F-statistic: _______________________
Degrees of freedom between and within: ____________ ___________
What is the p value: _______________________

Was it a significant difference: yes no

Should he reject the null hypothesis? yes no

Should he report the p < 0.05? yes no

Report finding in proper form: ________________________________________________________ ____________________________________________________________________________________________ ____________________________________________________________________________________________
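The F-statistic and the two degrees of freedom in the “ANOVA: Single Factor” output can be reproduced by hand. This minimal Python sketch computes the between-groups and within-groups sums of squares; the function name and the prices (in thousands of dollars) are hypothetical, and the critical F still comes from your F table.

```python
import statistics

def one_way_anova(*groups):
    """Single-factor ANOVA: returns (F, df between, df within).

    Groups may have different sizes, like the blank cells in the Excel sheet.
    """
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    group_means = [statistics.mean(group) for group in groups]
    # Between-groups sum of squares: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups sum of squares: spread of scores around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical prices in $1,000s, NOT Donald's data
four_cyl = [10, 12, 14]
six_cyl = [14, 16, 18]
eight_cyl = [20, 22, 24]
print(one_way_anova(four_cyl, six_cyl, eight_cyl))
```

For these made-up groups F comes out at 19 with 2 and 6 degrees of freedom: the group means (12, 16, 22) differ a lot relative to the small spread inside each group.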

Step 5: Let’s draw a bar graph of the three means. It should look like this:


Completing a “Correlation”

Donald wants to know whether car price is related to mileage.

Both of these variables are numeric and he is looking for a relationship.

Step 1: Open database - Choose tab on bottom called: “Mileage&Price”

Step 2: Create a scatter plot and label it properly. It should look like this:

Step 3: Open ‘Data Analysis’ and choose “Correlation”

Step 4: Interpret Correlation:
Value of observed r: _______________
Degrees of freedom: _______________
Critical r: _______________

Was it a significant relationship: yes no

Should he reject the null hypothesis? yes no

Should he report the p < 0.05? yes no

Construct a summary using proper formatting _______________________________________________________ ____________________________________________________________________________________________
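The observed r from Excel’s “Correlation” tool is the Pearson correlation coefficient, and its degrees of freedom are N - 2. A minimal Python sketch of that calculation follows; the function name and the mileage and price values are hypothetical, not Donald’s data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r and its degrees of freedom (N - 2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))  # sum of products
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return sp / math.sqrt(ss_x * ss_y), n - 2

# Hypothetical mileage and prices, NOT Donald's data
miles = [10000, 20000, 30000, 40000, 50000]
price = [20000, 18000, 15000, 13000, 9000]
r, df = pearson_r(miles, price)
print(f"r = {r:.3f}, df = {df}")
```

The negative sign says that cars with more miles tend to have lower prices; whether the relationship is significant depends on comparing the absolute value of r with the critical r for those degrees of freedom.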


Completing a “Simple regression”

Donald wants to know whether he can predict price better if he knows how many miles the car has on it.

Step 1: Create a scatterplot using the same data you used for the correlation

Step 2: Find the regression line
Highlight the data points by clicking on one of the dots, then right-click and choose the “Add Trendline” option.
Be sure to click on the “Display Equation on Chart” option.
Also be sure to click on the “Display R-squared value on chart” option.
Clean up the font so that it looks like this:

Step 3: Interpret regression
Degrees of freedom: _______________
Value of correlation coefficient (r): _______________
Value of regression coefficient (b): _______________
Value of y intercept (a): _______________

Cars with more miles on the engine would tend to have a ____________ price. (Higher or lower?)

Cars with fewer miles on the engine would tend to have a ____________ price. (Higher or lower?)

What is the regression equation? Y’ = ____________________________________________________________
Interpret the slope: for each additional mile, what would we predict to happen to price? ____________________________________________________________________________________________
Interpret the y intercept: _____________________________________________________________________________
If a car had 30,000 miles, what would Donald predict the price to be? ______________________________________
What is the r² for this problem? _______________
Please interpret the r²: __________________________________________________________________________ (Hint: “The proportion of total variance of the price of a car…..”)
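The trendline equation and R-squared that Excel displays are the least-squares line $Y' = a + bX$. A minimal Python sketch of the same calculation, including a prediction at 30,000 miles; the function name and the data are hypothetical, not Donald’s.

```python
def simple_regression(x, y):
    """Least-squares line Y' = a + bX; returns (a, b, r squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    b = sp / ss_x                        # slope: change in predicted price per extra mile
    a = mean_y - b * mean_x              # intercept: predicted price at 0 miles
    r_squared = sp ** 2 / (ss_x * ss_y)  # proportion of price variance explained by mileage
    return a, b, r_squared

# Hypothetical mileage and prices, NOT Donald's data
miles = [10000, 20000, 30000, 40000, 50000]
price = [20000, 18000, 15000, 13000, 9000]
a, b, r_squared = simple_regression(miles, price)
print(f"Y' = {a:.0f} + ({b:.2f})X, r^2 = {r_squared:.3f}")
print(f"Predicted price at 30,000 miles: {a + b * 30000:,.0f}")
```

In this made-up data the prediction at 30,000 miles equals the mean price (15,000), because 30,000 happens to be the mean mileage and the regression line always passes through the point (mean X, mean Y).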


Completing a “Multiple regression”

Donald wants to know whether he can predict the price of the cars better if he knows both how many miles the car has on it and how big the engine is.

Step 1: Identify the predicted variable (DV): _______________________________

Identify the two predictor variables (IVs): _____________________ & ____________________

Step 2: Open database - Choose tab on bottom called: “Price, mileage and car size”

Step 3: Create a correlation matrix
Open ‘Data Analysis’ menu and choose “Correlation”
We have three columns; select all three for the “Input Range”


Step 4: Open ‘Data Analysis’ menu and choose “Regression”
We have three columns. Price is first and is our predicted “Y” variable (choose this column for “Input Y Range”).
Mileage and cylinder are next and are our two “X” variables (choose both columns together for “Input X Range”).

Step 5: Interpret regression
What was your regression coefficient for “Intercept”? ___________; Is the p < 0.05? _____________
What was your regression coefficient for “Mileage”? ___________; Is the p < 0.05? _____________
What was your regression coefficient for “Cylinder”? ___________; Is the p < 0.05? _____________
What is your regression equation? ___________________________________________________________

$Y' = a + b_1X_1 + b_2X_2$, or $Y' = a + b_1(\text{mileage}) + b_2(\text{car size})$

Interpreting slopes:
• For each additional mile that the car is driven (as X goes up by 1), the predicted price of the car (Y) will decrease. If we increase mileage by 1 mile and hold the other independent variable constant, we can estimate a decrease of ________ in price.
• For each increase in engine size (from 4 to 6, or 6 to 8 cylinders), the predicted price of the car (Y) will increase. If we increase engine size by 1 full point (so X goes up by 1) and hold the other independent variable constant, we can estimate an increase of ________ in price.

Your output should look like this:
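To see where the “Intercept”, “Mileage”, and “Cylinder” coefficients come from, here is a minimal Python sketch for exactly two predictors. It solves the two least-squares normal equations (in deviation-score form) with Cramer’s rule; Excel’s Regression tool also reports standard errors and p values, which this sketch does not compute. The function name and data are hypothetical: the prices are generated with no noise from a made-up equation, so the fit should recover that equation’s numbers.

```python
def two_predictor_regression(y, x1, x2):
    """Least-squares Y' = a + b1*X1 + b2*X2 for exactly two predictors."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    ss1 = sum((u - m1) ** 2 for u in x1)
    ss2 = sum((v - m2) ** 2 for v in x2)
    sp12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    sp1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    sp2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = ss1 * ss2 - sp12 ** 2  # zero only if X1 and X2 are perfectly correlated
    b1 = (ss2 * sp1y - sp12 * sp2y) / det  # slope for X1, holding X2 constant
    b2 = (ss1 * sp2y - sp12 * sp1y) / det  # slope for X2, holding X1 constant
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2

# Hypothetical data, NOT Donald's: prices generated exactly from
# Y' = 20000 - 0.1(mileage) + 1500(cylinders)
mileage = [10000, 20000, 30000, 40000, 50000, 25000]
cylinders = [4, 6, 8, 4, 6, 8]
price = [20000 - 0.1 * m + 1500 * c for m, c in zip(mileage, cylinders)]
a, b1, b2 = two_predictor_regression(price, mileage, cylinders)
print(f"Y' = {a:.0f} + ({b1:.2f})(mileage) + ({b2:.0f})(cylinders)")
```

With real data the points will not fall exactly on a plane, so the coefficients are a best fit rather than an exact recovery.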
