i.6 statistical tests

Upload: intal-xd

Post on 07-Apr-2018

TRANSCRIPT

  • 8/6/2019 I.6 Statistical Tests

    1/29

    I.6 The Nature of Statistical Testing

    Definitions:

    Statistic: statement of numerical information about a sample

    Parameter: statement of numerical information about a population

    Remember: When a statistic is put forth to represent a population, there is some ERROR associated with that figure.

    Error can be described by the probability that one chose a good sample

    We will use:

    Normal distribution or sampling distribution
    Standard deviation (standard error of the mean)

    Definition:

    Point Estimate: a single number, based on sample data, used to estimate a population parameter.


    It's great if the standard deviation is small, but HOW SMALL?

    We use an interval estimate:

    An interval within which we might state the parameter probably lies

    Based on the sample data

    A confidence interval is a specific interval estimate of a parameter determined by using data obtained from a sample and by using the confidence level of the estimate

    The probability associated with a specific interval is called the confidence level or confidence coefficient

    "Confidence" because the probability is viewed as an indicator that the parameter lies within the interval

    The higher the probability, the more certain we are the method will produce an interval containing the parameter

    BUT! Confidence levels are specified BEFORE the interval estimate is made


    So for a fixed confidence level (say 0.95), how does one find a confidence interval to estimate the population mean?

    Take 95/2 = 47.5. The z-value that puts 47.5% of the area between the mean and z is 1.96.

    If \( \bar{X} \) is our (single) sample mean, we would expect the interval

    \[ (\bar{X} - 1.96\,\sigma_{\bar{X}},\ \bar{X} + 1.96\,\sigma_{\bar{X}}) \]

    to contain the parameter with confidence level .95.

    OR

    With a confidence level of .95, the parameter \( \mu \) satisfies

    \[ \bar{X} - 1.96\,\sigma_{\bar{X}} < \mu < \bar{X} + 1.96\,\sigma_{\bar{X}} \]

    The more confidence desired, the greater the allowance for sampling error.

    This means less precision for the estimate.


    Example:

    Suppose a company wants to know an interval estimate at the .95 confidence level for the mean time it takes a machine to produce its product.

    A sample of 100 shows an average time of 6 minutes per item.

    Assuming \( \sigma = 1.5 \) minutes, we need to find \( \sigma_{\bar{X}} \).

    Now come the choices:

    We must assume the population is infinite (we weren't told this).

    This gives us:

    \[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{1.5}{\sqrt{100}} = 0.15 \]

    And so we have:

    \[ \bar{X} - 1.96\,\sigma_{\bar{X}} < \mu < \bar{X} + 1.96\,\sigma_{\bar{X}} \]
    \[ 6 - 1.96(0.15) < \mu < 6 + 1.96(0.15) \]
    \[ 5.706 < \mu < 6.294 \]

    With a confidence level of .95, the mean is in this region.
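The machine example can be checked with a few lines of Python. This is only a sketch: the function name `mean_confidence_interval` is made up for illustration, and it assumes an infinite population and a known population standard deviation, as the example does.

```python
import math

def mean_confidence_interval(sample_mean, sigma, n, z=1.96):
    """Return (lower, upper) bounds for the population mean.

    Assumes an infinite population, so the standard error is sigma / sqrt(n).
    """
    standard_error = sigma / math.sqrt(n)  # sigma_xbar in the slides
    margin = z * standard_error            # allowance for sampling error
    return sample_mean - margin, sample_mean + margin

# Machine example: n = 100, sample mean 6 minutes, sigma = 1.5 minutes
low, high = mean_confidence_interval(6.0, 1.5, 100)
print(round(low, 3), round(high, 3))
```

Passing a different `z` (for instance 1.65) gives the interval for another confidence level.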


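    Suppose a sample of 100 has an average of 5.8 minutes per item.

    To get a confidence level of .95 we'd expect the following:

    \[ \bar{X} - 1.96\,\sigma_{\bar{X}} < \mu < \bar{X} + 1.96\,\sigma_{\bar{X}} \]
    \[ 5.8 - 1.96(0.15) < \mu < 5.8 + 1.96(0.15) \]
    \[ 5.506 < \mu < 6.094 \]

    Suppose we wanted a confidence level of .9. Then we'd need the corresponding z-value.

    90/2 = 45

    And the corresponding z-value is about 1.65 (more precisely 1.645), with an interval estimate of

    \[ \bar{X} - 1.65\,\sigma_{\bar{X}} < \mu < \bar{X} + 1.65\,\sigma_{\bar{X}} \]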
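The z-values above come from a normal table, but they can also be computed. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the helper name `z_for_confidence` is an illustrative assumption, not something from the slides.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided critical z: the central `level` of the area lies within +/- z."""
    # Half of `level` sits on each side of the mean, so the cumulative
    # probability up to +z is 0.5 + level / 2 (e.g. 0.5 + 0.475 for 95%).
    return NormalDist().inv_cdf(0.5 + level / 2)

z95 = z_for_confidence(0.95)
z90 = z_for_confidence(0.90)
print(round(z95, 2), round(z90, 3))
```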


    KEEP IN MIND: The use of the normal curve is based on the idea that the sampling distribution of means for LARGE samples is normally distributed.

    If the sample size is too small (say 30 or fewer), then we'd use a t distribution.

    (We don't cover this.)

    You might have asked, "How do we know the standard deviation of the population? If we knew that, we'd probably know the mean."

    In practice, the standard deviation is estimated. One way is to do the following:

    1) Take a sample of size n.

    2) Let S stand for the sample's standard deviation (you find this on your own).

    3) The estimated population standard deviation is:

    \[ \hat{\sigma} = S\sqrt{\frac{n}{n-1}} \]


    Example: Suppose we seek an interval estimate for the population given in Table 4.1 on pg. 62 at the confidence level of .90.

    We gather a sample of size 40:

    Yearly Salary   Number of Employees
    17,500          3
    19,000          6
    20,500          4
    22,250          3
    22,750          8
    28,000          5
    33,000          6
    40,000          1
    55,000          4

    For a confidence of .90 we want to compute

    \( \bar{X} - 1.65\,\sigma_{\bar{X}} \) and \( \bar{X} + 1.65\,\sigma_{\bar{X}} \)

    1. Find the mean of the above table.

    \[ \bar{X} = \frac{3(17500) + 6(19000) + 4(20500) + 3(22250) + 8(22750) + 5(28000) + 6(33000) + 1(40000) + 4(55000)}{40} \]
    \[ \bar{X} = 27{,}381 \]


    It will take a long time to find S, so I'll just tell you:

    S = 10,672.5

    2. We estimate the population standard deviation using the estimator

    \[ \hat{\sigma} = S\sqrt{\frac{n}{n-1}} = 10672.5\sqrt{\frac{40}{39}} = 10672.5\,(1.012) \]

    And so the population standard deviation is estimated to be:

    \[ \hat{\sigma} = 10672.5\,(1.012) = 10{,}800.6 \]
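Steps 1 and 2 can be reproduced from the frequency table. The sketch below is an illustration only: the variable names are made up, and it assumes S is computed with n in the denominator (which matches S = 10,672.5) and uses the salary 22,250 that appears in the mean calculation.

```python
import math

# (yearly salary, number of employees); 22,250 is the value used in the
# mean calculation in step 1
salary_counts = [(17500, 3), (19000, 6), (20500, 4), (22250, 3), (22750, 8),
                 (28000, 5), (33000, 6), (40000, 1), (55000, 4)]

n = sum(count for _, count in salary_counts)
mean = sum(salary * count for salary, count in salary_counts) / n

# Sample standard deviation S, dividing by n
S = math.sqrt(sum(count * (salary - mean) ** 2
                  for salary, count in salary_counts) / n)

# Estimated population standard deviation: S * sqrt(n / (n - 1))
sigma_hat = S * math.sqrt(n / (n - 1))

print(n, round(mean, 2), round(S, 1), round(sigma_hat, 1))
```

Without rounding the factor \( \sqrt{40/39} \) to 1.012, the estimate comes out near 10,808.5 rather than 10,800.6; the difference is rounding only.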


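    3. Find \( \sigma_{\bar{X}} \). Note: the population size is known (N = 700).

    \[ \sigma_{\bar{X}} = \frac{\hat{\sigma}}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{10{,}800.6}{\sqrt{40}}\sqrt{\frac{700-40}{699}} \]
    \[ \sigma_{\bar{X}} = 1659.4 \]

    ** \( \sigma_{\bar{X}} \) measures on average how far the sample mean is from the population mean.

    4. PLUG everything into \( \bar{X} - 1.65\,\sigma_{\bar{X}} \) and \( \bar{X} + 1.65\,\sigma_{\bar{X}} \)

    \[ \bar{X} - 1.65\,\sigma_{\bar{X}} = 27{,}381 - 1.65(1659.4) = 24{,}642.99 \]
    \[ \bar{X} + 1.65\,\sigma_{\bar{X}} = 27{,}381 + 1.65(1659.4) = 30{,}119.01 \]

    So the interval is

    \[ \$24{,}642.99 < \mu < \$30{,}119.01 \]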
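Steps 3 and 4 can be sketched in Python as well. The function name `standard_error_finite` is an assumption for illustration; the inputs (10,800.6, n = 40, N = 700, mean 27,381, z = 1.65) are the figures from the example.

```python
import math

def standard_error_finite(sigma_hat, n, N):
    """Standard error of the mean with the finite population correction."""
    return (sigma_hat / math.sqrt(n)) * math.sqrt((N - n) / (N - 1))

se = standard_error_finite(10800.6, 40, 700)

# 90% interval around the sample mean of 27,381
low = 27381 - 1.65 * se
high = 27381 + 1.65 * se
print(round(se, 1), round(low, 2), round(high, 2))
```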


    Example:

    Suppose an agency of a state has an aid program available to cities with an average income less than $16,000. The city of Wellon does not qualify because their average income is $17,000.

    You believe they miscalculated, thinking the actual average is about $15,500.

    BUT before you go bring this up to the state you want to test it out.

    A couple of things to think about:

    1. You must decide how large or significant the difference should be between the average of the sample and the $15,500 you think is correct.

    2. Is $1,500 significant enough to bring your case to the state?

    3. What are your chances of getting a sample mean which is more than $1,500 away?

    4. If the chances are too great, you probably won't bring this up to the state.

    The city will reject their point of view if the difference between their sample mean and $15,500 has a 5 percent or less chance of occurring.


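    Consider the following graph:

    The city will reject their point of view if \( \bar{X} < y_1 \) or \( \bar{X} > y_2 \).

    \( y_1 \) and \( y_2 \) are chosen about $15,500 so that 95% of the area will lie between \( y_1 \) and \( y_2 \) and 5% will lie outside \( y_1 \) and \( y_2 \).

    \[ y_1 = 15{,}500 - 1.96\,\sigma_{\bar{X}} \]
    \[ y_2 = 15{,}500 + 1.96\,\sigma_{\bar{X}} \]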


    Consider the following graph:

    Suppose you took a sample of 30 wage earners and computed their mean salary \( \bar{X} \), the estimator \( \hat{\sigma} \), and \( \sigma_{\bar{X}} \).

    Here's what you do:

    If the sample mean falls within the acceptance region, then you have a good case to bring to the state.

    If your sample mean falls outside the acceptance region, then you have a very small chance of a mean salary near $15,500.
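The decision rule can be written as a small check. The slides do not give a value for the standard error in the Wellon example, so the 600 below is purely hypothetical, as are the function names and the trial sample means; only the $15,500 and the 1.96 come from the text.

```python
def acceptance_region(mu0, standard_error, z=1.96):
    """Return (y1, y2), the bounds holding 95% of the area when z = 1.96."""
    return mu0 - z * standard_error, mu0 + z * standard_error

def in_acceptance_region(sample_mean, mu0, standard_error, z=1.96):
    y1, y2 = acceptance_region(mu0, standard_error, z)
    return y1 <= sample_mean <= y2

HYPOTHETICAL_STANDARD_ERROR = 600.0  # assumed value, not from the slides

y1, y2 = acceptance_region(15500, HYPOTHETICAL_STANDARD_ERROR)
near = in_acceptance_region(15800, 15500, HYPOTHETICAL_STANDARD_ERROR)
far = in_acceptance_region(17000, 15500, HYPOTHETICAL_STANDARD_ERROR)
print(y1, y2, near, far)
```

With a different standard error the bounds move, so the same sample mean can land on either side of the rule.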


    HYPOTHESIS TESTING PROCEDURE

    1. State the assumed value of the population parameter to be tested.

       a. This statement is known as the null hypothesis.
       b. It is denoted \( H_0 \). It refers to "no difference".
       c. State the conclusion to be drawn if the initial assumption is rejected.
       d. It is called the alternate hypothesis; in other words, there is a difference.

    2. Determine a criterion for rejection or acceptance.

       Establish a minimum acceptable level of probability for a difference between the population parameter and the corresponding sample statistic.

       This probability level is the risk of rejecting the null hypothesis when it is actually true.

       The risk level is called the level of significance or alpha level.

       For example: \( \alpha = .05 \) means there is a 5% chance of rejecting the null hypothesis when it is actually true.


    3. Determine the appropriate probability distribution.

       Normal distribution
       t distribution
       Etc.

    4. Based on the significance level and the distribution chosen, define the rejection region or regions.

       If the sample statistic does not fall in a rejection region, there is no statistical evidence to doubt the null hypothesis.

       But!! This doesn't prove it.

    5. Formally state the decision rule based on the sample results.

       Did you reject or fail to reject your null hypothesis?

    6. Take the necessary sample and compute the appropriate sample statistic.

    7. Make the statistical conclusion concerning the null hypothesis according to the results from Step 5.


    Chi-square Testing

    It is used to determine the probability that the difference between actually observed sample data and expected data has occurred by chance.

    Compare the expected distribution of a data set to the observed distribution.

    Example:

    Suppose John took a coin, flipped it 5 times, and recorded the number of tails which appeared. If John repeated this process 32 times, he could EXPECT the following frequency of tails to appear.

    John flips a coin 5 times (this is ONE trial) and does 32 trials.

    No. of Tails   Expected
    0              1
    1              5
    2              10
    3              10
    4              5
    5              1
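The expected column follows from the binomial distribution for a fair coin: out of the \( 2^5 = 32 \) equally likely outcomes of 5 flips, \( \binom{5}{k} \) contain exactly k tails. A short sketch (variable names are illustrative):

```python
from math import comb

tosses = 5   # flips per trial
trials = 32  # number of trials John runs

# Expected number of trials showing exactly k tails with a fair coin
expected = [trials * comb(tosses, k) / 2 ** tosses for k in range(tosses + 1)]
print(expected)
```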

    What this means:

    Out of the 32 trials:

    One of the trials John gets NO tails.

    Five trials he gets 1 tail out of 5.

    Ten trials he gets 3 tails out of 5.

    Five trials he gets 4 tails.

    In actually carrying out the process, this is what he really gets.

    No. of Tails   Observed
    0              2
    1              6
    2              14
    3              9
    4              0
    5              1

    Example: There are 14 trials where 2 of the 5 tosses come up tails.

    The Question: Is his observed distribution different ENOUGH from the expected distribution to suspect a biased coin?

    Example: If in all 32 trials tails showed up 4 out of 5 times, then the coin would be suspected of bias.


    Null Hypothesis: The coin is fair (i.e., no bias)

    Choose a significance level, say \( \alpha = .05 \)

    Formula for the chi-square statistic:

    \[ \chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \frac{(O_3 - E_3)^2}{E_3} + \dots + \frac{(O_n - E_n)^2}{E_n} \]

    \( O_i \) is the observed frequency outcome for the ith data point and \( E_i \) is the expected frequency outcome for the ith data point.

    Example: For no tails:

    Expected \( E_0 = 1 \) time, Observed \( O_0 = 2 \) times

    For 2 tails:

    Expected \( E_2 = 10 \) times, Observed \( O_2 = 14 \) times


    So

    \[ \chi^2 = \frac{(2-1)^2}{1} + \frac{(6-5)^2}{5} + \frac{(14-10)^2}{10} + \frac{(9-10)^2}{10} + \frac{(0-5)^2}{5} + \frac{(1-1)^2}{1} \]

    \[ \chi^2 = 7.9 \]
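The same sum can be written as a one-line function. This is a minimal sketch; the name `chi_square_statistic` is an assumption, and the two lists are the observed and expected coin-toss frequencies from the tables above.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed_tails = [2, 6, 14, 9, 0, 1]   # trials with 0..5 tails, as observed
expected_tails = [1, 5, 10, 10, 5, 1]  # trials with 0..5 tails, fair coin

statistic = chi_square_statistic(observed_tails, expected_tails)
print(round(statistic, 3))
```

Most of the total comes from the "4 tails" category, where 5 trials were expected and none observed.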

    Chi Square Probability Distribution

    d     .05      .025     .01      .005
    1     3.841    5.024    6.635    7.879
    2     5.991    7.378    9.210    10.597
    3     7.815    9.348    11.345   12.838
    4     9.488    11.143   13.277   14.860
    5     11.070   12.832   15.086   16.750
    6     12.592   14.449   16.812   18.548
    7     14.067   16.013   18.475   20.278
    8     15.507   17.535   20.090   21.955
    9     16.919   19.023   21.666   23.589
    10    18.307   20.483   23.209   25.188


    The left-hand column represents the Degrees of Freedom

    Example:

    There are 6 different outcomes for tossing a coin 5 times. You can get 0 tails, 1 tail, 2 tails, 3 tails, 4 tails, or 5 tails.

    Once we count up how many times we got 0, 1, 2, 3, 4 tails, we know how many outcomes got 5 tails.

    So it took us 5 entries (0, 1, 2, 3, 4) to figure out the 6th entry will be 1.

    * If there is one column of data, d = # of rows minus 1 = r - 1

    With \( \chi^2 = 7.9 \), 5 degrees of freedom, and \( \alpha = .05 \), we look at the table and write down the number.

    It's 11.070

    In the coin tossing example where there are 5 degrees of freedom and \( \alpha = .05 \) was chosen, the man should reject his null hypothesis if the \( \chi^2 \) statistic is larger than 11.070.

    Since our statistic was 7.9 (below 11.070), the null hypothesis should NOT be rejected.
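The table lookup and comparison can be sketched as follows. The dictionary holds only the first five rows of the .05 column from the table above, and the names `CRITICAL_AT_05` and `reject_null` are illustrative assumptions.

```python
# Critical values from the .05 column of the chi-square table (d = 1..5)
CRITICAL_AT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def reject_null(statistic, degrees_of_freedom, table=CRITICAL_AT_05):
    """Reject the null hypothesis only if the statistic exceeds the critical value."""
    return statistic > table[degrees_of_freedom]

rows = 6      # outcomes: 0, 1, 2, 3, 4, or 5 tails
d = rows - 1  # one column of data, so d = r - 1

decision = reject_null(7.9, d)
print(d, decision)
```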

    Conclusion:


    Example: Suppose two candidates (White and Smith) are running for election to Congress in a district with 460,000 voters. It is suggested that women in the electorate voted for White in significantly larger numbers than men. How can this suggestion be tested?

    We need to make some assumptions.

    Suppose the electorate was evenly divided between men and women (230,000 men and 230,000 women).

    White received 62% of the votes.

    The expected distribution for 100 men and 100 women would be

                        Votes for White
                        Yes    No
    Sex of Voters   M   62     38
                    F   62     38

    If there were no bias (meaning there wasn't a larger number of women voting for White than men), this should be the expected outcome.

    Exactly 62% of each (men and women) voted for White.

    However, when they took a sample of 100 men and 100 women, the following distribution was observed.


                        Votes for White
                        Yes    No
    Sex of Voters   M   52     48
                    F   72     28

    It is clear White got a higher proportion of votes from women in this sample.

    BUT is the difference significant?

    1. Null Hypothesis: There are no differences between men and women in this election.
    2. Choose \( \alpha = .01 \)
    3. Now find the chi-square distribution.
    4. When there is more than one column and one row, the degrees of freedom are

       d = (# of columns - 1) x (# of rows - 1)

    5. We decide to reject the hypothesis if \( \chi^2 > 6.635 \)


    \[ \chi^2 = \frac{(52-62)^2}{62} + \frac{(72-62)^2}{62} + \frac{(48-38)^2}{38} + \frac{(28-38)^2}{38} \]
    \[ \chi^2 = 100/62 + 100/62 + 100/38 + 100/38 \]
    \[ \chi^2 = 8.489 \]

    Therefore, the null hypothesis is rejected; the difference is significant.
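For a table with more than one row and column, the same statistic is summed over every cell. The sketch below uses the observed and expected vote tables from this example; the variable names are illustrative, and 6.635 is the critical value for 1 degree of freedom at \( \alpha = .01 \).

```python
observed = [[52, 48],   # men:   Yes, No
            [72, 28]]   # women: Yes, No
expected = [[62, 38],
            [62, 38]]

# Sum (O - E)^2 / E over every cell of the table
statistic = sum((o - e) ** 2 / e
                for o_row, e_row in zip(observed, expected)
                for o, e in zip(o_row, e_row))

# d = (# of columns - 1) x (# of rows - 1)
d = (len(observed[0]) - 1) * (len(observed) - 1)

reject = statistic > 6.635
print(round(statistic, 3), d, reject)
```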

    What does this mean?

    If we were to run the election over again, then we're confident women would still vote in larger numbers for White than men.

    Why did this concern us?

    We wanted to make sure the difference between men and women didn't happen by chance.

    If the difference was by chance, it would mean that if we did run the election again the women might not vote more for White than men.


    Example: There are 20 questions on Brian's multiple choice exam. Bill observed the following distribution of answers on the exam.

    Answer   Frequency
    a        3
    b        8
    c        3
    d        3
    e        3

    It looks like Brian might favor b as his answer on the test. Bill wants to know if this is by chance. If it's not by chance, then when Bill doesn't know an answer, should he choose b?

    1. Null Hypothesis: Brian doesn't favor b, or: there is no difference in the distribution of answers.
    2. Bill sets \( \alpha = .05 \)
    3. There are 4 degrees of freedom.
    4. If \( \chi^2 > 9.488 \), then we reject the hypothesis.
    5. The expected distribution, meaning each answer is equally likely, is:

    Answer   Frequency
    a        4
    b        4
    c        4
    d        4
    e        4

    Then

    \[ \chi^2 = \frac{(3-4)^2}{4} + \frac{(8-4)^2}{4} + \frac{(3-4)^2}{4} + \frac{(3-4)^2}{4} + \frac{(3-4)^2}{4} = 5 \]

    Since 5 < 9.488, Bill cannot reject the hypothesis. This means Brian might not favor b and that the distribution could have occurred by chance.


    Example: Consider the following tables of expected and observed distributions concerning the buying preferences of consumers.

    Brand Favored   Observed No. of respondents   Expected No. of respondents
    A               123                           105
    B               76                            86
    C               48                            56

    Based on these distributions, what should the chi-square statistic be?

    \[ \chi^2 = \frac{(123-105)^2}{105} + \frac{(76-86)^2}{86} + \frac{(48-56)^2}{56} \]
    \[ \chi^2 = 3.09 + 1.16 + 1.14 \]
    \[ \chi^2 = 5.39 \]

    There are 2 degrees of freedom.

    From our chart, we will not be rejecting the null hypothesis (whatever it is), since 5.39 is below 5.991, the critical value for 2 degrees of freedom at \( \alpha = .05 \).


    Two brands of varnish, High-Glo and No-Glo, are available at a local store. The manager of the store is keeping track of sales, so that he can accurately predict the needs of the customers.

    The manager had previously predicted these needs and has decided to test his figures by using a chi-square statistic. The expected and actual sales for 115 customers are listed below:

    Brand     Expected   Actual
    Hi-Glo    75         62
    No-Glo    40         53

    There is 1 degree of freedom.

    \[ \chi^2 = \frac{(62-75)^2}{75} + \frac{(53-40)^2}{40} \]
    \[ \chi^2 = 2.253 + 4.225 \]
    \[ \chi^2 = 6.478 \]

    If \( \alpha = .05 \) or \( .025 \), we would reject the null hypothesis (6.478 exceeds the critical values 3.841 and 5.024).

    If \( \alpha = .01 \) or below, we would not reject his hypothesis (6.478 is below 6.635).


    In your project, you will have a setup similar to this:

    Expected

               Yes    No     No response/opinion
    Group 1    A      B      C
    Group 2    D      E      F

    Actual / Observed

               Yes    No     No response/opinion
    Group 1    R1C1   R1C2   R1C3
    Group 2    R2C1   R2C2   R2C3

    Ex. R1C1 = row 1, column 1
    Ex. R2C3 = row 2, column 3

    How to find the numbers in the expected table?

    \[ A\,(\text{row 1, col 1}) = \frac{(R1C1 + R1C2 + R1C3) \times (R1C1 + R2C1)}{\text{total number surveyed}} = \frac{(\text{add up the numbers in row 1}) \times (\text{add up the numbers in column 1})}{\text{total number surveyed}} \]

    \[ F\,(\text{row 2, col 3}) = \frac{(R2C1 + R2C2 + R2C3) \times (R1C3 + R2C3)}{\text{total number surveyed}} = \frac{(\text{add up the numbers in row 2}) \times (\text{add up the numbers in column 3})}{\text{total number surveyed}} \]
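The row-total times column-total rule can be applied to every cell at once. As a sketch (the function name `expected_table` is an assumption), the code below feeds in the observed vote counts from the White and Smith example and recovers the 62 / 38 expected table used there.

```python
def expected_table(observed):
    """Expected count per cell: (row total x column total) / total surveyed."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in column_totals] for r in row_totals]

observed_votes = [[52, 48],   # men:   Yes, No
                  [72, 28]]   # women: Yes, No

expected_votes = expected_table(observed_votes)
print(expected_votes)
```

The same function works unchanged for a table with three columns, such as Yes / No / No response.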


    Example: Suppose I surveyed men and women on a certain question and found that

           Men   Women
    Yes    57    45
    No     23    12
    N/R    8     2

    I want to test if there was a difference in how men and women would answer this question. In other words, are the responses to this question dependent on their gender?

    Null:

    Alternative:

    Create the expected table:


    Find the chi square statistic:

    If \( \alpha = 0.05 \), what is my rejection level?

    Conclusion