lecture -- 5 -- start - pitt.edusuper7/51011-52001/51431.pdf · test • which of the ... source: ....

Lecture -- 5 -- Start

Upload: vannhu

Post on 17-Feb-2018

Page 1: Lecture -- 5 -- Start - pitt.edusuper7/51011-52001/51431.pdf · TEST • Which of the ... Source: . ... Table extracted from 'The Gallup Poll Monthly'. Cited at

Lecture -- 5 -- Start

Outline

1. Science, Method & Measurement
2. On Building An Index
3. Correlation & Causality
4. Probability & Statistics
5. Samples & Surveys
6. Experimental & Quasi-experimental Designs
7. Conceptual Models
8. Quantitative Models
9. Complexity & Chaos
10. Recapitulation - Envoi

Quantitative Techniques for Social Science Research

Ismail Serageldin

Alexandria

2012

Lecture # 5: Samples And Surveys

Sample Surveys are among the most studied and written about topics in statistics

So: no textbooks. Just follow the presentation.

Why Do Sample Surveys

Why do we do sample surveys?

We want to know something about the Population so we study a small sample of the Population

(making sure that the sample is representative)

Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001

So we will discuss how to undertake sampling and how to do surveys

Let’s start with some definitions

Data, Variables, Statistics and Parameters

Variables

• A variable is an attribute that describes a person, place, thing, or idea.

• The value of the variable can "vary" from one entity to another.

• Qualitative Variables are categorical: e.g. the color of a ball is green, red or blue.

• Quantitative Variables are numeric: e.g. the population of a city.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP

Quantitative Variables: Continuous and Discrete

• Continuous variables can take any value between a minimum and a maximum: e.g. the weight of the persons in a class.

• Discrete variables take only separate, countable values: e.g. tossing a coin n times, how many times do we get heads? It can never be 2.7 times; it will have to be 0, 1, 2, 3, …, n.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP

TEST

• Which of the following statements are true?
  – I. All variables can be classified as quantitative or categorical variables.
  – II. Categorical variables can be continuous variables.
  – III. Quantitative variables can be discrete variables.

• Answer: I and III are correct

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP

Two Snapshots, Two “states”:

Discrete variables imply sudden moves from state to state

Continuous variables imply constantly changing transitions between two snapshots

Transitions can be cut up in discrete states

But many transitions are really continuous

Example: Students leaving school and entering the Labor Market

Later we will discuss how this fits in Markov chains and the manpower model

But let’s go back to the issues of Data Collection

Methods Of Data Collection

• There are four main methods of data collection.

• Census. A census is a study that obtains data from every member of a population. In most studies, a census is not practical, because of the cost and/or time required.

• Sample survey. A sample survey is a study that obtains data from a subset of a population, in order to estimate population attributes.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP

Methods of Data Collection (Cont’d)

• Experiment. An experiment is a controlled study in which the researcher attempts to understand cause-and-effect relationships.

• Observational study. The researcher is not able to control (1) how subjects are assigned to groups and/or (2) which treatments each group receives.

• (Case Studies are observations of one case.)

• Note: Observational Studies that lack random selection do NOT allow you to generalize the findings.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP

Why do Sample Surveys?

• The reason for conducting a sample survey is to estimate the value of some attribute of a population.

• It is much cheaper and easier than doing a whole census.

• When done scientifically, we can quantify the margin of error (e.g. ±3%).

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP

Pros and Cons

• Resources. A well-designed sample survey can provide very precise estimates of population parameters: quicker, cheaper, and with less manpower than a census.

• Generalizability. Generalizing means applying findings from a study to a larger population. Generalizability requires random selection.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP

Pros and Cons (continued)

• Causal inference. Cause-and-effect relationships can be teased out when subjects are randomly assigned to groups.

• Therefore, experiments, which allow the researcher to control assignment of subjects to treatment groups, are the best method for investigating causal relationships.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP

We will have a lot more to say on Experimental Designs later.

We must distinguish between the sample statistic and the population parameter

From Population To Sample To Population:

(From Sample Statistic To Population Parameter)

Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001

Population Parameter vs. Sample Statistic

• Population parameter. A population parameter is the true value of a population attribute.

• Sample statistic. A sample statistic is an estimate, based on sample data, of a population parameter.

• The estimate comes with the error term (e.g. ±3%)

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP

Example Of Population Parameter vs. Sample Statistic

• Example. We want to know the percentage of voters that favor a new tax.
  – The actual percentage of all the voters is a population parameter.
  – The estimate of that percentage, based on sample data, is a sample statistic.

• The quality of a sample statistic (i.e., accuracy, precision, representativeness) is strongly affected by the way that sample observations are chosen; that is, by the sampling method.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP

Bad Surveys make for bad estimates

Estimates of the front runners in the Egyptian Presidential Election 2012

• Before the first Round:

1. Abdel Moneim Aboulfotouh
2. Amr Moussa
3. Mohamed Morsi
4. Hamdein Sabahi
5. Ahmed Shafik

• After the first Round:

1. Mohamed Morsi
2. Ahmed Shafik
3. Hamdein Sabahi
4. Abdel Moneim Aboulfotouh
5. Amr Moussa

The US 1948 Presidential Election: Truman vs. Dewey

Bad (Inaccurate) Polls

What does it mean to say: “the poll says 52% (±3%) at 95% confidence level?”

• The 52% is the finding from the sample survey.

• The error term (±3%) is the sampling error: it means that we think the real value is between 49% and 55%.

• The 95% confidence level means that if the same survey were repeated many times, about 95 out of 100 of the resulting ranges would contain the real figure in the population.

• The error term will vary according to the size of the sample.
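One way to make the 95% statement concrete is a small simulation, sketched here under stated assumptions: the "real" population share is set to 52%, each poll is a simple random sample of 1,000 people, and the ± range uses the normal approximation with z = 1.96. The function names and all of these numbers are illustrative, not taken from the slides.

```python
import random

def interval_covers_truth(true_p, n, rng, z=1.96):
    """Run one simulated poll and report whether p_hat ± margin contains true_p."""
    # Each simulated respondent says "yes" with probability true_p.
    yes = sum(rng.random() < true_p for _ in range(n))
    p_hat = yes / n
    margin = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - margin <= true_p <= p_hat + margin

def coverage_rate(true_p=0.52, n=1000, polls=2000, seed=1):
    """Fraction of simulated polls whose ± range contains the real value."""
    rng = random.Random(seed)
    covered = sum(interval_covers_truth(true_p, n, rng) for _ in range(polls))
    return covered / polls

print(coverage_rate())
```

The printed fraction should come out close to 0.95: most, but not all, of the polls produce a range that contains the real value.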

What is sampling error? (The margin of error, or the ±3%)

• Sampling Error is the calculated statistical imprecision due to interviewing a random sample instead of the entire population.

• The margin of error provides an estimate of how much the results of the sample may differ, due to chance, from what would have been found if the entire population had been interviewed.

• The confidence level (95%, or 95 out of 100) says how confident we are that the real value lies within that ± error term of the sample result.
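As a rough sketch of where a figure like ±3% can come from, the usual textbook formula for a proportion estimated from a simple random sample is given below. The symbols (p-hat for the sample proportion, n for the sample size, z for the normal critical value, about 1.96 at 95% confidence) and the sample size n = 1,000 are assumptions for illustration; the slides do not say how the poll in the example was drawn.

```latex
\mathrm{MoE} = z \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
\qquad\text{e.g.}\qquad
1.96 \sqrt{\frac{0.52 \times 0.48}{1000}} \approx 0.031
```

Under these assumptions the margin is about 3 percentage points. Real polls often report a somewhat larger margin to allow for their sampling design.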

Sampling error

• Sampling error is related to sample size, but it is not the only kind of error possible in a sample survey.

• You can look it up in sampling error tables such as the one I can show you here.

• This table is produced by Gallup for a sample from a target population of 200 million, with a confidence level of 95%.

Recommended allowance for sampling error of a percentage*
In Percentage Points (at 95 in 100 confidence level)**

                        SAMPLE SIZE
                        1,000    750    500    250    100
Percentage near 10        2       2      3      4      6
Percentage near 20        3       3      4      5      9
Percentage near 30        3       4      4      6     10
Percentage near 40        3       4      5      7     10
Percentage near 50        3       4      5      7     11
Percentage near 60        3       4      5      7     10
Percentage near 70        3       4      4      6     10
Percentage near 80        3       3      4      5      9
Percentage near 90        2       2      3      4      6

Table extracted from 'The Gallup Poll Monthly'. Cited at http://www.ropercenter.uconn.edu/education/polling_fundamentals_error.html

An Important Observation: Statistical Error and sample size

• As the sample size increases, there are diminishing returns in the reduction of the percentage error.

• At percentages near 50%, the statistical error drops from 7 to 5 percentage points as the sample size is increased from 250 to 500.

• But if the sample size is increased from 750 to 1,000, the statistical error drops only from 4 to 3 percentage points.

• As the sample size rises above 1,000, the diminishing returns are even more noticeable.
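The diminishing returns can be checked with a few lines of code. This is a minimal sketch that assumes a simple random sample and the normal approximation (z = 1.96); the function name margin_of_error is hypothetical. The values it gives are somewhat smaller than the Gallup table's, possibly because the table allows for the survey design; that reading is an assumption.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Margin of error, in percentage points, for a proportion p
    estimated from a simple random sample of size n (normal approximation)."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

# Percentages near 50%, for growing sample sizes.
for n in (250, 500, 750, 1000, 2000, 4000):
    print(n, round(margin_of_error(0.5, n), 1))
```

Because the error shrinks with the square root of n, quadrupling the sample from 1,000 to 4,000 only halves the margin of error.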

Among others, Langer Research Associates offers a margin-of-error calculator -- MoE Machine -- as a convenient tool for data producers and everyday data users. Access the MoE Machine at http://langerresearch.com/moe.php.

So, let’s learn more about surveys and sampling…

Types of Samples

What is a Survey?

• A survey may refer to many different types or techniques of observation, but it most often involves a questionnaire used to measure the characteristics and/or attitudes of people.

• Since we do not cover the whole population, we select a sample.

• The different ways of contacting members of a sample once they have been selected are the subject of survey data collection.

What is Survey Sampling?

• In statistics, survey sampling describes the process of selecting a sample of elements from a target population in order to conduct a survey.

• The purpose of sampling is to reduce the cost and/or the amount of work that it would take to survey the entire target population.

• A survey that measures the entire target population is called a census.

Sampling

Two Kinds of Survey Samples

Non-Probability samples and

Probability samples

Sampling Methods

• Non-probability samples. We do not know the probability that each population element will be chosen, and/or we cannot be sure that each population element has a non-zero chance of being chosen.

• Probability samples. Each population element has a known (non-zero) chance of being chosen for the sample.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP

Non-Probability Sampling

Pros & cons of Non-Probability Sampling

• Advantages: convenience and cost.

• Disadvantage: We cannot estimate the extent to which sample statistics are likely to differ from population parameters.

• Only probability sampling methods permit that kind of analysis.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP

Two of the main types of non-probability sampling methods

• Voluntary sample. A voluntary sample is made up of people who self-select into the survey. Often, these people have a strong interest in the main topic of the survey. E.g. those who call in to a talk show, or participate in an on-line poll.

• Convenience sample. A convenience sample is made up of people who are easy to reach. E.g. interviewing my students, my employees, or shoppers at a local mall. If the group or the location was chosen because it was convenient, this would be a convenience sample.

• Note: Neither allows generalization to the population.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP

Non-probability Sample Surveys

• Surveys that are not based on probability sampling have no way of measuring their bias or sampling error.

• Surveys based on non-probability samples are not externally valid. You cannot generalize from them to the general population. They can only be said to be representative of the people who actually completed the survey.

Non-Probability Samples

• The relationship between the target population and the survey sample is immeasurable and potential bias is unknowable.

• Sophisticated users of non-probability survey samples tend to view the survey as an experimental condition, rather than a tool for population measurement.

• Analysts examine the results for internally consistent relationships.

Examples Of Non-Probability Samples

• Judgment Samples: A researcher decides which population members to include in the sample based on his or her judgment. The researcher may provide some alternative justification for the representativeness of the sample.

• Snowball Samples: Often used when a target population is rare. Members of the target population recruit other members of the population for the survey.

Examples Of Non-Probability Samples

• Quota Samples: The sample is designed to include a designated number of people with certain specified characteristics. For example, 100 coffee drinkers. This type of sampling is common in non-probability market research surveys.

• Convenience Samples: The sample is composed of whatever persons can be most easily accessed to fill out the survey.

Probability Sampling

Probability samples are the only ones whose results will be generalizable to the entire population

Random Samples

Ronald Fisher (1890-1962)

Extract from table of random numbers

Main types of probability sampling

• Simple random sampling,
• Stratified sampling,
• Cluster sampling,
• Multistage sampling, and
• Systematic random sampling.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP

Probability Samples are representative

• The key benefit of all these probability sampling methods is that, on average, they yield samples that are representative of the population, and they let us quantify how far a sample is likely to deviate from it. This is what makes the statistical conclusions valid.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP

Hence the conclusions are generalizable

Simple Random sampling

• The population consists of N objects.

• The sample consists of n objects.

• If all possible samples of n objects are equally likely to occur, the sampling method is called simple random sampling.

• Selection is done by a lottery method, a table of random numbers, or a computerized random number generator.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
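With a computerized random number generator, simple random sampling takes only a few lines. The sketch below uses Python's standard random.sample, which draws n distinct elements so that every possible sample of size n is equally likely. The population of 100 numbered objects, the sample size of 10 and the seed are made up for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every subset of size n is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)

population = list(range(1, 101))   # hypothetical population: N = 100 objects
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```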

Stratified Sampling

• Stratified sampling. The population is divided into groups, based on some characteristic.

• The groups are called strata.

• Then, within each group, a probability sample (often a simple random sample) is selected.

• As an example, suppose we conduct a national survey. We might divide the population into groups or strata, based on geography - north, east, south, and west. Then, within each stratum, we might randomly select survey respondents.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
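The geographic example can be sketched as follows. The assumptions are all illustrative: 200 hypothetical people spread evenly over four regions, and the same sampling fraction (10%) applied in every stratum, which is one common choice (proportional allocation) but not the only one. The function name stratified_sample is hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Group units into strata, then draw a simple random sample in each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):            # fixed order keeps the draw reproducible
        members = strata[name]
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample

regions = ["north", "east", "south", "west"]
people = [(i, regions[i % 4]) for i in range(200)]   # (person id, region)
chosen = stratified_sample(people, lambda person: person[1], 0.10, seed=5)
print(len(chosen))
```

Every region is certain to appear in the sample, which a simple random sample of the same size does not guarantee.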

Cluster sampling

• Cluster sampling. With cluster sampling, every member of the population is assigned to one, and only one, group. Each group is called a cluster.

• A sample of clusters is chosen, using a probability method (often simple random sampling).

• Only individuals within sampled clusters are surveyed.

• E.g. select a sample of BA units, survey all the staff in these units.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP

Multistage sampling

• Multistage sampling. With multistage sampling, we select a sample by using combinations of different sampling methods.

• For example, in Stage 1, we might use cluster sampling to choose clusters from a population. Then, in Stage 2, we might use simple random sampling to select a subset of elements from each chosen cluster for the final sample.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
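The two-stage example can be sketched in code. The clusters here are ten hypothetical units of twenty staff each; the names and sizes are invented for illustration. Stage 1 draws a simple random sample of clusters, and Stage 2 draws a simple random sample of members inside each chosen cluster.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample clusters. Stage 2: sample members within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        members = clusters[name]
        k = min(n_per_cluster, len(members))   # never ask for more than the cluster holds
        sample.extend((name, member) for member in rng.sample(members, k))
    return sample

# Hypothetical population: 10 units with 20 staff members each.
units = {f"unit_{u}": [f"staff_{u}_{i}" for i in range(20)] for u in range(10)}
picked = two_stage_sample(units, n_clusters=3, n_per_cluster=5, seed=7)
print(picked[:3])
```

Setting n_per_cluster to the full cluster size turns this into plain cluster sampling, where everyone in a sampled cluster is surveyed.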

Systematic random sampling

• Systematic random sampling. With systematic random sampling, we create a list of every member of the population. We randomly select the first sample element from among the first k elements on the list. Thereafter, we select every kth element on the list.

• This method is different from simple random sampling, since not every possible sample of n elements is equally likely.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
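A minimal sketch of the procedure, assuming a hypothetical list of 100 population members and a sampling interval of k = 10: pick a random starting position among the first k elements, then take every kth element after it.

```python
import random

def systematic_sample(population, k, seed=None):
    """Random start within the first k elements, then every kth element."""
    rng = random.Random(seed)
    start = rng.randrange(k)       # index 0 .. k-1
    return population[start::k]

members = list(range(100))         # hypothetical population list
picked_every_10th = systematic_sample(members, 10, seed=3)
print(picked_every_10th)
```

Only k different samples can ever be drawn this way (one per starting position), which is why not every possible sample of n elements is equally likely.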

How To Select A Probability Sample

Probability Sampling

• A probability-based survey sample is created from three components: a list of the target population, called the sample frame; a randomized process for selecting units from the sample frame, called a selection procedure; and a method of contacting selected units and enabling them to complete the survey, called a data collection method or mode.

Probability Sampling: Step 1

• Construct a Sample frame: A probability-based survey sample is created by constructing a list of the target population, called the sample frame.

• For some target populations this process may be easy, for example, sampling the employees of a company by using the payroll list.

• However, in large, disorganized populations simply constructing a suitable sample frame is often a complex and expensive task.

Probability Sampling: Step 2

• Selecting a sample from within the Sample frame:

• A randomized process for selecting units from the sample frame is called a selection procedure.

• Common methods of conducting a probability sample of the household population in the United States are Area Probability Sampling, Random Digit Dial telephone sampling, and more recently Address-Based Sampling.

Specialized Techniques Of Probability Sampling

• Within probability sampling there are specialized techniques such as:
  – stratified sampling &
  – cluster sampling

• These techniques improve the precision or efficiency of the sampling process without altering the fundamental principles of probability sampling.

Probability Sampling: Step 3

• Collecting the Data:

• There must be a method of contacting selected units and enabling them to complete the survey, called a data collection method or mode.

Sources Of Bias

Major Types of Bias In Surveys

• Non-response bias

• Coverage bias

• Selection bias

• Non-response bias: When individuals or households selected in the survey sample cannot or will not complete the survey, there is the potential for bias to result from this non-response. Non-response bias occurs when the observed value deviates from the population parameter due to differences between respondents and non-respondents.

• Coverage bias: Coverage bias can occur when population members do not appear in the sample frame (undercoverage). Coverage bias occurs when the observed value deviates from the population parameter due to differences between covered and non-covered units. Telephone surveys suffer from a well-known source of coverage bias because they cannot include households without telephones.

• Selection Bias: Selection bias occurs when some units have a differing probability of selection that is unaccounted for by the researcher. For example, some households have multiple phone numbers making them more likely to be selected in a telephone survey than households with only one phone number. This selection bias would be corrected by applying a survey weight equal to [1/(# of phone numbers)] to each household.
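The weighting correction can be illustrated with a small sketch. The four households, their answers, and their phone counts are made-up values, and `weighted_mean` is a hypothetical helper written for this example.

```python
def weighted_mean(values, phone_counts):
    """Weight each household by 1 / (number of phone numbers)."""
    weights = [1.0 / k for k in phone_counts]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Hypothetical responses (1 = yes, 0 = no) and phone lines per household.
answers = [1, 1, 0, 0]
phones = [2, 2, 1, 1]

unweighted = sum(answers) / len(answers)   # treats all households alike
weighted = weighted_mean(answers, phones)  # down-weights two-line households
```

In this toy data the households with two lines all answered "yes", so the unweighted share of "yes" is one half while the weighted share drops to one third.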


But how you select your sample is only one of the issues in doing survey research


Bias Due to Measurement Error

• In survey research, the measurement process includes the environment in which the survey is conducted, the way that questions are asked, and the state of the survey respondent.

• Response bias refers to the bias that results from problems in the measurement process. Some examples of response bias:

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP


Examples of Response Bias (Due to error in the Measurement process)

• Leading questions. The wording of the question may be loaded in some way to unduly favor one response over another. For example, a satisfaction survey may ask the respondent to indicate whether she is satisfied, dissatisfied, or very dissatisfied.

• By giving the respondent one response option to express satisfaction and two response options to express dissatisfaction, this survey question is biased toward getting a dissatisfied response.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP


Examples of Response Bias – Cont’d (Due to error in the Measurement process)

• Social desirability. Most people like to present themselves in a favorable light, so they will be reluctant to admit to unsavory attitudes or illegal activities in a survey, particularly if survey results are not confidential. Instead, their responses may be biased toward what they believe is socially desirable.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP


Sampling Statistic and Sampling Error

• A survey produces a sample statistic, which is used to estimate a population parameter. If you repeated a survey many times, using different samples each time, you might get a different sample statistic with each replication. And each of the different sample statistics would be an estimate for the same population parameter.

• If the statistic is unbiased, the average of all the statistics from all possible samples will equal the true population parameter, even though any individual statistic may differ from the population parameter. The variability among statistics from different samples is called sampling error.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
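A small simulation can make the idea of repeated surveys concrete. This is only a sketch under stated assumptions: the population is a made-up list of 10,000 normally distributed values, and 2,000 replications stand in for "all possible samples".

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of 10,000 values; its mean is the parameter.
population = [rng.gauss(50, 10) for _ in range(10_000)]
parameter = statistics.fmean(population)

# Repeat the "survey" 2,000 times with samples of 100 units each.
estimates = [statistics.fmean(rng.sample(population, 100)) for _ in range(2_000)]

average_estimate = statistics.fmean(estimates)  # lands close to the parameter
sampling_error = statistics.stdev(estimates)    # spread among the replications
```

Individual estimates scatter around the parameter, but their average sits very near it, which is what an unbiased statistic means in practice.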


Increasing The Sample Size: Reduces Sampling Error but NOT Survey Bias

• Increasing the sample size tends to reduce the sampling error; that is, it makes the sample statistic less variable. However, increasing sample size does not affect survey bias.

• A large sample size cannot correct for the methodological problems (undercoverage, nonresponse bias, etc.) that produce survey bias.

• Example: The Literary Digest Survey sample size was very large - over 2 million surveys were completed; but the large sample size could not overcome problems with the sample - undercoverage and nonresponse bias.

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP
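The distinction between sampling error and bias can be checked with a simulation. The sketch below assumes a made-up population in which 30% of units are missing from the sample frame and differ systematically from the rest; the function names are hypothetical.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: 30% of units are not covered by the frame,
# and they score lower on average than the covered units.
covered = [rng.gauss(60, 10) for _ in range(70_000)]
uncovered = [rng.gauss(40, 10) for _ in range(30_000)]
true_mean = statistics.fmean(covered + uncovered)

def survey(n):
    """Sample n units, but only from the covered part of the population."""
    return statistics.fmean(rng.sample(covered, n))

def spread_and_bias(n, reps=100):
    """Return (sampling error, bias) over repeated surveys of size n."""
    estimates = [survey(n) for _ in range(reps)]
    return statistics.stdev(estimates), statistics.fmean(estimates) - true_mean

small_spread, small_bias = spread_and_bias(100)
large_spread, large_bias = spread_and_bias(2_500)
```

The larger sample gives a much tighter spread, yet both sample sizes miss the true mean by about the same amount, because the undercoverage is built into the frame.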


The Null Hypothesis & Types of Error


To analyze survey data and arrive at a conclusion, we need to formulate a Null Hypothesis


Null Hypothesis

• It is usually a statement that can be falsified and whose acceptance or rejection yields a useful insight into the problem being studied and for which the data was collected.

• The null hypothesis is a hypothesis which the researcher tries to disprove, reject or nullify.

• It is symbolized by H0


Ronald Fisher (1890-1962)

The first to formalize the notion of the “Null Hypothesis”


How do you state your basic (null) Hypothesis?

Usually: the normal state (don’t worry, no effect, no change)

Or: there is no difference between expected and observed (i.e. difference is due to chance only)


One-tailed or Two-tailed Tests

• One-Tailed: a single rejection region in one tail (Accept H0 | Reject H0)

• Two-Tailed: a rejection region in each tail (Reject H0 | Accept H0 | Reject H0)


Usually:

No directionality: use two-tailed test

Directionality: use one-tailed test


The Null Hypothesis identifies which kind of test is needed: one-tailed or two-tailed

• In classical science, H0 is most typically the statement that there is no effect of a particular treatment; in observations, it is typically that there is no difference between the value of a particular measured variable and that of a prediction, or between two means. In these cases we use a two-tailed test.

• But when there is directionality, i.e. when we say that something is better than, bigger than or less than, we use a one-tailed test.
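The practical difference between the two kinds of test shows up in the p-value. The sketch below assumes a test statistic that follows the standard normal distribution under H0 and a hypothetical observed value of z = 1.8; `p_values` is a helper invented for this illustration.

```python
from statistics import NormalDist

def p_values(z):
    """Return (one-tailed upper, two-tailed) p-values for a z statistic."""
    std_normal = NormalDist()
    upper_tail = 1.0 - std_normal.cdf(z)
    two_tailed = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    return upper_tail, two_tailed

# Hypothetical observed statistic.
one_sided, two_sided = p_values(1.8)
```

With this value the one-tailed p-value falls below 0.05 while the two-tailed p-value does not, so the choice of test has to be made from the hypothesis before looking at the data.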


BUT: In accepting or rejecting the Null Hypothesis we could be making two different types of error


Type I error: (False Positive)

Here H0 is the normal state: the person is healthy, the person is not guilty, the product is good. A Type I error rejects H0 when it is true.

• Test says: This person has cancer
• Reality: This person is healthy

• Test says: This person is guilty
• Reality: This person is not guilty

• Test says: This product is faulty
• Reality: This product is good

Type II error: (False Negative)

A Type II error fails to reject H0 when it is false.

• Test says: This person is healthy
• Reality: This person has cancer

• Test says: This person is not guilty
• Reality: This person is guilty

• Test says: This product is good
• Reality: This product is faulty
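Both error rates can be estimated by simulation. The sketch below is an illustration under assumptions that are not in the lecture: a two-tailed z-test of H0: mean = 0 with known standard deviation 1, samples of 25, a significance level of 0.05, and a hypothetical true mean of 0.3 when H0 is false.

```python
import random
from statistics import NormalDist, fmean

rng = random.Random(2)
ALPHA = 0.05
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-tailed critical value

def rejects_h0(true_mean, n=25):
    """z-test of H0: mean = 0 with known standard deviation 1."""
    sample = [rng.gauss(true_mean, 1) for _ in range(n)]
    z = fmean(sample) * n ** 0.5
    return abs(z) > Z_CRIT

trials = 4_000
# H0 true: every rejection is a Type I error (false positive).
type_i_rate = sum(rejects_h0(0.0) for _ in range(trials)) / trials
# H0 false (true mean 0.3): every non-rejection is a Type II error.
type_ii_rate = sum(not rejects_h0(0.3) for _ in range(trials)) / trials
```

The Type I rate settles near the chosen significance level, while the Type II rate depends on how large the real effect is and on the sample size.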


Type I & Type II Error

Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001


Two other kinds of error:

• In 1948, Frederick Mosteller (1916-2006) proposed a Type III error: "correctly rejecting the null hypothesis for the wrong reason" (1948, p.61).

• In 1970, Marascuilo and Levin proposed a "fourth kind of error", a "Type IV error", defined as being the mistake of "the incorrect interpretation of a correctly rejected hypothesis";

• which, they suggested, was the equivalent of "a physician's correct diagnosis of an ailment followed by the prescription of a wrong medicine" (1970, p.398).


Other risks of error:

This is in addition to many other risks:
• Correctly specifying the problem
• Sampling design
• Experimental or quasi-experimental designs
• Correctly understanding the kind of data and its limitations
• Correctly specifying the type of statistical analysis
• Correctly interpreting the results


Calculation & Conclusions


The conclusion of the statistical analysis is to accept or reject the Null Hypothesis

Type I & Type II Errors

Source: http://stattrek.com/statistics/data-collection-methods.aspx?Tutorial=AP


Larger samples mean more precise estimation of the population parameter

Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001


How to refer to the significance level of a test (all these statements are equivalent)

Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001

You should be familiar with these expressions


Tips to Help Avoid Common Mistakes

• Remember to convert between variance and standard deviation.

• Check if the hypothesis is one- or two-tailed. For two-tailed, split $\alpha$ into $\alpha/2$ for each tail.

• Always use n - 1 degrees of freedom for a one-sample t-test.

• Keep statistics ($\bar{x}$, $s$) distinct from population parameters ($\mu$, $\sigma$).
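Several of these tips come together in the one-sample t statistic. The sketch below is only an illustration: the data values and the hypothesized mean `mu0` are assumptions chosen for the example, and `one_sample_t` is a hypothetical helper.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return the t statistic and its degrees of freedom (n - 1)."""
    n = len(sample)
    x_bar = statistics.fmean(sample)  # sample statistic, not mu
    s = statistics.stdev(sample)      # standard deviation, not variance
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical data tested against a hypothesized population mean of 2.
t_stat, df = one_sample_t([3, 9, 1, 2, 5, 4], mu0=2)
```

The resulting statistic would then be compared with a critical value from a t table with `df` degrees of freedom, using $\alpha$ for a one-tailed test or $\alpha/2$ per tail for a two-tailed test.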


Choosing the significance level for a test

• Remember: the smaller the significance level α (say 0.01 rather than 0.05), the more stringent the test.

• Choose the level based on:
– Sample size
– Estimated size of the effect being tested
– Consequences of making a mistake

• Common significance levels:
– .05 (1 chance in 20);
– .01 (1 chance in a hundred) or
– .001 (1 chance in a thousand)

Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001
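To see how much more stringent the smaller levels are, the critical values can be computed directly. This sketch assumes a test statistic that is standard normal under H0; the dictionary name `critical` is made up for the example.

```python
from statistics import NormalDist

std_normal = NormalDist()

# For each level: (one-tailed critical z, two-tailed critical |z|).
critical = {
    alpha: (std_normal.inv_cdf(1 - alpha), std_normal.inv_cdf(1 - alpha / 2))
    for alpha in (0.05, 0.01, 0.001)
}
```

At the .05 level a two-tailed test rejects H0 when |z| exceeds about 1.96; at the .01 level the threshold rises to about 2.58, so the same data must show a stronger effect to be declared significant.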


Common Mistakes

Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001


Let's take a few simple examples of a calculation


Remember: the normal (Gaussian) distribution, the Bell Curve…

It has a mean, and a standard deviation.


The standard deviation defines how “spread out” the distribution is:


Remember: The sample statistic (measured) is only an estimate for the Population parameter (inferred)

Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001


Common Statistical Notation

Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001


Numerical Measures (Formulae)

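Mean: $\bar{x} = \frac{\sum x}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$

Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}$

Standard Error of the Mean: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

Median: the middle value of ordered values

Nth percentile: the value such that N% of ordered values lie below it

Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001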
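These measures are available in Python's standard `statistics` module. The sketch below uses five made-up measurements, and it estimates the standard error from the sample standard deviation because the population value is normally unknown.

```python
import math
import statistics

data = [12, 15, 11, 18, 14]  # hypothetical measurements

mean = statistics.mean(data)
variance = statistics.variance(data)        # divides by n - 1
std_dev = statistics.stdev(data)            # square root of the variance
std_error = std_dev / math.sqrt(len(data))  # estimated standard error
median = statistics.median(data)
```

Note that `statistics.variance` is the sample variance with n - 1 in the denominator; `statistics.pvariance` is the version that divides by n.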


Assume that we have the mean of a distribution. We need to find the standard deviation (or its square: the variance)


The Variance is the square of the Standard Deviation


Calculating the Variance and the standard deviation

• The formula for calculating the variance:

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

• The standard deviation is given by:

$s = \sqrt{s^2}$


Example: calculating Variance and Standard Deviation

For example, using these six measures: 3, 9, 1, 2, 5 and 4:

$\sum x = 3 + 9 + 1 + 2 + 5 + 4 = 24$

$\sum x^2 = 3^2 + 9^2 + 1^2 + 2^2 + 5^2 + 4^2 = 9 + 81 + 1 + 4 + 25 + 16 = 136$

The quantities are then substituted into the shortcut formula to find $\sum (x - \bar{x})^2$:

$\sum (x - \bar{x})^2 = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 136 - \frac{24^2}{6} = 136 - \frac{576}{6} = 40$

The variance and standard deviation are now found as before:

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{40}{5} = 8$

$s = \sqrt{s^2} = \sqrt{8} \approx 2.828$

Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001
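The arithmetic of this example can be checked in a few lines. The sketch computes the sum of squared deviations twice, once with the shortcut formula and once from the definition, to show that the two agree; the variable names are chosen for this illustration.

```python
import math

data = [3, 9, 1, 2, 5, 4]
n = len(data)

sum_x = sum(data)                  # 24
sum_x2 = sum(x * x for x in data)  # 136

# Shortcut formula for the sum of squared deviations.
ss_shortcut = sum_x2 - sum_x ** 2 / n

# Definition: squared deviations from the mean, added up.
mean = sum_x / n
ss_definition = sum((x - mean) ** 2 for x in data)

variance = ss_shortcut / (n - 1)
std_dev = math.sqrt(variance)
```

The shortcut form is convenient for hand calculation because it needs only the two running totals and never the individual deviations.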


We will say more about the standard deviation and the variance in a moment


Understanding What Is Behind A Formula


Clear thinking about statistics: understanding what is behind the formula


• I want you to understand the logic behind a formula. You do not need to memorize any formula. You reach that understanding by asking questions.

• For example, let’s look at the formula for computing the sample variance:

• Let’s ask why this? and why that?

$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$


Why do we square the deviations from the mean?

• Because if we simply add up all the deviations from the mean, we always get zero: the positive and negative deviations cancel out.

• So, to deal with this problem, we square the deviations.

• Bonus: Notice that squaring also gives more weight to large deviations; therefore it helps us better feel the spread of the data.
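The cancellation is easy to verify numerically. This sketch reuses the six measures 3, 9, 1, 2, 5 and 4 from the earlier worked example; the variable names are chosen for illustration.

```python
data = [3, 9, 1, 2, 5, 4]
mean = sum(data) / len(data)

deviations = [x - mean for x in data]

sum_dev = sum(deviations)                # positives and negatives cancel
sum_sq = sum(d * d for d in deviations)  # squares cannot cancel
```

The raw deviations add up to zero whatever the data are, while the squared deviations add up to 40, the quantity that enters the variance.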

Why not raise to the power of four (three will not work)?

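• Squaring does the trick; why should we make life more complicated than it is? (An odd power such as three will not work because it keeps the sign of each deviation, so positive and negative values would still offset each other.)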

Why is there a summation notation in the formula?

• To add up the squared deviation of each data point to compute the total sum of squared deviations.

Why do we divide the sum of squares by n - 1?

• The amount of deviation should also reflect how large the sample is, so we must bring in the sample size.

• Why? Because, in general, larger samples have a larger sum of squared deviations from the mean.

Why divide by n-1 not n?

• When you divide by n - 1, the sample variance is an unbiased estimate of the population variance. Dividing by n underestimates it on average, because the deviations are measured from the sample mean rather than from the population mean.

• But for larger samples (say over 30), it matters little whether you divide by n or n - 1. The results are almost the same, and they are acceptable.
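A simulation shows the effect for small samples. The sketch assumes a hypothetical normal population with variance 4 and repeatedly draws samples of size 5; the function name `average_estimates` is invented for this example.

```python
import random
import statistics

rng = random.Random(3)
SIGMA2 = 4.0  # true population variance (standard deviation 2)

def average_estimates(n, reps=20_000):
    """Average the two variance estimates over many samples of size n."""
    by_n, by_n_minus_1 = [], []
    for _ in range(reps):
        sample = [rng.gauss(0, 2) for _ in range(n)]
        m = sum(sample) / n
        ss = sum((x - m) ** 2 for x in sample)
        by_n.append(ss / n)
        by_n_minus_1.append(ss / (n - 1))
    return statistics.fmean(by_n), statistics.fmean(by_n_minus_1)

avg_n, avg_n_minus_1 = average_estimates(n=5)
```

With n = 5 the divide-by-n estimate averages about 3.2, which is (n - 1)/n times the true value of 4, while the divide-by-(n - 1) estimate averages close to 4. As n grows, the factor (n - 1)/n approaches 1 and the two estimates converge.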

Does N-1 have a Meaning?

• The factor n - 1 is what we consider as the "degrees of freedom" (but that is another discussion).

• Degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.

Explain number of values that are allowed to vary

• For example, if we have two observations, when calculating the mean we have two independent observations;

• however, when calculating the variance, we have only one independent observation, since the two observations are equally distant from the mean.

Degrees of Freedom

• The number of independent pieces of information that go into the estimate of a parameter is called the degrees of freedom (df).

• So for calculating the mean of the sample, we have all the observations in the sample (n).

• But to calculate the distance from the mean, you have one less. Why?

• If you have two observations, they will both be at the same distance from the mean: once the mean is known, one deviation determines the other.
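The same point holds for any sample size: once the mean is fixed, the last deviation is determined by the others. The sketch below uses made-up numbers to show this for two observations and for four.

```python
# Two hypothetical observations: equal distance from the mean.
pair = [10, 16]
pair_mean = sum(pair) / len(pair)
pair_devs = [x - pair_mean for x in pair]

# Four hypothetical observations: the last deviation is implied
# by the first n - 1, because all deviations must sum to zero.
data = [2, 7, 6, 1]
mean = sum(data) / len(data)
deviations = [x - mean for x in data]
implied_last = -sum(deviations[:-1])
```

Only n - 1 of the deviations are free to vary, which is why n - 1 appears as the degrees of freedom in the variance formula.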


This example shows how to question statistical formulas, to help you understand them rather than memorize them.

Then you can use the concepts better.


Clear thinking is always more important than the ability to calculate something.


Clear Thinking


Social surveys

• Framing the Issues
• Identifying the target population
• Sample Frame and Sample design
• Instrument design
• Gathering data
• Analyzing data
• Interpreting Results


That is done within the framework of a research design


Applications

• Market research
• Opinion poll
• Voting expectations
• Educational or Health studies
• Sociological studies
• Medical clinical studies

And so much more…


Examples of US/UK Major surveys

• National Election Studies
• Gallup poll
• General Social Survey
• International Social Survey
• United Kingdom Census
• United States Census
• National Health and Nutrition Examination Survey
• World Values Survey


Again: Clear thinking is always more important than the ability to calculate something.


So, One More Time…


With Clear thinking you will not be a turkey…


You will learn to fly…


Some will even soar like an eagle


Thank You
