INFO 515 Lecture #3 1 Action Research Descriptive Statistics and Surveys INFO 515 Glenn Booker

Upload: randolph-rich

Post on 11-Jan-2016


Page 1: Action Research
Descriptive Statistics and Surveys
INFO 515
Glenn Booker

Page 2: Reliability and Validity
A measure is reliable if it consistently gives the same answer
    A key to scientific measurement is the ability to repeat an experiment reliably
A measure is valid if it actually measures the concept under investigation
    It tests what you think it tests

Page 3: Review Std Deviation and CV
Standard Deviation can be used to compare two (or more) groups that have the same units of measure and similar means
Coefficient of Variation (CV) can compare two (or more) groups that have different reference points (means) and different standard deviations
    See which groups are more closely distributed around their mean
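As a rough illustration of the comparison above, the sketch below computes the standard deviation and the coefficient of variation for two groups. It assumes the usual definition CV = standard deviation / mean, which this slide does not spell out, and both data sets are made up for the example.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition of CV)."""
    return stdev(values) / mean(values)

# Hypothetical data: two groups with different units and different means.
heights_cm = [160, 170, 180]
weights_kg = [55, 70, 85]

for name, data in (("heights", heights_cm), ("weights", weights_kg)):
    cv = coefficient_of_variation(data)
    print(name, "std dev =", stdev(data), "CV =", round(cv, 3))
```

The group with the smaller CV is the one more closely distributed around its own mean, even though the raw standard deviations are in different units.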

Page 4: Z Score
The Z Score is the ‘how weird am I’ measure for a given data point*
The standardized or ‘z’ score allows you to do either of the following:
    Find where one or more individuals stand in reference to the mean of a single distribution on one unit of measure (one variable)
        Where is an individual located relative to a distribution of test scores?
        Am I better than average? If so, how much?
* This is not an official ISO definition…

Page 5: Z Score
    Find where one or more individuals stand in reference to the means of two (or more) different distributions that may have different units of measure
        Where does an individual stand relative to two tests, each given in a different class (with different distributions)?
        Did I do better on the midterm in philosophy than the one in geography?

Page 6: Z Score
A z score tells you how far above or below the mean any given score is, in standard deviation units
Z scores are most useful when the shape of your actual distribution of scores is nearly normal (see slide 9, or Action Research handout p. 11)
What’s the “normal” distribution?

Page 7: Normal Distribution Example
Consider stopping a car at a traffic light
You don’t stop in exactly the same place each time, but generally stop somewhere behind or near the big white line (I hope!)
Where you are likely to stop might be described by a “normal distribution”

Page 8: Normal Distribution
The normal, or Gaussian, distribution is the classic “bell curve”, which shows that most measurements are somewhere close to the mean, but a few measurements could range far above or below that mean
It is symmetric, and extends forever above and below the mean

Page 9: Normal Distribution
The normal distribution is described by two math functions
    The function f(x) is the probability density function, often called a PDF; it represents how likely the answer is to fall near the current value of x
    The function F(x) is the cumulative probability function; it represents the total chance of getting the current value of x or anything less
        A.k.a. a cumulative distribution function, or CDF

Page 10: Normal Distribution
[Figure: normal distribution plotted for X from -3 to 3, vertical axis from 0 to 1. ‘f(x)’ is the probability density function (the classic bell curve); ‘F(x)’ is the cumulative probability function.]

Page 11: Probability Density Function, f(x)
The chance you will stop (the event will occur) between any two distances ‘a’ and ‘b’ is the area under the curve f(x) between those two values
[Figure: normal distribution f(x) plotted for X from -3 to 3, with the points ‘a’ and ‘b’ marked on the X axis.]

Page 12: Probability Density Function, f(x)
Notice that f(x) is symmetric from left to right, and that it is defined for all possible values of x (x = negative infinity to x = positive infinity)
    f(x) never reaches zero!
The total area under the curve f(x) is one
    You will eventually stop somewhere
Unfortunately, f(x) is a messy function to integrate (find the area under it)

Page 13: Cumulative Probability Function F(x)
Imagine you start at x equals minus infinity (x = -∞)
Then add up the area under f(x) from minus infinity to the current value of x
    This is the cumulative probability function, F(x)
That’s why F(0) (F at x=0) is exactly 0.5
    Half of all events occur left of x=0, and half occur to the right of x=0 (symmetry)

Page 14: Cumulative Probability Function F(x)
So the chance of getting a result between values ‘a’ and ‘b’ is also given by:
    Probability = F(b) - F(a)
An analogy might be
    The number of babies born between 1940 (a) and 1990 (b) is equal to the total number of babies ever born by 1990 (F(b)), minus the total number of babies ever born by 1940 (F(a))
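The F(b) - F(a) rule can be tried out directly. This is a minimal sketch using Python's standard-library `statistics.NormalDist` for a standard normal distribution (mean 0, standard deviation 1); the helper name `prob_between` is just for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def prob_between(a, b, dist=standard_normal):
    """Chance of a result between a and b, computed as F(b) - F(a)."""
    return dist.cdf(b) - dist.cdf(a)

print(standard_normal.cdf(0))      # F(0): half the area lies left of the mean
print(prob_between(-1.0, 1.0))     # area within one standard deviation of the mean
```

The library does the messy integration of f(x) for us, which is the same job a printed z table does.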

Page 15: Standard (Z) Scores
Back to Z scores, our motivation for discussing the normal distribution
Z Scores are standardized scores whose distribution has the following properties:
    Retains the shape of the original scores, but
    Has a mean of 0 and
    Has a variance and standard deviation of 1

Page 16: Calculating Z scores
Compute the “z” score by subtracting the mean from the raw score and dividing that result by the standard deviation
    z = (Xi - μ)/σ = (Score – Mean)/(Standard Dev)
The z score is not just associated with the normal distribution – it can be used with any kind of distribution

Page 17: Interpreting Z Scores
The z score describes how many standard deviations a specific score is above or below the mean
    A negative z score means that the score is below the mean
    A positive z score means that the score is above the mean
    A z score of zero (z=0) means that the score is equal to the mean

Page 18: Z Score Example
I own 250 books -- I want to know how I compare to other college professors
Suppose that the mean number of books owned by college professors is 150 with a standard deviation of 50
    z = (250 - 150) / 50 = 2
My z score is 2, meaning I have 2 standard deviations more books than average (‘cuz I’m a pack rat!)
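The book example can be checked with a few lines of code. The sketch below also standardizes a whole list of scores to show that the resulting z scores have a mean of 0 and a standard deviation of 1. The list of book counts is hypothetical, and the use of the population standard deviation (`pstdev`) is an assumption; the slides do not say which form is intended.

```python
from statistics import mean, pstdev

def z_score(score, mean_value, std_dev):
    """(Score - Mean) / (Standard Dev)."""
    return (score - mean_value) / std_dev

# The example from the slide: 250 books, mean 150, standard deviation 50.
print(z_score(250, 150, 50))

def standardize(scores):
    """Convert every score in a list to its z score."""
    m = mean(scores)
    s = pstdev(scores)  # population standard deviation (assumed)
    return [z_score(x, m, s) for x in scores]

# Hypothetical book counts for five professors.
book_counts = [100, 150, 150, 200, 250]
z_values = standardize(book_counts)
print([round(z, 2) for z in z_values])
```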

Page 19: Z Score Tables
Are used to determine the proportion of the area under the curve that lies between the mean and a given standard score (z)
These tables are prepared using integral calculus to save you time
They show only positive ‘z’ values, since the areas for negative ‘z’ are the same as for positive ‘z’ (thanks to symmetry)

Page 20: Z Score Tables (Yonker p. 29-30)
[Figure: normal distribution f(x) plotted for X from -3 to 3, labeled with the three table columns: z value (Col. A), area between 0 and z (Col. B), and area beyond z (Col. C).]
Notice that we always have Col. B + Col. C = 0.5000

Page 21: Use of Z Score Tables
Z score tables can be used to find the chance of a measurement (or percentage of cases) occurring between any two z values
    If the z scores are on opposite sides of the mean (one positive, one negative), add the areas from Column B for each score
    If the z scores are on the same side of the mean (both positive, or both negative), subtract the areas from Column B
        Subtract the smaller area from the larger area; otherwise you’d get negative area!

Page 22: Use of Z Score Table Examples
Between z scores of -1.5 and +2.2, the percent of cases is, from Column B:
    z(-1.5) is the same area as z(+1.5)
    z(+1.5) = 0.4332 and z(+2.2) = 0.4861
    Percent = 43.32 + 48.61 = 91.93%
Between z scores of +1.5 and +2.2, the percent of cases is:
    Percent = 48.61 – 43.32 = 5.29%
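The add-or-subtract rule for Column B can be sketched in code. Here `column_b` is a hypothetical helper that stands in for looking up a printed table; it computes the area between the mean and z with the standard library's `NormalDist` rather than reading a table.

```python
from statistics import NormalDist

def column_b(z):
    """Area between the mean (z = 0) and |z|, like Column B of a z table."""
    return NormalDist().cdf(abs(z)) - 0.5

def area_between(z1, z2):
    """Proportion of cases between two z scores, using the table rule."""
    b1, b2 = column_b(z1), column_b(z2)
    if z1 * z2 < 0:
        # Opposite sides of the mean: add the two Column B areas.
        return b1 + b2
    # Same side of the mean: larger area minus smaller area.
    return abs(b1 - b2)

print(round(100 * area_between(-1.5, 2.2), 2))  # percent of cases between -1.5 and +2.2
print(round(100 * area_between(1.5, 2.2), 2))   # percent of cases between +1.5 and +2.2
```

Both results agree with the table-based figures in the examples above (91.93% and 5.29%).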

Page 23: Cumulative Z Score
[Figure: normal distribution f(x) plotted for X from -3 to 3, with the percent of cases marked in each band. From p. 11 in Yonker.]
Percentages shown are the total percent between the integer Z score values; between 0 and 1 has 34.13%, between 1 and 2 has 13.59%, between 2 and 3 has 2.14%, and beyond 3 has 0.13% (the same on each side of the mean)

Page 24: F(x) Values
For F(x) from minus 6 to plus 6, a distribution with mean = 0 and standard deviation of 1.0 gives:

Z     CDF               delta CDF from next value
-6    0.000000000987    0.000000286
-5    0.000000286652    0.000031385
-4    0.000031671242    0.001318227
-3    0.001349898032    0.021400234
-2    0.022750131948    0.135905122
-1    0.158655253931    0.341344746
 0    0.500000000000
 1    0.841344746069
 2    0.977249868052
 3    0.998650101968
 4    0.999968328758
 5    0.999999713348
 6    0.999999999013

Page 25: Cumulative Z Score
Key values are:
    From z = -1 to +1, total area is 68.26%
    From z = -1.96 to +1.96, total area is 95%
    From z = -2 to +2, total area is 95.44%
    From z = -2.57 to +2.57, total area is 99%
    From z = -3 to +3, total area is 99.74%
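These key values can be reproduced with a short calculation. The sketch below is only a check, not part of the original slides; because the slide figures come from a four-decimal z table, a direct calculation can differ from them in the last digit. `central_area` is a name chosen for this example.

```python
from statistics import NormalDist

def central_area(z):
    """Total area under the standard normal curve from -z to +z."""
    return 2.0 * NormalDist().cdf(z) - 1.0

for z in (1.0, 1.96, 2.0, 2.57, 3.0):
    print("z = +/-", z, "area =", round(100 * central_area(z), 2), "%")

# Going the other way: the z value that encloses a chosen central area.
print(round(NormalDist().inv_cdf(0.975), 3))  # z for a 95% central area
print(round(NormalDist().inv_cdf(0.995), 3))  # z for a 99% central area
```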

Page 26: Transformed z, or T scores
A.k.a. Standardized scores or “T” scores
Z scores are transformed artificially
    Multiply a z score by the desired standard deviation and add the desired mean (e.g. 10 and 50)
    T = σz + μ becomes T = 10z + 50
Examples
    A z score of -1.5 would give a T score of T = 10*(-1.5) + 50 = 35
    A z of +2.2 would give T = 10*(2.2) + 50 = 72
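The transformation is a one-line calculation. This sketch uses the desired standard deviation of 10 and desired mean of 50 from the slide as defaults; the function and parameter names are chosen for the example.

```python
def t_score(z, desired_sd=10, desired_mean=50):
    """Transformed z score: multiply by the desired SD, then add the desired mean."""
    return desired_sd * z + desired_mean

for z in (-1.5, 0.0, 2.2):
    print("z =", z, "-> T =", t_score(z))
```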

Page 27: T scores
This is used in many fields of research, especially Psychology and Education (that’s where the “desired” mean and standard deviation values came from)
Benefits: gets rid of negative connotations of negative and zero scores
    Only z scores below z = -5.0 would result in a negative T score (typically less than one data point in a million)

Page 28: Level of Confidence
Since the normal distribution goes to positive and negative infinity, we need a way to limit the range of expected or likely values
    Otherwise any normal distribution could take any value some of the time
Define the Level of Confidence as the acceptable limits of predictable behavior
Typically use 95% for most applications, but 99% for medical research

Page 29: Level of Confidence
Generally, we can say that the actual value of a parameter estimate is in the range of its mean ± twice its standard error, with a 95% level of confidence
    Use 1.96 instead of 2 for precise work
Thus the value of a parameter with mean of 6.2 and standard error of 1.9 lies between 2.4 (i.e., 6.2 – 2*1.9) and 10.0 (i.e., 6.2 + 2*1.9) with a 95% level of confidence
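The interval above can be computed as follows. This is a minimal sketch; the function name and the `multiplier` parameter are illustrative, with 2 as the rule-of-thumb value and 1.96 for more precise work.

```python
def confidence_interval(estimate, std_error, multiplier=2.0):
    """Return (lower, upper) limits: estimate -/+ multiplier * standard error."""
    half_width = multiplier * std_error
    return estimate - half_width, estimate + half_width

# The example from the slide: mean 6.2, standard error 1.9.
low, high = confidence_interval(6.2, 1.9)
print(round(low, 2), round(high, 2))

# The same interval with 1.96 instead of 2.
low_precise, high_precise = confidence_interval(6.2, 1.9, multiplier=1.96)
print(round(low_precise, 3), round(high_precise, 3))
```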

Page 30: The “t” Statistic
The t-statistic is defined as
    t = (parameter estimate) / (standard error)
If |t| > 2, then the parameter estimate is significantly different from zero at the 95% level of confidence
    t = 6.2/1.9 = 3.26
Hence because |3.26| > 2, this estimate is statistically significant
    Also means the 95% confidence interval does not include zero
Again, use 1.96 instead of 2 for precise work
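The rule of thumb can be written as a small check. In this sketch the function names are illustrative, and the default critical value of 2 is the approximate threshold from the slide; pass 1.96 for more precise work.

```python
def t_statistic(estimate, std_error):
    """t = (parameter estimate) / (standard error)."""
    return estimate / std_error

def is_significant(estimate, std_error, critical=2.0):
    """True if |t| exceeds the critical value (about 2 for 95% confidence)."""
    return abs(t_statistic(estimate, std_error)) > critical

t = t_statistic(6.2, 1.9)
print(round(t, 2), is_significant(6.2, 1.9))

# A hypothetical estimate that is small relative to its standard error.
print(is_significant(1.0, 1.9))
```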

Page 31: The “t” Statistic
T = ‘t’???? No!
Notice that the T score is a completely different concept from the ‘t’ statistic
We’ll use the ‘t’ statistic to help judge SPSS output later in the course

Page 32: Sampling Terms
Population = the entire realm of interest, everyone, all books, all publishers, all patrons, etc.
Sample = a subgroup or subset of the population
    Accurate inference requires good samples
    Use a sample, since it is often hard or impossible to measure the entire population

Page 33: Sampling Terms
Inferential Statistics
    Taking samples in order to infer unknown population parameters
Principle of Random Selection
    A procedure by which each member of the population has the same chance of being chosen as any other member
    Representative of the population

Page 34: Types of Samples
Probabilistic sample - sampling in which the probability of each element in the population being selected is known and can be specified
    Each element has the same chance
Non-probabilistic sample – each probability not known a priori (in advance)
    E.g. convenience samples, or available samples

Page 35: Random Sampling Techniques
Simple Random
Stratified Random
    Proportional
    Disproportional
Cluster
Systematic

Page 36: Simple Random Sample
Often can’t sample the entire user population
Must be a truly random sample, not just convenient
Can use random number table, or computer-generated pseudo-random numbers (Yonker, p. 31) to choose the sample
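As an illustration of choosing a sample with computer-generated pseudo-random numbers, the sketch below draws a simple random sample without replacement using Python's standard `random` module. The customer ID list and the seed value are hypothetical; the seed is only there so the draw can be repeated.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, sample_size)

# Hypothetical population: customer IDs 1 through 100.
customer_ids = list(range(1, 101))
chosen = simple_random_sample(customer_ids, 10, seed=515)
print(sorted(chosen))
```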

Page 37: Stratified Random Sampling
Group customers into categories (strata); get simple random samples from each category (stratum). Can be a very efficient method.
Can weight each stratum equally (proportional s.s.) or unequally (disproportional s.s.)
    For unequal weight, make the sampling fraction ~ standard deviation of stratum, and ~ 1/square root (cost of sampling)
    F ~ σ/sqrt(cost)
    where “sqrt” is “square root”, “~” is ‘proportional to’
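One way to turn the proportionality F ~ σ/sqrt(cost) into actual numbers is to compute a weight for each stratum and then scale the weights so they sum to one. That scaling step is an assumption made for this sketch, and the strata, standard deviations, and costs below are all hypothetical.

```python
import math

def disproportional_fractions(strata):
    """Share of the sample per stratum, proportional to sd / sqrt(cost).

    strata maps a stratum name to a (standard deviation, cost per sample) pair.
    Scaling the weights to sum to 1 is an assumption of this sketch.
    """
    weights = {name: sd / math.sqrt(cost) for name, (sd, cost) in strata.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

# Hypothetical strata: same spread, but one is four times as costly to sample.
strata = {
    "frequent users": (4.0, 1.0),
    "occasional users": (4.0, 4.0),
}
fractions = disproportional_fractions(strata)
print(fractions)
```

With equal standard deviations, the stratum that costs four times as much to sample gets half the weight of the cheaper one.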

Page 38: Proportional Stratified Random Sampling

Major             # in Population    % in Population    # in Sample
Education         50                 50                 10
Soc./Beh. Sci.    30                 30                 6
Business          15                 15                 3
Sci./Tech         5                  5                  1

For Education: % = 50/100 X 100, and # in Sample = 50% X 20 = 10
Data taken from Carpenter and Vasu (1978)
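The allocation in this table can be reproduced with a short calculation. The sketch assumes a total sample of 20, which is the multiplier shown in the Education row. Rounding each stratum to a whole number is an added step, and for other data the rounded counts may not add up exactly to the requested total.

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Give each stratum the same share of the sample as it has of the population."""
    population = sum(stratum_sizes.values())
    return {
        name: round(size * total_sample / population)
        for name, size in stratum_sizes.items()
    }

# Population counts by major, as in the table.
majors = {"Education": 50, "Soc./Beh. Sci.": 30, "Business": 15, "Sci./Tech": 5}
allocation = proportional_allocation(majors, 20)
print(allocation)
```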

Page 39: Cluster Sampling
Divide population into (geographic) clusters, then do simple random samples within each selected cluster
    Try for representative clusters
Not as efficient as simple random sampling, but cheaper
Sometimes used for in-person interviews

Page 40: Cluster Sampling Example
Randomly select n (certain number of) census tracts
From randomly selected census tracts, randomly select n blocks
From randomly selected blocks, randomly select addresses
Interview the family--unit of study

Page 41: Systematic Sampling
Calculate your sampling interval:
    Interval = (Size of population) / (Size of sample)
Select your first element at random from the sampling interval
Move ahead systematically by the sampling interval (e.g. every 10th customer) until you reach your desired sample size
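The three steps above can be sketched as follows. The integer division used for the interval and the customer list are assumptions of this example; the slide does not say how to handle a population size that is not an exact multiple of the sample size.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Pick a random start within the first interval, then take every interval-th element."""
    # Sampling interval, rounded down (an assumption; needs sample_size <= len(population)).
    interval = len(population) // sample_size
    # First element chosen at random from the first sampling interval.
    start = random.Random(seed).randrange(interval)
    return [population[start + i * interval] for i in range(sample_size)]

# Hypothetical list of 100 customers, sampled at every 10th customer.
customers = list(range(1, 101))
picked = systematic_sample(customers, 10, seed=3)
print(picked)
```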

Page 42: Non-random Sampling Techniques
Quota
Accidental
Judgment

Page 43: Non-random techniques
Quota sampling
    Is economical
    Is a non-random version of stratified sampling
    Define desired characteristics in advance: gender, race, age, etc.
    Example: Interview 20 females and 20 males over the age of 65

Page 44: Non-random techniques
Accidental sampling
    Mall market studies, Internet surveys
    Often requires a choice (by the interviewee) to be sampled
Judgment sampling
    Pick people who have some special knowledge
    Seek out experts – more of an interview method

Page 45: What is a Survey Study (Assessment)?
To describe systematically the facts and characteristics of a given population or area of interest, factually and accurately. (Isaac and Michael)
Survey studies are used to:
    Describe what is
    Establish need
    Identify problems
    Infer possible solutions

Page 46: Surveys
A survey often refers to a large data collection effort:
    What it involves: personal interviews, telephone interviews, a questionnaire sent through the mail, document survey, literature survey, social area analysis (observation and description of different areas of the city)
    “Who” it involves: community, customers, users, employees, literature
    Purpose: information gathering and fact finding to
        Describe what exists (such as public library services)
        Establish need
        Identify problems
        Imply possible solutions

Page 47: Customer Satisfaction Surveys
Could have many opportunities to conduct surveys
    Customer call-back after x days
    Customer complaints
    Direct customer visits
    Customer user groups
    Conferences

Page 48: Customer Satisfaction Surveys
Want representative sample of all customers
Three main methods are used
    Personal interview
    Telephone interview
    Questionnaire by mail

Page 49: Personal Interview
Advantages:
    1. Explore complex issues
    2. Question clarification
    3. Rapport
    4. Higher response rate
    5. Observation

Page 50: Personal Interview
Disadvantages:
    1. Interviewer bias
    2. Question uniformity
    3. No anonymity
    4. Difficult to analyze
    5. Time consuming

Page 51: Telephone Interview
Advantages:
    1. Some anonymity
    2. Low cost
    3. Rapid completion
    4. Higher response rate
    5. No travel time
    6. Widely spread sample

Page 52: Telephone Interview
Disadvantages:
    1. Reaching people
    2. Some interview bias possible
    3. Only accessible phone numbers
    4. No observation

Page 53: Structured vs. Unstructured Interviews
In an unstructured interview, only the first question is standard for all respondents
    The remaining questions are determined by the answers of each respondent
In a semi-structured interview, the questions are open ended, but all of the respondents receive the same questions

Page 54: Questionnaire by Mail
Advantages:
    1. Economical
    2. Faster
    3. Wide range of issues
    4. Widely spread sample
    5. Avoids interviewer bias
    6. Anonymity

Page 55: Questionnaire by Mail
Disadvantages:
    1. Question clarity
    2. No probing
    3. Who is answering?
    4. No observation
    5. Response rate

Page 56: Interview & Questionnaire Tips
1. Start with easy questions that the respondent will enjoy answering
    You want to prevent boredom early on while building rapport and putting the respondent at ease
2. Try for an easy and natural flow over topics
    Place like items together and give a brief explanation when a topic breaks
3. Within topics, go from the general to the specific
    For example, start with questions on use of the Internet in general, then move on to specific questions about the use of search engines

Page 57: Interview & Questionnaire Tips
4. Put open-ended or difficult questions (if any) at the end of the interview or questionnaire
5. Put questions on “sensitive” matters (such as age or income) at the end of the interview or questionnaire
    Otherwise, the interview may be over before it has started!

Page 58: The “Question Continuum”
Closed Questions
    Fixed Alternatives
    Structured
    “Your annual income is: a) 0-25K, b) 26-35K, ”
Semi Structured Questions
Open Questions
    Free form responses
    Unstructured
    “What do you like about Drexel?”

Page 59: Sample Size
How big is enough?
Must choose:
    Confidence level (80 - 95%, to get Z)
    Margin of error (B = 3 - 5%)
For simple random sample, also need
    Estimated satisfaction level (p), which is what you’re trying to measure, and
    Total population size (N = total number of customers)

Page 60: Critical Z values

Confidence Level (2-sided)    critical Z
80%                           1.28
90%                           1.645
95%                           1.96
99%                           2.57

Page 61: Sample Size
Sample size, n
    n = [N*Z^2*p*(1-p)] / [N*B^2 + Z^2*p*(1-p)]
The sample size depends heavily on the answer we want to obtain, the actual level of customer satisfaction (p)!

Page 62: Sample Size
If we choose
    80% confidence level, then Z = 1.28
    5% margin of error, then B = 5% = 0.05
    and expect 90% satisfaction, then p = 0.90
n = (N*1.28^2*0.9*0.1) / (N*0.05^2 + 1.28^2*0.9*0.1)
n = 0.1475*N/(0.0025*N + 0.1475)

Page 63: Sample Size
Given: Z = 1.28, p = 0.9, B = 0.05
Hence: Z^2 = 1.6384, p(1-p) = 0.09, B^2 = 0.0025
Find:

N           n
10          8.550355
20          14.93558
50          27.06052
100         37.09996
200         45.54935
500         52.75873
1000        55.69724
10000       58.63655
100000      58.94763
1000000     58.97892
Infinity    58.9824

Beware of sampling small populations!
For very large N, sample size stabilizes
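The table can be regenerated from the sample size formula n = [N*Z^2*p*(1-p)] / [N*B^2 + Z^2*p*(1-p)]. The sketch below uses the same Z, B, and p as the table; the function name is chosen for the example.

```python
def sample_size(N, Z, B, p):
    """Sample size for a simple random sample from a population of size N.

    Z is the critical z value, B the margin of error, p the estimated proportion.
    """
    core = Z ** 2 * p * (1 - p)
    return N * core / (N * B ** 2 + core)

# Same inputs as the table: 80% confidence, 5% margin of error, 90% satisfaction.
for N in (10, 20, 50, 100, 200, 500, 1000, 10000, 100000, 1000000):
    print(N, round(sample_size(N, Z=1.28, B=0.05, p=0.9), 5))
```

In practice the result would be rounded up to the next whole customer, since a fraction of a respondent cannot be surveyed.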

Page 64: Sample Size
If you don’t know the customer satisfaction value ‘p’, use 0.5 as a worst-case estimate
Once the real value of ‘p’ is known, solve for the actual value of B (margin of error)
Key is finding a truly representative sample
For N approaching infinity, sample size simplifies to:
    n = p*(1-p)*(Z/B)^2
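Both steps on this slide can be sketched in code: the large-population sample size, and solving for the margin of error B once p is known. The expression for B is obtained here by rearranging the finite-population formula n = [N*Z^2*p*(1-p)] / [N*B^2 + Z^2*p*(1-p)], which gives B = Z*sqrt(p*(1-p)*(N-n)/(N*n)). That rearrangement is worked out for this example rather than taken from the slides.

```python
import math

def sample_size_large_population(Z, B, p=0.5):
    """n = p*(1-p)*(Z/B)^2, with p = 0.5 as the worst-case default."""
    return p * (1 - p) * (Z / B) ** 2

def margin_of_error(n, N, Z, p):
    """Margin of error B actually achieved by a sample of n from a population of N."""
    return Z * math.sqrt(p * (1 - p) * (N - n) / (N * n))

# Worst-case planning at 95% confidence and a 5% margin of error.
print(round(sample_size_large_population(Z=1.96, B=0.05), 2))

# Large-population limit for Z = 1.28, B = 0.05, p = 0.9.
print(round(sample_size_large_population(Z=1.28, B=0.05, p=0.9), 4))

# Margin of error for a sample of about 37.1 from N = 100 at those same settings.
print(round(margin_of_error(n=37.09996, N=100, Z=1.28, p=0.9), 4))
```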