
Where do you get what you are telling us?

Sources of Data

Where do you get the data? And from whom should the data be collected?

Clearly, your data should come from the participants that are both available to you and relevant to the question you are studying.

There are times when we aren't very concerned about generalizing.

For example, we may just be evaluating a program in a local agency, and we don't care whether the program would work with other people in other places and at other times.

In that case, sampling and generalizing might not be of interest.

"Who do you want to generalize to?" Or should it be: "To whom do you want to generalize?"

In most applied social research, we are interested in generalizing to specific groups.

The group you wish to generalize to is often called the population.

This is the group you would like to sample from, because this is the group you are interested in generalizing to.

Some examples of populations:

All secondary school principals in Malaysia

All primary school counselors in the state of Sabah

All students attending Kolej Tunku Kursiah during the academic year 2004-2005

All students in Mrs. Amin's Form Two class at SMKA

Let's imagine that you wish to generalize to urban homeless males between the ages of 30 and 50 in Malaysia.

If that is the population of interest, you are likely to have a very hard time developing a reasonable sampling plan.

You are probably not going to find an accurate listing of this population, and even if you did, you would almost certainly not be able to mount a national sample across hundreds of urban areas.

So we should probably make a distinction between the population you would like to generalize to (the theoretical population or target population) and the population that will be accessible to you (the accessible population).

In this example, the accessible population might be homeless males between the ages of 30 and 50 in six selected urban areas in Malaysia.

A population can be defined as any set of persons/subjects having a common observable characteristic. It is the group from which you are able to randomly sample.

The target population is the group to which the researcher would like to generalize his or her results. This defined population has at least one characteristic that differentiates it from other groups.

The accessible population is the population to which the researcher has access.

Once you've identified the theoretical and accessible populations, you have to do one more thing before you can actually draw a sample -- you have to get a list of the members of the accessible population.

The listing of the accessible population from which you'll draw your sample is called the sampling frame.

If you were doing a phone survey and selecting names from the telephone book, the book would be your sampling frame.

That wouldn't be a great way to sample because significant subportions of the population either don't have a phone or have moved in or out of the area since the last book was printed.

Finally, you actually draw your sample (using one of the many sampling procedures). The sample is the group of people who you select to be in your study.

Sampling refers to drawing a sample (a subset) from a population (the full set). It is the act, process, or technique of selecting a suitable sample, or a representative part of a population, for the purpose of determining parameters or characteristics of the whole population.

Samples are measured in order to make generalisations about populations. Ideally, samples are selected, usually by some random process, so that they represent the population of interest.

Topic of investigation: The effect of computer-assisted instruction on the reading achievement of first and second graders in Malaysia

Target population: All first and second graders in Malaysia

Accessible population: All first and second graders in Selangor

Sample: 384 first and second graders in the state of Selangor

The usual goal in sampling is to produce a representative sample (i.e., a sample that is similar to the population on all characteristics, except that it includes fewer people because it is a sample rather than the complete population).

In other words, a representative sample is a "mirror image" of the population from which it was selected.

Why Sample?

First, it is usually too costly to test the entire population.

The second reason to sample is that it may be impossible to test the entire population.

The third reason to sample is that testing the entire population often produces error. Thus, sampling may be more accurate.


The final reason to sample is that testing may be destructive.

[You probably would not want to buy a car that had its door slammed five hundred or a thousand times, or that had been crash tested. Rather, you would probably want to purchase a car that did not make it into either of those samples.]

Up to here by six 27/08/05

How important is sampling?

Sampling is important with regard to external validity.

What is external validity? The extent to which the results of the study can be generalized.

Two types: population generalizability and ecological generalizability.

Next lecture begins here

Population generalizability:

The degree to which the sample represents the population.

Look at the usefulness of the study: if the sample comes from a small and narrowly defined group, the findings are not very useful.

That is why representativeness is important. We want the results of the study to be as widely applicable as possible.

You must take appropriate action to make sure the findings can be generalized to the entire population.

Ecological generalizability

Refers to the degree to which the results of the study can be extended to other settings.

Example: results from urban schools may not hold for students from rural schools.

What we can do here is describe in detail the nature of the environment and the setting in which the study takes place.

You can't generalize the effectiveness of a method of teaching mathematics to the effectiveness of that method for all subjects.

Caution: even with the powerful technique of random sampling, it is quite difficult to overcome the problem of ecological generalizability.

Procedure for Drawing a Sample

1. Define the population. Who is the population for each project? E.g., residents of Bandar Kajang or of the area around Bandar Kajang. Remember, the population is the group you want to infer to from the sample. Define it carefully so it is clear who is in and who is out.

2. Identify the sampling frame: the list of elements from which the sample may be drawn. It is sometimes referred to as the working population. E.g., to sample teachers, my sampling frame might be a list from the Education Department of Hulu Langat District.

3. Select a sampling procedure.

Why define the population and sample clearly?

So that those who are interested can determine the generalizability of the findings.

It is not enough to define the population and the sample; the sampling process has to be clearly described too.

(This is one of the common weaknesses in research.)

In non-experimental research, you investigate relationships among variables in some pre-defined population.

Typically, you take elaborate precautions to ensure that you have achieved a representative sample of that population;

You define your population, then do your best to randomly sample from it.

The two main types of sampling in quantitative research:

Random sampling (probability sampling) and nonrandom sampling (nonprobability sampling).

The former tends to produce representative samples.

The latter generally does not produce representative samples.

In probability samples, each member of the population has a known probability of being selected.

Elements are drawn by chance procedures

Probability methods include random sampling, systematic sampling, and stratified sampling.

In nonprobability sampling, members are selected from the population in some nonrandom manner. These include convenience sampling, judgment sampling, quota sampling, and snowball sampling.

Probability-based (random) samples:

These samples are based on probability theory. Every unit of the population of interest must be identified, and all units must have a known, non-zero chance of being selected into the sample. In simple random sampling, every member of the population has an equal chance of being selected (so those selected and those who are not are similar to one another). The idea here is representativeness.

How sure are we? That is why the sample has to be random and sufficiently large! It should have no bias. The researcher cannot consciously or unconsciously influence who will be selected.

The advantage of probability sampling is that sampling error can be calculated.

Sampling error is the degree to which a sample might differ from the population. It is the difference between the population parameter and the sample statistic (you can't run away from sampling error unless you do a census).

When inferring to the population, results are reported plus or minus the sampling error.
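To make the idea of sampling error concrete, here is a small Python sketch. The population of 3000 test scores is made up for illustration (it is not data from any study), the seed is fixed only so the run can be repeated, and the "plus or minus" figure uses the common rough 95% formula for a mean, 1.96 times the standard error.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed (an assumption) so the run is repeatable

# Hypothetical population: 3000 made-up test scores.
population = [rng.gauss(60, 12) for _ in range(3000)]
parameter = statistics.mean(population)   # population parameter

# Draw a simple random sample of 300 and compute the sample statistic.
sample = rng.sample(population, 300)
statistic = statistics.mean(sample)

# Sampling error: difference between the sample statistic and the parameter.
sampling_error = statistic - parameter

# Rough 95% margin of error for a mean: 1.96 * s / sqrt(n).
margin = 1.96 * statistics.stdev(sample) / len(sample) ** 0.5

print(f"population mean = {parameter:.2f}")
print(f"sample mean     = {statistic:.2f} +/- {margin:.2f}")
print(f"sampling error  = {sampling_error:+.2f}")
```

In real research the population mean is unknown, which is exactly why the result is reported with a margin rather than with the error itself.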

In nonprobability sampling, the degree to which the sample differs from the population remains unknown.

Random sampling is the purest form of probability sampling. Each member of the population has an equal and known chance of being selected.

When there are very large populations, it is often difficult or impossible to identify every member of the population, so the pool of available subjects becomes biased

RANDOM = each element of the population has an equal chance of inclusion in the sample.

Begin with a SAMPLING FRAME = a list of every element in the population.

Random Sampling Techniques: Simple random sampling

The first type of random sampling is called simple random sampling.

It's the most basic type of random sampling.

It is an equal probability of selection method (EPSEM).

EPSEM means "everyone in the sampling frame has an equal chance of being in the final sample."

EPSEM is important because that is what produces "representative" samples (i.e., samples that represent the populations from which they were selected)!

Simple random sample:

Each unit in the population is identified, and each unit has an equal chance of being in the sample. The selection of each unit is independent of the selection of every other unit. Selection of one unit does not affect the chances of any other unit.

[Figure: simple random sampling]

Sampling experts recommend random sampling "without replacement" rather than random sampling "with replacement" because the former is a little more efficient in producing representative samples (i.e., it requires slightly fewer people and is therefore a little cheaper).

Advantages of the SRS method of sampling:

Assures good representativeness of the sample (particularly if it is large).

Allows us to make generalizations/inferences. In fact, most of the statistical stuff we'll do later assumes that we've actually done a simple random sample, even if we haven't.

Avoids biases that are possible in some of the other methods we'll talk about.

Disadvantages of SRS method:

You have to have a list/sampling frame. You have to number the list. Both are hard to do when the population is large.

How do you draw a simple random sample? One way is to put all the names from your population into a hat and then select a subset (e.g., pull out 100 names from the hat).

Researchers typically use a computer program that randomly selects their samples. One program is available at the following address: http://www.randomizer.org/form.htm .

You can also use Excel to generate random numbers. You need as many randomly generated numbers as there are elements in your sample (n).

To use a computer program (sometimes called a random number generator) you must make sure that you give each of the people in your population a number. Then the program gives you a list of randomly selected numbers. Then you identify the people with those randomly selected numbers and try to get them to participate in your research study
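The same steps can be sketched in Python with the standard `random` module. The frame of 2500 numbered people and the function name `simple_random_sample` are assumptions made for this illustration. `random.sample` draws without replacement, so nobody is picked twice.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement."""
    rng = random.Random(seed)   # a seed is optional; it only makes the draw repeatable
    return rng.sample(frame, n)

# Hypothetical sampling frame: people numbered 1..2500.
frame = list(range(1, 2501))
chosen = simple_random_sample(frame, 100, seed=1)
print(sorted(chosen)[:10])      # the first few selected numbers
```

You would then look up the people who hold those numbers and invite them to take part.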

Researchers often use a table of random numbers.

You pick a place to start, and then move in one direction (e.g., move down the columns).

Use the number of digits in the table that is appropriate for your population size (e.g., if there are 2500 people in the population then use 4 digits).

Once you get the set of randomly selected numbers, find out who those people are and try to get them to participate in your research study.

For example, to select a sample of 25 people who live in your college dorm, make a list of all 250 people who live in the dorm.

Assign each person a unique number between 1 and 250.

Then refer to a table of random numbers.

Starting at any point in the table, read across or down and note every number that falls between 1 and 250.

Use the numbers you have found to pull the names from the list that correspond to the 25 numbers you found. These 25 people are your sample. This is called the table of random numbers method.
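The table of random numbers method can be imitated in Python. In this sketch a seeded random number generator stands in for the printed table (an assumption; a real table would be read by hand), and the function name `table_method` is made up for the example. It reads numbers with as many digits as the population size has, keeps those in range, and skips repeats.

```python
import random

def table_method(population_size, sample_size, seed=None):
    """Mimic reading a table of random numbers.

    Reads numbers with as many digits as population_size has, keeps
    those between 1 and population_size, and skips repeats.
    """
    rng = random.Random(seed)            # stands in for the printed table
    digits = len(str(population_size))   # 250 -> read 3-digit numbers
    selected = []
    while len(selected) < sample_size:
        number = rng.randrange(10 ** digits)   # e.g. 000..999 for 3 digits
        if 1 <= number <= population_size and number not in selected:
            selected.append(number)
    return selected

dorm_sample = table_method(250, 25, seed=7)
print(dorm_sample)
```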

We need a complete sampling frame for this method to be used.

If there is no complete frame, what can be done? The weakness of this procedure is that it requires a complete sampling frame.

It is the best sampling procedure, under certain assumptions.

The key to obtaining a random sample is to ensure that every member of the population has an equal and independent chance of being selected. So we use a table of random numbers (more scientific).

How to use table of random numbers?

Say you have a sample of 300 to be selected out of 3000 students.

Start anywhere on the table you have chosen (possibly at random).

Begin reading 4-digit numbers (why 4 digits? Because the final number, 3000, has four digits).

Select numbers that do not exceed 3000 until the required sample size is reached.

What if you come across the same number twice? Skip the later number and go to the next number.

When selecting the numbers you can go either across or down.

Do not use simple random sampling if you want certain subgroups to be in the sample.

1 4923 5013 4916 4951 5109 4993 5055 5080 4986 4974

2 4870 4956 5080 5097 5066 5034 4902 4974 5012 5009

3 5065 5014 5034 5057 4902 5061 4942 4946 4960 5019

4 5009 5053 4966 4891 5031 4895 5037 5062 5170 4886

5 5033 4982 5180 5074 4892 4992 5011 5005 4959 4872

6 4976 4993 4932 5039 4965 5034 4943 4932 5116 5070

7 5011 5152 4990 5047 4974 5107 4869 4925 5023 4902

8 5003 5092 5163 4936 5020 5069 4914 4943 4914 4946

9 4860 4899 5138 4959 5089 5047 5030 5039 5002 4937

10 4998 4957 4964 5124 4909 4995 5053 4946 4995 5059

11 4948 5048 5041 5077 5051 5004 5024 4886 4917 5004

12 4958 4993 5064 4987 5041 4984 4991 4987 5113 4882

13 4968 4961 5029 5038 5022 5023 5010 4988 4936 5025

14 5110 4923 5025 4975 5095 5051 5035 4962 4942 4882

15 5094 4962 4945 4891 5014 5002 5038 5023 5179 4852

16 4957 5035 5051 5021 5036 4927 5022 4988 4910 5053

17 5088 4989 5042 4948 4999 5028 5037 4893 5004 4972

18 4970 5034 4996 5008 5049 5016 4954 4989 4970 5014

19 4998 4981 4984 5107 4874 4980 5057 5020 4978 5021

20 4963 5013 5101 5084 4956 4972 5018 4971 5021 4901

Can we select teachers in Malaysia by simple random sampling? What is the rationale?

If you cannot do simple random sampling, do cluster random, stratified random, or multi-stage random sampling.

If you want to make sure that certain sub-groups that differ greatly in their characteristics are included, use stratified random sampling.

Systematic sampling is often used instead of random sampling.

It is also called an Nth name selection technique. After the required sample size has been calculated, every Nth record is selected from a list of population members.

As long as the list does not contain any hidden order, this sampling method is as good as the random sampling method.

Its only advantage over the random sampling technique is simplicity. Systematic sampling is frequently used to select a specified number of records from a computer file.

Advantages of Systematic Sampling: Easier to do than SRS. You don't have to keep running back to the random number generator.

Disadvantages of Systematic Sampling: You still need a list/sampling frame that is numbered. You might run into a periodicity problem. If the list happened to be arranged by class (1, 2, 3, 4, ...), you might end up picking all first years. You have to make sure the list is not structured in that way.

Systematic sampling

Systematic sampling involves three steps:

First, determine the sampling interval, which is symbolized by "k," (it is the population size divided by the desired sample size).

Second, randomly select a number between 1 and k, and include that person in your sample.

Third, also include each kth element in your sample. For example if k is 10 and your randomly selected number between 1 and 10 was 5, then you will select persons 5, 15, 25, 35, 45, etc. When you get to the end of your sampling frame you will have all the people to be included in your sample.

For example,

To select a sample of 25 dorm rooms in your college dorm, (1) Make a list of all the room numbers in the dorm. Say there are 100 rooms.

(2) Divide the total number of rooms (100) by the number of rooms you want in the sample (25). The answer is 4. This means that you are going to select every fourth dorm room from the list. But you must first consult a table of random numbers.

(3) Pick any point on the table, and read across or down until you come to a number between 1 and 4. This is your random starting point. Say your random starting point is "3". This means you select dorm room 3 as your first room, and then every fourth room down the list (3, 7, 11, 15, 19, etc.) until you have 25 rooms selected.
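The three steps can be sketched in Python. The room list and the function name `systematic_sample` are assumptions for the illustration, and the sketch assumes the population size is an exact multiple of the sample size, as in the dorm example (100 / 25 = 4).

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every kth element after a random start between 1 and k."""
    k = len(frame) // n          # sampling interval (assumes N is a multiple of n)
    rng = random.Random(seed)
    start = rng.randint(1, k)    # random start, 1..k inclusive
    # Positions are 1-based in the text, so subtract 1 for list indexing.
    return [frame[i - 1] for i in range(start, len(frame) + 1, k)][:n]

rooms = list(range(1, 101))      # hypothetical dorm with rooms 1..100
picked = systematic_sample(rooms, 25, seed=3)
print(picked)
```

Only one random number is needed (the starting point), which is why the method is less work than a simple random sample.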

Systematic Sample/Skip Interval Sample

1. Begin with a numbered sampling frame again.

2. Choose your random number.

3. Choose your SAMPLING INTERVAL = number in population divided by number desired in sample, or N/n.

4. Select the element that corresponds to the random number. Then instead of picking a second random number, etc., count out the interval (N/n) and choose that element. When you get to the end of the list go back to the beginning until you have your full sample.

Note, if you get a fraction, round up. If you round down, you might not get to the end of the list, and those elements at the end will not have any probability of inclusion. With rounding up, you will always get through the whole list.

This method is useful for selecting large samples, say 100 or more. It is less cumbersome than a simple random sample using either a table of random numbers or a lottery method. For example, you might have to sample files in a large filing cabinet. It is easier to select every 17th file than to pull out all the files and number them, etc.

However, you must be aware of problems that can arise in systematic random sampling. If the selection interval matches some pattern in the list (e.g., each 4th dorm room is a single unit, where all the others are doubles) you will introduce systematic bias into your sample.
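A small Python sketch shows how bad this can get. The dorm layout is hypothetical: 100 rooms, with every 4th room a single and the rest doubles, so 25% of rooms are singles. With a sampling interval of 4, the sketch counts the singles in each of the four possible systematic samples.

```python
# Hypothetical dorm: rooms 1..100, where every 4th room is a single.
rooms = [(number, "single" if number % 4 == 0 else "double")
         for number in range(1, 101)]
k = 4  # sampling interval for a sample of 25 rooms

for start in range(1, k + 1):
    sample = rooms[start - 1::k]   # rooms start, start + k, start + 2k, ...
    singles = sum(1 for _, kind in sample if kind == "single")
    print(f"start={start}: {singles} singles out of {len(sample)} rooms")
```

Three of the four possible starting points give a sample with no singles at all, and the fourth gives a sample made up entirely of singles. None of them comes close to the true 25%.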

Stratified sampling

Stratified sampling is a commonly used probability method that is often superior to simple random sampling because it can reduce sampling error.

A stratum is a subset of the population that shares at least one common characteristic. Examples of strata might be males and females, or managers and non-managers.

The researcher first identifies the relevant strata and their actual representation in the population.

Random sampling is then used to select a sufficient number of subjects from each stratum. "Sufficient" refers to a sample size large enough for us to be reasonably confident that the subsample represents its stratum.

Stratified sampling is often used when one or more of the strata in the population have a low incidence relative to the other strata.

Stratified random sampling

The population is divided into groupings (or strata), (e.g., divide it into the males and the females if you are using gender as your stratification variable).

Take a random sample from each group (i.e., take a random sample of males and a random sample of females). Put these two sets of people together and you now have your final sample.

Each unit in the population is identified, and each unit has a known, non-zero chance of being in the sample. This is used when the researcher knows that the population has sub-groups (strata) that are of interest.

For example, if you wanted to find out the attitudes of students on your campus about immigration, you may want to be sure to sample students who are from every region of the country as well as foreign students. Say your student body of 10,000 students is made up of 8,000 – Middle East; 1,000 – South East Asia; 500 - Africa; 300 – Indian Continent; 200 - Others.

Improves representativeness in terms of the stratification variables

Two different types of stratified sampling

Proportional stratified sampling: In proportional stratified sampling you must make sure the subsamples (e.g., the samples of males and females) are proportional to their sizes in the population.

Disproportional stratified sampling: In disproportional stratified sampling, the subsamples are not proportional to their sizes in the population.

In a population you have 365 students:

219 female students (60%)

146 male students (40%)

From these you will select a stratified sample of:

66 females (60%)

44 males (40%)

[Figure: stratified sampling]

Here is an example

Assume that your population is 75% female and 25% male. Assume also that you want a sample of size 100 and you want to stratify on the variable called gender.

For proportional stratified sampling, you would randomly select 75 females and 25 males from the population.

For disproportional stratified sampling, you might randomly select 50 females and 50 males from the population.
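Here is a minimal Python sketch of both allocations. The population of 1000 people (750 females, 250 males) and the names `stratified_sample`, `proportional` and `disproportional` are assumptions for this illustration. Note that with awkward percentages the rounded proportional allocation may not add up exactly to the desired sample size, so it should be checked.

```python
import random

def stratified_sample(strata, allocation, seed=None):
    """Draw a simple random sample of the given size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, allocation[name])
            for name, members in strata.items()}

# Hypothetical population of 1000: 75% female, 25% male.
strata = {
    "female": [f"F{i}" for i in range(750)],
    "male": [f"M{i}" for i in range(250)],
}
total = sum(len(members) for members in strata.values())
n = 100

# Proportional: each stratum's share of the sample matches its share of the population.
proportional = {name: round(n * len(members) / total)
                for name, members in strata.items()}
# Disproportional: here, equal numbers from each stratum.
disproportional = {name: n // len(strata) for name in strata}

print(proportional)      # {'female': 75, 'male': 25}
print(disproportional)   # {'female': 50, 'male': 50}
sample = stratified_sample(strata, proportional, seed=5)
```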

If you select a simple random sample of 500 students, you might get few or none from the smaller groups (Africa, Indian Continent, or Others).

To make sure that you get some students from each group, you can divide the students into these five groups, and then select the same percentage of students from each group using a simple random sampling method.

This is proportional stratified random sampling.

However, you may still have too few of some types of students. Instead, you may divide students into the five groups and then select the same number of students from each group using a simple random sampling method.

This is disproportionate stratified random sampling. This allows you to have enough students in each sub-group so that you can perform some meaningful statistical analyses of the attitudes of students in each sub-group.

In order to say something about the attitudes of the total student population of the university, however, you will have to apply weights to the findings for each sub-group, proportional to its presence in the total student body.
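The weighting step can be sketched in Python. The group sizes are the ones given earlier for the student body of 10,000; the mean attitude scores are made-up numbers used only to show the arithmetic. Each sub-group's result is multiplied by its share of the student body, and the products are added up.

```python
# Stratum sizes from the student-body example (10,000 students in total).
sizes = {"Middle East": 8000, "South East Asia": 1000, "Africa": 500,
         "Indian Continent": 300, "Others": 200}

# Hypothetical mean attitude scores from equal-sized subsamples (made-up numbers).
stratum_means = {"Middle East": 3.0, "South East Asia": 4.0, "Africa": 2.0,
                 "Indian Continent": 5.0, "Others": 1.0}

total = sum(sizes.values())
weights = {group: size / total for group, size in sizes.items()}

# Weighted mean: each sub-group counts in proportion to its size in the population.
weighted_mean = sum(weights[g] * stratum_means[g] for g in sizes)
# Unweighted mean: treats all sub-groups as if they were the same size.
unweighted_mean = sum(stratum_means.values()) / len(stratum_means)

print(f"weighted:   {weighted_mean:.2f}")
print(f"unweighted: {unweighted_mean:.2f}")
```

With these made-up scores the weighted mean is 3.07 and the unweighted mean is 3.00. The gap grows when small groups hold views that are very different from those of the large group.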

How do you divide the population into strata?

By level of education, type of occupation, religious affiliation, place of residence, and many more.

For example, you want to study the attitudes of adolescents towards a certain issue (clothing patterns). You might want to compare those who live in small towns, those who live in big towns, and those who live in rural areas.

So here you have three sub-groups.

Another example: the study is about THE EFFECT OF YOUTH PARTICIPATION IN A SCHOOL-TO-WORK PROGRAM ON THEIR OCCUPATIONAL SELF-EFFICACY, CAREER DECIDEDNESS, AND EMPLOYABILITY SKILLS.

How many sub groups do we have here?

Cluster sampling: This is the most commonly used scientific sampling method in the social sciences, for example in opinion polling.

Randomly select clusters, or pre-existing, natural groupings rather than individual type units in the first stage of sampling.

Use it when you don't have or need a sampling frame: a list doesn't exist, the list would be too hard to get, or the population is directly identifiable without a list (e.g., by the names of the roads/streets in your area).

Cluster sampling views the units in a population not only as members of the total population but also as members of naturally occurring clusters within the population. For example, city residents are also residents of neighborhoods, blocks, and housing structures.

You can do cluster sampling when the elements of the population naturally "cluster" into identifiable patterns, like neighborhoods, organizations, etc.

The assumption is that individuals within a cluster will be fairly homogenous. (Talk about housing area).

You have to come up with your clusters carefully!

1. Take the whole population and divide it into a bunch of smaller clusters. Number the clusters.

2. Do a simple random or systematic sample of the clusters.

3. Divide the chosen clusters into smaller ones and number them.

4. Repeat step 2, and so on, until you get to the individual elements in your sample.

[Figure: cluster sampling, clusters labelled A to H]

Cluster sampling is used in large geographic samples where no list is available of all the units in the population but the population boundaries can be well-defined.

For example, to obtain information about the drug habits of all high school students in a state, you could obtain a list of all the school districts in the state and select a simple random sample of school districts.

Then, within each selected school district, list all the high schools and select a simple random sample of high schools. Within each selected high school, list all high school classes, and select a simple random sample of classes.

Then use the high school students in those classes as your sample.

Cluster sampling must use a random sampling method at each stage. This may result in a somewhat larger sample than using a simple random sampling method, but it saves time and money.

It is also cheaper to administer than a statewide sample of high school seniors, because there are many fewer sites to obtain information from.

Take a large number of clusters to make the sample more representative.

It is better to sample many small clusters than a few large clusters, especially when there is a lot of variation in the population.

More information will be obtained, and this will give more accurate estimates. For example:

If there is not much difference among the members within the clusters, a small number of clusters is sufficient.

The more clusters are selected, the more confident we can be in applying the findings of the study to the population.

Advantages: it can be used when it is difficult or impossible to select a sample of individuals at random.

It does not require a lot of time.

Weakness: you may select clusters that do not represent the population.

A common mistake: selecting only one cluster, especially when the number of subjects in that cluster is large. Remember that a single cluster does not represent the population. Note also that here we select clusters at random, not individual subjects. It is therefore wrong to make inferences to the population from one cluster. IT REALLY IS WRONG!!!! DO NOT DO IT.

Advantages of Cluster Sampling method:

Less costly.

Don't need a list.

At the start, everyone has an approximately equal chance of selection, despite the number of steps involved.

It can be used when it is difficult or impossible to select a sample of individuals at random.

It does not require a lot of time.

Disadvantages of the Cluster Sample: There is more possibility of introducing error (drawing the boundaries, etc.), and this increases with the number of steps involved.

You have to figure out a balance between the number of stages and the number you want in your final sample. For instance, we could get a sample of 2000 Malaysians by picking 2000 clusters and one person from each, or we could pick 1000 each from 2 clusters.

If the clusters aren't drawn well, the second method would be unrepresentative. But if the single person drawn from each cluster in the first method was unusual, it wouldn't matter how good the clusters were.

It requires a larger sample size than simple random or stratified random sampling.

Within each cluster, members should be as heterogeneous as possible, and the clusters should be as similar to one another (homogeneous) as possible.

Two types of cluster sampling

One-stage cluster sampling: To select a one-stage cluster sample, you first select a random sample of clusters. Then, second, you include in your final sample all of the individual units in the selected clusters.

Two-stage cluster sampling: First, you take a random sample of clusters (i.e., just like you did in one-stage cluster sampling). Second, you take a random sample of elements from each of the selected clusters (e.g., you might randomly select 10 students from each of the 15 classrooms you selected in stage one).
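Both types can be sketched with one small Python function. The school of 40 classrooms with 30 students each, and the function name `cluster_sample`, are assumptions for this illustration. Leaving `per_cluster` as `None` gives a one-stage sample; giving it a number gives a two-stage sample.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """One-stage (per_cluster=None) or two-stage cluster sampling."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # stage 1: random clusters
    sample = []
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            sample.extend(members)                           # one-stage: take everyone
        else:
            sample.extend(rng.sample(members, per_cluster))  # stage 2: random members
    return chosen, sample

# Hypothetical school: 40 classrooms of 30 students each.
classrooms = {f"class{c}": [f"class{c}-student{s}" for s in range(30)]
              for c in range(40)}

_, one_stage = cluster_sample(classrooms, 15, seed=2)
_, two_stage = cluster_sample(classrooms, 15, per_cluster=10, seed=2)
print(len(one_stage), len(two_stage))   # 450 150
```

Notice that only the list of classrooms is needed at the start. A list of students is needed only for the classrooms that are actually selected.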

Multistage Sampling: involves combinations of stratified and/or clustered and/or simple random samples until one reaches the desired unit of analysis

Allows us to get a random sample without a sampling frame. Example: sample from East Malaysia

Cluster:urban / rural and randomly select communities

When random sampling is not possible:

Describe the sample as thoroughly as possible so that interested parties can judge how far the findings can be generalized.

Replicate the study.

When random sampling is done, the resulting subsets will be "mirror images" of each other (except for chance differences).

In random assignment, you start with a set of people (that may very well be a convenience sample) and then randomly divide that set of people into two or more subsets. You are taking a set of people and "assigning" them to two or more groups.

For example, if you randomly assign a convenience sample of 150 people to three groups of 50 people, the three groups will be approximately "equivalent" on all known and unknown variables (except for chance differences). In short, random assignment generates similar groups that can be used in strong experimental research designs.
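Random assignment is easy to sketch in Python: shuffle the list of people, then deal them out into groups like cards. The participant IDs and the function name `random_assignment` are made up for the example.

```python
import random

def random_assignment(people, n_groups, seed=None):
    """Shuffle the people, then deal them into n_groups groups in turn."""
    rng = random.Random(seed)
    shuffled = list(people)      # copy, so the original list is left unchanged
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]

# Hypothetical convenience sample of 150 participant IDs.
participants = [f"P{i:03d}" for i in range(1, 151)]
groups = random_assignment(participants, 3, seed=11)
print([len(g) for g in groups])   # [50, 50, 50]
```

Note the difference from random sampling: nobody is left out here. Everyone in the starting set ends up in exactly one group.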

Non-probability (non-random) samples

These samples focus on volunteers, easily available units, or those that just happen to be present when the research is done.

Non-probability samples are useful for quick and cheap studies, for case studies, for qualitative research, for pilot studies, and for developing hypotheses for future research.

Nonrandom Sampling Techniques

Convenience sample:

Also called an "accidental" sample. These are the ones like "man on the street" interviews. The researcher selects units that are convenient, close at hand, easy to reach, etc. Example: choose whoever walks by. If you looked at people's clothes at the bus terminal, you have done convenience sampling.

Advantages of convenience samples: They are easy. They are cheap. There is some possibility of substantive inference, if you can justify it, but not statistical inference. Example: many psychology studies are done with college students as subjects.

If the researcher can make the case that the college students are like other people in the relevant characteristics, then it's OK, but you can't use the concept of statistical inference.

Disadvantages of ALL non-scientific samples: Can't do statistical inference.

Purposive sample

The researcher selects the units with some purpose in mind, or with a specific set of characteristics needed for the research study (i.e., it involves selecting a convenience sample from a population with a specific set of characteristics). For example, students who live in dorms on campus, or experts on urban development.

Judgment sampling

Judgment sampling is a common nonprobability method. The researcher selects the sample based on judgment (according to specific criteria of interest). This is usually an extension of convenience sampling. For example, a researcher may decide to draw the entire sample from one "representative" city, even though the population includes all cities.

When using this method, the researcher must be confident that the chosen sample is truly representative of the entire population.

Example of judgment sampling: A researcher wanted to talk about environmental issues, so she sought out people who were really involved in environmental issues in Kedah. She didn't want to know about everyone's opinions on this issue, just about the activists'. Or you might start with one person who fits the bill and ask for recommendations of other people like her. This is big in studies of political elites (ask a staffer to recommend some friends, etc.).

Quota sample: the researcher constructs quotas for different types of units.

Use convenience sampling to obtain those quotas. A set of quotas might be as follows: to interview a fixed number of shoppers at a mall, half of whom are male and half of whom are female.
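As a rough illustration of the quota idea, the sketch below fills fixed quotas from whoever happens to come along. The function name quota_sample, the shopper list, and the quota of two per sex are all hypothetical, chosen only to keep the example small.

```python
from collections import Counter

def quota_sample(passersby, quotas, key):
    """Fill each quota with the first units that happen to come along."""
    counts = Counter()
    sample = []
    for person in passersby:            # convenience order: whoever walks by
        group = key(person)
        if counts[group] < quotas.get(group, 0):
            sample.append(person)
            counts[group] += 1
        if all(counts[g] >= q for g, q in quotas.items()):
            break                       # every quota is full
    return sample

# Hypothetical stream of mall shoppers as (id, sex) pairs.
shoppers = [(1, "F"), (2, "F"), (3, "M"), (4, "F"), (5, "M"), (6, "M"), (7, "F")]
sample = quota_sample(shoppers, {"M": 2, "F": 2}, key=lambda s: s[1])
print(sample)
```

Shopper 4 is skipped because the female quota is already full. The units inside each quota are still a convenience sample, so the quota controls the mix but not the selection process.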

Snowball sampling: you ask your participants to identify other potential participants with a specific set of characteristics, then ask the next set of participants the same question, and continue this process until a sufficient sample size is obtained.

It's been shown that the main factor in generalizability is not HOW MANY SAMPLE SUBJECTS, but rather, THE PROCESS BY WHICH THEY WERE SELECTED

What sampling process will we use?

How "typical" or representative is the sample of this population?

If the sample is chosen randomly, you're hoping you'll get a "good enough mix" on other contaminating variables or threats to validity (e.g., a mix of ethnicities, IQs, and so forth) so that they will cancel out.

How "certain" can you be that the findings from the sample will hold true for the entire population, especially since we didn't "study" (e.g., survey or interview) everyone?

How often is random sampling done?

The answer to this question depends on the research method being used. For example, when we use the experimental research method, random sampling is RARELY used.

However, if we use the survey method, it is FREQUENTLY used. The difference is that experiments usually require some commitment on the part of the subjects, whereas in survey research the commitment is usually low.

Thus, when we randomly contact people in survey research, we will usually get high levels of participation. When we design experiments, we usually must rely on volunteers.


BIAS AND ERROR IN SAMPLING

A sample is expected to mirror the population from which it comes; however, there is no guarantee that any sample will be precisely representative of that population. One of the most frequent causes of an unrepresentative sample is sampling error.

Chance may dictate that a disproportionate number of untypical observations are made. In testing fuses, for example, the sample may contain a higher or lower proportion of faulty fuses than the population does.

In practice, it is rarely known when a sample is unrepresentative and should be discarded.

Sampling error: the differences between the sample and the population that are due solely to the particular units that happen to have been selected (that is, the difference between the result of a given sample and the result of a census conducted using identical procedures).

If we use the whole population, sampling error does not occur. When we use a sample, there will certainly be differences in characteristics between one unit and another.

When we use a sample, error is always present; only its size differs. Sampling error is caused by two main factors: (1) the sample size used and (2) the sampling method used.

For example, suppose that a sample of 100 women is measured and all are found to be taller than 5.6 feet. It is clear, even without any statistical proof, that this would be a highly unrepresentative sample leading to invalid conclusions.

The more dangerous error is the less obvious sampling error, against which nature offers very little protection. An example would be a sample in which the average height is overstated by only an inch or two rather than by a foot. It is the unobvious error that is of most concern.

SAMPLE 1

STUDENT NO.   SEX   SCHOOL   IQ
52            M     B        110
63            F     B         83
82            F     C        105
75            M     C        113
92            F     C         98
36            F     B        129
03            F     A        130
11            F     A        117
43            M     B        117
08            M     A        120

Mean IQ of population (100 students) = 109.5

Mean IQ of sample 1 (10 students) = 112.2

SAMPLE 2

STUDENT NO.   SEX   SCHOOL   IQ
72            F     C        121
64            F     C        137
94            F     C         96
49            M     B        111
41            M     B        125
20            F     A        104
05            F     A        123
93            M     C         97
14            M     A        111
99            M     C         83

Mean IQ of population (100 students) = 109.5

Mean IQ of sample 1 (10 students) = 112.2

Mean IQ of sample 2 (10 students) = 110.8

Mean IQ of samples 1+2 (20 students) = 111.5

TAKE ANOTHER SAMPLE OF 10 STUDENTS, SAMPLE 3

Mean IQ of sample 3 (10 students) = 112.3

TAKE ANOTHER SAMPLE OF 10 STUDENTS, SAMPLE 4

Mean IQ of sample 4 (10 students) = 104.2

Mean IQ of samples 1+2+3+4 (40 students) = 109.6
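The repeated-sampling exercise above can be tried on a computer. The sketch below is only an illustration: the population of 100 IQ scores is generated at random (it is NOT the data behind the slides), and the seed, the sample sizes, and the 1000 repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population of 100 IQ scores (NOT the data behind the slides).
population = [round(rng.gauss(110, 15)) for _ in range(100)]
pop_mean = statistics.mean(population)

def sampling_error(n):
    """Sample mean minus population mean for one simple random sample of size n."""
    return statistics.mean(rng.sample(population, n)) - pop_mean

# Average size of the sampling error over 1000 repeated samples of each size.
mean_abs_error = {
    n: statistics.mean(abs(sampling_error(n)) for _ in range(1000))
    for n in (10, 20, 40, 100)
}
for n, err in mean_abs_error.items():
    print(n, round(err, 2))
```

Individual samples of 10 can land well above or below the population mean, but the typical error shrinks as the sample grows, and a "sample" of all 100 students (a census) has no sampling error at all.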

There are two basic causes of sampling error.

One is chance: unusual units do exist in a population, and there is always a possibility that an abnormally large number of them will be chosen. The main protection against this kind of error is to use a large enough sample.

The other is sampling bias: a tendency to favour the selection of units that have particular characteristics. Sampling bias is usually the result of a poor sampling plan.

Sampling bias is a systematic mistake; the fault of the researcher.

The major source of sampling bias is the use of nonprobability sampling techniques. If you can't specify the probability that each member will be chosen, then the results won't be generalizable.

Biases come from:

Convenience: the units are used because they are readily available.

Volunteers: they are probably not like the others who didn't volunteer.

Judgment sampling: "I think they represent the group."

Administrative convenience: when the boss says "use this group."

If a bias does exist, the researcher must describe it fully in the final report.

Frame error: a discrepancy between the intended target population and the actual population from which the sample is drawn.

It could be due to:

(a) Missing elements. Individuals who should be on the list but for some reason are not.

(b) Foreign elements. Elements that should not be included in the population or sample but appear on the sampling list.

(c) Duplicates. Elements that appear more than once on the sampling frame.
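The three kinds of frame error can be checked mechanically when both lists are available. The sketch below is a minimal illustration; the function name check_frame and the teacher IDs are hypothetical, and in practice the full target population is often not known this precisely.

```python
from collections import Counter

def check_frame(frame, target_population):
    """Compare a sampling frame (a list) against the intended population (a set)."""
    counts = Counter(frame)
    listed = set(counts)
    return {
        "missing": sorted(target_population - listed),   # should be listed but are not
        "foreign": sorted(listed - target_population),   # listed but do not belong
        "duplicates": sorted(u for u, c in counts.items() if c > 1),
    }

# Hypothetical teacher IDs, for illustration only.
target = {"T01", "T02", "T03", "T04"}
frame = ["T01", "T02", "T02", "T04", "T09"]
report = check_frame(frame, target)
print(report)
```

Here T03 is a missing element, T09 is a foreign element, and T02 is a duplicate that would have twice the chance of being selected.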

Selection error

Certain elements in the frame have a greater chance of falling into the sample than others.

For example, an element may be listed more than once across different lists (a mathematics teachers list, a school grade list, a gender list).

"How big should my sample be?" Here are my four "simple" answers to your question:

Try to get as big of a sample as you can for your study (i.e., because the bigger the sample the better).

If your population is size 100 or less, then include the whole population rather than taking a sample (i.e., don't take a sample; include the whole population).

Look at other studies in the research literature and see how many they are selecting.

For an exact number, just look at tables which show recommended sample sizes.

You also need to understand that there are many times when you will need larger rather than smaller samples. You will need larger samples:

When the population is very heterogeneous.

When you want to break down the data into multiple categories.

When you want a relatively narrow confidence interval (e.g., note that the estimate that 75% of teachers support a policy plus or minus 4% is more narrow than the estimate of 75% plus or minus 5%).

When you expect a weak relationship or a small effect.

When you use a less efficient technique of random sampling (e.g., cluster sampling is less efficient than stratified sampling).

When you expect to have a low response rate. The response rate is the percentage of people in your sample who agree to be in your study.

To estimate a sample size, a researcher must:

estimate the standard deviation of the population (the homogeneity of the population)

set the allowable amount of error (the degree of precision desired)

determine a confidence interval

The degree of precision desired is also known as the margin of error or the precision required.

We choose the desired precision (e.g., 3%, 4%, 5%, 10%), that is, how close the estimate should fall to the parameter (the exactness of prediction).

Ideally the sample statistic would equal the population parameter, but this is impossible: we do not take measurements from every unit, so there is sampling error.

SAMPLE SIZE CRITERIA

In addition to the purpose of the study and the population size, three criteria usually need to be specified to determine the appropriate sample size:

the level of precision (the allowable sampling error),

the level of confidence or risk (the risk of selecting a "bad" sample), and

the degree of variability in the attributes being measured.

Degree Of Variability

The degree of variability in the attributes being measured refers to the distribution of attributes in the population.

The more heterogeneous a population, the larger the sample size required to obtain a given level of precision.

The less variable (more homogeneous) a population, the smaller the sample size. Note that a proportion of 50% indicates a greater level of variability than either 20% or 80%.

This is because 20% and 80% indicate that a large majority do not or do, respectively, have the attribute of interest. Because a proportion of .5 indicates the maximum variability in a population, it is often used in determining a more conservative sample size, that is, the sample size may be larger than if the true variability of the population attribute were used.
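One way to see why 50% is the most variable case is to compute the variance of a yes/no attribute, p(1 - p), for each proportion. The helper name proportion_variance below is only illustrative.

```python
def proportion_variance(p):
    """Variance of a yes/no attribute held by a proportion p of the population."""
    return p * (1 - p)

for p in (0.2, 0.5, 0.8):
    print(p, round(proportion_variance(p), 2))
```

The variance is 0.16 at both 20% and 80% and peaks at 0.25 for 50%, which is why p = .5 gives the most conservative (largest) sample size.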

The Level Of Precision

The level of precision, sometimes called sampling error, is the range in which the true value of the population is estimated to be.

This range is often expressed in percentage points, (e.g., ±5 percent), in the same way that results for political campaign polls are reported by the media.

Thus, if a researcher finds that 60% of farmers in the sample have adopted a recommended practice with a precision rate of ±5%, then he or she can conclude that between 55% and 65% of farmers in the population have adopted the practice.

For example, suppose student IQ has a mean of 100 and sd = 15. If we take observations from a sample of 25, the sampling error is

15/√25 = 3.0, so the student IQ score lies at, say, 100 ± 3.0.

But if the sample size is increased to 40, the sampling error becomes

15/√40 = 2.4, and the student IQ score now lies in the range 100 ± 2.4.

If we increase the sample size to 100, the sampling error becomes

15/√100 = 1.5, and the student IQ score now lies in the range 100 ± 1.5.

This means that the larger the sample size, the smaller the sampling error, and the closer the sample mean comes to the population mean.
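The arithmetic in this IQ example is the standard error of the mean, sd/√n. A minimal sketch for checking it follows; the function name standard_error is just a label for this illustration.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

for n in (25, 40, 100):
    print(n, round(standard_error(15, n), 2))
```

This prints 3.0, 2.37 and 1.5 for samples of 25, 40 and 100. Note that quadrupling the sample from 25 to 100 only halves the error.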

Once these are known, the formula for calculating sample size is

\[ n = \left( \frac{Z S}{E} \right)^2 \]

where

Z = standardized value that corresponds to the confidence level

S = sample standard deviation

E = acceptable magnitude of error

Suppose a researcher studying annual expenditures on books wishes to have a 95% confidence interval (Z = 1.96) and a range of error (E) of less than $2, and an estimate of the standard deviation is $29.

\[ n = \left( \frac{Z S}{E} \right)^2 = \left( \frac{(1.96)(29)}{2} \right)^2 \approx 808 \]

If we change the range of acceptable error to $4, the sample size falls:

\[ n = \left( \frac{(1.96)(29)}{4} \right)^2 \approx 202 \]

Suppose you wanted to estimate the sample size for a survey which contains the following question:

What is your overall attitude towards Hospital X? Very Good 7 6 5 4 3 2 1 Very Poor

The range of acceptable error is 0.1 points, the confidence level is 95%, and the estimated standard deviation is 1.

\[ n = \left( \frac{Z S}{E} \right)^2 = \left( \frac{(1.96)(1)}{0.1} \right)^2 \approx 384 \]

If you increase the acceptable error to 0.2, the sample size drops to n = 96.
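The worked examples for this formula can be reproduced with a few lines of code. This is a minimal sketch: the function name sample_size_for_mean is illustrative, and the results are rounded to the nearest whole number to match the figures quoted here (rounding up would be the more cautious choice).

```python
def sample_size_for_mean(z, s, e):
    """n = (z * s / e) ** 2, before rounding."""
    return (z * s / e) ** 2

# The worked examples from the text (z = 1.96 for 95% confidence).
books_2 = round(sample_size_for_mean(1.96, 29, 2))       # error of $2
books_4 = round(sample_size_for_mean(1.96, 29, 4))       # error of $4
hospital_01 = round(sample_size_for_mean(1.96, 1, 0.1))  # error of 0.1 points
hospital_02 = round(sample_size_for_mean(1.96, 1, 0.2))  # error of 0.2 points
print(books_2, books_4, hospital_01, hospital_02)
```

Because E is squared in the denominator, doubling the acceptable error cuts the required sample to about a quarter.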

The Confidence Level

The confidence or risk level is based on ideas encompassed under the Central Limit Theorem.

The key idea encompassed in the Central Limit Theorem is that when a population is repeatedly sampled, the average value of the attribute obtained by those samples is equal to the true population value.

Furthermore, the values obtained by these samples are distributed normally about the true value, with some samples having a higher value and some obtaining a lower score than the true population value.

In a normal distribution, approximately 95% of the sample values are within two standard deviations of the true population value (e.g., mean).

In other words, this means that, if a 95% confidence level is selected, 95 out of 100 samples will have the true population value within the range of precision specified earlier

There is always a chance that the sample you obtain does not represent the true population value.

Such samples with extreme values are represented by the shaded areas.

This risk is reduced for 99% confidence levels and increased for 90% (or lower) confidence levels.
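The "95 out of 100 samples" idea can be checked by simulation. The sketch below assumes a normally distributed population with mean 100 and standard deviation 15, samples of 25, and 2000 repetitions; all of these values, and the seed, are assumptions made for the illustration.

```python
import math
import random
import statistics

rng = random.Random(1)                       # fixed seed for a repeatable run
MU, SIGMA, N, TRIALS = 100, 15, 25, 2000     # assumed population and design
half_width = 1.96 * SIGMA / math.sqrt(N)     # 95% range of precision

hits = 0
for _ in range(TRIALS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    if abs(statistics.mean(sample) - MU) <= half_width:
        hits += 1                            # this sample landed within the range

coverage = hits / TRIALS
print(round(coverage, 3))
```

The share of samples whose mean falls within the range should come out close to 0.95. Replacing 1.96 with 2.576 (99% confidence) widens the range and raises the share, while 1.645 (90%) narrows it and lowers the share.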

A "p" value of .05 is a commonly used significance level. When we say "the results are significant at the .05 level," we mean that, if there were really no difference in the population, differences at least as large as those we observed would arise by chance less than 5% of the time. At the .01 level, they would arise by chance less than 1% of the time.

For example, if your population is 500,000, and you want to be 95% confident that your data are representative of your population to ±1% accuracy, 9,423 returned surveys are required. (If you expect a response rate between 30% and 40%, you must then estimate the number for your initial mailing at approximately 29,000.)

If you want to be 99% confident that your data are accurate to within ±1%, you need 16,057 surveys returned.

This number drops drastically if you need only 95% confidence that your data are accurate to ±5%. With those parameters, 384 returned surveys will be adequate.
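The slides do not say how these three figures were obtained. One common approach that reproduces them is the sample size formula for a proportion with maximum variability (p = 0.5) and a finite population correction; the sketch below assumes that approach, and the function name is illustrative.

```python
def sample_size_for_proportion(z, e, N, p=0.5):
    """Sample size for estimating a proportion, corrected for a finite population N."""
    n0 = z ** 2 * p * (1 - p) / e ** 2    # size for an effectively infinite population
    return n0 / (1 + (n0 - 1) / N)        # finite population correction

# z = 1.96 for 95% confidence, z = 2.576 for 99% confidence (assumed values).
n_95_1 = round(sample_size_for_proportion(1.96, 0.01, 500_000))
n_99_1 = round(sample_size_for_proportion(2.576, 0.01, 500_000))
n_95_5 = round(sample_size_for_proportion(1.96, 0.05, 500_000))
print(n_95_1, n_99_1, n_95_5)
```

Under these assumptions the results are 9423, 16057 and 384 returned surveys, in line with the numbers quoted.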

Population mean = 109.5

Sample 1 mean (10 students) = 112.2

Sample 2 mean (10 students) = 110.8

Samples 1 and 2 mean (20 students) = 111.5

Sample 3 mean = 104.4

Sample 4 mean = 91.2

Samples 3 and 4 mean = 97.8

All four samples = 109.6

In general, it is safe to assume that:

Sample sizes will need to increase as the size of the confidence interval decreases.

Sample sizes will need to increase as the level of statistical significance decreases.

Sample sizes will need to increase as population increases.

The 'p' value .05 is often used to calculate the sample size you need and set thresholds of statistical significance.

The term "significance" does not mean "important," but rather is a measure of confidence when analyzing the data.

In completing this discussion of determining sample size, there are three additional issues.

First, the above approaches to determining sample size have assumed that a simple random sample is the sampling design.

More complex designs, e.g., stratified random samples, must take into account the variances of subpopulations, strata, or clusters before an estimate of the variability in the population as a whole can be made.

Another consideration with sample size is the number needed for the data analysis.

If descriptive statistics are to be used, e.g., mean, frequencies, then nearly any sample size will suffice.

On the other hand, a good size sample, e.g., 200-500, is needed for multiple regression, analysis of covariance, or log-linear analysis, which might be performed for more rigorous state impact evaluations.

The sample size should be appropriate for the analysis that is planned.

In addition, an adjustment in the sample size may be needed to accommodate a comparative analysis of subgroups (e.g., an evaluation of program participants compared with nonparticipants).

Sudman (1976) suggests that a minimum of 100 elements is needed for each major group or subgroup in the sample and for each minor subgroup, a sample of 20 to 50 elements is necessary.

Similarly, Kish (1965) says that 30 to 200 elements are sufficient when the attribute is present 20 to 80 percent of the time (i.e., the distribution approaches normality).

On the other hand, skewed distributions can result in serious departures from normality even for moderate size samples (Kish, 1965:17). Then a larger sample or a census is required.

Finally, the sample size formulas provide the number of responses that need to be obtained. Many researchers commonly add 10% to the sample size to compensate for persons that the researcher is unable to contact.

The sample size also is often increased by 30% to compensate for nonresponse. Thus, the number of mailed surveys or planned interviews can be substantially larger than the number required for a desired level of confidence and precision.
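The two allowances can be turned into a quick planning calculation. The sketch below assumes the 10% and 30% are simply added to the required number of responses (the text does not say whether they should compound), and the function name surveys_to_send is hypothetical.

```python
import math

def surveys_to_send(required, unreachable_pct=10, nonresponse_pct=30):
    """Inflate the required responses by the two allowances, added together."""
    return math.ceil(required * (100 + unreachable_pct + nonresponse_pct) / 100)

planned = surveys_to_send(384)
print(planned)
```

For 384 required responses this gives 538 surveys to mail or interviews to plan. If you have an expected response rate instead, dividing the required number by that rate is another common way to get the planned number.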

STRATEGIES FOR DETERMINING SAMPLE SIZE

There are several approaches to determining the sample size. These include using a census for small populations, imitating a sample size of similar studies, using published tables, and applying formulas to calculate a sample size.

Using A Census For Small Populations

One approach is to use the entire population as the sample. Although cost considerations make this impossible for large populations, a census is attractive for small populations (e.g., 200 or less).

A census eliminates sampling error and provides data on all the individuals in the population. In addition, some costs such as questionnaire design and developing the sampling frame are "fixed," that is, they will be the same for samples of 50 or 200.

Finally, virtually the entire population would have to be sampled in small populations to achieve a desirable level of precision

Using A Sample Size Of A Similar Study

Another approach is to use the same sample size as those of studies similar to the one you plan.

Without reviewing the procedures employed in these studies you may run the risk of repeating errors that were made in determining the sample size for another study.

However, a review of the literature in your discipline can provide guidance about "typical" sample sizes which are used.

Using Published Tables

A third way to determine sample size is to rely on published tables which provide the sample size for a given set of criteria. Some tables present sample sizes that would be necessary for given combinations of precision, confidence levels, and variability. Please note two things.

First, these sample sizes reflect the number of obtained responses, and not necessarily the number of surveys mailed or interviews planned (this number is often increased to compensate for nonresponse).

Second, the sample sizes presume that the attributes being measured are distributed normally or nearly so. If this assumption cannot be met, then the entire population may need to be surveyed.

Using Formulas To Calculate A Sample Size

Although tables can provide a useful guide for determining the sample size, you may need to calculate the necessary sample size for a different combination of levels of precision, confidence, and variability. The fourth approach to determining sample size is to apply one of several formulas, such as the following:

\[ n = \frac{N}{1 + N e^2} \]

where n is the sample size, N is the population size, and e is the level of precision.
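A minimal sketch of this formula in code follows. The function name is illustrative, N is taken to be the population size and e the level of precision written as a decimal, and rounding up to a whole unit is a choice made here rather than part of the formula. This simplified formula is usually presented as assuming a 95% confidence level and maximum variability (p = .5).

```python
import math

def simplified_sample_size(N, e):
    """n = N / (1 + N * e**2), rounded up to a whole unit."""
    return math.ceil(N / (1 + N * e ** 2))

n = simplified_sample_size(500_000, 0.05)
print(n)
```

For a population of 500,000 and ±5% precision this gives 400.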
