Validity and Reliability

Upload: jelita-kasih-adinda

Post on 05-Dec-2015

Chapter Eight: Validity and Reliability

Validity

Validity has been defined as referring to the appropriateness, correctness, meaningfulness, and usefulness of the specific inferences researchers make based on the data they collect.

It is the most important idea to consider when preparing or selecting an instrument.

Validation is the process of collecting and analyzing evidence to support such inferences.

Evidence of Validity

There are 3 types of evidence a researcher might collect:

  Content-related evidence of validity: the content and format of the instrument
  Criterion-related evidence of validity: the relationship between scores obtained using the instrument and scores obtained on a criterion measure
  Construct-related evidence of validity: the psychological construct being measured by the instrument

Illustration of Types of Evidence of Validity (Figure 8.1)

Content-related Evidence

A key element is the adequacy of the sampling of the domain the instrument is supposed to represent.

The other aspect of content validation is the format of the instrument.

Attempts to obtain evidence that the items measure what they are supposed to measure typify the process of gathering content-related evidence.

Criterion-related Evidence

A criterion is a second test presumed to measure the same variable.

There are two forms of criterion-related validity:

1) Predictive validity: a time interval elapses between administering the instrument and obtaining criterion scores

2) Concurrent validity: instrument data and criterion data are gathered and compared at the same time

A correlation coefficient (r) indicates the degree of relationship that exists between the scores individuals obtain on two instruments.
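As a rough illustration of how such a coefficient might be computed, the sketch below implements the Pearson product-moment correlation in plain Python. The function name `pearson_r` and the two score lists are hypothetical and chosen only for the example; the slides do not say which correlation formula is intended, so Pearson's r is an assumption here.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for five individuals (illustrative data only).
instrument_scores = [10, 12, 15, 18, 20]
criterion_scores = [11, 13, 14, 19, 21]

r = pearson_r(instrument_scores, criterion_scores)
print(f"validity coefficient r = {r:.2f}")
```

For predictive validity the criterion scores would be collected after a time interval; for concurrent validity they would be collected at about the same time. The computation is the same in both cases.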

Construct-related Evidence

Considered the broadest of the three categories.

There is no single piece of evidence that satisfies construct-related validity.

Researchers attempt to collect a variety of types of evidence, including both content-related and criterion-related evidence.

The more evidence researchers have from different sources, the more confident they become about the interpretation of the instrument.

Reliability

Refers to the consistency of scores or answers provided by an instrument.

Scores can be reliable without being valid.

An instrument should be both reliable and valid (Figure 8.2), depending on the context in which the instrument is used.

Reliability and Validity (Figure 8.2)

Reliability of Measurement (Figure 8.3)

Errors of Measurement

Because errors of measurement are always present to some degree, variation in test scores is common.

This is due to:
  Differences in motivation
  Energy
  Anxiety
  Different testing situation

Reliability Coefficient

Expresses a relationship between scores of the same instrument at two different times or between parts of the instrument.

The 3 best known methods are:
  Test-retest method
  Equivalent-forms method
  Internal-consistency method

Test-Retest Method

Involves administering the same test twice to the same group after a certain time interval has elapsed.

A reliability coefficient is calculated to indicate the relationship between the two sets of scores.

Reliability coefficients are affected by the lapse of time between the administrations of the test.

An appropriate time interval should be selected.

In educational research, scores collected over a two-month period are considered sufficient evidence of test-retest reliability.

Equivalent-Forms Method

Two different but equivalent (alternate or parallel) forms of an instrument are administered to the same group during the same time period.

A reliability coefficient is then calculated between the two sets of scores.

It is possible to combine the test-retest and equivalent-forms methods by giving two different forms of the test with a time interval between the two administrations.

Internal-Consistency Methods

There are several internal-consistency methods that require only one administration of an instrument.

Split-half Procedure: involves scoring two halves of a test separately for each subject and calculating the correlation coefficient between the two scores.
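The split-half procedure can be sketched as follows. The odd/even split, the function names, and the response matrix are all illustrative assumptions. The final step uses the Spearman-Brown formula, which is not mentioned in the slides but is commonly applied to estimate full-length reliability from the half-test correlation.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def split_half(item_scores):
    """Correlate each subject's odd-item total with their even-item total."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return pearson_r(odd_half, even_half)

def spearman_brown(r_half):
    """Step the half-test correlation up to a full-length estimate."""
    return 2 * r_half / (1 + r_half)

# Hypothetical data: 5 subjects x 6 items, scored 1 (right) or 0 (wrong).
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
]

r_half = split_half(responses)
r_full = spearman_brown(r_half)
print(f"half-test r = {r_half:.2f}, stepped-up estimate = {r_full:.2f}")
```

Other ways of splitting the test (for example first half versus second half) are equally possible and can give different coefficients.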

Kuder-Richardson Approaches (KR20 and KR21): the KR21 formula requires only 3 pieces of information:
  Number of items on the test
  The mean
  The standard deviation

These are the most frequently used methods for determining internal consistency.
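A minimal sketch of the KR21 calculation from those three quantities is shown below. The formula used is the standard textbook form, KR21 = [K / (K - 1)] * [1 - M(K - M) / (K * SD^2)]. The function name and the example values are assumptions made for illustration.

```python
def kr21(n_items, mean, sd):
    """Kuder-Richardson formula 21 from item count, mean, and standard deviation."""
    if n_items < 2:
        raise ValueError("KR21 needs at least 2 items")
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    variance = sd ** 2
    return (n_items / (n_items - 1)) * (
        1 - (mean * (n_items - mean)) / (n_items * variance)
    )

# Hypothetical test: 50 items, mean score 40, standard deviation 4.
reliability = kr21(n_items=50, mean=40, sd=4)
print(f"KR21 = {reliability:.2f}")
```

KR21 assumes all items are of roughly equal difficulty. KR20 drops that assumption but needs the proportion of correct answers for every item, so it cannot be computed from the summary statistics alone.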

Alpha Coefficient: a general form of the KR20 used to calculate the reliability of items that are not scored right vs. wrong.
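One way to compute the alpha coefficient (often called Cronbach's alpha) is sketched below, using the usual form alpha = [k / (k - 1)] * [1 - (sum of item variances) / (variance of total scores)]. The function name and the rating data are hypothetical. Population variances are used throughout, which is an arbitrary choice: the ratio is unchanged if sample variances are used consistently instead.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Alpha coefficient for a subjects x items matrix of numeric scores."""
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least 2 items")
    columns = list(zip(*item_scores))                 # one tuple per item
    item_variance_sum = sum(pvariance(col) for col in columns)
    total_variance = pvariance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical data: 4 subjects x 3 items rated on a 1-5 scale.
ratings = [
    [4, 5, 4],
    [3, 4, 3],
    [2, 2, 3],
    [5, 5, 5],
]

alpha = cronbach_alpha(ratings)
print(f"alpha = {alpha:.2f}")
```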

Standard Error of Measurement

An index that shows the extent to which a measurement would vary under changed circumstances.

There are many possible standard errors for a given set of scores.

Also known as measurement error, it gives a range of scores that shows the amount of error that can be expected. (Appendix D)
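The slides do not give a formula, so the sketch below assumes the commonly used one, SEM = SD * sqrt(1 - r), where SD is the standard deviation of the scores and r is a reliability coefficient. The function names and numbers are illustrative only.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r), assuming 0 <= r <= 1."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1 - reliability)

def score_band(observed, sem, n_sems=1):
    """Range of scores within n_sems standard errors of an observed score."""
    return (observed - n_sems * sem, observed + n_sems * sem)

# Hypothetical instrument: SD = 10, reliability coefficient = .84.
sem = standard_error_of_measurement(sd=10, reliability=0.84)
low, high = score_band(observed=70, sem=sem)
print(f"SEM = {sem:.1f}; observed score 70 lies in a band of {low:.0f} to {high:.0f}")
```

Because different reliability coefficients (test-retest, equivalent forms, internal consistency) can be plugged in for r, the same set of scores can yield different standard errors.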

Scoring Agreement

Scoring agreement requires a demonstration that independent scorers can achieve satisfactory agreement in their scoring.

Instruments that use direct observations are highly vulnerable to observer differences.

What is desired is a correlation of at least .90 among scorers as an acceptable level of agreement.
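A check against the .90 guideline might look like the sketch below, which correlates every pair of scorers and reports the lowest coefficient. The scorer labels, the ratings, and the choice of Pearson's r as the agreement index are assumptions; other agreement statistics exist and may be more appropriate for categorical scoring.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def pairwise_agreement(ratings):
    """Correlation for every pair of scorers, keyed by (scorer, scorer)."""
    return {
        (a, b): pearson_r(ratings[a], ratings[b])
        for a, b in combinations(sorted(ratings), 2)
    }

# Hypothetical scores given by three independent scorers to the same 5 papers.
ratings = {
    "A": [4, 3, 5, 2, 4],
    "B": [4, 3, 5, 2, 5],
    "C": [5, 3, 4, 2, 4],
}

agreement = pairwise_agreement(ratings)
lowest = min(agreement.values())
meets_threshold = lowest >= 0.90
for pair, r in agreement.items():
    print(pair, f"{r:.2f}")
print("all pairs at or above .90:", meets_threshold)
```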

Chapter Nine: Internal Validity

What is Internal Validity?

Internal validity is when observed differences on the dependent variable are directly related to the independent variable, and not due to some other unintended variable.

In other words, any relationship observed between two or more variables should be unambiguous as to what it means rather than being due to "something else".

The "something else" could be:
  Age
  Ability
  Types of materials used

Threats to Internal Validity

  Subject characteristics
  Mortality
  Location
  Instrumentation
  Testing
  History
  Maturation
  Attitude of subjects
  Regression
  Implementation

Mortality Threat to Internal Validity (Figure 9.1)

Location Might Make a Difference (Figure 9.2)

Subject Characteristics

The selection of people may result in differences, either between individuals or groups, that are related to the variables being studied.

This is referred to as a "selection bias" or subject characteristics threat.

If not controlled, these variables may "explain away" whatever differences are found in the study.

There are techniques used to either equalize the differences or control these variables.

Mortality

It is common to lose subjects as a study progresses. This is known as a "mortality threat".

Loss of subjects limits generalizability and can introduce bias.

Mortality is the most difficult threat to internal validity to control.

One way to address the problem is to provide evidence that the subjects lost were similar to those who remained in the study.

Location

The particular locations where data are collected may create different results or alternative explanations. This is known as a "location threat".

The best way to control for this is to keep the location consistent for all subjects.

If this is not possible, the researcher should ensure that different locations do not favor or jeopardize the hypothesis.

Instrumentation

The way instruments are used may constitute a threat to the internal validity of a study.

Some examples are as follows:
  Instrument decay
  Data collector characteristics
  Data collector bias

Instrument Decay (Figure 9.3)

A Data Collector Characteristics Threat (Figure 9.4)

Testing

A "testing threat" is when improvement shown by the subjects is due to "practice" on the test rather than to the treatment (e.g., in a pretest/post-test design).

An interaction between the pretest and the treatment could also cause this: taking the pretest may make subjects more aware of what is being studied, so that they are more responsive to the treatment.

A Testing Threat to Internal Validity (Figure 9.5)

History

A history threat is when an unforeseen event occurs during the course of the study and affects the responses of the subjects.

Researchers need to be alert to any such influences that may occur during the course of the study.

A History Threat to Internal Validity (Figure 9.6)

Maturation

Change during an intervention may be due to factors associated with the passing of time rather than the intervention.

Students could change over the course of a study. This is known as a maturation threat.

Maturation is only a threat in studies using pre/post data for the intervention group or in studies that span a number of years.

The best way to control for this is to include a well-selected comparison group in the study.

Could Maturation be at Work Here? (Figure 9.7)

Attitude of Subjects

The way subjects view a study and their participation in it can be a threat to internal validity.

Subjects may perform better based upon a feeling of receiving special attention, a.k.a. the "Hawthorne effect".

The opposite may occur with subjects who receive no treatment at all, resulting in poor performances.

A remedy to this would be to provide both groups with comparable treatments or to make the treatment a regular part of the study.

The Attitude of Subjects Can Make a Difference (Figure 9.8)

Regression

A regression threat is possible when change is studied in a group that has extremely low or high performance in the pre-intervention stage.

As with the maturation threat, this can be controlled by the use of an equivalent control or comparison group.
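The regression threat can be made concrete with a small simulation, sketched below under stated assumptions: each observed score is a stable "true" score plus independent random error, no intervention is applied at all, and the lowest-scoring 10% on the pretest are followed up. The score distributions, group size, and seed are arbitrary choices for illustration.

```python
import random

def simulate_regression(n_subjects=5000, seed=0):
    """Return (pretest mean, posttest mean) for the lowest 10% on the pretest.

    No treatment is simulated, so any gain is regression toward the mean.
    """
    rng = random.Random(seed)
    subjects = []
    for _ in range(n_subjects):
        true_score = rng.gauss(50, 10)            # stable ability
        pretest = true_score + rng.gauss(0, 10)   # ability + measurement error
        posttest = true_score + rng.gauss(0, 10)  # same ability, new error
        subjects.append((pretest, posttest))
    subjects.sort(key=lambda pair: pair[0])       # order by pretest score
    lowest = subjects[: n_subjects // 10]
    mean_pre = sum(pre for pre, _ in lowest) / len(lowest)
    mean_post = sum(post for _, post in lowest) / len(lowest)
    return mean_pre, mean_post

mean_pre, mean_post = simulate_regression()
gain = mean_post - mean_pre
print(f"pretest mean {mean_pre:.1f}, posttest mean {mean_post:.1f}, gain {gain:.1f}")
```

The extreme group appears to improve even though nothing was done to it. An equivalent comparison group selected the same way would show the same spurious gain, which is why comparing against such a group separates regression from a real treatment effect.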

Regression Rears Its Head (Figure 9.9)

Implementation

The experimental group may be treated in ways that are unintended, giving them an advantage.

This is known as an implementation threat. This can occur in two ways:

1) When different individuals are assigned to implement different methods, and these individuals differ in ways related to the outcome

2) When some individuals have a personal bias in favor of one method over the other

How to Minimize Threats to Internal Validity

There are four alternatives a researcher can use to reduce threats to internal validity:

1) Standardize the conditions under which the study occurs

2) Obtain more information on the subjects of the study

3) Obtain more information on the details of the study

4) Choose an appropriate design

Illustration of Threats to Internal Validity (Figure 9.10)