Ahmad: Measurement Reliability and Validity

Post on 19-Jan-2017

Category: Health & Medicine

TRANSCRIPT


Outline

Norm-Referenced Reliability Procedures

Criterion-Referenced Reliability Procedures

Norm-Referenced Validity Procedures

Norm-Referenced Item Analysis Procedures

Criterion-Referenced Validity Assessment

Criterion-Referenced Item Analysis Procedures




Norm-Referenced Reliability

Estimated by: test-retest, parallel forms, and internal consistency.

Criterion-Referenced Reliability

Estimated by: test-retest, parallel forms, and inter-rater and intra-rater procedures.


Test-retest (for affective measures):

1. Administer the instrument under standardized conditions.
2. Re-administer the instrument under the same conditions.
3. Compute the correlation between the two sets of scores.

Parallel forms:

- Administer the two forms of the instrument and compute the correlation between them.
- The two forms should have equal means and standard deviations, equal correlations with a third variable, and be constructed with the same objective and procedure.
- Can assess both equivalence and stability.
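The correlation in step 3 of the test-retest procedure can be computed directly; a minimal sketch with hypothetical scores:

```python
import numpy as np

# Hypothetical scores from two administrations of the same instrument
time1 = np.array([12, 15, 9, 20, 18, 14, 11, 16])
time2 = np.array([13, 14, 10, 19, 17, 15, 10, 17])

# Pearson correlation between the two administrations = test-retest estimate
r = np.corrcoef(time1, time2)[0, 1]
```

Values near 1 indicate that scores are stable across the two occasions.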

Internal consistency (for cognitive measures):

• Consistency of a single measure on one occasion.
• Cronbach's alpha is the preferred index of internal-consistency reliability. Why?
• KR-20 and KR-21 are special cases of alpha, used when the data are dichotomous.


α coefficient

• Equal to the mean of all possible split-half coefficients associated with a set of data.
• An indicator of the consistency of items within the same instrument.
• Affected by test length, total test variance, the shape of the resulting distribution of test scores, and the response rate.
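A minimal computational sketch, using hypothetical dichotomous responses (for which alpha reduces to KR-20):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix.
    With dichotomous (0/1) items this is equivalent to KR-20."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical dichotomous responses: 5 respondents x 3 items
responses = np.array([[1, 1, 1],
                      [1, 1, 0],
                      [0, 0, 0],
                      [1, 0, 1],
                      [0, 0, 0]])
alpha = cronbach_alpha(responses)   # 0.75 for these data
```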


Reliability of Subjectively Scored Measures


Estimating the Reliability of Change in Testing Length
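The body of this slide is not in the transcript; the usual tool for this estimate is the Spearman-Brown prophecy formula, sketched here as an assumption about what the slide covered:

```python
def spearman_brown(r: float, length_factor: float) -> float:
    """Projected reliability after changing test length by `length_factor`
    (e.g., 2.0 = doubling the number of parallel items)."""
    return length_factor * r / (1 + (length_factor - 1) * r)

# Doubling a test whose current reliability is 0.6 projects to 0.75
projected = spearman_brown(0.6, 2.0)
```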


Criterion-Referenced Reliability


•The ability of a measure to consistently classify objects or persons into the same categories on two separate occasions.


Procedure for Test-retest for Criterion-Referenced Reliability
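The procedure slides themselves are not in the transcript; assuming the standard approach (classify each examinee as master or non-master on each occasion, then index the agreement between the two classifications), a sketch:

```python
import numpy as np

def classification_agreement(first, second):
    """P0 (observed agreement) and Cohen's kappa for two rounds of
    master (1) / non-master (0) classifications."""
    first, second = np.asarray(first), np.asarray(second)
    p0 = np.mean(first == second)
    # chance agreement from the marginal proportions of each category
    pc = sum(np.mean(first == c) * np.mean(second == c)
             for c in np.union1d(first, second))
    kappa = (p0 - pc) / (1 - pc)
    return p0, kappa

# Hypothetical classifications on two testing occasions
occasion1 = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
occasion2 = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
p0, kappa = classification_agreement(occasion1, occasion2)
# p0 = 0.8 (8 of 10 examinees classified the same way both times)
```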


•The two parallel forms assess the same content domain and have relatively homogeneous items.


Inter-rater and Intra-rater / Criterion-Referenced Reliability

Expected rating errors:
•Error of standards
•Halo error
•Logic error
•Similarity error
•Central tendency error


Contrasted Groups Approach
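The slide body is missing from the transcript; the contrasted-groups approach compares scores from groups expected to differ on the construct. A sketch with hypothetical data, computing Welch's t statistic by hand:

```python
import numpy as np

# Hypothetical instrument scores for two groups expected to differ
experienced = np.array([82.0, 78, 88, 91, 75, 85, 80, 89])
novice = np.array([65.0, 70, 58, 72, 61, 68, 66, 63])

# Welch's t: mean difference scaled by the combined standard error
se = np.sqrt(experienced.var(ddof=1) / experienced.size
             + novice.var(ddof=1) / novice.size)
t = (experienced.mean() - novice.mean()) / se
# a large t in the expected direction supports construct validity
```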


Confirmatory Factor Analysis (CFA) Procedure


Multitrait-Multimethod (MTMM) Approach

The matrix crosses constructs with methods:

                         Same Method             Different Methods
Same Construct           1. Reliability          2. Convergent Validity
Different Constructs     3. Construct Validity   4. Discriminant Validity

Example:
Method 1: rating scale; Method 2: checklist.
Construct 1: bonding; Construct 2: perinatal care.

1- Reliability should be high as a prerequisite for validity.

2- Convergent validity should be high (the correlation between different methods measuring the same construct).

3- Construct validity is evidenced when heterotrait-monomethod correlations are lower than the correlations in point 2 (a function of trait, not method).

4- Discriminant validity is evidenced when heterotrait-heteromethod correlations are the lowest of all the previously mentioned correlations.

This approach can be applied to more than two constructs.

Source: http://www.socialresearchmethods.net/kb/mtmmmat.php
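The expected ordering of these correlations can be checked mechanically; a sketch with hypothetical correlations for the bonding / perinatal-care example:

```python
import numpy as np

# Hypothetical correlations among four measures:
# B_r = bonding via rating scale,   B_c = bonding via checklist,
# P_r = perinatal via rating scale, P_c = perinatal via checklist
corr = np.array([
    [1.00, 0.65, 0.30, 0.15],   # B_r
    [0.65, 1.00, 0.20, 0.25],   # B_c
    [0.30, 0.20, 1.00, 0.60],   # P_r
    [0.15, 0.25, 0.60, 1.00],   # P_c
])

convergent = [corr[0, 1], corr[2, 3]]       # same construct, different methods
hetero_mono = [corr[0, 2], corr[1, 3]]      # different constructs, same method
hetero_hetero = [corr[0, 3], corr[1, 2]]    # different constructs, different methods

# evidence pattern: convergent > heterotrait-monomethod > heterotrait-heteromethod
pattern_ok = min(convergent) > max(hetero_mono) > max(hetero_hetero)
```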


Criterion-Related Validity/Norm-Referenced


Norm-Referenced Item Analysis Procedures


2-Discrimination Index


Item No. | Proportion correct, Upper 1/4 | Proportion correct, Lower 1/4 | Item Discrimination Index D (range -1.00 to +1.00)
   1     |             90%               |             20%               |   0.7
   2     |             80%               |             70%               |   0.1
   3     |            100%               |              0%               |   1.0
   4     |            100%               |            100%               |   0.0
   5     |             50%               |             50%               |   0.0
   6     |             20%               |             60%               |  -0.4

Adapted from: www. distance.fsu.edu/docs/

A negative D value usually indicates that an item is faulty and needs improvement because the item is not discriminating in the same way as the total test.

A positive D value is desirable; D values greater than +0.20 are desirable for a norm-referenced measure.
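The D values in the table follow from a simple subtraction; a sketch reproducing them:

```python
def discrimination_index(p_upper: float, p_lower: float) -> float:
    """D = proportion correct in the upper 1/4 minus proportion in the lower 1/4."""
    return p_upper - p_lower

# (upper, lower) proportions for the six items in the table above
groups = [(0.90, 0.20), (0.80, 0.70), (1.00, 0.00),
          (1.00, 1.00), (0.50, 0.50), (0.20, 0.60)]
d_values = [round(discrimination_index(u, lo), 2) for u, lo in groups]
# d_values == [0.7, 0.1, 1.0, 0.0, 0.0, -0.4]
```

Item 6's negative D (the lower group outperforms the upper group) flags it as faulty.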


Criterion-Referenced Validity Assessment

Focuses on gathering evidence for:
• Content validity
• Construct validity
• Decision validity
• Criterion-related validity (predictive and concurrent validity)


1- Content specialists

•Two or more content specialists examine the format and content of each item.
•Item-objective congruence focuses on content validity at the item level.
•If more than one objective is used for a measure, the items that measure each objective are usually treated separately.

2- Determination of inter-rater agreement

•Inter-rater agreement can be evaluated by:
1- The index of content validity (CVI).
2- P0 and K as measures of inter-rater agreement, with acceptable levels P0 ≥ 0.80 and K ≥ 0.25.
•P0 and K values that are too low are indicators of ???

3- Average congruency percentage

•The percentage of items rated congruent by each judge is calculated.
•The mean percentage across all judges is the "average congruency percentage".
•An average congruency percentage of 90% or higher is acceptable.
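The average congruency percentage is a two-step mean; a sketch with hypothetical judge ratings:

```python
import numpy as np

# Hypothetical congruence ratings: rows = judges, columns = items,
# 1 = item rated congruent with its objective, 0 = not congruent
ratings = np.array([
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],   # judge A: 90% of items congruent
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],   # judge B: 100%
    [1, 1, 1, 1, 1, 1, 1, 0, 1, 1],   # judge C: 90%
])

per_judge = ratings.mean(axis=1) * 100     # congruency percentage per judge
average_congruency = per_judge.mean()      # about 93.3%, above the 90% cutoff
```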


1- Item-Objective or Item-Subscale Congruence

•Based on the ratings of two or more content specialists, who assign each item a value of +1 (definitely a measure), 0 (undecided), or -1 (not a measure) according to the item's congruence with the measure's objective.
•The index is computed with formula (6.1) and ranges from -1 to +1.

2- Item Difficulty

•The item p level is calculated for each item.
•The item p level should be higher for the group known to possess more of the specified trait or attribute than for the group known to possess less.
•The focus is on measuring performance changes (e.g., pretest/posttest) or differences (e.g., experienced/inexperienced) between groups.

3- Item Discrimination

•Referred to as D'; it is directly related to the property of decision validity.
•A useful adjunct item-discrimination index is provided through the use of P0 or K.
•A negative discrimination index is usually due to a faulty item.
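For the item-difficulty and item-discrimination steps, a sketch with hypothetical pretest/posttest responses:

```python
import numpy as np

# Hypothetical dichotomous responses (1 = correct) to one item
posttest = np.array([1, 1, 1, 0, 1, 1, 1, 0, 1, 1])   # after instruction
pretest = np.array([1, 0, 0, 0, 1, 0, 1, 0, 0, 1])    # before instruction

p_post = posttest.mean()    # item p level after instruction: 0.8
p_pre = pretest.mean()      # item p level before instruction: 0.4
d_prime = p_post - p_pre    # positive: the item reflects the instruction
```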



Thank you!
