TOP 10 FLASHPOINTS IN THE EVALUATION OF TEACHING

KEY REFERENCES:

1. Arreola, R. A. (2007). Developing a comprehensive faculty evaluation system (3rd ed.). San Francisco: Jossey-Bass (Anker).
2. Seldin, P. (Ed.). (2006). Evaluating faculty performance. San Francisco: Jossey-Bass (Anker).
3. Hativa, N. (2013). Student ratings of instruction: A practical approach to designing, operating, and reporting. Seattle, WA: CreateSpace.
4. Hativa, N. (2013). Student ratings of instruction: Recognizing effective teaching. Seattle, WA: CreateSpace.
5. Spooren, P., Brockx, B., & Mortelmans, D. (2013). On the validity of student evaluation of teaching: The state of the art. Review of Educational Research, 83(4), 598–642.

TEACHING EVALUATION POLL

1. How many evaluate teaching in your university or college? Yes No
2. How many use a student rating scale? Yes No
3. How many use peer observation? Yes No
4. How many use other evidence? Yes No
5. How many would prefer to go eat again instead of listening to me? Yes No

TOTAL SCORE ____

A FEW FACTOIDS:

1. 15,000+ studies on teaching effectiveness (60+ yrs.)
2. 2,000 references on student ratings (90 yrs.)
3. 80+ year history on scaling
4. Student ratings have dominated, frequently as the only measure

MAJOR OUTCOMES: YOU WILL BE

1. armed with the major problems & issues in the evaluation of teaching;
2. able to decide on the most appropriate course of action for the decisions you make;
3. given answers to your burning questions.

TYPES OF DECISIONS:

1. Formative: teaching improvement

2. Summative: contract renewal, promotion/tenure, merit pay/increase

3. Program: curriculum revision

WHY BOTHER? How important is teaching?

1. Faculty: to improve their courses & teaching
2. Administrators: to make fair & equitable decisions about hiring, firing, promoting, and demoting part- & full-time faculty

PROFESSIONAL STANDARDS:

• Personnel Evaluation Standards
• Program Evaluation Standards
• Standards for Educational & Psychological Testing

LEGAL STANDARDS:

U.S. Equal Employment Opportunity Commission’s (EEOC) Uniform Guidelines on Employee Selection Procedures

COURT DECISIONS: (faculty appointments, pay, contract renewal, teaching awards, and promotion and tenure)

1. Substantial or Academic Deference

2. Limited Deference

3. Four-Fifths (80%) Rule for Adverse Impact
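
The adverse-impact rule in item 3 comes down to a ratio check: the EEOC Uniform Guidelines treat a group's selection rate below four-fifths (80%) of the highest group's rate as evidence of adverse impact. A minimal sketch, assuming hypothetical group labels and promotion counts; the function name `adverse_impact_ratios` is illustrative and does not come from the guidelines or any library.

```python
def adverse_impact_ratios(selected, total):
    """Divide each group's selection rate by the highest group's rate."""
    rates = {group: selected[group] / total[group] for group in total}
    top_rate = max(rates.values())
    if top_rate == 0:
        raise ValueError("no group has any selections; ratios are undefined")
    return {group: rate / top_rate for group, rate in rates.items()}


# Hypothetical promotion counts (illustrative only, not real data).
selected = {"group_a": 30, "group_b": 12}
total = {"group_a": 50, "group_b": 40}

FOUR_FIFTHS = 0.8  # threshold stated in the EEOC Uniform Guidelines

ratios = adverse_impact_ratios(selected, total)
flagged = [group for group, ratio in ratios.items() if ratio < FOUR_FIFTHS]
print(ratios)
print("Below four-fifths threshold:", flagged)
```

A ratio below the threshold is a flag for closer review, not a legal conclusion by itself.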

A FEW GOOD MEN (1992):

© Columbia Pictures Industries and Castle Rock Entertainment

TOP 10 FLASHPOINTS

FLASHPOINT

a critical stage in a process, trouble spot, contentious issue, volatile hot button, or lowest temperature at which flammable liquid will give off enough vapor to ignite

Derived from 2 Latin words: flashus, meaning “your shorts,” and pointum, meaning “are on fire”

A. Student ratings are limited & incomplete

B. Student ratings are fallible

Triangulate “complementary” multiple sources of evidence for each decision

15 Sources of Evidence:

1. Student Ratings
2. Peer Observations
3. Peer Review of Course Materials
4. External Expert Ratings
5. Self-Ratings
6. Videos
7. Student Interviews
8. Exit and Alumni Ratings
9. Employer Ratings
10. Mentor’s Advice
11. Administrator Ratings
12. Teaching Scholarship
13. Teaching Awards
14. Learning Outcome Measures
15. Teaching Portfolio

A. Start with “formative decisions”

B. Then consider each type of “summative decision”

Tailor the combo of sources of evidence you pick to the decision being made

360° ASSESSMENT OF A PROFESSOR (Formative Decisions)

[Diagram: PROFESSOR assessed by SELF-RATINGS, VIDEO (Self/Peer), MENTOR, PEER RATINGS, STUDENT INTERVIEWS, and STUDENT RATINGS]

360° ASSESSMENT OF A PROFESSOR (Summative Decisions)

[Diagram: PROFESSOR assessed by SELF-RATINGS, VIDEO (optional), MENTOR (optional), PEER RATINGS (optional), DEPT. CHAIR, and STUDENT RATINGS]

A. Most “home-grown” scales are PUTRID

B. Commercially-developed scales typically meet professional standards

Whatever scale you develop, do it right or don’t do it at all.

HOW MANY TYPES OF ITEMS ARE THERE?

2

ITEM WORLD

1. Test Items (correct and incorrect answers)

If your urologist says, “You have a kidney stone the size of Ohio,” that is an example of a(an)

A. analogy.

B. hyperbole.

C. metaphor.

D. simile.

E. rather disturbing thought.

2. Scale/Questionnaire Items: (response = answer)
   a. Opinion/Attitude: no incorrect response
   b. Factual: response = fact
      (1) Sociodemographic Characteristics
      (2) Actual Practices

RATING ITEM AUTOPSY

STIMULUS                                      RESPONSE
Sentence (My instructor is a knucklehead.)    Anchors (Strongly Agree – Strongly Disagree)
Phrase (Enunciated clearly)                   Anchors (Excellent – Poor)
Word (Volume)                                 Anchors (Effective – Ineffective)

ANCHOR WORLD

1. Intensity

2. Evaluation

3. Frequency

4. Quantity

5. Comparison

SAMPLE FLAWED STATEMENTS

• My instructor is a bade speler and a morron. (double-barreled content)

• My instructor treated all students with respect. (student not in a position to rate)

• My instructor entered class on a cable from the ceiling like Ethan Hunt in Mission: Impossible. (fact; not a behavior)

A. Online advantages in administration, cost, turnaround, & responses outweigh those of paper-and-pencil (p & p)

B. Most online problems can be overcome

3 Options:

1. In-house admin.
2. Vendor admin. with “home-grown” scales
3. Vendor admin. & rating scale

A. Scale must be administered to all students under identical conditions

B. Control time, place, conditions, & situational factors

Administer within a narrow window of 1–2 days before or after final evaluation

A. Response rates of 30–50% provide an inadequate database for decisions

B. 20 strategies to boost response rates

Use a combination of administrative, organizational, & incentive procedures (early posting of grades produces the largest increase in rates)
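
To see which courses fall into the inadequate range named in point A, a response-rate check can be sketched as follows. This is a minimal illustration, assuming hypothetical course labels and counts; the 0.50 cutoff is simply the upper end of the 30–50% range quoted above, not a published standard.

```python
def response_rate(responses, enrolled):
    """Fraction of enrolled students who submitted a rating form."""
    if enrolled <= 0:
        raise ValueError("enrolled must be positive")
    return responses / enrolled


# Hypothetical courses: (forms returned, students enrolled).
courses = {"COURSE-101": (18, 45), "COURSE-202": (33, 40)}

LOW_RATE_CUTOFF = 0.50  # assumption: upper end of the 30-50% range

low_response = [
    name
    for name, (returned, enrolled) in courses.items()
    if response_rate(returned, enrolled) <= LOW_RATE_CUTOFF
]
print("Too few responses for decisions:", low_response)
```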

RESPONSE RATE

Effective strategies:

a. Faculty and administrators communicate importance of students’ input and how results will be used
b. Assurance of anonymity
c. User-friendly system
d. Faculty “assign” students to complete evals.
e. Completion is part of course requirements
f. Provide extra credit, points, or other incentives
g. Draw raffle or lottery prizes
*h. Withhold students’ early access to final grades

A. Global items provide the illusion of simplicity, accurate & reliable info, & pinpoint precision

B. They can be unreliable, unrepresentative of teaching behaviors, & illegal for personnel decisions

Cease & desist use of global items for summative decisions

Use total scale rating (mean/median) and subscale composite ratings
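
One way to compute the total scale and subscale composite ratings is sketched below. Everything here is an assumption for illustration: the item ids, the two subscale names, the 0–3 item coding, and the three students' responses.

```python
from statistics import mean, median

# Hypothetical layout: item ids grouped into subscales.
SUBSCALES = {"organization": ["q1", "q2"], "rapport": ["q3", "q4"]}

# Hypothetical responses, one dict per student, each item coded 0-3.
responses = [
    {"q1": 3, "q2": 2, "q3": 3, "q4": 3},
    {"q1": 2, "q2": 2, "q3": 1, "q4": 2},
    {"q1": 3, "q2": 3, "q3": 2, "q4": 3},
]


def composite(response, items):
    """Sum one student's scores over the given items."""
    return sum(response[item] for item in items)


def summarize(all_responses, items):
    """Class-level mean and median of the per-student composite scores."""
    scores = [composite(r, items) for r in all_responses]
    return {"mean": mean(scores), "median": median(scores)}


all_items = [item for items in SUBSCALES.values() for item in items]
total_summary = summarize(responses, all_items)
subscale_summary = {
    name: summarize(responses, items) for name, items in SUBSCALES.items()
}
print(total_summary)
print(subscale_summary)
```

Reporting both mean and median helps when a few extreme ratings skew a small class.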

COMMERCIAL BREAK

We Now Rejoin Ron’s Presentation Already in Progress.

A. Eliminate “neutral” anchor

B. Use “NA” (Not Applicable) during field test only

C. Use “NO” (Not Observed) in observation scales

D. Assign a midpoint score to blank responses

Clearly specify scoring procedure for each of 4 response options
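
Point D, scoring blanks at the midpoint, might look like this in practice. A minimal sketch, assuming a four-anchor agree/disagree scale coded 0–3 with no neutral option; the anchor codes and function names are hypothetical, and the handling of “NA” and “NO” is deliberately left out because it is not specified here.

```python
# Assumed coding for a 4-anchor scale without a neutral option.
ANCHOR_SCORES = {"SD": 0, "D": 1, "A": 2, "SA": 3}

# Midpoint of the score range; 1.5 under the assumed 0-3 coding.
MIDPOINT = (min(ANCHOR_SCORES.values()) + max(ANCHOR_SCORES.values())) / 2


def score_item(raw):
    """Score one response; a blank (None or empty string) gets the midpoint."""
    if raw is None or raw.strip() == "":
        return MIDPOINT
    return ANCHOR_SCORES[raw.strip().upper()]


def total_score(raw_responses):
    """Sum of item scores for one student's form."""
    return sum(score_item(raw) for raw in raw_responses)


# Hypothetical form with two blank items.
form = ["SA", "A", "", "D", None]
print(total_score(form))
```

Scoring a blank at the midpoint keeps the form in the data set without pushing the total toward either end of the scale.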

A. Link anchor, item, subscale, & total scale ratings to specific decisions

B. Define purpose of criterion-referenced (C-R) & norm-referenced (N-R) interpretations, especially norms

C-R ratings are collected WITHIN each course; N-R ratings are based on OUTSIDE course norms

DO NOT rank faculty under any conditions

A. Identify common & unique characteristics of teaching effectiveness in face-to-face (F2F) & online courses

B. Consider various options to create scales to measure those characteristics

Design or select scales that tap both what is in common & what is unique to F2F & online courses

FINAL ADVICE

1. Use multiple sources of evidence for all faculty “job” decisions
2. Develop or select the best quality measures to provide that evidence
3. Match the evidence to your decisions
4. Provide a narrow window to standardize administrations
5. Use a combination of methods to maximize response rate

BOTTOM LINE
