Top 10 Flashpoints in the Evaluation of Teaching
TRANSCRIPT
KEY REFERENCES:
1. Arreola, R. A. (2007). Developing a
comprehensive faculty evaluation
system (3rd ed.). San Francisco:
Jossey-Bass (Anker).
2. Seldin, P. (Ed.). (2006). Evaluating
faculty performance. San Francisco:
Jossey-Bass (Anker).
3. Hativa, N. (2013). Student ratings of
instruction: A practical approach to
designing, operating, and reporting.
Seattle, WA: CreateSpace.
4. Hativa, N. (2013). Student ratings of
instruction: Recognizing effective
teaching. Seattle, WA: CreateSpace.
5. Spooren, P., Brockx, B., & Mortelmans,
D. (2013). On the validity of student
evaluation of teaching: The state of the
art. Review of Educational Research,
83(4), 598–642.
TEACHING EVALUATION POLL
1. How many evaluate teaching in your university or college? Yes No
2. How many use a student rating scale? Yes No
3. How many use peer observation? Yes No
4. How many use other evidence? Yes No
5. How many would prefer to go eat again instead of listening to me? Yes No
TOTAL SCORE ____
A FEW FACTOIDS:
1. 15,000+ studies on teaching
effectiveness (60+ yrs.)
2. 2,000 refs on student ratings (90 yrs.)
3. 80+ year history on scaling
4. Student ratings have dominated
as frequently the only measure
MAJOR OUTCOMES: YOU WILL BE
1. armed with the major problems &
issues in the evaluation of teaching;
2. able to decide on the most appropriate
course of action for the decisions you
make;
3. given answers to your burning questions.
TYPES OF DECISIONS:
1. Formative: teaching improvement
2. Summative: contract renewal,
promotion/tenure, merit pay/increase
3. Program: curriculum revision
WHY BOTHER? How important is teaching?
1. Faculty: to improve their courses
& teaching
2. Administrators: to make fair
& equitable decisions about
hiring, firing, promoting, and
demoting part- & full-time faculty
PROFESSIONAL STANDARDS:
1. Personnel Evaluation Standards
2. Program Evaluation Standards
3. Standards for Educational &
Psychological Testing
LEGAL STANDARDS:
U.S. Equal Employment
Opportunity Commission’s (EEOC)
Uniform Guidelines on Employment
Selection Procedures
COURT DECISIONS:
(faculty appointments, pay, contract
renewal, teaching awards, and
promotion and tenure)
1. Substantial or Academic
Deference
2. Limited Deference
3. 80% (Four-Fifths) Rule for Adverse Impact
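The adverse impact benchmark in item 3 comes down to simple arithmetic. Under the EEOC Uniform Guidelines, a group whose selection rate is less than four-fifths (80%) of the rate for the group with the highest rate is generally regarded as showing evidence of adverse impact. The sketch below is only an illustration of that comparison; the function names, group labels, and promotion counts are all hypothetical.

```python
def selection_rate(selected, applicants):
    """Proportion of a group's candidates who received the favorable decision."""
    if applicants <= 0:
        raise ValueError("applicants must be positive")
    return selected / applicants


def adverse_impact_ratios(groups, threshold=0.8):
    """Compare each group's selection rate with the highest group's rate.

    groups: dict mapping a group label to (selected, applicants).
    Returns a dict mapping each label to (impact_ratio, flagged).
    """
    rates = {name: selection_rate(s, n) for name, (s, n) in groups.items()}
    top = max(rates.values())
    if top == 0:
        raise ValueError("no group has any favorable decisions")
    return {name: (rate / top, rate / top < threshold) for name, rate in rates.items()}


# Hypothetical promotion counts, invented for illustration only.
example = {"group_a": (12, 20), "group_b": (8, 20)}
results = adverse_impact_ratios(example)
```

With these invented counts, group_b's rate (0.40) is about two-thirds of group_a's rate (0.60), so it falls below the 0.8 threshold and is flagged. A flag of this kind is a screening signal, not a legal conclusion.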
A FEW GOOD MEN (1992):
© Columbia Pictures Industries
and Castle Rock Entertainment
TOP 10
FLASHPOINTS
FLASHPOINT
a critical stage in a process, trouble spot, contentious issue, volatile hot button, or the lowest temperature at which a flammable liquid will give off enough vapor to ignite
Derived from 2 Latin words: flashus, meaning “your shorts,” and pointum, meaning “are on fire”
A. Student ratings are limited &
incomplete
B. Student ratings are fallible
Triangulate “complementary” multiple sources of evidence for each decision
15 Sources of Evidence:
1. Student Ratings
2. Peer Observations
3. Peer Review of Course Materials
4. External Expert Ratings
5. Self-Ratings
6. Videos
7. Student Interviews
8. Exit and Alumni Ratings
9. Employer Ratings
10. Mentor’s Advice
11. Administrator Ratings
12. Teaching Scholarship
13. Teaching Awards
14. Learning Outcome Measures
15. Teaching Portfolio
A. Start with “formative decisions”
B. Then consider each type of
“summative decision”
Tailor the combo of sources of evidence you pick to the decision being made
360° ASSESSMENT OF A PROFESSOR
(Formative Decisions)
[Diagram: PROFESSOR at the center, surrounded by Self-Ratings, Video (Self/Peer), Mentor, Peer Ratings, Student Interviews, and Student Ratings]
360° ASSESSMENT OF A PROFESSOR
(Summative Decisions)
[Diagram: PROFESSOR at the center, surrounded by Self-Ratings, Video (optional), Mentor (optional), Peer Ratings (optional), Dept. Chair, and Student Ratings]
A. Most “home-grown” scales are PUTRID
B. Commercially-developed scales
typically meet professional standards
Whatever scale you
develop, do it right or don’t
do it at all.
HOW MANY TYPES OF
ITEMS ARE THERE?
2
ITEM WORLD
1. Test Items (correct and incorrect answers)
If your urologist says, “You have a
kidney stone the size of Ohio,”
that is an example of a(an)
A. analogy.
B. hyperbole.
C. metaphor.
D. simile.
E. rather disturbing thought.
2. Scale/Questionnaire Items: (response = answer)
a. Opinion/Attitude: no incorrect response
b. Factual: response = fact
(1) Sociodemographic Characteristics
(2) Actual Practices
RATING ITEM AUTOPSY
STIMULUS                                      RESPONSE
Sentence (My instructor is a knucklehead.)    Anchors (Strongly Agree – Strongly Disagree)
Phrase (Enunciated clearly)                   Anchors (Excellent – Poor)
Word (Volume)                                 Anchors (Effective – Ineffective)
ANCHOR WORLD
1. Intensity
2. Evaluation
3. Frequency
4. Quantity
5. Comparison
SAMPLE FLAWED STATEMENTS
• My instructor is a bade speler and a morron. (double-barreled content)
• My instructor treated all students with respect. (student not in a position to rate)
• My instructor entered class on a cable from the ceiling like Ethan Hunt in Mission: Impossible. (fact; not a behavior)
A. Online advantages in admin., cost,
turnaround, & responses outweigh
paper & pencil (p & p)
B. Most online problems can be overcome
3 Options:
1. in-house admin.
2. vendor admin. with “home-grown” scales
3. vendor admin. & rating scale
A. Scale must be admin. under identical
conditions to all students
B. Control time, place, conditions, &
situational factors
Admin. within a narrow
window of 1–2 days before
or after final evaluation
A. 30–50% rates provide an inadequate
data base for decisions
B. 20 strategies to boost response
rates
Use a combination of admin., organizational, & incentive procedures (early posting of grades has the highest increase in rates)
RESPONSE RATE
Effective strategies:
a. Faculty and administrators communicate
importance of students’ input and how
results will be used
b. Assurance of anonymity
c. User-friendly system
d. Faculty “assign” students to complete evals.
e. Completion is part of course requirements
f. Provide extra credit, points, or other incentives
g. Draw raffle or lottery prizes
*h. Withhold students’ early access to final grades
A. Global items provide the illusion of
simplicity, accurate & reliable info,
& pinpoint precision
B. They can be unreliable, unrepresentative
of teaching behaviors, & illegal for
personnel decisions
Cease & desist use of global items for summative decisions
Use total scale rating (mean/median) and subscale composite ratings
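As a rough illustration of the recommendation above, the sketch below sums each student's item ratings into subscale composites and a total scale score, then reports the class mean and median for each. The subscale names, item ids, 0-3 scoring, and sample responses are all assumptions made up for the example; they are not taken from any particular rating scale.

```python
from statistics import mean, median

# Hypothetical subscale layout: item ids grouped by subscale (not from the source).
SUBSCALES = {
    "organization": ["q1", "q2"],
    "delivery": ["q3", "q4"],
}


def composite(response, items):
    """Sum one student's ratings over the given items."""
    return sum(response[item] for item in items)


def summarize(responses, subscales=SUBSCALES):
    """Return class-level mean and median for the total scale and each subscale."""
    all_items = [item for items in subscales.values() for item in items]
    totals = [composite(r, all_items) for r in responses]
    summary = {"total": {"mean": mean(totals), "median": median(totals)}}
    for name, items in subscales.items():
        scores = [composite(r, items) for r in responses]
        summary[name] = {"mean": mean(scores), "median": median(scores)}
    return summary


# Three invented student responses on an assumed 0-3 scale.
responses = [
    {"q1": 3, "q2": 2, "q3": 3, "q4": 3},
    {"q1": 2, "q2": 2, "q3": 1, "q4": 2},
    {"q1": 3, "q2": 3, "q3": 2, "q4": 3},
]
summary = summarize(responses)
```

Reporting both the mean and the median is a deliberate choice here: rating distributions are often skewed, and the two can tell different stories about the same class.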
COMMERCIAL BREAK
ANGER MANAGEMENT (2003):
© Revolution Studios Distribution Co.
and Columbia TriStar Entertainment
We Now Rejoin Ron’s Presentation
Already in Progress.
A. Eliminate “neutral” anchor
B. Use “NA” (Not Applicable) during field test only
C. Use “NO” (Not Observed) in observation scales
D. Assign a midpoint score to blank
responses
Clearly specify scoring procedure for each of 4 response options
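One way to make a scoring procedure explicit is to write it down as code. The sketch below assumes a 4-point agreement scale with no neutral anchor, scored 0 to 3, and gives blank responses the midpoint (1.5), as recommended above. The anchor codes are illustrative, and the decision to leave "NA" and "NO" responses out of the item mean is an assumption of this sketch, not a rule stated on the slide.

```python
# Assumed 4-point agreement scale with no neutral anchor (codes are illustrative).
ANCHOR_SCORES = {"SD": 0, "D": 1, "A": 2, "SA": 3}
MIDPOINT = (min(ANCHOR_SCORES.values()) + max(ANCHOR_SCORES.values())) / 2


def score_response(raw):
    """Convert one raw response to a numeric score.

    Blank responses get the scale midpoint.
    "NA" and "NO" return None, meaning "left out of the score";
    that handling is an assumption of this sketch.
    """
    if raw is None or raw.strip() == "":
        return MIDPOINT
    code = raw.strip().upper()
    if code in ("NA", "NO"):
        return None
    if code not in ANCHOR_SCORES:
        raise ValueError(f"unrecognized response: {raw!r}")
    return ANCHOR_SCORES[code]


def item_mean(raw_responses):
    """Mean of the scored responses for one item, skipping None values."""
    scores = [s for s in map(score_response, raw_responses) if s is not None]
    return sum(scores) / len(scores) if scores else None


# Invented responses to a single item: one blank, one "NO".
ratings = ["SA", "A", "", "NO", "D"]
result = item_mean(ratings)
```

Whatever rules an institution settles on, the point is that every response option has a documented score before any ratings are collected.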
A. Link anchor, item, subscale, & total
scale ratings to specific decisions
B. Define purpose of criterion-referenced (C-R)
& norm-referenced (N-R) interpretations,
especially norms
C-R ratings are collected WITHIN each course; N-R ratings are based on OUTSIDE course norms
DO NOT rank faculty under any conditions
A. Identify F2F & online common &
unique characteristics for teaching
effectiveness
B. Consider various options to create
scales to measure those characteristics
Design or select scales that tap both what is in common & what is unique to F2F & online courses
FINAL ADVICE
1. Use multiple sources of evidence
for all faculty “job” decisions
2. Develop or select the best quality
measures to provide that evidence
3. Match the evidence to your decisions
4. Provide a narrow window to standardize
administrations
5. Use a combination of methods to maximize
response rate
BOTTOM LINE