'Standards seem like a good idea but how do we validate them?' Gordon Stanley. Feature presentation at 'The Blind Assessor: Are we constraining or enriching student learning?' symposium, 22 November 2010.



Standards seem like a good idea but how do we validate them?

Gordon Stanley

Feature presentation at ‘The Blind Assessor: Are we constraining or enriching student learning’ symposium. 22 November 2010


Media Lobbying by Head Masters


Background to contemporary standards-based reporting

Most education systems around the world now report their student performance with reference to standards.

This move has been hastened by the impact of international testing and the public policy focus on education in the development of human and social capital.

What has this meant for assessment practice?

• Assessment and grading do not take place in a vacuum. Professional judgments about the quality of student work, together with interpretations of such judgments, are always made against some background framework or information.

• Modern assessment practice tries to make the framework explicit.


Aggregate Approach

• Pre-dates the modern outcome/standards approach.

• Scores on different assessment tasks are added together and converted to a 100-point (percentage) scale.

• Component scores may be weighted before being added.

• While not norm-based, this approach is standards-referenced only if the component tasks themselves are.

Traditional Approaches

Grading Using Aggregate Scores of 100

A: 90-100
B: 80-89
C: 65-79
D: 50-64 (Passing grade)
E: 45-49 (Conditional pass)
F: < 45
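The aggregate approach can be illustrated with a minimal sketch. The function names, the example tasks and the weights are assumptions made for illustration; only the grade bands come from the slide. Fractional aggregates are assumed to fall into the band whose lower bound they reach (so 89.5 is a B).

```python
from typing import Sequence

# Grade bands from the slide: (minimum aggregate score, grade), highest first.
GRADE_BANDS = [(90, "A"), (80, "B"), (65, "C"), (50, "D"), (45, "E")]


def aggregate_score(scores: Sequence[float],
                    max_scores: Sequence[float],
                    weights: Sequence[float]) -> float:
    """Weight each task's proportion correct and rescale to a 100-point scale."""
    if not (len(scores) == len(max_scores) == len(weights)):
        raise ValueError("scores, max_scores and weights must have equal length")
    total_weight = sum(weights)
    weighted = sum(w * s / m for s, m, w in zip(scores, max_scores, weights))
    return 100.0 * weighted / total_weight


def grade_from_aggregate(score: float) -> str:
    """Return the first band whose lower bound the score reaches, else F."""
    for minimum, grade in GRADE_BANDS:
        if score >= minimum:
            return grade
    return "F"


# Hypothetical student: three tasks marked out of 50, 30 and 20,
# weighted 50%, 30% and 20% of the final aggregate.
total = aggregate_score([40, 27, 15], [50, 30, 20], [0.5, 0.3, 0.2])
print(round(total, 1), grade_from_aggregate(total))
```

Nothing in this calculation refers to what the student can actually do, which is why the aggregate is only as standards-referenced as the tasks that feed it.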


Norm-based Grading

Norm-based grading, or grading on a curve, pre-sets the distribution of grades achievable. A typical example:

A: 4-6%
B: 8-12%
C: 20-30%
D: 45-55%
E: 5-15%
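As a rough sketch of how grading on a curve works, the following assigns grades purely by rank. The quotas are assumed to be the midpoints of the ranges above (5, 10, 25, 50 and 10 per cent); the function name and the cohort data are hypothetical. Ties are broken by input order, which a real system would need to handle more carefully.

```python
from typing import Dict, List, Tuple

# Assumed quotas: midpoints of the ranges on the slide, as whole percentages.
QUOTAS: List[Tuple[str, int]] = [("A", 5), ("B", 10), ("C", 25), ("D", 50), ("E", 10)]


def grade_on_curve(scores: Dict[str, float],
                   quotas: List[Tuple[str, int]] = QUOTAS) -> Dict[str, str]:
    """Assign grades by rank so each grade receives its pre-set share."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # best score first
    n = len(ranked)
    grades: Dict[str, str] = {}
    start = 0
    cumulative_pct = 0
    for grade, pct in quotas:
        cumulative_pct += pct
        end = round(cumulative_pct * n / 100)  # rank at which this grade stops
        for student in ranked[start:end]:
            grades[student] = grade
        start = end
    for student in ranked[start:]:  # anyone left over through rounding
        grades[student] = quotas[-1][0]
    return grades


# Hypothetical cohort of 20, then the same cohort with every score 5 marks higher.
cohort = {f"student{i:02d}": 95 - 2 * i for i in range(20)}
improved = {name: score + 5 for name, score in cohort.items()}
print(grade_on_curve(cohort) == grade_on_curve(improved))  # prints True
```

The final comparison shows the limitation of the method: a cohort that improves across the board receives exactly the same grades.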

Move to Standards-referenced Assessment

Aggregating scores does not by itself give confidence that grades can be interpreted in consistent ways, unless the tasks being scored have been designed so that higher scores can only be achieved by demonstrating higher levels of the knowledge and skills required.

Norm-based grading prevents ‘grade inflation’ but does not allow improved student cohort performance to be demonstrated when it occurs.


International Trends

With the growth of international education and the global human capital market, countries are looking beyond their borders for comparability in student outcomes/results.

How to meet this challenge of comparable assessment is an important issue for education systems.

In Europe and in many Commonwealth countries the first approach to comparison has been to develop qualifications frameworks to classify levels of qualifications and to define their common characteristics as outcome standards.

Definition of Standards Across Systems

Standards describe what students should know and be able to do.

While this definition is basically the same across countries and systems, what varies is the “breadth” of the variable (i.e. the descriptions) and “the name” (or type/purpose).

Standards written for qualification frameworks are generic and need to be interpreted at a given subject/discipline level.


Level of Specification of Outcome Standard

Specifying standards in terms of observable student outcomes is not easy.

Grading against specific objectives often leads to finer and finer levels of specification (‘check-list’ approaches).

The danger is that there can be too many objectives to consider and that the elements may become operationally isolated from one another.

Design Requirements

• Assessment tasks need to be designed to elicit student performance and enable it to be judged with respect to the intended outcomes.

• The syllabus and teaching program need to be explicit about the developmental continuum from novice to expert in the field of study.


Design Requirements continued

• The assessment program has to ensure that higher scores or higher grades are based on evidence of progress on the developmental continuum from novice to expert.

• Generic standards descriptors can help to frame subject-specific criteria for the reporting of marks and grades.

Generic Grade Descriptions

A: Clear mastery of all course objectives with intellectual initiative demonstrated at high level.

B: Substantial mastery of most course objectives with relevant analytical skills.

C: Sound mastery of major course objectives with understanding of most of basic course.

D: Some mastery of a range of objectives with basic understanding.

E: Few course objectives met.


Accountability

Performance management and the effects of the accountability agenda mean that the results of testing and public examinations are high stakes for many of the stakeholders in education.

• Students: when results determine entry to selective higher education programs.

• Teachers: when results are used for accountability.

• School systems: where funding is dependent on meeting government targets.

• End-users: want credible results.

Effects of Accountability

• Student performance on external examinations and tests is seen as a performance indicator for education systems.

• In this context everyone has an interest in improved performance.

• With strong incentives for improvement, how do we know whether results are influenced more by this need than by valid evidence of improvement?

• ‘Is grade inflation occurring?’ is a common media debate.


Typical Popular Comment

How should we respond?

• This letter raises an issue that commonly appears in the media, where employers complain that the skills and knowledge implied by the level of student results are not present.

• What is the evidence for this view and how can school and assessment authorities address the issue?


Grade Inflation?

• Media response to publication of results: are standards rising or falling?

• Grades represent the achievement standard so should not rise or fall.

• Numbers achieving the ‘standards’ implicit in the grade can rise or fall.

• Standards once established should remain relatively constant.


Source: www.gradeinflation.com. Constructed by Rojstracze

Grade trend in US universities and colleges 1920 to 2006

Standards Versus Norms

• The media problem is caused by the move away from normative equating procedures for reporting results.

• When normative scaling is applied it is relatively easy to have a fixed proportion of candidates achieving the highest reported grade each year.

• With standards-referenced systems, outcomes are determined by a judgment process as to whether or not the standard has been reached.

• How stable is the judgment process?


Validating Standards

One of the main differences between norm- and standards-referencing is that with the latter there is no inherent limit to the percentage of students achieving a particular standard. In theory it is possible for all students to achieve any performance standard.

This means the percentage achieving particular performance bands or levels can vary from year to year. The question is how do we know that the percentage we have nominated as achieving the bands is comparable from year to year?

Is it good enough to just attest that due process has been carried out or is there a need for more substantive information regarding the percentages to be produced?

Interpreting Results Over Time

• How should we interpret variation in the numbers achieving the top grade over time?

• Time series data often show incremental creep with more students achieving the top levels of performance each year.

• This result then leads to debate about whether standards are falling or whether the education system itself is delivering some consistent improvement (Wikstrom, 2005).


The Need to Validate

• Clearly a fundamental question which arises in education systems is how do we validate standards-referenced results?

• Public examination authorities are expected to maintain standards.

• There are a number of procedures and tools to assist the process in systems where results are referenced to standards.

• They all ultimately rely on professional judgment to some degree.

A Scenario

• An education system introduces a new professional development programme aimed at improving student outcomes.

• The ‘high stakes’ exit examination is reported with respect to standards. Much effort is put into ensuring that standards are being maintained.

• Student performance on the exit examination appears to be much improved.

• Is it an ‘easier’ examination, so that the cut-scores that determine grades need to be raised? Or

• Is it evidence that the professional development programme is bearing fruit?

• How would we know?


The Validation Problem

A professional judgement-based standard-setting exercise has been conducted and the percentages achieving particular grade levels determined.

The examinations are high stakes and previous years’ examinations have been used by teachers and students to prepare for the current examination.

The performance standards that are used to assign grades are available to school systems so teachers and students have the opportunity to “internalise” the performance standards.

What are some “other” ways that we can get alternate multiple sources of information about the current distribution (relative to the previous distributions) that will enable us to judge (validate) the outcomes of the standard-setting exercise?

The emphasis in these questions is whether the alignment of the cut-score to the “borderline student” from one year is equivalent to the new cut-score of the same borderline student in subsequent years.

Create alternate multiple sources of information that indicate the relative stability of the results of the standard-setting exercise. If these different sources give similar information then we can be more confident that the results are comparable and any change is genuinely a change in the distribution of performance from one year to the next (Convergent Validation).

Validating Standards


Comparability

For example:

• Using statistical moderation to moderate school-based assessments (common students and common items)

• Using common items (moderating test) to compare performance over time in the same subjects in Hong Kong

• Using common judges to compare performance over time in subjects in New South Wales and the GCSE

• Using common students to equate the distributions of different subjects in the same year.

Professional Judge Approaches

A panel of independent judges interrogates the data and process to make their own independent, professional judgement about the relative differences between the distributions from the different years. This could involve interviewing the examiners, markers and judges and asking them such questions as “Is this year’s paper more difficult than last year’s?”; “Is there a difference in the ability of this year’s cohort relative to the previous year?”; etc.

Complete an independent standard-setting exercise using equivalent judges, or use judges from a different educational system who are familiar with the content.


Professional Judge Approaches

Ask the examiners to estimate the cut-scores when they set the examination and compare the two sets of cut-scores.

Have teachers in the system estimate the cut-scores on the examination after the examination has started and before it is complete, so that the students have not contaminated the judgement by giving the teachers their views on the relative difficulty of the paper.

Professional Judge Approaches

Use a pair-wise comparison method to equate the scales from two adjacent years:

Choose items from different examinations (including the one currently being completed by the students) and ask a number of teachers (100 or more ~ online) to take each pair of items in turn and indicate which item is the harder of the two. The results from these judgements can then be used to produce a common scale which can then be used to compare the cut-scores from the two, or more, years.
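One possible way to turn such "which item is harder" judgements into a common scale is a Bradley-Terry model. The presentation does not name a model, so this choice, the function name and the item labels below are all assumptions. The sketch also assumes every item is judged harder at least once and easier at least once, and it does not address how exam cut-scores would then be located on the resulting scale.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bradley_terry_scale(judgements: Iterable[Tuple[str, str]],
                        iterations: int = 200) -> Dict[str, float]:
    """Estimate item difficulty (log scale, mean zero) from pairwise judgements.

    Each judgement is a (harder_item, easier_item) pair from one teacher.
    """
    wins = defaultdict(int)         # times an item was judged the harder one
    losses = defaultdict(int)       # times an item was judged the easier one
    pair_counts = defaultdict(int)  # comparisons made for each unordered pair
    items = set()
    for harder, easier in judgements:
        if harder == easier:
            raise ValueError("an item cannot be compared with itself")
        wins[harder] += 1
        losses[easier] += 1
        pair_counts[frozenset((harder, easier))] += 1
        items.update((harder, easier))
    for item in items:
        if wins[item] == 0 or losses[item] == 0:
            raise ValueError("each item needs a 'harder' and an 'easier' judgement")

    strength = {item: 1.0 for item in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            # Standard minorise-maximise update for the Bradley-Terry likelihood.
            denom = 0.0
            for pair, n in pair_counts.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n / (strength[i] + strength[j])
            updated[i] = wins[i] / denom
        total = sum(updated.values())
        strength = {i: s * len(items) / total for i, s in updated.items()}

    mean_log = sum(math.log(s) for s in strength.values()) / len(strength)
    return {i: math.log(s) - mean_log for i, s in strength.items()}


# Hypothetical judgements on three items drawn from two adjacent years.
judgements = ([("2009_q4", "2010_q2")] * 8 + [("2010_q2", "2009_q4")] * 2
              + [("2010_q2", "2010_q6")] * 7 + [("2010_q6", "2010_q2")] * 3
              + [("2009_q4", "2010_q6")] * 9 + [("2010_q6", "2009_q4")] * 1)
for item, difficulty in sorted(bradley_terry_scale(judgements).items()):
    print(item, round(difficulty, 2))
```

Higher values indicate items that teachers more often judged to be the harder of a pair, so items from both years end up on one difficulty scale.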


Professional Judgement Method

Advantages

1. Involves teachers in applying the standards; helps internalise the standards across the system

2. It gives the system level authorities feedback as to how well the standard is effectively embedded

3. Not statistical; relies on professional judgement

4. Relatively cheap and non-intrusive

Disadvantages

1. Transparency in cut-scores ~ what happens if there is a large differential.

2. Not getting student comparison only getting teacher estimates i.e. teacher effect

3. Needs to be done online or by phone

4. Could lack authenticity within the community because the teachers themselves are making the judgements

5. It is similar in that it validates professional judgement with professional judgement

Common Item Method for Validating Standards

The Common Items Method involves moderating tests: a generic moderating test (Core Skills or General Achievement Test) can be developed and used on a sample of students in a sample of subjects each year. It needs to be kept secure.

The distributions of results from different years can then be mapped onto the scale of the moderating test and comparisons can then be made to make sure that the cut-scores do align (within reason).

Calibrated item banks can be used to develop the moderating tests so that the security of the moderating tests is not a major issue.
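The mapping step could be sketched with simple linear (mean and standard deviation) equating. This is an assumed technique chosen for illustration, since the presentation does not specify how the mapping is done; the function name and all scores are hypothetical. It also assumes the sampled students sat both the examination and the moderating test in the same year.

```python
from statistics import mean, pstdev
from typing import Sequence


def to_moderating_scale(cut_score: float,
                        exam_scores: Sequence[float],
                        moderating_scores: Sequence[float]) -> float:
    """Express an exam cut-score on the moderating test scale by matching
    the means and standard deviations of the two score distributions."""
    sd_exam = pstdev(exam_scores)
    if sd_exam == 0:
        raise ValueError("exam scores must vary to define a linear mapping")
    slope = pstdev(moderating_scores) / sd_exam
    return mean(moderating_scores) + slope * (cut_score - mean(exam_scores))


# Hypothetical samples: both cohorts perform identically on the secure
# moderating test, but the second year's exam marks are 5 points higher.
moderating = [40, 45, 50, 55, 60]
exam_year1 = [50, 60, 70, 80, 90]
exam_year2 = [55, 65, 75, 85, 95]

print(to_moderating_scale(80, exam_year1, moderating))
print(to_moderating_scale(80, exam_year2, moderating))
print(to_moderating_scale(85, exam_year2, moderating))
```

In this made-up case the same raw cut-score of 80 corresponds to a lower point on the moderating scale in the second year, which would suggest an easier paper rather than a stronger cohort; a cut-score of 85 would be needed to align with the first year.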


Advantages and Disadvantages - Moderating Test Method -

Advantages

1. It is perceived to be an alternate to professional judgement

2. It is well known and accepted as a method to equate and compare distributions

3. One single test can be used to accommodate most subjects and sub-tests of the test can be used to equate the different subjects

4. Actual student performance is used to compare the subjects

Disadvantages

1. Relatively costly and quite intrusive

2. May be difficult to motivate students ~ this could lead to a diminution of validity

3. Generic tests are only loosely linked to the actual content in the examinations

4. Adds to the examination load of students

5. Statistical in nature and would be relatively difficult for teachers and the community to understand

6. Security is an issue

Common Student Method

This method involves students sitting items from examinations from different years.

Students from a similar, but different, system could be asked to complete a shortened composite paper that comprises items (assessing material that is known to the students in the chosen system) from the years that need to be equated or compared. The results can then be used to place the distributions onto a common scale so that the cut-scores can then be compared.


Alternative Scenarios - School Based Assessment -

• Professional judgement-based standard-setting exercise at the subject level by different teachers in different schools.

• No subject-based examinations; but a generic skills test is available.

• Consensus or social moderation methods are used to achieve comparability across schools (reliability assumed at the school level).

• How can we get an alternate estimate of the percentage achieving the performance standards?

Clearly validation is not a straightforward exercise. Most procedures are expensive in time and professional resource and cannot provide unequivocal answers.

Validation does not occur in a neutral environment when accountability agendas place such high stakes on yearly improvement.