Thinking About the Think-Aloud Method to Guide Development of
Assessments of Modified Achievement Standards
Presentation at the OSEP GSEG Project Directors Conference
Steve Ferrara, January 16, 2008
Think-Alouds and Modified Assessments Steve Ferrara2
Today
No one right way to do cognitive labs (think-alouds)
Background on cognitive labs
Overview and selected issues in conducting cognitive labs
Not covered: data analysis, synthesis, interpretation, and use
Adapting cognitive labs for our target students and for assessments of modified achievement standards
I don’t know how you plan to use cognitive labs in the process of developing and validating items, so I have planned some general comments
Two general principles
As with most research methodologies, there is no one right way to do cognitive labs
Think-alouds are used in reading comprehension research, survey item development, achievement item development and validation, human factors research and evaluation (e.g., usability studies),…
There are principles and practices that enable us to produce verbal data that can be interpreted reliably and validly and that is practically useful
So what are cognitive labs?
(a) Prompting of a specified population of respondents to (b) think out loud to (c) illuminate how they think while they (d) perform specified tasks
Simply put, we ask respondents to “think out loud” while they perform a task
Think-alouds can be used for physical or cognitive tasks
Example: think-aloud prompt
“I want you to say out loud anything that you are thinking while you are reading and trying to answer these science questions. Some things you might be feeling include... (hungry, tired, bored; this is interesting, hard). You might think things like... (this is a biology question; I don’t understand the question; I’m going to reread this part; we did this in class last year). Say anything you are thinking to yourself… What I’m most interested in is the stuff you are doing in your head – while you are answering the question – that helps you to understand the question and figure out the answer.”
(From the ICV project; see Ferrara, Duncan, et al., 2004)
Thinking aloud
In verbal reporting, respondents:
Bring information into attention
(When necessary) convert the information into verbalizable code
Vocalize their thinking
(Ericsson & Simon, 1993, p. 16)
Crucial considerations:
Are respondents aware of the information they use during task completion?
Can they verbalize it?
Why do cognitive labs?
In general, to illuminate respondents’ cognitive processing while they perform a task
Exploratory goals
e.g., What reading comprehension strategies do students use when…?
Refinement goals
e.g., Improve the clarity and fidelity of respondents’ interpretation of survey items (Desimone & Le Floch, 2004)
Validation goals
e.g., Ensure that achievement items elicit the intended knowledge, skills, and processes (Ferrara, Duncan, et al., 2004; Leighton, 2004)
Rough history
Early empiricists in psychology: introspection
Behaviorism in the 1930s: introspection fell into ill repute
Surveys and polls: errors (e.g., the 1948 presidential election) and new interest in the wording of questions and tasks
Great Society evaluations in the 1960s: studies of behavior rather than opinion
Sudman and Bradburn in the 1970s: the largest effects on responses came from the task (e.g., wording), not from interviewers or respondents
Emergence of cognitive psychological research in the 1970s and beyond: implications for survey development
1990s: NCES surveys; Voluntary National Test reading and mathematics items in 1997-2000; reading research; SEPT; etc.
2000s: Expanding application to educational achievement test items
Overview and selected issues in conducting cognitive labs
Retrospective and concurrent
Open-ended and moderately or highly focused (specificity of tasks, uses of probes)
Respondent sampling and generalizability
Task sampling and generalizability
Task difficulty
Respondent willingness and effectiveness in thinking aloud and verbalizing
Concurrent and retrospective think-alouds
Respondents describe their thinking during task completion (concurrent) or after task completion (retrospective)
Trade-offs:
Thinking aloud may alter task performance (concurrent)
Recall and reconstruction may differ from the thinking that actually occurred (retrospective)
Open-ended and moderately or highly focused
Open-ended and exploratory
“Please think out loud while you respond to the following items.”
Moderately focused
“…Remember to tell me what you think about when you respond. Tell me about how you understand the item, how you select a response, and how you know which response is correct.”
Probes
“How did you decide to select that response? What information in the item and from school did you use?”
Grade 6 science item
Illustrative verbal reports
Thinking aloud
“A. The candy will reach Bill. No that can’t be right… B. The candy will go behind Bill… The possibility is the candy wouldn’t reach Bill and then it probably will drop.”
Response to a retrospective probe (“How did you get that answer?”)
“Because I looked and if it’s, well, how it’s going around and he’s going forward so…”
Respondent sampling and generalizability
Number of respondents
Rule of thumb: 9 respondents
Internal studies in usability testing suggest that little new information is gained after ~9 respondents
Representativeness of the population of inference
9 for each key subgroup in the population, or 9 in total?
Typical subpopulations (e.g., racial-ethnic, gender) or those more likely to be relevant (e.g., instructional program, gender)?
Often, we don’t know enough about which subpopulations may process items differently
Cost affordability is a consideration
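One common way to picture the “little new information after ~9 respondents” rule of thumb is a simple discovery model from the usability literature: if each respondent independently reveals a given processing issue with probability p, then n respondents are expected to reveal a share 1 - (1 - p)^n of such issues, and each added respondent contributes less. The sketch below assumes that model and an illustrative p = 0.25 (neither comes from this talk). It also shows how quickly “9 for each key subgroup” multiplies the sample relative to “9 in total.”

```python
def cumulative_discovery(p: float, n: int) -> float:
    """Expected share of issues surfaced by n respondents, assuming each
    respondent independently reveals any given issue with probability p."""
    return 1.0 - (1.0 - p) ** n


def marginal_gain(p: float, n: int) -> float:
    """Extra share of issues expected from adding the n-th respondent."""
    return cumulative_discovery(p, n) - cumulative_discovery(p, n - 1)


def total_respondents(per_group: int, n_subgroups: int, per_subgroup: bool) -> int:
    """Sample size under '9 for each key subgroup' versus '9 in total'."""
    return per_group * n_subgroups if per_subgroup else per_group


if __name__ == "__main__":
    p = 0.25  # illustrative assumption, not an empirical estimate
    for n in (3, 6, 9, 12, 18):
        print(f"n={n:2d}  found={cumulative_discovery(p, n):.2f}  "
              f"gain from respondent n={marginal_gain(p, n):.3f}")
    # Hypothetical population with 4 key subgroups
    print("9 per subgroup:", total_respondents(9, 4, per_subgroup=True))
    print("9 in total:", total_respondents(9, 4, per_subgroup=False))
```

Under this model the curve flattens quickly, which is consistent with the rule of thumb; the real value of p for a given item set and population is unknown in advance.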
Task sampling and generalizability
All items, a random sample of items, or exemplars from item subsets (e.g., item families)?
We may not know enough about the tasks (e.g., item families) to sample effectively (and that’s why we’re conducting cog labs)
Numbers of respondents, time per respondent, and cost affordability are considerations
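For the “exemplars from item subsets” option, a stratified draw is one simple approach: take a fixed number of items at random from each item family, then check total interview time against the budget. The family labels, minutes per item, and overhead below are hypothetical placeholders, not figures from the talk.

```python
import random
from collections import defaultdict


def sample_exemplars(items, per_family, seed=0):
    """Draw up to `per_family` items at random from each item family.

    `items` is a list of (item_id, family) pairs; a fixed seed keeps the
    draw reproducible so the same exemplars can be documented and reused.
    """
    rng = random.Random(seed)
    by_family = defaultdict(list)
    for item_id, family in items:
        by_family[family].append(item_id)
    return {
        family: sorted(rng.sample(ids, min(per_family, len(ids))))
        for family, ids in sorted(by_family.items())
    }


def total_minutes(n_respondents, n_items, minutes_per_item, overhead_minutes):
    """Total interview time: each respondent does every sampled item plus
    a fixed overhead (e.g., introduction, think-aloud practice, debrief)."""
    return n_respondents * (n_items * minutes_per_item + overhead_minutes)


if __name__ == "__main__":
    # Hypothetical pool: 12 items spread across three families
    pool = [(f"item{i:02d}", fam) for i, fam in enumerate(["A", "B", "C"] * 4)]
    chosen = sample_exemplars(pool, per_family=2)
    n_items = sum(len(ids) for ids in chosen.values())
    print(chosen)
    print("minutes:", total_minutes(9, n_items, minutes_per_item=5, overhead_minutes=15))
```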
Task difficulty
Rule of thumb: Select tasks that are moderately difficult for respondents
Alternatively, select respondents who are well matched to the tasks
Consideration: Select tasks in a range of difficulties
Some respondents can verbalize about easy and routinized tasks
Some respondents can verbalize about their thinking, even for tasks that are too difficult or that they don’t know about
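If difficulty is indexed by the proportion of respondents answering an item correctly (a classical p-value; this slide does not say how difficulty is judged), selecting “tasks in a range of difficulties” can be sketched as binning items into bands and drawing from each. The cutoffs, item IDs, and p-values below are assumptions for illustration.

```python
def difficulty_band(p_value, hard_below=0.4, easy_above=0.6):
    """Classify an item by its p-value (proportion correct); cutoffs are illustrative."""
    if p_value < hard_below:
        return "hard"
    if p_value > easy_above:
        return "easy"
    return "moderate"


def pick_across_range(p_values, per_band=1):
    """Pick up to `per_band` items from each difficulty band.

    `p_values` maps item_id -> p-value. Items are taken in sorted id order
    within each band so the selection is easy to reproduce.
    """
    bands = {"easy": [], "moderate": [], "hard": []}
    for item_id in sorted(p_values):
        bands[difficulty_band(p_values[item_id])].append(item_id)
    return {band: ids[:per_band] for band, ids in bands.items()}


if __name__ == "__main__":
    # Hypothetical p-values from a field test
    p_values = {"q1": 0.82, "q2": 0.55, "q3": 0.31, "q4": 0.47, "q5": 0.66, "q6": 0.22}
    print(pick_across_range(p_values, per_band=1))
```

Setting `per_band` larger for the “moderate” band only would follow the rule of thumb more closely; drawing from all three bands follows the “range of difficulties” consideration.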
Respondent willingness and effectiveness in thinking aloud and verbalizing
Some respondents are reticent
Think of middle-schoolers
Think of lower achievers (e.g., Ferrara, Albert, et al., 1996)
Some respondents are unaware of their thinking (i.e., what information they heed)
Some respondents may be willing and aware, but do not verbalize their processing in illuminating or useful ways…
Target students for assessments of modified achievement standards
Pursuing grade-level content standards but may not progress at the same rate as their peers
May not be comfortable verbalizing what they know and don’t know, may not be particularly metacognitive (i.e., aware), and may not verbalize effectively
Example from a GSEG project (implications to follow)
Ohio, Minnesota, Oregon, AIR
Persistently low-performing students with disabilities (PLP/SWDs)
Borrowed the PLP idea from earlier Georgia work
Students in the lowest achievement level for two or three years (depending on the student cohort)
Example (cont.)
Reading items that function adequately for PLP/SWDs (psychometric definition):
p-values of .4-.6
Point-biserial and polyserial correlations of at least .20
And items that did not function well
Try to determine what distinguishes adequately functioning items and how to make other items more psychometrically sound for PLP/SWDs
(One of several research activities to identify item and test modifications that provide valid and accessible items for PLP/SWDs)
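The psychometric screen above can be sketched for dichotomously scored items: the p-value is the proportion of students answering correctly, and the point-biserial is the correlation between the 0/1 item score and a total score. The sketch assumes the corrected version (total excluding the item itself), which the slide does not specify, and it omits polyserial correlations for polytomous items, which need a separate estimation procedure. Function names and data are hypothetical.

```python
from math import sqrt, nan


def pearson(x, y):
    """Pearson correlation; returns nan if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return nan
    return sxy / sqrt(sxx * syy)


def item_statistics(scores):
    """Return (p_value, corrected point-biserial) for each item.

    `scores` holds one list of 0/1 item scores per student.
    """
    totals = [sum(row) for row in scores]
    results = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        rest = [t - s for t, s in zip(totals, item)]  # total minus this item
        results.append((sum(item) / len(item), pearson(item, rest)))
    return results


def functions_adequately(p_value, r_pb, p_range=(0.4, 0.6), min_r=0.20):
    """Screen from the slide: p-value in .4-.6 and point-biserial at least .20.
    A nan correlation fails the comparison, so constant items are screened out."""
    return p_range[0] <= p_value <= p_range[1] and r_pb >= min_r


if __name__ == "__main__":
    # Hypothetical scores: 6 students x 3 items
    scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0], [0, 1, 1], [1, 1, 1]]
    for j, (p, r) in enumerate(item_statistics(scores), start=1):
        print(f"item {j}: p={p:.2f} r_pb={r:.2f} adequate={functions_adequately(p, r)}")
```

With the small samples typical of PLP/SWD subgroups, these statistics are unstable, so a screen like this is a starting point for the qualitative comparison of item features, not a substitute for it.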
Example (cont.)
Initial findings for one cohort
Adequately functioning items:
Comprehension items requiring little or no inference
Vocabulary items: definitions in the text
You can put your finger on the answer in the passage
Other items:
Inference and synthesis required
Further…
The project will define eligibility, in part by identifying students “at the bottom of” grade-level assessments and “at the top of” alternate assessments
Think about trying to get verbal data from students who currently participate in alternate assessments
Implications for cog labs with target students
They probably are fairly concrete thinkers
They are not likely to be highly verbal (in the colloquial sense)
They may be reticent
They may not be highly metacognitive (i.e., aware)
How much useful verbal data might we expect to get from students who are likely to be eligible for assessments of modified achievement standards?
Some encouraging results regarding think-alouds with students with learning disabilities
(Johnstone, Liu, Altman, & Thurlow, 2007)
This is not an argument against using cog labs in this situation
But choose the items (and other assessment tasks), students, and think-aloud prompts and probes with the target students clearly in mind
Also, maybe consider a new idea: group cognitive labs
As far as I know, this idea has not been proposed elsewhere
Think of it as a focus group where the focus is cognitive processing while responding to test items
Group cognitive labs idea
4-6 respondents per group
Probably homogeneous in terms of achievement, verbalization, etc.
OTL (opportunity to learn) is an important consideration
Get diversity and generalizability across groups
Could matrix-sample items so that each respondent does individual think-alouds for 2-3 items
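The matrix-sampling suggestion can be sketched as a cyclic assignment: respondent i in a group starts at an offset of i times k items and takes the next k items, wrapping around the item list, so every item is covered whenever group size times k is at least the number of items. The respondent labels, item IDs, and the cyclic scheme itself are illustrative assumptions rather than part of the proposal.

```python
def matrix_sample(item_ids, respondents, items_per_respondent):
    """Cyclically assign `items_per_respondent` distinct items to each respondent."""
    n = len(item_ids)
    if not 0 < items_per_respondent <= n:
        raise ValueError("items_per_respondent must be between 1 and the number of items")
    assignment = {}
    for i, person in enumerate(respondents):
        start = i * items_per_respondent
        # Consecutive positions mod n are distinct because k <= n
        assignment[person] = [item_ids[(start + j) % n] for j in range(items_per_respondent)]
    return assignment


def uncovered(item_ids, assignment):
    """Items that no respondent in the group will think aloud about."""
    assigned = {item for items in assignment.values() for item in items}
    return [item for item in item_ids if item not in assigned]


if __name__ == "__main__":
    items = [f"item{i}" for i in range(1, 9)]  # hypothetical 8-item set
    group = ["A", "B", "C", "D", "E"]          # a 5-person group
    plan = matrix_sample(items, group, items_per_respondent=2)
    for person, assigned in plan.items():
        print(person, assigned)
    print("uncovered:", uncovered(items, plan))
```

Rotating the starting offset differently in each group would spread repeated items across groups, which fits the goal of getting diversity and generalizability across groups.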
Group cognitive labs idea (cont.)
Retrospective reports
Training and practice in:
Thinking aloud
Reporting similarities to and differences from the thinking of other respondents
Avoiding being unduly influenced by others’ thinking
Round-robin think-alouds:
Respondent A thinks aloud for item 1
Other respondents report similarities and differences
Etc.
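The round-robin procedure can be sketched as a rotating schedule: for each item, one respondent thinks aloud and the others then report similarities and differences, with the speaker role passing through the group. The rotation rule (item t, counting from 0, goes to respondent t modulo the group size) is an assumption, since the slide says only “Etc.”; respondent labels are assumed to be unique.

```python
def round_robin(item_ids, respondents):
    """Return (item, speaker, reporters) tuples, rotating the speaker role."""
    if not respondents:
        raise ValueError("need at least one respondent")
    schedule = []
    for t, item in enumerate(item_ids):
        speaker = respondents[t % len(respondents)]
        # Everyone except the speaker reports similarities and differences
        reporters = [r for r in respondents if r != speaker]
        schedule.append((item, speaker, reporters))
    return schedule


if __name__ == "__main__":
    items = ["item 1", "item 2", "item 3", "item 4", "item 5"]
    for item, speaker, reporters in round_robin(items, ["A", "B", "C", "D"]):
        print(f"{item}: {speaker} thinks aloud; "
              f"{', '.join(reporters)} report similarities and differences")
```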
Group cognitive labs idea (cont.)
Possible advantages:
Cost-efficiency
Broader sampling
Possible improvement in the quantity and quality of verbal reports
Possible drawbacks:
Increase in reticence
Respondents influence each other, obscure individual processing, and pollute verbal reports
Good luck!
Steve Ferrara
CTB McGraw-Hill
References
Desimone, L. M., & Le Floch, K. C. (2004). Are we asking the right questions? Using cognitive interviews to improve surveys in education research. Educational Evaluation and Policy Analysis, 26(1), 1-22.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data (Rev. ed.). Cambridge, MA: The MIT Press.
Ferrara, S., Albert, F., Gilmartin, D., Knott, T., Michaels, H., Pollack, J., Schuder, T., Vaeth, R., & Wise, S. (1996, April). A qualitative study of the information examinees consider during item review on a computer-adaptive test. In L. Wolf (Moderator), Item review in computerized adaptive testing. Symposium conducted at the annual meeting of the National Council on Measurement in Education, New York.
Ferrara, S., Duncan, T. G., Freed, R., Velez-Paschke, A., McGivern, J., Mushlin, S., Mattessich, A., Rogers, A., & Westphalen, K. (2004). Examining test score validity by examining item construct validity: Preliminary analysis of evidence of the alignment of targeted and observed content, skills, and cognitive processes in a middle school science assessment. Paper presented at the annual meeting of the American Educational Research Association, San Diego.
Johnstone, C., Liu, K., Altman, J., & Thurlow, M. (2007). Student think aloud reflections on comprehensible and readable assessment items: Perspectives on what does and does not make an item readable (Technical Report 48). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.
Leighton, J. P. (2004). Avoiding misconception, misuse, and missed opportunities: The collection of verbal reports in educational achievement testing. Educational Measurement: Issues and Practice, 23(4), 6-15.