
Thinking About the Think-Aloud Method to Guide Development of Assessments of Modified Achievement Standards

Presentation at the OSEP GSEG Project Directors Conference

Steve Ferrara
January 16, 2008


Today

No one right way to do cognitive labs (think-alouds)
Background on cognitive labs
Overview and selected issues in conducting cognitive labs
(Not covered: data analysis, synthesis, interpretation, and use)
Adapting cognitive labs for our target students and for assessments of modified achievement standards

I don’t know how you plan to use cognitive labs in the process of developing and validating items, so I have planned some general comments.


Two general principles

As with most research methodologies, there is no one right way to do cognitive labs
Think-alouds are used in reading comprehension research, survey item development, achievement item development and validation, human factors research and evaluation (e.g., usability studies),…

There are principles and practices that enable us to produce verbal data that can be interpreted reliably and validly and that are practically useful


So what are cognitive labs?

(a) Prompting of a specified population of respondents to (b) think out loud to (c) illuminate how they think while they (d) perform specified tasks

Simply put, we ask respondents to “think out loud” while they perform a task

Think-alouds can be used for physical or cognitive tasks


Example: think-aloud prompt

I want you to say out loud anything that you are thinking while you are reading and trying to answer these science questions. Some things you might be feeling include... (hungry, tired, bored; this is interesting, hard). You might think things like... (this is a biology question; I don’t understand the question; I’m going to reread this part; we did this in class last year).

Say anything you are thinking to yourself…What I’m most interested in is the stuff you are doing in your head – while you are answering the question – that helps you to understand the question and figure out the answer.

(From the ICV project; see Ferrara, Duncan, et al., 2004)


Thinking aloud

In verbal reporting, respondents:
Bring information into attention
(When necessary) convert the information into verbalizable code
Vocalize their thinking

(Ericsson & Simon, 1993, p. 16)

Crucial considerations:
Are respondents aware of the information they use during task completion?
Can they verbalize it?


Why do cognitive labs?

In general, to illuminate respondents’ cognitive processing while they perform a task
Exploratory goals, e.g., What reading comprehension strategies do students use when…?
Refinement goals, e.g., improve the clarity and fidelity of interpretation of survey items (Desimone & Le Floch, 2004)
Validation goals, e.g., ensure that achievement items elicit the intended knowledge, skills, and processes (Ferrara, Duncan, et al., 2004; Leighton, 2004)


Rough history

Early empiricists in psychology: introspection
Behaviorism in the 1930s: introspection fell into ill repute
Surveys and polls: errors (e.g., the 1948 presidential election) and new interest in the wording of questions and tasks
Great Society evaluations in the 1960s: studies of behavior rather than opinion
Sudman and Bradburn in the 1970s: the largest effect on responses came from the tasks (e.g., wording), not from interviewers or respondents
Emergence of cognitive psychological research in the 1970s and beyond: implications for survey development
1990s: NCES surveys, Voluntary National Test reading and mathematics items (1997-2000), reading research, SEPT, etc.
2000s: expanding application to educational achievement test items


Overview and selected issues in conducting cognitive labs

Retrospective and concurrent
Open-ended and moderately or highly focused (specificity of tasks, uses of probes)
Respondent sampling and generalizability
Task sampling and generalizability
Task difficulty
Respondent willingness and effectiveness in thinking aloud and verbalizing


Concurrent and retrospective think-alouds

Describing thinking during task completion (concurrent) or after task completion (retrospective)

Trade-offs:
Concurrent: thinking aloud may alter task performance
Retrospective: recall and reconstruction may differ from the thinking that actually occurred


Open-ended and moderately or highly focused

Open-ended and exploratory: “Please think out loud while you respond to the following items.”

Moderately focused: “…Remember to tell me what you think about when you respond. Tell me about how you understand the item, how you select a response, and how you know which response is correct.”

Probes: “How did you decide to select that response? What information in the item and from school did you use?”


Grade 6 science item


Illustrative verbal reports

Thinking aloud: “A. The candy will reach Bill. No that can’t be right…B. The candy will go behind Bill…The possibility is the candy wouldn’t reach Bill and then it probably will drop.”

Response to a retrospective probe (“How did you get that answer?”): “Because I looked and if it’s, well, how it’s going around and he’s going forward so…”


Respondent sampling and generalizability

Number of respondents
Rule of thumb: 9 respondents
Internal studies in usability testing suggest that little new information is gained after ~9 respondents

Representativeness of the population of inference
9 for each key subgroup in the population, or 9 total?
Typical subpopulations (e.g., racial-ethnic, gender), or those more likely to be relevant (e.g., instructional program, gender)?
Often, we don’t know enough about which subpopulations may process differently

Cost is a consideration
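The slide does not say how the ~9 figure was derived. One way to see why returns diminish, offered here as an illustrative assumption rather than as the model behind those usability studies, is a simple binomial sketch: if a given way of processing an item shows up in any one respondent's verbal report independently with probability p, the chance that it shows up at least once among n respondents is 1 - (1 - p)^n. The function name and the value p = 0.3 are hypothetical.

```python
def cumulative_discovery(p: float, n: int) -> float:
    """Chance that a processing pattern is observed at least once in n respondents.

    Assumes (hypothetically) that the pattern appears in each respondent's
    verbal report independently with probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Hypothetical p = 0.3: each additional respondent adds less than the last.
    for n in range(1, 13):
        coverage = cumulative_discovery(0.3, n)
        gain = coverage - cumulative_discovery(0.3, n - 1)
        print(f"n={n:2d}  coverage={coverage:.3f}  gain={gain:.3f}")
```

With p = 0.3, nine respondents give coverage of about 0.96, and three more respondents add under three percentage points. With a rarer pattern (p = 0.1), nine respondents give only about 0.61, which is one way to frame the question of 9 per subgroup versus 9 in total when a subgroup may process items differently.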


Task sampling and generalizability

All items, a random sample of items, or exemplars from item subsets (e.g., item families)?

We may not know enough about the tasks (e.g., item families) to sample effectively (and that’s why we’re conducting cog labs)

Numbers of respondents, time per respondent, and cost are considerations
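When item families are defined, drawing exemplars from each family amounts to a stratified random sample. The sketch below is a minimal illustration only; the (item_id, family) input format, the function name, and the example IDs are hypothetical. A fixed seed makes the draw reproducible, so the sample can be documented alongside the verbal data.

```python
import random
from collections import defaultdict


def sample_exemplars(items, per_family=1, seed=None):
    """Pick up to `per_family` items at random from each item family.

    `items` is a list of (item_id, family) pairs (a hypothetical format).
    Returns a dict mapping each family to its sampled item IDs.
    """
    rng = random.Random(seed)
    by_family = defaultdict(list)
    for item_id, family in items:
        by_family[family].append(item_id)
    chosen = {}
    for family, ids in sorted(by_family.items()):
        k = min(per_family, len(ids))  # small families contribute all their items
        chosen[family] = rng.sample(ids, k)
    return chosen


if __name__ == "__main__":
    pool = [("R01", "main idea"), ("R02", "main idea"), ("R03", "main idea"),
            ("V01", "vocabulary"), ("V02", "vocabulary")]
    print(sample_exemplars(pool, per_family=2, seed=2008))
```

Sampling a fixed number per family keeps small families represented; whether families are meaningful enough to stratify on is, as the slide notes, part of what the cog labs are meant to find out.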


Task difficulty

Rule of thumb: select tasks that are moderately difficult for respondents
Alternatively, select respondents who are well matched to the tasks

Consideration: select tasks in a range of difficulties
Some respondents can verbalize about easy and routinized tasks
Some respondents can verbalize about their thinking even for tasks that are too difficult or that they don’t know about
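Both the rule of thumb and the alternative consideration assume some estimate of how difficult each item is for respondents like those in the lab, for example a p value (proportion correct) from field-test data. The sketch below sorts items into difficulty bands so tasks can be drawn from the moderate band or from every band; the function name and the cut points of .4 and .6 are assumptions chosen for illustration.

```python
def pick_by_difficulty(p_values, cut_low=0.4, cut_high=0.6):
    """Group item IDs into difficulty bands by p value (proportion correct).

    `p_values` maps item ID -> p value for respondents like those in the lab.
    Lower p means harder. Cut points are hypothetical defaults.
    """
    bands = {"hard": [], "moderate": [], "easy": []}
    for item_id, p in sorted(p_values.items()):
        if p < cut_low:
            bands["hard"].append(item_id)
        elif p <= cut_high:
            bands["moderate"].append(item_id)
        else:
            bands["easy"].append(item_id)
    return bands


if __name__ == "__main__":
    bands = pick_by_difficulty({"S1": 0.25, "S2": 0.48, "S3": 0.83, "S4": 0.55})
    print(bands["moderate"])  # candidates under the rule of thumb
```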


Respondent willingness and effectiveness in thinking aloud and verbalizing

Some respondents are reticent
Think of middle-schoolers
Think of lower achievers (e.g., Ferrara, Albert, et al., 1996)

Some respondents are unaware of their thinking (i.e., what information they heed)

Some respondents may be willing and aware, but do not verbalize their processing in illuminating or useful ways…


Target students for assessments of modified achievement standards

Pursuing grade-level content standards but may not progress at the same rate as their peers

May not be comfortable verbalizing what they know and don’t know, may not be particularly metacognitive (i.e., aware), and may not verbalize effectively


Example from a GSEG project (implications to follow)

Ohio, Minnesota, Oregon, AIR
Persistently low-performing students with disabilities (PLP/SWDs)
Borrowed the PLP idea from earlier Georgia work

PLP/SWDs: students in the lowest achievement level for two or three years (depending on the student cohort)


Example (cont.)

Reading items that function adequately for PLP/SWDs (psychometric definition):
p values of .4 to .6
Point-biserial and polyserial correlations of .20 or greater
And items that did not function well

Try to determine what distinguishes adequately functioning items, and how to make other items more psychometrically sound for PLP/SWDs

(One of several research activities to identify item and test modifications to provide valid and accessible items for PLP/SWDs)
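The psychometric screen above can be sketched for dichotomously scored items. This minimal version computes each item's p value (proportion correct) and a point-biserial correlation with the rest score (total score minus the item), which is one common choice; the slides do not say whether corrected or uncorrected item-total correlations were used, so treat that as an assumption. Polyserial correlations for polytomous items are not covered, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def item_stats(scores):
    """Return a (p_value, point_biserial) pair for each item.

    scores[s][i] is 1 if student s answered item i correctly, else 0.
    The point-biserial is computed against the rest score (total minus
    the item), an assumed "corrected" variant.
    """
    n_items = len(scores[0])
    results = []
    for i in range(n_items):
        item = [row[i] for row in scores]
        rest = [sum(row) - row[i] for row in scores]
        p = mean(item)
        sd_item, sd_rest = pstdev(item), pstdev(rest)
        if sd_item == 0 or sd_rest == 0:
            r = float("nan")  # undefined when there is no variance
        else:
            cov = mean(x * y for x, y in zip(item, rest)) - p * mean(rest)
            r = cov / (sd_item * sd_rest)
        results.append((p, r))
    return results


def functions_adequately(p, r, p_range=(0.4, 0.6), min_r=0.20):
    """Apply the screen from the slide: p value in .4-.6 and correlation >= .20."""
    return p_range[0] <= p <= p_range[1] and r >= min_r  # NaN r fails the check


if __name__ == "__main__":
    # Toy data (hypothetical): 4 students x 3 items.
    scores = [[1, 1, 1], [1, 1, 0], [0, 0, 1], [0, 0, 0]]
    for i, (p, r) in enumerate(item_stats(scores)):
        print(i, round(p, 2), round(r, 2), functions_adequately(p, r))
```

In practice these statistics would be computed only on the PLP/SWD group, since the point of the screen is how items function for those students rather than for the full population.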


Example (cont.)

Initial findings for one cohort

Adequately functioning items:
Comprehension items that require little or no inference
Vocabulary items: definitions are in the text
You can put your finger on the answer in the passage

Other items:
Inference and synthesis required


Further…

The project will define eligibility, in part by identifying students “at the bottom of” grade-level assessments and “at the top of” alternate assessments

Think about trying to get verbal data from students who currently participate in alternate assessments


Implications for cog labs with target students

They probably are fairly concrete thinkers
They are not likely to be highly verbal (in the colloquial sense)
They may be reticent
They may not be highly metacognitive (i.e., aware)

How much useful verbal data might we expect to get from students who are likely to be eligible for assessments of modified achievement standards?

Some encouraging results regarding think-alouds with students with learning disabilities

(Johnstone, Liu, Altman, & Thurlow, 2007)


This is not an argument against using cog labs in this situation

But choose the items (and other assessment tasks), students, and think-aloud prompts and probes with the target students clearly in mind

Also, maybe consider a new idea: group cognitive labs
As far as I know, this idea has not been proposed elsewhere
Think of it as focus groups where the focus is cognitive processing while responding to test items


Group cognitive labs idea

4-6 respondents per group
Probably homogeneous in terms of achievement, verbalization, etc.
Opportunity to learn (OTL) is an important consideration
Get diversity and generalizability across groups

Could matrix-sample items so that respondents do individual think-alouds for 2-3 items
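One way to matrix-sample items within a group is a rotating block design, in which each respondent gets a consecutive, wrapping block of items from the pool. This is a minimal sketch under that assumption (the slide does not specify a design); the function name and the item and respondent labels are hypothetical.

```python
def matrix_sample(item_ids, respondent_ids, items_per_respondent=2):
    """Assign each respondent a consecutive, wrapping block of items.

    Every item is covered whenever
    len(respondent_ids) * items_per_respondent >= len(item_ids).
    """
    if not item_ids:
        raise ValueError("item pool is empty")
    if items_per_respondent > len(item_ids):
        raise ValueError("more items per respondent than items in the pool")
    n = len(item_ids)
    plan = {}
    for j, person in enumerate(respondent_ids):
        start = (j * items_per_respondent) % n  # next block begins where the last ended
        plan[person] = [item_ids[(start + k) % n] for k in range(items_per_respondent)]
    return plan


if __name__ == "__main__":
    # Hypothetical group of 5 respondents and a pool of 6 items, 2-3 items each.
    print(matrix_sample(["I1", "I2", "I3", "I4", "I5", "I6"],
                        ["R1", "R2", "R3", "R4", "R5"], items_per_respondent=2))
```

A rotating design like this spreads items evenly within a group; across groups, the pool or starting point could be varied to get the diversity and generalizability the slide calls for.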


Group cognitive labs idea (cont.)

Retrospective reports
Training and practice in:
Thinking aloud
Reporting similarities to and differences from the thinking of other respondents
Avoiding being unduly influenced by others’ thinking

Round-robin think-alouds:
Respondent A thinks aloud for item 1
Other respondents report similarities and differences
Etc.
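The round-robin sequence can be written out as a session schedule. The slide shows only the first turn followed by "Etc.", so rotating the think-aloud role to the next respondent for each new item is an assumption here, as are the function name and labels; respondent labels are assumed to be unique.

```python
def round_robin_schedule(respondents, items):
    """Return a list of (item, think_aloud_respondent, reporters) turns.

    The think-aloud role rotates through respondents in order (an assumed
    rotation); everyone else reports similarities and differences.
    """
    if not respondents:
        raise ValueError("need at least one respondent")
    schedule = []
    for i, item in enumerate(items):
        speaker = respondents[i % len(respondents)]
        reporters = [r for r in respondents if r != speaker]
        schedule.append((item, speaker, reporters))
    return schedule


if __name__ == "__main__":
    for item, speaker, reporters in round_robin_schedule(["A", "B", "C", "D"], [1, 2, 3]):
        print(f"Item {item}: {speaker} thinks aloud; {', '.join(reporters)} report")
```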


Group cognitive labs idea (cont.)

Possible advantages:
Cost-efficiency
Broader sampling
Possible improvement in the quantity and quality of verbal reports

Possible drawbacks:
Increase in reticence
Respondents influence each other, obscure individual processing, and pollute verbal reports


Good luck!

Steve Ferrara

CTB McGraw-Hill



References

Desimone, L. M., & Le Floch, K. C. (2004). Are we asking the right questions? Using cognitive interviews to improve surveys in education research. Educational Evaluation and Policy Analysis, 26(1), 1-22.

Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data (Rev. ed.). Cambridge, MA: The MIT Press.

Ferrara, S., Albert, F., Gilmartin, D., Knott, T., Michaels, H., Pollack, J., Schuder, T., Vaeth, R., & Wise, S. (1996, April). A qualitative study of the information examinees consider during item review on a computer-adaptive test. In L. Wolf (Moderator), Item review in computerized adaptive testing. Symposium conducted at the annual meeting of the National Council on Measurement in Education, New York.

Ferrara, S., Duncan, T. G., Freed, R., Velez-Paschke, A., McGivern, J., Mushlin, S., Mattessich, A., Rogers, A., & Westphalen, K. (2004). Examining test score validity by examining item construct validity: Preliminary analysis of evidence of the alignment of targeted and observed content, skills, and cognitive processes in a middle school science assessment. Paper presented at the annual meeting of the American Educational Research Association, San Diego.

Johnstone, C., Liu, K., Altman, J., & Thurlow, M. (2007). Student think aloud reflections on comprehensible and readable assessment items: Perspectives on what does and does not make an item readable (Technical Report 48). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Leighton, J. P. (2004). Avoiding misconception, misuse, and missed opportunities: The collection of verbal reports in educational achievement testing. Educational Measurement: Issues and Practice, 23(4), 6-15.
