bsherin lak presentation
Using computational methods to discover student science conceptions in interview data
Bruce Sherin
School of Education and Social Policy
Northwestern University
LAK 2012
Goals of this work
• Use computational analytic methods with traditional data
• Videos of interviews intended to study kids’ “prior conceptions” in science.
• Automate the traditional analysis
• The traditional analysis:
1. Identify a set of “conceptions”
2. Code the data in terms of these conceptions.
• Go as far as possible with simple analytic techniques.
Some specifics
• The data: 54 interviews with middle school students.
• The subject matter: The earth’s seasons
• The approach: Simple vector space models, clustering
There are reasons to think automating the analysis of this data should be difficult
• The amount of data is small
• Student speech is halting and ambiguous
• Gestures and diagrams are important
Prior science conceptions
• Prior science conceptions: The prior understandings that students bring to science learning
• A bibliography by Duit (2009) lists over 8,000 papers
Health and disease
Genetics
Evolution
Geologic Time
Nature of matter
Ecosystems
Water cycle
Weather
• Two theoretical poles:
• Theory-Theory: Prior science knowledge consists of relatively well-elaborated theories.
• Knowledge-in-Pieces: Prior science knowledge consists of a moderately large number of not-well-organized conceptions.
• In an interview, students may construct explanations in the moment, drawing on some of these conceptions.
The seasons corpus
• 54 interviews with middle school students
• Our interview protocol, in brief:
1. “Why is it warmer in the summer and colder in the winter?”
2. Follow up questions for clarification.
3. Asked to make a drawing.
4. Follow up questions for clarification.
5. Challenges for certain answers.
Prototypical explanations
Example Interview: Edgar
Starts with a side-based explanation, emphasizing the sun’s rays:
E: Here’s the earth slanted. Here’s the axis. Here’s the
North Pole, South Pole, and here’s our country. And
the sun’s right, and the rays hitting like directly right
here. So everything’s getting hotter over the summer
and once this thing turns, the country will be here
and the sun can't reach as much. It's not as hot as
the winter.
Shifts to a typical closer-farther explanation:
E: Actually, I don't think this moves it turns and it moves like that and it
turns and that thing like is um further away once it orbit around the s-
Earth- I mean the sun.
Example Interview: Zelda
Tilt-based explanation, with the tilt causing light to be more or less direct
Z: Because, I think because the earth is on a tilt, and then, like that side of
the Earth is tilting toward the sun, or it’s facing the sun or something so
the sun shines more directly on that area, so it’s warmer.
Example Interview: Caden
Tilt-based explanation, with the tilt causing closer-farther
I: So the first question is why is it warmer in the summer and colder in the
winter?
C: Because at certain points of the earth’s rotation, orbit around the sun,
the axis is pointing at an angle, so that sometimes, most times,
sometimes on the northern half of the hemisphere is closer to the sun
than the southern hemisphere, which, change changes the
temperatures. And then, as, as it’s pointing here, the northern
hemisphere it goes away, is further away from the sun and gets colder.
I: Okay, so how does it, sometimes the northern hemisphere is, is toward
the sun and sometimes it’s away?
C: Yes because the at—I’m sorry, the earth is tilted on its axis. And it’s
always pointed towards one position.
Analysis Procedure
1. Clean transcripts, removing everything except words
spoken by students
2. Break each transcript into 100-word segments, with a
moving window that steps forward 25 words
• Results in 794 segments
3. Map each segment to a vector
4. Deviationalize™ the vectors
5. Cluster the vectors
6. Interpret the clusters
7. Apply clusters to analyze transcripts
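Steps 1 and 2 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the author’s code: it assumes each transcript is a list of speaker turns tagged like the excerpts above (“I:” for the interviewer, a letter such as “C:” for the student), and the names `student_words` and `segments` are hypothetical. A window that would run past the end of a transcript is simply dropped, which is one of several reasonable choices.

```python
import re
from typing import List

# Hypothetical turn format: "I: ..." for the interviewer, "C: ..." etc. for the student.
SPEAKER_TAG = re.compile(r"^\s*([A-Z]+):\s*")


def student_words(turns: List[str], interviewer_tag: str = "I") -> List[str]:
    """Step 1: keep only the words spoken by the student, lowercased."""
    words: List[str] = []
    for turn in turns:
        match = SPEAKER_TAG.match(turn)
        if match and match.group(1) == interviewer_tag:
            continue  # drop interviewer turns entirely
        body = turn[match.end():] if match else turn
        words.extend(re.findall(r"[a-z']+", body.lower()))
    return words


def segments(words: List[str], size: int = 100, step: int = 25) -> List[List[str]]:
    """Step 2: overlapping windows of `size` words that advance `step` words at a time."""
    if len(words) <= size:
        # Assumption: a transcript shorter than one window becomes a single segment.
        return [words] if words else []
    return [words[i:i + size] for i in range(0, len(words) - size + 1, step)]
```

With these defaults, a 200-word transcript yields five overlapping segments, starting at words 0, 25, 50, 75, and 100.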
Mapping segments to vectors
Word      Count   Weight
sun       4       2.1
earth     2       1.7
side      0       0
away      2       1.7
tilted    1       1
closer    1       1
axis      2       1.7
day       0       0
farther   1       1
time      3       2.1
…         …       …
• Compile the vocabulary
• Stop list consisting of 782 words
• Results in a vocabulary of 647 words
• For each segment, count the number of occurrences of each of these words.
• Weight as 1 + log(count)
• Normalize
• Result: 794 vectors, each with 647 dimensions.
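The vectorization bullets can be sketched as follows. The function names `build_vocabulary` and `segment_vector` are hypothetical, and two details are assumptions because the slide does not state them: the logarithm is taken to be the natural log, and “normalize” is taken to mean scaling each vector to Euclidean length 1.

```python
import math
from collections import Counter
from typing import List, Set


def build_vocabulary(segments: List[List[str]], stop_words: Set[str]) -> List[str]:
    """Every distinct word in the corpus that is not on the stop list, in sorted order."""
    return sorted({w for seg in segments for w in seg if w not in stop_words})


def segment_vector(segment: List[str], vocabulary: List[str]) -> List[float]:
    """Map one segment to a weighted, unit-length vector over the vocabulary."""
    counts = Counter(segment)
    # Weight each word that occurs as 1 + log(count); absent words stay 0.
    # Assumption: natural logarithm.
    vec = [1.0 + math.log(counts[w]) if counts[w] > 0 else 0.0 for w in vocabulary]
    # Assumption: "normalize" means scaling to Euclidean length 1.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec
```

Applied with the numbers on the slide, a 647-word vocabulary gives each of the 794 segments a 647-element vector. A segment made up entirely of stop-list words maps to the zero vector, which this sketch leaves unnormalized.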
Deviationalize™
• Average all of the segment vectors and replace each one with its difference from this average.
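A minimal sketch of this step, assuming the vectors are equal-length lists of floats; `deviationalize` is a hypothetical name. The slide does not say whether the vectors are renormalized afterwards, so this version leaves them as plain differences.

```python
from typing import List


def deviationalize(vectors: List[List[float]]) -> List[List[float]]:
    """Replace each vector by its difference from the average of all vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
    # No renormalization afterwards (the slide does not say either way).
    return [[x - m for x, m in zip(v, mean)] for v in vectors]
```

One way to read the step: subtracting the average removes whatever all segments share, so the later clustering is driven by how segments differ from a typical segment rather than by what they have in common.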
Cluster
• Use hierarchical agglomerative clustering
Number of clusters   Cluster sizes (segments per cluster)
10                   19, 72, 9, 68, 140, 62, 44, 122, 136, 122
9                    19, 72, 68, 62, 44, 122, 136, 122, 149
8                    19, 72, 68, 44, 122, 136, 122, 211
7                    72, 68, 44, 122, 122, 211, 155
6                    68, 44, 122, 122, 211, 227
5                    68, 122, 122, 211, 271
4                    122, 122, 271, 279
3                    271, 279, 244
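The slide names hierarchical agglomerative clustering but not the linkage criterion or similarity measure. The sketch below assumes average linkage over cosine similarity; `cosine` and `agglomerate` are hypothetical names. It is a naive, roughly cubic-time version meant only to show the idea; for 794 vectors a library routine such as SciPy’s `scipy.cluster.hierarchy.linkage` followed by `fcluster` would be more practical.

```python
import math
from itertools import combinations
from typing import List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; defined as 0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def agglomerate(vectors: List[List[float]], k: int) -> List[List[int]]:
    """Start from singletons and merge the most similar pair until k clusters remain."""
    n = len(vectors)
    sim = {(i, j): cosine(vectors[i], vectors[j]) for i, j in combinations(range(n), 2)}

    def link(a: List[int], b: List[int]) -> float:
        # Average linkage (assumption): mean similarity over all cross-cluster pairs.
        total = sum(sim[(min(i, j), max(i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [[i] for i in range(n)]
    while len(clusters) > max(k, 1):
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return clusters
```

Because every merge joins exactly two existing clusters, cutting the hierarchy at different values of k gives nested partitions. That is the pattern in the table: going from 10 to 9 clusters, the clusters of sizes 9 and 140 merge into one of size 149, and every row sums to the 794 segments.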
What do the clusters mean?
Apply clusters to transcripts
For each transcript:
• Segment into 100-word chunks
• Find the vector for each segment
• For each segment, find the dot product between the
segment vector and each of the cluster centroids
• Plot the results
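The scoring step above can be sketched as follows, assuming each cluster centroid is the mean of its member vectors and that a transcript’s segment vectors come from the same pipeline as the clustered ones (same vocabulary, weighting, and, presumably, the same corpus average for deviationalizing). The names `centroid`, `dot`, and `cluster_scores` are hypothetical, and plotting is left out.

```python
from typing import List


def centroid(vectors: List[List[float]], members: List[int]) -> List[float]:
    """Mean of the vectors belonging to one cluster."""
    dim = len(vectors[0])
    return [sum(vectors[i][j] for i in members) / len(members) for j in range(dim)]


def dot(u: List[float], v: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cluster_scores(segment_vectors: List[List[float]],
                   centroids: List[List[float]]) -> List[List[float]]:
    """One row per segment (in transcript order), one dot product per cluster centroid."""
    return [[dot(v, c) for c in centroids] for v in segment_vectors]
```

Plotting each column of the result against segment index gives one curve per cluster over the course of the interview, so a shift in a student’s explanation would show up as one curve falling while another rises.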
Edgar
Starts with a side-based explanation, emphasizing the sun’s rays:
E: … and the rays hitting like directly right here. … once this thing turns, the
country will be here and the sun can't reach as much
Shifts to a typical closer-farther explanation:
E: that thing like is um further away once it orbit around the s- Earth- I
mean the sun.
Zelda
Tilt-based explanation, with the tilt causing light to be more or less direct
Z: … that side of the Earth is tilting toward the sun, or it’s facing the sun or
something so the sun shines more directly on that area, so it’s warmer.
Caden
Tilt-based explanation, with the tilt causing closer-farther
C: … the axis is pointing at an angle, so that sometimes … the northern
half of the hemisphere is closer to the sun… .
Summary
• Used traditional data set:
• Videos of interviews intended to study kids’ “prior conceptions” in science.
• Set out to produce a “knowledge-in-pieces” analysis
• Notable difficulties:
• Small amount of data
• Halting and ambiguous speech.
• Gestures and diagrams are referenced
• Keep the methods as simple as possible
• Deviationalizing is an exception
• Results are “suggestive”
• They suggest that we can capture features of student knowledge that are widely recognized to be important.
What does this buy me?
What role might these computational techniques play in the toolkit of researchers who study students’ prior science conceptions?
• Can we replace human coders?
• Actually, a human played an important role here.
• They can provide a kind of independent support for the work of human analysts!
Open issues and next steps
1. Apply to subject matter other than the seasons
2. Systematic comparison to human analysis
3. Apply to answer some new research questions
4. Systematic investigation of alternative analysis methods
• In the paper: (1) a different segment size, (2) analysis without deviationalizing
5. Why does this work?