Badges: A solution to our teacher evaluation disaster?
By Valerie Strauss
This was written by Cathy N. Davidson, a Duke University professor and author of "Now You See It: How the Brain Science of Attention Will Transform the Way We Live, Work, and Learn."
By Cathy N. Davidson
Last spring, when Google's Project Oxygen revealed the results obtained from number-crunching its entire stock of personnel records (hiring, firing, merit raises, promotions), no one was more surprised than Google to find that the famously data-driven company had actually been promoting managers for their squishy, soft, Management 101 people skills. Google prides itself on managers who have technical chops, but technical expertise didn't even make the Big Eight of esteemed management qualities. Fortunately, Google used a text-mining system flexible and open enough to see what was there, not what wasn't, and to make its own contradictions visible. The company is now re-examining its own management rules and its deepest assumptions about who and what makes a good manager.
But what if Google had asked the question to find out how well its employees fulfilled the company's stated data-driven values: How many of our managers have the technical expertise to be good managers? The outcome of its own data-crunching could have been a disaster. Google (the 2012 top company on the Fortune list, by the way) might have failed its own "empirical" and "objective" and "standardized" test. If Google had been a public school, it might have been slated for closure in 2014 because of its failure.
Of course I'm overstating the case for effect. But the point here is that all the data in the world doesn't matter if you ask the wrong question of the data, or if the method of testing isn't flexible enough to yield real, true data about success and failure.
All the data in the world doesn't matter if you are collecting one kind of information but the real problem or virtue lies elsewhere. I believe this is the conundrum we are now in with the multiple-choice, end-of-grade form of testing that the United States (and the world) now uses as its gold standard. Not only is it an outmoded form of testing; teaching aimed at ensuring that students succeed at bubble tests does not ensure real learning. It also does not ensure that students will retain what they have learned and be able to apply it to their next level of learning challenges, in the classroom or beyond.
How do we measure learning innovation?
This issue came up pointedly at last week's Harvard Innovations in Learning and Teaching (HILT) symposium, where I was delighted to be one of the plenary speakers. The symposium was designed to help us all use the best research on learning to rethink the traditional classroom.
One of the closing speakers said that they would be crunching the data to make sure the results of learning experiments were rigorous. "The best innovators are often not the best evaluators," he said.
I thought of Project Oxygen and the dismal state of No Child Left Behind school evaluation methods and responded, "True enough! But the best evaluators are often not the best innovators." We also have to make sure that our metrics are expansive enough to count values that may not be testable by current measures.
Fortunately, Henry "Roddy" Roediger was also a plenary speaker. Roediger's research shows the limitations of item-response testing that is divorced from what is being learned. His work in the Memory Lab at Washington University also shows that lecturing is the least effective learning method. If you want people to retain what they learn, and to be able to master and apply it, they have to be tested over and over as they are learning, with feedback that helps them learn better.
Roediger's testing methods include a variety of challenges: teaching others what you learn, working with someone who has a different answer than yours to explain and correct your thinking, writing up your conclusions for a public audience that will challenge you, and other interactive forms of challenge-based testing.
Harvard physicist Eric Mazur also demonstrated his interactive testing-learning methods at HILT. He posed a basic physics problem to the crowd, we clicked our answers, and then he had us try to convince someone else who had a different answer to change their mind.
In my interaction, a problem occurred. Perhaps because I was a plenary speaker, the stranger I chose as my partner, a very smart and lovely person who knew the right answer, didn't prevail on me forcefully enough to change my answer. I was wavering, convincible, but the learning transfer didn't happen in our exchange. (It did, however, when it turned out he was right and I was wrong; I will probably never forget that physics lesson, which proves Mazur's point, in long form.)
But let's back up. If my partner in this physics audience had been a Web developer, and we were doing a Web-building project together, my wrong answer might well have been the one we went with, and our common project would have failed.
Because so much code is written collaboratively, with strangers, and because outcomes matter to the success of the project, to future jobs, and to future collaborations, coders have developed a complex yet easy (and difficult to game) system of awarding one another badges for successful, innovative collaboration. They don't need a multiple-choice test to prove they are good coders. In fact, unlike doctors, accountants, beauticians, or financial advisers, programmers don't even have a formal certification or credentialing system.
Millions of Web programmers worldwide have learned to innovate at a far faster pace than most of us and to evaluate one another rigorously through peer assessment. Really. That is so counter-intuitive that I'm going to repeat it: Millions of Web programmers worldwide have
learned to innovate at a far faster pace than most of us and to evaluate one another
rigorously through peer assessment.
How is this possible? How can peers really evaluate one another? They can and do in the Web world, by awarding badges as peer-given contribution and reputation points. Badges are the visible symbol of a complex system of rigorous peer evaluation of all the complex skills (the kind Project Oxygen turned up at Google), as well as of all the innovative programming that Web coders contribute to one another.
I believe we can learn much from what they do and how they do it.
Badges, innovation, and evaluation: The example of Stack Exchange
To understand more about the world of badges, I interviewed Jeff Atwood, co-founder of Stack Exchange, a network of question-and-answer websites that includes Stack Overflow for programmers, Server Fault for system administrators, and more than 70 others ranging from photography to productivity. He also writes the popular blog Coding Horror.
Stack Overflow serves a worldwide community of 12-13 million programmers, with site traffic in the range of 16 million views per month. Atwood likes to say that one of Stack Exchange's chief contributions is building platforms that make it easy for people to contribute their knowledge to one another. Members pose questions and other members answer them, and, if an answer is good, you award points to your coding colleague.
If you are heading in a wrong direction (as I was in Eric Mazur's session at HILT), and someone is able to steer you in the right direction, you award points to that person for their teaching abilities. The points add up, and you can see the results on your own personal page, where programmers proudly display their badges. Click on a programmer's glowing gold badge and you find a detailed assessment of absolutely everything that contributed to the high scores, including the personal comments explaining why.
I'm not talking about résumé-speak. If I award points, you can read the actual details and reasons why Captain Coder over in Beijing earned points for her C++ programming chops, or why Mr. Algorithmic in Sydney was awarded top points for being precognitive: someone who follows the development of new ideas and communities during their earliest stages. Cruncher from Cambridge might earn points for being a self-learner, or a teacher, or for being tenacious, outspoken, or disciplined, all different assets in the community based on algorithms and contributions to the site. (You can see the Stack Overflow badges and points here.)
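To make the mechanics concrete, here is a minimal sketch, in Python, of how peer-awarded points might accumulate into badges with an auditable trail of comments. The names, point values, and thresholds are invented for illustration; Stack Exchange's actual rules are far richer and are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical badge thresholds; the real Stack Exchange rules are richer.
BADGE_THRESHOLDS = {"bronze": 10, "silver": 100, "gold": 500}

@dataclass
class Award:
    points: int
    reason: str        # the peer's comment explaining the award
    awarded_by: str

@dataclass
class Member:
    name: str
    awards: List[Award] = field(default_factory=list)

    def receive(self, points: int, reason: str, awarded_by: str) -> None:
        """A peer awards points, always with a human-readable reason."""
        self.awards.append(Award(points, reason, awarded_by))

    @property
    def reputation(self) -> int:
        return sum(a.points for a in self.awards)

    def badges(self) -> List[str]:
        """Badges are simply reputation crossing open, inspectable thresholds."""
        return [b for b, needed in BADGE_THRESHOLDS.items() if self.reputation >= needed]

    def audit_trail(self) -> List[str]:
        """'Clicking the badge': every award and the comment behind it."""
        return [f"{a.awarded_by}: +{a.points} ({a.reason})" for a in self.awards]

# Example: "Captain Coder" earns points for an answer and for teaching.
coder = Member("Captain Coder")
coder.receive(10, "clear, correct C++ answer", "Mr. Algorithmic")
coder.receive(5, "patiently explained why my approach was wrong", "Cruncher")
print(coder.reputation, coder.badges())    # 15 ['bronze']
print("\n".join(coder.audit_trail()))
```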
These qualities merge programming skills with teaching and learning skills, with collaborative skills, because, to deliver code on time, you need all of those (as Project Oxygen also found with its personnel-record data-mining). Atwood calls them a "reputational breadcrumb trail" on the Internet. But he's being modest.
Another part of Stack Exchange is Careers 2.0, a job-posting and matchmaking service, a kind of Match.com for jobs. Reputation based on badges and points is the currency of the realm, and it is a leading service for employers looking to hire managers, programmers, and just about anyone else in the world's mobile, distributed programmer workforce. It should come as no surprise that many of the best tech companies, including Google, use Careers 2.0 for their recruiting.
But one more word about badging. It's not just about jobs. As Atwood says, the badges on Stack Exchange don't just record participation, they incentivize it. They also allow you to match a range of qualities you value with the complex range of qualities that peers have recognized and rewarded. You do a good job, others give you credit. And if I, as an employer, want to find out why someone has earned a badge, all I have to do is click on the badge, find out the details, and read the comments, and then I can decide how much I do or don't trust the reputation. It's open, so I can see where Mr. Algorithmic is getting his points. That is the thing about non-standardized, open content: others can comment on it, emend it, challenge it. And if you want to crunch such loosey-goosey evaluation, well, we now have text-mining software that allows that, with remarkable complexity, as we saw with Project Oxygen.
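As a rough illustration of what "crunching" free-form peer evaluation might look like, a toy sketch rather than Google's or Stack Exchange's actual tooling, even a few lines of standard-library Python can surface which qualities peers mention most often in their comments (the comments below are invented):

```python
from collections import Counter
import re

# Invented free-form peer comments of the kind attached to badge awards.
comments = [
    "Patient teacher, explained the algorithm step by step",
    "Tenacious debugging and a very clear write-up",
    "Spotted the design flaw early; generous teacher",
    "Disciplined reviewer, tenacious about edge cases",
]

# Count every word, skipping very common ones, to see which qualities recur.
STOPWORDS = {"the", "a", "and", "of", "very", "about"}
word_counts = Counter(
    word
    for comment in comments
    for word in re.findall(r"[a-z]+", comment.lower())
    if word not in STOPWORDS
)
print(word_counts.most_common(5))   # e.g. [('teacher', 2), ('tenacious', 2), ...]
```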
We no longer need to use the A, B, C, D, or None of the Above multiple-choice test invented in 1914 and patterned after the state-of-the-art mass production of its time, Henry Ford's assembly line. When you think about it, it's pretty hard to believe that the state-of-the-art evaluation system the world is currently using for evaluating something as complex as learning dates back to the Model T.
Better evaluation systems exist now
We have computers now, everyone. Imagine that! But we are still using testing methods designed for the era of the Model T, a form of testing for lower-order thinking that captures only the narrow range of thinking measured by Best Available Answer testing. We know, from the best data-based research, that this form of testing is a disincentive to learning, especially for kids who don't believe they have any chance of reaching the goal of using good test scores to get into college. In other words, the tests incentivize, to use Jeff Atwood's word, only those aspiring to get to an end: college, a certificate, a credential. The tests do not incentivize contribution, participation, collaboration, and learning, which is what Stack Exchange strives for.
Think about that. We have a system of tests, designed for citizens of the Industrial Age and based on the assembly line, that is extremely costly, doesn't measure much of the content, and doesn't motivate learning. And millions of programmers have found a way that works so well they don't even need formal credentials and accreditation systems. What they do works, and it works based on peers evaluating contribution (they don't even have a system of
failing: they reward what works, what is good, setting the bar for reputation at its highest,
not at its lowest denominator).
We not only can use far more interactive, complex, humane, interesting, challenging, and innovative forms of assessment for real learning, real teaching, and real collaboration; the tech community is already doing that. Teachers, researchers, experimenters, and evaluators all need to think about these systems and learn from them. Project Oxygen revealed patterns even Google didn't suspect. Stack Exchange is doing that daily, with millions of people.
The badging systems I'm interested in exploring have to be offered by non-profit learning organizations in order to avoid further commercialization and exploitation of our educational system. They have to be less, not more, expensive to administer than the current cumbersome system of Human Resources (HR) evaluation, end-of-grade tests, teacher standards and evaluation, or merit systems. They have to include peer components. They have to include a range of skills, content, subject matter, mastery, application, theory and practice, competencies, and collaborative or character qualities. And, most important, they have to be tied to the learning process itself and incentivize and motivate, not just document, real, long-term, engaged, interactive learning.
Badges for lifelong learning
Since September, the nonprofit learning network I co-founded, HASTAC (pronounced "haystack"), has been working with the John D. and Catherine T. MacArthur Foundation and the Mozilla Foundation to run competitions on Badges for Lifelong Learning as part of our annual Digital Media and Learning Competition.
It turns out that many institutions join us in thinking our Model T form of testing is archaic and a disincentive to real learning and to real learning innovation, in schools, in informal learning settings, and in the workplace. Nearly 340 different institutions, from NASA to Intel, from small local schools to the Department of Education, have offered challenges. We've just announced the winners of the first phase of a separate Teacher Mastery Competition, too. And we're now challenging developers to apply to work with institutions to co-create badging systems that fit the values and learning goals of those institutions.
In the end, we will have a rich portfolio of active projects, all developing badging and reputation systems online, funded for a year so that they can learn, and so that we, the public, can learn from an open competition, an open year of co-developing, and an open year of evaluating, recommending, refining, improving, and creating together. That is what learning is about. We can all learn to do this together, in the way that the Open Web has developed for the 21st century but that has yet to penetrate our institutions of formal learning and many of our business institutions as well.
You can't build the next generation of the Web with an assembly line
At the HILT conference at Harvard, we talked a lot about how real metrics, real data, and real experiments can serve real learning innovation. If we don't also think about innovative metrics, data, and experimental methods, we will replicate old standards and values, with only some relatively insignificant new tweaks. If we want true innovation in learning, we must strive for true innovation in the methods we use for deciding what counts and how we count. I'm hopeful that we are at a tipping point. I believe we are on the verge of using the successful methods already being used by the developers of the Internet to find the best ways of learning for the Internet Age. I believe we will soon be finding new ways to measure contribution and to motivate learning, not for the era of the Model T but for the 21st century.
-0-
Follow The Answer Sheet every day by bookmarking http://www.washingtonpost.com/blogs/answer-sheet.