crowdsourcing pwi sept-2011
DESCRIPTION
Presentation of the crowdsourcing business model to the Professional Women International association. It describes the pros and cons, how to scale with Machine Learning, and the emergence of reputation systems.
TRANSCRIPT
![Page 1: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/1.jpg)
http://bitsofknowledge.waterloohills.com
• Introduction
• Crowd Motivation
• Client Motivations and Types of tasks
• Scale up with Machine Learning
• Quality Management
• Workflows for Complex tasks
• Reputation Systems
• Economic shift
PWI - September 29, 2011 [email protected]
![Page 2: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/2.jpg)
Crowd or Community (online audience)
Crowdsourcing
![Page 3: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/3.jpg)
Ex: “Adult Websites” Classification
• Large number of sites to label
• Get people to look at sites and classify them as:
– G (general audience)
– PG (parental guidance)
– R (restricted)
– X (porn)
[Panos Ipeirotis. WWW2011 tutorial]
![Page 4: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/4.jpg)
Ex: “Adult Websites” Classification
• Large number of hand-labeled sites
• Get people to look at sites and classify them as:
– G (general audience)
– PG (parental guidance)
– R (restricted)
– X (porn)
Cost/Speed Statistics:
• Undergrad intern: 200 websites/hr, cost: $15/hr
• MTurk: 2500 websites/hr, cost: $12/hr
[Panos Ipeirotis. WWW2011 tutorial]
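The cost/speed figures above can be turned into a cost per labeled site. The sketch below only does that arithmetic; the function and dictionary names are illustrative, and it assumes a single label per site (redundant labeling, discussed later in the deck, would multiply the MTurk cost).

```python
# Figures quoted on the slide (Ipeirotis, WWW2011 tutorial).
RATES = {
    "Undergrad intern": {"sites_per_hour": 200, "cost_per_hour": 15.0},
    "MTurk": {"sites_per_hour": 2500, "cost_per_hour": 12.0},
}


def cost_per_site(sites_per_hour, cost_per_hour):
    """Dollars spent per labeled website, assuming one label per site."""
    return cost_per_hour / sites_per_hour


def hours_needed(n_sites, sites_per_hour):
    """Hours required to label n_sites at the given throughput."""
    return n_sites / sites_per_hour


for name, rate in RATES.items():
    print(f"{name}: ${cost_per_site(**rate):.4f} per site, "
          f"{hours_needed(10_000, rate['sites_per_hour']):.1f} h for 10,000 sites")
```

With these numbers the intern costs 7.5 cents per site and MTurk about half a cent, so there is room to ask several workers per site and still come out cheaper.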
![Page 5: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/5.jpg)
Crowd Motivation
• €, $ = Money!
• Self-serving purposes (learning new skills, getting recognition, avoiding boredom, enjoyment, building a network with other professionals)
• Socializing, feeling of belonging to a community, friendship
• Altruism (public good, helping others)
![Page 6: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/6.jpg)
Examples: Altruism
![Page 7: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/7.jpg)
Crowd Demography (background defines motivation)
• The 2008 survey at iStockphoto indicates that the crowd is quite homogenous and elite.
• Amazon’s Mechanical Turk workers come mainly from 2 countries: a) USA b) India
![Page 8: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/8.jpg)
Crowd Demography
![Page 9: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/9.jpg)
Client motivation
• Need Suppliers:
– Mass work, distributed work, or just tedious work
– Creative work
– Looking for specific talent
– Testing
– Support
– Offloading peak demands
– Tackling problems that need specific communities or human variety
– Any work that can be done cheaper this way
![Page 10: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/10.jpg)
Client motivation
• Need customers!
• Need Funding
• Need to be Backed up
• Crowdsourcing is your business!
![Page 11: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/11.jpg)
Examples of Funding
![Page 12: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/12.jpg)
Client Tasks Goals
3 main goals for a task to be done:
1. Minimize Cost (cheap)
2. Minimize Completion Time (fast)
3. Maximize Quality (good)
Remember Crowd Motivation! (e.g., game-ify your task, explain the final purpose)
![Page 13: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/13.jpg)
Examples: Games
![Page 14: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/14.jpg)
[Panos Ipeirotis. WWW2011 tutorial]
![Page 15: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/15.jpg)
Pros
• Quicker: parallelism reduces time
• Cheap
• Creativity, innovation
• Quality (*depends)
• Access to scarce resources: the ‘long tail’
• Multiple feedback
• Makes it possible to build a community (followers)
• Business agility
• Scales up! (*up to a level)
![Page 16: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/16.jpg)
Cons
• Lack of professionalism: unverified quality
• Too many answers
• No standards
• Not always cheap: added costs to bring a project to conclusion
• Too few participants if the task or pay is not attractive
• If the worker is not motivated, lower quality of work
![Page 17: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/17.jpg)
Scale Up with Machine Learning
Build an ‘Adult Website’ Classifier
• Crowdsourcing is cheap but not free
– Workers cannot do more than xx hours/day
– Cannot scale to the web without help
Build automatic classification models using examples from crowdsourced data
![Page 18: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/18.jpg)
Integration with Machine Learning
• Humans label training data
• Use training data to build model
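The two steps above (humans label, then a model is built from the labels) can be sketched with a tiny text classifier. This is only an illustration: the deck does not say which model was used, so a word-count Naive Bayes with add-one smoothing is assumed here, and the four "crowd-labeled" pages are made-up toy data.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """examples: list of (page_text, label) pairs, labels supplied by crowd workers."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of training pages
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest Naive Bayes log-score."""
    word_counts, label_counts, vocab = model
    total_pages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_pages in label_counts.items():
        score = math.log(n_pages / total_pages)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data standing in for crowd-labeled pages.
crowd_labeled = [
    ("family recipes and cooking tips", "G"),
    ("children homework help and games", "G"),
    ("explicit adult video content", "X"),
    ("adult explicit photos", "X"),
]
model = train(crowd_labeled)
print(classify(model, "cooking games for children"))
print(classify(model, "explicit adult content"))
```

Once trained, the model labels new pages at machine speed, and the crowd is only needed for fresh training examples and for the cases where the model is unsure.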
![Page 19: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/19.jpg)
Quality Management
Ex: “Adult Website” Classification
• Bad news: Spammers!
• Worker ATAMRO447HWJQ labeled X (porn) sites as G (general audience)
[Panos Ipeirotis. WWW2011 tutorial]
![Page 20: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/20.jpg)
Quality Management
Majority Voting and Label Quality
• Spammers try to go undetected
• Well-meaning workers may be biased
The two are difficult to tell apart.
1. Ask multiple labelers
2. Keep the majority label as the “true” label
Use the probability of being correct as the Quality Indicator
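A minimal sketch of the majority rule and of one possible quality indicator follows. The slide does not say how the probability of being correct is computed; the second function assumes a simplified model (binary labels, an odd number of independent workers, each correct with the same probability p), which is an assumption made here for illustration.

```python
from collections import Counter
from math import comb


def majority_label(labels):
    """Return (winning label, share of the votes it received).

    On a tie, the label that appeared first among the tied ones wins.
    """
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)


def prob_majority_correct(p, n):
    """Probability that a strict majority of n workers is right.

    Assumes binary labels, n odd, and independent workers who are
    each correct with probability p (a simplifying assumption).
    """
    needed = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(needed, n + 1))


print(majority_label(["X", "X", "G"]))
for n in (1, 3, 5):
    print(n, round(prob_majority_correct(0.7, n), 3))
```

Under these assumptions, three workers who are each right 70% of the time give a correct majority 78.4% of the time, which shows why redundancy helps but also why it cannot fix systematic bias shared by the workers.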
![Page 21: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/21.jpg)
Complex tasks
Handle answers through workflow
• Q: “My task does not have discrete answers…”
• A: Break into two Human Intelligence Tasks (HITs):
– “Create” HIT
– “Vote” HIT
• Vote controls quality of the “Create” HIT
• Redundancy controls quality of the “Vote” HIT
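The Create/Vote split can be sketched as a small function. Everything named here is hypothetical: `run_create_vote`, the `creators` and `voters` lists, and the lambda "workers" are stand-ins for real people, and no crowdsourcing platform API is called.

```python
from collections import Counter


def run_create_vote(task, creators, voters):
    """Run one Create/Vote round.

    creators: callables task -> free-form candidate answer ("Create" HIT)
    voters:   callables (task, candidates) -> index of preferred candidate
              ("Vote" HIT, asked redundantly)
    Returns the winning candidate and the share of voters who chose it.
    """
    candidates = [create(task) for create in creators]
    ballots = [vote(task, candidates) for vote in voters]
    winner_index, votes = Counter(ballots).most_common(1)[0]
    return candidates[winner_index], votes / len(ballots)


# Hypothetical stand-ins for human workers.
creators = [
    lambda task: "A calculator.",
    lambda task: "A calculator, a pen and some coins on a desk.",
]
# Three simulated voters who all prefer the most detailed description.
prefer_longest = lambda task, cands: max(range(len(cands)), key=lambda i: len(cands[i]))
voters = [prefer_longest] * 3

answer, agreement = run_create_vote("Describe the photo", creators, voters)
print(answer, agreement)
```

The share of voters agreeing on the winner plays the same role here as the majority share does for discrete labels: low agreement is a signal to ask for more votes or more candidates.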
![Page 22: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/22.jpg)
Collaboration: Photo description
But the free-form answer can be more complex, not just right or wrong…
TurKit toolkit [Little et al., UIST 2010]: http://groups.csail.mit.edu/uid/turkit/
![Page 23: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/23.jpg)
Collaboration: Description Versions
1. A partial view of a pocket calculator together with some coins and a pen.
2. ...
3. A close-up photograph of the following items: A CASIO multi-function calculator. A ball point pen, uncapped. Various coins, apparently European, both copper and gold. Seems to be a theme illustration for a brochure or document cover treating finance, probably personal finance.
4. …
8. A close-up photograph of the following items: A CASIO multi-function, solar powered scientific calculator. A blue ball point pen with a blue rubber grip and the tip extended. Six British coins; two of £1 value, three of 20p value and one of 1p value. Seems to be a theme illustration for a brochure or document cover treating finance, probably personal finance.
![Page 24: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/24.jpg)
Collaboration
• Exploration / exploitation tradeoff (Independence/or not)
– Can accelerate learning, by sharing good solutions
– But can lead to premature convergence on suboptimal solution
[Mason and Watts, submitted to Science, 2011]
![Page 25: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/25.jpg)
Collaboration: Positive
• Building iteratively allows better outcomes for the image description task.
• In the FoldIt puzzles, workers built on each other’s results. They recently found in 10 days the molecular structure of a protein-cutting enzyme from an AIDS-like virus.
![Page 26: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/26.jpg)
Collaboration: Negative Group Thinking Effect
• Individual search strategies affect group success:
– Players who copy each other explore less
– Less exploration means a lower probability of finding the peak in a round
![Page 27: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/27.jpg)
Workflow Patterns
Creation
• Generate / Create
• Find
• Improve / Edit / Fix
Quality Control
• Vote for accept-reject
• Vote up, vote down, to generate rank
• Vote for best / select top-k
Flow Control
• Split task
• Aggregate
• Iterate
![Page 28: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/28.jpg)
AdSafe Crowdsourcing Experience
![Page 29: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/29.jpg)
![Page 30: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/30.jpg)
AdSafe Crowdsourcing Experience
• Detect pages that discuss swine flu
– Pharmaceutical firm had a drug “treating” (off-label) swine flu
– FDA prohibited the pharmaceutical firm from displaying the drug ad on pages about swine flu
– Two days to comply!
• Big fast-food chain does not want its ad to appear:
– On pages that discuss the brand (99% negative sentiment)
– On pages discussing obesity
![Page 31: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/31.jpg)
AdSafe Crowdsourcing Experience
Workflow to classify URLs
• Find URLs for a given topic (hate speech, gambling, alcohol abuse, guns, bombs, celebrity gossip, etc.): http://url-collector.appspot.com/allTopics.jsp
• Classify URLs into appropriate categories: http://url-annotator.appspot.com/AdminFiles/Categories.jsp
• Measure the quality of the labelers and remove spammers: http://qmturk.appspot.com/
• Get humans to “beat” the classifier by providing cases where the classifier fails: http://adsafe-beatthemachine.appspot.com/
![Page 32: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/32.jpg)
Crowdsourcing Aggregators
Act as Portals
• Create a crowd or community
• Create a site to connect a client to the crowd
• Deal with the workflow of complex tasks, such as decomposition into simpler tasks and recomposition of answers
• Work as broker, bank, and mediator
• Allow anonymity
Consumers can benefit from a crowd without the need to create it.
![Page 33: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/33.jpg)
Market Design: Crude vs Intelligent Crowdsourcing
• Intelligent Crowdsourcing uses an organized workflow to tackle the cons of crude crowdsourcing:
– The complex task is divided by experts
– Subtasks are given to relevant crowds, not to everyone
– Individual answers are recomposed by experts into a general answer
![Page 34: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/34.jpg)
Lack of Reputation and Market for Lemons
“When the quality of a sold good is uncertain and hidden before the transaction, the price goes to the value of the lowest-valued good” [Akerlof, 1970; Nobel prize winner]
• Market evolution steps:
1. Employer pays $10 to a good worker, $0.1 to a bad worker
2. 50% good workers, 50% bad; indistinguishable from each other
3. Employer offers a price in the middle: $5
4. Some good workers leave the market (pay too low)
5. Employer revises prices downwards as the % of bad workers increases
6. More good workers leave the market… death spiral
http://en.wikipedia.org/wiki/The_Market_for_Lemons
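The market evolution steps can be simulated in a few lines. The $10 and $0.10 values and the 50/50 starting mix come from the slide; the rule that 30% of the remaining good workers quit each round while the offer stays below their value is an assumption made purely for illustration, as is the function name `lemons_spiral`.

```python
def lemons_spiral(good=50.0, bad=50.0, good_value=10.0, bad_value=0.10,
                  quit_rate=0.3, rounds=5):
    """Return [(share of good workers, offered price), ...] per round.

    The employer cannot tell workers apart, so it offers the expected
    value of a random worker. quit_rate is an illustrative assumption:
    the fraction of good workers who leave whenever the offer is below
    what their work is worth.
    """
    history = []
    for _ in range(rounds):
        share_good = good / (good + bad)
        offer = share_good * good_value + (1 - share_good) * bad_value
        history.append((round(share_good, 3), round(offer, 2)))
        if offer < good_value:
            good *= 1 - quit_rate
    return history


for share, price in lemons_spiral():
    print(f"good workers: {share:.1%}  offered price: ${price:.2f}")
```

The first offer is $5.05 (the slide rounds this to $5), and every later round offers less to a pool with fewer good workers, which is the death spiral that reputation systems are meant to stop.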
![Page 35: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/35.jpg)
Reputation systems
• Challenges:
- Insufficient participation
- Overwhelmingly positive feedback
  + Hoping to get a positive ranking in return
- Negative feedback avoided for fear of retaliation
- Dishonest reports
  + «Riddle for a PENNY! No shipping-Positive Feedback»
- «Bad-mouth» reports
• Incentive mechanisms to get honest feedback
- Pay the rater if the report matches the next one
- Delay the next transaction over time
![Page 36: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/36.jpg)
Reputation systems
• “Cheap pseudonyms”: easy to disappear and re-register under a new identity at almost no cost [Friedman and Resnick 2001]
– Introduce opportunities to misbehave without paying reputational consequences
– Increase the difficulty of online identity changes
– Impose upfront costs on new entrants: allow new identities (forget the past) but make it costly
• 2-sided Reputation Mechanisms
– Crowd: to ensure worker quality
– Employer: to ensure their trustworthiness
![Page 37: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/37.jpg)
Economic Shift
• From Social Networking to Social Production through Collaborative Innovation
– Mass collaboration changes how products and services are designed, manufactured, and marketed
• Classical geopolitical and economic organisations do not correspond to the new economy
– Realignment of competitive advantages
– Move towards Collaborative Enterprises based on Open Infrastructure
![Page 38: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/38.jpg)
Societal Shift Moral values Reinforcement
• Open data access makes actions Transparent
• Transparency makes people Accountable
• Accountability forces/fosters Integrity
• Integrity breeds Community Support
Link between Ethical values and ROI
![Page 39: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/39.jpg)
References
• Wikipedia, 2011.
• Dion Hinchcliffe. Crowdsourcing: 5 Reasons Its Not Just For Start Ups Anymore, 2009.
• Tomoko A. Hosaka, MSNBC. “Facebook asks users to translate for free”, 2008.
• Daren C. Brabham. “Moving the Crowd at iStockphoto: The Composition of the Crowd and Motivations for Participation in a Crowdsourcing Application”, First Monday, 13(6), 2008.
• Karim R. Lakhani, Lars Bo Jeppesen, Peter A. Lohse & Jill A. Panetta. The value of openness in scientific problem solving (Harvard Business School Working Paper No. 07-050), 2007.
• Klaus-Peter Speidel. How to Do Intelligent Crowdsourcing, 2011.
• Panos Ipeirotis. Managing Crowdsourced Human Computation, WWW2011 tutorial, 2011.
• Omar Alonso & Matthew Lease. Crowdsourcing 101: Putting the WSDM of Crowds to Work for You, WSDM Hong Kong 2011.
• Sanjoy Dasgupta, http://videolectures.net/icml09_dasgupta_langford_actl/, 2009.
• Don Tapscott, Anthony Williams. Macrowikinomics, 2010.
![Page 40: Crowdsourcing PWI Sept-2011](https://reader034.vdocuments.site/reader034/viewer/2022051411/546ff8a5af795991308b45c3/html5/thumbnails/40.jpg)
Call For Ideas:
If you have a large set of examples, or just an idea for an application of a program that classifies or predicts, I would love to hear from you!
[email protected]
http://bitsofknowledge.waterloohills.com
Questions?