A novel approach to big data veracity using crowd-sourcing techniques
Technical paper, "A novel approach to big data veracity using crowd-sourcing techniques", presented at BMS Institute of Technology, Bangalore.
BIG DATA and VERACITY: A novel approach to data veracity using crowd-sourcing techniques
Samarth Bhargav, Bhoomika Agarwal, Abhiram Ravikumar and Vrishabh DN
April 18, 2014
Presented at BMS Institute of Technology, Bangalore
Introduction
Big Data
● What is Big Data?
● The 3 traditional V’s
  o Volume
  o Velocity
  o Variety
● Fourth V
● Crowdsourcing
[Diagram: Volume, Variety, Velocity, Veracity]
The 4 Vs of Big Data
Source: http://well-managed-business-intelligence.blogspot.in/2012/06/big-data-fourth.html
Crowdsourcing - Models in place
GOOGLE MAPS
WIKIPEDIA
DUOLINGO
RECAPTCHA
AMAZON TURK
reCAPTCHA
● Digitizing one word at a time
● Puts the roughly 10 seconds a human spends on a CAPTCHA to productive use
● Digitizing old books is a herculean task for computers
● An efficient alternative to OCR
● Workflow: entry, multiple checks, verify, upload
● 20 years of The New York Times were digitized in just a couple of months
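The "multiple checks, verify" step of the workflow above can be sketched as a simple agreement vote. This is only an illustration under stated assumptions: the function name `consensus_transcription` and the thresholds `min_votes` and `min_agreement` are hypothetical and are not taken from the actual reCAPTCHA system.

```python
from collections import Counter

def consensus_transcription(entries, min_votes=3, min_agreement=0.6):
    """Return the agreed transcription of one word, or None if unverified.

    min_votes and min_agreement are illustrative thresholds (assumptions).
    """
    # Normalise entries so case and stray spaces do not split the vote.
    cleaned = [e.strip().lower() for e in entries if e.strip()]
    if len(cleaned) < min_votes:
        return None  # not enough independent checks yet
    word, count = Counter(cleaned).most_common(1)[0]
    if count / len(cleaned) >= min_agreement:
        return word  # verified: ready for upload
    return None      # crowd disagrees: keep showing the word

votes = ["veracity", "Veracity ", "veracity", "voracity"]
result = consensus_transcription(votes)
print(result)
```

A word is accepted only when enough people have typed it and most of them agree, which is one plausible way a crowd can raise the veracity of data a machine could not read.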
Google Maps
● “Enrich Google Maps with your local knowledge”
● The Google Map Maker project
● Data used by Google Maps and Google Earth
● Projects like PhotoSphere and StreetView use huge contributions from the masses
● Workflow
  ○ add/edit places
  ○ verified by a moderator
  ○ cross-referenced and updated
WIKIPEDIA
● Termed the “mother of all encyclopedias”
● Hosts an immense pool of data, multilingual in nature and entirely community driven
● Run by donations from all over the world (crowdfunding)
● Dynamic and constantly updated, thus scores big over traditional encyclopedias
● Unbiased and high-quality information
● Data-verification and validation done instantly by both experts and general public
DUOLINGO
● Learn a language and translate the Web
● Entirely free and crowd-driven
● Luis von Ahn - ESP Game and reCAPTCHA
● Workflow
  o website to be translated is uploaded
  o broken into parts & given to students
  o students translate the doc during the learning procedure
  o translated doc returned to owner
● Win-win situation for both students and corporates
● Popular on both web and mobile platforms
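The Duolingo workflow above (upload, break into parts, translate, return) can be sketched roughly as follows. All names here (`split_into_parts`, `assign_parts`, `reassemble`) are hypothetical, and the upper-casing step is only a stand-in for real student translations; this is not Duolingo's actual pipeline.

```python
def split_into_parts(document, sentences_per_part=2):
    """Break the uploaded document into small parts of a few sentences."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return [
        ". ".join(sentences[i:i + sentences_per_part]) + "."
        for i in range(0, len(sentences), sentences_per_part)
    ]

def assign_parts(parts, students):
    """Hand parts to students in round-robin order (part index -> student)."""
    return {i: students[i % len(students)] for i in range(len(parts))}

def reassemble(translations, total_parts):
    """Join translated parts in their original order for the owner."""
    missing = [i for i in range(total_parts) if i not in translations]
    if missing:
        raise ValueError(f"untranslated parts: {missing}")
    return " ".join(translations[i] for i in range(total_parts))

doc = "Big data is large. It moves fast. It is varied. It must be trusted."
parts = split_into_parts(doc)
assignment = assign_parts(parts, ["student_a", "student_b", "student_c"])
# Stand-in for the students' work: upper-casing plays the role of translation.
translations = {i: part.upper() for i, part in enumerate(parts)}
translated_doc = reassemble(translations, len(parts))
print(translated_doc)
```

Keeping the part index alongside each translation is what lets the pieces be returned to the owner in document order, whichever student finishes first.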
Amazon Mechanical Turk
● Use of artificial intelligence to run businesses
● HITs enable machine learning concepts
● Workflow
  o Requester places task on the site or through API
  o Provider picks a suitable task
  o Payments made through Amazon gift certificates
● Advantages include
  o Quality assurance
  o Scalability options
  o Lower cost
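The requester/provider workflow above can be modelled as a small in-memory task board. This is a toy sketch, not the real Mechanical Turk API: the `Task` and `Marketplace` classes, their method names, and the skill-matching rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Task:
    task_id: int
    description: str
    reward: float
    required_skill: str
    assigned_to: Optional[str] = None
    completed: bool = False

@dataclass
class Marketplace:
    tasks: List[Task] = field(default_factory=list)
    earnings: Dict[str, float] = field(default_factory=dict)

    def place_task(self, description, reward, required_skill):
        """Requester places a task on the board."""
        task = Task(len(self.tasks) + 1, description, reward, required_skill)
        self.tasks.append(task)
        return task

    def pick_task(self, provider, skills):
        """Provider picks the first open task matching one of their skills."""
        for task in self.tasks:
            if task.assigned_to is None and task.required_skill in skills:
                task.assigned_to = provider
                return task
        return None

    def complete_task(self, task):
        """Mark the task done and credit the provider's earnings."""
        if task.assigned_to is None or task.completed:
            raise ValueError("task is not open for completion")
        task.completed = True
        previous = self.earnings.get(task.assigned_to, 0.0)
        self.earnings[task.assigned_to] = previous + task.reward

market = Marketplace()
market.place_task("Label 10 images", 0.50, "image-labelling")
market.place_task("Transcribe a receipt", 0.25, "transcription")
picked = market.pick_task("provider_1", {"transcription"})
market.complete_task(picked)
print(picked.task_id, market.earnings)
```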
Analysis
● Handling data IS important
● Google Flu Trends
● KickStarter and CosmoQuest
● A lot of scope and wide opportunities
Repercussions
● Senator Kennedy’s story
● FCRA (Fair Credit Reporting Act)
● Crowds unaware of data-acquisition
● Confidential data and security-leaks to be addressed with care
Conclusion
Crowdsourcing model   Volume      Velocity    Variety     Veracity
Google Maps           terabytes   high        low         medium
Duolingo              terabytes   medium      high        high
reCAPTCHA             petabytes   very high   very high   very high
Amazon Turk           petabytes   medium      very high   high
Wikipedia             petabytes   medium      high        very high
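The comparison in this table can be encoded so the models are ranked on any one of the Vs. The ratings below are copied from the table; the numeric scale in `LEVELS` (low = 1 up to very high = 4) and the helper `rank_by` are assumptions added for illustration.

```python
# Ordinal scale for the qualitative ratings (an assumed encoding).
LEVELS = {"low": 1, "medium": 2, "high": 3, "very high": 4}

# Ratings as given in the conclusion table.
RATINGS = {
    "Google Maps": {"volume": "terabytes", "velocity": "high",
                    "variety": "low", "veracity": "medium"},
    "Duolingo": {"volume": "terabytes", "velocity": "medium",
                 "variety": "high", "veracity": "high"},
    "reCAPTCHA": {"volume": "petabytes", "velocity": "very high",
                  "variety": "very high", "veracity": "very high"},
    "Amazon Turk": {"volume": "petabytes", "velocity": "medium",
                    "variety": "very high", "veracity": "high"},
    "Wikipedia": {"volume": "petabytes", "velocity": "medium",
                  "variety": "high", "veracity": "very high"},
}

def rank_by(dimension):
    """Order the models from highest to lowest on one rated dimension.

    Ties keep the table's row order because Python's sort is stable.
    """
    return sorted(RATINGS,
                  key=lambda model: LEVELS[RATINGS[model][dimension]],
                  reverse=True)

by_veracity = rank_by("veracity")
print(by_veracity)
```

Volume is left as a plain label here because the table gives it in storage units rather than on the low-to-very-high scale.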
Questions & Answers time! :-)
Source: http://2.bp.blogspot.com/
Thank you, UTSAHA 2k’14.