Million Monkeys User Group
Million Monkeys
Jesse Anderson | Curriculum Developer and Instructor
November 2012
About Me
• Cloudera - Educational Services Team
• Twitter - @jessetanderson
• Blog and more info: http://www.jesse-anderson.com
• Screencasts on Pragmatic Programmers: Buy It Now on http://www.jesse-anderson.com
• President – Northern Nevada Software Developers Group
About Cloudera
• Cloudera is “The commercial Hadoop company”
• Founded by leading experts on Hadoop from Facebook, Google, Oracle and Yahoo
• Provides consulting and training services for Hadoop users
• Staff includes committers to virtually all Hadoop projects
Introduction
• Infinite Monkey Theorem
• Hadoop
• Million Monkeys Algorithm
• Business Case
Infinite Monkey Theorem
“A million monkeys on a million typewriters will eventually recreate Shakespeare”
Exponential Growth (aka Big Data)
The odds of finding a given group of characters are 1 in 26 raised to the power of the number of contiguous characters:
1 in 26^n, where n is the number of contiguous characters

Contiguous Characters    Combinations
8                        208,827,064,576
9                        5,429,503,678,976
10                       141,167,095,653,376
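The figures above follow directly from raising 26 to the number of contiguous characters. A minimal Python sketch that reproduces them, assuming a 26-letter alphabet as on the slide; the name `combinations` is illustrative and not from the talk:

```python
ALPHABET_SIZE = 26  # assumption: one of 26 letters per character, as on the slide


def combinations(n: int) -> int:
    """Return how many distinct groups of n contiguous characters exist."""
    return ALPHABET_SIZE ** n


if __name__ == "__main__":
    # Print the count for each group length listed on the slide.
    for n in (8, 9, 10):
        print(f"{n:>2}  {combinations(n):,}")
```

Each extra character multiplies the search space by 26, which is why the counts grow so quickly.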
Hadoop
• Apache Project
• Reliable, Scalable, Distributed Computing
• Software Framework
• MapReduce
• Distributed File System (HDFS)
• Other projects
Map
Create or process the input data
Reduce
Process data from Map into something usable
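The two phases can be illustrated with a toy, single-process Python sketch. This is not the Hadoop API and not necessarily how the speaker's job was written. It assumes the map step creates random lowercase character groups and the reduce step keeps only the groups that occur in a target text. All function names and parameters here are hypothetical.

```python
import random
import string
from collections import defaultdict


def map_phase(seed, count, length):
    """Map: create the input data, here `count` random lowercase groups."""
    rng = random.Random(seed)  # seeded so a run is repeatable
    for _ in range(count):
        group = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        yield group, 1  # emit (key, value) pairs


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped, target_text):
    """Reduce: keep groups found in the target text, with how often each was typed."""
    return {
        key: sum(values)
        for key, values in grouped.items()
        if key in target_text
    }


if __name__ == "__main__":
    # Placeholder target text, chosen only for illustration.
    target = "to be or not to be"
    matches = reduce_phase(shuffle(map_phase(seed=1, count=10_000, length=2)), target)
    print(matches)
```

On a real cluster the map calls would run in parallel across nodes and the framework would handle the grouping. The sketch only shows how the work divides between the two phases.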
Data Flow
Million Monkeys Algorithm
Business Case
Hadoop Scalability
[Chart: Percent of Linear Scalability (y-axis, 0 to 100 percent) versus Nodes (x-axis, 1 to 20), comparing RDBMS and Hadoop. RDBMS = Relational Database]
Business Value of Scalability
Scaling does not require massive re-engineering and complete rewrites of code
Adding more computers to the cluster gets a predictable increase in computational power and storage
Save $$$
Save time
Going Viral (and taking over the world)
26,000 unique visits from 119 countries in one day
Covered internationally by the BBC, the Wall Street Journal, Wired and Slashdot
Next Steps
• Books
  • Hadoop: The Definitive Guide - Tom White
  • Hadoop Operations - Eric Sammer
• Cloudera Training
  • Developer, Admin, Hive and Pig, HBase, Essentials
• CDH
  • Cloudera's Distribution Including Apache Hadoop
  • Open Source
  • VM Image
Conclusion
• MapReduce breaks up the problem efficiently
• No code changes to scale
• Incredible scalability
• Enables previously impossible tasks