TACKLING BIG DATA WITH THE ELEPHANT IN THE ROOM
WHAT’S THE PROBLEM WITH BIG DATA?
Volume, Variety, Velocity
WHAT’S THE SOLUTION TO BIG DATA?
“In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge
a log, they didn’t try to grow a larger ox. We shouldn’t be trying for bigger
computers, but for more systems of computers.” – Grace Hopper
HADOOP’S SOLUTION
Hadoop Ecosystem: Sqoop, Pig, Hive, HBase, Mahout, Flume, Oozie, …
Hadoop Core Components: Hadoop Distributed File System, MapReduce
WHAT IS HDFS?
HOW DOES HDFS WORK?
Large Data File
– split into Block #1 and Block #2
– Block #1 stored as three copies
– Block #2 stored as three copies
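The block diagram can be sketched in a few lines of Python. This is a toy model, not HDFS code: the function names, the 128-byte block size, the node names, and the round-robin placement are all assumptions made for illustration. Only the idea of splitting a file into blocks and keeping three copies of each comes from the slide.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut file contents into fixed-size blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Give every block `replication` copies on distinct nodes.

    Assumption: a simple round-robin policy, chosen only to keep the sketch short.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + k) % len(nodes)] for k in range(replication)]
        for block_id in range(num_blocks)
    }


# Hypothetical inputs: a 250-byte "large" file, 128-byte blocks, four made-up nodes.
data = bytes(250)
nodes = ["node-a", "node-b", "node-c", "node-d"]

blocks = split_into_blocks(data, block_size=128)
placement = place_replicas(len(blocks), nodes)

for block_id, holders in placement.items():
    print(f"Block #{block_id + 1} ({len(blocks[block_id])} bytes) -> {holders}")
```

Because every block lives on three different nodes in this model, losing any single node still leaves two copies of each block it held.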
WHAT IS MAP-REDUCE?
Core Ideas
– Data Locality
– Parallelism
– Block Independence
Three Stages
1. Map
2. Swap & Sort
3. Reduce
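The three stages can be sketched as a single-process Python function. This is only an illustration of the data flow: the names `run_map_reduce`, `word_lengths`, and `total` are hypothetical, and nothing here runs in parallel or across nodes the way a real cluster would.

```python
from collections import defaultdict


def run_map_reduce(records, map_fn, reduce_fn):
    """Run Map, Swap & Sort, and Reduce in sequence on one machine."""
    # Stage 1: Map. Each record is processed independently and emits (key, value) pairs.
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))

    # Stage 2: Swap & Sort. Bring all values that share a key together.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)

    # Stage 3: Reduce. Collapse each key's values into one result, in key order.
    return {key: reduce_fn(key, groups[key]) for key in sorted(groups)}


def word_lengths(line):
    """Example mapper (assumption): emit (length of word, 1) for every word."""
    return [(len(word), 1) for word in line.split()]


def total(key, values):
    """Example reducer (assumption): add up the values for one key."""
    return sum(values)


result = run_map_reduce(["the cat sat", "on the mat"], word_lengths, total)
print(result)  # {2: 1, 3: 5}
```

Each call to `map_fn` looks at one record only, which is what lets the map stage be spread over many nodes without coordination.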
WORD COUNT MAP
Input: the cat sat on the mat The aardvark sat on the … The mahout drove the …
Node 1 (Mapper)
map(the cat sat on the mat) → the 1, cat 1, sat 1, on 1, the 1, mat 1
map(the aardvark sat on the …) → the 1, aardvark 1, sat 1, on 1, the 1
Node 2 (Mapper)
map(the mahout drove the ….) → the 1, mahout 1, drove 1, the 1
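A word-count mapper along these lines might look like the following in Python. The function name `map_line` and the `node_inputs` dictionary are hypothetical, and only the words visible on the slide are used; the text hidden behind the "…" is not reproduced.

```python
def map_line(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield (word, 1)


# Visible portion of each node's share of the text (assumed one map() call per line).
node_inputs = {
    "Node 1": ["the cat sat on the mat", "the aardvark sat on the"],
    "Node 2": ["the mahout drove the"],
}

# Each node maps only the lines it holds; no node needs to see another node's data.
mapper_output = {
    node: [pair for line in lines for pair in map_line(line)]
    for node, lines in node_inputs.items()
}

for node, pairs in mapper_output.items():
    print(node, pairs)
```

The mapper does no counting at all; it emits a 1 for every occurrence and leaves the adding to a later stage.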
WORD COUNT SWAP & SORT
Mapper output:
– the 1, cat 1, sat 1, on 1, the 1, mat 1
– the 1, mahout 1, drove 1, the 1
– the 1, aardvark 1, sat 1, on 1, the 1
Sorted and grouped per mapper:
– Node 1: aardvark 1; cat 1; mat 1; on 1,1; sat 1,1; the 1,1,1,1
– Node 2: drove 1; mahout 1; the 1,1
Swapped to Node 3, Node 4, Node 5 and merged by key:
– aardvark 1; cat 1; mat 1; mahout 1; sat 1,1; drove 1; on 1,1; the 1,1,1,1,1,1
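The swap and sort stage can be sketched in Python as two steps: group the pairs on each mapper node, then send every key to exactly one reducer. The slides do not say how a key is assigned to a reducer node, so the checksum-based `reducer_for` below is an assumption, as are all the function names.

```python
import zlib
from collections import defaultdict


def group_locally(pairs):
    """Sort one mapper's pairs by key and collect the values for each key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reducer_for(key, num_reducers):
    """Pick a reducer for a key (assumption: checksum of the key, modulo reducer count)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def swap(local_groups, num_reducers):
    """Move every key's values to its reducer, merging values from all mappers."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for groups in local_groups:
        for key, values in groups.items():
            reducers[reducer_for(key, num_reducers)][key].extend(values)
    return [dict(sorted(r.items())) for r in reducers]


def ones(text):
    """Helper standing in for the map stage: one (word, 1) pair per word."""
    return [(word, 1) for word in text.split()]


node1 = group_locally(ones("the cat sat on the mat the aardvark sat on the"))
node2 = group_locally(ones("the mahout drove the"))
reducer_inputs = swap([node1, node2], num_reducers=3)

for index, groups in enumerate(reducer_inputs):
    print(f"Reducer {index}: {groups}")
```

Whatever assignment rule is used, the property that matters is that all values for one key end up on the same reducer, so "the" from both mapper nodes is merged into a single list of six ones.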
WORD COUNT REDUCER
Input on Node 3, Node 4, Node 5: aardvark 1; cat 1; mat 1; mahout 1; sat 1,1; drove 1; on 1,1; the 1,1,1,1,1,1
Reducers: Reducer 0, Reducer 1, Reducer 2
Output: aardvark 1; cat 1; mat 1; mahout 1; sat 2; drove 1; on 2; the 6
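A word-count reducer only has to add up the ones it receives for each word. The sketch below uses a hypothetical `reduce_counts` function and a hand-written `grouped` dictionary built from the mapper output shown earlier, where "sat" and "on" are each emitted twice and "the" six times.

```python
def reduce_counts(word, ones):
    """Collapse the list of 1s collected for a word into a single total."""
    return word, sum(ones)


# Grouped values as they would arrive at the reducers (all reducers shown together).
grouped = {
    "aardvark": [1],
    "cat": [1],
    "mat": [1],
    "mahout": [1],
    "sat": [1, 1],
    "drove": [1],
    "on": [1, 1],
    "the": [1, 1, 1, 1, 1, 1],
}

counts = dict(reduce_counts(word, ones) for word, ones in grouped.items())

for word, count in counts.items():
    print(word, count)
```

Since each word is handled by exactly one reducer, the per-reducer outputs can simply be concatenated to form the final result.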
TAKE-AWAYS
Hadoop Ecosystem: Sqoop, Pig, Hive, HBase, Mahout, Flume, Oozie, …
Hadoop Core Components: Hadoop Distributed File System, MapReduce
QUESTIONS?