An Introduction to Apache Hadoop MapReduce
Apache Hadoop MapReduce
What is it ?
Why use it ?
How does it work ?
Some examples
Big users
MapReduce What is it ?
Processing engine of Hadoop
Developers create Map and Reduce jobs
Used for big data batch processing
Parallel processing of huge data volumes
Fault tolerant
Scalable
MapReduce Why use it ?
Your data is in the terabyte / petabyte range
You have huge I/O
Hadoop framework takes care of
Job and task management
Failures
Storage
Replication
You just write Map and Reduce jobs
MapReduce How does it work ?
Take word counting as an example, something that Google does all of the time.
MapReduce How does it work ?
Input data split into shards
Split data mapped to key,value pairs e.g. Bear,1
Mapped data shuffled/sorted by key e.g. Bear
Sorted data reduced e.g. Bear,2
Final data stored on HDFS
There might be an extra map layer before the shuffle
JobTracker controls all tasks in job
TaskTracker controls map and reduce
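The steps above can be sketched in a few lines of plain Python. This is only a single-process illustration of the split, map, shuffle and reduce cycle, not Hadoop code: the function names (split_input, map_words, shuffle, reduce_counts, word_count) and the sample text are assumptions made for this example, and there is no HDFS, JobTracker or TaskTracker involved.

```python
from collections import defaultdict


def split_input(text, num_splits):
    """Split: divide the input lines into shards (round-robin by line)."""
    lines = text.strip().splitlines()
    return [lines[i::num_splits] for i in range(num_splits)]


def map_words(shard):
    """Map: emit a (word, 1) pair for every word in the shard."""
    return [(word, 1) for line in shard for word in line.split()]


def shuffle(mapped_shards):
    """Shuffle/sort: group all values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for pairs in mapped_shards:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_counts(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(text, num_splits=3):
    """Run the whole cycle in one process (hypothetical driver function)."""
    shards = split_input(text, num_splits)
    mapped = [map_words(shard) for shard in shards]
    return reduce_counts(shuffle(mapped))


# Hypothetical sample input, chosen to match the Bear example above.
sample = """Deer Bear River
Car Car River
Deer Car Bear"""

result = word_count(sample)
print(result)  # {'Bear': 2, 'Car': 3, 'Deer': 2, 'River': 2}
```

In a real Hadoop job each shard would be mapped on a different node and the shuffle would move data across the network. Here everything runs in one process so that the data flow is easy to follow.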
MapReduce - Some examples
A visual example with colours to show you the cycle
Split -> Map -> Shuffle -> Reduce
MapReduce - Some examples
A visual example of MapReduce with job and task trackers added to individual map and reduce jobs.
Hadoop MapReduce Big users
Users
Yahoo
Amazon
eBay
Providers
Amazon
Cloudera
Hortonworks
MapR
Contact Us
Feel free to contact us at
www.semtech-solutions.co.nz
We offer IT project consultancy
We are happy to hear about your problems
You pay only for the hours you need to solve your problems