Distributed Computing in Hazelcast - GeekOut 2014 Edition

Posted on 10-May-2015

Category: Engineering

DESCRIPTION

The amount of collected data is growing nearly exponentially. More than 75% of all data has been collected in the past 5 years. To store this data and process it in a reasonable time, you need to partition the data and parallelize the processing of reports and analytics. This talk demonstrates how to parallelize data processing using Hazelcast and its underlying distributed data structures. With a quick introduction to the different terms and some short live-coding examples, we will journey into distributed computing. Source code of the demonstrations is available here:
1. https://github.com/noctarius/hazelcast-mapreduce-presentation
2. https://github.com/noctarius/hazelcast-distributed-computing

TRANSCRIPT

DISTRIBUTED COMPUTING IN HAZELCAST

Source: http://www.newscientist.com/gallery/dn17805-computer-museums-of-the-world/11

www.hazelcast.com

THAT'S ME
Christoph Engelbert
Twitter: @noctarius2k
8+ years of Java Weirdoness
Performance, GC, traffic topics
Apache Committer
Gaming, Travel Management, ...

OUR SPACE TRIP ROADMAP
Hazelcast
Distributed Computing
Distributed ExecutorService
EntryProcessor
Map & Reduce
Questions

HAZELCAST
PICKIN' DIAMONDS

WHAT IS HAZELCAST?
In-Memory Data-Grid
Data Partitioning (Sharding)
Java Collections Implementation
Distributed Computing Platform

WHY HAZELCAST?
Automatic Partitioning
Fault Tolerance
Sync / Async Backups
Fully Distributed
In-Memory for Highest Speed
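
To make "automatic partitioning" and "backups" a bit more concrete, here is a minimal plain-Java sketch of the idea: a key is hashed to a partition, and each partition has an owner and a backup on a different member. The class and method names are hypothetical, `hashCode()` and the round-robin assignment are simplifying assumptions, and Hazelcast's actual key hashing and partition assignment differ. Only the default partition count of 271 is taken from Hazelcast itself.

```java
// Illustrative only: this is NOT Hazelcast code, just the partitioning idea.
class PartitionSketch {
    // Hazelcast's default partition count; everything else here is simplified.
    static final int PARTITION_COUNT = 271;

    // Partition id for a key. Assumption: plain hashCode() stands in for the real hash.
    static int partitionId(Object key) {
        return Math.floorMod(key.hashCode(), PARTITION_COUNT);
    }

    // Member index that owns the partition (hypothetical round-robin assignment).
    static int owner(int partitionId, int memberCount) {
        return partitionId % memberCount;
    }

    // The backup copy lives on the next member, so losing one node loses no data.
    static int backup(int partitionId, int memberCount) {
        return (owner(partitionId, memberCount) + 1) % memberCount;
    }

    public static void main(String[] args) {
        int members = 3;
        for (String key : new String[] {"alice", "bob", "carol"}) {
            int p = partitionId(key);
            System.out.println(key + " -> partition " + p
                    + ", owner " + owner(p, members)
                    + ", backup " + backup(p, members));
        }
    }
}
```

With at least two members, owner and backup always land on different members in this sketch, which is the property that makes a single node failure survivable.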

WHY IN-MEMORY COMPUTING?

TREND OF PRICES

Data Source: http://www.jcmit.com/memoryprice.htm

SPEED DIFFERENCE

Data Source: http://i.imgur.com/ykOjTVw.png

DISTRIBUTED COMPUTING

OR

MULTICORE CPU ON STEROIDS

THE IDEA OF DISTRIBUTED COMPUTING

Source: https://www.flickr.com/photos/stefan_ledwina/1853508040

THE BEGINNING

Source: http://en.wikipedia.org/wiki/File:KL_Advanced_Micro_Devices_AM9080.jpg

MULTICORE IS NOT NEW

Source: http://en.wikipedia.org/wiki/File:80386with387.JPG

CLUSTER IT

Source: http://rarecpus.com/images2/cpu_cluster.jpg

SUPER COMPUTER

Source: http://www.dkrz.de/about/aufgaben/dkrz-geschichte/rechnerhistorie-1

CLOUD COMPUTING

Source: https://farm6.staticflickr.com/5523/11407118963_e0e0870846_b_d.jpg

DISTRIBUTED EXECUTORSERVICE

THE WHOLE CLUSTER IN YOUR HANDS

WHY A DISTRIBUTED EXECUTORSERVICE?
j.l.Runnable / j.u.c.Callable
Only needs to be serializable
Same task on all / multiple nodes
Should not work on data

Print node name on all nodes

QUICK EXAMPLE

Runnable runnable = () -> println("Running on Node: " + member.node);
IExecutorService executorService = hazelcastInstance.getExecutorService("default");
executorService.executeOnAllMembers(runnable);
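
The slide code above is deliberately simplified. The "only needs to be serializable" requirement can be sketched in plain Java without a cluster: the task is turned into bytes, "shipped", restored and executed on the receiving side. The class `SerializableTaskSketch`, the task `NodeNameTask` and the helper `runOn` are hypothetical names for illustration, and Java's built-in serialization stands in for whatever the cluster uses on the wire. Note that in plain Java a lambda is only serializable when its target type is, which is why a named task class is used here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.concurrent.Callable;

// Illustrative only: simulates shipping one task to several "nodes" in a single JVM.
class SerializableTaskSketch {
    // The task is a Callable and Serializable, so it can travel as bytes.
    static class NodeNameTask implements Callable<String>, Serializable {
        private static final long serialVersionUID = 1L;
        private final String message;

        NodeNameTask(String message) {
            this.message = message;
        }

        @Override
        public String call() {
            return message;
        }
    }

    // Sender side: turn the task into bytes.
    static byte[] serialize(Serializable task) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(task);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Receiver side (hypothetical node): restore the task and execute it locally.
    @SuppressWarnings("unchecked")
    static String runOn(String nodeName, byte[] taskBytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(taskBytes))) {
            Callable<String> task = (Callable<String>) in.readObject();
            return nodeName + ": " + task.call();
        } catch (Exception e) {
            throw new IllegalStateException("Task could not be restored or executed", e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = serialize(new NodeNameTask("hello"));
        // The same bytes are executed on every (simulated) node.
        for (String node : new String[] {"node-1", "node-2", "node-3"}) {
            System.out.println(runOn(node, wire));
        }
    }
}
```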

DEMONSTRATION

ENTRYPROCESSOR
LOCKFREE DATA OPERATIONS

WHY ENTRYPROCESSOR?
Prevents external Locking
Guarantees Atomicity
Kinda "Cluster-wide Thread-Safe"

Incrementing a counter atomically

QUICK EXAMPLE

private int increment(Map.Entry entry) {
    val newValue = entry.getValue() + 1;
    entry.setValue(newValue);
    return newValue;
}

IMap map = hazelcastInstance.getMap("default");
int newId = map.executeOnKey("idgen", this::increment);
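
As a single-JVM analogy for the counter example above, the same "send the update function to the entry instead of locking around it" idea can be sketched with `ConcurrentHashMap.merge`, which applies the function atomically for one key. This is only an analogy under that assumption: it is not the Hazelcast API and it says nothing about how the cluster routes the call to the member owning the key. The class and method names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative only: atomic per-key updates without an external lock, in one JVM.
class AtomicIncrementSketch {
    private final ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();

    // The update runs atomically for the key, so callers need no lock of their own.
    int increment(String key) {
        return map.merge(key, 1, Integer::sum);
    }

    // Hammer the same key from several threads and return the final counter value.
    static int runConcurrently(int threads, int incrementsPerThread) {
        AtomicIncrementSketch sketch = new AtomicIncrementSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    sketch.increment("idgen");
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sketch.map.get("idgen");
    }

    public static void main(String[] args) {
        // No update is lost: 4 threads x 1000 increments.
        System.out.println(runConcurrently(4, 1000)); // prints 4000
    }
}
```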

DEMONSTRATION

MAP & REDUCE
THE BLACK MAGIC FROM PLANET GOOGLE

USE CASES
Log Analysis
Data Querying
Aggregation and summing
Distributed Sort
ETL (Extract Transform Load)
and more...

SIMPLE STEPS
Read
Map / Transform
Reduce

FULL STEPS
Read
Map / Transform
Combining
Grouping / Shuffling
Reduce
Collating

MAPREDUCE WORKFLOW

Data are mapped / transformed into a set of key-value pairs

SOME PSEUDO CODE (1/3)

MAPPING

map( key:String, document:String ):Void ->
    for each w:Word in document:
        emit( w, 1 )

Multiple values are combined into an intermediate result to reduce network traffic

SOME PSEUDO CODE (2/3)

COMBINING

combine( word:Word, counts:List[Int] ):Void ->
    emit( word, sum( counts ) )

Values are reduced / aggregated to the requested result

SOME PSEUDO CODE (3/3)

REDUCING

reduce( word:String, counts:List[Int] ):Int ->
    return sum( counts )
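
The three pseudo code steps above (mapping, combining, reducing) can be run end to end as a small word count in plain Java. This is a sketch under assumptions: every input string plays the role of the data held by one node, the class `WordCountSketch` and its method names are hypothetical, and nothing here uses the Hazelcast MapReduce API.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: word count with explicit map, combine and reduce phases.
class WordCountSketch {
    // Mapping: emit (word, 1) for every word in a document.
    static List<Map.Entry<String, Integer>> map(String document) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String w : document.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) {
                emitted.add(new AbstractMap.SimpleEntry<>(w, 1));
            }
        }
        return emitted;
    }

    // Combining: pre-sum the pairs emitted on one node so fewer pairs are sent on.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> emitted) {
        Map<String, Integer> partial = new HashMap<>();
        for (Map.Entry<String, Integer> e : emitted) {
            partial.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return partial;
    }

    // Grouping and reducing: collect the partial counts per word, then sum them.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : grouped.entrySet()) {
            int sum = 0;
            for (int count : g.getValue()) {
                sum += count;
            }
            result.put(g.getKey(), sum);
        }
        return result;
    }

    // Each element of documentsPerNode stands for the data of one (simulated) node.
    static Map<String, Integer> wordCount(List<String> documentsPerNode) {
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (String document : documentsPerNode) {
            partials.add(combine(map(document)));
        }
        return reduce(partials);
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("to be or not to be", "to see or not to see")));
        // prints {be=2, not=2, or=2, see=2, to=4}
    }
}
```

In this run the first node emits six pairs, but after combining only four partial counts leave it, which is the traffic saving the combining step is meant to give.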

FOR MATHEMATICIANS

Process: (K x V)* → (L x W)* ⇒ [(l1, w1), …, (lm, wm)]

Mapping: (K x V) → (L x W)* ⇒ (k, v) → [(l1, w1), …, (ln, wn)]

Reducing: L x W* → X* ⇒ (l, [w1, …, wn]) → [x1, …, xn]
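
One way to read the signatures above is as Java generics. The sketch below is an interpretation, not the Hazelcast API: the interfaces `Mapper` and `Reducer`, the `process` function, and the choice to return the reduced values keyed by `l` are all assumptions made for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: the mapping and reducing signatures expressed as generic types.
class MapReduceTypesSketch {
    // Mapping: (K x V) -> (L x W)*
    interface Mapper<K, V, L, W> {
        List<Map.Entry<L, W>> map(K key, V value);
    }

    // Reducing: L x W* -> X*
    interface Reducer<L, W, X> {
        List<X> reduce(L key, List<W> values);
    }

    // Process: map every (k, v), group the emitted w by l, then reduce each group.
    static <K, V, L, W, X> Map<L, List<X>> process(
            Map<K, V> input, Mapper<K, V, L, W> mapper, Reducer<L, W, X> reducer) {
        Map<L, List<W>> grouped = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : input.entrySet()) {
            for (Map.Entry<L, W> out : mapper.map(e.getKey(), e.getValue())) {
                grouped.computeIfAbsent(out.getKey(), k -> new ArrayList<>()).add(out.getValue());
            }
        }
        Map<L, List<X>> result = new LinkedHashMap<>();
        for (Map.Entry<L, List<W>> g : grouped.entrySet()) {
            result.put(g.getKey(), reducer.reduce(g.getKey(), g.getValue()));
        }
        return result;
    }

    // Hypothetical example: group numbers by parity and sum each group.
    static Map<String, List<Integer>> demo() {
        Map<String, Integer> input = new LinkedHashMap<>();
        input.put("x", 1);
        input.put("y", 2);
        input.put("z", 3);
        Mapper<String, Integer, String, Integer> byParity = (k, v) -> {
            List<Map.Entry<String, Integer>> out = new ArrayList<>();
            out.add(new AbstractMap.SimpleEntry<>(v % 2 == 0 ? "even" : "odd", v));
            return out;
        };
        Reducer<String, Integer, Integer> sum = (l, ws) -> {
            int total = 0;
            for (int w : ws) {
                total += w;
            }
            return Collections.singletonList(total);
        };
        return process(input, byParity, sum);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {odd=[4], even=[2]}
    }
}
```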

MAPREDUCE PROGRAMS IN GOOGLE SOURCE TREE

Source: http://research.google.com/archive/mapreduce-osdi04-slides/index-auto-0005.html

DEMONSTRATION

@noctarius2k
@hazelcast

http://www.sourceprojects.com
http://github.com/noctarius

THANK YOU!
ANY QUESTIONS?

Images: All images are licensed under Creative Commons
