Apache Spark talk @ the Amsterdam Applied Machine Learning meetup group

GoDataDriven, proudly part of the Xebia Group

@fzk frisovanvollenhoven@godatadriven.com

Apache Spark for applied machine learning

Friso van Vollenhoven

This talk is about tools.

Resilient Distributed Dataset

• Immutable set of records (e.g. tuples)

• Distributed across a cluster of workers

• Stored in RAM or on disk (partially)

• Built through transformations

• Automatically rebuilt on failure

• Possibly replicated
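The properties above can be sketched with a toy model in plain Python. This is not Spark's API: `ToyRDD`, `from_records`, `lose_partition` and the round-robin partitioning are all hypothetical names and choices made for illustration. The sketch only shows the idea that a dataset built through transformations remembers its lineage, so a lost partition can be recomputed from its parent instead of being restored from a copy.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's API):
    immutable partitions plus the lineage needed to rebuild them."""

    def __init__(self, source=None, parent=None, fn=None):
        self._source = source   # root data; stands in for durable storage
        self._parent = parent   # lineage: the dataset this one was built from
        self._fn = fn           # lineage: the function applied to each record
        self._cache = {}        # partition index -> materialised tuple of records

    @classmethod
    def from_records(cls, records, num_partitions):
        # Spread the records round-robin over immutable partitions.
        parts = tuple(tuple(records[i::num_partitions])
                      for i in range(num_partitions))
        return cls(source=parts)

    @property
    def num_partitions(self):
        if self._parent is None:
            return len(self._source)
        return self._parent.num_partitions

    def map(self, fn):
        # A transformation never modifies this dataset; it returns a new one.
        return ToyRDD(parent=self, fn=fn)

    def partition(self, i):
        # Materialise partition i, rebuilding it from lineage if it is missing.
        if i not in self._cache:
            if self._parent is None:
                self._cache[i] = self._source[i]
            else:
                self._cache[i] = tuple(self._fn(r)
                                       for r in self._parent.partition(i))
        return self._cache[i]

    def lose_partition(self, i):
        # Simulate a worker failure that drops one materialised partition.
        self._cache.pop(i, None)

    def collect(self):
        return [r for i in range(self.num_partitions)
                for r in self.partition(i)]


base = ToyRDD.from_records([1, 2, 3, 4, 5, 6], num_partitions=2)
squares = base.map(lambda x: x * x)

before = squares.collect()
squares.lose_partition(1)   # "failure": partition 1 is gone
after = squares.collect()   # partition 1 is recomputed from its parent
```

Here `before` and `after` are equal, because the dropped partition is rebuilt by re-applying the recorded function to the parent's partition. A real cluster does this across machines; the toy keeps everything in one process.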

Operations

• Operate on RDDs

• Create a new RDD

• Or materialise an RDD and return data

• Transformations: map, filter, groupBy, etc.

• Actions: count, collect, reduce, save, etc.
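The split between transformations and actions can be illustrated by analogy in plain Python, where the built-in `map` and `filter` are lazy and only run when something consumes them. This is an analogy under stated assumptions, not Spark code: `double`, `records` and the `calls` log are hypothetical names introduced here to make the moment of evaluation visible.

```python
from functools import reduce

calls = []  # records every time the mapped function really runs


def double(x):
    calls.append(x)
    return 2 * x


records = [1, 2, 3, 4, 5]

# "Transformations": map and filter only describe a pipeline here;
# no record has been processed yet.
doubled = map(double, records)
large = filter(lambda x: x > 4, doubled)
calls_before_action = len(calls)

# "Action": reducing forces the whole pipeline to run and
# returns plain data instead of another lazy object.
total = reduce(lambda a, b: a + b, large)
calls_after_action = len(calls)
```

Nothing is computed until `reduce` pulls records through the pipeline, which mirrors how a transformation describes a new RDD and an action materialises it. One difference worth keeping in mind: a Python iterator is exhausted after one pass, whereas an RDD can be used by several actions.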

The good parts

• Language bindings for Java, Scala and Python

• Works interactively from a shell:

  • Scala + IPython (notebook)

• Plays nicely with Hadoop:

  • Deploy on top of the YARN cluster manager

  • Read data from HDFS

• Hadoop-like fault tolerance

The better part?

https://github.com/Bridgewater/scala-notebook

https://github.com/Sotera/spark-distributed-louvain-modularity

We’re hiring / Questions? / Thank you!

@fzk frisovanvollenhoven@godatadriven.com

Friso van Vollenhoven
