A Recommendation System Illustrated with Spark Lauri Niskanen [email protected]

Post on 22-May-2020



Agenda
- Scala - some nice parts
- Scala - syntax illustrated with a few examples
- Akka framework - short introduction
- Spark framework, concepts and main features
- Recommendation system principles
- A simple recommendation system demo using Spark
- Summary

Scala, some nice parts
- Functional compiled programming with a strong type system
- Currying
- Immutability embraced
- Pattern matching constructs
- Anonymous functions and closures
- Scala collections


Scala syntax - some selected parts

Scala - Must have Hello World example

val str:String="Hello Scala World"

str split(" ") foreach({l => println(l)})

Hello
Scala
World

Notes

- vals are immutable
- everything is typed, even though the type can sometimes be omitted and is then inferred by the compiler
- no operators => every operator is a method of some object
- dots are not needed in chaining methods => DSLs arising…
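The notes on operators and dot-free chaining can be illustrated with a small sketch. The object name `OperatorNotes` and its fields are hypothetical names chosen for this illustration only.

```scala
object OperatorNotes {
  // "+" is an ordinary method on Int, so these two forms are equivalent.
  val infixSum: Int = 1 + 2
  val methodSum: Int = (1).+(2)

  // Dot-free (infix) call of a one-argument method and its dotted equivalent.
  val wordsInfix: Array[String] = "Hello Scala World" split " "
  val wordsDotted: Array[String] = "Hello Scala World".split(" ")

  // The type annotation may be omitted; the compiler infers String here.
  val greeting = "Hello Scala World"
}
```

Both spellings compile to the same method calls; the infix form is what makes DSL-like code possible.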

Scala - Spice up with currying
- Functions can have multiple parameter lists; the remaining lists can be supplied at a later time
- Handy for "initializing" a function with data first and passing in some other function later
- Example: calculate different distance values on the same data

import scala.math.{sqrt,abs,pow}
import scala.util.Random

case class Pair(x:Double,y:Double)

val euclDist=(a:Pair,b:Pair)=>sqrt(pow(a.x-b.x,2)+pow(a.y-b.y,2))
val manhDist=(a:Pair,b:Pair)=>abs(a.x-b.x)+abs(a.y-b.y)

val R=new Random()
val scoresA=(for (i <- Range(1,10)) yield Pair(R.nextDouble*5.0+0.5,R.nextDouble*5.0+0.5)).toList
val scoresB=(for (i <- Range(1,10)) yield Pair(R.nextDouble*5.0+0.5,R.nextDouble*5.0+0.5)).toList

def setData(x:List[Pair],y:List[Pair])(func:(Pair,Pair)=>Double)={
  x.zip(y).map({e=> func(e._1,e._2)}).sum
}

val distance=setData(scoresA,scoresB)(_)

println("Eucl.Distance: "+distance(euclDist))
println("Manh.Distance: "+distance(manhDist))

Eucl.Distance: 21.093889034545143
Manh.Distance: 27.09414385739492

Scala - The RegExp matching art
Nice and simple extractor

import scala.util.matching.Regex

val testStr="5,Father of the Bride Part II (1995),Comedy"

val Movie= """(\d+),([^,]*)(.*)""".r

val movieTitle= testStr match { case Movie(id,title,rest)=> title case _ => ""}

println(movieTitle)

Father of the Bride Part II (1995)
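The same extractor idea can be extended to titles that contain commas, which the MovieLens movies.csv file wraps in double quotes. The following is a minimal sketch under that assumption; `MovieLineSketch`, `Quoted`, `Plain` and `title` are hypothetical names and not part of the demo project.

```scala
object MovieLineSketch {
  // Quoted title that may contain commas, e.g.
  // 11,"American President, The (1995)",Comedy|Drama|Romance
  private val Quoted = """(\d+),"(.*)",(.*)""".r
  // Unquoted title without commas, as in the example on this slide
  private val Plain = """(\d+),([^,]*),(.*)""".r

  // Returns the title, or None for the header line and malformed lines.
  def title(line: String): Option[String] = line match {
    case Quoted(_, t, _) => Some(t)
    case Plain(_, t, _)  => Some(t)
    case _               => None
  }
}
```

A regex used as a pattern only matches when the whole line matches, so the header line `movieId,title,genres` falls through to `None`.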


Scala - Close up with functional magic

case class Sales(amount:Int,price:Double,product:String)

val data:List[Sales]=List(Sales(10,20.5,"Apple"),Sales(21,39.0,"Apple"),Sales(10,18.0,"Orange"),Sales(30,27.0,"Orange"))

case class Summary(amount:Int,price:Double)

val averagePricesPerFruit=data.groupBy({e=>e.product}).mapValues({v=>v.foldLeft(Summary(0,0.0))({(acc,a)=>Summary(acc.amount+a.amount,acc.price+a.price)})}).mapValues({s=>s.price/s.amount})

println(averagePricesPerFruit)

Map(Orange -> 1.125, Apple -> 1.9193548387096775)

Is this readable?…well not really. Let's see the same in step wise motion.


Same in step wise motion

case class Sales(amount:Int,price:Double,product:String)

val data:List[Sales]=List(Sales(10,20.5,"Apple"),Sales(21,39.0,"Apple"),Sales(10,18.0,"Orange"),Sales(30,27.0,"Orange"))

case class Summary(amount:Int,price:Double)

//GroupBy fruit
println(data.groupBy({e=>e.product}))

//Sum the amounts and prices for each fruit
println(data.groupBy({e=>e.product}).mapValues({v=>v.foldLeft(Summary(0,0.0))({(acc,a)=>Summary(acc.amount+a.amount,acc.price+a.price)})}))

//And in last stage count the average
println(data.groupBy({e=>e.product}).mapValues({v=>v.foldLeft(Summary(0,0.0))({(acc,a)=>Summary(acc.amount+a.amount,acc.price+a.price)})}).mapValues({s=>s.price/s.amount}))

Map(Orange -> List(Sales(10,18.0,Orange), Sales(30,27.0,Orange)), Apple -> List(Sales(10,20.5,Apple), Sales(21,39.0,Apple)))
Map(Orange -> Summary(40,45.0), Apple -> Summary(31,59.5))
Map(Orange -> 1.125, Apple -> 1.9193548387096775)
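One possible way to make the same calculation easier to read is to name the intermediate values inside a single `map`. This is only a sketch of an alternative style; `SalesSketch` and `averagePricePerFruit` are hypothetical names, and the data is the same as on the slide.

```scala
object SalesSketch {
  case class Sales(amount: Int, price: Double, product: String)

  val data: List[Sales] = List(
    Sales(10, 20.5, "Apple"), Sales(21, 39.0, "Apple"),
    Sales(10, 18.0, "Orange"), Sales(30, 27.0, "Orange"))

  // Group by fruit, then divide the summed price by the summed amount.
  val averagePricePerFruit: Map[String, Double] =
    data.groupBy(_.product).map { case (product, sales) =>
      val totalAmount = sales.map(_.amount).sum
      val totalPrice  = sales.map(_.price).sum
      product -> totalPrice / totalAmount
    }
}
```

The result holds the same two values as the slide output (1.125 for Orange, about 1.919 for Apple); only the shape of the code differs.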

Scala - Why functionality and immutability matter
- You cannot bring all big data to your host environment for calculation
- You may need to compute results on partitions (segments) of data on remote machines before getting the results
- Immutable data structures and closures (essentially functions with data) are passed on to remote partitions to do work for you
- Strong, statically typed systems are nice for production because runtime errors are limited: input and output types are secured at compile time with strongly typed signatures
- The above-mentioned things enable good chaining of functions in parallel operations
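The idea of computing partial results per partition and combining them afterwards can be imitated with plain Scala collections. This is an analogy on one machine, not Spark code; `PartitionSketch` and all names in it are hypothetical.

```scala
object PartitionSketch {
  // Immutable input data.
  val data: Vector[Int] = (1 to 100).toVector

  // Pretend each group of 25 elements lives on a different machine.
  val partitions: List[Vector[Int]] = data.grouped(25).toList

  // A closure: the function carries the immutable value `threshold` with it.
  val threshold = 50
  val countAbove: Vector[Int] => Int = part => part.count(_ > threshold)

  // Each "partition" produces a small partial result ...
  val partial: List[Int] = partitions.map(countAbove)
  // ... and only the partial results are combined on the "driver".
  val total: Int = partial.sum
}
```

Because neither the data nor the captured value can change, the same function can safely be applied to every partition in any order.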

Akka - in a few slides
Akka is an asynchronous actor model system (actors encapsulate behavior and state) with message-based communication and a supervision structure inspired by frameworks in the Erlang programming language. Its attractiveness comes from:

1. Lightweightness (no dedicated thread per actor)
2. Isolation (of actors)
3. Transparent restart of an actor upon a failure
4. Messaging across devices or processes

Picture by Ryan Knight.

Used under the hood in Spark, among others.


Akka - ping pong, re-starting example

Akka - ping pong ActorSystem and supervisor

import akka.actor.{Actor,ActorRef,ActorSystem,PoisonPill,ExtensionKey,Extension}
import akka.actor.{Props,ExtendedActorSystem}
import com.typesafe.config.{ConfigFactory,ConfigValueFactory}

object Wizard {
  def run()={
    val conf = ConfigFactory.load()
    val system = ActorSystem("ActorSystem")
    val supervisor = system.actorOf(Props(classOf[Supervisor]),name="Controller")
  }
}

class Supervisor extends Actor {
  //Just to retrieve long address form
  val remoteAddr = RemoteAddressExtension(context.system).address
  val thisPath = self.path.toStringWithAddress(remoteAddr)
  println("\nConstructor: " + thisPath)
  val ponger=context.actorOf(Props(classOf[Ponger]),name="ponger")
  val pinger=context.actorOf(Props(classOf[Pinger],ponger),name="pinger")

  // Actor's message receive handling
  def receive = {
    case "STOP-THE-SYSTEM" => println("STOP-THE-SYSTEM received from "+sender);context.system.shutdown()
    case msg =>
  }

  override def preStart() = { println("preRestart called for " + thisPath) }
  override def postRestart(reason: Throwable) = { println("postRestart called for " + thisPath) }
  override def postStop() = { println("postStop called for " + thisPath) }
}

Akka - ping pong cont

class Pinger(peer:ActorRef) extends Actor {
  val remoteAddr = RemoteAddressExtension(context.system).address
  val thisPath = self.path.toStringWithAddress(remoteAddr)
  println("Constructor: " + thisPath)

  peer ! "TEST"

  def receive = {
    case "I-WANT-OUT" => sender ! PoisonPill; context.parent ! "STOP-THE-SYSTEM"
    case someMsg => println(someMsg + " received from "+sender); sender ! "SHOW-ME-CRASH"
  }

  override def preStart() = { println("preRestart called for " + thisPath) }
  override def postRestart(reason: Throwable) = { println("postRestart called for " + thisPath) }
  override def postStop() = { println("postStop called for " + thisPath) }

}

Akka - ping pong cont2

class Ponger extends Actor {
  val remoteAddr = RemoteAddressExtension(context.system).address
  val thisPath = self.path.toStringWithAddress(remoteAddr)
  println("Constructor: " + thisPath)

  def receive = {
    case "TEST" => println("TEST received from "+ sender); sender ! "ROGER"
    case "ONE-MORE" => println("ONE-MORE received from "+sender); sender ! "I-WANT-OUT"
    case "SHOW-ME-CRASH" => println("SHOW-ME-CRASH received from"+sender); 1/0
  }

  override def preStart() = { println("preRestart called for " + thisPath) }
  override def postRestart(reason: Throwable) = {
    println("postRestart called with reason " + reason)
    context.actorSelection("akka.tcp://[email protected]:2552/user/Controller/pinger") ! "I-WANT-OUT"
  }
  override def postStop() = { println("postStop called for " + thisPath) }
}

//This is just means to show the complete path in the system
class RemoteAddressExtensionImpl(system: ExtendedActorSystem) extends Extension {
  def address = system.provider.getDefaultAddress
}

object RemoteAddressExtension extends ExtensionKey[RemoteAddressExtensionImpl]

Akka - config used

akka {
  actor {
    provider = "akka.remote.RemoteActorRefProvider"
  }
  remote {
    enabled-transports = ["akka.remote.netty.tcp"]
    netty.tcp {
      hostname = "127.0.0.1"
      port = 2552
    }
  }
}

Akka - example output

scala> Wizard.run
[INFO] [11/09/2015 19:13:45.200] [run-main-4] [Remoting] Starting remoting
[INFO] [11/09/2015 19:13:45.377] [run-main-4] [Remoting] Remoting started; listening on addresses :[akka.tcp://[email protected]:2552]
[INFO] [11/09/2015 19:13:45.381] [run-main-4] [Remoting] Remoting now listens on addresses: [akka.tcp://[email protected]:2552]
Constructor: akka.tcp://[email protected]:2552/user/Controller
Constructor: akka.tcp://[email protected]:2552/user/Controller/ponger
preRestart called for akka.tcp://[email protected]:2552/user/Controller/ponger
Constructor: akka.tcp://[email protected]:2552/user/Controller/pinger
preRestart called for akka.tcp://[email protected]:2552/user/Controller
TEST received from Actor[akka://ActorSystem/user/Controller/pinger#-454979336]
preRestart called for akka.tcp://[email protected]:2552/user/Controller/pinger
ROGER received from Actor[akka://ActorSystem/user/Controller/ponger#-1639700423]
SHOW-ME-CRASH received fromActor[akka://ActorSystem/user/Controller/pinger#-454979336]
postStop called for akka.tcp://[email protected]:2552/user/Controller/ponger
[ERROR] [11/09/2015 19:13:45.405] [ActorSystem-akka.actor.default-dispatcher-2] [akka://ActorSystem/user/Controller/ponger] / by zero
java.lang.ArithmeticException: / by zero
 at Ponger$$anonfun$receive$3.applyOrElse(some.scala:76)
 at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
 at Ponger.aroundReceive(some.scala:68)
Constructor: akka.tcp://[email protected]:2552/user/Controller/ponger
postRestart called with reason java.lang.ArithmeticException: / by zero
STOP-THE-SYSTEM received from Actor[akka://ActorSystem/user/Controller/pinger#-454979336]
postStop called for akka.tcp://[email protected]:2552/user/Controller/ponger
postStop called for akka.tcp://[email protected]:2552/user/Controller/pinger
postStop called for akka.tcp://[email protected]:2552/user/Controller
[INFO] [11/09/2015 19:13:45.427] [ActorSystem-akka.remote.default-remote-dispatcher-6] [akka.tcp://[email protected]:2552/system/remoting-terminator] Shutting down remote daemon.
[INFO] [11/09/2015 19:13:45.429] [ActorSystem-akka.remote.default-remote-dispatcher-6] [akka.tcp://[email protected]:2552/system/remoting-terminator] Remote daemon shut down; proceeding with flushing remote transports.
[INFO] [11/09/2015 19:13:45.459] [ActorSystem-akka.actor.default-dispatcher-4] [Remoting] Remoting shut down
[INFO] [11/09/2015 19:13:45.459] [ActorSystem-akka.remote.default-remote-dispatcher-6] [akka.tcp://[email protected]:2552/system/remoting-terminator] Remoting shut down.

Akka - example summary
- Asynchronous actors are lightweight
- Communication is trivially message-based
- Comms work nicely across hosts
- Crash recovery is fast (let it crash)

Akka - Supervisor and actor creation

import akka.actor.{Actor,ActorRef,ActorSystem,PoisonPill,ExtensionKey,Extension}
import akka.actor.{Props,ExtendedActorSystem}
import com.typesafe.config.{ConfigFactory,ConfigValueFactory}

//Object to run actor system
object Wizard {
  def run()={
    val conf = ConfigFactory.load()
    val system = ActorSystem("PingPongActorSystem")
    val supervisor = system.actorOf(Props(classOf[Supervisor]),name="Controller")
  }
}

// create supervising actor called Supervisor (here)
class Supervisor extends Actor {
  //Just to retrieve long address form
  val remoteAddr = RemoteAddressExtension(context.system).address
  val thisPath = self.path.toStringWithAddress(remoteAddr)

  //Create supervised actor
  val ponger=context.actorOf(Props(classOf[Ponger]),name="ponger")

  //Create another supervised actor and
  //pass the reference of the ponger in the constructor
  val pinger=context.actorOf(Props(classOf[Pinger],ponger),name="pinger")

Akka - Message handler and supervision callbacks

  // Actor's message receive handling
  def receive = {
    case "STOP-THE-SYSTEM" => println("STOP-THE-SYSTEM received from "+sender);context.system.shutdown()
    case someMsg => println("message "+someMsg + " received from " + sender)
  }

  //Right after starting the actor, its preStart method is invoked.
  override def preStart() = { println("preStart called for " + thisPath) }

  //The old actor is informed by calling preRestart with the exception which
  //caused the restart and the message which triggered that exception.
  override def preRestart(reason: Throwable, message: Option[Any]) = { println("preRestart called for " + thisPath) }

  //The new actor's postRestart method is invoked with the exception which caused the restart.
  override def postRestart(reason: Throwable) = { println("postRestart called for " + thisPath) }

  //After stopping an actor, its postStop hook is called.
  override def postStop() = { println("postStop called for " + thisPath) }
}

So what is Spark
- General engine for large-scale data processing
- Parallel and in-memory (or file-based, or a combination) data management system
- Libraries supporting streaming, data frames, SQL, graph analysis and machine learning
- APIs for Scala, Java, Python and R
- Written in Scala, utilizes the previously mentioned Akka framework underneath
- Data source support for HDFS, Cassandra, HBase …and good old plain text files

Spark history in 30 seconds
- Started as a research project in the UC Berkeley RAD Lab in 2009
- At introduction time it was already 10-20x faster than Hadoop
- Hadoop MapReduce was not good enough for iterative and interactive development
- Open sourced in 2010
- Became part of the Apache Software Foundation in 2013


Spark is active

Spark concepts

- The driver connects via a context to a cluster of Spark nodes coordinated by a cluster manager (transparent)
- An application is essentially your code: the driver plus your app's executors
- RDD (Resilient Distributed Dataset) represents immutable data on partitions
- Transformations are done via RDD functions divided into smaller independent tasks
- Data (copies, mapped dirs or via a cluster filesystem) and code (right versions) must exist on all cluster nodes
- A job is split into stages, and each stage into parallel tasks

Spark Operations - Good to understand

Transformations
- operate on your RDDs on the cluster and return new RDDs
- not cached (cleared after usage) unless specifically cached (for reuse in the next operation)

Actions
- will actually execute transformations
- RETURN DATA …make sure you know the expected return size of the result

Lazy evaluation
- Transformations are lazy, i.e. only executed when an action needs their result, not at the definition point
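Lazy evaluation can be imitated without a cluster by using a Scala collection view, which also postpones work until a result is requested. This is an analogy only, not the Spark API; `LazySketch` and its field names are hypothetical.

```scala
object LazySketch {
  // Counts how many times the mapping function has actually run.
  var evaluated = 0

  // "Transformation": the view only records what should be done.
  val doubled = (1 to 5).view.map { x => evaluated += 1; x * 2 }
  val before: Int = evaluated

  // "Action": forcing the view runs the function and returns data.
  val result: List[Int] = doubled.toList
  val after: Int = evaluated
}
```

Here `before` is still 0 because nothing has run at the definition point, while `after` is 5 once `toList` has forced the computation.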

Spark Major Data Structures

RDD (Resilient Distributed Dataset), with operations such as these transformations and actions:
- map - Applies a function to each element in the RDD and returns a new RDD
- flatMap - Returns a new RDD by first applying a function to all elements of this RDD, and then flattening the results
- intersection - Returns an RDD with the common elements found in both RDDs
- collect - Returns all elements from the RDD
- reduce - Combines the elements of the RDD together in parallel
- take - Returns the given number of elements from the RDD

pairRDD is the key-value version of RDD with additional functions:
- mapValues - Applies a function to each value without changing the key
- keys - A new RDD of the keys in the given RDD
- join - A new RDD of the inner join of two RDDs

Data frames are equivalent to a relational table in Spark SQL:
- SQLish access style supported, not covered in this presentation

For distributed environments there are Accumulators and Broadcast variables as well.
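Most of the listed operations have close relatives in the ordinary Scala collections, so their behavior can be tried out locally. The sketch below uses plain `List` and `Map` as stand-ins for RDDs and pair RDDs; it is not Spark code, and `CollectionAnalogy` with all its fields is a hypothetical name.

```scala
object CollectionAnalogy {
  val a = List(1, 2, 3, 4)
  val b = List(3, 4, 5)

  val mapped   = a.map(_ * 10)                              // one output per input
  val flat     = List("a b", "c").flatMap(s => s.split(" ")) // map, then flatten
  val common   = a.intersect(b)                             // elements in both
  val total    = a.reduce(_ + _)                            // combine pairwise
  val firstTwo = a.take(2)                                  // first n elements

  // Key-value data: movieId -> title and movieId -> rating.
  val titles  = Map(1 -> "Toy Story (1995)", 2 -> "Jumanji (1995)")
  val ratings = Map(1 -> 4.0, 3 -> 3.5)

  val ids = titles.keys.toList.sorted
  // Change values only, keep the keys.
  val upper = titles.map { case (id, t) => id -> t.toUpperCase }
  // Inner join: keep only keys present on both sides.
  val joined = for {
    (id, title) <- titles.toList
    rating      <- ratings.get(id)
  } yield (id, (title, rating))
}
```

The difference in Spark is where the work happens: the same kind of function is shipped to the partitions instead of running in the local JVM.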


Spark job execution

Picture by Ashwini Kuntamukkala,Software Architect, SciSpike

Configuring your Spark

/home/lniskanen/Spark/spark
├── bin
│   ├── spark-class
│   ├── spark-class2.cmd
│   ├── spark-class.cmd
│   ├── sparkR
│   ├── sparkR2.cmd
│   ├── sparkR.cmd
│   ├── spark-shell
│   ├── spark-shell2.cmd
│   ├── spark-shell.cmd
│   ├── spark-sql
│   ├── spark-submit
│   ├── spark-submit2.cmd
│   └── spark-submit.cmd
├── conf
│   ├── slaves
│   ├── slaves.template
│   ├── spark-defaults.conf.template
│   ├── spark-env.sh
│   └── spark-env.sh.template
└── sbin
    ├── slaves.sh
    ├── spark-config.sh
    ├── spark-daemon.sh
    ├── spark-daemons.sh
    ├── start-all.sh
    ├── start-history-server.sh
    ├── start-master.sh
    ├── start-mesos-dispatcher.sh
    ├── start-mesos-shuffle-service.sh
    ├── start-shuffle-service.sh
    ├── start-slave.sh
    ├── start-slaves.sh
    └── start-thriftserver.sh

3 directories, 31 files


Fire up spark cluster and Spark UI

$SPARK_HOME/sbin/start-all.sh

To view cluster and jobs status on your browser open SparkUI on your local host port 8080

http://127.0.1.1:8080/


Starting spark-shell (REPL)

lniskanen@Machine:~/Spark/spark$ ./bin/spark-shell --master spark://Machine:7077 --jars /home/lniskanen/ScalaApps/RecommendationSystem/recommender/target/scala-2.11/recommender_2.11-0.0.1.jar,/home/lniskanen/ScalaApps/RecommendationSystem/common/target/scala-2.11/common_2.11-0.0.1.jar
Spark context available as sc.
SQL context available as sqlContext.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.1
      /_/

Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sc.
accumulable             clearCallSite                  getLocalProperty           master              setLocalProperty
accumulableCollection   clearFiles                     getPersistentRDDs          metricsSystem       setLogLevel
accumulator             clearJars                      getPoolForName             newAPIHadoopFile    sparkUser
addFile                 clearJobGroup                  getRDDStorageInfo          newAPIHadoopRDD     startTime
addJar                  defaultMinPartitions           getSchedulingMode          objectFile          statusTracker
addSparkListener        defaultMinSplits               hadoopConfiguration        parallelize         stop
appName                 defaultParallelism             hadoopFile                 range               submitJob
applicationAttemptId    emptyRDD                       hadoopRDD                  requestExecutors    tachyonFolderName
applicationId           externalBlockStoreFolderName   initLocalProperties        runApproximateJob   textFile
asInstanceOf            files                          isInstanceOf               runJob              toString
binaryFiles             getAllPools                    isLocal                    sequenceFile        union
binaryRecords           getCheckpointDir               jars                       setCallSite         version
broadcast               getConf                        killExecutor               setCheckpointDir    wholeTextFiles
cancelAllJobs           getExecutorMemoryStatus        killExecutors              setJobDescription
cancelJobGroup          getExecutorStorageStatus       makeRDD                    setJobGroup

scala> sc. |


SparkUi - example

Recommendation systems very briefly
According to Wikipedia:

Recommender systems or recommendation systems (sometimes replacing "system" with a synonym such as platform or engine) are a subclass of information filtering system that seek to predict the 'rating' or 'preference' that a user would give to an item.

For separation use some metric

Distance

$$\mathrm{Dist}_{Eucl} = \sqrt{\sum_{i}^{n} (x_i - y_i)^2}$$

Similarities

$$\mathrm{Similarity}_{Eucl} = \frac{1}{1 + \sqrt{\sum_{i}^{n} (x_i - y_i)^2}}$$

$$\mathrm{Similarity}_{Pearson} = \frac{\sum_{i}^{n} X_i Y_i - \frac{\sum_{i}^{n} X_i \sum_{i}^{n} Y_i}{N}}{\sqrt{\left(\sum_{i}^{n} X_i^2 - \frac{\left(\sum_{i}^{n} X_i\right)^2}{N}\right)\left(\sum_{i}^{n} Y_i^2 - \frac{\left(\sum_{i}^{n} Y_i\right)^2}{N}\right)}}$$
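The metrics on this slide can be sketched in plain Scala over two equally long rating vectors. This is a minimal sketch, not the demo project's code: `SimilaritySketch` and its function names are hypothetical, N is taken to be the vector length, and returning 0.0 for a zero denominator in the Pearson case is an assumption made here for illustration.

```scala
import scala.math.{pow, sqrt}

object SimilaritySketch {
  // Euclidean distance: square root of the summed squared differences.
  def euclDist(x: Seq[Double], y: Seq[Double]): Double =
    sqrt(x.zip(y).map { case (a, b) => pow(a - b, 2) }.sum)

  // Euclidean similarity: 1 for identical vectors, approaching 0 as distance grows.
  def euclSim(x: Seq[Double], y: Seq[Double]): Double =
    1.0 / (1.0 + euclDist(x, y))

  // Pearson similarity in the sum-based form; assumes x and y have equal length.
  def pearsonSim(x: Seq[Double], y: Seq[Double]): Double = {
    val n     = x.size.toDouble
    val sumX  = x.sum
    val sumY  = y.sum
    val sumXY = x.zip(y).map { case (a, b) => a * b }.sum
    val sumX2 = x.map(a => a * a).sum
    val sumY2 = y.map(b => b * b).sum
    val denom = sqrt((sumX2 - sumX * sumX / n) * (sumY2 - sumY * sumY / n))
    // Assumption: treat constant vectors (zero variance) as "no correlation".
    if (denom == 0.0) 0.0 else (sumXY - sumX * sumY / n) / denom
  }
}
```

For example, `pearsonSim(Seq(1.0, 2.0, 3.0), Seq(2.0, 4.0, 6.0))` is 1 because the second rater scores exactly twice as high, while the Euclidean similarity of the same pair is below 1 since the absolute ratings differ.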

Movie dataset from GroupLens
- Movie dataset kindly provided by the University of Minnesota
- GroupLens is a research lab in the Department of Computer Science and Engineering at the University of Minnesota
- It contains 20000263 ratings (only the first 100k rows are used here) and 465564 tag applications across 27278 movies
- All selected users had rated at least 20 movies

http://grouplens.org/datasets/
http://files.grouplens.org/datasets/movielens/ml-20m-README.html

Data set structure

Using only:

movies.csv
- title of the movie
- genre

ratings.csv
- one rating of one movie by one user, at least 20 ratings per user
- ratings (0.5 stars - 5.0 stars)


Movies data

import scala.io.Source

val movies="/home/lniskanen/Ammatti/DataScienceMeetup/2015_December/ml-20m/movies.csv"

val movieIter = io.Source.fromFile(movies).getLines()

movieIter.take(10).foreach(println)

movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
4,Waiting to Exhale (1995),Comedy|Drama|Romance
5,Father of the Bride Part II (1995),Comedy
6,Heat (1995),Action|Crime|Thriller
7,Sabrina (1995),Comedy|Romance
8,Tom and Huck (1995),Adventure|Children
9,Sudden Death (1995),Action

Ratings data

val ratings="/home/lniskanen/Ammatti/DataScienceMeetup/2015_December/ml-20m/ratings.csv"
val ratingsIter = io.Source.fromFile(ratings).getLines()
ratingsIter.take(10).foreach(println)

userId,movieId,rating,timestamp
1,2,3.5,1112486027
1,29,3.5,1112484676
1,32,3.5,1112484819
1,47,3.5,1112484727
1,50,3.5,1112484580
1,112,3.5,1094785740
1,151,4.0,1094785734
1,223,4.0,1112485573
1,253,4.0,1112484940

Project structure for the demo

/home/lniskanen/ScalaApps/RecommendationSystem
├── build.sbt
├── common
│   ├── src
│   │   └── main
│   │       └── scala
│   └── target
│       └── scala-2.11
│           ├── common_2.11-0.0.1.jar
│           └── common-assembly-0.0.1.jar
├── project
│   ├── build.properties
│   ├── Build.scala
│   └── target
├── recommender
│   ├── src
│   │   └── main
│   │       ├── resources
│   │       └── scala
│   └── target
│       └── scala-2.11
│           ├── recommender_2.11-0.0.1.jar
│           └── recommender-assembly-0.0.1.jar
├── src
│   └── main
│       ├── resources
│       └── scala
│           └── recommBootstrapper.scala
└── target
    └── scala-2.11
        ├── rmsbootstrapper_2.11-0.0.1.jar
        └── rmsBootstrapper-uber.jar

21 directories, 10 files


Project component relations


Data types used here

abstract sealed trait MovieType

case class Movie(movieId:Int,movie:String,genres:String) extends MovieType

case class Rating(userId:Int,movieId:Int,rating:Double,timeStamp:Long) extends MovieType

A case class is a construct in Scala for which the Scala compiler:

- prefixes parameters with val (immutable)
- generates equals and hashCode for object comparison
- also generates copy and a companion object with apply and unapply methods

Using sealed in Scala means that all direct subtypes must be defined in this same file.
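The generated members can be seen in action with a short sketch that reuses the two data types of this slide. The wrapper object `CaseClassSketch`, the sample values and the `describe` function are hypothetical additions for illustration.

```scala
object CaseClassSketch {
  sealed trait MovieType
  case class Movie(movieId: Int, movie: String, genres: String) extends MovieType
  case class Rating(userId: Int, movieId: Int, rating: Double, timeStamp: Long) extends MovieType

  // Companion apply: no `new` needed.
  val heat = Movie(6, "Heat (1995)", "Action|Crime|Thriller")

  // Generated equals compares field values, not object identity.
  val sameFields: Boolean = heat == Movie(6, "Heat (1995)", "Action|Crime|Thriller")

  // copy creates a modified instance and leaves the original untouched.
  val crimeOnly: Movie = heat.copy(genres = "Crime")

  // unapply enables pattern matching; because the trait is sealed,
  // the compiler can warn when a case is missing.
  def describe(x: MovieType): String = x match {
    case Movie(id, title, _)    => s"movie $id: $title"
    case Rating(uid, mid, r, _) => s"user $uid rated movie $mid with $r"
  }
}
```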


Parsing a CSV file

package recommender

import java.io.{File}
import scala.io.{Source,BufferedSource}

// The constructor is initialized with a parser function that
// accepts a string and returns some abstract type Option[T]
class CsvIterator[T](file:File,parse:(String)=>Option[T]) extends Iterator[Option[T]] {

  val bufSource:BufferedSource=Source.fromFile(file.getAbsoluteFile())
  val linesIter=bufSource.getLines()

  override def hasNext:Boolean={ linesIter.hasNext }

  override def next():Option[T]={ parse(linesIter.next()) }

}

Ready-made CSV readers are also available, for example OpenCSV from the Maven repository.
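A possible way to use such an iterator is sketched below with a small temporary file. The class is repeated so the block is self-contained; the `close` method, the parser `firstColumnAsInt`, the helpers `idsIn` and `demo`, and the wrapper object are all hypothetical additions for illustration and are not part of the demo project.

```scala
import java.io.{File, PrintWriter}
import scala.io.{BufferedSource, Source}
import scala.util.Try

object CsvIteratorSketch {
  class CsvIterator[T](file: File, parse: String => Option[T]) extends Iterator[Option[T]] {
    private val bufSource: BufferedSource = Source.fromFile(file.getAbsoluteFile)
    private val linesIter = bufSource.getLines()
    override def hasNext: Boolean = linesIter.hasNext
    override def next(): Option[T] = parse(linesIter.next())
    // Added here so the file handle can be released after reading.
    def close(): Unit = bufSource.close()
  }

  // Hypothetical parser: first column as Int, None for the header or bad lines.
  def firstColumnAsInt(line: String): Option[Int] =
    line.split(",").headOption.flatMap(s => Try(s.trim.toInt).toOption)

  // Keep only the successfully parsed values.
  def idsIn(file: File): List[Int] = {
    val it = new CsvIterator[Int](file, firstColumnAsInt)
    try it.collect { case Some(id) => id }.toList
    finally it.close()
  }

  // Writes a three-line sample file and reads the ids back.
  def demo(): List[Int] = {
    val tmp = File.createTempFile("movies-sketch", ".csv")
    tmp.deleteOnExit()
    val out = new PrintWriter(tmp)
    try out.write("movieId,title,genres\n1,Toy Story (1995),Adventure\n2,Jumanji (1995),Adventure\n")
    finally out.close()
    idsIn(tmp)
  }
}
```

The header line fails to parse as an integer and becomes `None`, so it is dropped by the `collect` step.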

Loading the movies data into Spark cluster

def loadMovies(filePath:String):Unit={
  // just replace in linux prefix ~ (if any) with actual home directory path
  val path = filePath.replaceFirst("^~",System.getProperty("user.home"))

  //Loading data as RDD type
  //(assumes a field `var movies:Option[RDD[Movie]]` and a SparkContext `sc` defined elsewhere)
  movies=Some(sc.makeRDD(new CsvIterator[Movie](new File(path),str2Movies).filter({e=> e != None}).map({c=>c.get}).toSeq).cache())

  println("movies loading finished...")
}

// Csv parsing function
def str2Movies(str:String):Option[Movie]={
  val MovieReg="""(\d+),(.+),(.+)""".r
  str match {
    case MovieReg(id,m,g) => Some(Movie(id.toInt,m,g))
    case _ => None
  }
}

In Scala, Option[T] is a container for an optional value of type T. If the value of type T is present, the Option[T] is an instance of Some[T], containing the present value of type T. If the value is absent, the Option[T] is the object None.
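A few common ways of working with Option values are sketched below. `OptionSketch` and `parseRating` are hypothetical names used only for this illustration.

```scala
import scala.util.Try

object OptionSketch {
  // Some(value) when the text is a number, None otherwise.
  def parseRating(s: String): Option[Double] = Try(s.toDouble).toOption

  val good: Option[Double] = parseRating("3.5")
  val bad: Option[Double]  = parseRating("n/a")

  // map transforms the value only if it is present.
  val doubled: Option[Double] = good.map(_ * 2)
  // getOrElse supplies a default for the absent case.
  val fallback: Double = bad.getOrElse(0.0)
  // flatMap over a list drops the None results and unwraps the Some results.
  val onlyValid: List[Double] = List("3.5", "n/a", "4.0").flatMap(s => parseRating(s))
}
```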


Example snippets - similarity

/**
 * Calculate euclidean "similarity" for ratings of two users
 *
 * @param uid1 Rating.userId
 * @param uid2 Rating.userId
 * @return Similarity
 */
def eucl_sim(uid1: Int, uid2: Int): Double = {
  // search records for uid1 and uid2, groupBy movieId field
  val uid1Ratings = ratings.get.filter({ i => i.userId == uid1 }).groupBy({ r => r.movieId })
  val uid2Ratings = ratings.get.filter({ i => i.userId == uid2 }).groupBy({ r => r.movieId })
  // join ratings by movieId
  val commonRatings = uid1Ratings.join(uid2Ratings)

  val N = commonRatings.keys.count

  val dist = N match {
    case 0 => 10000 // no ratings in common, some arbitrary high distance
    case _ => commonRatings.mapValues({ v => pow(v._1.head.rating - v._2.head.rating, 2) }).values.sum
  }

  // Return similarity
  1.0 / (1.0 + dist)
}
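To see the numbers this similarity produces, the same calculation can be sketched on plain Scala maps instead of RDDs. This is an illustration only: `SimilaritySketch`, `euclSim` and the representation of a user's ratings as a `Map` from movieId to rating are assumptions, not part of the demo code. As in the snippet above, the distance is the sum of squared rating differences over the movies both users rated, with 10000 as an arbitrary large distance when there is no overlap.

```scala
object SimilaritySketch {
  // Arbitrary large distance used when the two users share no rated movies
  val NoOverlapDistance = 10000.0

  // a and b map movieId -> rating for one user each (hypothetical representation)
  def euclSim(a: Map[Int, Double], b: Map[Int, Double]): Double = {
    val common = a.keySet.intersect(b.keySet)
    val dist =
      if (common.isEmpty) NoOverlapDistance
      // toSeq first: mapping over a Set would merge equal squared differences
      else common.toSeq.map(id => math.pow(a(id) - b(id), 2)).sum
    1.0 / (1.0 + dist)
  }
}
```

For example, ratings `Map(1 -> 4.0, 2 -> 3.0)` and `Map(1 -> 2.0, 2 -> 3.0, 3 -> 5.0)` share movies 1 and 2, giving a distance of 4.0 and a similarity of 0.2; identical ratings give a similarity of 1.0.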


Example snippets - get common ratings

/**
 * Count the movies rated by only one of the two users and by both of them.
 *
 * @return count of items only in uid1 and only in uid2, and common in both
 */
def disjointCommon(uid1: Int, uid2: Int): (Long, Long, Long) = {
  // search records for uid1 and uid2, groupBy movieId field
  val uid1Ratings = ratings.get.filter({ i => i.userId == uid1 }).groupBy({ r => r.movieId }).cache()
  val uid2Ratings = ratings.get.filter({ i => i.userId == uid2 }).groupBy({ r => r.movieId }).cache()

  val notInUid1 = uid2Ratings.subtractByKey(uid1Ratings).keys.count

  val notInUid2 = uid1Ratings.subtractByKey(uid2Ratings).keys.count

  // count on the cluster instead of collecting the keys to the driver
  val common = uid1Ratings.join(uid2Ratings).count

  (notInUid2, notInUid1, common)
}
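The three counts correspond to ordinary set operations on the movieIds each user has rated. A minimal sketch with plain Scala sets, assuming each user's rated movies are available as a `Set[Int]` (the `OverlapSketch` object is hypothetical and not part of the demo):

```scala
object OverlapSketch {
  // Returns (only in a, only in b, in both), in the same order as disjointCommon
  def disjointCommon(a: Set[Int], b: Set[Int]): (Int, Int, Int) =
    (a.diff(b).size, b.diff(a).size, a.intersect(b).size)
}
```

Here `diff` plays the role of `subtractByKey` and `intersect` the role of `join` on the keys.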


Example snippets - show closest

/**
 * Get number of common ratings with uid1 and uid2
 *
 * @param uid1 Rating.userId
 * @param uid2 Rating.userId
 */
def commonRatings(uid1: Int, uid2: Int): Long = {
  lazy val uid1Ratings = ratings.get.filter({ i => i.userId == uid1 }).groupBy({ r => r.movieId })
  lazy val uid2Ratings = ratings.get.filter({ i => i.userId == uid2 }).groupBy({ r => r.movieId })
  uid1Ratings.join(uid2Ratings).count
}

/**
 * Display the movie rater most similar to uid, based on euclidean similarity,
 * among raters who have at least 8 movies in common with the uid rater.
 *
 * @param uid Rating.userId
 */
def closest(uid: Int): Unit = {
  val uids = ratings.get.groupBy({ r => r.userId }).keys.filter({ i => i != uid }).collect
  val m = uids.filter({ k => commonRatings(uid, k) > 7 }).map({ u => (u, eucl_sim(uid, u)) })

  if (m.isEmpty) {
    // maxBy throws on an empty collection, so handle this case explicitly
    println("no other rater has at least 8 movies in common with uid " + uid)
  } else {
    val max = m.maxBy({ e => e._2 })
    println("other recommenders: " + uids.mkString(", ") + " and closest uid is " + max)
  }
}


Example snippets - show movie recommendations

/**
 * Movies rated by uid2 but NOT rated by uid1.
 */
def notRatedByUid1(uid1: Int, uid2: Int): Array[Int] = {
  // search records for uid1 and uid2, groupBy movieId field
  val uid1Ratings = ratings.get.filter({ i => i.userId == uid1 }).groupBy({ r => r.movieId })
  val uid2Ratings = ratings.get.filter({ i => i.userId == uid2 }).groupBy({ r => r.movieId })

  uid2Ratings.subtractByKey(uid1Ratings).keys.collect
}

/**
 * Returns list of movie names for given list of movieId's
 */
def movieNames(movieIds: Array[Int]): Array[String] = {
  val mkeys = movies.get.groupBy(_.movieId)

  // lookup returns all groups stored under the id; flatten them to movie names
  val res: Array[String] = movieIds
    .flatMap({ id => mkeys.lookup(id) })
    .flatMap({ ms => ms.map(_.movie) })

  res
}

def showMovieProposals(uid1: Int, uid2: Int): Unit = {
  movieNames(notRatedByUid1(uid1, uid2)).foreach(println)
}


Demo - using command line application

You can bundle Scala client apps into fat jars with all dependencies included, and then launch them from the command line with java.

~/ScalaApps/RecommendationSystem/target/scala-2.11$ java -Dlog4j.configuration=file:/home/lniskanen/Spark/spark/conf/log4j.properties -jar ./rmsBootstrapper-uber.jar


Demo - using spark-shell

./bin/spark-shell --master <master address> --jars <some1.jar>,<some2.jar>

./bin/spark-shell --master spark://Machine:7077 --jars /home/lniskanen/ScalaApps/RecommendationSystem/recommender/target/scala-2.11/recommender_2.11-0.0.1.jar,/home/lniskanen/ScalaApps/RecommendationSystem/common/target/scala-2.11/common_2.11-0.0.1.jar


Summary
Glimpse of Scala, reasons why to use it in big data environments
Basics of Akka, the framework under the hood of Spark
Spark basic concepts
How to get started with Spark cluster and spark-shell
Loading data into Spark and making basic queries
Very basic example of recommendation system application

In my trainings you can further learn how to

get the user's expected rating for a given unwatched movie (product)
find similar movies
do error analysis for predictions and iterate better algorithms


Investigating GPU acceleration
Accelerating calculations with GPUs using Libra SDK from Gpu Systems
Libra SDK provides GPU agnostic interface to GPU programming

Gpu Systems is a realtime compute provider and accelerates computations using modern accelerators such as CPUs, GPUs and future devices while simplifying implementations.


A short list of my consultation and training services

Getting started with ML projects - How to do data intelligence projects
Predictive Analytics - Elaborated hands on examples using R
Recommendation Systems - Techniques behind Netflix success
Programming in R - De facto data science programming language
Programming in Scala - Functional and object flavored language for reliable, scalable systems
Reactive programming - Using Akka framework with Scala
Fast access to big data using in-memory technologies - Spark with Scala
Becoming a Data Scientist - Competencies and Career Planning


Who am I

Big Data SW Product Architect and Machine Learning, R and Scala programming consultant
Symbio 2009-2014 Presales, Business Director, Global IT Director
Ardites 2005-2009 Founder of Tampere operations, SW Development business
Nokia 1996-2005 Mobile phones R&D; Cellular Testing, SW Architect, SW Release Manager, Technology Director

LinkedIn: Lauri Niskanen
Intelligentpipe Oy


- FOR MORE INSIGHTS