CSC 536 Lecture 2
Outline
Concurrency on the JVM (and between JVMs)
Working problem
Java concurrency tools (review)
Solution using traditional Java concurrency tools
Solution using Akka concurrency tools
Overview of Akka
Working problem
Compute the total size of all regular files stored, directly or indirectly, in a directory
    > java Sequential C:\Windows
    Total size: 34405975972
    Time taken: 47.777222426
(Figure: example directory tree rooted at Users, with directories docs, etc, ellie, sammy and files foo.txt, bar.txt, xyz.txt, abc.txt)
A recursive solution
Basis step: if input is a regular file, return its size
Recursive step: if input is a directory, call function recursively on every item in the directory, add up the returned values and return the sum
(Depth-First Traversal)
Sequential.java
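The slides do not show Sequential.java itself, so the following is only a minimal sketch of the recursive solution described above. The class name SequentialSize and the method totalSize are hypothetical; only java.io.File from the standard library is used.

```java
import java.io.File;

// Hypothetical sketch; the course's Sequential.java is not shown in the slides.
class SequentialSize {

    // Basis step: a regular file contributes its own length.
    // Recursive step: a directory contributes the sum over all of its entries.
    static long totalSize(File file) {
        if (file.isFile()) {
            return file.length();
        }
        long total = 0;
        File[] children = file.listFiles(); // null if not a directory or not readable
        if (children != null) {
            for (File child : children) {
                total += totalSize(child); // depth-first traversal
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String root = args.length > 0 ? args[0] : ".";
        long start = System.nanoTime();
        long total = totalSize(new File(root));
        long end = System.nanoTime();
        System.out.println("Total size: " + total);
        System.out.println("Time taken: " + (end - start) / 1.0e9);
    }
}
```

Unreadable directories are silently counted as 0 here; that is a simplification of this sketch.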
Threads
We should use threads to traverse the file system in parallel
A thread is a “lightweight process”
  A thread lives inside a process
A thread has its own:
  program counter
  stack
  register set
A thread shares with the other threads in its process:
  code
  global variables
Interface Runnable
Must be implemented by any class that will be executed by a thread
Implement method run() with code the thread will run
Anonymous class example:
    new Runnable() {
        public void run() {
            // code to be run by thread
        }
    }
Class Thread
Encapsulates a thread of execution in a program
To execute a thread:
  An instance of a Runnable class is passed as an argument when creating the thread
  The thread is started with method start()
Example:
    Runnable r = new Runnable() {
        public void run() {
            // code executed by thread
        }
    };
    new Thread(r).start();
Issue with threads: synchronizing access to shared data
Producer-Consumer example
Setup
  A shared memory buffer
  Producer puts objects into the buffer
  Consumer reads objects from the buffer
ProducerConsumerTest.java, UnsyncBuffer.java
Problem: the producer can over-produce and the consumer can over-consume (an example of a race condition)
Need to synchronize (coordinate) the producer and the consumer
Synchronization
Mechanisms that ensure that concurrent threads/processes do not render shared data inconsistent
The three most widely used synchronization mechanisms in centralized systems are:
  Semaphores
  Locks
  Monitors
Monitors
Monitor = set of operations + set of variables + lock
  The set of variables is the monitor’s state
  Variables can be accessed only by the monitor’s operations
  At most one thread can be active within the monitor at a time
To execute a monitor’s operation, thread A must obtain the monitor’s lock
  If thread B holds the monitor’s lock, thread A must wait on the monitor’s queue (wait)
  Once thread A is done with the monitor’s lock, it must release it so that other threads can obtain it (notify)
Synchronization in Java
Each Java class becomes a monitor when at least one of its methods uses the synchronized modifier
The synchronized modifier is used to write code blocks and methods that require a thread to obtain a lock
Synchronization is always done with respect to an object
ProducerConsumerTest.java, SyncBuffer.java
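SyncBuffer.java is not reproduced in the slides. As a rough sketch of the idea, assuming a single-slot buffer, the synchronized methods below use wait() and notifyAll() on the buffer object so that the producer cannot over-produce and the consumer cannot over-consume. The class name SyncBufferSketch and the helper demo are hypothetical.

```java
// Hypothetical single-slot buffer; the course's SyncBuffer.java is not shown in the slides.
class SyncBufferSketch {
    private int value;
    private boolean occupied = false;

    // The producer waits while the slot is full, so it cannot over-produce.
    public synchronized void put(int v) throws InterruptedException {
        while (occupied) {
            wait(); // releases the lock on this buffer while waiting
        }
        value = v;
        occupied = true;
        notifyAll(); // wake a consumer that may be waiting
    }

    // The consumer waits while the slot is empty, so it cannot over-consume.
    public synchronized int get() throws InterruptedException {
        while (!occupied) {
            wait();
        }
        occupied = false;
        notifyAll(); // wake a producer that may be waiting
        return value;
    }

    // Runs one producer and one consumer; returns the sum of the values consumed.
    static int demo(int n) throws InterruptedException {
        SyncBufferSketch buffer = new SyncBufferSketch();
        int[] sum = new int[1];
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) buffer.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) sum[0] += buffer.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
        return sum[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Sum consumed: " + demo(10)); // 1 + 2 + ... + 10 = 55
    }
}
```

The wait() calls sit inside while loops so that a thread re-checks the condition after every wake-up.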
Java memory model
Before Java 5: ill defined
  a thread might not see values written by other threads
  a thread might observe impossible behaviors by other threads
Java 5 and later:
  Monitor lock rule: a release of a lock happens before every subsequent acquire of the same lock
  Volatile variable rule: a write of a volatile variable happens before every subsequent read of the same volatile variable
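A small sketch of the volatile variable rule, under the assumption of one writer and one reader thread. The class and field names are hypothetical. The write to the plain field data precedes the volatile write to ready, so a reader that observes ready == true is also guaranteed to observe data == 42.

```java
// Hypothetical illustration of the volatile variable rule (Java 5 and later).
class VolatileFlag {
    private int data = 0;                    // plain (non-volatile) field
    private volatile boolean ready = false;  // volatile flag

    void writer() {
        data = 42;      // ordinary write ...
        ready = true;   // ... made visible by the volatile write that follows it
    }

    int reader() {
        while (!ready) {    // volatile read: loops until the writer's update is visible
            Thread.yield();
        }
        return data;        // sees 42 by the volatile variable rule
    }

    static int demo() throws InterruptedException {
        VolatileFlag shared = new VolatileFlag();
        Thread writerThread = new Thread(shared::writer);
        writerThread.start();
        int seen = shared.reader();
        writerThread.join();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Reader saw: " + demo());
    }
}
```

Without the volatile modifier on ready, the reader loop would have no such guarantee and could spin forever.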
Disadvantages of synchronization
Disadvantages:
  Synchronization is error-prone
  Synchronization blocks threads and takes time
  Improper synchronization can result in deadlocks
  Creating a thread is not a low-overhead operation
  Too many threads slow down the system
Thread pooling
Thread pooling is a solution to the thread creation and management problem
The main idea is to create a number of threads in advance and have them wait for something to do
The same thread can be recycled for different operations
Thread pool components:
  A blocking queue
  A pool of threads
Blocking queue
A queue is a sequence of objects
Two basic operations:
  enqueue
  dequeue
Blocking queue:
  A dequeue thread must block if the queue is empty
  An enqueue thread must add an object to the queue and notify blocked threads
A blocking queue must be thread safe
Blocking queue operations
To dequeue an object from the queue:
  Wait until the lock on the queue is obtained
  If the queue is empty, release the lock and sleep
  If the queue is not empty, pop the first element and return it
To enqueue an object to the queue:
  Wait until the lock on the queue is obtained
  Add the object to the end of the queue
  Notify any sleeping thread
BlockingQueue.java
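The dequeue and enqueue steps can be sketched with Java's built-in monitor support. This is not the course's BlockingQueue.java, which the slides do not show; the class name SimpleBlockingQueue is hypothetical and the queue is unbounded for simplicity.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical unbounded blocking queue built on synchronized, wait() and notifyAll().
class SimpleBlockingQueue<T> {
    private final Deque<T> items = new ArrayDeque<>();

    // Obtains the lock, appends the item, and notifies any sleeping thread.
    public synchronized void enqueue(T item) {
        items.addLast(item);
        notifyAll();
    }

    // Obtains the lock; while the queue is empty, releases the lock and sleeps.
    // Once an item is available, removes the first element and returns it.
    public synchronized T dequeue() throws InterruptedException {
        while (items.isEmpty()) {
            wait();
        }
        return items.removeFirst();
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleBlockingQueue<String> queue = new SimpleBlockingQueue<>();
        Thread consumer = new Thread(() -> {
            try {
                // Blocks until the main thread enqueues an item.
                System.out.println("Dequeued: " + queue.dequeue());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        queue.enqueue("task-1");
        consumer.join();
    }
}
```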
Thread Pool = threads + tasks
Thread pool = group of threads + queue of Runnable tasks
Thread pool starts by creating the group of threads
Each thread loops indefinitely
  In every iteration, each thread attempts to dequeue a task from the task queue
  If the task queue is empty, block on the queue
  If a task is dequeued, run the task
Thread pool method execute(task) simply adds the task to the task queue
ThreadPool.java, ThreadPoolTest.java
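A minimal sketch of the thread pool described above. ThreadPool.java is not shown in the slides, so the class name SimpleThreadPool is hypothetical, and the standard java.util.concurrent.LinkedBlockingQueue is swapped in as the task queue in place of a hand-written blocking queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical thread pool: a group of threads plus a queue of Runnable tasks.
class SimpleThreadPool {
    private final BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();

    SimpleThreadPool(int nThreads) {
        for (int i = 0; i < nThreads; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        tasks.take().run(); // take() blocks while the task queue is empty
                    }
                } catch (InterruptedException e) {
                    // interrupted while waiting: let this worker exit
                }
            });
            worker.setDaemon(true); // idle workers do not keep the JVM alive
            worker.start();
        }
    }

    // execute() simply adds the task to the task queue.
    public void execute(Runnable task) {
        tasks.add(task);
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleThreadPool pool = new SimpleThreadPool(4);
        for (int i = 0; i < 8; i++) {
            final int id = i;
            pool.execute(() -> System.out.println(
                    "task " + id + " on " + Thread.currentThread().getName()));
        }
        Thread.sleep(500); // crude wait for this demo only; real code needs a completion signal
    }
}
```

In this sketch a task that throws an unchecked exception kills its worker thread; a production pool would have to handle that case.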
Java thread pool API
Interface ExecutorService defines objects that run Runnable tasks using method execute()
Class Executors defines factory methods for obtaining a thread pool (i.e. an ExecutorService object)
newFixedThreadPool(n) creates a pool of n threads
    ExecutorService service = Executors.newFixedThreadPool(10);
    service.execute(new Runnable() {
        public void run() {
            // task code
        }
    });
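A complete, runnable variant of the fragment above might look as follows. The class name ExecutorDemo, the method runTasks, and the completed counter are hypothetical; the shutdown() and awaitTermination() calls are added here so that the caller can wait for the submitted tasks.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo of the ExecutorService API from java.util.concurrent.
class ExecutorDemo {

    // Submits nTasks trivial tasks and returns how many of them ran.
    static int runTasks(int nTasks) throws InterruptedException {
        ExecutorService service = Executors.newFixedThreadPool(10);
        AtomicInteger completed = new AtomicInteger(); // thread-safe counter
        for (int i = 0; i < nTasks; i++) {
            service.execute(new Runnable() {
                public void run() {
                    completed.incrementAndGet(); // task code
                }
            });
        }
        service.shutdown();                            // no new tasks; queued tasks still run
        service.awaitTermination(1, TimeUnit.MINUTES); // block until they finish or time out
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Completed tasks: " + runTasks(25));
    }
}
```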
Back to working problem
Compute the total size of all regular files stored, directly or indirectly, in a directory
(Figure: the example directory tree rooted at Users)
Modern Java Concurrent solution
Use Runnable objects
  Create a Runnable object for every (sub)directory
Use a thread pool
  Keeps the number of threads manageable
  Keeps the overhead of thread creation low
  Reuses threads
Avoid sharing state
  Variable totalSize only
  Access must be synchronized
Concurrent1.java
  Does not work
AtomicLong
Accumulator variable totalSize is incremented by all threads
Must ensure that the incrementing operation (the critical section) is not interrupted by a context switch
Solution 1: Use a Java lock to synchronize access to the critical section
Solution 2: Use class AtomicLong
  method addAndGet() executes as a single atomic operation
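A short sketch of Solution 2, with several threads adding to one shared accumulator. The class name AtomicTotal and the method addConcurrently are hypothetical; AtomicLong and addAndGet() are from java.util.concurrent.atomic.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo: many threads increment one accumulator without a lock.
class AtomicTotal {

    static long addConcurrently(int nThreads, int addsPerThread) throws InterruptedException {
        AtomicLong totalSize = new AtomicLong(0);
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < addsPerThread; i++) {
                    totalSize.addAndGet(1); // atomic read-modify-write: no lost updates
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        return totalSize.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Total: " + addConcurrently(4, 100000)); // 4 * 100000 = 400000
    }
}
```

With a plain long and the += operator in place of AtomicLong, the same program could lose updates and report a smaller total.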
Concurrent1 problem
The main thread must wait until all (sub)directories have been processed
No way to know when that happens
Need to:
  1. Keep track of pending tasks, i.e. (directory processing) task creation and termination
  2. Block the main thread until the number of pending tasks is 0
Modern Java Concurrent solution
Same design as the previous attempt:
  Use Runnable objects, one for every (sub)directory
  Use a thread pool
  Avoid sharing state other than variable totalSize, whose access must be synchronized
In addition, require synchronization variables
  To terminate the application
Concurrent2.java
CountDownLatch
Synchronization tool that allows one or more threads to wait until a set of operations being performed in other threads completes.
initialized with a given count
method await() blocks until the count reaches 0
method countDown() decrements the count by 1
After count reaches 0, any subsequent invocations of await return immediately.
A CountDownLatch initialized with a count of 1 serves as a simple on/off gate: all threads invoking await wait at the gate until it is opened by a thread invoking countDown().
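Putting the pieces together, one possible shape for the concurrent solution is sketched below. It is not the course's Concurrent2.java, which the slides do not show. It assumes a pending-task counter kept in an AtomicLong and a CountDownLatch with count 1 used as the on/off gate that releases the main thread. All class, field, and method names are hypothetical.

```java
import java.io.File;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: thread pool + AtomicLong accumulator + latch used as a gate.
class ConcurrentSize {
    private final ExecutorService service = Executors.newFixedThreadPool(10);
    private final AtomicLong totalSize = new AtomicLong();
    private final AtomicLong pendingTasks = new AtomicLong();
    private final CountDownLatch done = new CountDownLatch(1); // opened when no task is pending

    // Task creation: count the task as pending before handing it to the pool.
    private void schedule(File file) {
        pendingTasks.incrementAndGet();
        service.execute(() -> explore(file));
    }

    // One task: add up the regular files in this directory, schedule its subdirectories.
    private void explore(File file) {
        long size = 0;
        if (file.isFile()) {
            size = file.length();
        } else {
            File[] children = file.listFiles(); // null if unreadable or not a directory
            if (children != null) {
                for (File child : children) {
                    if (child.isFile()) size += child.length();
                    else schedule(child);
                }
            }
        }
        totalSize.addAndGet(size);
        // Task termination: the last task to finish opens the gate.
        if (pendingTasks.decrementAndGet() == 0) {
            done.countDown();
        }
    }

    static long compute(File root) throws InterruptedException {
        ConcurrentSize job = new ConcurrentSize();
        job.schedule(root);
        job.done.await(); // block the main thread until the number of pending tasks is 0
        job.service.shutdown();
        return job.totalSize.get();
    }

    public static void main(String[] args) throws InterruptedException {
        String root = args.length > 0 ? args[0] : ".";
        System.out.println("Total size: " + compute(new File(root)));
    }
}
```

A task increments the counter for each subdirectory before it decrements its own entry, so the count cannot reach 0 while work remains.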
An Akka/Scala concurrent solution
Use Akka Actors
  Task of processing a directory is given to a worker actor by a master actor
Worker actor processes directory
  computes the total size of all the regular files and sends it to master
  sends to master the (path)name of every sub-directory
Master actor
  Initiates the process
  sends tasks to worker actors
  collects the total size
  keeps track of pending tasks
ConcurrentAkka.java
Akka
Actor-based concurrency framework
Provides solutions for non-blocking concurrency
Written in Scala, but also has a Java API
Each actor has a state that is invisible to other actors
Each actor has a message queue
Actors receive and handle messages sequentially, therefore no synchronization issues
Actors should rarely block
Actors are lightweight and asynchronous
  650 bytes
  can have millions of actors running on a few threads on a single machine
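To see why sequential message handling removes the need for locks, here is a plain-Java analogy. It is not the Akka API, and every name in it is hypothetical. A single-threaded executor plays the role of the mailbox, so the private state is only ever touched by one thread.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java analogy of an actor, NOT Akka: private state plus a sequential mailbox.
class CounterActor {
    // The single-threaded executor queues messages and handles them one at a time.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private long count = 0; // private state, touched only by the mailbox thread

    // "Fire-and-forget": enqueue the message and return immediately.
    void tell(long increment) {
        mailbox.execute(() -> count += increment);
    }

    // Enqueues a final "report" message behind all earlier ones, then stops the mailbox.
    long stopAndGet() throws InterruptedException, ExecutionException {
        Future<Long> result = mailbox.submit(() -> count);
        mailbox.shutdown();
        return result.get();
    }

    // Several sender threads message the same actor concurrently.
    static long demo(int senders, int messagesPerSender)
            throws InterruptedException, ExecutionException {
        CounterActor actor = new CounterActor();
        Thread[] threads = new Thread[senders];
        for (int s = 0; s < senders; s++) {
            threads[s] = new Thread(() -> {
                for (int i = 0; i < messagesPerSender; i++) actor.tell(1);
            });
            threads[s].start();
        }
        for (Thread t : threads) t.join();
        return actor.stopAndGet();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Final count: " + demo(4, 1000)); // 4 * 1000 = 4000
    }
}
```

The count field is a plain long with no synchronized block, yet no update is lost, because only the mailbox thread ever reads or writes it.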
Why use Scala and Akka in DSII?
Distributed computing
  Actors do not share state and interact through messages
  Actor locations (local vs remote) are transparent
  Akka developed for distributed applications from the ground up
Transactions
  Scala includes an implementation of Software Transactional Memory
Fault tolerance
  Implements “let-it-crash” semantics
  Uses supervisor hierarchies that self-heal
Reliable communication
  Akka includes an implementation of reactive streams
Actors
State
  Supposed to be invisible to other actors
Behavior
  The actions to be taken in reaction to a message
Mailbox
  Actors process messages from mailbox sequentially
Children
  Actors can create other actors
  A hierarchy of actors
Supervisor strategy
  An actor is supervised by its parent
Actors
    import akka.actor.{Actor, ActorSystem, Props}

    class First extends Actor {
      def receive = {
        case "hello" => println("Hello world!")
        case msg: String => println("Got " + msg + " from " + sender)
        case _ => println("Unknown message")
      }
    }

    object Server extends App {
      val system = ActorSystem("FirstExample")
      val first = system.actorOf(Props[First], name = "first")
      println("The path associated with first is " + first.path)
      first ! "hello"
      first ! "Goodbye"
      first ! 4
    }
First.scala
Using sbt
Simple Build Tool (http://www.scala-sbt.org/)
Easy to set up
Sample build.sbt configuration file
    lazy val root = (project in file(".")).settings(
      name := "First Example",
      version := "1.0",
      scalaVersion := "2.11.6",
      resolvers += "Typesafe Repository" at "http://repo.typesafe.com/typesafe/releases/",
      libraryDependencies += "com.typesafe.akka" %% "akka-actor" % "2.3.9"
    )
Abstract Class Actor
Extend Actor class and implement method receive
Method receive should have case statements that
  define the messages the actor handles
  implement the logic of how messages are handled
  use Scala pattern matching

    class First extends Actor {
      def receive = {
        case "hello" => println("Hello world!")
        case msg: String => println("Got " + msg)
        case _ => println("Unknown message")
      }
    }
Class ActorSystem
Actors form hierarchies, i.e. a system
Class ActorSystem encapsulates a hierarchy of actors
Class ActorSystem provides methods for
  creating actors
  looking up actors
At least the first actor in the system is created using it
Class ActorContext
Class ActorContext also provides methods for
  creating actors
  looking up actors
Each actor has its own instance of ActorContext that allows it to create (child) actors and lookup references to actors
Obtaining actor references
Creating actors
  ActorSystem.actorOf()
  ActorContext.actorOf()
Both methods return an ActorRef reference to the new actor
Looking up an existing actor by concrete path
  ActorSystem.actorSelection()
  ActorContext.actorSelection()
Both methods return an ActorSelection reference to the existing actor
ActorRef or ActorSelection references can be used to send a message to the actor
Class ActorRef
Immutable and serializable handle to an actor
  actor could be in the same ActorSystem, a different one, or even another, remote JVM
  obtained from ActorSystem (or indirectly from ActorContext)
ActorRefs can be shared among actors by message passing
  you can serialize it, send it over the wire and use it on a remote host and it will still be representing the same Actor on the original node, across the network
  In fact, every message carries the ActorRef of the sender
Conversely, message passing is their only purpose
Actor System
Class Props
Props is an Actor configuration object
  recipe for creating an actor including associated deployment info
Used when creating new actors through
  ActorSystem.actorOf
  ActorContext.actorOf
Sending messages
Messages are sent to an Actor through one of
  method tell or simply !
    means “fire-and-forget”, i.e. send a message asynchronously and return immediately
  method ask or simply ?
    sends a message asynchronously and returns a Future representing a possible reply
Message ordering is guaranteed on a per-sender basis
Tell is the preferred way of sending messages
  No blocking waiting for a message
  Best concurrency and scalability characteristics
Message ordering
For a given pair of actors, messages sent from the first to the second will be received in the order they were sent
Causality between messages is not guaranteed!
  Actor A sends message M1 to actor C
  Actor A then sends message M2 to actor B
  Actor B forwards message M2 to actor C
  Actor C may receive M1 and M2 in any order
Also, message delivery is “at-most-once delivery” i.e. no guaranteed delivery
Message ordering
Akka also guarantees
The actor send rule
  The send of the message to an actor happens before the receive of that message by the same actor
The actor subsequent processing rule
  Processing of one message happens before processing of the next message by the same actor
Both rules only apply for the same actor instance and are not valid if different actors are used
Messages and immutability
Messages can be any kind of object but have to be immutable.
Scala can’t enforce immutability (yet) so this has to be by convention
  Primitives like String, Int, Boolean are always immutable
  Apart from these, the recommended approach is to use Scala case classes, which are immutable (if you don’t explicitly expose the state) and work great with pattern matching at the receiver side
  Other good message types are scala.Tuple2, scala.List, scala.Map, which are all immutable and great for pattern matching
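Since Akka also has a Java API, it may help to see what an immutable message looks like when written by hand in Java. The class below is a hypothetical sketch, not part of Akka: a final class with a final field and value-based equals, hashCode, and toString, which is roughly what a Scala case class provides automatically.

```java
import java.util.Objects;

// Hypothetical immutable message class: all state is final and set once in the constructor.
final class StartMessage {
    private final String secondPath;

    StartMessage(String secondPath) {
        this.secondPath = Objects.requireNonNull(secondPath);
    }

    // Read-only accessor; there are no setters.
    String secondPath() {
        return secondPath;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof StartMessage
                && ((StartMessage) o).secondPath.equals(secondPath);
    }

    @Override
    public int hashCode() {
        return secondPath.hashCode();
    }

    @Override
    public String toString() {
        return "StartMessage(" + secondPath + ")";
    }

    public static void main(String[] args) {
        StartMessage a = new StartMessage("some/path");
        StartMessage b = new StartMessage("some/path");
        System.out.println(a + " equals " + b + ": " + a.equals(b)); // value equality: true
    }
}
```

Because the object cannot change after construction, the sender and the receiver can both hold a reference to it without any synchronization.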
Actor API
Scala trait (think partially implemented Java Interface) that defines one abstract method: receive()
Offers useful references:
  self: reference to the ActorRef of the actor
  sender: reference to the sender Actor of the last received message
    typically used for replying to messages
  context: reference to the ActorContext of the actor, which includes references to
    factory methods to create child actors (actorOf)
    system that the actor belongs to
    parent supervisor
    supervised children
Ping Pong examples
Second.scala
Third.scala
Scala pattern matching
Scala has a built-in general pattern matching mechanism
It allows matching on any sort of data with a first-match policy
    object MatchTest1 extends App {
      def matchTest(x: Int): String = x match {
        case 1 => "one"
        case 2 => "two"
        case _ => "many"
      }
      println(matchTest(3))
      println(matchTest(2))
      println(matchTest(1))
    }
Scala pattern matching
    object MatchTest2 extends App {
      def matchTest(x: Any): Any = x match {
        case 1 => "one"
        case "two" => 2
        case y: Int => "scala.Int: " + y
      }
      println(matchTest(1))
      println(matchTest("two"))
      println(matchTest(3))
      println(matchTest("four")) // no case matches a String other than "two": throws scala.MatchError
    }
Scala case classes
Case classes are regular classes with special conveniences
  automatically have factory methods with the name of the class
  all constructor parameters become immutable public fields of the class
  have natural implementations of toString, hashCode, and equals
  are serializable by default
  provide a decomposition mechanism via pattern matching

    case class Start(secondPath: String)
    case object PING
    case object PONG
Scala pattern matching
    case class Start(secondPath: String)
    case object PING
    case object PONG

    object MatchTest3 extends App {
      def matchTest(x: Any): Any = x match {
        case Start(secondPath) => "got " + secondPath
        case PING => "got ping"
        case PONG => "got pong"
      }
      println(matchTest(Start("path")))
      println(matchTest(PING))
    }
Scala pattern matching
    object MatchTest4 extends App {
      def length[X](xs: List[X]): Int = xs match {
        case Nil => 0
        case y :: ys => 1 + length(ys)
      }
      println(length(List()))
      println(length(List(1, 2)))
      println(length(List("one", "two", "three")))
    }
Scala pattern matching
    sealed trait Op
    case object OpAdd extends Op
    case object OpSub extends Op
    case object OpMul extends Op
    case object OpDiv extends Op

    sealed trait Exp
    case class ExpNum(n: Double) extends Exp
    case class ExpOp(e1: Exp, op: Op, e2: Exp) extends Exp

    object MatchTest5 extends App {
      def evaluate(e: Exp): Double = e match {
        case ExpNum(v) => v
        case ExpOp(e1, op, e2) =>
          val n1: Double = evaluate(e1)
          val n2: Double = evaluate(e2)
          op match {
            case OpAdd => n1 + n2
            case OpSub => n1 - n2
            case OpMul => n1 * n2
            case OpDiv => n1 / n2
          }
      }
    }
Defining Akka message classes
Use Scala case classes
    case class Start(secondPath: String)
    case object PING
    case object PONG

    class PingPong extends Actor {
      def receive = {
        case PING => ...
        case PONG => ...
        case Start(secondPath) => ...
      }
    }
An Akka/Scala concurrent solution, in more detail
Use Akka Actors
  Task of processing a directory is given to a worker actor by a master actor
Worker actor processes directory
  computes the total size of all the regular files and sends it to master
  sends to master the (path)name of every sub-directory
Master actor
  Initiates the process
  sends tasks to worker actors
  collects the total size
  keeps track of pending tasks
ConcurrentAkka.scala
Class RoundRobinPool
Creating a new worker actor for every task (processing a directory) is not efficient
  tasks are very small so Actor creation overhead is relatively large
Instead, create a pool of worker actors (routees) managed by a router actor of type RoundRobinPool
  the router is the parent of the routees
  a message (task) sent by some actor A to the router is forwarded to a routee chosen in a round-robin fashion
  the routee sees actor A as the sender of the message

    context.actorOf(RoundRobinPool(50).props(Props[FileProcessor]), name = "workerRouter")
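The round-robin choice itself is simple. The plain-Java sketch below is not Akka's implementation, and all names in it are hypothetical. It shows one way a router could pick the next routee from a fixed pool in a thread-safe manner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin selector over a fixed pool of routees (NOT Akka code).
class RoundRobinSelector<T> {
    private final List<T> routees;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSelector(List<T> routees) {
        if (routees.isEmpty()) {
            throw new IllegalArgumentException("need at least one routee");
        }
        this.routees = new ArrayList<>(routees); // defensive copy
    }

    // Returns the routees in cyclic order; floorMod keeps the index valid after overflow.
    T select() {
        int index = Math.floorMod(next.getAndIncrement(), routees.size());
        return routees.get(index);
    }

    public static void main(String[] args) {
        RoundRobinSelector<String> router =
                new RoundRobinSelector<>(Arrays.asList("w1", "w2", "w3"));
        for (int i = 0; i < 5; i++) {
            System.out.println("task " + i + " -> " + router.select());
        }
        // routees are chosen in the order w1, w2, w3, w1, w2
    }
}
```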
Remote Actors
Distributed by default
  Actor interactions are asynchronous messages
  Local actor interactions are an optimization of the general case (remote actors)
Remoting configuration (server)
Add dependency to build.sbt
    libraryDependencies += "com.typesafe.akka" %% "akka-remote" % "2.3.9"
Add to application.conf
    server {
      include "common"
      akka {
        actor {
          provider = "akka.remote.RemoteActorRefProvider"
        }
        remote {
          netty.tcp {
            hostname = "127.0.0.1"
            port = 2552
          }
        }
      }
    }
Remoting Configuration
application.conf configuration:
Change provider from akka.actor.LocalActorRefProvider to akka.remote.RemoteActorRefProvider
Transport mode: specifies transport protocol
Host name
Port number - the port the actor system should listen on, set to 0 to have it chosen automatically (e.g., for clients)
Remoting configuration (client)
Add dependency to build.sbt
    libraryDependencies += "com.typesafe.akka" %% "akka-remote" % "2.3.9"
Add to application.conf
    remotelookup {
      include "common"
      akka {
        actor {
          provider = "akka.remote.RemoteActorRefProvider"
        }
        remote {
          netty.tcp {
            hostname = "127.0.0.1"
            port = 0
          }
        }
      }
    }
Client-side code
To obtain the handle to a remote actor:
  ActorSystem method actorSelection() takes the URL of a remote actor and returns an ActorSelection to it

    val joe = system.actorSelection("akka.tcp://<system>@<host>:2552/user/joe")

You can now send messages to the remote actor as usual
To acquire an ActorRef for an ActorSelection
  you need to send a message to the selection
  the actor should reply
  use the sender reference of the reply from the actor
Remoting example
server.scala + application.conf
client.scala + application.conf
Project remote
Remoting example
server.scala + application.conf
client.scala + application.conf
Project remote2