Streaming Data with scalaz-stream Gary Coady [email protected]

Post on 15-Apr-2017

Page 1: Streaming Data with scalaz-stream

Streaming Data with scalaz-stream

Gary Coady [email protected]

Page 2: Streaming Data with scalaz-stream

Contents

• Why do we want streaming APIs?

• Introduction to scalaz-stream

• Use case: Server-Sent Events implementation

Page 3: Streaming Data with scalaz-stream

Why do we want streaming APIs?

Page 4: Streaming Data with scalaz-stream

Information with indeterminate/unbounded size

• Lines from a text file

• Bytes from a binary file

• Chunks of data from a TCP connection

• TCP connections

• Data from Kinesis or SQS or SNS or Kafka or…

• Data from an API with paged implementation

Page 5: Streaming Data with scalaz-stream

“Dangerous” Choices

• scala.collection.Iterable: provides an iterator to step through items in sequence

• scala.collection.immutable.Stream: lazily evaluated, possibly infinite list of values
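The slide does not spell out which dangers it has in mind. One plausible failure mode is resource cleanup: an iterator that closes its source only on exhaustion leaks the resource when the consumer stops early. The sketch below uses only the Scala standard library; `FakeFile` and `linesOf` are hypothetical names invented for illustration.

```scala
object LeakyIterator {
  // Hypothetical stand-in for a file handle, so the sketch needs no real I/O.
  final class FakeFile(lines: List[String]) {
    var closed = false
    private val it = lines.iterator
    def readLine(): Option[String] = if (it.hasNext) Some(it.next()) else None
    def close(): Unit = closed = true
  }

  // Closes the file only when the iterator is asked for more and finds none.
  def linesOf(file: FakeFile): Iterator[String] = new Iterator[String] {
    private var nextLine = file.readLine()
    def hasNext: Boolean = {
      if (nextLine.isEmpty) file.close()
      nextLine.isDefined
    }
    def next(): String = {
      val line = nextLine.get
      nextLine = file.readLine()
      line
    }
  }

  def main(args: Array[String]): Unit = {
    val file = new FakeFile(List("a", "b", "c"))
    val firstTwo = linesOf(file).take(2).toList
    println(firstTwo)
    // The consumer stopped early, so the cleanup path was never reached.
    println(s"closed = ${file.closed}")
  }
}
```

Here the consumer has no way to signal that it is finished, which is the gap that the "safe setup and cleanup" requirement on the next slide addresses.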

Page 6: Streaming Data with scalaz-stream

Do The Right Thing

• Safe setup and cleanup

• Constant memory usage

• Constant stack usage

• Refactor with confidence

• Composable

• Back-pressure

Page 7: Streaming Data with scalaz-stream

What is scalaz-stream?

• Creates co-data

• Safe resource management

• Referential transparency

• Controlled asynchronous effects

Page 8: Streaming Data with scalaz-stream

[Diagram: user code → Process.await → “waiting” for callback → callback → user code]

Page 9: Streaming Data with scalaz-stream

sealed trait Process[+F[_], +O]

• F: effect

• O: output

Page 10: Streaming Data with scalaz-stream

case class Halt(cause: Cause) extends Process[Nothing, Nothing]

Page 11: Streaming Data with scalaz-stream

case class Emit[+O](seq: Seq[O]) extends Process[Nothing, O]

Page 12: Streaming Data with scalaz-stream

case class Await[+F[_], A, +O](
  req: F[A],
  rcv: (EarlyCause \/ A) => Process[F, O]
) extends Process[F, O]
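To make the roles of Halt, Emit and Await concrete, here is a deliberately simplified model in plain Scala. It is not the scalaz-stream implementation: the effect type is fixed to a thunk `() => A`, `Either` stands in for `\/`, and `Emit` carries its own continuation. The names `Proc`, `runLog` and `countdown` are assumptions made for this sketch.

```scala
import scala.util.Try

object MiniProcess {
  sealed trait Proc[+O]

  // The stream is finished.
  case object Halt extends Proc[Nothing]

  // Output some values, then continue with `next`.
  final case class Emit[+O](values: Seq[O], next: () => Proc[O]) extends Proc[O]

  // Request a value from the outside world, then decide how to continue.
  final case class Await[A, +O](req: () => A, rcv: Either[Throwable, A] => Proc[O])
      extends Proc[O] {
    // Runs the request and hands either the failure or the result to `rcv`.
    def step: Proc[O] = rcv(Try(req()).toEither)
  }

  // A tiny interpreter: walks the description and collects every emitted value.
  @annotation.tailrec
  def runLog[O](p: Proc[O], acc: Vector[O] = Vector.empty): Vector[O] = p match {
    case Halt               => acc
    case Emit(values, next) => runLog(next(), acc ++ values)
    case a: Await[_, O]     => runLog(a.step, acc)
  }

  // Example stream: awaits a number, emits it, and repeats until it reaches zero.
  def countdown(n: Int): Proc[Int] =
    if (n <= 0) Halt
    else Await[Int, Int](() => n, {
      case Right(v) => Emit(Seq(v), () => countdown(v - 1))
      case Left(_)  => Halt
    })

  def main(args: Array[String]): Unit =
    println(runLog(countdown(3)))
}
```

The point of the sketch is that a process is only a description; nothing happens until an interpreter such as `runLog` walks it.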

Page 13: Streaming Data with scalaz-stream

Composition Options

Process1[I, O]
  - Stateful transducer, converts I => O (with state)
  - Combine with “pipe”

Channel[F[_], I, O]
  - Takes I values, runs function I => F[O]
  - Combine with “through” or “observe”

Sink[F[_], I]
  - Takes I values, runs function I => F[Unit]
  - Add with “to”
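A small sketch of how the three options combine. It assumes a scalaz-stream 0.7/0.8-era release on scalaz 7.1, where `Task` has a `run` method (on scalaz 7.2 the equivalent is `unsafePerformSync`). The values `numbers`, `doubleIt`, `describe` and `printLine` are names made up for the example.

```scala
import scalaz.concurrent.Task
import scalaz.stream._

object CompositionSketch {
  // Source: five integers lifted into a Task-based stream.
  val numbers: Process[Task, Int] = Process.emitAll(1 to 5).toSource

  // Process1: a pure transducer, attached with `pipe`.
  val doubleIt: Process1[Int, Int] = process1.lift((i: Int) => i * 2)

  // Channel: an effectful function I => Task[O], attached with `through`.
  val describe: Channel[Task, Int, String] =
    channel.lift((i: Int) => Task.delay(s"value: $i"))

  // Sink: an effectful consumer I => Task[Unit], attached with `to`.
  val printLine: Sink[Task, String] =
    sink.lift((s: String) => Task.delay(println(s)))

  val described: Process[Task, String] =
    numbers.pipe(doubleIt).through(describe)

  def main(args: Array[String]): Unit = {
    // The first `run` turns the Process into a Task; the second executes it.
    described.to(printLine).run.run
  }
}
```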

Page 14: Streaming Data with scalaz-stream

Implementing Server-sent Events (SSE)

“This specification defines an API for opening an HTTP connection for receiving push notifications from a server in the form of DOM events.”

Page 15: Streaming Data with scalaz-stream

Example streams

case class SSEEvent(eventName: Option[String], data: String)

data: This is the first message.

data: This is the second message, it
data: has two lines.

data: This is the third message.

event: add
data: 73857293

event: remove
data: 2153

event: add
data: 113411
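As a rough illustration of how the example streams map onto SSEEvent, the following plain-Scala sketch parses already-decoded text. It is not the talk's implementation: it handles only the `data` and `event` fields, assumes `\n` line endings, and ignores every other field. `parse`, `Acc` and `value` are hypothetical names.

```scala
object SseParseSketch {
  final case class SSEEvent(eventName: Option[String], data: String)

  // Accumulator: finished events, pending event name, pending data lines.
  private final case class Acc(
      done: Vector[SSEEvent],
      name: Option[String],
      data: Vector[String]) {
    // A blank line dispatches the pending event, unless it has no data.
    def flush: Acc =
      if (data.isEmpty) copy(name = None)
      else Acc(done :+ SSEEvent(name, data.mkString("\n")), None, Vector.empty)
  }

  // Drops the field name and at most one following space.
  private def value(line: String, prefix: String): String =
    line.stripPrefix(prefix).stripPrefix(" ")

  def parse(text: String): List[SSEEvent] =
    text.split("\n", -1).foldLeft(Acc(Vector.empty, None, Vector.empty)) { (acc, line) =>
      if (line.isEmpty) acc.flush
      else if (line.startsWith("data:")) acc.copy(data = acc.data :+ value(line, "data:"))
      else if (line.startsWith("event:")) acc.copy(name = Some(value(line, "event:")))
      else acc // id, retry and comment lines are ignored in this sketch
    }.done.toList // an event not yet terminated by a blank line is dropped

  def main(args: Array[String]): Unit = {
    val text = "data: This is the second message, it\ndata: has two lines.\n\n" +
      "event: add\ndata: 73857293\n\n"
    parse(text).foreach(println)
  }
}
```

Multiple `data` lines are joined with a newline, and an event is only produced once its terminating blank line has been seen.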

Page 16: Streaming Data with scalaz-stream

We want this type:

Process[Task,  SSEEvent]

“A potentially infinite stream of SSE event messages”

Page 17: Streaming Data with scalaz-stream

async.boundedQueue[A]

• Items added to queue are removed in same order

• Connect different asynchronous domains

• Methods:

    def enqueueOne(a: A): Task[Unit]
    def dequeue: Process[Task, A]
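A minimal sketch of the two methods in use, assuming a scalaz-stream 0.7/0.8-era release on scalaz 7.1 (where `Task` has `run`) and the default implicit `Strategy`. It also relies on `close` letting already-queued items drain before `dequeue` halts. `roundTrip` is a hypothetical helper.

```scala
import scalaz.stream.async

object QueueSketch {
  // Pushes items in from the "callback" side, then reads them back as a stream.
  // Keep `items` at or below the bound of 10: with no consumer running yet,
  // an enqueue into a full queue would wait forever.
  def roundTrip(items: List[String]): IndexedSeq[String] = {
    val queue = async.boundedQueue[String](10)

    // Producer side: each enqueueOne is a Task and does nothing until run.
    items.foreach(item => queue.enqueueOne(item).run)

    // Signal that no more items will arrive.
    queue.close.run

    // Consumer side: an ordinary Process[Task, String].
    queue.dequeue.runLog.run
  }

  def main(args: Array[String]): Unit =
    println(roundTrip(List("a", "b", "c")))
}
```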

Page 18: Streaming Data with scalaz-stream

HTTP Client Implementation

• Use AsyncHttpClient (the async-http-client library)

• Hook into onBodyPartReceived callback

• Use async.boundedQueue to convert chunks into a stream

Page 19: Streaming Data with scalaz-stream

def httpRequest(client: AsyncHttpClient, url: String): Process[Task, ByteVector] = {
  val contentQueue = async.boundedQueue[ByteVector](10)
  val req = client.prepareGet(url)

  req.execute(new AsyncCompletionHandler[Unit] {
    // Called once per received chunk; running the enqueue Task here
    // blocks the callback until the chunk has been accepted by the queue.
    override def onBodyPartReceived(content: HttpResponseBodyPart) = {
      contentQueue.enqueueOne(ByteVector(content.getBodyByteBuffer)).run
      super.onBodyPartReceived(content)
    }
  })

  contentQueue.dequeue
}

Page 20: Streaming Data with scalaz-stream

How to terminate stream?

Page 21: Streaming Data with scalaz-stream

req.execute(new AsyncCompletionHandler[Unit] {
  ...
  override def onCompleted(r: Response): Unit = {
    logger.debug("Request completed")
    contentQueue.close.run
  }
  ...
})

Page 22: Streaming Data with scalaz-stream

How to terminate stream with errors?

Page 23: Streaming Data with scalaz-stream

req.execute(new AsyncCompletionHandler[Unit] {
  ...
  override def onThrowable(t: Throwable): Unit = {
    logger.debug("Request failed with error", t)
    contentQueue.fail(t).run
  }
  ...
})

Page 24: Streaming Data with scalaz-stream

Process[Task, ByteVector]

Process[Task, SSEEvent]

Process[Task, Underpants]

Step 1

Step 2

Step 3

Page 25: Streaming Data with scalaz-stream

• Split at line endings

• Convert ByteVector into UTF-8 Strings

• Partition by SSE “tag” (“data”, “id”, “event”, …)

• Emit accumulated SSE data when blank line found

Page 26: Streaming Data with scalaz-stream

• Split at line endings
  ByteVector => Seq[ByteVector]

• Convert ByteVector into UTF-8 Strings
  ByteVector => String

• Partition by SSE “tag” (“data”, “id”, “event”, …)
  String => SSEMessage

• Emit accumulated SSE data when blank line found
  SSEMessage => SSEEvent
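The first two steps are stateful because a network chunk can end in the middle of a line, or even in the middle of a multi-byte UTF-8 character. The plain-Scala sketch below shows the idea of carrying the unfinished tail into the next chunk and decoding only complete lines. It uses `Vector[Byte]` instead of `ByteVector` so that it needs no libraries, and `step` and `splitLines` are hypothetical names, not the talk's code.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object LineSplitSketch {
  // Feeds one chunk: returns the complete lines found and the leftover bytes.
  def step(carry: Vector[Byte], chunk: Vector[Byte]): (Vector[String], Vector[Byte]) = {
    @annotation.tailrec
    def go(rest: Vector[Byte], lines: Vector[String]): (Vector[String], Vector[Byte]) = {
      val i = rest.indexOf('\n'.toByte)
      if (i < 0) (lines, rest) // no full line left; keep the bytes for later
      else go(rest.drop(i + 1), lines :+ new String(rest.take(i).toArray, UTF_8))
    }
    go(carry ++ chunk, Vector.empty)
  }

  // Threads the carry through a sequence of chunks.
  def splitLines(chunks: Seq[Vector[Byte]]): Vector[String] =
    chunks.foldLeft((Vector.empty[String], Vector.empty[Byte])) {
      case ((lines, carry), chunk) =>
        val (more, next) = step(carry, chunk)
        (lines ++ more, next)
    }._1

  def main(args: Array[String]): Unit = {
    val bytes = "data: h\u00e9llo\n\n".getBytes(UTF_8).toVector
    // Split inside the two-byte encoding of the accented character.
    val (first, second) = bytes.splitAt(8)
    splitLines(Seq(first, second)).foreach(line => println(s"[$line]"))
  }
}
```

Decoding after splitting, as the slide orders the steps, is what keeps a character that straddles two chunks intact.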

Page 27: Streaming Data with scalaz-stream

Handling Network Errors

• If a network error occurs:

• Sleep a while

• Set up the connection again and keep going

• Append the same Process definition again!

Page 28: Streaming Data with scalaz-stream

def sseStream: Process[Task, SSEEvent] = {
  httpRequest(client, url)
    .pipe(splitLines)
    .pipe(emitMessages)
    .pipe(emitEvents)
    .partialAttempt {
      case e: ConnectException => retryRequest
      case e: TimeoutException => retryRequest
    }
    .map(_.merge)
}

def retryRequest: Process[Task, SSEEvent] = {
  time.sleep(retryTime) ++ sseStream
}

Page 29: Streaming Data with scalaz-stream

Usage

sseStream pipe jsonToString to io.stdOutLines

Page 30: Streaming Data with scalaz-stream

Questions?