
Page 1: Liveperson DLD 2015

DLD. Tel-Aviv. 2015

Making Scale a Non-Issue for Real-Time Data Apps

Vladi Feigin, LivePerson

Kobi Salant, LivePerson

Page 2: Liveperson DLD 2015

Agenda

Intro
About LivePerson
Digital Engagements
Call Center Use Case
Architecture Zoom-In

Page 3: Liveperson DLD 2015

Bio

Vladi Feigin
System Architect at LivePerson
18 years in software development
Interests: distributed computing, data, analytics and martial arts

Page 4: Liveperson DLD 2015

Bio

Kobi Salant
Data Platform Tech Lead at LivePerson
25 years in software development
Interests: application performance, traveling and coffee

Page 5: Liveperson DLD 2015

LivePerson

We do Digital Engagements

Agile and highly technological

A real Big Data and analytics company

A really cool place to work

One of the SaaS pioneers

6 Data Centers across the world

Founded in 1995, a public company since 2000 (NASDAQ: LPSN)

More than 18,000 customers worldwide

More than 1000 employees

Page 6: Liveperson DLD 2015

LivePerson technology stack

Page 7: Liveperson DLD 2015

We are Big Data

1.4 Million concurrent visits

1 Million events per second

2 billion site visits per month

27 million live engagements per month

Data freshness SLA (RT flow): up to 5 seconds

Page 8: Liveperson DLD 2015

Visitor

Page 9: Liveperson DLD 2015

Agent

Page 10: Liveperson DLD 2015

Visitor

Page 11: Liveperson DLD 2015

Agent

Page 12: Liveperson DLD 2015

Call Center Operating

Digital engagement requires operating a call center in the most efficient way

How do you operate a call center in the most efficient way? Provide operational metrics … in real time

What are the challenges? Huge scale, load peaks, real-time calculations and a tight data freshness SLA

Page 13: Liveperson DLD 2015

Call Center Operating

Page 14: Liveperson DLD 2015

Architecture. Real-Time data flow

[Diagram] Producers (agent, session, chat, conversation and others) publish events to Kafka on two topics: a fast topic and a consistent topic. Storm topologies consume the fast topic and write to Cassandra, ElasticSearch and CouchBase, which serve an API and custom apps. The consistent topic feeds the batch layer (Hadoop).
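As a rough illustration of the two-topic split, a minimal sketch of a producer publishing the same encoded event to both the fast and the consistent topic with the Kafka Java producer; the broker hosts, topic names and serializer choices are illustrative assumptions, not the actual configuration.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-1:9092,kafka-2:9092");   // illustrative hosts
        props.put("acks", "1");                                        // wait for the leader only
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");

        KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);
        byte[] event = "encoded-avro-event".getBytes();                // placeholder payload

        // Fast topic: picked up by Storm for the low-latency views.
        producer.send(new ProducerRecord<>("events.fast", event));
        // Consistent topic: picked up by the batch layer (Hadoop).
        producer.send(new ProducerRecord<>("events.consistent", event));

        producer.close();
    }
}
```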

Page 15: Liveperson DLD 2015

Chat History. Example

[Diagram] Producers (agent, session, chat) publish to Kafka. The fast topic is consumed by Storm and indexed into ElasticSearch with very low latency, covering 99.5% of the data. The consistent topic is processed by an MR job with high latency, covering 99.999% of the data. Both paths serve the Chat History API.

Page 16: Liveperson DLD 2015

Data Producers. Requirements

Real time
“Five nines” persistence
Small footprint
No interference with service
Multiple producers & platforms
Monolithic to service oriented


Page 17: Liveperson DLD 2015

Data Producers. Lessons learned

Hundreds of services
Complex rollouts
Minimal logic to avoid painful fixes
Audit streaming? Split to buckets

Real time and “five nines” persistence are incompatible

 

[Diagram] The in-house producer splits the audit stream into buckets.

Page 18: Liveperson DLD 2015

Data Producers. Flow

[Diagram] Fast topic: the producer sends the message straight to Kafka, serving real-time customers. Consistent topic: the producer first persists the message to local disk (a local file), and a Kafka Bridge then sends it to Kafka, serving offline customers. The local file and bridge provide Kafka resilience.
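A minimal sketch of this flow, assuming a line-oriented spool file and a standalone bridge process; the paths, class names and topic names are illustrative, not the actual in-house producer.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Base64;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Producer side: persist first, so the service never blocks on Kafka.
class LocalSpooler {
    private final FileWriter out;

    LocalSpooler(String path) throws IOException {
        this.out = new FileWriter(path, true);                 // append mode
    }

    synchronized void append(byte[] message) throws IOException {
        out.write(Base64.getEncoder().encodeToString(message)); // one record per line
        out.write('\n');
        out.flush();                                             // on disk before returning
    }
}

// Bridge side: reads the spool file and forwards each record to the consistent topic.
class KafkaBridge {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-1:9092");          // illustrative host
        props.put("acks", "all");                                 // consistent path: wait for replicas
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);

        try (BufferedReader in = new BufferedReader(new FileReader("/var/spool/events.log"))) {
            String line;
            while ((line = in.readLine()) != null) {
                byte[] message = Base64.getDecoder().decode(line);
                producer.send(new ProducerRecord<>("events.consistent", message));
            }
        }
        producer.close();
    }
}
```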

Page 19: Liveperson DLD 2015

Data Model Framework

Why Avro:
Schema-based evolution
Performance - untagged bytes
HDFS ecosystem support

Lessons Learned:
Schema evolution breaks
Big schema (ours is over 65k) not recommended
Avoid deep nesting and multiple unions
Need a framework

Chaos - non-schema, space-delimited
Order - Avro schema
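For example, a small, flat schema and a record built against it with Avro's generic API could look like the sketch below; the ChatEvent name and its fields are purely illustrative.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class AvroModelExample {
    // Illustrative schema only: flat and small, no deep nesting or multi-branch unions.
    private static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"ChatEvent\",\"namespace\":\"example.events\","
      + " \"fields\":["
      + "   {\"name\":\"eventId\",   \"type\":\"string\"},"
      + "   {\"name\":\"visitorId\", \"type\":\"string\"},"
      + "   {\"name\":\"timestamp\", \"type\":\"long\"},"
      + "   {\"name\":\"text\",      \"type\":[\"null\",\"string\"], \"default\":null}"
      + " ]}";

    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

        GenericRecord event = new GenericData.Record(schema);
        event.put("eventId", "e-123");
        event.put("visitorId", "v-456");
        event.put("timestamp", System.currentTimeMillis());
        event.put("text", "Hello");   // optional field: a simple two-branch union with null

        System.out.println(event);    // prints the record as JSON
    }
}
```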

Page 20: Liveperson DLD 2015

Framework Flow

1. An event is created according to Avro schema version 3.5
2. The schema is registered in the repository (once)
3. The value 3.5 is written to the message header
4. The event is encoded with schema version 3.5 and added to the message
5. The message is sent to Kafka
6. The message is read by a consumer
7. The header is read from the message
8. The schema is retrieved from the repository according to the schema version
9. The event is decoded using the proper Avro schema
10. The decoded event is processed
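A minimal sketch of the encode and decode steps, assuming the schema version is carried as a small numeric header and the repository is modeled as an in-memory map; the real framework writes the version (e.g. 3.5) into the message header and looks it up in a central schema repository.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class SchemaFramework {
    // Stand-in for the schema repository: version id -> registered schema.
    private final Map<Integer, Schema> repository = new HashMap<>();

    public void register(int versionId, Schema schema) {
        repository.put(versionId, schema);
    }

    // Steps 3-5: write the version header, then the Avro-encoded event.
    public byte[] encode(int versionId, GenericRecord event) throws IOException {
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(payload, null);
        new GenericDatumWriter<GenericRecord>(event.getSchema()).write(event, encoder);
        encoder.flush();
        byte[] body = payload.toByteArray();
        return ByteBuffer.allocate(4 + body.length).putInt(versionId).put(body).array();
    }

    // Steps 7-10: read the header, look up the schema, decode the event.
    public GenericRecord decode(byte[] message) throws IOException {
        ByteBuffer buffer = ByteBuffer.wrap(message);
        Schema writerSchema = repository.get(buffer.getInt());   // null if version is unknown
        byte[] body = new byte[buffer.remaining()];
        buffer.get(body);
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(body, null);
        return new GenericDatumReader<GenericRecord>(writerSchema).read(null, decoder);
    }
}
```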

Page 21: Liveperson DLD 2015

Apache Kafka

More than 15 billion events a day
More than 1 million events per second
Hundreds of producers & consumers

Why Kafka?
Scale where traditional MQs fail
Industry standard for big data log messaging
Reliable, flexible and easy to use

Deployment:
We have 15 clusters across the world
Our biggest cluster has 8 nodes with more than 6 TB (Avro + Kafka compression)
Maximum retention of 72 hours

Page 22: Liveperson DLD 2015

Apache Kafka. Lessons Learned

Scale horizontally for hardware resources and vertically for throughput
Watch the trends in network, IO and Kafka's JMX statistics

[Chart] Bytes in as partitions and servers grow

Page 23: Liveperson DLD 2015

Apache Kafka. Lessons Learned cont.

Know your data and message sizes:
Large messages can break you
Data growth can overfill your capacity
Set the right configuration (see the sketch after this list)

Adding or removing a broker is not trivial
Decide on single or multiple topics
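A sketch of size-related producer settings with illustrative values; these have to be kept in line with the broker's message.max.bytes and the consumers' fetch sizes, or large messages will be rejected mid-pipeline.

```java
import java.util.Properties;

public class KafkaSizing {
    // Producer-side cap on a single request; the values here are examples only.
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-1:9092");     // illustrative host
        props.put("max.request.size", "1048576");           // 1 MB; align with message.max.bytes
        props.put("compression.type", "snappy");            // shrink large Avro payloads
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        return props;
    }
}
```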

Page 24: Liveperson DLD 2015

Apache Storm

Why Storm?
Growing community with good integration with Kafka
At the time, it was the leading product
Easy development and customization
The POC was successful

Deployment:
We have 6 clusters across the world
Our biggest cluster has more than 30 nodes
We have 20 topologies on a single cluster
Uptime of months for a single topology

Page 25: Liveperson DLD 2015

Apache Storm. Typical topology

[Diagram] Storm topology: a Kafka spout fetches from the Kafka fast topic, tracking offsets in Zookeeper, and emits to a filter bolt, which emits to a writer bolt that writes and commits to the store; acks flow back from the bolts to the spout.
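A minimal sketch of such a topology using the Storm 0.9-era APIs; FilterBolt and WriterBolt stand in for the application bolts (a FilterBolt sketch follows the local-shuffling slide below), and the topic name, Zookeeper hosts and parallelism values are assumptions.

```java
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.ZkHosts;

public class FastTopicTopology {
    public static void main(String[] args) throws Exception {
        // Kafka spout reads the fast topic; offsets are tracked in Zookeeper.
        SpoutConfig spoutConfig =
            new SpoutConfig(new ZkHosts("zk-1:2181"), "events.fast", "/storm-offsets", "fast-reader");

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 4);
        // FilterBolt and WriterBolt are placeholders for the application bolts.
        builder.setBolt("filter", new FilterBolt(), 8).localOrShuffleGrouping("kafka-spout");
        builder.setBolt("writer", new WriterBolt(), 8).localOrShuffleGrouping("filter");

        Config conf = new Config();
        conf.setNumWorkers(4);
        StormSubmitter.submitTopology("fast-topic-topology", conf, builder.createTopology());
    }
}
```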

Page 26: Liveperson DLD 2015

Apache Storm. Lessons learned

Develop an SDK and educate R&D
Where did my topology run last week? What is my capacity over time?
Know your bolts: they must return a timely answer
Coding is easy, performance is hard
Use isolation

[Chart] Topology capacity over time

Page 27: Liveperson DLD 2015

Apache Storm. Lessons learned cont.

Use local shuffling
Use acks

[Diagram] Two workers (A and B), each running its own Kafka spout, filter bolt, writer bolt, acker bolt and communication bolt; with local shuffling, emits stay inside the same worker instead of crossing to the other one.
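A sketch of a filter bolt that anchors its emits to the input tuple and always acks or fails it, so the Kafka spout can replay on failure; the filtering rule is a placeholder. Combined with localOrShuffleGrouping, as in the topology sketch above, downstream emits prefer executors in the same worker and avoid inter-worker traffic.

```java
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// Hypothetical filter bolt: anchored emits plus explicit ack/fail.
public class FilterBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        byte[] event = input.getBinary(0);
        try {
            if (isRelevant(event)) {
                collector.emit(input, new Values(event));  // anchored emit: ties child to parent
            }
            collector.ack(input);                          // always answer in a timely manner
        } catch (Exception e) {
            collector.fail(input);                         // let the spout replay the tuple
        }
    }

    private boolean isRelevant(byte[] event) {
        return event != null && event.length > 0;          // placeholder filtering rule
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("event"));
    }
}
```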

Page 28: Liveperson DLD 2015

Summary

No one-size-fits-all solution
Ask product for a clearly defined SLA
Separate the fast and the consistent data flows - they don't merge!

Use a schema for the data model - keep it flat and small
Kafka rules! It's reliable and fast - use it
Storm has its toll. For some use cases we would use Spark Streaming today

Page 29: Liveperson DLD 2015

THANK YOU!

We are hiring

http://www.liveperson.com/company/careers

Q/A

Page 30: Liveperson DLD 2015

YouTube.com/LivePersonDev

Twitter.com/LivePersonDev

Facebook.com/LivePersonDev

Slideshare.net/LivePersonDev