NDC London 2017 - The Data Dichotomy: Rethinking Data and Services with Streams

Upload: ben-stopford

Post on 21-Jan-2018

Page 1: NDC London 2017  - The Data Dichotomy- Rethinking Data and Services with Streams

1

The Data Dichotomy: Rethinking data and services with streams

Ben Stopford

@benstopford

2

Build Features

Build for the Future

3

Evolution!

4

KAFKA

Serving Layer (Cassandra etc.)

Kafka Streams / KSQL

Streaming Platforms

Data is embedded in each engine

High Throughput Messaging

Clustered Java App

5

authorization_attempts possible_fraud

Streaming Example

6

CREATE STREAM possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING count(*) > 3;

authorization_attempts possible_fraud
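
The query above can be read as "count attempts per card in fixed five-minute buckets and keep the buckets with more than three". The following is a minimal plain-Java sketch of that logic, not KSQL or Kafka Streams code. It works over a finite list rather than a continuous stream, and the class name, the Attempt type, and the "card@windowStart" key format are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of the windowed count; all names here are hypothetical.
final class PossibleFraudSketch {
    static final long WINDOW_MS = 5 * 60 * 1000L; // WINDOW TUMBLING (SIZE 5 MINUTE)

    static final class Attempt {
        final String cardNumber;
        final long timestampMs; // assumed to be non-negative

        Attempt(String cardNumber, long timestampMs) {
            this.cardNumber = cardNumber;
            this.timestampMs = timestampMs;
        }
    }

    // Counts attempts per (card, window) and keeps only groups with more than 3.
    // Result keys have the form "cardNumber@windowStartMs".
    static Map<String, Integer> detect(List<Attempt> attempts) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Attempt a : attempts) {
            long windowStart = a.timestampMs - (a.timestampMs % WINDOW_MS);
            counts.merge(a.cardNumber + "@" + windowStart, 1, Integer::sum); // GROUP BY
        }
        counts.values().removeIf(count -> count <= 3); // HAVING count(*) > 3
        return counts;
    }

    public static void main(String[] args) {
        List<Attempt> attempts = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            attempts.add(new Attempt("1234", i * 60_000L));
        }
        attempts.add(new Attempt("9999", 1_000L));
        System.out.println(detect(attempts));
    }
}
```

A streaming engine would keep these counts up to date as each event arrives instead of recomputing over a list.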

12

Streaming == Manipulating Data in Flight

13

Business Applications

14

Ecosystems

App

Increasingly we build ecosystems

15

SOA / Microservices / EDA

Customer Service

Shipping Service

16

The Problem is DATA

17

Most services share the same core facts.

Catalog

Most services live in here

18

Orders Service

Payments Service

Customers Service

Data becomes spread out and we need to bring it together

Useful Grid

19

Service A Service B

Service C

One option is to share a database

20

Service A Service B

Service C

Databases provide a very rich form of coupling

21

Two different forces compete in our designs

22

Single Sign On

Business Service

authorise(),

We are taught to encapsulate

LOOSE COUPLING!

23

But data systems have little to do with encapsulation

24

Service: data on inside, data on outside. Interface hides data.

Database: data on inside, data on outside. Interface amplifies data.

Databases amplify the data they hold

25

The data dichotomy

Data systems are about exposing data.

Services are about hiding it.

26

Microservices shouldn’t share a database!

27

Tension

We want all the good stuff which comes with a database.

We don’t want to share that database with anyone else.

But we do want to share datasets in a sensible way.

28

So how do we share data between services?

Orders Service

Shipping Service

Customer Service

Webserver

29

Buying an iPad (with REST)

SubmitOrder

shipOrder() getCustomer()

Orders Service

Shipping Service

Customer Service

Webserver

30

Buying an iPad with Events

Message Broker (Kafka)

Notification

Data is replicated (incrementally)

SubmitOrder

Order Created

Customer Updated

Orders Service

Shipping Service

Customer Service

Webserver

KAFKA

31

Events for Notification Only

Message Broker (Kafka)

SubmitOrder

Order Created

getCustomer() REST

Notification

Orders Service

Shipping Service

Customer Service

Webserver

KAFKA

32

Events for Data Locality

Customer Updated

SubmitOrder

Order Created

Data is replicated

Orders Service

Shipping Service

Customer Service

Webserver

KAFKA

33

Events have two hats

Notification

Data replication

34

Events are the key to scalable service ecosystems

35

Streaming is the toolset for dealing with events as they move!

36

Streaming Platform

The Log

Connectors

Connectors

Producer Consumer

Streaming Engine

38

What is a Distributed Log?

39

Shard on the way in

Producing Services

Kafka

Consuming Services

40

Each shard is a queue

Producing Services

Kafka

Consuming Services

41

Consumers share load

Producing Services

Kafka

Consuming Services
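
The three slides above (shard on the way in, each shard is a queue, consumers share load) can be sketched in plain Java as follows. This is an illustration only: Kafka's own default partitioner and assignment strategies differ in detail, and the class name, the use of String.hashCode, and the round-robin assignment are assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of key-based sharding; all names here are hypothetical.
final class ShardingSketch {
    // "Shard on the way in": the same key always maps to the same shard,
    // so each shard behaves as an ordered queue for its keys.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // "Consumers share load": shards are divided among the consumers of a group.
    // Round-robin is used here purely for simplicity.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            assignment.get(consumers.get(p % consumers.size())).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("customer-42", 4));
        System.out.println(assign(Arrays.asList("consumer-a", "consumer-b"), 4));
    }
}
```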

42

A log can be Rewound and Replayed

Rewind & Replay
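
Rewind and replay follow from the fact that reading does not remove records: a consumer is just a position in the log. A minimal plain-Java sketch, assuming an in-memory list in place of a real log (the class and method names are hypothetical, not Kafka's API):

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of an append-only log with a movable read position.
final class ReplayableLogSketch {
    private final List<String> records = new ArrayList<>();

    // Appends a record and returns its offset.
    int append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    static final class Consumer {
        private final ReplayableLogSketch log;
        private int position = 0;

        Consumer(ReplayableLogSketch log) {
            this.log = log;
        }

        // Returns the next record, or null when the consumer has caught up.
        String poll() {
            return position < log.records.size() ? log.records.get(position++) : null;
        }

        // Rewind (or skip ahead) by moving the read position.
        void seek(int offset) {
            position = offset;
        }
    }

    public static void main(String[] args) {
        ReplayableLogSketch log = new ReplayableLogSketch();
        log.append("Order Requested");
        log.append("Order Validated");
        Consumer consumer = new Consumer(log);
        System.out.println(consumer.poll());
        consumer.seek(0); // rewind and read the same record again
        System.out.println(consumer.poll());
    }
}
```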

43

Compacted Log (retains only the latest version)

Version 3

Version 2

Version 1

Version 2

Version 1

Version 5

Version 4

Version 3

Version 2

Version 1
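
Compaction, as pictured above, keeps only the most recent version for each key. The sketch below shows that rule in plain Java over an in-memory list. It is a simplification: it compacts in one pass and ignores details of the real mechanism such as when compaction runs or how deletions are represented. The class name is hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of "retain only the latest version per key".
final class CompactionSketch {
    static List<Map.Entry<String, String>> compact(List<Map.Entry<String, String>> log) {
        // First pass: remember the offset of the last record seen for each key.
        Map<String, Integer> lastOffset = new HashMap<>();
        for (int i = 0; i < log.size(); i++) {
            lastOffset.put(log.get(i).getKey(), i);
        }
        // Second pass: keep a record only if it is the last one for its key.
        List<Map.Entry<String, String>> compacted = new ArrayList<>();
        for (int i = 0; i < log.size(); i++) {
            if (lastOffset.get(log.get(i).getKey()) == i) {
                compacted.add(log.get(i));
            }
        }
        return compacted;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> log = new ArrayList<>();
        log.add(new SimpleEntry<>("customer-1", "Version 1"));
        log.add(new SimpleEntry<>("customer-1", "Version 2"));
        log.add(new SimpleEntry<>("customer-2", "Version 1"));
        System.out.println(compact(log));
    }
}
```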

44

Streaming Platform

The Log

Connectors

Connectors

Producer Consumer

Streaming Engine

45

Kafka Connect

Kafka Connect

Kafka Connect

Kafka

46

Streaming Platform

The Log

Connectors

Connectors

Producer Consumer

Streaming Engine

47

A database engine for data-in-flight

48

SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTE)
GROUP BY card_number
HAVING count(*) > 3;

Continuously Running Queries

49

Features: similar to database query engine

Join

Filter

Aggregate

View

Window

50

Compacted Topic

Join

Stream

Table

Kafka

Kafka Streams / KSQL

Topic

Join Streams and Tables

51

Handle Asynchronicity

In an asynchronous world, will the payment come first, or the order?

KAFKA

Buffer 5 mins

Join by Key

52

Handle Asynchronicity

KAFKA

Buffer 5 mins

Join by Key

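KStream orders = builder.stream("Orders");
KStream payments = builder.stream("Payments");

// Join orders to payments by key within the window, whichever arrives first, then email the pair.
orders.join(payments, KeyValue::new, JoinWindows.of(1 * MIN))
      .peek((key, pair) -> emailer.sendMail(pair));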
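
To show what "buffer and join by key" means when either side may arrive first, here is a minimal plain-Java sketch. It is not the Kafka Streams implementation: it keeps only the latest timestamp per key on each side and never evicts old entries, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of a windowed join between two streams of events.
final class WindowedJoinSketch {
    private final long windowMs;
    private final Map<String, Long> orders = new HashMap<>();   // orderId -> arrival time
    private final Map<String, Long> payments = new HashMap<>(); // orderId -> arrival time
    private final List<String> joined = new ArrayList<>();

    WindowedJoinSketch(long windowMs) {
        this.windowMs = windowMs;
    }

    void onOrder(String orderId, long timestampMs) {
        arrive(orderId, timestampMs, orders, payments);
    }

    void onPayment(String orderId, long timestampMs) {
        arrive(orderId, timestampMs, payments, orders);
    }

    // Buffer the event on its own side, then look for a partner on the other side.
    private void arrive(String key, long ts, Map<String, Long> ownSide, Map<String, Long> otherSide) {
        ownSide.put(key, ts);
        Long otherTs = otherSide.get(key);
        if (otherTs != null && Math.abs(ts - otherTs) <= windowMs) {
            joined.add(key); // both halves seen within the window
        }
    }

    List<String> joined() {
        return joined;
    }

    public static void main(String[] args) {
        WindowedJoinSketch join = new WindowedJoinSketch(60_000L);
        join.onPayment("order-1", 1_000L); // payment arrives first
        join.onOrder("order-1", 5_000L);   // order arrives later, still inside the window
        System.out.println(join.joined());
    }
}
```

The order of arrival does not matter because each side is buffered until its partner shows up or the window has passed.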

53

KAFKA

Join

A KTable is just a stream with infinite retention

54

A KTable is just a stream with infinite retention

KStream orders = builder.stream("Orders");
KStream payments = builder.stream("Payments");
KTable customers = builder.table("Customers");

// Join orders to payments within the window, enrich with the customer table, then email the result.
orders.join(payments, EmailTuple::new, JoinWindows.of(1 * MIN))
      .join(customers, (tuple, cust) -> tuple.setCust(cust))
      .peek((key, tuple) -> emailer.sendMail(tuple));

Materialize a table in two lines of code!
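
Conceptually, materializing a table means keeping the latest value per key from a stream of updates, so that other events can be enriched by a local lookup. The following plain-Java sketch illustrates that idea only. It does not use the Kafka Streams API, and the class name, method names, and the email field are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration of joining a stream of orders against a materialized table.
final class StreamTableJoinSketch {
    // The "table": latest email per customer id, built from Customer Updated events.
    private final Map<String, String> customers = new HashMap<>();

    void onCustomerUpdated(String customerId, String email) {
        customers.put(customerId, email); // a newer version replaces the older one
    }

    // Enriches an order with the customer's current email.
    // Returns null when the customer is unknown, as an inner join would emit nothing.
    String onOrder(String orderId, String customerId) {
        String email = customers.get(customerId);
        return email == null ? null : orderId + " -> " + email;
    }

    public static void main(String[] args) {
        StreamTableJoinSketch join = new StreamTableJoinSketch();
        join.onCustomerUpdated("customer-1", "first@example.com");
        System.out.println(join.onOrder("order-1", "customer-1"));
    }
}
```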

55

KAFKA

Emailer

With KSQL and Node.js

Create stream ToEmail From Orders, Payment, Customer where …

56

Scales Out

57

Streaming is about

1. Processing data incrementally

2. Moving data to where it needs to be processed (quickly and efficiently)

On Notification

Data Replication

58

Steps to Streaming Services

59

1. Take Responsibility for the past and evolve

60

Stay Simple. Take Responsibility for the past

Browser

Webserver

61

Evolve Forwards

Browser

Webserver

Orders Service

62

2. Raise events. Don’t talk to services.

63

Raise events. Don’t talk to services

Browser

Webserver

Orders Service

64

KAFKA

Order Requested

Order Received

Browser

Webserver

Orders Service

Raise events. Don’t talk to services

65

KAFKA

Order Requested

Order Validated

Order Received

Browser

Webserver

Orders Service

Raise events. Don’t talk to services

66

KAFKA

Order Requested

Order Validated

Order Received

Browser

Webserver

Orders Service

Use Kafka as a Backbone for Events

67

3. Use Connect (& CDC) to evolve away from legacy

68

KAFKA

Order Requested

Order Validated

Order Received

Browser

Webserver

Orders Service

Evolve away from Legacy

69

KAFKA

Order Requested

Order Validated

Order Received

Browser

Webserver

Orders Service

Use the Database as a ‘Seam’

Connect

Products

70

4. Make use of Schemas

71

KAFKA

Order Requested

Order Validated

Order Received

Browser

Webserver

Orders Service

Schemas are your API

Connect

Products

Schema Registry

72

5. Use the Single Writer Principle

73

KAFKA

Order Requested

Order Validated

Order Received

Browser

Webserver

Orders Service

Apply the single writer principle

Connect

Products

Schema Registry

Order Completed

74

Orders Service

Email Service

T1 T2

T3

T4

REST Service

T5

Single Writer Principle

75

Single Writer Principle

- Creates local consistency points in the absence of Global Consistency

- Makes schema upgrades easier to manage.

76

6. Store Datasets in the Log

77

Messaging that Remembers

Orders Customers

Payments Stock

78

KAFKA

Order Requested

Order Validated

Order Received

Browser

Webserver

Orders Service

New Service, No Problem!

Connect

Products

Schema Registry

Order Completed

Repricing

79

Orders Customers

Payments Stock

Single, Shared Source of Truth

80

But how do you query a log?

81

7. Move Data to Code

83

Connect

Order Requested

Order Validated

Order Completed

Order Received

Products

Browser

Webserver

Schema Registry

Orders Service

Stock

Stock

Materialize Stock ‘View’ Inside Service

KAFKA

84

Connect

Order Requested

Order Validated

Order Completed

Order Received

Products

Browser

Webserver

Schema Registry

Orders Service

Stock

Stock

Take only the data we need

KAFKA

85

Data Movement

Be realistic:
• Network is no longer the bottleneck
• Indexing is:
  • In memory indexes help
  • Keep datasets focused

86

8. Use the log as a ‘database’

87

Connect

Order Requested

Order Validated

Order Completed

Order Received

Products

Browser

Webserver

Schema Registry

Orders Service

Reserved Stocks

Stock

Stock

Reserved Stocks

Apply Event Sourcing

KAFKA

Table

88

Connect

Order Requested

Order Validated

Order Completed

Order Received

Products

Browser

Webserver

Schema Registry

Orders Service

Reserved Stocks

Stock

Stock

Reserved Stocks

Order Service Loads Reserved Stocks on Startup
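
Loading state on startup amounts to replaying the retained events and folding them into an in-memory table. A minimal plain-Java sketch of that fold, assuming a hypothetical StockReserved event with a product and a quantity (none of these names come from the talk's code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of rebuilding state by replaying an event log.
final class ReservedStocksSketch {
    static final class StockReserved {
        final String product;
        final int quantity;

        StockReserved(String product, int quantity) {
            this.product = product;
            this.quantity = quantity;
        }
    }

    // Replays every event from the start of the log and sums reservations per product.
    static Map<String, Integer> replay(List<StockReserved> events) {
        Map<String, Integer> reserved = new TreeMap<>();
        for (StockReserved event : events) {
            reserved.merge(event.product, event.quantity, Integer::sum);
        }
        return reserved;
    }

    public static void main(String[] args) {
        List<StockReserved> log = new ArrayList<>();
        log.add(new StockReserved("iPad", 1));
        log.add(new StockReserved("iPad", 1));
        System.out.println(replay(log));
    }
}
```

Because the table is derived entirely from the log, the service can discard it and rebuild it at any time.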

KAFKA

89

Kafka has several features for reducing the need to move data on startup

- Standby Replicas
- Disk Checkpoints
- Compacted topics

90

9. Use Transactions to tie All Interactions Together

91

OrderRequested(IPad)

2a. Order Validated

2b. IPad Reserved

2c. Offset Commit

Internal State: Stock = 17, Reservations = 2

Tie Events & State with Transactions
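
The point of the slide is that steps 2a, 2b and 2c become visible together or not at all. The sketch below illustrates that all-or-nothing behaviour with an in-memory stand-in for a broker. It is not Kafka's transactional API, and every name in it (TransactionSketch, Txn, send, sendOffset, the topic names) is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of atomically publishing events together with an offset commit.
final class TransactionSketch {
    private final Map<String, List<String>> topics = new HashMap<>();
    private long committedOffset = -1L;

    final class Txn {
        private final List<String[]> pending = new ArrayList<>(); // {topic, event}
        private long offset = -1L;

        void send(String topic, String event) {
            pending.add(new String[] {topic, event}); // buffered, not yet visible
        }

        void sendOffset(long consumedOffset) {
            offset = consumedOffset;
        }

        // Makes all buffered events and the offset visible in one step.
        void commit() {
            for (String[] write : pending) {
                topics.computeIfAbsent(write[0], t -> new ArrayList<>()).add(write[1]);
            }
            if (offset >= 0) {
                committedOffset = offset;
            }
            pending.clear();
        }

        // Discards everything buffered; readers never see any of it.
        void abort() {
            pending.clear();
            offset = -1L;
        }
    }

    Txn begin() {
        return new Txn();
    }

    List<String> read(String topic) {
        return topics.getOrDefault(topic, new ArrayList<>());
    }

    long committedOffset() {
        return committedOffset;
    }

    public static void main(String[] args) {
        TransactionSketch broker = new TransactionSketch();
        Txn txn = broker.begin();
        txn.send("order-validations", "Order Validated"); // 2a
        txn.send("reserved-stocks", "IPad Reserved");     // 2b
        txn.sendOffset(0L);                                // 2c
        txn.commit();
        System.out.println(broker.read("order-validations") + " " + broker.committedOffset());
    }
}
```

If the service fails before commit, the input event is simply processed again, and no half-finished output is left behind.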

92

Connect

TRANSACTION

Order Requested

Order Validated

Order Completed

Order Received

Products

Browser

Webserver

Schema Registry

Orders Service

Reserved Stocks

Stock

Stock

Reserved Stocks

Transactions

KAFKA

93

10. Bridge the Sync/Async Divide with a Streaming Ecosystem

94

Services in the Micro: Orders Service

Find the code online!

[Diagram labels: POST, GET, Load Balancer. Orders Service (C in CQRS), Orders View (Q in CQRS). KAFKA topics: ORDERS, OV TOPIC (Order Validations), INVENTORY. Fraud Service, Order Details Service, Inventory Service (see previous figure). Events: Order Created, Order Validated. Other labels: Orders, Inventory.]

95

Orders Customers

Payments Stock

Each service is optimized for autonomy

A Database Inside Out

HISTORICAL EVENT STREAMS

96

Kafka

KAFKA

New York

Tokyo

London

Global / Disconnected Ecosystems

97

So…

98

Good architectures have little to do with this:

99

It’s about how systems evolve over time

100

Request driven isn’t enough

• High coupling
• Hard to handle async flows
• Hard to move and join datasets.

101

Leverage the Duality of Events

Notification

Data replication

102

With a toolset built for data in flight

103

The data dichotomy

Data systems are about exposing data.

Services are about hiding it.

Remember the data dichotomy

104

The Data Dichotomy

We want all the good stuff which comes with a database.

We don’t want to share that database with anyone else.

But we do want to share datasets in a sensible way.

105

• Broadcast events
• Retain them in the log
• Compose streaming functions
• Recast the event stream into views when you need to query.

Event Driven Services

106

Services built on a Streaming Platform

107

Thank You

@benstopford

Blog Series: https://www.confluent.io/blog/tag/microservices/

Code: https://github.com/confluentinc/kafka-streams-examples