NDC London 2017 - The Data Dichotomy: Rethinking Data and Services with Streams
TRANSCRIPT
1
The Data Dichotomy: Rethinking data and services with streams
Ben Stopford
@benstopford
2
Build Features
Build for the Future
3
Evolution!
4
KAFKA
ServingLayer
(Cassandra etc.)
Kafka Streams / KSQL
Streaming Platforms
Data is embedded in each engine
High Throughput Messaging
Clustered Java App
5
authorization_attempts possible_fraud
Streaming Example
6
CREATE STREAM possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING count(*) > 3;
authorization_attempts possible_fraud
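The query above can be read as a counting loop over fixed five-minute buckets. The following is a minimal plain-Java sketch of that reading, not how KSQL is implemented. The class `PossibleFraud`, the `Attempt` type and the `detect` method are hypothetical names, timestamps are assumed to be epoch milliseconds, and the sketch runs over a finished list where a streaming engine would update its counts incrementally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: counts authorization attempts per card in tumbling
// (fixed, non-overlapping) windows and flags cards with more than 3 attempts.
final class PossibleFraud {
    static final long WINDOW_MS = 5 * 60 * 1000L; // SIZE 5 MINUTE

    // One authorization attempt: a card number and an event time in epoch millis.
    static final class Attempt {
        final String cardNumber;
        final long timestampMs;

        Attempt(String cardNumber, long timestampMs) {
            this.cardNumber = cardNumber;
            this.timestampMs = timestampMs;
        }
    }

    // Returns "cardNumber@windowStart" for every (card, window) whose count exceeds 3.
    static List<String> detect(List<Attempt> attempts) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Attempt a : attempts) {
            // Bucket the event into the window that contains its timestamp.
            long windowStart = (a.timestampMs / WINDOW_MS) * WINDOW_MS;
            // GROUP BY card_number (per window) and count(*).
            counts.merge(a.cardNumber + "@" + windowStart, 1, Integer::sum);
        }
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 3) { // HAVING count(*) > 3
                flagged.add(e.getKey());
            }
        }
        return flagged;
    }
}
```

Four attempts on one card inside the same window are flagged; a fifth attempt that falls into the next window starts a new count.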
12
Streaming == Manipulating Data in Flight
13
Business Applications
14
Ecosystems
App
Increasingly we build ecosystems
15
SOA / Microservices / EDA
CustomerService
ShippingService
16
The Problem is DATA
17
Most services share the same core facts.
Catalog
Most services live in here
18
OrdersService
PaymentsService
CustomersService
Data becomes spread out and we need to bring it together
Useful Grid
19
Service A Service B
Service C
One option is to share a database
20
Service A Service B
Service C
Databases provide a very rich form of coupling
21
Two different forces compete in our designs
22
Single Sign On Business Service
authorise(),
We are taught to encapsulate
LOOSE COUPLING!
23
But data systems have little to do with encapsulation
24
Service Database
Data on inside
Data on outside
Data on inside
Data on outside
Interface hides data
Interface amplifies data
Databases amplify the data they hold
25
The data dichotomy
Data systems are about exposing data.
Services are about hiding it.
26
Microservices shouldn’t share a database!
27
Tension
We want all the good stuff which comes with a database.
We don’t want to share that database with anyone else.
But we do want to share datasets in a sensible way.
28
So how do we share data between services?
OrdersService
ShippingService
CustomerService
Webserver
29
Buying an iPad (with REST)
SubmitOrder
shipOrder() getCustomer()
OrdersService
ShippingService
CustomerService
Webserver
30
Buying an iPad with Events
Message Broker (Kafka)
Notification Data is replicated
(incrementally)
SubmitOrder
Order Created
Customer Updated
OrdersService
ShippingService
CustomerService
Webserver
KAFKA
31
Events for Notification Only
Message Broker (Kafka)
SubmitOrder
Order Created
getCustomer() REST
Notification
OrdersService
ShippingService
CustomerService
Webserver
KAFKA
32
Events for Data Locality
Customer Updated
SubmitOrder
Order Created
Data is replicated
OrdersService
ShippingService
CustomerService
Webserver
KAFKA
33
Events have two hats
Notification Data replication
34
Events are the key to scalable service ecosystems
35
Streaming is the toolset for dealing with events as they move!
36
Streaming Platform
The Log Connectors Connectors
Producer Consumer
Streaming Engine
38
What is a Distributed Log?
39
Shard on the way in
ProducingServices
Kafka
ConsumingServices
40
Each shard is a queue
ProducingServices
Kafka
ConsumingServices
41
Consumers share load
ProducingServices
Kafka
ConsumingServices
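The last three slides (shard on the way in, each shard is a queue, consumers share load) can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not Kafka's code: the class `Sharding` and both methods are hypothetical, `String.hashCode()` stands in for whatever hash the broker client really uses, and round-robin is only one possible assignment strategy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of sharding on the way in and sharing load on the way out.
final class Sharding {
    // Producer side: the same key always maps to the same partition (shard),
    // so all records for one key stay in order inside one queue.
    // Assumption: String.hashCode() stands in for the real hash function.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Consumer side: spread partitions round-robin over the consumers of a
    // group, so each partition is read by exactly one consumer.
    static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        List<List<Integer>> assignment = new ArrayList<>();
        for (int c = 0; c < numConsumers; c++) {
            assignment.add(new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            assignment.get(p % numConsumers).add(p);
        }
        return assignment;
    }
}
```

With four partitions and two consumers, the first consumer reads partitions 0 and 2 and the second reads 1 and 3.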
42
A log can be Rewound and Replayed
Rewind & Replay
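Rewind and replay works because reading does not remove anything from the log; each reader only tracks its own offset. A minimal in-memory sketch of that idea follows. `ReplayableLog`, `Reader`, `poll` and `seek` are hypothetical names chosen for illustration and are not the Kafka consumer API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory log: records are appended and never removed by reads,
// so a reader can move its offset back and consume the same records again.
final class ReplayableLog {
    private final List<String> records = new ArrayList<>();

    // Appends a record and returns the offset it was stored at.
    long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    // A reader owns its position; the log itself keeps no per-reader state.
    final class Reader {
        private int offset = 0;

        // Returns every record from the current offset to the end, then advances.
        List<String> poll() {
            List<String> batch = new ArrayList<>(records.subList(offset, records.size()));
            offset = records.size();
            return batch;
        }

        // Rewind: the next poll() starts again from the given offset.
        // Assumes 0 <= newOffset <= number of records; no bounds check in this sketch.
        void seek(int newOffset) {
            offset = newOffset;
        }
    }

    Reader newReader() {
        return new Reader();
    }
}
```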
43
Compacted Log (retains only latest version)
Version 3
Version 2
Version 1
Version 2
Version 1
Version 5
Version 4
Version 3
Version 2
Version 1
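Compaction can be pictured as a pass over the log that keeps only the newest record for each key. The sketch below is an assumption-laden illustration in plain Java: `Compaction.compact` is a hypothetical helper, records are modelled as `{key, value}` string pairs, and it ignores details a real broker deals with, such as segments and delete markers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of log compaction: keep only the latest value per key.
final class Compaction {
    // Each record is a {key, value} pair; later records supersede earlier ones.
    static List<String[]> compact(List<String[]> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] record : log) {
            // Remove first so the key moves to the position of its newest record.
            latest.remove(record[0]);
            latest.put(record[0], record[1]);
        }
        List<String[]> compacted = new ArrayList<>();
        for (Map.Entry<String, String> e : latest.entrySet()) {
            compacted.add(new String[] {e.getKey(), e.getValue()});
        }
        return compacted;
    }
}
```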
44
Streaming Platform
The Log Connectors Connectors
Producer Consumer
Streaming Engine
45
Kafka Connect
KafkaConnect
KafkaConnect
Kafka
46
Streaming Platform
The Log Connectors Connectors
Producer Consumer
Streaming Engine
47
A database engine for data-in-flight
48
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW (SIZE 5 MINUTE)
GROUP BY card_number
HAVING count(*) > 3;
Continuously Running Queries
49
Features: similar to database query engine
Join
Filter
Aggregate
View
Window
50
Compacted Topic
Join
Stream
Table
Kafka
Kafka Streams / KSQL
Topic
Join Streams and Tables
51
Handle Asynchronicity
In an asynchronous world, will the payment come first, or the order?
KAFKA
Buffer 5 mins
Join by Key
52
Handle Asynchronicity
KAFKA
Buffer 5 mins
Join by Key
// Join orders and payments by key, whichever arrives first
KStream orders = builder.stream("Orders");
KStream payments = builder.stream("Payments");
orders.join(payments, KeyValue::new, JoinWindows.of(1 * MIN))
      .peek((key, pair) -> emailer.sendMail(pair));
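What the buffered join does underneath can be sketched without Kafka. Each side is buffered, and every new arrival is matched against the other side's buffer by key and time distance, so it does not matter whether the payment or the order comes first. `WindowedJoin`, `Event`, `onOrder` and `onPayment` are hypothetical names, and the sketch never expires old events, which a real engine would do once the window has passed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a windowed stream-stream join: an order and a payment
// are paired when they share a key and lie within the window of each other.
final class WindowedJoin {
    static final class Event {
        final String key;
        final long timestampMs;
        final String value;

        Event(String key, long timestampMs, String value) {
            this.key = key;
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    private final long windowMs;
    private final List<Event> bufferedOrders = new ArrayList<>();
    private final List<Event> bufferedPayments = new ArrayList<>();

    WindowedJoin(long windowMs) {
        this.windowMs = windowMs;
    }

    // Buffers the order and returns "order+payment" for every matching payment seen so far.
    List<String> onOrder(Event order) {
        bufferedOrders.add(order);
        return match(order, bufferedPayments, true);
    }

    // Buffers the payment and returns "order+payment" for every matching order seen so far.
    List<String> onPayment(Event payment) {
        bufferedPayments.add(payment);
        return match(payment, bufferedOrders, false);
    }

    private List<String> match(Event arrived, List<Event> otherSide, boolean arrivedIsOrder) {
        List<String> joined = new ArrayList<>();
        for (Event other : otherSide) {
            boolean sameKey = other.key.equals(arrived.key);
            boolean inWindow = Math.abs(other.timestampMs - arrived.timestampMs) <= windowMs;
            if (sameKey && inWindow) {
                joined.add(arrivedIsOrder
                        ? arrived.value + "+" + other.value
                        : other.value + "+" + arrived.value);
            }
        }
        return joined;
    }
}
```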
53
KAFKA
Join
A KTable is just a stream with infinite retention
54
A KTable is just a stream with infinite retention
KStream orders = builder.stream("Orders");
KStream payments = builder.stream("Payments");
KTable customers = builder.table("Customers");
orders.join(payments, EmailTuple::new, JoinWindows.of(1 * MIN))
      .join(customers, (tuple, cust) -> tuple.setCust(cust))
      .peek((key, tuple) -> emailer.sendMail(tuple));
Materialize a table in two lines of code!
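The table side of that join can be thought of as "latest value per key", built by applying every update in order, after which enriching a stream record is a local lookup rather than a remote call. The sketch below assumes exactly that simplified model. `CustomerTable`, `onCustomerUpdated` and `enrich` are hypothetical names, and customers are reduced to an email string.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a "table" is the latest value per key, built by
// applying a stream of updates; a stream-table join is then a local lookup.
final class CustomerTable {
    private final Map<String, String> emailByCustomerId = new HashMap<>();

    // Apply one "Customer Updated" event; newer values overwrite older ones.
    void onCustomerUpdated(String customerId, String email) {
        emailByCustomerId.put(customerId, email);
    }

    // Enrich an order with the customer's current email.
    // Returns null when the customer is unknown, like an inner join that emits nothing.
    String enrich(String orderId, String customerId) {
        String email = emailByCustomerId.get(customerId);
        return email == null ? null : orderId + " -> " + email;
    }
}
```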
55
KAFKA
Emailer
With KSQL and Node.js
Create stream ToEmail From Orders, Payment, Customer where …
56
Scales Out
57
Streaming is about
1. Processing data incrementally
2. Moving data to where it needs to be processed (quickly and efficiently)
On Notification
Data Replication
58
Steps to Streaming Services
59
1. Take Responsibility for the past and evolve
60
Stay Simple. Take Responsibility for the past
Browser
Webserver
61
Evolve Forwards
Browser
Webserver
OrdersService
62
2. Raise events. Don’t talk to services.
63
Raise events. Don’t talk to services
Browser
Webserver
OrdersService
64
KAFKA
Order Requested
Order Received
Browser
Webserver
OrdersService
Raise events. Don’t talk to services
65
KAFKA
Order Requested
Order Validated
Order Received
Browser
Webserver
OrdersService
Raise events. Don’t talk to services
66
KAFKA
Order Requested
Order Validated
Order Received
Browser
Webserver
OrdersService
Use Kafka as a Backbone for Events
67
3. Use Connect (& CDC) to evolve away from legacy
68
KAFKA
Order Requested
Order Validated
Order Received
Browser
Webserver
OrdersService
Evolve away from Legacy
69
KAFKA
Order Requested
Order Validated
Order Received
Browser
Webserver
OrdersService
Use the Database as a ‘Seam’
Connect
Products
70
4. Make use of Schemas
71
KAFKA
Order Requested
Order Validated
Order Received
Browser
Webserver
OrdersService
Schemas are your API
Connect
Products
Schema Registry
72
5. Use the Single Writer Principle
73
KAFKA
Order Requested
Order Validated
Order Received
Browser
Webserver
OrdersService
Apply the single writer principle
Connect
Products
Schema Registry
Order Completed
74
Orders Service
EmailService
T1 T2
T3
T4
RESTService
T5
Single Writer Principle
75
Single Writer Principle
- Creates local consistency points in the absence of Global Consistency
- Makes schema upgrades easier to manage.
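The principle is a convention: one service owns each topic and is the only one that writes to it, while anyone may read. The sketch below enforces the convention in code purely for illustration. `SingleWriterTopic` is a hypothetical class, and the ownership check is an assumption of this sketch rather than something the talk says the broker does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: a topic that accepts writes from exactly one owning service.
final class SingleWriterTopic {
    private final String owner;
    private final List<String> events = new ArrayList<>();

    SingleWriterTopic(String owner) {
        this.owner = owner;
    }

    // Only the owner may append; every other service is rejected.
    void append(String service, String event) {
        if (!owner.equals(service)) {
            throw new IllegalStateException(service + " does not own this topic");
        }
        events.add(event);
    }

    // Any service may read, but only through a read-only view.
    List<String> read() {
        return Collections.unmodifiableList(events);
    }
}
```

Because every change to the topic passes through one service, that service is the single place where ordering, validation and schema changes are decided.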
76
6. Store Datasets in the Log
77
Messaging that Remembers
Orders Customers
Payments Stock
78
KAFKA
Order Requested
Order Validated
Order Received
Browser
Webserver
OrdersService
New Service, No Problem!
Connect
Products
Schema Registry
Order Completed Repricing
79
Orders Customers
Payments Stock
Single, Shared Source of Truth
80
But how do you query a log?
81
7. Move Data to Code
82
83
Connect
Order Requested
Order Validated
Order Completed
Order Received
Products
Browser
Webserver
Schema Registry
OrdersService Stock
Stock
Materialize Stock ‘View’ Inside Service
KAFKA
84
Connect
Order Requested
Order Validated
Order Completed
Order Received
Products
Browser
Webserver
Schema Registry
OrdersService Stock
Stock
Take only the data we need
KAFKA
85
Data Movement
Be realistic:
• Network is no longer the bottleneck
• Indexing is:
  • In memory indexes help
  • Keep datasets focused
86
8. Use the log as a ‘database’
87
Connect
Order Requested
Order Validated
Order Completed
Order Received
Products
Browser
Webserver
Schema Registry
OrdersService
Reserved Stocks
Stock
Stock
Reserved Stocks
Apply Event Sourcing
KAFKA
Table
88
Connect
Order Requested
Order Validated
Order Completed
Order Received
Products
Browser
Webserver
Schema Registry
OrdersService
Reserved Stocks
Stock
Stock
Reserved Stocks
Order Service Loads Reserved Stocks on Startup
KAFKA
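Loading reserved stock on startup is event sourcing in miniature: the service keeps no separate copy of its state and rebuilds it by replaying the reservation events from the log. A minimal sketch follows, assuming each event carries a product name and a signed quantity. `ReservedStock`, `Reservation` and `replay` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of event sourcing: state is rebuilt by replaying
// the reservation events from the log instead of being stored directly.
final class ReservedStock {
    // One reservation event; a negative quantity releases a reservation.
    static final class Reservation {
        final String product;
        final int quantity;

        Reservation(String product, int quantity) {
            this.product = product;
            this.quantity = quantity;
        }
    }

    private final Map<String, Integer> reservedByProduct = new HashMap<>();

    // Fold one event into the current state.
    void apply(Reservation r) {
        reservedByProduct.merge(r.product, r.quantity, Integer::sum);
    }

    // On startup: replay the whole log, oldest event first.
    static ReservedStock replay(List<Reservation> log) {
        ReservedStock state = new ReservedStock();
        for (Reservation r : log) {
            state.apply(r);
        }
        return state;
    }

    int reserved(String product) {
        return reservedByProduct.getOrDefault(product, 0);
    }
}
```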
89
Kafka has several features for reducing the need to move data on startup
- Standby Replicas
- Disk Checkpoints
- Compacted topics
90
9. Use Transactions to tie All Interactions Together
91
OrderRequested(IPad)
2a. Order Validated
2b. IPad Reserved
2c. Offset Commit
Internal State:
Stock = 17
Reservations = 2
Tie Events & State with Transactions
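The point of the slide is all-or-nothing behaviour: the output events, the state change and the offset commit become visible together or not at all. The sketch below models only that behaviour in plain Java and is not Kafka's transactional API. `OrderTransaction` and its methods are hypothetical, and the starting value of 2 reservations in the usage is borrowed from the slide purely as sample data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: staged writes become visible only on commit, so the
// output events, the state change and the offset move together or not at all.
final class OrderTransaction {
    // Committed side: what other services and a restarted instance would see.
    final List<String> committedEvents = new ArrayList<>();
    int committedReservations;
    long committedOffset;

    // Staged side: invisible until commit(), discarded on abort().
    private final List<String> stagedEvents = new ArrayList<>();
    private int stagedReservations;
    private long stagedOffset;

    OrderTransaction(int reservations, long offset) {
        committedReservations = reservations;
        committedOffset = offset;
        begin();
    }

    // Start from the last committed state with nothing staged.
    void begin() {
        stagedEvents.clear();
        stagedReservations = committedReservations;
        stagedOffset = committedOffset;
    }

    void send(String event) { stagedEvents.add(event); }

    void reserve(int quantity) { stagedReservations += quantity; }

    void advanceOffset() { stagedOffset++; }

    // Publish events, state and offset in one step.
    void commit() {
        committedEvents.addAll(stagedEvents);
        committedReservations = stagedReservations;
        committedOffset = stagedOffset;
        begin();
    }

    // Throw away everything staged since begin().
    void abort() { begin(); }
}
```

If the service fails after staging but before commit, nothing has changed, so the input event is simply processed again from the same offset.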
92
Connect
TRANSACTION
Order Requested
Order Validated
Order Completed
Order Received
Products
Browser
Webserver
Schema Registry
OrdersService
Reserved Stocks
Stock
Stock
Reserved Stocks
Transactions
KAFKA
93
10. Bridge the Sync/Async Divide with a Streaming Ecosystem
94
POST
GET
Load Balancer
ORDERS
ORDERS
OV TOPIC
Order Validations
KAFKA
INVENTORY
Orders
Inventory
Fraud Service
Order Details Service
Inventory Service
(see previous figure)
Order Created
Order Validated
Orders View
Q in CQRS
Orders Service
C in CQRS
Services in the Micro: Orders Service
Find the code online!
95
Orders Customers
Payments Stock
Each service is optimized for autonomy
A Database Inside Out
HISTORICAL EVENT STREAMS
96
Kafka
KAFKA
New York
Tokyo
London
Global / Disconnected Ecosystems
97
So…
98
Good architectures have little to do with this:
99
It’s about how systems evolve over time
100
Request driven isn’t enough
• High coupling
• Hard to handle async flows
• Hard to move and join datasets.
101
Leverage the Duality of Events
Notification Data replication
102
With a toolset built for data in flight
103
The data dichotomy
Data systems are about exposing data.
Services are about hiding it.
Remember the data dichotomy
104
The Data Dichotomy
We want all the good stuff which comes with a database.
We don’t want to share that database with anyone else.
But we do want to share datasets in a sensible way.
105
• Broadcast events
• Retain them in the log
• Compose streaming functions
• Recast the event stream into views when you need to query.
Event Driven Services
106
Services built on a Streaming Platform
107
Thank You
@benstopford
Blog Series: https://www.confluent.io/blog/tag/microservices/
Code: https://github.com/confluentinc/kafka-streams-examples