
Scaling systems using change propagation across data stores

Jagadeesh Huliyar

I’ll talk about

➔ Need for tiered data stores in scaling systems, and the role of real-time data change propagation systems
◆ Example: Payments system
● Issues in the old data tier architecture and motivation for a new one
◆ Design choices
➔ Aesop - Real Time Data Change Propagation System
◆ Aesop scaling and high availability

Payments - What does it do?

➔ Managing Transactions - Takes the customer through the life cycle of a Payment Transaction.

➔ Reconciliation and Settlement - Reconcile with banks and settle to merchants.

➔ Fraud Detection - Detect payment fraud.

➔ Monitoring and Routing - Monitor the success rate of transactions across various dimensions and modify routing accordingly.

Data Needs

Use Case | Operation | Requirement | Data Retention
Transaction Flow | Write + Read | ACID + Normalised Structure + Low Latency | Transactions during Life Cycle + All Data related to a Transaction (Data for a Month)
Console | Read + Search | Denormalized | Some attributes of a Transaction
Fraud Detection, Financial Reports, Monitoring | Aggregation + Unique Values for a Transaction Dimension | Aggregation and Large Data access | 1 year
Archival (Regulation) | Reads, Reports | Horizontally scalable data store to store large amounts of data | All the Data


Payments - Old Data Tier Architecture

➔ MySQL Master + Hot Standby + Slave.
➔ The application writes to the Master. Transactional and real-time reads go to the Master.
➔ Historical reads go to the Slave.
➔ Analytical and aggregation queries run on the Slave.

Can one store fit all these use cases?

➔ From these signs, it was apparent that changes were required in the data tier design.

➔ The current approach, one data store fitting all use cases, had to change.

➔ The data tier would have to scale horizontally, and we needed more than one data store.

Multiple Data Stores

Multiple Data Stores - Issues?

➔ Data consistency
➔ Real-time data availability across stores

ETL?

The classic ETL approach has been around for decades and is a well-defined, well-understood solution. However, it was not an option for us because:

➔ Data from the secondary stores is used to feed more than just business decisions.

➔ At Payments, this data is supposed to feed REAL TIME use cases like Console, Fraud Detection and Monitoring Systems, which periodic batch ETL cannot serve.

Dual Writes?

The application writes to the destination data stores, either synchronously or asynchronously. Alternatively, the application writes to a publisher-subscriber system whose subscribers are consumers that eventually write to the destination data stores.

➔ Pros: Appears easy. The application can publish the same event that it is inserting/updating in the primary data source.

➔ Cons: Difficult to maintain consistency
◆ Writes are not atomic, which leads to ordering issues.
◆ Updates with a non-primary-key WHERE clause (the application does not know which rows changed).
◆ Application failures and crashes.
◆ Manual changes in the primary data store will be missed.
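To make the ordering issue concrete, here is a small simulation (our own sketch, not taken from Aesop). Two hypothetical writers, A and B, update the same row and each writes to a primary and a secondary store; the row key, status values and store names are made up for illustration. When the writes happen to be serialized the stores agree, but if A's secondary write is merely delayed, the stores diverge and nothing in the dual-write scheme detects it.

```python
from typing import Dict, List, Tuple

Store = Dict[str, str]


def run_schedule(
    schedule: List[Tuple[str, str]], updates: Dict[str, Tuple[str, str]]
) -> Tuple[Store, Store]:
    """Replay one interleaving of dual writes and return (primary, secondary).

    schedule: ordered (writer, target) steps; target is "primary" or "secondary".
    updates:  writer -> (key, value) that the writer sets in BOTH stores.
    """
    stores: Dict[str, Store] = {"primary": {}, "secondary": {}}
    for writer, target in schedule:
        key, value = updates[writer]
        stores[target][key] = value  # independent writes: no cross-store atomicity
    return stores["primary"], stores["secondary"]


# Two hypothetical writers updating the same transaction row.
updates = {"A": ("txn-1", "INITIATED"), "B": ("txn-1", "SUCCESS")}

# Serial: each writer completes both of its writes before the other starts.
serial = [("A", "primary"), ("A", "secondary"), ("B", "primary"), ("B", "secondary")]
# Interleaved: A's write to the secondary store is delayed (retry, slow network, pause).
interleaved = [("A", "primary"), ("B", "primary"), ("B", "secondary"), ("A", "secondary")]

if __name__ == "__main__":
    for name, schedule in (("serial", serial), ("interleaved", interleaved)):
        primary, secondary = run_schedule(schedule, updates)
        verdict = "consistent" if primary == secondary else "DIVERGED"
        print(name, primary, secondary, verdict)
```

In the interleaved run the primary ends with SUCCESS and the secondary with INITIATED, even though every individual write succeeded.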

Log Mining?


A separate application/service extracts changes from the database commit logs and publishes them, using the same mechanism the database uses for replication.

➔ Pros: Consistency can be guaranteed, as changes are read from the commit logs (the binlog in the case of MySQL).

➔ Cons
◆ Appears tough, but definitely possible.
◆ Tied to the mechanism the database uses for replication, to the commit log format, etc. A tightly coupled approach.
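A minimal sketch of the log mining idea, assuming an in-memory stand-in for the commit log. `CommitLog`, `ChangeEvent`, `LogMiner` and the `scn` sequence-number field are hypothetical names, not Aesop or MySQL APIs. Because the miner reads whatever the database committed, in commit order, even a manual change applied directly to the database is captured, which the dual-write approach would miss. Publishing before advancing the checkpoint means a crash can cause a re-publish but never a gap.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ChangeEvent:
    scn: int               # position in the commit log (stand-in for a binlog file/offset)
    table: str
    key: str
    after: Dict[str, str]  # row image after the change


class CommitLog:
    """Stand-in for the database's append-only commit log (committed changes only)."""

    def __init__(self) -> None:
        self._events: List[ChangeEvent] = []

    def append(self, table: str, key: str, after: Dict[str, str]) -> ChangeEvent:
        event = ChangeEvent(len(self._events) + 1, table, key, after)
        self._events.append(event)
        return event

    def read_from(self, scn: int) -> List[ChangeEvent]:
        """Every event with position > scn, in commit order."""
        return self._events[scn:]


class LogMiner:
    """Tails the commit log, publishes each change and checkpoints the last published SCN."""

    def __init__(self, log: CommitLog, publish: Callable[[ChangeEvent], None]) -> None:
        self.log = log
        self.publish = publish
        self.checkpoint = 0

    def poll(self) -> int:
        published = 0
        for event in self.log.read_from(self.checkpoint):
            self.publish(event)          # publish first ...
            self.checkpoint = event.scn  # ... then advance: a crash in between re-publishes, never skips
            published += 1
        return published


log = CommitLog()
seen: List[ChangeEvent] = []
miner = LogMiner(log, seen.append)
log.append("payments", "txn-1", {"status": "INITIATED"})  # written by the application
log.append("payments", "txn-1", {"status": "SUCCESS"})
miner.poll()
log.append("payments", "txn-1", {"status": "REFUNDED"})   # e.g. a manual fix run directly on the DB
miner.poll()
```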

Since consistency across data stores is of paramount importance to a financial system like Payments, we chose the log mining approach.

Approaches to Log Mining

MySql Bin Log Parsing

➔ Pros: Familiar approach
◆ Open source software that parsed MySQL binlogs was available: Open Replicator and Tungsten Replicator.

➔ Cons
◆ If the binlog format changes, the parser would have to change.
◆ Open Replicator supported MySQL version 5.5. We would have to modify Open Replicator to support MySQL v5.6 and the checksum feature introduced in v5.6.
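To illustrate why a binlog parser is sensitive to format changes, the sketch below parses the 19-byte common header of a v4 binlog event and, when checksums are enabled, verifies the 4-byte CRC32 trailer that MySQL 5.6 can append to every event. The layout follows our reading of MySQL's documented event format and should be checked against it; `build_event` is a hypothetical test helper, and a real parser would also need to read the Format_description_event to learn whether checksums are on. A checksum-unaware (5.5-style) parser silently treats the trailer as part of the payload.

```python
import struct
import zlib
from typing import NamedTuple, Tuple

HEADER_FMT = "<IBIIIH"                    # timestamp, type_code, server_id, event_length, next_position, flags
HEADER_LEN = struct.calcsize(HEADER_FMT)  # 19 bytes, little-endian, no padding
CHECKSUM_LEN = 4                          # CRC32 trailer when binlog_checksum=CRC32 (MySQL 5.6+)


class EventHeader(NamedTuple):
    timestamp: int
    type_code: int
    server_id: int
    event_length: int
    next_position: int
    flags: int


def parse_event(raw: bytes, checksum_enabled: bool) -> Tuple[EventHeader, bytes]:
    """Split one raw binlog event into its common header and payload."""
    header = EventHeader(*struct.unpack_from(HEADER_FMT, raw, 0))
    if header.event_length != len(raw):
        raise ValueError("event_length does not match the bytes supplied")
    end = len(raw)
    if checksum_enabled:
        end -= CHECKSUM_LEN
        (stored,) = struct.unpack_from("<I", raw, end)
        computed = zlib.crc32(raw[:end]) & 0xFFFFFFFF
        if stored != computed:
            raise ValueError(f"checksum mismatch: stored={stored:#010x} computed={computed:#010x}")
    return header, raw[HEADER_LEN:end]


def build_event(type_code: int, body: bytes, with_checksum: bool) -> bytes:
    """Hypothetical helper: build a synthetic event (timestamp, server_id, positions are dummies)."""
    length = HEADER_LEN + len(body) + (CHECKSUM_LEN if with_checksum else 0)
    raw = struct.pack(HEADER_FMT, 0, type_code, 1, length, 0, 0) + body
    if with_checksum:
        raw += struct.pack("<I", zlib.crc32(raw) & 0xFFFFFFFF)
    return raw


if __name__ == "__main__":
    event = build_event(type_code=99, body=b"row-data", with_checksum=True)  # type code is arbitrary
    _, payload = parse_event(event, checksum_enabled=True)
    _, misread = parse_event(event, checksum_enabled=False)  # 5.5-style, checksum-unaware
    print(payload)
    print(misread)
```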

Custom Storage Engine

➔ Pros: Independent of the binlog format. Layers above the storage engine take care of parsing.

➔ Cons: Unfamiliar approach. Unknown pitfalls.

We decided to go with the known pitfalls and picked the binlog parsing approach.

Introducing Aesop - Putting it all together

Reliability and Data Consistency

High Availability, Load Balancing and Scaling - Client Cluster

High Availability, Load Balancing and Scaling - Relay HA

Multiple Relay Servers read from the source data stores.

➔ The clients connect to the Relay Servers via a load balancer (LB).
➔ Since client requests are over HTTP, either one or both of the Relay Servers can serve a request, depending on the LB configuration.
➔ When one Relay goes down, the other can still handle the requests.
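The failover behaviour can be sketched as follows, under stated assumptions: each relay independently reads the same source, so their buffers hold the same events, and the client pulls events after its own checkpoint (the pull-with-checkpoint style of LinkedIn Databus, listed under Related Work; we are assuming Aesop's relays behave similarly). `Relay`, `LoadBalancer` and `Client` are hypothetical classes, and the round-robin LB that skips downed relays stands in for whatever LB configuration is actually deployed.

```python
import itertools
from typing import List, Tuple

Event = Tuple[int, str]  # (scn, payload): hypothetical event shape


class Relay:
    """Each relay independently reads the same source, so buffers hold the same events."""

    def __init__(self, name: str, source: List[Event]) -> None:
        self.name, self.buffer, self.up = name, list(source), True

    def events_since(self, scn: int) -> List[Event]:
        return [e for e in self.buffer if e[0] > scn]


class LoadBalancer:
    """Round-robins requests across relays, skipping relays that are down."""

    def __init__(self, relays: List[Relay]) -> None:
        self.relays = relays
        self._next = itertools.cycle(relays)

    def route(self) -> Relay:
        for _ in range(len(self.relays)):
            relay = next(self._next)
            if relay.up:
                return relay
        raise RuntimeError("no relay available")


class Client:
    """Pulls batches over the LB and tracks its own checkpoint, so any relay can serve it."""

    def __init__(self, lb: LoadBalancer) -> None:
        self.lb, self.checkpoint, self.received = lb, 0, []

    def pull(self, max_events: int) -> str:
        relay = self.lb.route()
        for scn, payload in relay.events_since(self.checkpoint)[:max_events]:
            self.received.append(payload)
            self.checkpoint = scn
        return relay.name


source = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
r1, r2 = Relay("relay-1", source), Relay("relay-2", source)
client = Client(LoadBalancer([r1, r2]))
first = client.pull(2)   # relay-1 serves scn 1-2
r1.up = False            # relay-1 goes down
second = client.pull(2)  # LB routes to relay-2, which resumes from checkpoint 2
```

Because the checkpoint lives with the client, switching relays mid-stream neither skips nor repeats events in this model.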

Event Transformation

➔ Transforms the event according to the mapping between the source and destination schemas. It maps the source entity to the destination entity, and each source attribute to a destination attribute within the entity.

➔ A source entity can be mapped to more than one destination entity type.

➔ Map-All - one to one

➔ Hierarchical mapping
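A minimal sketch of the transformation step, assuming a dictionary-based mapping config. The config format, the entity names (`payment_txn`, `txn_search_doc`, `txn_archive`) and the dotted-path convention for hierarchical fields are hypothetical, not Aesop's actual mapping format. One source entity fans out to two destination entity types: one with an explicit, partly nested attribute mapping and one using Map-All.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]

# Hypothetical mapping config: source entity -> destination mappings.
# attribute_map=None means Map-All (copy every attribute one to one);
# a dotted destination name ("merchant.id") produces a nested (hierarchical) field.
MAPPINGS: Dict[str, List[Dict[str, Any]]] = {
    "payment_txn": [
        {"dest_entity": "txn_search_doc",
         "attribute_map": {"txn_id": "id", "merchant_id": "merchant.id", "status": "status"}},
        {"dest_entity": "txn_archive", "attribute_map": None},
    ],
}


def set_path(doc: Row, dotted: str, value: Any) -> None:
    """Set doc["a"]["b"] = value for dotted == "a.b", creating parents as needed."""
    *parents, leaf = dotted.split(".")
    for name in parents:
        doc = doc.setdefault(name, {})
    doc[leaf] = value


def transform(source_entity: str, row: Row) -> List[Row]:
    """Map one source row to zero or more destination entities."""
    results: List[Row] = []
    for mapping in MAPPINGS.get(source_entity, []):
        attribute_map: Optional[Dict[str, str]] = mapping["attribute_map"]
        if attribute_map is None:
            fields = dict(row)  # Map-All
        else:
            fields = {}
            for src, dest in attribute_map.items():
                if src in row:
                    set_path(fields, dest, row[src])
        results.append({"entity": mapping["dest_entity"], "fields": fields})
    return results


if __name__ == "__main__":
    row = {"txn_id": "T1", "merchant_id": "M9", "status": "SUCCESS", "amount": 500}
    for out in transform("payment_txn", row):
        print(out)
```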

Monitoring

➔ Dashboard
➔ JMX

Summary

➔ Performance
◆ Relay: 1 XL VM (8 cores, 32 GB)
◆ Consumers: 4XL VM, 200 partitions
◆ Throughput: 20K-30K inserts per sec (MySQL to HBase)
◆ Data size: 500 GB

➔ What it is
◆ Supports multiple data stores
◆ Delivers updates reliably (at least once)
◆ Maintains ordering within every partition
◆ Supports varying consumer speeds

➔ What it is not
◆ Not exactly-once delivery
◆ Not a storage system
◆ No global ordering

➔ Support for
◆ Source
● MySQL
● HBase
◆ Destination
● MySQL
● HBase
● Elasticsearch
● Kafka
● Mapped Event Stream
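The summary's guarantees (ordering within a partition, at-least-once delivery, no global ordering) fit together as in the sketch below. The CRC32-modulo partitioner and the SCN-versioned idempotent sink are our assumptions, not Aesop's documented behaviour; they show one common way a destination can absorb redelivered events so that at-least-once delivery still converges to the correct state. Events for the same key always share a partition, so per-key order is preserved, while events in different partitions may be applied in any relative order.

```python
import zlib
from typing import Dict, List, Tuple

Event = Tuple[int, str, Dict[str, str]]  # (scn, primary key, row image): hypothetical shape


def partition_for(key: str, num_partitions: int) -> int:
    """Stable hash partitioning: the same key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events: List[Event], num_partitions: int) -> Dict[int, List[Event]]:
    """Split a commit-ordered stream into per-partition streams, preserving order within each."""
    partitions: Dict[int, List[Event]] = {}
    for event in events:
        partitions.setdefault(partition_for(event[1], num_partitions), []).append(event)
    return partitions


class IdempotentSink:
    """Destination that tolerates redelivery: keeps, per key, only the row with the highest SCN."""

    def __init__(self) -> None:
        self.rows: Dict[str, Tuple[int, Dict[str, str]]] = {}

    def apply(self, scn: int, key: str, row: Dict[str, str]) -> bool:
        current = self.rows.get(key)
        if current is not None and current[0] >= scn:
            return False  # duplicate or stale delivery: ignore
        self.rows[key] = (scn, row)
        return True


events: List[Event] = [
    (1, "txn-1", {"status": "INITIATED"}),
    (2, "txn-2", {"status": "INITIATED"}),
    (3, "txn-1", {"status": "SUCCESS"}),
]
partitions = route(events, num_partitions=200)  # 200 partitions, as in the deployment above
sink = IdempotentSink()
for stream in partitions.values():  # each partition could be consumed at its own speed
    for event in stream:
        sink.apply(*event)
snapshot = dict(sink.rows)
# At-least-once: a consumer restarted from an old checkpoint re-delivers events.
reapplied = sum(sink.apply(*event) for stream in partitions.values() for event in stream)
```

Using `zlib.crc32` rather than Python's built-in `hash` keeps the key-to-partition assignment stable across process restarts.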

More Details

➔ Project
◆ Open Source: https://github.com/Flipkart/aesop
◆ Support: [email protected]
◆ Multiple production deployments at Flipkart

➔ Related Work
◆ LinkedIn Databus
◆ Facebook Wormhole

➔ References
◆ Architecture of a Database System: http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf
◆ Wormhole Paper: https://www.usenix.org/system/files/conference/nsdi15/nsdi15-paper-sharma.pdf
