yugabyte db - postgres conf · 3/21/2019  · © 2019 all rights reserved. 1 yugabyte db...

48
1 © 2019 All rights reserved. YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar 21, 2019

Upload: others

Post on 05-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

1© 2019 All rights reserved.

YugaByte DBDistributed PostgreSQL on Google Spanner Architecture

Karthik RanganathanMihnea Iancu

Mar 21, 2019

Page 2: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

2© 2019 All rights reserved.

Introduction

Karthik Ranganathan

Co-Founder & CTO, YugaByteNutanix Facebook Microsoft

IIT-Madras, University of Texas-Austin

@karthikr

Mihnea Iancu

Software Engineer, YugaByteSpark SQL YSQL

PhD, Jacobs University Bremen

@mihnea_iancu

Page 3: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

3© 2019 All rights reserved.

High PerformanceLow Latency Queries

Cloud NativeMulti-Cloud and Kubernetes Ready

Distributed SQL DBPostgreSQL compatible, Elasticity, Fault-Tolerance

YugaByte DB

Massive ScaleMillions of IOPS, TBs per Node

Page 4: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

4© 2019 All rights reserved.

YugaByte DBBuilt For Microservices

Page 5: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

5© 2019 All rights reserved.

Workload Patterns in Microservices

Internet-Scale OLTP

Optimize for scale, performance

High throughput, low latency

70% of microservice access pattern

Audit trail, stock market data, shopping cart and checkout, messaging, user history, etc.

Cloud-Scale SQL

Scale-out RDBMS

Needs query flexibility

Needs referential integrity and joins

Smaller by volume but critical

CRM and ERP applications, supply chain management, billing services,

reporting applications

Distributed SQL

Page 6: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

6© 2019 All rights reserved.

Workload Patterns Fall in a Range

Cloud-ScaleSQL

Scale-outRDBMS

SINGLE-KEY ACCESS MULTI-KEY ACCESS

BLAZING FAST (SUB-MS) FAST (SINGLE-DIGIT MS)DATA MODELING RICHNESS

QUERY PERFORMANCE

Page 7: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

7© 2019 All rights reserved.

Design Follows a Layered Approach

tablet 1’

tablet 1’

tablet 1’

QUERY LAYERExtensible Query Layer

DISTRIBUTED DOCUMENT STORETransactional, High Performance, Globally Distributed

RUN ON ANY HARDWARE/IAAS

Page 8: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

8© 2019 All rights reserved.

Query Layer Supports Distributed Postgres

tablet 1’

tablet 1’

tablet 1’

DISTRIBUTED, DOCUMENT STORETransactional, High Performance, Globally Distributed

RUN ON ANY HARDWARE/IAAS

YCQLSQL-Based Flexible Schema API

YSQLGlobally Distributed Postgres API

Page 9: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

9© 2019 All rights reserved.

Core Features of DocDB

tablet 1’

tablet 1’

tablet 1’

RUN ON ANY HARDWARE/IAAS

YCQLSQL-Based Flexible Schema API

YSQLGlobally Distributed Postgres API

Self-Healing, Fault-Tolerant

Auto Sharding & Rebalancing

ACID Transactions

Global Data Distribution

High Perf, Low Latency

Page 10: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

10© 2019 All rights reserved.

Runs on Bare-metal, VMs, Docker/Kubernetes

tablet 1’

tablet 1’

tablet 1’

Self-Healing, Fault-Tolerant

Auto Sharding & Rebalancing

ACID Transactions

Global Data Distribution

High Throughput, Low Latency

YCQLSQL-Based Flexible Schema API

YSQLGlobally Distributed Postgres API

Page 11: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

11© 2019 All rights reserved.

DocDBA Google Spanner-like

Distributed, Document Store

Page 12: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

12© 2019 All rights reserved.

Design Goals

• CAP Theorem• Consistent• Partition Tolerant• HA on failures

(new leader elected in seconds)

• Transaction Support• Single-row linearizable txns• Multi-row txns• Serializable• Snapshot

• High Performance• All layers in C++ to ensure high perf• Run on large memory machines• Optimized for SSDs

• Run anywhere• No external dependencies• No need for Atomic Clocks• Bare metal, VM and Kubernetes

Page 13: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

13© 2019 All rights reserved.

How Does DocDB Work?

tablet 1’

Let’s start with this logical view of a table

… … …

Page 14: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

14© 2019 All rights reserved.

Each Row is a Document

tablet 1’

A row maps to a document, each column to an attribute

… … …DocumentKey (primary key values) =>{

column1: value1,column2: value2,…

}

Page 15: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

15© 2019 All rights reserved.

Tables are Sharded into Tablets

tablet 1’

Now partition the table using some strategy

… … …

Each partition is a tablet. A row belongs to exactly one tablet.

Tablet #1

Tablet #2

.

.

.

Page 16: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

16© 2019 All rights reserved.

Tablets are Replicated across Nodes

tablet 1’

Tablet Peer 1 on Node X

Tablet #1

Tablet Peer 2 on Node Y

Tablet Peer 3 on Node Z

Page 17: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

17© 2019 All rights reserved.

How Replication works

tablet 1’

Raft Leader

Uses Raft Algorithm

First elect Tablet Leader

Page 18: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

18© 2019 All rights reserved.

How Replication works

tablet 1’

Raft Leader

Writes processed by leader:

Send writes to all peersWait for majority to ack

Write

Page 19: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

19© 2019 All rights reserved.

How Replication works

tablet 1’

Raft Leader

Reads handled by leader

Uses Leader Leases for perf

Read

Page 20: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

20© 2019 All rights reserved.

Single-Key Linearizability

tablet 1’

Raft Leader

This system is now linearizable, HA,fault tolerant with high-performance

But no distributed transactions yet!

Page 21: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

21© 2019 All rights reserved.

What do Distributed Transactions need?

tablet 1’

Updates should get written at the same physical time

Raft Leader Raft Leader

BEGIN TXNUPDATE k1UPDATE k2

COMMIT

But how will nodes agree on time?

Page 22: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

22© 2019 All rights reserved.

Use a Physical Clock

tablet 1’

You would need an Atomic Clock or two lying around

Atomic Clocks are highly available,globally synchronized clocks with tight error bounds

Most of my physical clocks are never synchronized

Jeez! I’m fresh out of those.

Page 23: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

23© 2019 All rights reserved.

Hybrid Logical Clock or HLC

tablet 1’

Combine coarsely-synchronized physical clocks with Lamport Clocks to track causal relationships

(physical component, logical component)

synchronized using NTP a monotonic counter

Nodes update HLC on each Raft exchange for things like heartbeats, leader election and data replication

Page 24: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

24© 2019 All rights reserved.

No Need For Atomic Clocks

tablet 1’

Raft Leader

Uses Hybrid Logical Clock

With NTP synchronization

Page 25: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

25© 2019 All rights reserved.

Read more atblog.yugabyte.com

blog.yugabyte.com/distributed-postgresql-on-a-google-spanner-architecture-storage-layer/

Storage layer details:

Page 26: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

26© 2019 All rights reserved.

YSQLThe PostgreSQL Query Layer

Page 27: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

27© 2019 All rights reserved.

Design Goals

• PostgreSQL compatible• Re-uses PostgreSQL code base• New changes do not break existing PostgreSQL functionality• Aim towards building a pluggable distributed storage engine

• Enable migrating to newer PostgreSQL versions• New features are implemented in a modular fashion• Integrate with new PostgreSQL features in an on-going fashion• E.g. Moved from PostgreSQL 10.4 à 11.2 in a few weeks!

• Cloud native design• Designed for running natively in Kubernetes• Make drivers cluster aware over time • Support multi—zone and geographically replicated deployments

Page 28: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

28© 2019 All rights reserved.

Design Goals - Feature-set Support• All data types• Built-in functions and expressions• Various kinds of joins• Constraints (primary key, foreign key, unique, not null, check)• Secondary indexes (incl. multi-column & covering columns)• Distributed transactions (Serializable and Snapshot Isolation)• Views• Stored Procedures• Triggers

Page 29: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

29© 2019 All rights reserved.

Existing PostgreSQL Architecture

CLIENT Postmaster(Authentication, authorization)

Rewriter Planner

OptimizerExecutor

WAL Writer BG Writer…

DISK

ReuseStateless

PostgreSQL

Page 30: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

30© 2019 All rights reserved.

DocDB as Storage Engine

CLIENT Postmaster(Authentication, authorization)

Rewriter Planner

OptimizerExecutor

YugaByte Node

YugaByte Node …… YugaByte

Node

Replace table storage with DocDB

Page 31: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

31© 2019 All rights reserved.

Make PostgreSQL Stateless

CLIENT Postmaster(Authentication, authorization)

Rewriter Planner

OptimizerExecutor

YugaByte Node

YugaByte Node …… YugaByte

Node

Enhance planner,

optimizer, and executor for

distributed DB

Page 32: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

32© 2019 All rights reserved.

All Nodes are Identical

……

StatelessPostgres

StatelessPostgres

StatelessPostgres

DocDB DocDB DocDB

CLIENT

Can connect to ANY node

YugaByte Node YugaByte Node YugaByte Node

Page 33: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

33© 2019 All rights reserved.

All Nodes are Identical

……

StatelessPostgres

StatelessPostgres

StatelessPostgres

DocDB DocDB DocDB

SQL APPNot affected by

node failures

YugaByte Node YugaByte Node YugaByte Node

Fault tolerantcan survive node

failures

Stateless tierconnect to any

live node

Page 34: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

34© 2019 All rights reserved.

YSQLUsing distributed PostgreSQL

Page 35: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

35© 2019 All rights reserved.

Creating YSQL Tables

• YSQL Tables• User tables map to one DocDB table• Each index maps to a separate DocDB table• PostgreSQL system catalogs map to special DocDB tables• Used for schema enforcement• Handle views, foreign tables, stored procedures, etc.

• YSQL Rows• Each row maps to one document in DocDB: key à document• The primary key column(s) map to the document key• Tables without primary key use an internal ID (logically a row-id)

Page 36: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

36© 2019 All rights reserved.

System Catalogs are Special Tables

……

StatelessPostgres

StatelessPostgres

StatelessPostgres

DocDB DocDB DocDB

CLIENT

System Catalog LeaderDenoted by solid line

System Catalog FollowersDenoted by dotted line

System catalogs are replicated tables with 1 tablet

Page 37: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

37© 2019 All rights reserved.

Creating a Table

……

StatelessPostgres

StatelessPostgres

StatelessPostgres

DocDB DocDB DocDB

CLIENT1) CREATE TABLE

Page 38: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

38© 2019 All rights reserved.

Creating a Table

……

StatelessPostgres

StatelessPostgres

StatelessPostgres

DocDB DocDB DocDB

CLIENT2) RECORD SCHEMA

Page 39: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

39© 2019 All rights reserved.

Creating a Table

……

StatelessPostgres

StatelessPostgres

StatelessPostgres

DocDB DocDB DocDB

CLIENT3) RAFT REPLICATE

Page 40: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

40© 2019 All rights reserved.

Creating a Table

……

StatelessPostgres

StatelessPostgres

StatelessPostgres

DocDB DocDB DocDB

CLIENT4) CREATE DOCDB TABLETS

Page 41: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

41© 2019 All rights reserved.

Using YSQL Tables

• Single-row Operations• Reads and writes handled by DocDB tablet leader• YSQL query layer is aware of clustering and partitioning• Will route queries to the right node (tablet leader).

• Multi-row Operations• Implemented using DocDB distributed transactions• E.g. insert into table with one index will perform the following:

BEGIN DOCDB DISTRIBUTED TRANSACTIONinsert into index values (…)insert into table values (…)

COMMIT

Page 42: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

42© 2019 All rights reserved.

INSERTING DATA

……

StatelessPostgres

StatelessPostgres

StatelessPostgres

DocDB DocDB DocDB

CLIENTINSERT ROW

Page 43: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

43© 2019 All rights reserved.

INSERTING DATA

……

StatelessPostgres

StatelessPostgres

StatelessPostgres

DocDB DocDB DocDB

CLIENTINSERT INTO TABLET LEADER

Page 44: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

44© 2019 All rights reserved.

INSERTING DATA

……

StatelessPostgres

StatelessPostgres

StatelessPostgres

DocDB DocDB DocDB

CLIENTRAFT REPLICATE DATA

Page 45: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

45© 2019 All rights reserved.

Read more atblog.yugabyte.com

blog.yugabyte.com/distributed-postgresql-on-a-google-spanner-architecture-storage-layer/Storage layer details:

https://blog.yugabyte.com/distributed-postgresql-on-a-google-spanner-architecture-query-layer/

Query layer details:

Page 46: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

46© 2019 All rights reserved.

PostgreSQL Meets Spanner!

• Leverage PostgreSQL features• Built-in expressions and functions• Joins, Aggregations, Views• Stored Procedures, Triggers• Extensions like Foreign Data Wrappers (FDW)

• Leverage Spanner-like DocDB features• Linear Scalability• Fault Tolerance with high availability• Run natively in Kubernetes• Zero Downtime SQL database• Alter schema• Rolling software upgrades• Change machine types

Page 47: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

47© 2019 All rights reserved.

DEMOTry it yourself!

Page 48: YugaByte DB - Postgres Conf · 3/21/2019  · © 2019 All rights reserved. 1 YugaByte DB Distributed PostgreSQL on Google Spanner Architecture Karthik Ranganathan Mihnea Iancu Mar

48© 2019 All rights reserved.

Questions?Try it at docs.yugabyte.com/quick-start

Check us out on GitHubhttps://github.com/YugaByte/yugabyte-db