cratedb: a search engine or a database? both! · why are we talking about this? • traditional...

29
CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! HOW WE BUILT A SQL DATABASE ON TOP OF ELASTICSEARCH AND LUCENE Maximilian Michels @stadtlegende [email protected] [email protected]

Upload: others

Post on 25-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

C R AT E D B : A S E A R C H E N G I N E O R A D ATA B A S E ? B O T H !

H O W W E B U I LT A S Q L D ATA B A S E O N T O P O F E L A S T I C S E A R C H A N D L U C E N E

Maximilian Michels @stadtlegende

[email protected] [email protected]

Page 2: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

W H Y A R E W E TA L K I N G A B O U T T H I S ?

• Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

• Scalable search using these can be tricky

• Search engines are databases optimized for search and scale (Lucene, Solr, Elasticsearch)

• You can’t typically use SQL with Search Engines

• Why not stick with an mature query language standard which everybody knows?

2

Page 3: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

“A scalable SQL database optimized for search without the NoSQL bullshit.”

Page 4: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

C R AT E D B I N A N U T S H E L L

• Since 2014: https://github.com/crate/crate

• Apache 2.0 licensed (community edition)

• Built using Elasticsearch, Lucene, Netty, Antlr, …

• SQL-99 compatible

• REST / Postgres Wire Protocol / JDBC / Python …

4

Page 5: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

W H AT T O E X P E C T

• What is great about CrateDB

• Easy to setup

• No funny APIs / SQL

• Great scale out - Massive reads / writes

• Container aware

• Not so great

• Transactions

• Foreign keys

5

Page 6: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

U S I N G C R AT E D B

Page 7: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

C R AT E D B I S J U S T L I K E A S Q L D B

• SQL is the only query API

•CREATE TABLE fosdem.speakers (id int PRIMARY KEY, name string)

•CREATE TABLE fosdem.talks (id INT PRIMARY KEY, title STRING, abstract STRING, speaker INT);

•INSERT INTO fosdem.speakers (id, name) VALUES (1, ’max’)

•INSERT INTO fosdem.talks (id, title, abstract, speaker) VALUES (1, ‘Talk about CrateDB’, ‘bla’, 1)

•SELECT * FROM fosdem.talks t1 LEFT JOIN fosdem.speakers t2 ON t1.id = t2.id

7

Page 8: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

B U T T H E R E I S M O R E

• CrateDB denormalized (no joins necessary) •CREATE TABLE fosdem.speakers (name STRING, talk OBJECT AS (title STRING, abstract STRING))

•INSERT INTO fosdem.speakers (name, talk) VALUES (‘max’, {title = ‘CrateDB’, abstract = ‘Lorem ipsum’})

•SELECT talk[‘title’] as title FROM fosdem.speakers ORDER BY title

8

Page 9: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

C L U S T E R I N G / R E P L I C AT I O N

•CREATE TABLE fosdem.speakers (name STRING, talk OBJECT AS (title STRING, abstract STRING))

•CLUSTERED BY name into 4 shards

9

S H A R D

N O D E 1 N O D E 2 N O D E 3 N O D E 4

Page 10: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

C L U S T E R I N G / R E P L I C AT I O N

•CREATE TABLE fosdem.speakers (name STRING, talk OBJECT AS (title STRING, abstract STRING))

•CLUSTERED BY name into 4 shards

•WITH (number_of_replicas = 1)

10

P R I M A R Y

R E P L I C A

N O D E 1 N O D E 2 N O D E 3 N O D E 4

Page 11: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

PA R T I T I O N E D TA B L E S

•CREATE TABLE fosdem.speakers (name STRING, talk OBJECT as (title = STRING, abstract = STRING), year INT)

•CLUSTERED BY name into 4 shards

•PARTITIONED BY (year, …)

•WITH (number_of_replicas = 1)

11

N O D E 1 N O D E 2 N O D E 3 N O D E 4

P R I M A R Y

R E P L I C A

Page 12: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

M O R E F E AT U R E S

• Aggregations

• Geo search

• Text Analyzers

• UDFs

• Snapshots

• User management

• Schema / Table privileges

• SSL encryption

• MQTT Ingestion

12

Page 13: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

A R C H I T E C T U R E

Page 14: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

14

• CrateDB: Distributed SQL Execution Engine

• Antlr: Parsing of SQL statements

• Netty: REST, Postgres Wire Protocol, Web interface

• Lucene: Storage, Indexing, Queries

• Elasticsearch: Transport, Routing, Replication

O N T H E S H O U L D E R S O F G I A N T S

Page 15: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

I N T R O D U C T I O N T O

• Lucene stores documents which are CrateDB’s rows

• Documents have fields

• { _id : ‘123’, name : ‘Bob’, title : ‘How I Learned to Stop Worrying and Love the Bomb’, text : ‘Lorem ipsum…'}

• Fields are indexed for efficient lookup

• Fields have column store for efficient aggregation

15

Page 16: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

I N T R O D U C T I O N T O E L A S T I C S E A R C H

• Elasticsearch core concepts revolve around indices, shards, and replicas

• An index is a document store with n parts, called shards

• Each shard has 0 or more replicas which hold copies of the shard data

• Replicas are not only useful for fault tolerance but also increase the search performance

16

Page 17: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

H O W TA B L E S R E L AT E T O I N D I C E S A N D S H A R D S• Each table in CrateDB is

represented by an ES index with a mapping

• Each partition in a partitioned table is represented by an ES index

• Partition indices are created by encoding the partition value in the index name

17

TA B L E T 1 T 2 T 3 …

I N D E X t1 t2.day1 t2.day2 … t3 …

S H A RD 1

X X X … X …

S H A RD 2

X X X … X …

S H A RD 3

X X …

S H A RD 4

X …

… …

"properties":{ “name":{"type":"keyword"}, "talks":{"dynamic":"true", "properties":{ "abstract":{"type":"keyword"}, "title":{"type":"keyword"} } } }

Page 18: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

N O D E 1

F R O M Q U E R Y T O E X E C U T I O N• SELECT name, count(*) as talks FROM fosdem.speakers

WHERE room = ‘hpc’ AND year = 2018 GROUP BY name ORDER BY name

18

C L I E N T

PA R S E R A N A LY Z E R

P L A N N E R E X E C U T O R

R E S T / P O S T G R E S

T R A N S P O R T ( E S )

S T O R A G E ( L U C E N E )

W E B J D B CP S Q L C R A S H

N O D E 3

N O D E 2 N O D E 4

N O D E 5

P Y T H O N R U S T N O D E

Page 19: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

A R C H I T E C T U R E H I G H L I G H T S

• Distributed storage / Distributed query execution

• Masterless

• Replication

• Only ephemeral storage needed (Container aware)

• Optimized for search: Indexing of all fields with Lucene (tuneable)

19

N O D E 2

N O D E 1 N O D E 3

N O D E 4

Page 20: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

H A N D S - O N

Page 21: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

21

• Monitoring (IoT, Industry 4.0, Cyber Security)

• Stream Analysis

• Text Analysis

• Time Series Analysis

• Geospatial Queries

W H AT C A N Y O U D O W I T H C R AT E D B ?

Page 22: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

D E M O

CrateDB Web Interface

Page 23: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

CrateDB Web Interface

Page 24: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

CrateDB Web Interface

Page 25: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

CrateDB Web Interface

Page 26: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

C O N C L U S I O N

Page 27: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

W H AT W E H AV E L E A R N E D

• Elasticsearch used Lucene and Netty to built a distributed search engine

• CrateDB used Elasticsearch, Lucene, and Netty to built a distributed SQL database

• CrateDB is perfect when you

• want or have to use SQL

• store large amounts of structured or unstructured data

• have many thousands of queries per second

27

Page 28: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

S E E F O R Y O U R S E L F !

• Try out CrateDB

• Download from https://crate.io/download/

• or $ curl try.crate.io | bash

• or $ docker run crate

• or build from source https://github.com/crate/crate

• Check out https://crate.io/docs

• Contributions welcome

• Check out https://github.com/crate/crate/blob/master/devs/docs/index.rst

• Check out the issues

• Stackoverflow

• Join our Slack channel

28

Page 29: CRATEDB: A SEARCH ENGINE OR A DATABASE? BOTH! · WHY ARE WE TALKING ABOUT THIS? • Traditional databases are well-researched and there are plenty of them (Postgres, MySQL, Oracle…)

T H A N K Y O U !

Maximilian Michels @stadtlegende [email protected]

[email protected]