SASI, Cassandra on the full text search ride, at Voxxed Day Belgrade 2016

DuyHai DOAN – Apache Cassandra evangelist

Upload: duyhai-doan

Post on 08-Jan-2017


TRANSCRIPT

Page 1: Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016

SASI, Cassandra on the full text search ride

DuyHai DOAN – Apache Cassandra evangelist

Page 2

@doanduyhai

1. 5 minutes introduction to Apache Cassandra™
2. SASI introduction
3. SASI cluster-wide
4. SASI local read/write path
5. Query planner
6. Some benchmarks
7. Take away

Page 3

5 minutes introduction to Apache Cassandra™

Page 4

The tokens

Random hash of #partition → token = hash(#p)

Hash: ]-x, x], hash range: 2^64 values

x = 2^64 / 2

[Diagram: ring of eight Cassandra (C*) nodes]
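As a rough illustration of the mapping above, the sketch below hashes a partition key into a signed 64-bit token. Cassandra's default partitioner relies on Murmur3; `hashlib.blake2b` is only a stand-in here so that the example runs with the standard library, and `token_for` is a hypothetical helper name. A signed 64-bit integer spans [-x, x - 1], which differs from the slide's ]-x, x] only at the endpoints.

```python
import hashlib

X = 2**64 // 2  # half of the 2^64 hash range, as on the slide


def token_for(partition_key: str) -> int:
    """Map a partition key to a signed 64-bit token (stand-in hash, not Murmur3)."""
    digest = hashlib.blake2b(partition_key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, byteorder="big", signed=True)


for key in ["user_id1", "user_id2", "user_id3"]:
    # The same key always yields the same token, so it always lands on the same node.
    print(key, token_for(key))
```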

Page 5

Token ranges

A: ]-x, -3x/4]
B: ]-3x/4, -2x/4]
C: ]-2x/4, -x/4]
D: ]-x/4, 0]
E: ]0, x/4]
F: ]x/4, 2x/4]
G: ]2x/4, 3x/4]
H: ]3x/4, x]

[Diagram: ring of eight nodes, A to H, each owning one token range]
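The ownership rule implied by these ranges can be sketched as a lookup over the inclusive upper bounds. This is a minimal sketch assuming eight equal ranges and one node per range; `owner` and `UPPER_BOUNDS` are hypothetical names, and replication is ignored.

```python
import bisect

X = 2**63  # x = 2^64 / 2
NODES = "ABCDEFGH"
# Inclusive upper bound of each range: A ends at -3x/4, B at -2x/4, ..., H at x.
UPPER_BOUNDS = [-X + (i + 1) * (X // 4) for i in range(8)]


def owner(token: int) -> str:
    """Return the node whose range ]lower, upper] contains the token."""
    return NODES[bisect.bisect_left(UPPER_BOUNDS, token)]


print(owner(0), owner(1))  # 0 closes range D, 1 opens range E
```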

Page 6

Distributed tables

[Diagram: rows user_id1 to user_id5 placed on the ring of nodes A to H]

CREATE TABLE users(
    user_id int,
    …,
    PRIMARY KEY(user_id));

Page 7

Distributed tables

[Diagram: rows user_id1 to user_id5 placed on the ring of nodes A to H]

Page 8

Coordinator node

Responsible for handling requests (read/write)

Every node can be coordinator
• masterless
• no SPOF
• proxy role

[Diagram: ring of nodes A to H; labels: request, coordinator, 1, 2, 3]

Page 9

Q & A

Page 10

SASI introduction

Page 11

What is SASI?

• SSTable-Attached Secondary Index → new secondary index implementation that follows the SSTable life-cycle

• Objective: provide a more performant and more capable secondary index

Page 12

Who created it?

Open-source contribution by a team of engineers

Page 13

Why is it better than the native secondary index?

• follows the SSTable life-cycle (flush, compaction, rebuild …) → more optimized
• new data structures
• range queries (<, ≤, >, ≥) possible
• full text search options

Page 14

Demo

Page 15

SASI cluster-wide

Page 16

Distributed index

At cluster level, SASI works exactly like the native secondary index

[Diagram: nodes on the ring A to H each hold a local index over their own data]
UK: user1, user102, …, user493
US: user54, user483, …, user938
UK: user87, user176, …, user987
UK: user17, user409, …, user787

Page 17

Distributed search algorithm

[Diagram: coordinator on the ring of nodes A to H]

1st round: concurrency factor = 1

Page 18

Distributed search algorithm

Not enough results?

Page 19

Distributed search algorithm

2nd round: concurrency factor = 2

Page 20

Distributed search algorithm

Still not enough results?

Page 21

Distributed search algorithm

3rd round: concurrency factor = 4

Page 22

Concurrency factor formula

•  more details at: http://www.planetcassandra.org/blog/cassandra-native-secondary-index-deep-dive/
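The round-based search shown on the previous slides can be sketched as follows. This is a simplified sketch, not the coordinator's actual code: the concurrency factor formula is not reproduced in this transcript, so the doubling below is an assumption that merely matches the 1, 2, 4 sequence of the slides. `distributed_search` and `query_range` are hypothetical names.

```python
from typing import Callable, List, Sequence


def distributed_search(
    token_ranges: Sequence[str],
    query_range: Callable[[str], List[str]],
    limit: int,
) -> List[str]:
    """Query token ranges round by round until `limit` rows are collected."""
    results: List[str] = []
    concurrency_factor = 1
    position = 0
    while position < len(token_ranges) and len(results) < limit:
        batch = token_ranges[position:position + concurrency_factor]
        for token_range in batch:  # a real coordinator would send these in parallel
            results.extend(query_range(token_range))
        position += len(batch)
        concurrency_factor *= 2  # assumed growth: 1, 2, 4, ...
    return results[:limit]


# Hypothetical local index content per token range.
local_matches = {"B": ["user1"], "C": ["user54"], "F": ["user87"]}
print(distributed_search(list("ABCDEFGH"), lambda r: local_matches.get(r, []), limit=2))
```

With a small LIMIT the loop stops after the second round; without enough matches it ends up visiting every range, which is the situation described in caveat 1.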

Page 23

Caveat 1: non-restrictive filters

[Diagram: coordinator on the ring of nodes A to H]

Eventually hits all nodes :(

Page 24

Caveat 1 solution: always use LIMIT

[Diagram: coordinator on the ring of nodes A to H]

SELECT * FROM …
WHERE ... LIMIT 1000

Page 25

Caveat 2: 1-to-1 index (user_email)

[Diagram: coordinator on the ring of nodes A to H]

WHERE user_email = 'xxx'

Not found

Page 26

Caveat 2: 1-to-1 index (user_email)

WHERE user_email = 'xxx'

Still no result

Page 27

Caveat 2: 1-to-1 index (user_email)

WHERE user_email = 'xxx'

At best 1 user found
At worst 0 user found

Page 28

Caveat 2 solution: materialized views

For 1-to-1 index/relationship, use materialized views instead

CREATE MATERIALIZED VIEW user_by_email AS
SELECT * FROM users
WHERE user_id IS NOT NULL AND user_email IS NOT NULL
PRIMARY KEY (user_email, user_id);

But range queries (<, >, ≤, ≥) are not possible …

Page 29

Caveat 3: fetch all rows for analytics use-case

[Diagram: ring of nodes A to H with a coordinator and a client]

Page 30

Caveat 3 solution: use co-located Spark

• Local index filtering in Cassandra
• Aggregation in Spark

[Diagram: local index query on the nodes of the ring A to H]

Page 31

SASI local read/write path

Page 32

SASI Life-cycle: in-memory

1. Commit log (Commit log1, Commit log2, …, Commit logN)
2. MemTable, in memory (MemTable Table1, MemTable Table2, …, MemTable TableN)
3. Index MemTable (Index MemTable1, Index MemTable2, …, Index MemTableN)

ACK the client

Page 33

Local write path data structures

Index mode, data type | Data structure | Usage
PREFIX, text | Guava ConcurrentRadixTree | name LIKE 'John%'
CONTAINS, text | Guava ConcurrentSuffixTree | name LIKE '%John%', name LIKE '%ny'
PREFIX, other | JDK ConcurrentSkipListSet | age = 20, age >= 20 AND age <= 30
SPARSE, other | JDK ConcurrentSkipListSet | age = 20, age >= 20 AND age <= 30 (suitable for 1-to-N index with N ≤ 5)
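To give an intuition for the CONTAINS mode, the toy index below stores every suffix of a term so that a substring search becomes a prefix search over suffixes. It is only a sketch of the idea behind a suffix tree, built on a sorted list; it is not the structure SASI uses internally, and `SuffixIndex` is a hypothetical class.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Set


class SuffixIndex:
    """Toy CONTAINS index: every suffix of a term points back to its row ids."""

    def __init__(self) -> None:
        self._rows_by_suffix: Dict[str, Set[str]] = defaultdict(set)
        self._sorted_suffixes: List[str] = []

    def add(self, term: str, row_id: str) -> None:
        for start in range(len(term)):
            suffix = term[start:]
            if suffix not in self._rows_by_suffix:
                bisect.insort(self._sorted_suffixes, suffix)
            self._rows_by_suffix[suffix].add(row_id)

    def contains(self, fragment: str) -> Set[str]:
        """Rows whose term contains `fragment`, like name LIKE '%fragment%'."""
        matches: Set[str] = set()
        index = bisect.bisect_left(self._sorted_suffixes, fragment)
        # All suffixes starting with `fragment` are contiguous in sorted order.
        while (index < len(self._sorted_suffixes)
               and self._sorted_suffixes[index].startswith(fragment)):
            matches |= self._rows_by_suffix[self._sorted_suffixes[index]]
            index += 1
        return matches


names = SuffixIndex()
names.add("John", "user1")
names.add("Johnny", "user2")
print(sorted(names.contains("ohn")))
```

Storing one entry per suffix makes the index grow much faster than the indexed data, which is one thing to keep in mind when choosing this mode.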

Page 34

SASI Life-cycle: flush to SSTable

[Diagram: commit logs (step 1) and in-memory Table1, Table2, Table3, flushed to disk in step 4]

SSTable1 with OnDiskIndex1
SSTable2 with OnDiskIndex2
SSTable3 with OnDiskIndex3

Page 35

SASI Life-cycle: compaction

SSTable1, SSTable2, SSTable3 → New SSTable

OnDiskIndex1, OnDiskIndex2, OnDiskIndex3 → New OnDiskIndex

Page 36

Local write path summary

Index files are built
• on memtable flush
• on compaction flush

To avoid OOM, index files are split into chunks of
• 1 GB for memtable flush
• max_compaction_flush_memory_in_mb for compaction flush

→ Consequence: SASI has an impact on write bandwidth (CPU & disk I/O)

Page 37

Local read path

• first, optimize the query using the Query Planner (see later)
• then load chunks (4k) of index files from disk into memory
• perform binary search to find the indexed value(s)
• retrieve the corresponding partition keys and push them into the Partition Key Cache

→ Yes, currently SASI only keeps partition key(s), so on wide partitions it's not very optimized ...

Page 38

OnDiskIndex files

SSTable1: user_id4 FR, user_id1 US, user_id5 FR → OnDiskIndex1: FR, US
SSTable2: user_id3 UK, user_id2 DE → OnDiskIndex2: UK, DE

B+Tree-like data structures

Page 39

OnDiskIndex Layout

Header Block (4k)
Data Block (multiple of 4k)
Pointer Block (multiple of 4k)
Data Block Meta
Pointer Block Meta
Level Index Offset

Other diagram labels: Levels Count, Meta Data Info

Page 40

Header Block Layout

Fields: Descriptor Version, Term Size, Index Mode, Min Term, Max Term, Min Pk, Max Pk, Has Partial

Field sizes shown on the diagram (field-to-size mapping not recoverable from this transcript): variable, short, short, short, short, short, variable, byte

Page 41

OnDiskIndex Layout

[Same layout diagram as on page 39]

Page 42

Data Block layout

[Diagram: data blocks aligned on 4k boundaries, with padding]

Terms Count | Offset Array: [0, 10, 22, …] | Term Block | TokenTree Block | Padding
Terms Count | Offset Array: [0, 23, 35, …] | Term Block | TokenTree Block | Padding
Terms Count | Offset Array: [0, 17, 34, …] | Term Block | TokenTree Block | Padding
Terms Count | Offset Array: [0, 12, 28, …] | Term Block | TokenTree Block | Padding
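The role of the offset array can be illustrated with a small sketch: terms are concatenated into one block and the array records where each one starts, so the i-th term can be read without scanning the others. The real on-disk encoding is not detailed in the slides, so this shows only the offset idea; `pack_terms` and `read_term` are hypothetical helpers.

```python
from typing import List, Tuple


def pack_terms(terms: List[bytes]) -> Tuple[List[int], bytes]:
    """Concatenate sorted terms and record the start offset of each one."""
    offsets: List[int] = []
    position = 0
    for term in terms:
        offsets.append(position)
        position += len(term)
    return offsets, b"".join(terms)


def read_term(offsets: List[int], term_block: bytes, index: int) -> bytes:
    """Read the term at `index` using only the offset array."""
    end = offsets[index + 1] if index + 1 < len(offsets) else len(term_block)
    return term_block[offsets[index]:end]


offsets, block = pack_terms([b"DE", b"FR", b"UK", b"US"])
print(offsets, read_term(offsets, block, 2))
```

Direct access by position is what makes a binary search over the terms of a block possible.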

Page 43

OnDiskIndex Layout

[Same layout diagram as on page 39]

Page 44

Pointer Block building

Data Level: Data Block1, Data Block2, …, Data BlockN
Pointer Level 1: Pointer Block1 (LastTerm1, LastTerm2, …, LastTermN), Pointer Block2, …, Pointer BlockN (4k each)
Pointer Level 2: Pointer BlockN+1 (LastTermM, LastTermM+1, …, LastTermO), Pointer BlockN+2, …
Pointer Root Level: Root Pointer Block

Page 45

Binary search using OnDiskIndex files

[Diagram: the search descends from the root pointer block to the data blocks]
Pointer Root Level: Root Pointer Block
Pointer Level 3: Pointer Block, Pointer Block, …
Pointer Level 2: Pointer Block, Pointer Block, Pointer Block, …
Pointer Level 1: Pointer Block, Pointer Block, Pointer Block, …
Data Level: Data Block1, Data Block2, Data Block3, …, Data BlockN
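The pointer levels and the descent from the root can be sketched as below. It is a minimal sketch under simplifying assumptions: blocks hold a fixed number of entries (`fanout`, a hypothetical parameter) instead of being sized in 4k units, and each pointer block stores only the last term of each child.

```python
import bisect
from typing import List


def build_pointer_levels(data_blocks: List[List[str]], fanout: int) -> List[List[List[str]]]:
    """Build pointer levels bottom-up; each entry is the last term of a child block."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    levels = []
    last_terms = [block[-1] for block in data_blocks]
    while True:
        level = [last_terms[i:i + fanout] for i in range(0, len(last_terms), fanout)]
        levels.append(level)
        if len(level) == 1:  # a single block left: this is the root pointer block
            return levels
        last_terms = [pointer_block[-1] for pointer_block in level]


def find_data_block(levels: List[List[List[str]]], term: str, fanout: int) -> int:
    """Walk from the root level down and return the index of the candidate data block."""
    block_index = 0
    for level in reversed(levels):
        pointer_block = level[block_index]
        slot = min(bisect.bisect_left(pointer_block, term), len(pointer_block) - 1)
        block_index = block_index * fanout + slot
    return block_index


blocks = [["a", "b"], ["c", "d"], ["e", "f"], ["g", "h"], ["i", "j"]]
levels = build_pointer_levels(blocks, fanout=2)
print(blocks[find_data_block(levels, "e", fanout=2)])
```

Only one pointer block per level is inspected, so the number of blocks read grows with the depth of the tree rather than with the number of data blocks.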

Page 46

Term Block Binary Search

[Diagram: successive halving of the sorted terms]
Term1, Term25, Term50, Term75, Term100: val < Term100 ?
Term50, Term75, Term100: val > Term50 ?
Term50, Term63, Term75: val < Term75 ?
…
Term57: val = Term57 ?
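The halving steps above correspond to a classic binary search over the sorted terms of a block. The sketch below is a generic version, assuming the terms are held in a sorted Python list; `find_term` is a hypothetical helper, not a SASI function.

```python
from typing import List, Optional


def find_term(sorted_terms: List[str], value: str) -> Optional[int]:
    """Return the position of `value` in `sorted_terms`, or None if it is absent."""
    low, high = 0, len(sorted_terms) - 1
    while low <= high:
        middle = (low + high) // 2
        if sorted_terms[middle] == value:
            return middle
        if sorted_terms[middle] < value:
            low = middle + 1   # value can only be in the upper half
        else:
            high = middle - 1  # value can only be in the lower half
    return None


# Zero-padded names keep lexicographic order aligned with numeric order.
terms = [f"Term{i:03d}" for i in range(1, 101)]
print(find_term(terms, "Term057"))
```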

Page 47

Query Planner

Page 48

Query planner

• build predicates tree
• predicates push-down & re-ordering
• predicate fusions for != operator

Page 49

Query optimization example

WHERE age < 100 AND fname LIKE 'p%' AND fname != 'pa%' AND age > 21

Page 50

Query optimization example

AND is associative and commutative

Page 51

Query optimization example

!= transformed to exclusion on range scan

Page 52

Query optimization example

AND is associative and commutative
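The regrouping in this example can be sketched as follows: because AND is associative and commutative, predicates on the same column can be gathered, range bounds merged, and != kept as an exclusion applied during the scan. This is an illustrative sketch and not the SASI planner itself; `plan` and the tuple format are assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def plan(predicates: List[Tuple[str, str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Group AND-ed predicates per column and merge them into one scan each."""
    grouped: Dict[str, Dict[str, Any]] = defaultdict(
        lambda: {"lower": None, "upper": None, "prefix": None, "exclude": []}
    )
    for column, operator, value in predicates:
        scan = grouped[column]
        if operator == ">":
            scan["lower"] = value if scan["lower"] is None else max(scan["lower"], value)
        elif operator == "<":
            scan["upper"] = value if scan["upper"] is None else min(scan["upper"], value)
        elif operator == "LIKE":
            scan["prefix"] = value.rstrip("%")
        elif operator == "!=":
            scan["exclude"].append(value.rstrip("%"))  # exclusion on the range scan
        else:
            raise ValueError(f"unsupported operator: {operator}")
    return dict(grouped)


query = [("age", "<", 100), ("fname", "LIKE", "p%"), ("fname", "!=", "pa%"), ("age", ">", 21)]
for column, scan in plan(query).items():
    print(column, scan)
```

The four predicates collapse into two scans: one range on age and one prefix scan on fname with an exclusion.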

Page 53

Some benchmarks

Page 54

Hardware specs

13 bare-metal machines
• 6 CPU HT (12 vcores)
• 64 GB RAM
• 4 SSDs in RAID0 for a total of 1.5 TB

Data set
• 13 billion rows
• 1 numerical index with 36 distinct values
• 2 text indexes with 7 distinct values
• 1 text index with 3 distinct values

Page 55

Benchmark results

Full table scan using co-located Spark (no LIMIT)

Predicate count | Fetched rows | Query time in sec
1 | 36 109 986 | 609
2 | 2 781 492 | 330
3 | 1 044 547 | 372
4 | 360 334 | 116

Page 56

Benchmark results

[Same table as on page 55]

Page 57

Benchmark results

Beware of disk space usage for full text search !!!
Table albums with ≈ 110 000 records, 6.8 MB data size

Page 58

Take Away

Page 59

SASI vs search engines

SASI vs Solr/ElasticSearch ?

• Cassandra is not a search engine !!! (database = durability)
• always slower because of 2 passes (SASI index read + original Cassandra data)
• no scoring
• no ordering (ORDER BY)
• no grouping (GROUP BY) → Apache Spark for analytics

If you don't need the above features, SASI is for you!

Page 60

SASI sweet spots

SASI is a relevant choice if

• you need multi-criteria search and you don't need ordering/grouping/scoring
• you mostly need 100 to 10,000 rows for your search queries
• you always know the partition keys of the rows to be searched for (this one applies to native secondary index too)
• you want to index static columns (SASI has no penalty since it indexes the whole partition)

Page 61

SASI blind spots

SASI is a poor choice if

• you have a strong SLA on search latency, for example a requirement of a few milliseconds
• ordering of the search results is important for you

Page 62

Q & A

Page 63

Thank You @doanduyhai

[email protected]

https://academy.datastax.com/