Solr & Cassandra: Searching Cassandra with DataStax Enterprise

Searching Cassandra with Solr. Rachel Pedreschi, Lead Technical Evangelist, DataStax Enterprise (@rachelpedreschi). An Introductory Technical Overview of DataStax Enterprise Search.

Upload: planet-cassandra

Post on 17-Aug-2015


Searching Cassandra with Solr

Rachel Pedreschi, Lead Technical Evangelist, DataStax Enterprise

@rachelpedreschi

An Introductory Technical Overview of DataStax Enterprise Search

Confidential

What is Search?


The bright blue butterfly hangs on the breeze.

[the] [bright] [blue] [butterfly] [hangs] [on] [the] [breeze]

Tokens
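A minimal sketch of the tokenization step shown above, in Python. The `tokenize` helper is hypothetical and far simpler than a real Solr analyzer chain (no stemming, stop words, or synonyms); it only lowercases the sentence and splits it on non-alphanumeric characters.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


tokens = tokenize("The bright blue butterfly hangs on the breeze.")
print(tokens)
# ['the', 'bright', 'blue', 'butterfly', 'hangs', 'on', 'the', 'breeze']
```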


Credit: https://developer.apple.com/library/mac/documentation/userexperience/conceptual/SearchKitConcepts/searchKit_basics/searchKit_basics.html

Terms


It can be lonely for Solr


Cassandra


✓ Highly available

✓ Linear scalability

✓ Low latency OLTP queries

C*


+ =


Like… High Availability


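Data Partitioning

[Figure: an application writes to a ring of eight nodes in Data Center 1, with node tokens 10, 20, 30, 40, 50, 60, 70 and 80. The partition key is hashed to place the row on the ring: hash(key) => token(43).]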
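A rough sketch of the ring lookup, assuming the toy node tokens from the slide and assuming each node owns the range that ends at its own token. The `owner` helper is hypothetical. Real Cassandra hashes the partition key with the Murmur3 partitioner over a 64-bit token range; here the token 43 is simply taken as given.

```python
from bisect import bisect_left

# Toy node tokens from the slide (assumption: one token per node).
NODE_TOKENS = [10, 20, 30, 40, 50, 60, 70, 80]


def owner(token, node_tokens=NODE_TOKENS):
    """Return the node whose token is the first one >= `token`, wrapping around the ring."""
    ring = sorted(node_tokens)
    index = bisect_left(ring, token) % len(ring)
    return ring[index]


print(owner(43))  # 50
print(owner(85))  # 10 (wraps past the last node)
```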

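Replication

[Figure: an application writes to a ring of eight nodes in Data Center 1, with node tokens 10, 20, 30, 40, 50, 60, 70 and 80. With hash(key) => token(43) and replication factor = 3, the row is stored on three nodes.]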
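One way to picture replication factor = 3 is to take the owning node and then keep walking clockwise around the ring. The sketch below assumes that simple placement rule and the toy tokens from the slide; the `replicas` helper is hypothetical and ignores racks and data centers.

```python
from bisect import bisect_left


def replicas(token, node_tokens, replication_factor=3):
    """Return the owning node plus the next nodes clockwise, one per replica."""
    ring = sorted(node_tokens)
    start = bisect_left(ring, token) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(replication_factor)]


ring = [10, 20, 30, 40, 50, 60, 70, 80]
print(replicas(43, ring))  # [50, 60, 70]
print(replicas(75, ring))  # [80, 10, 20]
```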

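Multi-Data Center Replication

[Figure: the application writes with hash(key) => token(43). Data Center 1 is a ring with node tokens 10, 20, 30, 40, 50, 60, 70 and 80 and replication factor = 3. Data Center 2 is a ring with node tokens 11, 21, 31, 41, 51, 61, 71 and 81 and replication factor = 3.]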
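The two-data-center picture can be sketched as a single walk around the combined ring that collects nodes until each data center has its own replication factor. This is a simplified illustration using the toy tokens from the slide; the `replicas_by_dc` helper and the names "DC1" and "DC2" are assumptions, and rack awareness is ignored.

```python
from bisect import bisect_left


def replicas_by_dc(token, nodes, rf_per_dc):
    """Walk the ring clockwise from `token`, filling each data center up to its RF.

    `nodes` is a list of (node_token, data_center) pairs.
    """
    ring = sorted(nodes)
    tokens = [node_token for node_token, _ in ring]
    start = bisect_left(tokens, token) % len(ring)
    chosen = {dc: [] for dc in rf_per_dc}
    for step in range(len(ring)):
        node_token, dc = ring[(start + step) % len(ring)]
        if dc in chosen and len(chosen[dc]) < rf_per_dc[dc]:
            chosen[dc].append(node_token)
    return chosen


nodes = [(t, "DC1") for t in range(10, 90, 10)] + [(t, "DC2") for t in range(11, 91, 10)]
print(replicas_by_dc(43, nodes, {"DC1": 3, "DC2": 3}))
# {'DC1': [50, 60, 70], 'DC2': [51, 61, 71]}
```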


How does DSE integrate Solr?

[Figure: C* and C*/Solr]

SELECT * FROM killrvideo.videos WHERE solr_query='{ "q": "{!edismax qf=\"name^2 tags^1 description\"}datastax" }';

SELECT id, value FROM keyspace.table WHERE token(id) >= -3074457345618258601 AND token(id) <= 3074457345618258603 AND solr_query='id:*';
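The nested quoting in the first query above is easy to get wrong by hand. A small sketch, assuming you build the statement in Python: `json.dumps` produces the escaped JSON body, and single quotes are doubled for the CQL string literal. The `solr_query_cql` helper is hypothetical; in an application you would normally pass the JSON string as a bound parameter through your driver instead of interpolating it.

```python
import json


def solr_query_cql(keyspace, table, search_terms, query_fields):
    """Build a CQL SELECT with an edismax solr_query, for illustration only."""
    local_params = '{!edismax qf="%s"}' % " ".join(query_fields)
    solr_json = json.dumps({"q": local_params + search_terms})
    escaped = solr_json.replace("'", "''")  # CQL escapes ' inside literals as ''
    return "SELECT * FROM %s.%s WHERE solr_query='%s';" % (keyspace, table, escaped)


print(solr_query_cql("killrvideo", "videos", "datastax",
                     ["name^2", "tags^1", "description"]))
```

The printed statement differs from the slide only in whitespace inside the JSON object.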


Vocab


Cassandra term            Solr term
Column Family / Table     Core
Row                       Document
Column                    Field
SSTable                   Index

[Figure: Cassandra write path on one node]

Client -> Coordinator -> node: Write <3, Betty, Blue, 63>, then Acknowledge

Node memory: Memtable (corresponds to a CQL table)
    partition key1    first:Oscar    last:Orange    level:42
    partition key2    first:Ricky    last:Red
    partition key3    first:Betty    last:Blue      level:63

Node file system:
    CommitLog (Append Only)
    SSTables

Each write request … is appended to the CommitLog and written to the Memtable.
Periodically … Flush current state to SSTable.
Periodically … Compact related SSTables (Compaction).
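The write path in the figure can be modeled with a few lines of Python. This is a toy model under stated assumptions, not Cassandra code: the `ToyNode` class is hypothetical, flush and compaction are triggered by hand instead of by thresholds, the commit log is never truncated, and the updated `level` value for key 2 is invented for the example.

```python
class ToyNode:
    """Toy model of one node's write path (not real Cassandra code)."""

    def __init__(self):
        self.commit_log = []  # append-only record of every write
        self.memtable = {}    # in-memory rows keyed by partition key
        self.sstables = []    # immutable flushed tables, oldest first

    def write(self, key, row):
        """Append to the commit log, update the memtable, then acknowledge."""
        self.commit_log.append((key, row))
        self.memtable[key] = row
        return "ack"

    def flush(self):
        """Write the current memtable out as a new sorted SSTable."""
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def compact(self):
        """Merge all SSTables into one; newer rows replace older ones."""
        merged = {}
        for sstable in self.sstables:
            merged.update(sstable)
        self.sstables = [dict(sorted(merged.items()))] if merged else []


node = ToyNode()
node.write(1, {"first": "Oscar", "last": "Orange", "level": 42})
node.write(2, {"first": "Ricky", "last": "Red"})
node.write(3, {"first": "Betty", "last": "Blue", "level": 63})
node.flush()
node.write(2, {"first": "Ricky", "last": "Red", "level": 7})  # hypothetical update
node.flush()
node.compact()
print(len(node.sstables), node.sstables[0][2])
```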

[Figure: DSE Search indexing path on one node]

Client -> Coordinator -> Shard Router: Write <1,blue, 2,3>

Node memory: Ram Buffer
    1    best      1
    2    bright    2,3
    3    blue      2,3

Node file system: Segments

Each write request …
Periodically … Flushes current state to Segment (Softcommit)
Merge (STW)
On C* Memtable flush, in-memory segments hard commit to disk.

[Figure: what is searchable]

Coordinator -> Shard Router

Node memory: Ram Buffer (Not Searchable)
Node file system: Segments (Searchable)

Index entries shown:
    1    best      1
    2    bright    2,3
    3    blue      2,3
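A toy sketch of why freshly indexed terms are not searchable until a soft commit. The `ToyIndex` class is hypothetical and much simpler than Lucene: the RAM buffer and each segment are plain term-to-document-id maps, and the document texts are invented so that the postings match the slide (best: 1; bright: 2,3; blue: 2,3).

```python
import re
from collections import defaultdict


class ToyIndex:
    """Toy inverted index with a RAM buffer and searchable segments."""

    def __init__(self):
        self.ram_buffer = defaultdict(set)  # term -> doc ids, not yet searchable
        self.segments = []                  # list of {term: doc ids}, searchable

    def add(self, doc_id, text):
        """Index a document into the RAM buffer only."""
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            self.ram_buffer[term].add(doc_id)

    def soft_commit(self):
        """Turn the RAM buffer into a new segment so searches can see it."""
        if self.ram_buffer:
            self.segments.append(dict(self.ram_buffer))
            self.ram_buffer = defaultdict(set)

    def search(self, term):
        """Look the term up in committed segments only."""
        hits = set()
        for segment in self.segments:
            hits |= segment.get(term.lower(), set())
        return sorted(hits)


index = ToyIndex()
index.add(1, "best")
index.add(2, "bright blue")
index.add(3, "bright blue")
before = index.search("blue")  # entries are still in the RAM buffer
index.soft_commit()
after = index.search("blue")
print(before, after)  # [] [2, 3]
```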


And… Scalability


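[Figure, shown across two slides: the application connects to a ring of eight nodes in Data Center 1, with node tokens 10, 20, 30, 40, 50, 60, 70 and 80.]

[Figure: the ring in Data Center 1 after growing to ten nodes, with node tokens 8, 16, 24, 32, 40, 48, 56, 64, 72 and 80.]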
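The token values on these slides are what you get by spacing nodes evenly around a ring of size 80. A small sketch, assuming that toy ring size; the `evenly_spaced_tokens` helper is hypothetical, and real clusters use a 64-bit token range and often many virtual nodes per machine.

```python
def evenly_spaced_tokens(node_count, ring_size=80):
    """Return one token per node, spaced evenly around a ring of `ring_size`."""
    return [ring_size * i // node_count for i in range(1, node_count + 1)]


for node_count in (8, 10):
    tokens = evenly_spaced_tokens(node_count)
    share = 80 / node_count  # slice of the toy ring each node owns
    print(node_count, tokens, share)
# 8 [10, 20, 30, 40, 50, 60, 70, 80] 10.0
# 10 [8, 16, 24, 32, 40, 48, 56, 64, 72, 80] 8.0
```

Adding nodes shrinks the range each node owns, which is the intuition behind the scalability claim.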


Even… Improved Performance


Standard Solr Indexing

DSE Search Live Indexing


Let’s go code diving!


Behind the scenes…


// Videos by id
CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name text,
    description text,
    location text,
    location_type int,
    preview_image_location text,
    tags set<text>,
    added_date timestamp,
    PRIMARY KEY (videoid)
);

// Index for tag keywords
CREATE TABLE videos_by_tag (
    tag text,
    videoid uuid,
    added_date timestamp,
    userid uuid,
    name text,
    preview_image_location text,
    tagged_date timestamp,
    PRIMARY KEY (tag, videoid)
);

Not a great idea

Possible Index
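A table like videos_by_tag works as a hand-maintained index: every video write has to fan out into one row per tag. The sketch below shows that fan-out under stated assumptions; the `videos_by_tag_rows` helper and all sample values are hypothetical, and the rows are plain dictionaries rather than driver calls.

```python
from datetime import datetime, timezone
from uuid import uuid4


def videos_by_tag_rows(video, tagged_date):
    """Return one videos_by_tag row per tag on the video."""
    return [
        {
            "tag": tag,
            "videoid": video["videoid"],
            "added_date": video["added_date"],
            "userid": video["userid"],
            "name": video["name"],
            "preview_image_location": video["preview_image_location"],
            "tagged_date": tagged_date,
        }
        for tag in sorted(video["tags"])
    ]


now = datetime(2015, 8, 17, tzinfo=timezone.utc)
video = {  # hypothetical sample row from the videos table
    "videoid": uuid4(),
    "userid": uuid4(),
    "name": "Intro to DSE Search",
    "preview_image_location": "/previews/intro.png",
    "tags": {"solr", "cassandra", "datastax"},
    "added_date": now,
}
rows = videos_by_tag_rows(video, tagged_date=now)
print([row["tag"] for row in rows])  # ['cassandra', 'datastax', 'solr']
```

Every change to a video's tags or name means rewriting these rows by hand, which is the maintenance cost a search index avoids.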


// Videos by id
CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name text,
    description text,
    location text,
    location_type int,
    preview_image_location text,
    tags set<text>,
    added_date timestamp,
    PRIMARY KEY (videoid)
);

And this? This? This?


Search all of the things in 5 easy steps…

1) Spin up a new C* cluster with search enabled using the DSE installer.

$ sudo service dse cassandra -s

2) Run your schema DDL to create the C* keyspace and tables.

3) Run dsetool on the videos table.

$ dsetool create_core killrvideo.videos generateResources=true

4) Use the Solr Admin to check sanity and make sure you have a core.

5) Write a CQL query with a Solr search in it.

SELECT * FROM killrvideo.videos WHERE solr_query='{ "q": "{!edismax qf=\"name^2 tags^1 description\"}datastax" }';


Resources


www.killrvideo.com

https://github.com/LukeTillman/killrvideo-csharp

www.datastax.com


Questions?


50% off Priority Pass: RachelP50

25% off Certification: RachelPCert

Thank You @RachelPedreschi