Cassandra Summit 2014: Huge Online Genealogical Database Driven By Cassandra


Upload: planet-cassandra

Post on 05-Dec-2014


DESCRIPTION

Presenter: John Sumsion, Software Developer at FamilySearch

FamilySearch hosts a collaborative family tree with over a billion editable records. The tree currently serves as many as 10,000 concurrent users at peak weekly load. These users come from across the globe and collectively maintain and enhance the tree around the clock.

Recent efforts to port the tree from a relational database to Cassandra have drastically improved performance and scalability. The database consists of more than 5 billion records in journaled form, and we anticipate having over 10TB of live data available for users to view and edit, with that data size growing significantly as our user base grows.

The dataset has resisted sharding in the past, so the port involved rethinking the core data model. The model we chose retains the consistency that our users demand and can be implemented without ACID transactions. Specifically, it combines Convergent and Commutative Replicated Data Types (CvRDT and CmRDT) with Cassandra's atomic batch implementation to meet the demanding needs of the family tree application.

TRANSCRIPT

Page 1: Cassandra Summit 2014: Huge Online Genealogical Database Driven By Cassandra

© 2014 by Intellectual Reserve, Inc. All rights reserved.

Huge Online Genealogical Database Driven By Cassandra

Cassandra Summit 2014

John Sumsion

Page 2

Outline

• Introduction to FamilySearch Family Tree

• Outline of Cassandra reimplementation

• Journal-based Consistency Model

• Experience with Cassandra

Page 3

What is FamilySearch?

FamilySearch.org website

Very large single pedigree (Family Tree)

Largest collection of free genealogical records

Largest genealogical library

Family History Department of the Church of Jesus Christ of Latter-day Saints (known as Mormons)

Page 4

Why does FamilySearch exist?

Visit http://mormon.org/family-history/

Page 5

Family Tree: Where it fits

[Diagram of FamilySearch areas: Records, Indexing, Family Tree, Memories, Community]

Page 6

Record Preservation

Neglect

Time

Disasters (e.g. WWII)

Page 7

Record Preservation (continued)

• 100 million images published online / year

Page 8

Indexing

3.5 billion indexed records – 35M / month

Turns this… …into this!

Page 9

Memories

Page 10

Community

Page 11

Family Tree: Where it fits

[Diagram of FamilySearch areas: Records, Indexing, Family Tree, Memories, Community]

Page 12

Family Tree Data

Family Tree:

• 900M+ person records, open-edit

• 500M+ relationships, open-edit

• 8.4B change log entries, 100M+ per quarter

• Dynamic OLTP system

• Data-dependent performance issues

Page 13

Family Tree: Example 9 Gen Pedigree

• up to 511 person slots

• Dynamic content!
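The 511 figure is what a full 9-generation ancestor chart holds when every person has two parent slots: 1 + 2 + 4 + ... + 256 = 2^9 - 1. A minimal sketch of that arithmetic follows; the function name `pedigree_slots` is illustrative and not taken from the talk.

```python
def pedigree_slots(generations: int) -> int:
    """Maximum number of person slots in a pedigree chart.

    Generation 1 is the root person; every later generation has at most
    twice as many slots as the one before it (two parents per person).
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    # 1 + 2 + 4 + ... + 2**(generations - 1) == 2**generations - 1
    return sum(2 ** g for g in range(generations))


nine_generation_slots = pedigree_slots(9)
```

Each of those slots can change at any time, which is why the slide stresses that the page content is dynamic.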

Page 14

Family Tree: Example Pedigree App

• 31+ persons per section

• Dynamic content!

Page 15

Family Tree: Example Ancestor Page

• 10+ persons in families

• 100-1000+ changes

• Dynamic content!

Page 16

Family Tree: Example Change History

• 100-1000+ changes

• Dynamic content!

Page 17

Contents

• Introduction to FamilySearch Family Tree

• Outline of Cassandra reimplementation

• Journal-based Consistency Model

• Experience with Cassandra

Page 18

Performance & Scale

• Slow page views

  • pedigree (500-3000 ms for 3 generations)

  • change history (2000+ ms for first page of changes)

  • large family view

• Query problems

  • relationships connect persons, range scan by person id

  • every person => person traversal is a 200-300M btree scan (global index)

  • change history queries traverse an 8+B btree scan (global index)

Page 19

Performance & Scale

• Query performance problems

[Diagram: Pedigree query: Person, Relationship, Person (wide range scan). Change History query: Change History (wide range scan)]

Page 20

Cassandra Reimplementation

• selected Cassandra after extensive testing

• full data scale proof-of-concept & tests

• required: new data model (performance)

• required: new consistency model (critical!)

Page 21

Cassandra Reimplementation

• event-sourced data model – journal / views

• new data model – no indexes

• new consistency model – satisfies our consistency requirements

[Diagram: journal entry JE #8 for person P1 with P1's views A and B; journal entry JE #6 for person P2 with P2's views A and B]

Page 22

Cassandra Reimplementation

• denormalized relationships

[Diagram: persons P1 and P2 with relationships R1-R5]

Page 23

Cassandra Reimplementation

• denormalized relationships

[Diagram: persons P1 and P2 with relationships R1-R5; R2 and R3 appear twice]

Page 24

Cassandra Reimplementation

• denormalized relationships

• exact duplication allows bidirectional traversal

[Diagram: the wide Person / Relationship / Person query next to two Person/Rels partitions for P1 and P2, with relationships R1-R5 and R2 and R3 appearing twice]
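The idea of storing an exact copy of each relationship with both persons can be sketched as below. This is an in-memory stand-in under stated assumptions: the class `PersonRelStore`, its method names, and the record layout are hypothetical and are not the FamilySearch schema.

```python
from collections import defaultdict


class PersonRelStore:
    """Hypothetical in-memory stand-in for per-person partitions."""

    def __init__(self):
        # person_id -> {rel_id: (person_a, person_b, rel_type)}
        self.partitions = defaultdict(dict)

    def add_relationship(self, rel_id, person_a, person_b, rel_type):
        record = (person_a, person_b, rel_type)
        # Write the identical record into both persons' partitions.
        self.partitions[person_a][rel_id] = record
        self.partitions[person_b][rel_id] = record

    def neighbors(self, person_id):
        """Read one partition only; no global index is consulted."""
        related = []
        for a, b, _ in self.partitions.get(person_id, {}).values():
            related.append(b if a == person_id else a)
        return sorted(related)


store = PersonRelStore()
store.add_relationship("R1", "P1", "P3", "couple")
store.add_relationship("R2", "P1", "P2", "parent-child")
```

Because both copies are identical, the traversal works from either end: `store.neighbors("P1")` and `store.neighbors("P2")` each touch a single partition.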

Page 25

Cassandra Reimplementation

• change history is a core feature

• denormalized change history

• optimizes for displaying recent changes

[Diagram: journal entry JE #8 applied to P1 and to the P1 Change History View]

• P1 journal: 1000s of changes (spread over multiple Cassandra cells)

• P1 Change History View: last 100-1000 changes (local to a single Cassandra cell)
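A view that keeps only the most recent changes can be sketched with a bounded queue. This is a minimal illustration, not the talk's implementation; `ChangeHistoryView`, `max_recent`, and `first_page` are assumed names, and the full history is taken to remain in the journal.

```python
from collections import deque


class ChangeHistoryView:
    """Hypothetical bounded view holding only the newest changes."""

    def __init__(self, max_recent=1000):
        # A deque with maxlen silently drops items from the far end when full.
        self.recent = deque(maxlen=max_recent)

    def apply(self, entry):
        # Newest entry goes to the front; the oldest falls off the back.
        self.recent.appendleft(entry)

    def first_page(self, size=10):
        """Return the newest `size` changes, newest first."""
        return list(self.recent)[:size]


view = ChangeHistoryView(max_recent=3)
for change_number in range(5):
    view.apply(change_number)
```

After five changes with a limit of three, the view holds only changes 4, 3 and 2, so rendering the first page never needs the older entries.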

Page 26

Contents

• Introduction to FamilySearch Family Tree

• Outline of Cassandra reimplementation

• Journal-based Consistency Model

• Experience with Cassandra

Page 27

Journal-based Consistency Model

Rough Process Flow

Command → Journal → Views

• Command: captures edits safely

• Journal: stores edits canonically

• Views: view-optimized summations

Page 28

Journal-based Consistency Model

Command

• write-once with quorum

• application to journal requires 3 tables: pending / completed / aborted

• idempotent application to journal
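The slide names the three tables but does not say how they interact, so the following is only one plausible reading: a command is written once to `pending`, applied to the journal under a key that makes retries harmless, then moved to `completed` (or to `aborted`). All class and method names here are hypothetical.

```python
class CommandStore:
    """Hypothetical in-memory model of the pending / completed / aborted tables."""

    def __init__(self):
        self.pending = {}
        self.completed = {}
        self.aborted = {}
        self.journal = {}  # (entity_id, command_id) -> payload

    def submit(self, command_id, entity_ids, payload):
        # Write-once: a command id may never be reused.
        if any(command_id in t for t in (self.pending, self.completed, self.aborted)):
            raise ValueError("command ids are write-once")
        self.pending[command_id] = (tuple(entity_ids), payload)

    def apply(self, command_id):
        """Safe to call again after an interruption or a retry."""
        if command_id in self.completed or command_id in self.aborted:
            return
        entity_ids, payload = self.pending[command_id]
        for entity_id in entity_ids:
            # Same key and same value on every attempt, so repeats change nothing.
            self.journal[(entity_id, command_id)] = payload
        self.completed[command_id] = (entity_ids, payload)
        del self.pending[command_id]

    def abort(self, command_id):
        if command_id in self.pending:
            self.aborted[command_id] = self.pending.pop(command_id)
```

The property that matters is that calling `apply` twice for the same command leaves the journal exactly as one call would.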


Page 29

Journal-based Consistency Model

Command Schema

• key: command v1 uuid (as text)

• value: blob (binary json)
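A command row in this shape can be sketched as a text key plus an opaque byte value. The slide does not say which binary JSON encoding is used, so UTF-8 encoded JSON stands in for it here as an assumption; `new_command` is an illustrative helper name.

```python
import json
import uuid
from typing import Tuple


def new_command(payload: dict) -> Tuple[str, bytes]:
    """Build a (key, value) pair shaped like the command schema on the slide."""
    # Version-1 (time-based) UUID, rendered as text for the key.
    command_id = str(uuid.uuid1())
    # Assumption: plain UTF-8 JSON stands in for the unspecified "binary json".
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return command_id, blob


command_id, blob = new_command({"attribution": {}})
```

A version-1 UUID embeds a timestamp, which is one reason it suits a key for write-once commands: ids generated later sort after earlier ones when compared by time.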


Page 30

Journal-based Consistency Model

Journal

• write-once with quorum & C* batch

• denormalized byte-exact across affected persons & relationships

• each entry stored in separate cell (compaction required for fast journal reads)


Page 31

Journal-based Consistency Model

Journal

• CmRDT (commutative replicated data type)

• partitions converge without conflict because of unique uuid
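Why unique ids let replicas converge without conflict can be illustrated by treating each journal as a grow-only map from entry UUID to content. Merging is then a union, and the order of merging does not matter. This is a sketch of the general idea, with `merge_journals` and the short ids `u1`..`u3` as illustrative assumptions.

```python
def merge_journals(a: dict, b: dict) -> dict:
    """Union of two write-once journals keyed by entry UUID."""
    merged = dict(a)
    for entry_id, content in b.items():
        # Write-once entries: the same UUID must always carry the same content.
        if entry_id in merged and merged[entry_id] != content:
            raise ValueError(f"entry {entry_id} was written twice with different content")
        merged[entry_id] = content
    return merged


replica_1 = {"u1": b"edit-1", "u2": b"edit-2"}
replica_2 = {"u2": b"edit-2", "u3": b"edit-3"}
converged = merge_journals(replica_1, replica_2)
```

Merging in either order gives the same three entries, and merging a journal with itself changes nothing, which is what makes the write path safe to repeat.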


Page 32

Journal-based Consistency Model

Partition Key   Command UUID      Content (blob)
KWZ3-P71        eda6f967-0955…    { "attribution": {}, … } (binary json)
KWZ3-P71        6af8d90c-8f3a…    { "attribution": {}, … } (binary json)
KCDT-J59        fd35ac61-7def…    { "attribution": {}, … } (binary json)
KCDT-J59        b2db2fa5-da5f…    { "attribution": {}, … } (binary json)

Page 33

Journal-based Consistency Model

View

• multiple views for multiple uses (person, person card, change history)

• populated by applying journal entries

• incrementally updated in steady state

• not canonical data, can be recalculated
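The relationship between journal and view can be sketched as a fold: a view is whatever results from applying journal entries in order, so updating it incrementally and recalculating it from scratch must agree. The entry shape (`field` / `value`) and the function names are assumptions made for this illustration.

```python
def apply_entry(view: dict, entry: dict) -> dict:
    """Return a new view with one journal entry applied (last write wins per field)."""
    updated = dict(view)
    updated[entry["field"]] = entry["value"]
    return updated


def rebuild_view(journal: list) -> dict:
    """Recalculate a view from nothing by replaying the journal in order."""
    view = {}
    for entry in journal:
        view = apply_entry(view, entry)
    return view


journal = [
    {"field": "name", "value": "John"},
    {"field": "birth", "value": "1850"},
]
view = rebuild_view(journal)

# Steady state: apply only the newest entry to the stored view.
new_entry = {"field": "name", "value": "Jon"}
incremental = apply_entry(view, new_entry)
```

Because the stored view can always be rebuilt this way, it can be thrown away or reshaped without risk to the canonical data.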


Page 34

Journal-based Consistency Model

[Diagram: person P1 and its views A and B]

Page 35

Journal-based Consistency Model

[Diagram: journal entry JE #8 arrives for person P1 and is written to each of P1's views, A and B]

Page 36

Journal-based Consistency Model

[Diagram: P1's views A and B, each with JE #8 written to it, are refreshed into A (new) and B (new)]

Page 37

Journal-based Consistency Model

[Diagram: person P1 and its views A and B]

Page 38

Journal-based Consistency Model

View

• views have same schema as journal

• journal entries are written to view for incremental refresh

• core of the consistency model


Page 39

Journal-based Consistency Model

View

• CvRDT (convergent replicated data type)

• partitions converge with conflict; resolved by full view refresh from canonical journal

• steady state: one view of a given type per entity


Page 40

Journal-based Consistency Model

[Diagram: P1's views A and B, each with JE #8 written to it, are refreshed into A (new) and B (new)]

Page 41

Journal-based Consistency Model

• Performance & Scale

  • lookup by partition key only, no indexes

  • any cross-entity change happens in duplicate on all affected entities

  • stored “current-state” views – cheapest possible read

  • custom views – tunable to different use cases

  • disposable views – able to tweak view over time

Page 42

Journal-based Consistency Model

• Business Rule Enforcement

  • Read / Write / Read & Revert

  • pre-command checks prevent invalid changes

  • write with appropriate quorum ensures consistent write

  • post-command checks prevent business-rules conflicts

  • administrative revert marks command as “not applicable” and thereby causes full refresh which ignores changes

  • appropriate quorum: depending on the change, either LOCAL_QUORUM or EACH_QUORUM
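One way to read the Read / Write / Read & Revert sequence is sketched below: check the rule before writing, write, check again in case another writer landed in between, and mark the command not applicable if the rule is now broken. The slide does not spell out this linkage, so treat it as an assumption; `execute`, `current_state`, and the `concurrent_changes` parameter (used only to simulate another writer) are hypothetical.

```python
def current_state(journal):
    """Full refresh: replay only the commands still marked applicable."""
    return [c["change"] for c in journal if c["applicable"]]


def execute(journal, change, rule, concurrent_changes=()):
    # Read: the pre-command check rejects changes that are invalid up front.
    if not rule(current_state(journal) + [change]):
        return "rejected"
    # Write: record the command.
    command = {"change": change, "applicable": True}
    journal.append(command)
    # Simulation only: other writers landing between our two checks.
    for other in concurrent_changes:
        journal.append({"change": other, "applicable": True})
    # Read & Revert: the post-command check catches conflicts the pre-check missed.
    if not rule(current_state(journal)):
        command["applicable"] = False
        return "reverted"
    return "applied"


def at_most_two_parents(state):
    return state.count("add-parent") <= 2


journal = []
first = execute(journal, "add-parent", at_most_two_parents)
second = execute(journal, "add-parent", at_most_two_parents,
                 concurrent_changes=["add-parent"])
third = execute(journal, "add-parent", at_most_two_parents)
```

The reverted command stays in the journal, but a full refresh skips it, so the views end up reflecting only the changes that satisfy the rule.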

Page 43

Journal-based Consistency Model

• Strong consistency

  • command store – atomic capture of a single user action

  • command handling – idempotent writes to journal, picked up later even if interrupted

  • no global lock needed for optimistic concurrency

• Read after write

  • consistency ONE for normal reads

  • quorum when the client knows it’s refreshing after write
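The choice between ONE and quorum reads comes down to replica overlap: a read is guaranteed to include the latest acknowledged write only when the read and write replica counts together exceed the replication factor. A small sketch of that arithmetic follows; the replication factor of 3 is an assumption, since the slides do not state it.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for a quorum: a strict majority."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


RF = 3  # assumed replication factor, not given in the talk
quorum_write = quorum(RF)
refresh_after_write = read_sees_latest_write(RF, quorum_write, quorum(RF))
normal_read = read_sees_latest_write(RF, quorum_write, 1)
```

With these numbers a quorum read after a quorum write always overlaps, while a read at ONE may hit a replica the write has not reached yet, which is acceptable for ordinary page views.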

Page 44

Journal-based Consistency Model

• Journal / View Concerns

  • native support for change history

  • no journal tombstones in steady state – write-once

  • blob schema implementable on any db engine that supports two-level keys (partition, composite)

  • consistency model implementable on any db engine that supports batches & quorum writes/reads

  • view tombstones on every write, biggest concern

    • leveled compaction?

    • WISH: size-tiered compaction with data locality hoisting

Page 45

Contents

• Introduction to FamilySearch Family Tree

• Outline of Cassandra reimplementation

• Journal-based Consistency Model

• Experience with Cassandra

Page 46

Experience with Cassandra

• tested Community 1.2 and 2.0

• fantastic performance

• easy cloud setup

• great developer response

• easy to bulk load through CQL3

• harder to get running inside AWS VPC

Page 47

Experience with Cassandra

• Bulk import experience

  • 8.4B change log records => 5.8B journal entries (2.5TB lzo)

  • ‘hi1.4xlarge’ cluster (2x 1TB SSDs)

  • import through CQL was fast enough

  • 11h to import on 5-node cluster (5h on 30-node cluster)

  • 140k writes / sec, fed from 128 writer threads

  • 20 records / unlogged batch write, 1-2k record size

  • minimal post-import compaction (size-tiered)

  • ended up with 3.5-4TB on C* disk after import

  • OpsCenter – great visibility for tuning

  • Community – harder to automate repairs, etc.
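The import figures can be cross-checked with simple arithmetic. The sketch below assumes the 140k writes/sec rate applies to the 5-node run and that one write means one journal entry; the slide does not say either explicitly.

```python
JOURNAL_ENTRIES = 5.8e9      # journal entries to import (from the slide)
WRITES_PER_SEC = 140_000     # sustained write rate (from the slide)
RECORDS_PER_BATCH = 20       # records per unlogged batch (from the slide)
WRITER_THREADS = 128         # writer threads feeding the cluster (from the slide)


def import_hours(entries: float, writes_per_sec: float) -> float:
    """Wall-clock hours needed to write `entries` at a steady rate."""
    return entries / writes_per_sec / 3600


hours = import_hours(JOURNAL_ENTRIES, WRITES_PER_SEC)
batches_per_sec = WRITES_PER_SEC / RECORDS_PER_BATCH
writes_per_thread = WRITES_PER_SEC / WRITER_THREADS
```

Under those assumptions the import takes about eleven and a half hours, which lines up with the 11h reported for the 5-node cluster, at 7,000 batches per second overall.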

Page 48

Experience with Cassandra

• Full-scale load test experience

  • got to 25x our peak hourly load on 25-28-node cluster

  • production peak load included significant write load

  • working-set size was about 2M persons in a month

  • enabled row cache, ran almost entirely without disk access

  • bottlenecked on interconnect socket w/ round robin client

  • got 50% boost from token-aware, round robin client

  • OpsCenter – great visibility for tuning

  • Large SSD cluster – able to handle repair during scale tests
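Token-aware routing sends each request to a node that already owns the partition, so the coordinator does not have to forward it over the inter-node connection. The toy ring below illustrates the idea only: real Cassandra drivers use the cluster's partitioner and token map, and the class `TokenAwareRouter`, its MD5-based tokens, and the node names are assumptions.

```python
import bisect
import hashlib
import itertools


class TokenAwareRouter:
    """Toy hash ring that round-robins among the replicas of a partition key."""

    def __init__(self, nodes, replication_factor=3):
        self.ring = sorted((self._token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]
        self.rf = min(replication_factor, len(nodes))
        self._counter = itertools.count()

    @staticmethod
    def _token(value: str) -> int:
        # Illustrative token function, not Cassandra's partitioner.
        return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

    def replicas(self, partition_key: str):
        """First node at or after the key's token, plus the next rf - 1 nodes."""
        start = bisect.bisect_left(self.tokens, self._token(partition_key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

    def pick_coordinator(self, partition_key: str):
        """Round robin, but only among nodes that own the key."""
        owners = self.replicas(partition_key)
        return owners[next(self._counter) % len(owners)]


router = TokenAwareRouter(["n1", "n2", "n3", "n4", "n5"])
owners = router.replicas("KWZ3-P71")
picks = [router.pick_coordinator("KWZ3-P71") for _ in range(6)]
```

A plain round-robin client would spread the same requests over all five nodes, so most of them would need an extra hop to reach a replica.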

Page 49

Experience with Cassandra

[Chart: current system vs. Cassandra implementation (1x, 10x, 20x)]

Page 50

Experience with Cassandra

[Chart, log scale: current system vs. Cassandra implementation (1x, 10x, 20x)]

Page 51

Current Status

• still working on implementation & rollout

• migration, reconciliation, integration…

• consistency model code separate

Page 52

Contents

• Introduction to FamilySearch Family Tree

• Outline of Cassandra reimplementation

• Journal-based Consistency Model

• Experience with Cassandra

Questions?

Page 53

Contact Info

John Sumsion

Sr. Software Engineer

[email protected]

@jdsumsion

Thanks to the team at FamilySearch!

esp. Randy & James for doing the model

Thanks to the awesome presenters & organizers at #CassandraSummit!