Cassandra Summit 2014: Huge Online Genealogical Database Driven by Cassandra

Post on 05-Dec-2014

DESCRIPTION

Presenter: John Sumsion, Software Developer at FamilySearch

FamilySearch hosts a collaborative family tree with over a billion editable records. The tree currently serves as many as 10,000 concurrent users at peak weekly load. These users come from across the globe and collectively maintain and enhance the tree around the clock. Recent efforts to port the tree from a relational database to Cassandra have resulted in drastically improved performance and scalability. The database consists of more than 5 billion records in journaled form, and we anticipate having over 10TB of live data available for user view & edit, with that data size growing significantly as our user base grows.

The dataset has resisted sharding in the past, so the port involved rethinking the core data model. The model we chose retains the consistency that our users demand and can be implemented without requiring ACID transactions. Specifically, we combined convergent and commutative replicated data types (CvRDT and CmRDT) with Cassandra's atomic batch implementation to form a consistency model that meets the demanding needs of the family tree application.

TRANSCRIPT

1 © 2014 by Intellectual Reserve, Inc. All rights reserved.

Huge Online Genealogical Database

Driven By Cassandra

Cassandra Summit 2014

John Sumsion

2

Outline

• Introduction to FamilySearch Family Tree

• Outline of Cassandra reimplementation

• Journal-based Consistency Model

• Experience with Cassandra

3

What is FamilySearch?

FamilySearch.org website

Very large single pedigree (Family Tree)

Largest collection of free genealogical records

Largest genealogical library

Family History Department of The Church of Jesus Christ of Latter-day Saints (known as Mormons)

4

Why does FamilySearch exist?

Visit http://mormon.org/family-history/

5

Family Tree

Records Indexing Family Tree

Memories

Community

Where it fits

6

Record Preservation

Neglect

Time

Disasters (e.g. WWII)

7

Record Preservation (continued)

• 100 million images published online / year

8

Indexing

3.5 billion indexed records – 35M / month

Turns this… …into this!

9

Memories

10

Community

11

Family Tree

Records Indexing Family Tree

Memories

Community

Where it fits

12

Family Tree Data

Family Tree:

• 900M+ person records, open-edit

• 500M+ relationships, open-edit

• 8.4B change log entries, 100M+ per quarter

• Dynamic OLTP system

• Data-dependent performance issues

13

Family Tree: Example 9 Gen Pedigree

up to 511 person slots Dynamic content!
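The 511 figure is what a full binary pedigree gives when the starting person counts as the first of the nine generations and every generation doubles the number of slots. A quick check under that assumption:

```latex
\[
  \sum_{g=0}^{8} 2^{g} \;=\; 2^{9} - 1 \;=\; 511
\]
```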

14

Family Tree: Example Pedigree App

31+ persons per section Dynamic content!

15

Family Tree: Example Ancestor Page

10+ persons in families 100-1000+ changes Dynamic content!

16

Family Tree: Example Change History

100-1000+ changes Dynamic content!

17

Contents

• Introduction to FamilySearch Family Tree

• Outline of Cassandra reimplementation

• Journal-based Consistency Model

• Experience with Cassandra

18

Performance & Scale

• Slow page views

  • pedigree (500-3000ms for 3 generations)

  • change history (2000+ms for first page of changes)

  • large family view

• Query problems

  • relationships connect persons, range scan by person id

  • every person => person traversal is a 200-300M btree scan (global index)

  • change history queries require an 8+B btree scan (global index)

19

Performance & Scale

• Query performance problems

Person Relationship

Person

Wide range scan

Pedigree

Change History Change History

Wide range scan

20

Cassandra Reimplementation

• selected Cassandra after extensive testing

• full data scale proof-of-concept & tests

• required: new data model (performance)

• required: new consistency model (critical!)

21

Cassandra Reimplementation

• event-sourced data model – journal / views

• new data model – no indexes

• new consistency model – satisfies consistency

JE #8

P1 P1 Views

A B

JE #6

P2 P2 Views

A B

22

Cassandra Reimplementation

• denormalized relationships

P1 P2

R1

R2

R3

R5

R4

23

Cassandra Reimplementation

• denormalized relationships

P1 P2

R1

R2

R3

R5

R4

R2

R3

24

Cassandra Reimplementation

• denormalized relationships

• exact duplication allows bidirectional traversal

Person/Rels

Person/Rels

Person Relationship

Person

Wide query P1 P2

R1

R2

R3

R5

R4

R2

R3
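The denormalization on these slides can be sketched roughly as follows. This is a minimal in-memory illustration, not the FamilySearch schema: the `rels_by_person` dictionary stands in for a table partitioned by person id, and the function names and relationship types are hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in for a table partitioned by person id.
# Each partition holds that person's own copy of every relationship it is part of.
rels_by_person = defaultdict(dict)

def write_relationship(rel_id, person_a, person_b, kind):
    """Store an identical copy of the relationship under both persons."""
    record = {"id": rel_id, "a": person_a, "b": person_b, "type": kind}
    for person in (person_a, person_b):
        rels_by_person[person][rel_id] = dict(record)

def neighbors(person):
    """Single-partition read: everyone directly related to `person`."""
    return sorted(
        r["b"] if r["a"] == person else r["a"]
        for r in rels_by_person[person].values()
    )

write_relationship("R2", "P1", "P2", "parent-child")  # hypothetical type names
write_relationship("R1", "P1", "P0", "couple")
```

Because each person's partition carries its own copy, `neighbors("P1")` and `neighbors("P2")` each touch only one partition, which is the point of the "wide query" replacing the wide range scan.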

25

Cassandra Reimplementation

• change history is a core feature

• denormalized change history

• optimizes for displaying recent changes

JE #8

P1 P1 Change History View

1000s of changes (spread over multiple Cassandra cells)

Last 100-1000 changes (local to a single Cassandra cell)
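One way to picture a view that is optimized for recent changes is a bounded list packed into a single value. The sketch below is an assumption-laden illustration: the class name, the cap, and the JSON packing are hypothetical and are not taken from the talk.

```python
import json
from collections import deque

RECENT_LIMIT = 100  # assumed cap; the slide says "last 100-1000 changes"

class ChangeHistoryView:
    def __init__(self, limit=RECENT_LIMIT):
        self.recent = deque(maxlen=limit)  # oldest entries fall off automatically

    def apply(self, change):
        """Append one change; the deque keeps only the newest `limit` entries."""
        self.recent.append(change)

    def to_cell(self):
        """Pack the recent changes into one blob, newest first."""
        return json.dumps(list(reversed(self.recent))).encode("utf-8")

view = ChangeHistoryView(limit=3)
for n in range(1, 6):
    view.apply({"je": n})
```

Older changes would still be available from the journal itself; only the most recent ones are kept together for the first page of the change history.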

26

Contents

• Introduction to FamilySearch Family Tree

• Outline of Cassandra reimplementation

• Journal-based Consistency Model

• Experience with Cassandra

27

Journal-based Consistency Model

Rough Process Flow

• Command – captures edits safely

• Journal – stores edits canonically

• Views – view-optimized summations

28

Journal-based Consistency Model

Command

• write-once with quorum

• application to journal requires 3 tables: pending / completed / aborted

• idempotent application to journal

Command Journal View View

View
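The command lifecycle could look something like the sketch below. The slides name the three tables but not how they are used, so the state transitions here are an assumption, and the dictionaries are in-memory stand-ins for the Cassandra tables.

```python
pending, completed, aborted = {}, {}, {}   # stand-ins for the three tables
journal = {}                                # (person_id, command_id) -> entry

def submit(command_id, payload):
    """Capture the user action exactly once (write-once)."""
    pending.setdefault(command_id, payload)

def apply_to_journal(command_id):
    """Safe to call repeatedly, e.g. when retrying after an interruption."""
    if command_id in completed or command_id in aborted:
        return
    payload = pending[command_id]
    for person_id in payload["persons"]:
        # Keyed by command id, so a retry rewrites the same entry.
        journal[(person_id, command_id)] = payload["change"]
    completed[command_id] = payload
    pending.pop(command_id, None)

submit("cmd-1", {"persons": ["P1", "P2"], "change": "add R2"})
apply_to_journal("cmd-1")
apply_to_journal("cmd-1")  # retry is a no-op
```

Idempotence comes from keying each journal entry by the command id: applying the same command twice cannot create a second entry.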

29

Journal-based Consistency Model

Command Schema

• key: command v1 uuid (as text)

• value: blob (binary json)

Command Journal View View

View
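A key/value pair of this shape can be produced with the Python standard library, as sketched below. The slide does not say which binary JSON encoding is used, so plain UTF-8 JSON bytes are assumed here, and `new_command` is a hypothetical helper.

```python
import json
import uuid

def new_command(payload):
    """Return (key, value): a v1 UUID as text and an opaque blob."""
    key = str(uuid.uuid1())                       # time-based (version 1) UUID
    value = json.dumps(payload).encode("utf-8")   # assumption: UTF-8 JSON bytes
    return key, value

key, value = new_command({"attribution": {}})
```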

30

Journal-based Consistency Model

Journal

• write-once with quorum & C* batch

• denormalized byte-exact across affected persons & relationships

• each entry stored in separate cell (compaction required for fast journal reads)

Command Journal View View

View

31

Journal-based Consistency Model

Journal

• CmRDT (commutative replicated data type)

• partitions converge without conflict because of unique uuid

Command Journal View View

View
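The convergence argument can be illustrated with a grow-only map: if every entry is write-once and keyed by a unique UUID, merging two replicas is a set union, and union gives the same result in any order. The sketch below assumes that model; the `uuid-1` style identifiers and entry bytes are placeholders.

```python
def merge(replica_a, replica_b):
    """Union two journal replicas keyed by (partition_key, command_uuid)."""
    merged = dict(replica_a)
    merged.update(replica_b)   # same key implies same write-once content
    return merged

# Placeholder keys and contents for illustration only.
a = {("KWZ3-P71", "uuid-1"): b"entry-1"}
b = {("KWZ3-P71", "uuid-2"): b"entry-2", ("KWZ3-P71", "uuid-1"): b"entry-1"}
```

`merge(a, b)` and `merge(b, a)` are equal, and merging a replica with itself changes nothing, which is why healed partitions need no conflict resolution at the journal level.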

32

Journal-based Consistency Model

Command Journal View View

View

Partition Key   Command UUID      Content (blob)
KWZ3-P71        eda6f967-0955…    { "attribution": {}, … } (binary json)
KWZ3-P71        6af8d90c-8f3a…    { "attribution": {}, … } (binary json)
KCDT-J59        fd35ac61-7def…    { "attribution": {}, … } (binary json)
KCDT-J59        b2db2fa5-da5f…    { "attribution": {}, … } (binary json)

33

Journal-based Consistency Model

View

• multiple views for multiple uses (person, person card, change history)

• populated by applying journal entries

• incrementally updated in steady state

• not canonical data, can be recalculated

Command Journal View View

View
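The relationship between journal and view can be sketched as a fold: the view is whatever results from applying journal entries in order, so it can be updated one entry at a time or thrown away and recalculated. The entry shape (`field` / `value`) below is hypothetical.

```python
def apply_entry(view, entry):
    """Fold one journal entry into a person view (hypothetical entry shape)."""
    updated = dict(view)
    updated[entry["field"]] = entry["value"]
    return updated

def rebuild(entries):
    """Recalculate a view from scratch by replaying the journal in order."""
    view = {}
    for entry in entries:
        view = apply_entry(view, entry)
    return view

journal = [
    {"field": "name", "value": "Jon"},
    {"field": "birth", "value": "1850"},
    {"field": "name", "value": "John"},
]
# Incremental update of an existing view with the newest entry.
incremental = apply_entry(rebuild(journal[:2]), journal[2])
```

In this sketch the incremental result and the full rebuild agree, which is what makes the view disposable.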

34

Journal-based Consistency Model

Command Journal View View

View

P1 P1 Views

A B

35

Journal-based Consistency Model

Command Journal View View

View

JE #8

P1 P1 Views

A B

JE #8 JE #8

36

Journal-based Consistency Model

Command Journal View View

View

P1 P1 Views

A B

JE #8 JE #8

A (new)

B (new)

JE #8

37

Journal-based Consistency Model

Command Journal View View

View

P1 P1 Views

A B

38

Journal-based Consistency Model

View

• views have same schema as journal

• journal entries are written to view for incremental refresh

• core of the consistency model

Command Journal View View

View

39

Journal-based Consistency Model

View

• CvRDT (convergent replicated data type)

• partitions converge with conflict; resolved by full view refresh from canonical journal

• steady state: one view of a given type per entity

Command Journal View View

View
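One reading of the conflict rule is sketched below: in steady state a reader finds exactly one stored view, and anything else triggers a full refresh from the journal. How conflicts are actually detected is not described on the slides, so the "more than one snapshot" test, the function names, and the data shapes are all assumptions.

```python
def fold(entries):
    """Deterministically reduce journal entries (ordered by id) to a view."""
    state = {}
    for entry_id in sorted(entries):
        state.update(entries[entry_id])
    return state

def read_view(snapshots, journal_entries):
    """Return (view, snapshots); falls back to a full refresh on conflict."""
    if len(snapshots) == 1:
        return snapshots[0], snapshots           # steady state
    refreshed = fold(journal_entries)            # zero or several snapshots
    return refreshed, [refreshed]

journal_entries = {1: {"name": "Jon"}, 2: {"name": "John"}}
conflicting = [{"name": "Jon"}, {"name": "John"}]
view, snapshots = read_view(conflicting, journal_entries)
```

Since the journal is canonical, every replica that performs the refresh arrives at the same view, so the views converge even though they can temporarily disagree.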

40

Journal-based Consistency Model

Command Journal View View

View

P1 P1 Views

A B

JE #8 JE #8

A (new)

B (new)

JE #8

41

Journal-based Consistency Model

• Performance & Scale

  • lookup by partition key only, no indexes

  • any cross-entity change happens in duplicate on all

  • stored “current-state” views – cheapest possible read

  • custom views – tunable to different use cases

  • disposable views – able to tweak view over time

42

Journal-based Consistency Model

• Business Rule Enforcement

  • Read / Write / Read & Revert

  • pre-command checks prevent invalid changes

  • write with appropriate quorum ensures consistent write

  • post-command checks prevent business-rules conflicts

  • administrative revert marks command as “not applicable” and thereby causes full refresh which ignores changes

  • appropriate quorum: depending on the change, either LOCAL_QUORUM or EACH_QUORUM
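The Read / Write / Read & Revert sequence might be sketched as below. The "at most two parents" rule, the function names, and the in-memory journal are hypothetical; in the real system the write step would go to Cassandra at LOCAL_QUORUM or EACH_QUORUM.

```python
journal = []          # applied changes for one person (in-memory stand-in)
reverted = set()      # command ids marked "not applicable"

def current_state():
    """Full refresh that ignores reverted commands."""
    state = {"parents": []}
    for cmd_id, parent in journal:
        if cmd_id not in reverted:
            state["parents"].append(parent)
    return state

def valid(state):
    return len(state["parents"]) <= 2   # hypothetical business rule

def add_parent(cmd_id, parent):
    proposed = {"parents": current_state()["parents"] + [parent]}
    if not valid(proposed):             # pre-command check (read)
        return "rejected"
    journal.append((cmd_id, parent))    # write (quorum in the real system)
    if not valid(current_state()):      # post-command check (read again)
        reverted.add(cmd_id)            # revert: mark as not applicable
        return "reverted"
    return "applied"
```

The post-command check only matters when another writer slips in between the first read and the write; in that case the command is marked rather than deleted, so the journal stays write-once.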

43

Journal-based Consistency Model

• Strong consistency

  • command store – atomic capture of a single user action

  • command handling – idempotent writes to journal, picked up later even if interrupted

  • no global lock needed for optimistic concurrency

• Read after write

  • consistency ONE for normal reads

  • quorum when the client knows it’s refreshing after write
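The read-after-write rule rests on the usual quorum overlap argument: a read is guaranteed to see a write when the number of replicas acknowledging the write plus the number consulted by the read exceeds the replication factor. A small sketch, with hypothetical function names and a replication factor of 3 assumed in the usage:

```python
def read_consistency(refreshing_after_write):
    """Pick a read consistency level name for one request."""
    return "QUORUM" if refreshing_after_write else "ONE"

def quorum(replication_factor):
    """Replicas that must respond for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def read_sees_write(rf, write_acks, read_acks):
    """True when every read set must overlap every write set."""
    return write_acks + read_acks > rf
```

With RF = 3, a quorum write (2 replicas) followed by a quorum read (2 replicas) always overlaps, while a read at ONE may land on the replica that has not received the write yet.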

44

Journal-based Consistency Model

• Journal / View Concerns

  • native support for change history

  • no journal tombstones in steady state – write-once

  • blob schema implementable on any db engine that supports two-level keys (partition, composite)

  • consistency model implementable on any db engine that supports batches & quorum writes/reads

  • view tombstones on every write, biggest concern

  • leveled compaction?

  • WISH: size-tiered compaction with data locality hoisting

45

Contents

• Introduction to FamilySearch Family Tree

• Outline of Cassandra reimplementation

• Journal-based Consistency Model

• Experience with Cassandra

46

Experience with Cassandra

• tested Community 1.2 and 2.0

• fantastic performance

• easy cloud setup

• great developer response

• easy to bulk load through CQL3

• harder to get running inside AWS VPC

47

Experience with Cassandra

• Bulk import experience

  • 8.4B change log records => 5.8B journal entries (2.5TB lzo)

  • ‘hi1.4xlarge’ cluster (2x 1TB SSDs)

  • import through CQL was fast enough

  • 11h to import on 5-node cluster (5h on 30-node cluster)

  • 140k writes / sec, fed from 128 writer threads

  • 20 records / unlogged batch write, 1-2k record size

  • minimal post-import compaction (size-tiered)

  • ended up with 3.5-4TB on C* disk after import

• OpsCenter – great visibility for tuning

• Community – harder to automate repairs, etc.
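The import figures can be cross-checked with a little arithmetic, and the batching itself is easy to sketch. The `batches` helper below is hypothetical; the slide does not say which cluster size the 140k writes/sec figure refers to, so the comparison with the 11h import time is only suggestive.

```python
from itertools import islice

def batches(records, size=20):
    """Yield lists of up to `size` records (20 per batch, per the slide)."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

JOURNAL_ENTRIES = 5.8e9      # from the slide
WRITES_PER_SEC = 140_000     # from the slide

# Time to write every journal entry at the quoted rate, in hours.
hours = JOURNAL_ENTRIES / WRITES_PER_SEC / 3600
# Batch statements per second implied by 20 records per batch.
batches_per_sec = WRITES_PER_SEC / 20
```

At the quoted rate the import works out to roughly 11.5 hours, and to about 7,000 batch statements per second spread across the 128 writer threads.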

48

Experience with Cassandra

• Full-scale load test experience

  • got to 25x our peak hourly load on 25-28-node cluster

  • production peak load included significant write load

  • working-set size was about 2M persons in a month

  • enabled row cache, ran almost entirely without disk access

  • bottlenecked on interconnect socket w/ round robin client

  • got 50% boost from token-aware, round robin client

• OpsCenter – great visibility for tuning

• Large SSD cluster – able to handle repair during scale tests
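The gain from a token-aware client comes from sending each request straight to a node that owns the partition, which avoids an extra hop over the interconnect. The toy sketch below illustrates the idea only: the node names and replication factor are assumed, and the SHA-256 placement is a stand-in for Cassandra's real partitioner and token ring.

```python
import hashlib
from itertools import count

NODES = ["n1", "n2", "n3", "n4", "n5"]   # hypothetical cluster
RF = 3                                   # assumed replication factor
_counter = count()

def replicas(partition_key):
    """Toy placement: a hash picks the first replica, the next RF-1 nodes follow."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(RF)]

def pick_coordinator(partition_key):
    """Round-robin, but only among the nodes that own the key."""
    owners = replicas(partition_key)
    return owners[next(_counter) % len(owners)]
```

A plain round-robin client would pick from all five nodes, so some requests would be coordinated by a node that holds none of the data and has to forward them.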

49

Experience with Cassandra

current system

cassandra impl (1x, 10x, 20x)

50

Experience with Cassandra

current system

cassandra impl (1x, 10x, 20x)

LOG SCALE!

51

Current Status

• still working on implementation & rollout

• migration, reconciliation, integration…

• consistency model code separate

52

Contents

• Introduction to FamilySearch Family Tree

• Outline of Cassandra reimplementation

• Journal-based Consistency Model

• Experience with Cassandra

Questions?

53

Contact Info

John Sumsion

Sr. Software Engineer

sumsionjg@familysearch.org

@jdsumsion

Thanks to the team at FamilySearch! esp. Randy & James for doing the model

Thanks to the awesome presenters & organizers at #CassandraSummit!
