Building a Flexible, Real-time Big Data Applications Platform on Cassandra with Kiji Clint Kelly Member of Technical Staff WibiData Cassandra Meetup 23 April 2014

Upload: clint-kelly

Post on 03-Jul-2015


DESCRIPTION

The Kiji Project is a modular, open-source framework that enables developers to efficiently build real-time Big Data applications. Kiji is built upon popular open-source technologies such as Cassandra, HBase, Hadoop, and Scalding, and contains components that implement functionality critical for Big Data applications, including the following:

• Support for evolvable schemas of complex data types
• Batch training of machine learning models with Hadoop
• Real-time scoring with trained models
• Integration with Hive and R
• A REST endpoint

Recently, we have updated Kiji to use Cassandra as a backing data store (previously, Kiji worked only with HBase). In this talk, we describe the process of integrating Cassandra and Kiji. Topics we cover include the following:

• The Kiji architecture and data model
• Implementing the Kiji data model in Cassandra using the Java driver and CQL3
• Integrating Cassandra with Hadoop 2.x
• Building a flexible middleware platform that supports Cassandra and HBase (including projects that use both simultaneously)
• Exposing unique features of Cassandra (e.g., variable consistency) to Kiji users

TRANSCRIPT

Page 1: Building a flexible, real-time Big Data Applications platform on Cassandra - Kiji - Cassandra Meetup 23 April 2014

Building a Flexible, Real-time Big Data Applications Platform on Cassandra with Kiji

Clint Kelly
Member of Technical Staff
WibiData

Cassandra Meetup
23 April 2014

Agenda

The problem
How Kiji works
Kiji on Cassandra

Open source software

Data in

REST

Inspect

Train

“Trained model”

Model

Apply

Batch

Data out

REST


Experiments / Deployment

3

Data in / out (REST)

Inspect and train

Apply (real-time)

Kiji

How Kiji works

Kiji History

In production now

Fortune 500 retailer: Personalized recommendations

Opower: Energy usage and analytics reporting

How does it work?

[Architecture diagram, built up incrementally over pages 91 to 128. Labels that appear, in order of introduction: Kiji; Engineering; Data Science; Write; Channels; Logs; DBs; KijiMR; KijiREST; Stream; Read; KijiSchema (Cassandra); User 1, User 2, User 3; Query; KijiHive; Data; KijiExpress; Scorer; Kiji Model Repository; KijiScoring; Freshness Policy.]

3

Data in / out: KijiREST, KijiMR

Inspect and train: KijiHive, KijiMR, KijiExpress

Apply (real-time): KijiModelRepository, KijiScoring

Modular

Kiji on Cassandra

Kiji ~ BigTable

table

row | row | row | ...

Row key = entity ID

entity ID | data

Composite entity IDs

0xfa | “bob” | data
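The 0xfa shown in front of “bob” can be read as a short hash prefix placed ahead of the human-readable component of the entity ID. The following is a minimal Python sketch of that idea only. The choice of MD5, the one-byte prefix, the separator, and the function name composite_entity_id are all assumptions made for illustration; the slides do not say how Kiji actually computes or encodes the prefix.

```python
import hashlib
from typing import Tuple


def composite_entity_id(*components: str, prefix_bytes: int = 1) -> Tuple[bytes, Tuple[str, ...]]:
    """Return (hash_prefix, components) for a composite entity ID.

    Hypothetical helper: MD5 and the NUL separator are illustrative choices,
    not Kiji's documented encoding.
    """
    joined = "\x00".join(components).encode("utf-8")
    # Keep only the leading bytes of the digest as a spreading prefix.
    prefix = hashlib.md5(joined).digest()[:prefix_bytes]
    return prefix, components


prefix, parts = composite_entity_id("bob")
print(prefix.hex(), parts)
```

A prefix of this kind spreads otherwise adjacent keys across the key space, while the readable components remain available after it.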

Column families

0xfa | “bob” | payment | interactions | recommendations

Columns

0xfa | “bob” | payment:cardnum | payment:address | inter:clicks | inter:search | rec:scorer1 | rec:scorer2

Timestamped versions

[Figure: the row for entity ID (0xfa, “bob”) with columns songs:let it be, inter:search, inter:clicks, payment:cardnum, payment:address, rec:scorer1, rec:scorer2, and rec:scorer3. Some columns (songs:let it be, rec:scorer3) hold several stacked versions. Example timestamps shown: 1396560123 and 1395650231.]
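The logical model on these slides (entity ID, then family:qualifier column, then timestamped versions) can be sketched as nested maps. This is a toy in-memory illustration in Python, not Kiji's API; the class ToyTable and its method names are hypothetical, and only the two timestamps are taken from the slide.

```python
from collections import defaultdict
from typing import Any, Dict, Optional, Tuple

EntityId = Tuple[str, ...]
Column = Tuple[str, str]  # (family, qualifier)


class ToyTable:
    """Toy model: entity ID -> (family, qualifier) -> {timestamp: value}."""

    def __init__(self) -> None:
        self._rows: Dict[EntityId, Dict[Column, Dict[int, Any]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def put(self, entity_id: EntityId, family: str, qualifier: str,
            timestamp: int, value: Any) -> None:
        # Each write adds a new version instead of overwriting older ones.
        self._rows[entity_id][(family, qualifier)][timestamp] = value

    def get_latest(self, entity_id: EntityId, family: str,
                   qualifier: str) -> Optional[Tuple[int, Any]]:
        versions = self._rows.get(entity_id, {}).get((family, qualifier), {})
        if not versions:
            return None
        newest = max(versions)  # largest timestamp is the most recent version
        return newest, versions[newest]


table = ToyTable()
bob = ("bob",)
table.put(bob, "inter", "clicks", 1395650231, "older click")
table.put(bob, "inter", "clicks", 1396560123, "newer click")
latest = table.get_latest(bob, "inter", "clicks")
print(latest)
```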

Complex data types

record Search {
  string search_term;
  long session_id;
  device_type device;
}
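The Search record above reads like an Avro IDL definition. As a rough illustration of a cell holding a structured value rather than a scalar, the Python sketch below mirrors the record with a dataclass and round-trips it through bytes. JSON stands in for Avro's binary encoding here, and the DeviceType symbols are hypothetical, since the slide does not list the values of device_type.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class DeviceType(Enum):
    # Hypothetical symbols; the slide does not define device_type.
    DESKTOP = "DESKTOP"
    MOBILE = "MOBILE"


@dataclass(frozen=True)
class Search:
    search_term: str
    session_id: int
    device: DeviceType


def encode(search: Search) -> bytes:
    """Serialize a Search to bytes (JSON as a stand-in for Avro binary)."""
    payload = asdict(search)
    payload["device"] = search.device.value  # enums are stored by symbol name
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def decode(blob: bytes) -> Search:
    """Rebuild a Search from the bytes produced by encode()."""
    raw = json.loads(blob.decode("utf-8"))
    return Search(raw["search_term"], raw["session_id"], DeviceType(raw["device"]))


original = Search("let it be", 42, DeviceType.MOBILE)
round_tripped = decode(encode(original))
print(round_tripped)
```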

Locality group

Column families

[Figure: column families labelled Batch or Real-time are sorted into two locality groups, locality_group_batch and locality_group_real_time. Storage annotations shown on the groups: On disk. Compressed. In memory.]
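A locality group can be thought of as a named bundle of column families that share storage settings. The Python sketch below shows one possible configuration. It assumes the batch group is the one kept on disk and compressed and the real-time group is the one held in memory; the slide lists the annotations without tying them to a group explicitly. Which families go in which group is also made up for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LocalityGroup:
    name: str
    in_memory: bool
    compressed: bool
    families: Tuple[str, ...]


# Assumed mapping of storage settings and families (not stated on the slides).
GROUPS = (
    LocalityGroup("locality_group_batch", in_memory=False, compressed=True,
                  families=("payment", "songs")),
    LocalityGroup("locality_group_real_time", in_memory=True, compressed=False,
                  families=("inter", "rec")),
)


def group_for_family(family: str) -> LocalityGroup:
    """Find the locality group that stores a given column family."""
    for group in GROUPS:
        if family in group.families:
            return group
    raise KeyError(f"no locality group holds family {family!r}")


print(group_for_family("rec").name)
```

Keeping the two access patterns in separate groups lets each one be tuned on its own, which is the point the build-up on these slides appears to make.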

Row ➔ transactional consistency

Page 157: Building a flexible, real-time Big Data Applications platform on Cassandra - Kiji - Cassandra Meetup 23 April 2014

Locality group ➔ Column family

CREATE TABLE loc_grp

songs:let it be

inter:search0xfa “bob” songs:

let it besongs:let it besongs:

let it beinter:clicks

1396560123

payment:cardnum

payment:address

rec:scorer2

rec:scorer3rec:

scorer3rec:scorer3

rec:scorer1

1395650231

Page 158: Building a flexible, real-time Big Data Applications platform on Cassandra - Kiji - Cassandra Meetup 23 April 2014

Entity ID ➔ Primary key

CREATE TABLE loc_grp (city text, user text,

PRIMARY KEY (city, user) )

WITH CLUSTERING ORDER BY (user ASC);

songs:let it be

inter:search0xfa “bob” songs:

let it besongs:let it besongs:

let it beinter:clicks

1396560123

payment:cardnum

payment:address

rec:scorer2

rec:scorer3rec:

scorer3rec:scorer3

rec:scorer1

1395650231

Page 159: Building a flexible, real-time Big Data Applications platform on Cassandra - Kiji - Cassandra Meetup 23 April 2014

Family, Qualifier, Version ➔ Clustering Columns

CREATE TABLE loc_grp (city text, user text,

family text, qualifier text, version bigint,

PRIMARY KEY (city, user, family, qualifier, version) )

WITH CLUSTERING ORDER BY (user ASC, family ASC, qualifier ASC, version DESC);


Page 160: Building a flexible, real-time Big Data Applications platform on Cassandra - Kiji - Cassandra Meetup 23 April 2014

Column values ➔ Blobs

CREATE TABLE loc_grp (city text, user text,
  family text, qualifier text, version bigint, value blob,
  PRIMARY KEY (city, user, family, qualifier, version) )
  WITH CLUSTERING ORDER BY (user ASC, family ASC, qualifier ASC, version DESC);
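The effect of the clustering order in the schema above can be sketched without a cluster. This is a minimal in-memory illustration, not Kiji or Cassandra code: the `Cell` type, the helper names, and the sample data are all hypothetical. It shows that, within one partition, cells sort by user, family, and qualifier ascending and by version descending, so the newest version of a column comes first.

```python
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    city: str        # partition key in the CQL table
    user: str        # clustering column
    family: str      # clustering column
    qualifier: str   # clustering column
    version: int     # clustering column (timestamp), newest first
    value: bytes     # opaque blob


def clustering_key(cell: Cell):
    # Mirrors: user ASC, family ASC, qualifier ASC, version DESC.
    return (cell.user, cell.family, cell.qualifier, -cell.version)


def partition_rows(cells, city):
    """Return the cells of one partition in clustering order."""
    return sorted((c for c in cells if c.city == city), key=clustering_key)


def latest(cells, city, user, family, qualifier) -> Optional[Cell]:
    """Return the newest version of one column, or None if it is absent."""
    for c in partition_rows(cells, city):
        if (c.user, c.family, c.qualifier) == (user, family, qualifier):
            return c  # first match is the newest because version is DESC
    return None


# Hypothetical sample data, loosely modeled on the slide's example row.
CELLS = [
    Cell("sf", "bob", "inter", "clicks", 6, b"c6"),
    Cell("sf", "bob", "inter", "clicks", 9, b"c9"),
    Cell("sf", "bob", "inter", "clicks", 7, b"c7"),
    Cell("sf", "bob", "payment", "address", 5, b"1234 Main St, SF"),
]
```

Reading the most recent value of a column is then a matter of taking the first matching cell, which is why version is ordered descending.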


Page 161: Building a flexible, real-time Big Data Applications platform on Cassandra - Kiji - Cassandra Meetup 23 April 2014

bob:pay:cardnum:t

AMEX1234...

bob:pay:addr:t5

1234 Main St, SF

bob:inter:clicks:t9

...

bob:inter:clicks:t7

...

bob:inter:clicks:t6

...

0xfa

Page 162: Building a flexible, real-time Big Data Applications platform on Cassandra - Kiji - Cassandra Meetup 23 April 2014

Implementation notes

DataStax Java driver

Cassandra 2.0.6

Async API

New MapReduce InputFormat

Page 167: Building a flexible, real-time Big Data Applications platform on Cassandra - Kiji - Cassandra Meetup 23 April 2014

Issues

Page 168: Building a flexible, real-time Big Data Applications platform on Cassandra - Kiji - Cassandra Meetup 23 April 2014

Operations across locality groups

Kiji locality group ➔ C* column family

Read across locality groups ➔ multiple C* reads (async API!)

Compare-and-set across locality groups ➔ not allowed in C* Kiji

Lose transactional consistency
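A read that spans locality groups can be sketched as one request per table, issued concurrently and merged on the client. This is only an illustration under stated assumptions: the `TABLES` dictionary stands in for two Cassandra tables, `read_locality_group` stands in for a single driver read, and a thread pool stands in for the driver's async API. None of these names come from Kiji.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for two Cassandra tables, one per locality group.
TABLES = {
    "locality_group_real_time": {"bob": {"rec:scorer1": b"0.9"}},
    "locality_group_batch": {"bob": {"songs:let it be": b"played"}},
}


def read_locality_group(table, entity_id):
    """Stand-in for one Cassandra read against a single table."""
    return dict(TABLES[table].get(entity_id, {}))


def read_row(entity_id, tables):
    """Issue one read per locality group concurrently and merge the results."""
    tables = list(tables)
    with ThreadPoolExecutor(max_workers=max(1, len(tables))) as pool:
        futures = [pool.submit(read_locality_group, t, entity_id) for t in tables]
        merged = {}
        for future in futures:
            # Each read completes independently; the merge is not a snapshot.
            merged.update(future.result())
    return merged
```

Because each table is read separately, a concurrent writer can change one locality group between the reads, which is the loss of row-level transactional consistency noted above.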

Page 178: Building a flexible, real-time Big Data Applications platform on Cassandra - Kiji - Cassandra Meetup 23 April 2014

Filters

HBase ➔ Rich server-side filters

Cassandra ➔ WHERE clauses

Client-side filtering
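Client-side filtering can be sketched as follows: whatever part of a filter cannot be expressed in a WHERE clause is applied after the rows come back. The example assumes a regular-expression match on the qualifier as the unsupported predicate. The row tuples and the function name are hypothetical, not part of Kiji.

```python
import re


def client_side_filter(rows, qualifier_pattern):
    """Yield only the rows whose qualifier fully matches the pattern.

    `rows` is an iterable of (family, qualifier, version, value) tuples,
    standing in for the result of a query whose WHERE clause has already
    narrowed the result as far as the server allows.
    """
    regex = re.compile(qualifier_pattern)
    for family, qualifier, version, value in rows:
        if regex.fullmatch(qualifier):
            yield (family, qualifier, version, value)


# Hypothetical rows, as if returned by a query restricted to family = 'rec'.
ROWS = [
    ("rec", "scorer1", 3, b"a"),
    ("rec", "scorer2", 3, b"b"),
    ("rec", "other", 3, b"c"),
]
```

The trade-off is that rows rejected by the filter still cross the network, so the server-side restriction should be made as tight as the WHERE clause permits.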

Page 180: Building a flexible, real-time Big Data Applications platform on Cassandra - Kiji - Cassandra Meetup 23 April 2014

Entity IDs with unhashed components

Page 181: Building a flexible, real-time Big Data Applications platform on Cassandra - Kiji - Cassandra Meetup 23 April 2014

EntityId(state, city, username)

hashed unhashed

HBase

0x235af-alice

0x235af-bob

0x235af-cathy

0x235af-dave

0x38e0a-andy

0x38e0a-jane

0x38e0a-lucy

0x38e0a-nancy

Cassandra

0x235af | alice | bob | cathy | dave

0x38e0a | andy | jane | lucy | nancy

Limited to width of C* wide row!
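The difference between the two layouts can be sketched in a few lines. This example assumes that state and city are the hashed components and username is the unhashed one, which the row keys above suggest but the slides do not state outright. The SHA-256 prefix is purely illustrative and is not Kiji's actual hash. All function names are hypothetical.

```python
import hashlib


def hashed_prefix(state, city):
    """Illustrative hash of the hashed components (not Kiji's real hash)."""
    digest = hashlib.sha256(f"{state}|{city}".encode("utf-8")).hexdigest()
    return "0x" + digest[:5]


def hbase_row_key(state, city, username):
    """HBase: one flat row key per entity, hash prefix then unhashed part."""
    return f"{hashed_prefix(state, city)}-{username}"


def cassandra_key(state, city, username):
    """Cassandra: hash is the partition key, unhashed part clusters within it."""
    return (hashed_prefix(state, city), username)


def group_by_partition(entity_ids):
    """Group usernames by partition, as they would share one wide row."""
    partitions = {}
    for state, city, username in entity_ids:
        partition, clustering = cassandra_key(state, city, username)
        partitions.setdefault(partition, []).append(clustering)
    return {p: sorted(users) for p, users in partitions.items()}
```

Every entity that shares the hashed components lands in the same partition, so the number of entities per hash is bounded by how wide a single Cassandra row can grow.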

Page 187: Building a flexible, real-time Big Data Applications platform on Cassandra - Kiji - Cassandra Meetup 23 April 2014

Project status

Page 189: Building a flexible, real-time Big Data Applications platform on Cassandra - Kiji - Cassandra Meetup 23 April 2014

Next quarter

Cassandra in all Kiji components

Run MapReduce jobs with KijiExpress

Expose Cassandra-specific features

Page 190: Building a flexible, real-time Big Data Applications platform on Cassandra - Kiji - Cassandra Meetup 23 April 2014

3

Page 191: Building a flexible, real-time Big Data Applications platform on Cassandra - Kiji - Cassandra Meetup 23 April 2014

Data in / out

KijiREST

KijiMR

Page 192: Building a flexible, real-time Big Data Applications platform on Cassandra - Kiji - Cassandra Meetup 23 April 2014

Inspect and train

KijiHive

KijiMR

KijiExpress

Page 193: Building a flexible, real-time Big Data Applications platform on Cassandra - Kiji - Cassandra Meetup 23 April 2014

Apply (real-time)

KijiModelRepository

KijiScoring

Page 194: Building a flexible, real-time Big Data Applications platform on Cassandra - Kiji - Cassandra Meetup 23 April 2014

Thanks to Cassandra community

Mailing lists

Meetups, webinars, conferences

Page 196: Building a flexible, real-time Big Data Applications platform on Cassandra - Kiji - Cassandra Meetup 23 April 2014