Real World NoSQL (by Chris Yuen)

Real World NoSQL x Big Data

Upload: orcsab

Post on 27-Jan-2015

DESCRIPTION

The Hong Kong Big Data community had a guest speaker at our Tuesday, 18 February meeting. Chris Yuen from Demyst Data discussed his experience with three NoSQL solutions: Cassandra, MongoDB, and HBase. For more information see http://www.infoincog.com/hong-kong-big-data-meeting-tuesday-18-february/.

TRANSCRIPT

Page 1: Real World NoSQL (by Chris Yuen)

Real World NoSQL x Big Data

Page 3: Real World NoSQL (by Chris Yuen)

Overview

Introduction
  Motivation for NoSQL
  The NoSQL landscape

Experience sharing
  HBase
  MongoDB
  Cassandra

Tying it up – how does it really matter

Page 8: Real World NoSQL (by Chris Yuen)

Motivation

Too much data – the need to “scale out”
  CAP theorem

Performance
  RDBMS joining is slow
  Denormalization
  Key-value data store

Alternative data representation
  Schemaless “No SQL”
  Document data store
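
The performance bullets above (slow joins, denormalization, key-value access) can be illustrated with a small sketch. It uses plain Python dictionaries rather than any real database, and the `users`/`orders` data and function names are hypothetical, chosen only for illustration.

```python
# Normalized layout: two "tables" that must be joined at read time.
users = {1: {"name": "Alice"}, 2: {"name": "Bob"}}
orders = [
    {"order_id": 100, "user_id": 1, "item": "book"},
    {"order_id": 101, "user_id": 2, "item": "pen"},
    {"order_id": 102, "user_id": 1, "item": "lamp"},
]


def orders_with_names_join(user_id):
    """Relational style: scan the orders and look up the user for each match."""
    return [
        {"name": users[o["user_id"]]["name"], "item": o["item"]}
        for o in orders
        if o["user_id"] == user_id
    ]


def denormalize():
    """Build a key-value view: one key per user, the value holds everything needed."""
    kv = {}
    for o in orders:
        key = f"user:{o['user_id']}"
        doc = kv.setdefault(key, {"name": users[o["user_id"]]["name"], "items": []})
        doc["items"].append(o["item"])
    return kv


kv_store = denormalize()

# Read path after denormalization: a single lookup by key, no join.
alice = kv_store["user:1"]
```

The trade-off is that the user's name is now copied into every denormalized value, so a rename has to touch each copy.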

Page 9: Real World NoSQL (by Chris Yuen)

HBase

Builds on top of HDFS

Consistent “big-data” database

Automatically scales out

Page 10: Real World NoSQL (by Chris Yuen)

HBase

… but we didn’t use it in the end

Page 13: Real World NoSQL (by Chris Yuen)

HBase

A nightmare to set up and maintain

Depends on Hadoop, HDFS, ZooKeeper

No secondary index

“Table” alteration requires downtime

Not spectacular latency for OLTP usage
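
To show what "no secondary index" means in practice, here is a minimal sketch of a store that can only be read by row key. It is a toy in-memory model, not the HBase client API; `RowKeyStore`, `ManuallyIndexedStore`, and the `email` field are hypothetical names, and emails are assumed to be unique.

```python
class RowKeyStore:
    """Toy store that can only look rows up by row key (or scan everything)."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, row):
        self.rows[row_key] = row

    def get(self, row_key):
        return self.rows.get(row_key)

    def delete(self, row_key):
        self.rows.pop(row_key, None)

    def scan(self):
        # Rows come back in row-key order, as in a sorted key-value store.
        return iter(sorted(self.rows.items()))


def find_by_email_scan(store, email):
    """Without an index, a lookup by a non-key field reads every row."""
    return [key for key, row in store.scan() if row.get("email") == email]


class ManuallyIndexedStore:
    """The application keeps a second table that maps email -> row key."""

    def __init__(self):
        self.data = RowKeyStore()
        self.email_index = RowKeyStore()

    def put(self, row_key, row):
        old = self.data.get(row_key)
        old_email = old.get("email") if old else None
        if old_email is not None and old_email != row.get("email"):
            # Remove the stale index entry; the store will not do this for us.
            self.email_index.delete(old_email)
        self.data.put(row_key, row)
        if "email" in row:
            self.email_index.put(row["email"], row_key)

    def find_by_email(self, email):
        row_key = self.email_index.get(email)
        return None if row_key is None else self.data.get(row_key)
```

Every write now touches two tables, and keeping them consistent is the application's job.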

Page 14: Real World NoSQL (by Chris Yuen)

MongoDB

De facto “big-data” “NoSQL” database

Document based data representation

Page 16: Real World NoSQL (by Chris Yuen)

MongoDB

A good balance of “traditional” usage and “NoSQL” usage
  Supports secondary indexes
  Range queries
  Can do table scans
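
The three operations on this slide can be sketched with a toy document collection. This is not MongoDB's API or implementation; `Collection`, the `age` field, and the sorted-list index are assumptions made for illustration, and the sketch only supports inserting new documents.

```python
import bisect


class Collection:
    """Toy document collection with one secondary index on the 'age' field."""

    def __init__(self):
        self.docs = {}        # primary key -> document
        self.age_index = []   # sorted list of (age, primary key)

    def insert(self, doc_id, doc):
        # Assumes doc_id is new; updates would also need to fix the index.
        self.docs[doc_id] = doc
        bisect.insort(self.age_index, (doc["age"], doc_id))

    def find_age_between(self, low, high):
        """Range query served by the index (inclusive bounds)."""
        start = bisect.bisect_left(self.age_index, (low,))
        result = []
        for age, doc_id in self.age_index[start:]:
            if age > high:
                break
            result.append(doc_id)
        return result

    def scan(self, predicate):
        """Table scan: check every document against an arbitrary predicate."""
        return [doc_id for doc_id, doc in self.docs.items() if predicate(doc)]


people = Collection()
people.insert("a", {"name": "Ann", "age": 30})
people.insert("b", {"name": "Ben", "age": 25})
people.insert("c", {"name": "Cat", "age": 41})
people.insert("d", {"name": "Dan", "age": 35})
```

The indexed range query jumps to the first matching entry and stops at the upper bound, while the scan touches every document.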

Page 17: Real World NoSQL (by Chris Yuen)

MongoDB

“Big-data” features: sharding, replica sets
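
As a rough picture of how sharding and replica sets fit together, the sketch below routes each key to one shard by hashing it and copies every write to all replicas of that shard. It is a generic illustration, not MongoDB's implementation: `ShardedCluster` and `shard_for` are hypothetical names, and writing to all replicas synchronously is a simplification (real replica sets involve a primary, replication lag, and failover).

```python
import hashlib


def shard_for(key, num_shards):
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedCluster:
    """Each shard is a list of replicas; each replica is a plain dict."""

    def __init__(self, num_shards, replicas_per_shard):
        self.shards = [
            [{} for _ in range(replicas_per_shard)] for _ in range(num_shards)
        ]

    def put(self, key, value):
        shard = self.shards[shard_for(key, len(self.shards))]
        for replica in shard:
            # Simplification: every replica is updated immediately.
            replica[key] = value

    def get(self, key, replica=0):
        shard = self.shards[shard_for(key, len(self.shards))]
        return shard[replica].get(key)


cluster = ShardedCluster(num_shards=3, replicas_per_shard=2)
cluster.put("user:1", "Alice")
```

Note that with modulo routing, changing the number of shards remaps most keys, which is one reason resharding is operationally painful.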

Page 19: Real World NoSQL (by Chris Yuen)

MongoDB

… but it got ugly pretty fast

Devil’s in the details
  Replica set management fiasco
  Sharding is difficult to set up and poorly implemented
  https://github.com/kizzx2/mongolab

Page 21: Real World NoSQL (by Chris Yuen)

MongoDB

Reality – it doesn’t scale beyond one machine

Replica set

Page 24: Real World NoSQL (by Chris Yuen)

Cassandra

Column Family data store

More “NoSQL” than MongoDB. Fewer features

Column data store – strictly key/value query
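
One way to picture a column family is a map from row key to a sorted map of columns. The sketch below is an in-memory toy under that assumption, not Cassandra's API; `ColumnFamily`, `get_slice`, and the sample event data are hypothetical.

```python
class ColumnFamily:
    """Toy column family: row key -> {column name -> value}."""

    def __init__(self):
        self.rows = {}

    def insert(self, row_key, column, value):
        self.rows.setdefault(row_key, {})[column] = value

    def get_row(self, row_key):
        """All columns of one row, ordered by column name."""
        return dict(sorted(self.rows.get(row_key, {}).items()))

    def get_slice(self, row_key, start, end):
        """Columns of one row whose names fall in [start, end]."""
        return {
            column: value
            for column, value in sorted(self.rows.get(row_key, {}).items())
            if start <= column <= end
        }


events = ColumnFamily()
events.insert("user:1", "2014-02-18T10:00", "login")
events.insert("user:1", "2014-02-17T09:30", "signup")
events.insert("user:1", "2014-02-19T08:15", "purchase")
events.insert("user:2", "2014-02-18T11:00", "login")
```

Every read starts from a row key; there is no method here for "all logins across users", which reflects the strictly key/value query style the slide describes.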

Page 25: Real World NoSQL (by Chris Yuen)

Cassandra

Auto-sharding just works

Replica set requires 0 configuration

Append-only, LSM-tree-based storage format
  Good for SSD
  High insert throughput

For storing analytic data
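
The append-only, LSM-tree idea can be sketched in a few lines: writes go to an in-memory table, full tables are flushed as immutable sorted segments, and reads check the newest data first. This is a teaching sketch, not Cassandra's storage engine; `TinyLSM` and its `memtable_limit` parameter are hypothetical, and it omits the write-ahead log, deletes, and on-disk files.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: an in-memory memtable plus immutable sorted segments."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.segments = []  # sorted lists of (key, value), oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Writes never modify existing segments; they only touch the memtable.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Freeze the memtable into a new sorted, immutable segment."""
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newest segment first, so later writes shadow earlier ones.
        for segment in reversed(self.segments):
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None

    def compact(self):
        """Merge all segments into one, keeping the newest value per key."""
        merged = {}
        for segment in self.segments:  # oldest first, newer values overwrite
            merged.update(segment)
        self.segments = [sorted(merged.items())] if merged else []
```

Inserts are cheap because they are sequential appends, while a read may have to consult several segments until compaction merges them.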

Page 26: Real World NoSQL (by Chris Yuen)

Cassandra

Has rudimentary support for secondary index

Difficult to do table scan or range scan

Requires a substantial application / paradigm shift

Page 27: Real World NoSQL (by Chris Yuen)

Real World Implications

Why does NoSQL matter to Big Data?
  Schemaless storage model
  Performance
  Scalability

Rapidly incorporate unstructured new data sources without extensive planning
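
A minimal sketch of what a schemaless model buys when a new data source shows up: records with different fields are stored side by side, and readers tolerate missing fields. The source names (`crm`, `social`) and all field names are hypothetical, and a plain list stands in for the document store.

```python
from collections import Counter

records = []  # stands in for a schemaless document store


def ingest(source, payload):
    """Store a record as-is; no table definition or migration is needed."""
    records.append({"source": source, **payload})


def field_coverage(docs):
    """Count how many documents carry each field."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.keys())
    return dict(counts)


# An existing source and a newly added one with a different shape.
ingest("crm", {"name": "Alice", "email": "alice@example.com"})
ingest("social", {"handle": "@alice", "followers": 120})

# Readers must handle fields that some documents do not have.
followers = [doc.get("followers") for doc in records]
```

The planning cost does not vanish; it moves from write time (schema design) to read time (handling absent or inconsistent fields).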

Page 28: Real World NoSQL (by Chris Yuen)

How to Choose

Maintenance / Scalability

Supported operations

OLAP vs. OLTP

Page 29: Real World NoSQL (by Chris Yuen)

Thank You

Chris Yuen

http://cfc.kizzx2.com

http://github.com/kizzx2

@kizzx2
