no sql distilled-distilled

NoSQL Distilled (Distilled) demystifying that which should have never entered “mystified” status rICh Morrow, quicloud LLC

Uploaded by rich-morrow on 14-Nov-2014




Page 2

NoSQL Distilled

This talk is essentially the first couple of chapters of “NoSQL Distilled” (Sadalage, Fowler)

Highly recommend this book!

Page 3

2 reasons to like NoSQL

App development productivity
▪ Fixes “impedance mismatch”

Large scale
▪ Happily handles the “three Vs” of “big data”: Volume, Velocity, Variety

Page 4

A brief history of storage

You’ve always needed a “backing store”
▪ …could be files: great for a single user or application
▪ …could be databases: great for multiple users/applications

…and on the DB side, it could be an:
▪ Application Database (used by a single app)
▪ Integration Database (used by several apps)

Page 5

Multi-user adds complexity

Concurrency
▪ Simple problem, very tough to solve

Application Datastores
▪ One app, many users

Integration Datastores
▪ One set of data, many apps, lots of potential for headbanging
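A classic example of why concurrency is tough is the “lost update”: two users read the same value, both change it, and the second write silently overwrites the first. One common remedy is optimistic concurrency control with a version number (compare-and-set). The sketch below assumes a hypothetical `VersionedStore` class and `add_to_balance` helper; the in-process lock only stands in for whatever atomic write a real datastore provides.

```python
import threading
from typing import Any, Dict, Tuple

class VersionedStore:
    """Hypothetical in-memory store using optimistic concurrency control."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, Any]] = {}  # key -> (version, value)
        self._lock = threading.Lock()  # makes each single read or write atomic

    def read(self, key: str) -> Tuple[int, Any]:
        """Return (version, value); keys never written are at version 0."""
        with self._lock:
            return self._data.get(key, (0, None))

    def write_if_unchanged(self, key: str, expected_version: int, value: Any) -> bool:
        """Compare-and-set: write only if nobody else wrote since our read."""
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                return False  # stale read: the caller must re-read and retry
            self._data[key] = (current_version + 1, value)
            return True

def add_to_balance(store: VersionedStore, key: str, amount: int) -> int:
    """Read-modify-write that retries instead of silently losing an update."""
    while True:
        version, value = store.read(key)
        new_value = (value or 0) + amount
        if store.write_if_unchanged(key, version, new_value):
            return new_value
```

If ten threads each call `add_to_balance(store, "acct", 1)` 100 times, the balance ends at 1000: conflicting writes are rejected and retried rather than lost.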

Page 6

Impedance mismatch?

{
  "id": "1001",
  "firstName": "Ann",
  "lastName": "Williams",
  "age": 55,
  "purchasedItems": {
    "0321290533": { qty, price… },
    "0321601912": { qty, price… },
    "0131495054": { qty, price… }
  },
  "paymentDetails": { cc info… },
  "address": {
    "street": "1234 Park",
    "city": "San Francisco",
    "state": "CA",
    "zip": "94102"
  }
}

1 object = 10, 20, 100 tables? Ugh…

Your code has one structure, but your RDBMS stores the data in another…
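To make the mismatch concrete, here is a minimal Python sketch (not from the talk) that shreds one customer aggregate into three hypothetical normalized tables (`customers`, `addresses`, `line_items`) and then glues it back together. The qty/price values are made up, `paymentDetails` is left out because the slide elides it, and a real schema would usually need more tables plus an ORM or hand-written SQL to do the same work.

```python
from typing import Any, Dict, List, Tuple

# One in-memory aggregate shaped like the customer above (illustrative values).
customer: Dict[str, Any] = {
    "id": "1001",
    "firstName": "Ann",
    "lastName": "Williams",
    "age": 55,
    "purchasedItems": {
        "0321290533": {"qty": 1, "price": 10.0},
        "0321601912": {"qty": 2, "price": 20.0},
        "0131495054": {"qty": 1, "price": 30.0},
    },
    "address": {"street": "1234 Park", "city": "San Francisco",
                "state": "CA", "zip": "94102"},
}

def to_rows(doc: Dict[str, Any]) -> Dict[str, List[Tuple]]:
    """Shred one nested object into rows for three normalized tables."""
    addr = doc["address"]
    return {
        "customers": [(doc["id"], doc["firstName"], doc["lastName"], doc["age"])],
        "addresses": [(doc["id"], addr["street"], addr["city"], addr["state"], addr["zip"])],
        "line_items": [(doc["id"], isbn, item["qty"], item["price"])
                       for isbn, item in doc["purchasedItems"].items()],
    }

def from_rows(tables: Dict[str, List[Tuple]], customer_id: str) -> Dict[str, Any]:
    """Reassemble the object: one lookup ("join") per table, on every read."""
    cid, first, last, age = next(r for r in tables["customers"] if r[0] == customer_id)
    _, street, city, state, zip_code = next(r for r in tables["addresses"] if r[0] == customer_id)
    items = {isbn: {"qty": qty, "price": price}
             for owner, isbn, qty, price in tables["line_items"] if owner == customer_id}
    return {"id": cid, "firstName": first, "lastName": last, "age": age,
            "purchasedItems": items,
            "address": {"street": street, "city": city, "state": state, "zip": zip_code}}
```

An aggregate-oriented store would simply save and load `customer` as one unit, which is the productivity win claimed earlier.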

Page 7

The long reign of RDBMS

A great "all purpose" storage + query tool
▪ ACID compliant
▪ Supports many users
▪ Supports many apps
▪ 3NF stores data efficiently (disk wasn't always cheap)
▪ Fast and tunable
▪ Introduced a common interface (SQL)… which every vendor quickly then “broke”

Page 8

...but RDBMS != all unicorns and rainbows

Impedance mismatch
▪ Many teams build (then have to maintain) custom ORM or SOA proxies

Weren't built to be distributed
▪ Google, Amazon, et al. hit hard walls on RDBMS capabilities
▪ Often required expensive, proprietary hardware

Oops, I sharded myself!
▪ Additional complexity
▪ Cross-shard joins now extremely expensive
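Why does sharding add complexity? A minimal sketch, assuming simple hash-based sharding on a customer ID (the names `shard_for`, `put_customer`, `get_customer`, and `customers_in_city` are hypothetical): a lookup by the sharding key goes to exactly one shard, but a query on any other column has to fan out to all of them. A join between tables sharded on different keys typically has to fan out the same way and then move rows between servers, which is where the expense comes from.

```python
import zlib
from typing import Any, Dict, List, Tuple

NUM_SHARDS = 4  # hypothetical shard count

# Each dict stands in for a separate database server.
shards: List[Dict[str, Dict[str, Any]]] = [{} for _ in range(NUM_SHARDS)]

def shard_for(key: str) -> int:
    """Route a key with a stable hash (crc32), so every app server agrees."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS

def put_customer(customer_id: str, row: Dict[str, Any]) -> None:
    shards[shard_for(customer_id)][customer_id] = row

def get_customer(customer_id: str) -> Any:
    """Lookup by the sharding key: exactly one shard is touched."""
    return shards[shard_for(customer_id)].get(customer_id)

def customers_in_city(city: str) -> Tuple[List[Dict[str, Any]], int]:
    """Query on any other column: every shard must be asked (scatter-gather)."""
    results: List[Dict[str, Any]] = []
    shards_touched = 0
    for shard in shards:
        shards_touched += 1
        results.extend(row for row in shard.values() if row["city"] == city)
    return results, shards_touched
```

Note also that with modulo routing, changing `NUM_SHARDS` relocates most keys; many distributed stores use consistent hashing to limit that reshuffling.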

Page 9

“Web Scale” brings on the three Vs

Velocity
▪ Faster responses required

Volume
▪ 100s of TB, PB now common
▪ “Web Scale” can mean 100s of thousands of concurrent transactions
▪ Both of those increasing rapidly

Variety
▪ Mixed structure, semi-structured, unstructured

Page 10

Google and Amazon drive the space

Bigtable paper (by Google)
▪ Heavily influenced the “Columnar” branch of NoSQL

Dynamo paper (by Amazon)
▪ Heavily influenced the “Key Value” branch of NoSQL
▪ This is NOT DynamoDB!!!

Design considerations:
▪ Distributed from the start
▪ Clusters of inexpensive commodity hardware are cheaper & more fault tolerant at scale
▪ Relaxed and/or tunable C&A (from CAP theorem)
▪ Deal with unheard-of volume & velocity
▪ Schemaless (bye bye impedance mismatch)

Page 11

CAP (Theorem)?

Consistency
▪ How consistent the data looks to 2 or more viewers
▪ “Eventual” consistency possible (and common)!

Availability
▪ Responsiveness of the system: does every request still get an answer?

Partition Tolerance
▪ How well does the system keep working when a network partition cuts nodes off from each other?
▪ This is normally “untunable”, unlike C&A
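In Dynamo-style stores, C and A are commonly tuned by choosing how many of the N replicas must acknowledge a write (W) and answer a read (R). Under a simple strict-quorum model, the rule of thumb is that a read sees the latest acknowledged write when R + W > N; smaller R and W respond faster and tolerate more failed nodes, but are only eventually consistent. A minimal check of that rule, with hypothetical function names, plus a brute-force version that enumerates every possible read and write set to confirm it:

```python
from itertools import combinations

def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Strict-quorum rule: every read set of R replicas meets every write set of W."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    return r + w > n

def quorums_overlap_bruteforce(n: int, r: int, w: int) -> bool:
    """Check the same property by trying every possible read set and write set."""
    replicas = range(n)
    return all(set(read_set) & set(write_set)  # non-empty set means they overlap
               for read_set in combinations(replicas, r)
               for write_set in combinations(replicas, w))

# N=3: majority reads and writes overlap; R=1, W=1 favours latency and availability.
print(quorums_overlap(3, 2, 2))  # True
print(quorums_overlap(3, 1, 1))  # False
```

This is a simplification: the Dynamo paper also describes “sloppy” quorums and hinted handoff, which trade away some of this guarantee to stay available during failures.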

Page 12

NoSQL is Born

Because “Cloud” and “Big Data” were just not confusing enough people in IT

"Not ONLY SQL" - incredibly unfortunate "little o"

Name born out of a Bay Area meetup in 2009 …and regretted / derided ever since

Page 13

“Polyglot Persistence”

Fancy term for “multiple datastores”
▪ ...you're already doing it: browser-side cache, Memcache, query cache, OLAP systems
▪ ...just add NoSQL

Tell your RDBMS not to worry – it will (probably) still live a long, happy life
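The Memcache-style layer in that list usually follows a cache-aside pattern: the application checks the fast store first and falls back to the RDBMS on a miss. A minimal sketch under stated assumptions: a plain dict with expiry times stands in for memcached, and `SlowDatabase` is a hypothetical stand-in for the RDBMS that counts its queries.

```python
import time
from typing import Any, Dict, Optional, Tuple

class SlowDatabase:
    """Hypothetical stand-in for the RDBMS; counts how often it is queried."""

    def __init__(self, rows: Dict[str, Any]) -> None:
        self.rows = rows
        self.queries = 0

    def query(self, key: str) -> Optional[Any]:
        self.queries += 1
        time.sleep(0.01)  # pretend this is a network round trip plus disk I/O
        return self.rows.get(key)

class CacheAside:
    """Check a fast cache first; on a miss, read the database and remember the answer."""

    def __init__(self, db: SlowDatabase, ttl_seconds: float = 60.0) -> None:
        self.db = db
        self.ttl = ttl_seconds
        self._cache: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[Any]:
        now = time.monotonic()
        entry = self._cache.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the database is not touched
        value = self.db.query(key)
        self._cache[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call after writing to the database so readers don't see stale data."""
        self._cache.pop(key, None)
```

The RDBMS stays the system of record; the cache is just one more datastore answering the hot reads.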

Page 14

“NoSQL” datastores share:
▪ Generally open source
▪ Schemaless: easily change schema or do 'schema on read'
▪ Cluster-oriented (with the exception of graph DBs)
▪ Generally favor “Web Scale” over ACID
▪ Generally better for APPLICATION databases
▪ Aggregate data models: let you treat a group of data as a unit (again, graph DBs are an exception here…)
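“Schema on read” means the store accepts whatever shape the application writes, and the code that reads the data decides how to interpret it. A minimal sketch with hypothetical records and a hypothetical `read_customer` function: a record from an older app version and one from a newer version are both read back as the same shape.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical records written by two versions of an app. The store accepted
# both shapes because it never enforced a schema on write.
raw_records = [
    '{"id": "1", "name": "Ann Williams"}',
    '{"id": "2", "firstName": "Bob", "lastName": "Lee", "email": "bob@example.com"}',
]

def read_customer(raw: str) -> Dict[str, Optional[Any]]:
    """Apply the schema at read time: map old and new shapes onto one view."""
    doc = json.loads(raw)
    if "name" in doc:  # older shape: a single full-name field
        first, _, last = doc["name"].partition(" ")
    else:              # newer shape: separate first/last names
        first, last = doc.get("firstName", ""), doc.get("lastName", "")
    return {"id": doc["id"], "firstName": first, "lastName": last,
            "email": doc.get("email")}  # fields absent in old records become None
```

The cost of this flexibility moves into the reading code, which now has to know about every shape that was ever written.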

Page 15

The 4 Flavors of NoSQL

Key Value
▪ Fast lookup on a single “hashed” key

Document
▪ Each “Document” self-defines its own structure

Columnar (or Column-Family)
▪ Great for “sparse” data (millions of columns)

Graph [bit of a black sheep in the NoSQL family]
▪ Specialized to crawl graph relations like social networks, resource flows, etc.
▪ Less popular at the moment, but gaining steam fast

Page 16

Key Value

Can only look up by (normally a single) key
▪ Extremely fast for that key

Value can be anything

Examples: DynamoDB, Riak
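The whole interface of a key-value store is roughly get, put, and delete by key. A minimal in-memory sketch; the class and the `cart:<id>` key naming are hypothetical, not DynamoDB's or Riak's API:

```python
import json
from typing import Dict, Optional

class KeyValueStore:
    """Toy key-value store: the key is the only way in, and values are opaque bytes."""

    def __init__(self) -> None:
        self._table: Dict[str, bytes] = {}  # a Python dict is itself a hash table

    def put(self, key: str, value: bytes) -> None:
        self._table[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._table.get(key)

    def delete(self, key: str) -> None:
        self._table.pop(key, None)

# The application, not the store, decides what a value means:
# here, a JSON-encoded shopping cart.
store = KeyValueStore()
store.put("cart:1001", json.dumps({"items": ["0321290533"]}).encode("utf-8"))
cart = json.loads(store.get("cart:1001"))
```

Because the store never looks inside a value, a question such as “which carts contain this book?” would mean fetching and decoding every value; that is the trade-off for fast lookups by key.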

Page 17

Document

Document can contain anything
▪ JSON extremely popular
▪ But can also be XML, CSV, semi-structured, unstructured, custom… literally anything

Can query on aggregates inside of the document

Can even index on aggregates

Can retrieve part of the document

Extremely memory intensive

Examples: MongoDB, CouchDB
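Querying and indexing on fields inside a document, and returning only part of it, can be sketched in plain Python. This is a hypothetical in-memory illustration, not MongoDB's or CouchDB's API, although the dotted-path idea resembles MongoDB's dot notation for nested fields:

```python
from typing import Any, Dict, List, Optional

# Hypothetical order documents; note the second one carries an extra field.
orders: List[Dict[str, Any]] = [
    {"_id": 1, "customer": {"name": "Ann", "city": "San Francisco"},
     "items": [{"isbn": "0321290533", "qty": 1}]},
    {"_id": 2, "customer": {"name": "Bob", "city": "Denver"},
     "items": [{"isbn": "0321601912", "qty": 2}], "giftWrap": True},
]

def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'customer.city'; None if any step is missing."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def find(docs: List[Dict[str, Any]], path: str, value: Any,
         fields: Optional[List[str]] = None) -> List[Dict[str, Any]]:
    """Match on a field inside each document; optionally return only some fields."""
    matches = [d for d in docs if get_path(d, path) == value]
    if fields is None:
        return matches
    return [{f: get_path(d, f) for f in fields} for d in matches]

def build_index(docs: List[Dict[str, Any]], path: str) -> Dict[Any, List[Any]]:
    """Secondary index on an inner field: value -> list of document _ids."""
    index: Dict[Any, List[Any]] = {}
    for d in docs:
        index.setdefault(get_path(d, path), []).append(d["_id"])
    return index
```

For example, `find(orders, "customer.city", "Denver", fields=["_id", "customer.name"])` returns just the ID and name of the matching order rather than the whole document.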

Page 18

Column (or Column Family)
▪ Great for “sparse” data (populated columns vary greatly between rows)

Group columns into families

Think of it as a “two level” aggregate
▪ First-level “key” is the row ID or aggregate of interest
▪ Second-level values are the columns

You can visualize the data as row- or column-oriented

Examples: HBase, Cassandra
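The “two level” aggregate can be sketched as nested maps: a row key points to column families, and each family holds only the columns that row actually has. A minimal, hypothetical Python model (not the HBase or Cassandra API), loosely mirroring HBase, where column families are declared with the table but individual column names are not:

```python
from collections import defaultdict
from typing import Any, Dict, Iterable

class ColumnFamilyTable:
    """Two-level map: row key -> column family -> {column name: value}."""

    def __init__(self, families: Iterable[str]) -> None:
        self.families = set(families)  # families are fixed up front; columns are not
        # Only populated cells are stored, so missing columns cost nothing.
        self.rows: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key: str, family: str, column: str, value: Any) -> None:
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][column] = value

    def get_row(self, row_key: str, family: str) -> Dict[str, Any]:
        """Row-oriented view: all populated columns of one family for one row."""
        return dict(self.rows.get(row_key, {}).get(family, {}))

    def get_column(self, family: str, column: str) -> Dict[str, Any]:
        """Column-oriented view: one column across every row that has it."""
        return {row_key: fams[family][column]
                for row_key, fams in self.rows.items()
                if column in fams.get(family, {})}
```

The two getters correspond to the slide's point that the same data can be visualized as row- or column-oriented.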

Page 19

Graph Databases

Built to efficiently crawl & search graphs
▪ Social networks
▪ Resource flows
▪ “People of interest”

Don’t run well on clusters

Example: Neo4j (and not much else right now)
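“People of interest” queries are typically bounded traversals such as “everyone within two hops of this person”. A graph database stores each relationship directly, so the traversal follows edges instead of doing one join per hop. The sketch below shows only the traversal itself: a breadth-first search over a hypothetical in-memory adjacency list, not Neo4j's API or its Cypher query language.

```python
from collections import deque
from typing import Dict, List

def within_hops(graph: Dict[str, List[str]], start: str, max_hops: int) -> Dict[str, int]:
    """Breadth-first search: everyone reachable from start in at most max_hops edges."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # don't expand beyond the hop limit
        for neighbour in graph.get(person, []):
            if neighbour not in distance:
                distance[neighbour] = distance[person] + 1
                queue.append(neighbour)
    return {p: d for p, d in distance.items() if p != start}

# Hypothetical "knows" relationships, stored as adjacency lists.
knows = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}
print(within_hops(knows, "ann", 2))  # {'bob': 1, 'cat': 1, 'dan': 2}
```

In a relational schema the same question would need a self-join on a `knows` table for each extra hop.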

Page 20

Takeaways

RDBMS were not designed with many of today’s problems in mind

NoSQL DBs were built from the ground up to deal with these “Three V” issues

NoSQL can either replace or (more commonly) supplement existing RDBMS functions
▪ Move hot tables out to DynamoDB
▪ Write a greenfield app from the ground up with only a NoSQL datastore

Consistency & Availability are often tunable

Many flavors exist & each has its own best use cases
▪ Research heavily before deciding upon a platform

Page 21

Questions? Answers?

Thanks!