no sql

RDBMS ? Prateek Jain 12-Jul-2012

Upload: prateek-jain

Post on 14-Jun-2015

TRANSCRIPT

Page 1: No sql

RDBMS ?

Prateek Jain

12-Jul-2012

Page 2: No sql

Solutions…

Page 3: No sql

Common Architecture

[Diagram: web servers, app servers, a cache server, an RDBMS, a CMS, and data feeds]

Page 4: No sql

SQL - Story till now…

Stable environment.

No more discussions on data stores.

Easy to train and employ people.

SQL running effectively at core.

Page 5: No sql

SQL - Story till now…

For dealing with lists (as tables) it’s a great language, dynamic and relatively fast

• Sure it has a few problems but give me a language that doesn’t

Page 6: No sql

What Next…?

We need to be fast, to scale, and to be part of the web

Page 7: No sql

ORM - OMG!

The effort of trying to convert something inherently hierarchical into something relational

Probably the biggest waste of programming time, lines of code and source of bugs and latency is ORM
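
As a rough illustration of the mismatch described above, the sketch below flattens one nested order into two flat row lists, the way a relational schema would store it, and then has to rebuild the hierarchy by hand. The `order` shape, the row layouts, and the function names are all made up for this example; they are not taken from the slides or from any particular ORM.

```python
# Hypothetical nested object, as an application would naturally hold it.
order = {
    "id": 1,
    "customer": "alice",
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def to_rows(order):
    """Flatten the hierarchy into two relational-style row lists."""
    order_row = (order["id"], order["customer"])
    item_rows = [(order["id"], i["sku"], i["qty"]) for i in order["items"]]
    return order_row, item_rows


def from_rows(order_row, item_rows):
    """Rebuild the nested object: the join-and-map work an ORM does."""
    order_id, customer = order_row
    items = [
        {"sku": sku, "qty": qty}
        for (oid, sku, qty) in item_rows
        if oid == order_id
    ]
    return {"id": order_id, "customer": customer, "items": items}


order_row, item_rows = to_rows(order)
rebuilt = from_rows(order_row, item_rows)
```

Every extra level of nesting adds another row list and another reassembly step, which is where the extra code, bugs, and latency tend to come from.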

Page 8: No sql

Challenges

Data grows exponentially.

Data is unstructured.

Data is huge and spread across 100s/1000s of nodes.

SQL is useful when things are flat.

Page 9: No sql

Lots of data

In the banking world we have a lot of data.

Today 50,000-100,000 quotes a second isn’t unusual.

It gets more complex...

• 10,000 portfolios, each with 1,000 buy/sell orders at specific prices

• We now have 100,000 prices coming in every second and 10 million orders to watch
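
The numbers above can be checked with a few lines of arithmetic. The last figure rests on an assumption that is not in the slides: a naive matcher that compares every incoming price against every open order. Real systems index orders by instrument and price, so treat it as a worst-case bound only.

```python
portfolios = 10_000
orders_per_portfolio = 1_000
prices_per_second = 100_000

# 10,000 portfolios x 1,000 orders each = 10 million orders to watch.
orders_to_watch = portfolios * orders_per_portfolio

# Assumption (not from the slides): every price is checked against
# every open order, with no indexing at all.
naive_checks_per_second = prices_per_second * orders_to_watch

print(f"{orders_to_watch:,} orders to watch")
print(f"{naive_checks_per_second:,} naive comparisons per second")
```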

Page 10: No sql

Time is critical

In the world of trading only the first one gets the deal, there is no second place.

While being first to have the order is what makes the money, banks now have a “new” problem:

“RISK”

Page 11: No sql

Lots of data, lots of calculations

There are two main flavors of distributed computing:

• Data

• Computation

Often they are closely related, but not always.

To achieve either we usually need lots of memory and CPUs.

We don’t stack them or put them in clusters these days; we distribute them.

Page 12: No sql

Why not RDBMS?

Not designed to scale out.

Strongly ACID compliant.

Slower-running queries (especially with joins).

Schema based.

Not suited to changing data structures.

Page 13: No sql
Page 14: No sql

CAP Theorem

C – consistency

A – availability

P – partition tolerance

** You must make trade-offs and sacrifice at least one in favor of the other two.

Page 15: No sql

NoSQL

Not Just SQL

Page 16: No sql

Categories

Document Based

Column Based

Key/Value Based

Graph Based

Data Structure Based

Page 17: No sql

Example Products

Page 18: No sql

Eventual Consistency

Page 19: No sql

Given a sufficiently long period of time, over which no updates are sent, one can expect that all updates will, eventually, propagate through the system and all the replicas will be consistent.

In the presence of continuing updates, an accepted update eventually either reaches a replica or the replica retires from service.

Eventual Consistency
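
The definition above can be sketched as a small simulation. This is a toy model under stated assumptions, not how any particular database works: each replica holds one value with a version number, conflicts are resolved by last-write-wins, and random pairs of replicas exchange state until no differences remain. The `Replica` class and `gossip_until_consistent` function are hypothetical names for this example.

```python
import random


class Replica:
    """Toy replica holding one value with a version number (assumed design)."""

    def __init__(self):
        self.version = 0
        self.value = None

    def write(self, version, value):
        # Last-write-wins: keep only the update with the highest version.
        if version > self.version:
            self.version, self.value = version, value

    def sync_with(self, other):
        # Anti-entropy: both sides end up with the newer of the two states.
        other.write(self.version, self.value)
        self.write(other.version, other.value)


def gossip_until_consistent(replicas, rng, max_rounds=1000):
    """Pair random replicas until all agree; return the rounds needed."""
    for round_no in range(max_rounds):
        if len({(r.version, r.value) for r in replicas}) == 1:
            return round_no
        a, b = rng.sample(replicas, 2)
        a.sync_with(b)
    raise RuntimeError("did not converge within max_rounds")


rng = random.Random(42)
replicas = [Replica() for _ in range(5)]
replicas[0].write(1, "price=100")  # update accepted by one replica only
replicas[3].write(2, "price=101")  # a later update lands on another replica

# No further updates are sent; the replicas just keep exchanging state.
rounds = gossip_until_consistent(replicas, rng)
```

Until the loop finishes, a reader may see `None`, `price=100`, or `price=101` depending on which replica answers, which is the window of inconsistency the definition allows.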

Page 20: No sql

Scalability

Page 21: No sql

Scalability

Scalability is the ability of a system to increase throughput with addition of resources to address load increases.

Scalability can be achieved by:

– Provisioning a large and powerful resource to meet the additional demands.

– Relying on a cluster of ordinary machines to work as a unit.
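
The second approach, a cluster of ordinary machines, depends on some rule for deciding which machine owns which piece of data. A minimal sketch, assuming simple hash-modulo partitioning (the node names and key format are invented for illustration):

```python
import hashlib


def node_for(key, nodes):
    """Pick a node by hashing the key (simple modulo partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def distribute(keys, nodes):
    """Return {node: [keys]} showing how load spreads over the cluster."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


keys = [f"user:{i}" for i in range(1000)]
small_cluster = distribute(keys, ["node-1", "node-2"])
large_cluster = distribute(keys, ["node-1", "node-2", "node-3", "node-4"])

for name, cluster in (("2 nodes", small_cluster), ("4 nodes", large_cluster)):
    print(name, {node: len(owned) for node, owned in cluster.items()})
```

Adding nodes lowers the share each one carries. Modulo partitioning moves most keys whenever the node count changes, which is why distributed stores often use consistent hashing instead.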

Page 22: No sql

How to choose ?

Scalability

Transactional integrity and consistency

Data modeling

Query support

Access and interface availability

Page 23: No sql

Scalability

Column-family-centric NoSQL databases are a good choice if extreme scalability is a requirement.

They are not well suited to real-time transaction processing, where an RDBMS is the best choice.

Eventually consistent NoSQL options, like Cassandra or Riak, may be workable.

Page 24: No sql

Transactional Integrity and Consistency

Batch-centric analytics on warehoused data is not subject to transactional requirements.

The same holds for data sets that are written once, e.g., web traffic log files, social networking status updates, advertisement click-through imprints, road-traffic data, stock market tick data, and game scores.

Page 25: No sql

Transactional Integrity and Consistency

If range operations are common and integrity of updates is required, an RDBMS is the best choice.

If atomicity at an individual item level is sufficient, then column-family databases or document databases are an option.
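
To make "atomicity at an individual item level" concrete, here is a minimal in-memory sketch, assuming a compare-and-set style of update. The `ItemStore` class is hypothetical and does not mirror the API of any specific database; it only shows that one item changes atomically while nothing coordinates changes across several items.

```python
import threading


class ItemStore:
    """Toy in-memory store offering atomic updates on one item at a time."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._items.get(key)

    def compare_and_set(self, key, expected, new):
        """Atomically replace the item only if it still equals `expected`."""
        with self._lock:
            if self._items.get(key) != expected:
                return False
            self._items[key] = new
            return True


store = ItemStore()
store.compare_and_set("score:alice", None, 10)        # first write succeeds
stale = store.compare_and_set("score:alice", 5, 20)   # stale expectation is rejected
fresh = store.compare_and_set("score:alice", 10, 20)  # current expectation is accepted
```

An update that must change two items together, such as moving points from one player to another, gets no such guarantee here; that is the case where an RDBMS transaction is the safer choice.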

Page 26: No sql

Data Modeling

RDBMS offers a consistent way of modeling data. Relational algebra underlies the data model.

In the NoSQL world there is no such standardized and well-defined data model.

Page 27: No sql

Data Modeling

If relaxed schema is your primary reason for using NoSQL, then MongoDB is a great option for getting started with NoSQL.

MongoDB is used by many web-centric businesses.
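
What "relaxed schema" means in practice can be sketched with plain Python dictionaries standing in for documents. This is not the MongoDB driver API; the `find` helper and the sample documents are assumptions for illustration only.

```python
import json

# Hypothetical documents in one collection; fields differ per document.
products = [
    {"_id": 1, "name": "laptop", "specs": {"ram_gb": 16, "cpu": "x86"}},
    {"_id": 2, "name": "t-shirt", "sizes": ["S", "M", "L"]},
    {"_id": 3, "name": "e-book", "download_url": "https://example.com/book"},
]


def find(collection, **criteria):
    """Return documents whose top-level fields match all criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Adding a new field needs no schema migration: just store it.
products.append(
    {"_id": 4, "name": "laptop", "specs": {"ram_gb": 32}, "on_sale": True}
)

laptops = find(products, name="laptop")
on_sale = find(products, on_sale=True)
print(json.dumps(laptops, indent=2))
```

Documents without the `on_sale` field simply do not match the second query; nothing has to be backfilled.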

Page 28: No sql

Querying Support

An RDBMS thrives on SQL support, which makes accessing and querying data easy.

Among document databases, MongoDB provides the best querying capabilities.

For key/value pairs and in-memory stores, nothing is more feature-rich than Redis as far as querying capabilities go.

Page 29: No sql

Querying Support

Column-family stores like HBase have little to offer as far as rich querying capabilities go.

A project called Hive makes it possible to query HBase using SQL-like syntax and semantics.

Page 30: No sql

Access and Interface Availability

MongoDB has the notion of drivers.

CouchDB always has the RESTful HTTP interface available.

Redis, Membase, Riak, HBase, Hypertable, Cassandra, and Voldemort have support for language bindings to connect from most mainstream languages.

Page 31: No sql

Performance

Page 32: No sql

50/50 Read and Update

Results show that under this test case Apache Cassandra outperforms the competition on both read and update latencies.

HBase comes close but stays behind Cassandra.

Page 33: No sql

95/5 Read and Update

The sorted ordered column-family stores perform best for contiguous range reads.

HBase seems to deliver consistent performance for reads, irrespective of the number of operations per second.

MySQL delivers the best performance for read-only cases.
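
One reason sorted, ordered stores do well on contiguous range reads is that a range becomes a single slice of already-sorted keys. The sketch below is a toy model of that idea using Python's `bisect` module; the `SortedStore` class and the timestamp-style keys are assumptions for illustration, not the storage format of HBase or any other product.

```python
import bisect


class SortedStore:
    """Toy store that keeps row keys sorted, so ranges are contiguous."""

    def __init__(self):
        self._keys = []
        self._rows = {}

    def put(self, key, row):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def scan(self, start, stop):
        """Return rows with start <= key < stop from one contiguous slice."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


store = SortedStore()
for minute in (5, 1, 3, 4, 2):  # inserted out of order on purpose
    store.put(f"2012-07-12T10:0{minute}", {"clicks": minute * 10})

window = store.scan("2012-07-12T10:02", "2012-07-12T10:04")
```

Locating the start of the range takes a binary search, and everything after that is a sequential read, regardless of the order in which rows were written.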

Page 34: No sql

Future?

Coexistence

Page 35: No sql

Future

Getting ready for polyglot persistence.

Understanding the database technologies suitable for immutable data sets.

Choosing the right database to facilitate ease of application development.

Page 36: No sql

Examples

LinkedIn uses Hadoop for many large-scale analytics jobs like probabilistically predicting people you may know.

Facebook (MySQL + HBase, Cassandra, ZooKeeper)

Twitter (MySQL + Cassandra + FlockDB)

Page 37: No sql

Questions?

Page 38: No sql

Feedback

[email protected]