NoSQL (Not Only SQL)
Timeline, 1980 to 2010:
◦ Rise of relational databases. Pros: persistence, concurrency. Cons: impedance mismatch problem.
◦ Rise of object databases.
◦ Dominance of relational databases. Cons: data needs increased and distributed databases emerged, but SQL was not designed for distributed DBMSs.
◦ Google BigTable
◦ Amazon Dynamo
NoSQL is a term for a loosely defined class of non-relational
data stores that breaks the long history of relational databases
and ACID guarantees.
Data stores that fall under this term may not require fixed
table schemas, and usually avoid join operations.
The term was first popularised in early 2009.
Three properties of a system: consistency, availability and partition tolerance
We can have at most two of these three properties for any shared-data system.
* Consistency - all clients see current data regardless of updates or deletes
* Availability - the system continues to operate as expected even with node failures
* Partition Tolerance - the system continues to operate as expected despite network or message failure
Possible pairings: CA, CP, AP
A consistency model determines rules for visibility and apparent order of updates.
For example:
Row X is replicated on nodes M and N
Client A writes row X to node N
Some period of time t elapses.
Client B reads row X from node M
Does client B see the write from client A?
For NoSQL, the answer would be: maybe
Consistency is a continuum with tradeoffs
CAP Theorem states: Strict Consistency can't be achieved at the same time as availability and
partition-tolerance.
[Figure: row X is replicated on nodes M and N. Client A writes X* to node N; client B reads from node M and gets either X or X*.]
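The "maybe" in the example above can be made concrete with a small simulation. This is only a sketch under stated assumptions: the `Cluster` class, its `pending` queue and the `propagate` step are hypothetical stand-ins for asynchronous replication, not the behaviour of any particular data store.

```python
class Cluster:
    """Two replicas of row X with asynchronous (delayed) replication."""

    def __init__(self):
        # Both nodes start with the same value for row X.
        self.nodes = {"M": "X", "N": "X"}
        # Replication messages that have not been delivered yet.
        self.pending = []

    def write(self, node, value):
        # The write is applied locally and queued for the other replicas.
        self.nodes[node] = value
        for name in self.nodes:
            if name != node:
                self.pending.append((name, value))

    def read(self, node):
        return self.nodes[node]

    def propagate(self):
        # Deliver every queued replication message.
        for name, value in self.pending:
            self.nodes[name] = value
        self.pending.clear()


cluster = Cluster()
cluster.write("N", "X*")      # client A writes to node N
stale = cluster.read("M")     # client B reads from node M before propagation
cluster.propagate()           # time t elapses, replication catches up
fresh = cluster.read("M")     # client B reads again
print(stale, fresh)           # X X*
```

Whether client B sees X or X* depends only on whether the read happens before or after `propagate`, which is the tradeoff the slide describes.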
When no updates occur for a long period of time, eventually all
updates will propagate through the system and all the nodes will be
consistent
Known as BASE (Basically Available, Soft state, Eventual consistency),
as opposed to ACID
* Basically Available - system seems to work all the time
* Soft State - it doesn't have to be consistent all the time
* Eventually Consistent - becomes consistent at some later time
Key-value databases
Pros:
- very fast
- very scalable
- simple model
- able to distribute horizontally
Cons:
- many data structures (objects) can't be easily modeled as key-value pairs
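The "able to distribute horizontally" point can be illustrated with a toy store that spreads keys over several nodes by hashing. This is a minimal sketch: `PartitionedKV`, the node names and the modulo placement rule are assumptions made for illustration. Production systems such as Dynamo use consistent hashing instead, so that adding a node does not remap most keys.

```python
import hashlib


class PartitionedKV:
    """Toy key-value store that assigns each key to one node by hash."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        # One in-memory dict per node stands in for that node's storage.
        self.shards = {name: {} for name in self.node_names}

    def _owner(self, key):
        # A stable hash, so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.node_names[int(digest, 16) % len(self.node_names)]

    def put(self, key, value):
        self.shards[self._owner(key)][key] = value

    def get(self, key, default=None):
        return self.shards[self._owner(key)].get(key, default)


store = PartitionedKV(["node-a", "node-b", "node-c"])
for i in range(100):
    store.put(f"user:{i}", {"id": i})

print(store.get("user:42"))
print({name: len(shard) for name, shard in store.shards.items()})
```

Every operation touches exactly one node and needs only the key, which is why the model is fast and simple, and also why anything richer than a lookup by key is hard to express.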
Document Data Model:
- Each document is a complex structure
- Represented in XML or JSON
- Query into the document structure to retrieve portions of the database
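Querying into a document's structure might look like the following sketch. The sample order document and the `get_path` helper are hypothetical; real document stores provide their own query languages, and this only shows the idea of addressing a portion of a nested document.

```python
import json

# A hypothetical document; field names are made up for illustration.
raw = """
{
  "_id": "order-1001",
  "customer": {"name": "Ada", "city": "London"},
  "items": [
    {"sku": "A1", "qty": 2},
    {"sku": "B7", "qty": 1}
  ]
}
"""


def get_path(document, path):
    """Follow a dotted path such as 'customer.city' or 'items.0.sku'."""
    current = document
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]   # numeric parts index into lists
        else:
            current = current[part]        # other parts are object keys
    return current


doc = json.loads(raw)
city = get_path(doc, "customer.city")
skus = [item["sku"] for item in get_path(doc, "items")]
print(city, skus)
```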
Cheap, easy to implement (open source)
Data are replicated to multiple nodes (therefore identical and fault-tolerant)
and can be partitioned
◦ Down nodes easily replaced
◦ No single point of failure
Easy to distribute
Don't require a schema
Can scale up and down
Relax the data consistency requirement (CAP)
What we are giving up…
• joins
• group by
• order by
• ACID transactions
• SQL as a sometimes frustrating but still powerful query language
• Easy integration with other applications that support SQL
Cassandra was originally developed at Facebook
It is a distributed, extremely scalable,
fault-tolerant post-relational database solution
Data Model : column-oriented
Uses the Dynamo Eventual Consistency model
Written in Java
Open-sourced and exists within the Apache family
Uses Apache Thrift as its API
Cassandra was designed with the understanding that
system/hardware failures can and do occur.
Peer-to-peer, distributed system
All nodes are the same
Read/Write-anywhere design
Multiple Data Center Write Requests
[Figure: Data center 1 and Data center 2]
The coordinator sends the write to all replicas that own the row being written.
As long as all replica nodes are up and available, they will get the write regardless of the consistency level specified by the client.
The consistency level is tunable (for example LOCAL_QUORUM); it determines how many replica acknowledgements the coordinator waits for before reporting success.
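How a tunable consistency level turns into a number of required acknowledgements can be sketched as follows. The level names ONE, QUORUM, LOCAL_QUORUM and ALL are real Cassandra consistency levels, but the functions, the data center names and the replica counts below are assumptions for illustration, not Cassandra's implementation.

```python
def quorum(replica_count):
    """Majority of replicas: floor(n / 2) + 1."""
    return replica_count // 2 + 1


def required_acks(level, replicas_by_dc, local_dc):
    """Number of acknowledgements the coordinator waits for."""
    total = sum(replicas_by_dc.values())
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(total)
    if level == "LOCAL_QUORUM":
        # Only replicas in the coordinator's own data center count.
        return quorum(replicas_by_dc[local_dc])
    if level == "ALL":
        return total
    raise ValueError(f"unsupported consistency level: {level}")


def write_succeeds(level, replicas_by_dc, local_dc, acks_by_dc):
    """True if enough replicas acknowledged the write for this level."""
    needed = required_acks(level, replicas_by_dc, local_dc)
    if level == "LOCAL_QUORUM":
        received = acks_by_dc.get(local_dc, 0)
    else:
        received = sum(acks_by_dc.values())
    return received >= needed


# Hypothetical cluster: three replicas of the row in each data center.
replicas = {"dc1": 3, "dc2": 3}
# The remote data center is slow; only two local replicas have answered.
acks = {"dc1": 2, "dc2": 0}

print(write_succeeds("LOCAL_QUORUM", replicas, "dc1", acks))  # True
print(write_succeeds("QUORUM", replicas, "dc1", acks))        # False
```

In this sketch the write is still sent to all six replicas; the level only changes how long the client waits, which is why LOCAL_QUORUM avoids paying cross-data-center latency on every write.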
There are two types of read requests :
1) direct read request
2) background read repair request.
The number of replicas contacted by a direct read request is determined by
the consistency level specified by the client.
Background read repair requests are sent to any additional replicas that did
not receive a direct request.
Read repair requests ensure that the requested row is made consistent on
all replicas.
The coordinator first contacts the replicas specified by the consistency
level.
If multiple nodes are contacted, the rows from each replica are compared
for consistency in memory.
If replicas are inconsistent, the following events occur:
◦ The coordinator uses the replica that has the most recent data (based on the timestamp) to forward the result back to the client.
◦ In the background, the coordinator compares the data from all the remaining replicas that own the row.
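The read path described above can be sketched in a few lines. The `Versioned` record, the replica names and the `read_with_repair` function are hypothetical. The final step, overwriting out-of-date replicas with the newest value, is how read repair is generally described, but this is an illustration rather than Cassandra's actual code.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    value: str
    timestamp: int   # larger means more recent


def read_with_repair(replicas, contacted):
    """Return the newest value among contacted replicas, then repair the rest.

    replicas:  dict of replica name -> Versioned (every replica owning the row)
    contacted: replica names chosen by the consistency level (direct read)
    """
    # Direct read: the most recent timestamp among contacted replicas wins.
    newest = max((replicas[name] for name in contacted),
                 key=lambda v: v.timestamp)

    # Background read repair: compare all replicas that own the row
    # and bring out-of-date ones up to the most recent version.
    latest = max(replicas.values(), key=lambda v: v.timestamp)
    repaired = []
    for name, version in replicas.items():
        if version.timestamp < latest.timestamp:
            replicas[name] = Versioned(latest.value, latest.timestamp)
            repaired.append(name)
    return newest.value, repaired


replicas = {
    "r1": Versioned("old", 1),
    "r2": Versioned("new", 2),
    "r3": Versioned("old", 1),
}
result, repaired = read_with_repair(replicas, contacted=["r1", "r2"])
print(result, repaired)   # new ['r1', 'r3']
```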
Thrift was created at Facebook along with Cassandra
It is a cross-language, service-generation framework
Binary Protocol (like Google Protocol Buffers)
Compiles to: C++, Java, PHP, Ruby, Erlang, Perl, ...
Relational (SQL)
◦ SELECT `column` FROM `database`.`table` WHERE `id` = key;
Cassandra (standard), Thrift-style client call
◦ keyspace.getSlice(key, "column_family", "column")
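The lookup above can be pictured as nested maps: keyspace, then column family, then row key, then column. The sketch below models that shape with plain dictionaries. The `get_slice` function only mirrors the argument order of the call on the slide; it is a hypothetical helper, not a real client API, and the sample rows are made up.

```python
# keyspace -> column family -> row key -> {column name: value}
keyspace = {
    "users": {
        "alice": {"email": "alice@example.com", "city": "Paris"},
        "bob": {"email": "bob@example.com"},   # rows need not share columns
    }
}


def get_slice(keyspace, key, column_family, columns):
    """Return the requested columns of one row, skipping absent columns."""
    row = keyspace.get(column_family, {}).get(key, {})
    return {name: row[name] for name in columns if name in row}


print(get_slice(keyspace, "alice", "users", ["email"]))
print(get_slice(keyspace, "bob", "users", ["email", "city"]))
```

Because each row carries its own set of columns, no fixed table schema is needed, and the lookup is always driven by the row key rather than by a join.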