9. Beyond Traditional RDBMS
![Page 1: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/1.jpg)
1
9. Beyond Traditional RDBMS
• The Big Data Era
• NoSQL (Not Only SQL) Databases
• NewSQL Databases
![Page 2: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/2.jpg)
2
9. Beyond Traditional RDBMS
• The Big Data Era
• NoSQL (Not Only SQL) Databases
• NewSQL Databases
![Page 3: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/3.jpg)
3
History of Data Management
• Business processing: Relational Database Management Systems (RDBMS)
  – Oracle, IBM, Sybase
• Internet boom: low-cost RDBMS alternatives
  – MySQL, PostgreSQL
• Today’s big data DBMS
![Page 4: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/4.jpg)
4
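Some Figures
• Facebook has 845 million active users, each with 130 friends on average
  – About 200 million mobile visits per day on average; each user visits 40 times per month on average; the average visit lasts 23 minutes 20 seconds
  – Must handle 40 billion user photos, stores more than 135 million messages per month, and processes 10 TB per day in total
• Twitter has 127 million active users; 13% of Internet users use Twitter
  – 36% of users tweet every day; the average visit lasts 11 minutes 50 seconds
  – Must process 7 terabytes of data per day
• Sina Weibo has 300 million registered users
  – Daily active users: 9% (27 million)
  – More than 100 million posts per day
• Renren has nearly 137 million registered users
  – 37 million monthly active users
  – On average per year, users publish 450 million blog posts, change their avatars 450 million times, and post 12 billion photos and 15 billion status updates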
![Page 5: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/5.jpg)
5
Big Data Era
• Companies leveraging big data in their decision making are over two times more likely to “substantially outperform their industry peers,” and see a 1.6x increase in revenue growth, a doubled profit increase, and an even greater rise in stock appreciation (from IBM).
• The US "Big Data Research and Development Initiative" will see at least six government agencies making $200 million in additional investments to "greatly improve the tools and techniques needed to access, organize, and glean discoveries from huge volumes of digital data."
![Page 6: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/6.jpg)
6
Characteristics of Big Data
• Volume (massive data): scale from terabytes to petabytes
• Velocity (flow rate): streaming data and large-volume data movement
• Variety (multiple types): manage the complexity of multiple relational and non-relational data types and schemas
• Uncertainty (non-determinism): redundant, missing, erroneous, and complementary data
![Page 7: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/7.jpg)
7
Driving Factors - Dataset
• Explosion of social media sites with large (un)structured data needs at terabyte/petabyte scale
• Unstructured data is pervasive (over 80% of the world's data)
  – appears in many forms: emails, Web pages, reports, research paper repositories, memos, enterprise records, etc.
  – exists mostly in electronic documents and is understood as self-contained content items
![Page 8: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/8.jpg)
8
Driving Factors – Dataset (cont.)
• Structured, semi-structured, and unstructured data are often mixed together
  – Personal homepages, the Web, science reports, discussion forums, trading, etc.
• These datasets have high read/write rates.
• Just as programming moved to dynamically-typed languages (Ruby/Groovy), data is shifting to dynamically-typed data with frequent schema changes.
• Google, Facebook, and Twitter began to explore alternative ways to store data in 2008/2009.
![Page 9: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/9.jpg)
9
Driving Factors – Cloud Computing
• Rise of cloud-based computing environments
  – Amazon S3 (Simple Storage Service)
• The open-source community provides a low-cost entry point to “kick the tires”
• Ready to look at alternative storage solutions other than relational.
![Page 10: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/10.jpg)
10
Challenge 1: Scaling Up
• Datasets are just too big
• Hundreds of thousands of visitors in a short time span: a massive increase in traffic
• Developers begin to front the RDBMS with a read-only cache to offload a considerable amount of the read traffic
  – Memcache, or other caching mechanisms integrated within the application (e.g., Ehcache)
  – In-memory indexes, distributing and replicating objects over multiple nodes
• As datasets grow, the simple memcache/MySQL model (for lower-cost startups) started to become problematic.
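The caching pattern on this slide can be sketched in a few lines. This is only an illustration: the class name `ReadThroughCache` is made up here, and plain Python dictionaries stand in for both memcached and the RDBMS.

```python
class ReadThroughCache:
    """Cache-aside reads in front of a slower backing store."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # stands in for the RDBMS
        self.cache = {}                     # stands in for memcached
        self.db_reads = 0

    def get(self, key):
        if key in self.cache:               # cache hit: no database traffic
            return self.cache[key]
        self.db_reads += 1                  # cache miss: go to the database
        value = self.backing_store.get(key)
        self.cache[key] = value
        return value

    def put(self, key, value):
        self.backing_store[key] = value     # writes always go to the database
        self.cache.pop(key, None)           # invalidate so the next read refetches


db = {"user:1": "alice"}
store = ReadThroughCache(db)
first = store.get("user:1")    # miss, reads the database
second = store.get("user:1")   # hit, served from memory
store.put("user:1", "bob")
third = store.get("user:1")    # miss again after invalidation
```

Only reads are offloaded; every write still reaches the single database, which is one reason the model becomes problematic as write volume grows.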
![Page 11: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/11.jpg)
11
Possible Solutions to Scalability
• RDBMS were not designed to be distributed
• Began to look at multi-node database solutions
• Distributed Database Systems
  – Basic principles and implementation techniques have been covered in the course
• More techniques
  – To be covered by the next few slides
![Page 12: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/12.jpg)
12
Scaling RDBMS – Master/Slave
• Master-Slave
  – All writes are written to the master. All reads are performed against the replicated slave databases.
  – Critical reads may be incorrect, as writes may not have been propagated down
  – Large datasets can pose problems, as the master needs to duplicate data to the slaves
• Multi-Master replication
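A toy model makes the "critical reads may be incorrect" point concrete. The classes `Master` and `Slave` below are hypothetical, and replication is modeled as an explicit `catch_up()` call rather than a background process.

```python
class Master:
    def __init__(self):
        self.data = {}
        self.log = []          # replication log of (key, value) writes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Slave:
    def __init__(self, master):
        self.master = master
        self.data = {}
        self.applied = 0       # number of log entries already applied

    def read(self, key):
        return self.data.get(key)

    def catch_up(self):
        # Apply every log entry not yet seen (asynchronous in a real system).
        for key, value in self.master.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.master.log)


master = Master()
slave = Slave(master)
master.write("balance", 100)
stale = slave.read("balance")    # replication has not run yet
slave.catch_up()
fresh = slave.read("balance")
```

Here `stale` is `None` although the write already succeeded on the master; the value only becomes visible on the slave after the log has been applied.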
![Page 13: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/13.jpg)
13
Scaling RDBMS - Partitioning
• Partitioning or sharding
  – Scales well for both reads and writes
  – Not transparent: the application needs to be partition-aware (in contrast to DDB)
  – Can no longer have relationships/joins across partitions
  – Loss of referential integrity across shards
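As a rough sketch of what "partition-aware" means for the application, the example below routes each key to a shard with a stable hash. `shard_for` and `ShardedStore` are illustrative names, and hash-modulo routing is just one possible partitioning scheme.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index with a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, row):
        self.shards[shard_for(key, len(self.shards))][key] = row

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def scan_all(self, predicate):
        # A query that is not keyed must visit every shard (scatter-gather).
        return [row for shard in self.shards
                for row in shard.values() if predicate(row)]


store = ShardedStore(num_shards=4)
for i in range(100):
    store.put(f"user:{i}", {"id": i, "country": "US" if i % 2 else "CN"})
us_rows = store.scan_all(lambda row: row["country"] == "US")
```

Keyed reads and writes touch exactly one shard, which is why they scale; anything that spans keys, such as the `scan_all` query or a join, has to be assembled by the application across all shards.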
![Page 14: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/14.jpg)
14
Challenge 2: Availability
• A web site is most likely to be unavailable when it is most needed
  – a huge loss of revenue
• The goal of web services today is to be available as long as the network is on.
  – When some nodes crash or some communication links fail, the service still performs as expected
• One desirable fault tolerance capability is to survive a network partitioning into multiple parts.
  – Distributed DBMSs (covered in the course) provide no solutions yet …
![Page 15: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/15.jpg)
15
Beyond Traditional RDBMS
“… the whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for …” - Eric Evans
• Class of non-relational data storage systems
• Usually do not require a fixed table schema, nor do they use the concept of joins
• Relax one or more of the ACID properties
  – Brewer’s CAP theorem
![Page 16: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/16.jpg)
16
Other Ways to Scale DBMS
• NoSQL (Not Only SQL) Databases
  – designed to meet the scalability requirements of distributed architectures, and/or schemaless data management requirements
• NewSQL Databases
  – designed to meet the requirements of distributed architectures or to improve performance such that horizontal scalability is supported.
![Page 17: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/17.jpg)
17
NoSQL, NewSQL, and Beyond (by 451 Group)
“NoSQL, NewSQL and Beyond: The answer to SPRAINed relational databases”
![Page 18: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/18.jpg)
18
9. Beyond Traditional RDBMS
• The Big Data Era
• NoSQL (Not Only SQL) Databases
• NewSQL Databases
![Page 19: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/19.jpg)
19
NoSQL (Not Only SQL)
• INSERT only, no UPDATE/DELETE
• No JOIN, thereby reducing query time
  – This involves de-normalizing data
• Lack of SQL support
• Non-adherence to ACID (Atomicity, Consistency, Isolation, and Durability) properties
![Page 20: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/20.jpg)
20
Three Seeds of NoSQL
• BigTable (Google)
• Dynamo (Amazon)
  – Distributed key-value data store
  – Gossip protocol (discovery and error detection)
• CAP Theorem (Eric A. Brewer)
  – BASE vs ACID
![Page 21: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/21.jpg)
21
The Perfect Storm
• Large datasets, acceptance of alternatives, and dynamically-typed data have come together in a perfect storm;
• Not a backlash/rebellion against RDBMS;
• SQL is a rich query language that cannot be rivaled by the current list of NoSQL (Not Only SQL) offerings.
![Page 22: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/22.jpg)
22
Google’s BigTable
• A distributed storage system for managing structured data.
• Designed to scale to a very large size
  – Petabytes of data across thousands of servers
• Used for many Google projects
  – Web indexing, Personalized Search, Google Earth, Google Analytics, Google Finance, …
• Flexible, high-performance solution for all of Google’s products
![Page 23: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/23.jpg)
23
Motivation for BigTable
• Lots of (semi-)structured data at Google
  – URLs: contents, crawl metadata, links, anchors, pagerank, …
  – Per-user data: user preference settings, recent queries/search results, …
  – Geographic locations: physical entities (shops, restaurants, etc.), roads, satellite image data, user annotations, …
• Scale is large
  – Billions of URLs, many versions per page
  – Hundreds of millions of users, thousands of queries per second
  – 100TB+ of satellite image data
![Page 24: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/24.jpg)
24
Why Not Just Use Commercial DB?
• Scale is too large for most commercial databases
• Even if it weren’t, cost would be very high
• Low-level storage optimizations help performance significantly
• Building internally means the system can be applied across many projects for low incremental cost
• Much harder to do when running on top of a database layer
![Page 25: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/25.jpg)
25
Google’s Goals
• Want asynchronous processes to be continuously updating different pieces of data
  – Want access to the most current data at any time
• Need to support:
  – Very high read/write rates (millions of ops per second)
  – Efficient scans over all or interesting subsets of data
  – Efficient joins of large one-to-one and one-to-many datasets
• Often want to examine data changes over time
  – E.g., contents of a web page over multiple crawls
![Page 26: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/26.jpg)
26
Basic Data Model - BigTable
• A sparse, distributed, persistent, multi-dimensional sorted map
  (row, column, timestamp) -> cell contents
• Good match for most Google applications
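To make the "sorted map" idea tangible, here is a single-process toy version. It is not how Bigtable is implemented; `TinyTable` and the sample row keys are invented for illustration, and the only goal is to show a map keyed by (row, column, timestamp) that keeps its keys in sorted order.

```python
import bisect


class TinyTable:
    """A toy (row, column, timestamp) -> value map kept in sorted key order."""

    def __init__(self):
        self.keys = []     # sorted list of (row, column, -timestamp)
        self.values = {}   # same keys -> cell contents

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)    # negate so newer versions sort first
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = value

    def get(self, row, column):
        """Return the most recent value for a cell, or None."""
        i = bisect.bisect_left(self.keys, (row, column, float("-inf")))
        if i < len(self.keys) and self.keys[i][:2] == (row, column):
            return self.values[self.keys[i]]
        return None

    def scan_rows(self, start_row, end_row):
        """Yield (row, column, timestamp, value) for start_row <= row < end_row."""
        i = bisect.bisect_left(self.keys, (start_row,))
        while i < len(self.keys) and self.keys[i][0] < end_row:
            row, column, neg_ts = self.keys[i]
            yield row, column, -neg_ts, self.values[self.keys[i]]
            i += 1


t = TinyTable()
t.put("com.cnn.www", "contents:", 3, "<html>v3")
t.put("com.cnn.www", "contents:", 5, "<html>v5")
t.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
t.put("org.example", "contents:", 1, "<html>ex")
latest = t.get("com.cnn.www", "contents:")
com_cells = list(t.scan_rows("com.", "com/"))   # every row starting with "com."
```

The map is sparse because a row only stores the columns that were actually written, and the sorted key order is what makes the row-range scan a contiguous walk.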
![Page 27: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/27.jpg)
27
WebTable Example
• Want to keep a copy of a large collection of web pages and related information
• Use URLs as row keys
• Various aspects of a web page as column names
• Store the contents of a web page in the contents: column under the timestamps when they were fetched.
![Page 28: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/28.jpg)
28
Rows
• Name is an arbitrary string
  – Access to data in a row is atomic
  – Row creation is implicit upon storing data
• Rows ordered lexicographically
  – Rows close together lexicographically are usually on one or a small number of machines
![Page 29: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/29.jpg)
29
Rows (cont.)
• Reads of short row ranges are efficient and typically require communication with a small number of machines.
• Can exploit this property by selecting row keys so they get good locality for data access.
• Example:
  math.gatech.edu, math.uga.edu, phys.gatech.edu, phys.uga.edu
  VS
  edu.gatech.math, edu.gatech.phys, edu.uga.math, edu.uga.phys
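The effect of the two key choices in the example can be checked by sorting them. The helper name `reverse_domain` is an assumption made for this sketch.

```python
def reverse_domain(host):
    """Turn math.gatech.edu into edu.gatech.math."""
    return ".".join(reversed(host.split(".")))


hosts = ["math.gatech.edu", "math.uga.edu", "phys.gatech.edu", "phys.uga.edu"]
plain_order = sorted(hosts)
reversed_order = sorted(reverse_domain(h) for h in hosts)
```

With plain host names, the lexicographic order interleaves the two universities (math.gatech, math.uga, phys.gatech, phys.uga). With reversed names, all gatech rows come first and all uga rows follow, so pages from one institution are neighbors and a short row-range read covers them.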
![Page 30: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/30.jpg)
30
Column Families
• Column keys are grouped into sets called column families.
• A column family must be created before data can be stored in a column key.
• Hundreds of static column families.
• Syntax is family:key
  – e.g., anchor:cnnsi.com, anchor:my.look.ca; language:English, language:German, etc.
![Page 31: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/31.jpg)
31
Timestamps
• Used to store different versions of data in a cell
  – New writes default to the current time, but timestamps for writes can also be set explicitly by clients
• Items in a cell are stored in decreasing timestamp order.
• The application specifies how many versions (n) of data items are maintained in a cell.
  – Bigtable garbage collects obsolete versions.
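The versioning rules on this slide can be mimicked for a single cell as follows. `VersionedCell` is a hypothetical class, and trimming on every write is a simplification of whatever garbage-collection schedule a real system uses.

```python
import time


class VersionedCell:
    """Keeps at most max_versions values, newest first."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self.versions = []   # list of (timestamp, value), decreasing timestamp

    def write(self, value, timestamp=None):
        if timestamp is None:
            timestamp = time.time()             # default to the current time
        self.versions.append((timestamp, value))
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        del self.versions[self.max_versions:]   # garbage-collect old versions

    def read(self, at_or_before=None):
        """Newest value, or the newest one not later than at_or_before."""
        for timestamp, value in self.versions:
            if at_or_before is None or timestamp <= at_or_before:
                return value
        return None


cell = VersionedCell(max_versions=2)
cell.write("v1", timestamp=1)
cell.write("v3", timestamp=3)
cell.write("v2", timestamp=2)      # explicit, out-of-order timestamp
newest = cell.read()
as_of_2 = cell.read(at_or_before=2)
as_of_1 = cell.read(at_or_before=1)
```

With n = 2, the oldest version ("v1") is dropped after the third write, so a read as of timestamp 1 finds nothing.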
![Page 32: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/32.jpg)
32
Advantages of BigTable
• Distributed multi-level map
• Fault-tolerant, persistent
• Scalable
  – Thousands of servers
  – Terabytes of in-memory data
  – Petabytes of disk-based data
  – Millions of reads/writes per second, efficient scans
• Self-managing
  – Servers can be added/removed dynamically
  – Servers adjust to load imbalance
![Page 33: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/33.jpg)
33
BigTables in Google’s Applications
“Every day more than 3,000 businesses sign up for Google Apps and move to the cloud”
![Page 34: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/34.jpg)
34
Application 1: Google Analytics
• Enables webmasters to analyze traffic patterns at their web sites. Statistics such as:
  – Number of unique visitors per day and the page views per URL per day
  – Percentage of users that made a purchase given that they earlier viewed a specific page
• How?
  – A small JavaScript program that the webmaster embeds in their web pages.
  – Every time the page is visited, the program is executed.
  – The program records information about each request: user identifier and the pages being fetched
![Page 35: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/35.jpg)
35
Application 1: Google Analytics (cont.)
• Raw-Click BigTable (~ 200 TB)
  – A row for each end-user session.
  – Row name includes the website’s name and the time at which the session was created.
  – Clusters sessions that visit the same web site in sorted chronological order.
  – Compression factor: 6-7.
• Summary BigTable (~ 20 TB)
  – Stores predefined summaries for each web site.
  – Generated from the raw click table by periodically scheduled MapReduce jobs.
  – Each MapReduce job extracts recent session data from the raw click table.
  – Row name includes the website’s name, and the column family is the aggregate summaries.
  – Compression factor: 2-3.
![Page 36: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/36.jpg)
36
Application 2: Google Earth & Maps
• Functionality: pan, view, and annotate satellite imagery at different resolution levels.
• One BigTable stores raw imagery (~ 70 TB):
  – Row name is a geographic segment. Names are chosen to ensure adjacent geographic segments are clustered together.
  – Column family maintains sources of data for each segment.
• There are different sets of tables for serving client data (e.g., index table).
![Page 37: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/37.jpg)
37
Application 3: Personalized Search
• Records user queries and clicks across Google properties.
• Users browse their search histories and request personalized search results based on their historical usage patterns.
![Page 38: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/38.jpg)
38
Application 3: Personalized Search (cont.)
• One BigTable:
  – Row name is the userid
  – A column family is reserved for each action type, e.g., web queries, clicks.
  – User profiles are generated using MapReduce. These profiles personalize live search results.
  – Replicated geographically to reduce latency and increase availability.
![Page 39: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/39.jpg)
39
BigTable API
• Implementation interfaces to
  – create and delete tables and column families,
  – modify cluster, table, and column family metadata (e.g., access control rights),
  – write or delete values in Bigtable,
  – look up values from individual rows,
  – iterate over a subset of the data in a table,
  – perform atomic read-modify-write (R-M-W) sequences on data stored in a single row key.
![Page 40: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/40.jpg)
40
Amazon
• Huge infrastructure
• Customer-oriented business
• Reliability is key
• Guarantee Service Level Agreements
  – e.g., providing a response within 300ms for 99.9% of its requests for a peak client load of 500 requests per second.
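A percentile-style SLA like the one above is a statement about the slow tail, not the average. The check below is a minimal sketch; the function name `meets_sla` and the sample latencies are invented, and integer per-mille arithmetic is used to avoid floating-point rounding at the 99.9% boundary.

```python
def meets_sla(latencies_ms, limit_ms=300, per_mille=999):
    """True if at least per_mille/1000 of the requests finished within limit_ms."""
    within = sum(1 for ms in latencies_ms if ms <= limit_ms)
    return within * 1000 >= per_mille * len(latencies_ms)


# Hypothetical measurements: almost all requests are fast, a few are slow.
one_slow = [20] * 999 + [900]
two_slow = [20] * 998 + [900, 900]
ok = meets_sla(one_slow)       # exactly 99.9% within 300 ms
violated = meets_sla(two_slow) # only 99.8% within 300 ms
mean_two_slow = sum(two_slow) / len(two_slow)
```

The second sample misses the SLA even though its mean latency is under 22 ms, which is why the target is expressed at the 99.9th percentile rather than as an average.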
![Page 41: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/41.jpg)
41
Amazon’s Dynamo
• A distributed key-value storage system
  – Simple
  – Scalable
  – Highly available
![Page 42: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/42.jpg)
42
Requirements and Assumptions
• Query Model
  – simple read and write operations to a data item that is uniquely identified by a key.
• ACID Properties
  – Atomicity, Consistency, Isolation, Durability.
• Efficiency
  – latency requirements, which are in general measured at the 99.9th percentile of the distribution.
• Other Assumptions
  – the operating environment is assumed to be friendly, and there are no security-related requirements such as authentication and authorization.
![Page 43: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/43.jpg)
43
Amazon SimpleDB
• A web service based on Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2)
• It stores, processes, and queries structured data in real time without operational complexity.
• It requires no schema, automatically indexes data, and provides a simple API for storage and access,
  – eliminating the administrative burden of data modeling, index maintenance, and performance tuning.
  – Developers gain access to its functionality within Amazon's computing environment, are able to scale instantly, and pay for what they use.
![Page 44: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/44.jpg)
44
Features of SimpleDB
• Simple to use
• Flexible
• Scalable
• Fast
• Reliable
• Inexpensive
• Designed for use with other Amazon Web services
![Page 45: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/45.jpg)
45
SimpleDB – Simple to Use
• Allows users to quickly add data and easily retrieve or edit that data through a simple set of web-service-based API calls.
• Eliminates the complexity of maintaining and scaling users’ operations.
![Page 46: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/46.jpg)
46
SimpleDB - Flexible
• It is unnecessary to pre-define all of the data formats one will need to store; simply add new attributes to the data set when needed, and the system will automatically index the data accordingly.
• Storing structured data without first defining a schema provides developers with greater flexibility when building applications.
![Page 47: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/47.jpg)
47
SimpleDB - Scalable
• Allows one to easily scale applications. Users can quickly create new domains as the data grows or the request throughput increases.
• Currently, users can store up to 10 GB per domain and can create up to 250 domains.
![Page 48: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/48.jpg)
48
SimpleDB - Fast
• Provides quick, efficient storage and retrieval of data to support high-performance web applications.
![Page 49: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/49.jpg)
49
SimpleDB - Reliable
• The service runs within Amazon's high-availability data centers to provide strong and consistent performance.
• To prevent data from being lost or becoming unavailable, users’ fully indexed data is stored redundantly across multiple servers and data centers.
![Page 50: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/50.jpg)
50
SimpleDB - Inexpensive
• Users pay only for the resources they consume.
• Avoids the significant up-front expenditures traditionally required to obtain software licenses and to purchase and maintain hardware, either in-house or hosted.
• Frees users from many of the complexities of capacity planning, transforms large capital expenditures into much smaller operating costs, and eliminates the need to over-buy "safety net" capacity to handle periodic traffic spikes.
![Page 51: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/51.jpg)
51
SimpleDB – Integration with other Amazon Web Services
• Integrates with other Amazon web services such as the Amazon EC2 compute cloud and Amazon S3 storage.
  – E.g., developers can query the object metadata from within the application in Amazon EC2 and return pointers to the objects stored in Amazon S3.
![Page 52: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/52.jpg)
52
Eric A. Brewer’s CAP Theorem for Availability
• Traditionally, availability was thought of as the server/process being available five 9’s (99.999 %) of the time.
• However, for large node systems, at almost any point in time there’s a good chance that a node is either down or there is a network disruption among the nodes.
  – Want a system that is resilient in the face of network disruption
![Page 53: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/53.jpg)
53
Consistency Problem
• For example:
  – Row V0 is replicated on nodes N1 and N2
  – Client A writes row V0 to node N1
  – Some period of time t elapses.
  – Client B reads row V0 from node N2
  – Does client B see the write from client A?
![Page 54: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/54.jpg)
54
Consistency Problem (cont.)
![Page 55: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/55.jpg)
55
Consistency
• A consistency model determines the rules for visibility and apparent order of updates
  – Locking-based or timestamp-order-based
  – For the Distributed DBMS we learned, the answer is: yes
  – Could the answer "maybe" be acceptable?
• Consistency is a continuum with tradeoffs
• The CAP Theorem states that
  – Strict consistency can't be achieved at the same time as availability and partition tolerance.
![Page 56: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/56.jpg)
56
Eventual Consistency*
• When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent.
• For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service.
* http://en.wikipedia.org/wiki/Eventual_consistency
* http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
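The N1/N2 scenario from the Consistency Problem slide can be replayed with two toy replicas. This is a sketch under stated assumptions: `Replica` and `anti_entropy` are invented names, conflicts are resolved by last-writer-wins on a caller-supplied timestamp, and synchronization is triggered by hand instead of running in the background.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.store = {}   # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.store.get(key)
        if current is None or timestamp > current[0]:   # last writer wins
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Exchange state in both directions so the two replicas converge."""
    for key, (timestamp, value) in list(a.store.items()):
        b.write(key, value, timestamp)
    for key, (timestamp, value) in list(b.store.items()):
        a.write(key, value, timestamp)


n1, n2 = Replica("N1"), Replica("N2")
n1.write("V0", "from client A", timestamp=1)
before = n2.read("V0")     # client B reads N2 before propagation
anti_entropy(n1, n2)
after = n2.read("V0")      # once updates stop and replicas sync, reads agree
```

Before the exchange, client B's answer is "maybe": the read on N2 returns nothing. After the exchange both replicas hold the same state, which is the guarantee eventual consistency gives.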
![Page 57: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/57.jpg)
57
Brewer’s CAP Theorem
• Born in a talk at the Principles of Distributed Computing (PODC) conference, July 2000
• Three properties of a system:
  – availability, consistency, and partition tolerance
• Theorem: You can have at most two of these three properties for any shared-data system.
![Page 58: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/58.jpg)
58
Brewer’s CAP Theorem (cont.)
• To scale out, you have to partition. That leaves either consistency or availability to choose from
  – In almost all cases, you would choose availability over consistency
• It is impossible to achieve all three.
![Page 59: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/59.jpg)
59
ACID vs. BASE
• DBMS research is about ACID (mostly)
• But we lose “C” and “I” for availability, graceful degradation, and performance
• This tradeoff is fundamental.
• BASE:
  – Basically Available (system seems to work all the time)
  – Soft-state (it doesn't have to be consistent all the time)
  – Eventual consistency (it becomes consistent at some later time)
![Page 60: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/60.jpg)
60
ACID vs. BASE (cont.)
ACID:
• Strong consistency
• Isolation
• Focus on “commit”
• Nested transactions
• Availability?
• Conservative (pessimistic)
• Difficult evolution (e.g., schema)
BASE:
• Weak consistency
  – Stale data OK
• Availability first
• Approximate answers OK
• Aggressive (optimistic)
• Simpler!
• Faster
• Easier evolution
![Page 61: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/61.jpg)
61
Forfeit Partitions in CAP Theorem
• Traits:
  – 2-phase commit
  – Cache validation protocols
• Examples:
  – single-site databases
  – cluster databases
  – LDAP
  – xFS file system
![Page 62: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/62.jpg)
62
Forfeit Availability in CAP Theorem
• Traits:
  – Pessimistic locking
  – Making minority partitions unavailable
• Examples:
  – Distributed databases
  – Distributed locking
![Page 63: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/63.jpg)
63
Forfeit Consistency in CAP Theorem
• Traits:
  – Expirations/leases
  – Conflict resolution
  – Optimistic
• Examples:
  – Web caching
  – DNS (Domain Name System)
  – Coda file system
![Page 64: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/64.jpg)
64
Tradeoffs in Reality
• The whole space is useful
• Real internet systems are a careful mixture of ACID and BASE
  – Use ACID for user profiles and logging (for revenue)
• Symptom of a deeper problem: the systems and database communities are separate but overlapping (with distinct vocabularies)
• Big applications like Google, Yahoo, Facebook, Amazon, eBay, etc. adopt CAP and BASE
![Page 65: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/65.jpg)
65
Tradeoffs in Reality (cont.)
• Can have consistency & availability within a cluster, but it is still hard in practice
• OS/Networking good at BASE/Availability, but terrible at consistency
• Databases better at Consistency than Availability
• Wide-area databases can’t have both
• Disconnected clients can’t have both
• Parallel programming is very relevant, except…
  – historically avoids availability
  – no notion of online evolution
  – best for CPU-bound tasks
![Page 66: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/66.jpg)
66
Tradeoffs in Reality (cont.)
• All systems are probabilistic
  – no such thing as a 100% working system
  – no such thing as 100% fault tolerance
  – partial results are often OK (and better than none)
• Enterprises cannot afford to lose the ACID properties
• Most current enterprise applications require SQL support
![Page 67: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/67.jpg)
67
9. Beyond Traditional RDBMS
• The Big Data Era
• NoSQL (Not Only SQL) Databases
• NewSQL Databases
![Page 68: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/68.jpg)
68
NewSQL Solutions
• SQL as the primary mechanism for application interaction.
• ACID support for transactions.
• A non-locking concurrency control mechanism, so real-time reads will not conflict with writes and thus cause them to stall.
• An architecture providing much higher per-node performance than available from traditional RDBMS solutions.
• A scale-out, shared-nothing architecture, capable of running on a large number of nodes without suffering bottlenecks.
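One common way to get reads that do not conflict with writes is multi-version concurrency control. The slide does not say which mechanism any particular NewSQL product uses, so the sketch below is only an assumed illustration; `MVCCStore` and its logical clock are invented for the example.

```python
class MVCCStore:
    """Multi-version store: readers use a snapshot and never block writers."""

    def __init__(self):
        self.versions = {}    # key -> list of (commit_ts, value), oldest first
        self.clock = 0        # logical commit counter

    def write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot(self):
        return self.clock     # a reader remembers the clock when it starts

    def read(self, key, snapshot_ts):
        # Newest version committed at or before the reader's snapshot.
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


db = MVCCStore()
db.write("stock", 10)
snap = db.snapshot()          # a long-running read starts here
db.write("stock", 7)          # a concurrent write proceeds without waiting
seen_by_reader = db.read("stock", snap)
seen_by_new_reader = db.read("stock", db.snapshot())
```

The earlier reader keeps seeing 10 from its snapshot while a reader that starts later sees 7; neither had to wait for a lock held by the writer.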
![Page 69: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/69.jpg)
69
Categorization of NewSQL Solutions
1) New databases
2) New MySQL storage engines
3) Transparent clustering
![Page 70: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/70.jpg)
70
New Databases
• Newly designed from scratch to achieve scalability and performance.
  – Some (hopefully minor) changes to the code
  – Data migration is needed.
• One of the key considerations in improving the performance is making non-disk (memory) or new kinds of disks (flash/SSD) the primary data store.
• Solutions can be software-only (VoltDB, NuoDB, and Drizzle) or supported as an appliance (Clustrix, Translattice).
• Examples: Clustrix, NuoDB, and Translattice (commercial); and VoltDB, Drizzle, etc. (open source).
![Page 71: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/71.jpg)
71
New MySQL Storage Engines
• MySQL is part of the LAMP stack and is used extensively in OLTP.
• To overcome MySQL’s scalability problems, a set of storage engines has been developed
  – Xeround, Akiban, MySQL NDB Cluster, GenieDB, Tokutek, etc.
• The good part is the use of the MySQL interface, but the downside is that data migration from other databases (including old MySQL) is not supported.
  – Xeround, GenieDB, and Tokutek (commercial); and Akiban, MySQL NDB Cluster, and others (open source).
![Page 72: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/72.jpg)
72
Transparent Clustering
• Retain the OLTP databases in their original format, but provide a pluggable feature to cluster transparently, to ensure scalability.
  • Schooner MySQL, Continuent Tungsten, ScalArc
• Provide transparent sharding to improve scalability.
  • ScaleBase, dbShards
• Both approaches allow reuse of existing skill sets and ecosystem, and avoid the need to rewrite code or perform any data migration.
  • Examples of offerings: ScalArc, Schooner MySQL, dbShards and ScaleBase (commercial); Continuent Tungsten (open source).
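To make "transparent sharding" concrete, here is a minimal sketch under simplifying assumptions: the application talks to a single router object, and the router decides which backend holds each row by hashing the key. The class `ShardRouter`, its methods, and the use of in-memory dictionaries as stand-ins for database nodes are all hypothetical; this is not how ScaleBase or dbShards is actually implemented.

```python
import hashlib


class ShardRouter:
    """Toy router: callers see one store, rows are spread over several shards."""

    def __init__(self, shard_count):
        # Each dict stands in for a separate backend database node.
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, key):
        # Stable hash so the same key always maps to the same shard.
        digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, row):
        self.shards[self._shard_for(key)][key] = row

    def get(self, key):
        return self.shards[self._shard_for(key)].get(key)


router = ShardRouter(shard_count=4)
for user_id in range(1000):
    router.put(user_id, {"id": user_id, "name": f"user{user_id}"})

row = router.get(42)                         # caller never names a shard
sizes = [len(shard) for shard in router.shards]
```

The calling code only uses `put` and `get`, which is the sense in which the sharding is transparent. Plain modulo hashing is used for brevity; it forces most keys to move when the shard count changes, so production systems typically use range maps or consistent hashing instead.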
![Page 73: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/73.jpg)
73
Summary
• The most powerful technologies take a while to mature. But when they do, they can rapidly retire mainstays that are decades old.
• Gartner Inc.’s hype cycle: a graphic representation of the maturity, adoption, and social application of specific technologies.
![Page 74: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/74.jpg)
74
References
• BigTable
  • http://labs.google.com/papers/bigtable.html
• Dynamo
  • http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
  • http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
• Amazon and consistency
  • http://www.allthingsdistributed.com/2010/02
  • http://www.allthingsdistributed.com/2008/12
• Brewer’s CAP Theorem and BASE
  • http://www.julianbrowne.com/article/viewer/brewers-cap-theorem (English)
  • http://pt.alibaba-inc.com/wp/dev_related_728/brewers-cap-theorem.html (in Chinese)
![Page 75: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/75.jpg)
75
References (cont.)
• NoSQL
  • http://cloudera-todd.s3.amazonaws.com/nosql.pdf
  • http://nosql-database.org/
• NewSQL
  • http://www.infoq.com/news/2011/04/newsql
  • http://simpleframework.net/blog/v/38088.html
  • http://www.linuxforu.com/2012/01/newsql-handle-big-data/
  • http://en.wikipedia.org/wiki/Graph_database
• Comparison of Scalable SQL and NoSQL Data Stores (SIGMOD Record, 39(4), 2010)
![Page 76: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/76.jpg)
76
Question & Answer
![Page 77: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/77.jpg)
77
Top 11 Technologies of the Decade
• #1 Smartphones
  • Finally, all pocketable gadgets have converged in a single device that goes everywhere and does everything.
• #2 Social Networking
  • Eavesdropping on friends’ private lives has never been so easy.
• #3 Voice Over IP
  • Say good-bye to switching circuits, hello to digital telephony.
• #4 LED Lighting
  • Solid-state lighting got white hot only when engineers mastered the blue arts.
![Page 78: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/78.jpg)
78
Top 11 Technologies of the Decade (cont.)
• #5 Multicore CPUs
  • Processors have gone from having a single core to dozens. Where will it end?
• #6 Cloud Computing
  • Your data can now wander the globe without you.
• #7 Drone Aircraft
  • Unmanned aerial vehicles have given war fighters remote eyes, and arms.
• #8 Planetary Rovers
  • Robotic rovers are expanding our knowledge of the universe by exploring strange new worlds.
![Page 79: 9. Beyond Traditional RDBMS](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816956550346895de10397/html5/thumbnails/79.jpg)
79
Top 11 Technologies of the Decade (cont.)
• #9 Flexible AC Transmission
  • At last, engineers can make alternating current go exactly where they want it.
• #10 Digital Photography
  • When cameras abandoned film for pixels, they changed the way we communicate.
• #11 Class-D Audio
  • Now you can annoy your neighbors at higher fidelity, and with stunning efficiency.