preparing your data for the cloud
DESCRIPTION
More and more Enterprises are moving their IT infrastructure to Cloud platforms. Out of the entire components, Data Storage still remains a tricky part of the puzzle. I would like to present an overview of the choices, their advantages and limitations, we as Software Developers have currently. Based upon the choices, we may need to think about the design and architecture of the data-manipulation components of the application, we plan to put on Cloud. Following is an overview of the proposed agenda: Existing “Cloud Capable” and “Cloud Native” Relational DBMS Existing “Cloud Capable” and “Cloud Native” Non-Relational DBMS Main differences between Relational and Non-Relational DBMS’s Advantages and Limitations of Relational DBMS on Cloud Platforms Advantages and Limitations of Non-Relational DBMS on Cloud Platforms Design Patterns while using Non-Relational DBMS in the application Code Walk-through showing Integration of “Cloud Capable” and “Cloud Native” Non-Relational DBMS with a Web-ApplicationTRANSCRIPT
1
Preparing Your Data For Cloud
Narinder KumarInphina Technologies
2
Agenda
Relational DBMS's : Pros & Cons Non-Relational DBMS's : Pros & Cons Types of Non-Relational DBMS's Current Market State Applicability of Different Data-Bases in
different environments
3
Relational DBMS - Pros
Data Integrity ACID Capabilities High Level Query Model Data Normalization Data Independence
4
Relational DBMS - Cons
Scaling Issues By Duplication (Master-Slave)
By Sharding/Division (Not transparent)
Fixed Schema Mostly disk-oriented (Performance) May fair poorly with large data
5
Non-Relational DBMS - Pros
Scalability
Replication / Availability
Performance
Deployment Flexibility
Modelling Flexibility
Faster Development (?)
6
Non-Relational DBMS - Cons
Lack of Transactional Support Data Integrity is Application's responsibility Data Duplication / Application Dependent Eventually Consistent (mostly) No Standardization New Technology
7
RDBMS's and Cloud
8
Cloud Capable RDBMS
s
Almost every RDBMS can run in a IAAS Cloud Platform
9
Cloud Native RDBMS
10
Types of Non-Relational DBMS
Key Value Stores
Document Stores
Column Stores
Graph Stores
11
Key Value Data-Bases
Object is completely Opaque to DB Mostly GET, PUT & DELETE operations are
supported There may be limits on size of Objects
Inspired by Amazon Dynamo Paper
12
Key Value DataBases & Cloud
Project Voldemort
MemCachedDB Tokyo Tyrant
13
Document Data-Bases Object is not completely opaque to DB Every Object has it's own schema
FirstName="Bob", Address="5 Oak St.", Hobby="sailing".
FirstName="Jonathan", Children=("Michael,10", "Jennifer,8")
Can perform queries based on Object's attributes
Possible to describe relationships between Objects
Joins and Transactions are not supported Good for XML or JSON objects
14
Document Data-Bases & Cloud
15
Column-Store Data-Bases
Richer than Document Stores Multi-Dimensional Map
Tables Row Column Time-Stamp
Supports Multiple Data Types Usually use an Underlying DFS
Inspired by Google Big Table Paper
16
Column-Store Data-Bases & Cloud
17
Key Factors while Making a Choice
Application Architecture Requirements Platform choices Non-Functional Requirements
Consistency
Availability
Partition
Security
Data Redemption
18
Different Requirements = Different Solutions
19
Scenario 1
Feature First Corporate Data
Consistency Requirements
Business Intelligence
Legacy Application
RDBMS on Amazon Cloud, RackSpace (IaaS) or
Microsoft Azure/Amazon RDS (PaaS)
20
Scenario 2
Consumer Facing Application Big Files (Images, BLOB's, Files)
Geographically Distributed
Mostly writes
Not heavy requirement on Rich Queries
Key-Value Data Stores (Amazon S3, Project Voldemort, Redis)
21
Scenario 3
Hundreds Of Government Documents with different schemas Need to serve on Web
Data Mining
Document Data-Stores (Amazon SimpleDB, Apache Couche DB, MongoDB)
22
Scenario 4
Scale First Huge Data-Set
Analytical Requirements
Consumer Facing
High Availability over Consistency
Column Data-Stores (Google App Engine, Hbase, Cassandra)
23
Mix & Match of Earlier Scenarios
Polyglot Persistence RDBMS for low-volume
and high value
Key-Value DB for large files with little queries
Memcached DB for short-lived Data
Column DB for Analytics
24
CAP Theorem
25
Conclusions
One Size does not Fit all
Many choices
No-SQL DB's providing Alternatives
RDBMS serve useful purpose
26
www.inphina.comhttp://thoughts.inphina.com
27
References http://nosql-database.org/
http://www.drdobbs.com/database/218900502
http://perspectives.mvdirona.com/2009/11/03/OneSizeDoesNotFitAll.aspx
http://blog.nahurst.com/visual-guide-to-nosql-systems
http://blog.heroku.com/archives/2010/7/20/nosql/
http://www.vineetgupta.com/2010/01/nosql-databases-part-1-landscape.html
http://project-voldemort.com/
http://code.google.com/p/redis/
http://memcachedb.org/
http://aws.amazon.com/simpledb/
http://couchdb.apache.org/
http://www.mongodb.org
http://riak.basho.com/