Prepare Your Data For The Cloud
DESCRIPTION
Session presented at the IndicThreads Cloud Computing Conference, Pune, India (http://u10.indicthreads.com)

More and more enterprises are moving their IT infrastructure to cloud platforms. Of all the components, data storage remains a tricky part of the puzzle. I would like to present an overview of the choices we as software developers currently have, along with their advantages and limitations. Depending on the choice, we may need to rethink the design and architecture of the data-manipulation components of the application we plan to put on the cloud.

Following is an overview of the proposed agenda:
* Existing “Cloud Capable” and “Cloud Native” Relational DBMS
* Existing “Cloud Capable” and “Cloud Native” Non-Relational DBMS
* Main differences between Relational and Non-Relational DBMS’s
* Advantages and Limitations of Relational DBMS on Cloud Platforms
* Advantages and Limitations of Non-Relational DBMS on Cloud Platforms
* Design Patterns while using Non-Relational DBMS in the application
* Code Walk-through showing Integration of “Cloud Capable” and “Cloud Native” Non-Relational DBMS with a Web-Application

Takeaways from the session:
* Overview of the current market situation w.r.t. Data Storage on Cloud
* Helpful pointers towards making the right choice of Data Storage platform
* How Non-Relational DBMS’s can be integrated into our applications

TRANSCRIPT
1
Preparing Your Data For Cloud
Narinder Kumar
Inphina Technologies
2
Agenda
Relational DBMS's: Pros & Cons
Non-Relational DBMS's: Pros & Cons
Types of Non-Relational DBMS's
Current Market State
Applicability of Different Data-Bases in different environments
3
Relational DBMS - Pros
Data Integrity
ACID Capabilities
High Level Query Model
Data Normalization
Data Independence
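To make the ACID and data-integrity points concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the sample balances are assumptions made for illustration; they are not part of the original session.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit is undone too.
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)         # succeeds
rejected = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is rolled back
final = balances(conn)
```

The integrity rule lives in the schema rather than in application code, and the failed transfer leaves no partial update behind.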
4
Relational DBMS - Cons
Scaling Issues
  By Duplication (Master-Slave)
  By Sharding/Division (Not transparent)
Fixed Schema
Mostly disk-oriented (Performance)
May fare poorly with large data
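The "not transparent" remark about sharding can be illustrated with a small sketch in which the application, not the database, decides where each row lives. The `ShardedUserStore` class, the hash-based routing rule, and the `city` field are hypothetical; plain dicts stand in for separate database servers.

```python
import hashlib


class ShardedUserStore:
    """Application-level sharding: the caller's code picks the shard."""

    def __init__(self, shard_count):
        # Each dict stands in for one independent database server.
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, user_id):
        # Stable hash so the same key always maps to the same shard.
        digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, user_id, record):
        self.shards[self._shard_for(user_id)][user_id] = record

    def get(self, user_id):
        return self.shards[self._shard_for(user_id)].get(user_id)

    def find_by_city(self, city):
        # A query not keyed on user_id has to visit every shard.
        return sorted(
            uid
            for shard in self.shards
            for uid, rec in shard.items()
            if rec["city"] == city
        )
```

Lookups by key touch one shard, while any other query fans out to all of them, which is the cost the application has to absorb.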
5
Non-Relational DBMS - Pros
Scalability
Replication / Availability
Performance
Deployment Flexibility
Modelling Flexibility
Faster Development (?)
6
Non-Relational DBMS - Cons
Lack of Transactional Support
Data Integrity is Application's responsibility
Data Duplication / Application Dependent
Eventually Consistent (mostly)
No Standardization
New Technology
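"Eventually consistent" can be sketched as a primary copy whose writes reach a replica only after a delay. The class below is a toy model under that assumption; `EventuallyConsistentStore` and its method names are invented for illustration and do not describe any particular product.

```python
class EventuallyConsistentStore:
    """Toy model: writes land on the primary and reach the replica later."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = []  # writes not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self._pending.append((key, value))

    def read_replica(self, key):
        # May return stale data until replicate() has run.
        return self.replica.get(key)

    def replicate(self):
        # Apply pending writes in order, then clear the backlog.
        for key, value in self._pending:
            self.replica[key] = value
        self._pending.clear()


store = EventuallyConsistentStore()
store.write("cart:42", ["book"])
stale = store.read_replica("cart:42")   # replica has not caught up yet
store.replicate()
fresh = store.read_replica("cart:42")   # replica now agrees with the primary
```

An application reading from replicas has to tolerate the stale window itself, which is one way integrity becomes the application's responsibility.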
7
RDBMS's and Cloud
8
Cloud Capable RDBMS's
Almost every RDBMS can run on an IaaS Cloud Platform
9
Cloud Native RDBMS
10
Types of Non-Relational DBMS
Key Value Stores
Document Stores
Column Stores
Graph Stores
11
Key Value Data-Bases
Object is completely Opaque to DB
Mostly GET, PUT & DELETE operations are supported
There may be limits on size of Objects
Inspired by Amazon Dynamo Paper
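The key-value model described on this slide can be sketched in a few lines. This is an in-memory illustration only: the `KeyValueStore` class and its `max_value_bytes` limit are assumptions, not the API of any real product.

```python
import json


class KeyValueStore:
    """Stores opaque byte values under string keys; supports only GET, PUT, DELETE."""

    def __init__(self, max_value_bytes=1024):
        self._data = {}
        self._max_value_bytes = max_value_bytes

    def put(self, key, value):
        if not isinstance(value, bytes):
            raise TypeError("value must be bytes; the store does not interpret it")
        if len(value) > self._max_value_bytes:
            raise ValueError("value exceeds the store's size limit")
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        # Returns True if the key existed.
        return self._data.pop(key, None) is not None


kv = KeyValueStore(max_value_bytes=64)
# Serialization is the application's job because the store sees only bytes.
kv.put("user:1", json.dumps({"name": "Bob"}).encode("utf-8"))
loaded = json.loads(kv.get("user:1").decode("utf-8"))
```

Because the value is opaque, there is no way to ask the store for "all users named Bob"; only lookups by key are possible.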
12
Key Value DataBases & Cloud
Project Voldemort
MemcacheDB
Tokyo Tyrant
13
Document Data-Bases
Object is not completely opaque to DB
Every Object has its own schema
  FirstName="Bob", Address="5 Oak St.", Hobby="sailing".
  FirstName="Jonathan", Children=("Michael,10", "Jennifer,8")
Can perform queries based on Object's attributes
Possible to describe relationships between Objects
Joins and Transactions are not supported
Good for XML or JSON objects
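The two sample objects from this slide can be used to sketch how a document store differs from a key-value store: each document carries its own fields, and the store can query on them. The `DocumentStore` class and its `insert` and `find` methods are hypothetical stand-ins, not a real driver API.

```python
class DocumentStore:
    """In-memory sketch: schemaless documents that can be queried by attribute."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1

    def insert(self, doc):
        doc_id = self._next_id
        self._next_id += 1
        self._docs[doc_id] = dict(doc)  # copy so later caller edits do not leak in
        return doc_id

    def find(self, **criteria):
        # Match documents that contain every requested attribute with that value.
        return [
            doc
            for doc in self._docs.values()
            if all(k in doc and doc[k] == v for k, v in criteria.items())
        ]


docs = DocumentStore()
docs.insert({"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"})
docs.insert({"FirstName": "Jonathan", "Children": ["Michael,10", "Jennifer,8"]})

sailors = docs.find(Hobby="sailing")        # only Bob has a Hobby field
jonathans = docs.find(FirstName="Jonathan")
```

Neither document had to declare its fields in advance, and a query on a field that a document lacks simply does not match it.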
14
Document Data-Bases & Cloud
15
Column-Store Data-Bases
Richer than Document Stores
Multi-Dimensional Map
  Table
  Row
  Column
  Time-Stamp
Supports Multiple Data Types
Usually uses an underlying DFS (distributed file system)
Inspired by Google Big Table Paper
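The "multi-dimensional map" idea can be sketched as a map from (table, row, column) to a list of timestamped versions. The `ColumnStore` class below is an in-memory illustration under that assumption; the method names and the `as_of` parameter are invented here and do not mirror any real client library.

```python
class ColumnStore:
    """Sketch of a map keyed by (table, row, column) holding timestamped versions."""

    def __init__(self):
        self._cells = {}  # (table, row, column) -> list of (timestamp, value)

    def put(self, table, row, column, timestamp, value):
        versions = self._cells.setdefault((table, row, column), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0])  # keep versions ordered by time

    def get(self, table, row, column, as_of=None):
        """Return the newest value, or the newest at or before `as_of`."""
        versions = self._cells.get((table, row, column), [])
        candidates = [v for t, v in versions if as_of is None or t <= as_of]
        return candidates[-1] if candidates else None


cs = ColumnStore()
cs.put("pages", "example.com", "title", 100, "Old title")
cs.put("pages", "example.com", "title", 200, "New title")

latest = cs.get("pages", "example.com", "title")
earlier = cs.get("pages", "example.com", "title", as_of=150)
```

Old versions are kept rather than overwritten, so a read can ask for the value as it stood at an earlier time.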
16
Column-Store Data-Bases & Cloud
17
Key Factors while Making a Choice
Application Architecture Requirements
Platform choices
Non-Functional Requirements
  Consistency
  Availability
  Partition Tolerance
  Security
  Data Redemption
18
Different Requirements = Different Solutions
19
Scenario 1
Feature First
Corporate Data
Consistency Requirements
Business Intelligence
Legacy Application
RDBMS on Amazon Cloud, RackSpace (IaaS) or Microsoft Azure/Amazon RDS (PaaS)
20
Scenario 2
Consumer Facing Application
Big Files (Images, BLOBs, Files)
Geographically Distributed
Mostly writes
Not heavy requirement on Rich Queries
Key-Value Data Stores (Amazon S3, Project Voldemort, Redis)
21
Scenario 3
Hundreds of Government Documents with different schemas
Need to serve on Web
Data Mining
Document Data-Stores (Amazon SimpleDB, Apache CouchDB, MongoDB)
22
Scenario 4
Scale First
Huge Data-Set
Analytical Requirements
Consumer Facing
High Availability over Consistency
Column Data-Stores (Google App Engine, HBase, Cassandra)
23
Mix & Match of Earlier Scenarios
Polyglot Persistence
RDBMS for low-volume and high-value data
Key-Value DB for large files with few queries
Memcached DB for short-lived Data
Column DB for Analytics
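One way to picture polyglot persistence is a thin routing layer that sends each kind of data to the store type suggested on this slide. The sketch below is only an illustration: the `PolyglotRepository` class and the data kinds (`order`, `image`, `session`, `click_event`) are hypothetical, and plain dicts stand in for real database clients.

```python
class PolyglotRepository:
    """Routes each kind of data to the store type suited to it."""

    # Hypothetical data kinds mapped to store types.
    ROUTES = {
        "order": "rdbms",         # low volume, high value
        "image": "key_value",     # large files, few queries
        "session": "cache",       # short-lived data
        "click_event": "column",  # analytics
    }

    def __init__(self):
        # One dict per store type, standing in for a real client connection.
        self.stores = {name: {} for name in set(self.ROUTES.values())}

    def save(self, kind, key, value):
        store_name = self.ROUTES[kind]
        self.stores[store_name][key] = value
        return store_name

    def load(self, kind, key):
        return self.stores[self.ROUTES[kind]].get(key)


repo = PolyglotRepository()
where_order = repo.save("order", "o-1", {"total": 250})
where_image = repo.save("image", "img-9", b"\x89PNG")
```

Keeping the routing table in one place means the rest of the application does not need to know which database holds which kind of data.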
24
CAP Theorem
25
Conclusions
One Size does not Fit all
Many choices
No-SQL DB's providing Alternatives
RDBMS's serve a useful purpose
26
www.inphina.com
http://thoughts.inphina.com
27
References
http://nosql-database.org/
http://www.drdobbs.com/database/218900502
http://perspectives.mvdirona.com/2009/11/03/OneSizeDoesNotFitAll.aspx
http://blog.nahurst.com/visual-guide-to-nosql-systems
http://blog.heroku.com/archives/2010/7/20/nosql/
http://www.vineetgupta.com/2010/01/nosql-databases-part-1-landscape.html
http://project-voldemort.com/
http://code.google.com/p/redis/
http://memcachedb.org/
http://aws.amazon.com/simpledb/
http://couchdb.apache.org/
http://www.mongodb.org
http://riak.basho.com/