
Flexibility: The Future of Database Technology Dramatically improve performance and reduce operational costs by harnessing the full power of the cloud.

WHITE PAPER

White Paper | Flexibility: The Future of Database Technology www.parelastic.com 2

Table of Contents

Today’s Operational Big Data Challenges

ParElastic Patented Elastic Transparent Sharding Architecture™

Dynamic Horizontal Scalability

Flexibility in a Relational Database

Use Your Database Unmodified

Consistent Database Performance, Guaranteed

Feature and Benefit Summary

Adaptive Provisioning

Automated Data Distribution

Scaling Reads and Writes

Built for Multi-Tenancy

Supports “Cross-Shard” Operations

Summary

Appendix: Deployment Scenarios

Deployment in the Cloud

Database Tier for SaaS Applications

High-Core-Density and High-Performance Dedicated Hardware

Read and Write Scaling

Cloud Bursting

Corporate SQL Grid™


Today’s Operational Big Data Challenges

Three simultaneous and disruptive trends are creating a massive increase in the volume and volatility of data under management and, consequently, are driving the “operational big data” phenomenon:

• Social media applications make the largely static, read-only Web interactive with a continuous stream of updates from users.

• Mobile communications are “always on/always connected,” resulting in higher degrees of interactivity and variability in workloads, as users can access data any hour of the day.

• Cloud computing brings the promise of infinite scalability and the ability to pay only for the resources needed. However, it requires that software be built to utilize large numbers of commodity systems.

These trends put tremendous pressure on traditional database systems within organizations. In response, application developers spend an inordinate amount of their development time and resources on innovation in the data tier to address the shortcomings of existing database offerings. This leads to higher operational costs, an inferior client experience, and longer time to market.

Organizations have two choices for addressing these obstacles: 1) move to new and unproven databases at tremendous risk, or 2) leverage existing databases with ParElastic.

The ParElastic Database Virtualization Engine™ is specifically designed to combat operational big data challenges. It dramatically increases the flexibility of relational databases, improving performance and reliability while reducing storage and processing costs. It is now possible to scale out databases on demand and support operational workloads that exceed the capabilities of a single database server, while provisioning, consuming, and paying only for the capacity needed at any given instant.

The ParElastic Database Virtualization Engine brings the flexibility of the cloud to every dimension of the database tier:

• Scales Out Databases on Demand. ParElastic creates a virtual database from several off-the-shelf relational databases. Multiple relational database servers handle dynamic workloads while appearing to applications as a single database server. Loads are automatically balanced and failures isolated, eliminating performance bottlenecks and downtime.

• Dynamically Adds Storage and Processing. Pay Only for What You Use. ParElastic’s patented technology dynamically and independently adds storage and processing power in response to changing workloads and data volumes. This approach eliminates the need to provision systems for peak capacity. Organizations pay for only the capacity they need at any given instant.

• Uses Existing Databases and Applications. No Risk. No Disruption. The ParElastic Database Virtualization Engine sits between existing applications and the database. Using standard, proven databases, interfaces, and semantics eliminates risk. And ParElastic does not require any changes to applications, complex partitioning, or sharding, which means there is no disruption. With ParElastic, organizations have the freedom to focus on their applications rather than on their infrastructure.


ParElastic Patented Elastic Transparent Sharding Architecture™

ParElastic sits between the application and a group of relational databases. The software exposes industry-standard interfaces (xDBC, Perl DBI, PDO, etc.) and database-specific interfaces (TDS for SQL Server, libmy for MySQL, etc.). Applications perceive a single “virtual database” and interact with this virtual database in the underlying databases’ native SQL dialect.

Dynamic Horizontal Scalability

User data is stored on multiple database servers, with each table stored in a manner consistent with its purpose in the user schema. Some tables are horizontally partitioned over multiple database servers, while others are replicated across them. When ParElastic receives a query, each database server performs some part or parts of the query under the direct control of the software. The results are then returned to the application.

ParElastic’s technologies automatically partition tables over multiple database servers in a way that optimizes query performance. ParElastic determines where to place data based on rules defined by the application, which allows ParElastic to scale the data tier to address workloads that are too large for a single database server. In addition, new database servers can be added to the group without requiring costly data redistribution.
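The distinction between partitioned and replicated tables can be illustrated with a minimal sketch. It is not ParElastic's implementation, which this paper does not describe in detail; the table names, the `PLACEMENT` rules, the server names, and the hashing scheme are all assumptions chosen for illustration. A partitioned row is owned by exactly one server, while a replicated row is written to every server and can be read from any one of them.

```python
import hashlib

# Hypothetical placement rules: each table is either partitioned on a column
# or replicated to every server. All names here are illustrative only.
PLACEMENT = {
    "orders": ("partitioned", "customer_id"),
    "countries": ("replicated", None),
}
SERVERS = ["db0", "db1", "db2"]


def shard_for(key, servers):
    """Map a partitioning key to one server using a stable hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def servers_for_write(table, row, servers=SERVERS):
    """Return the servers that must store this row."""
    mode, column = PLACEMENT[table]
    if mode == "replicated":
        return list(servers)  # every server keeps a full copy
    return [shard_for(row[column], servers)]  # one server owns the row


def servers_for_read(table, key=None, servers=SERVERS):
    """Return the servers a read has to visit."""
    mode, _ = PLACEMENT[table]
    if mode == "replicated":
        return [servers[0]]  # any single copy can answer
    if key is None:
        return list(servers)  # no key given: visit every partition
    return [shard_for(key, servers)]
```

Note that the simple modulo hashing used here would move most keys if the number of servers changed, so this sketch does not illustrate how servers can be added without costly redistribution; it only shows how placement rules determine which servers take part in an operation.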

Flexibility in a Relational Database

With ParElastic, the number of database servers in use can also be changed dynamically based on user interaction and system workload. When the workload exceeds the current capacity of the data tier, the software provisions additional database servers; when the workload drops, unused database servers can be released. The system will immediately utilize additional database servers provisioned in response to an increase in workload and adapt as database servers are released when workloads decrease.
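The provision-and-release behavior described above can be sketched as a simple sizing rule. This is a hypothetical policy, not the one ParElastic uses: the load units, the per-server capacity, and the 20 percent headroom are assumptions made for the example.

```python
import math


def target_server_count(load, per_server_capacity, min_servers=1, headroom=0.2):
    """Servers needed so the load fits while leaving `headroom` spare capacity.

    `load` and `per_server_capacity` share one unit (for example, queries per
    second). The headroom fraction is an assumed tuning parameter.
    """
    if per_server_capacity <= 0:
        raise ValueError("per_server_capacity must be positive")
    usable = per_server_capacity * (1.0 - headroom)
    return max(min_servers, math.ceil(load / usable))


def scaling_action(current_servers, load, per_server_capacity):
    """Decide whether to provision, release, or hold database servers."""
    target = target_server_count(load, per_server_capacity)
    if target > current_servers:
        return ("provision", target - current_servers)
    if target < current_servers:
        return ("release", current_servers - target)
    return ("hold", 0)
```

With a load of 1,000 units and servers rated at 500 units each, the rule asks for three servers, so a two-server tier would provision one more and a five-server tier would release two.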

[Figure: ParElastic Database Virtualization Engine. Enables elastic database capacity by creating a virtual database from several off-the-shelf relational database servers. Diagram labels: Application, Standard Interfaces, ParElastic, MySQL Databases.]


This flexibility enables users to provision and pay only for the capacity they need at any given instant, which makes ParElastic ideal for cloud-based deployments and deployments on high-core-density dedicated database servers.

Use Your Database Unmodified

Use MySQL or any variant like Percona or MariaDB. There are no changes to your application, database, or staff. Applications interact with a “virtual database” that is exposed by the ParElastic software. The applications perceive a single database server that resembles the underlying (native) database server and interact with it using industry-standard, database-specific interfaces and language constructs. The applications need not know about the underlying physical layout of data or the number of participating database servers.

Consistent Database Performance, Guaranteed

ParElastic offers flexibility by creating a virtual database from a collection of unmodified relational database nodes. As data and workload in the application increase, consistent performance is guaranteed. ParElastic’s virtual database scales up both reads and writes. Unlike single-master, multi-slave configurations that only scale up reads, ParElastic improves performance in both reads and writes by spreading the I/O over more spindles.

Feature and Benefit Summary

The ParElastic Database Virtualization Engine is the only solution that brings flexibility to all dimensions of the database. Key features include:

Adaptive Provisioning

When user volume and system workloads exceed the data tier’s current capacity, ParElastic immediately provisions additional database servers. When the workload decreases, unused servers can be released.

Automated Data Distribution

To optimize query performance, tables are automatically partitioned over multiple database servers. ParElastic determines where to place data based on rules defined by the system administrator.

Scaling Reads and Writes

ParElastic provides flexibility by creating a virtual database from a collection of unmodified relational database nodes. As data and workload in the application increase, consistent performance is guaranteed.

Built for Multi-Tenancy

For SaaS environments, each tenant operates as if it had its own private database. Behind the scenes, ParElastic balances load across all available servers and eliminates per-database overhead. Since workload is distributed across all servers, the “bad neighbor effect” is eliminated: one tenant’s operations will not adversely affect the performance of others.


Supports “Cross-Shard” Operations

ParElastic operates transparently across all partitions, which alleviates the need to drive large datasets to the application layer for processing and maximizes performance. ParElastic treats an entire dataset as one unified, virtual database.
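One common way to run an aggregate across partitions without shipping raw rows to the application is to have each partition return a small partial result and then merge those partials. The sketch below shows that general technique; it is an illustration under assumed names (`partial_aggregate`, `merge_aggregates`, an `amount` column), not a description of ParElastic's internals.

```python
def partial_aggregate(rows):
    """What each shard computes locally: a count and a sum, never an average."""
    values = [row["amount"] for row in rows]
    return {"count": len(values), "sum": sum(values)}


def merge_aggregates(partials):
    """Combine per-shard partial results into the global answer."""
    total_count = sum(p["count"] for p in partials)
    total_sum = sum(p["sum"] for p in partials)
    average = total_sum / total_count if total_count else None
    return {"count": total_count, "sum": total_sum, "avg": average}


# Three hypothetical shards holding slices of one logical table.
shards = [
    [{"amount": 10}, {"amount": 20}],
    [{"amount": 30}],
    [],
]
result = merge_aggregates([partial_aggregate(rows) for rows in shards])
```

Each shard sends back two numbers regardless of how many rows it holds. The average is derived from the merged sum and count because averaging the per-shard averages would give the wrong answer when shards hold different numbers of rows.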

Enterprises using ParElastic enjoy flexible development, deployment, and operations.

• Flexible Development. Standard SQL, tools, and interfaces can be used. As a result, existing MySQL applications do not need to be modified.

• Flexible Deployment. ParElastic scales existing databases, which means organizations can leverage their existing investments. The ParElastic Database Virtualization Engine can be deployed to data centers as well as public and private clouds.

• Flexible Operations. With ParElastic, enterprises don’t have to overprovision. As workload demands increase, they can add capacity for storage or processing power independently and pay only for what they use.

Summary

ParElastic is the only solution that brings the flexibility of cloud architectures to all dimensions of a database. By leveraging existing MySQL servers, ParElastic dramatically increases the flexibility of existing relational databases, improving performance and reliability while reducing operational costs.

Enterprises can now take full advantage of opportunities driven by cloud computing, mobile communications, and social media. With ParElastic, organizations enjoy outstanding user experiences, lower operational costs, faster time to market, and no risk. It just works. Deployment scenarios are detailed in the appendix.


Appendix: Deployment Scenarios

Deployment in the Cloud

This scenario depicts a simple example of ParElastic deployed in a public cloud, such as Amazon. In this illustration, MySQL databases are running on standard EC2 instances. Data are stored on and queries are processed by the MySQL database instances. When load on the system is increased, additional EC2 instances with MySQL database servers can be provisioned in the cloud and brought online as compute nodes. These nodes can then be deprovisioned when load on the system decreases. When data volumes increase, additional EC2 instances with MySQL database servers can be provisioned and brought online as storage nodes.

ParElastic allows applications to provision and use only the infrastructure they need at any given point in time, thereby delivering the maximum benefits of cloud-based deployment. Further, by dividing the work between several database instances, the data tier can scale to workloads that exceed the capacity of a single server. ParElastic delivers flexible scale-out of off-the-shelf MySQL databases, which allows applications to migrate to ParElastic with minimal change and capitalize on the maturity and stability of the millions of existing MySQL deployments. This architecture also allows all management and high-availability tools, such as replication and failover, to be used seamlessly with ParElastic.

[Figure: Deployment in the Cloud. Diagram labels: Application, Standard Interfaces, ParElastic, Amazon EC2 Instances + MySQL Databases (provision additional as needed).]


Database Tier for SaaS Applications

Extending the previous scenario, ParElastic can significantly help with implementation of SaaS applications. Typically, applications are written using a simple approach in which each client of the application has data stored in a dedicated database. Each client database uses the same schema that includes the same tables and the same internal structures. When a client connects to the SaaS application, the application translates all the client’s requests to the appropriate database.

While this approach makes for simpler application deployment and separation of client data, it also introduces several complications. First, each of the client databases resides on a single server, and allocating databases to servers causes individual servers to become bottlenecks. If, for example, one client performed an operation that stressed the database server, all other clients sharing that database server would face degraded performance. Thus, in assigning databases to servers, significant excess capacity must be provisioned on each of the servers, which leads to low overall server utilization and significant excess capacity. Second, the multiplicity of databases introduces additional overhead that degrades database performance and increases administrative complexity.

ParElastic virtualizes the application databases and provides the SaaS application with a virtual database for each client. ParElastic then distributes client data and consequent database load over a number of database servers.

[Figure: Database Tier for SaaS Applications. Diagram labels: SaaS Application; Clients C1, C2, C3, and C4; four MySQL Servers; C1’s, C2’s, C3’s, and C4’s Databases.]


Clients interact with the virtual databases exposed by ParElastic, and these databases are physically implemented by the underlying database instances. Each client is provided access to their data only, while data belonging to other clients is hidden. This setup enhances the SaaS application in several ways:

• Flexible Scale-Out of the Application’s Databases. Because the data and the query processing are distributed over multiple database servers, the application can scale the data tier beyond the physical limits of a single database server.

• Reduced Database Overhead and Maintenance Costs. All data is stored in a common database, minimizing database overhead and maintenance costs.

• Increased Server Utilization and Lower Infrastructure Costs. ParElastic can utilize additional compute nodes that are provisioned when the load on the database warrants it. This allows systems to provision and use only the infrastructure they need at any given instant, significantly increasing server utilization and reducing costs.
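The idea of per-client virtual databases over commonly stored data can be sketched as follows. This is a minimal illustration under stated assumptions, not ParElastic's mechanism: it uses Python's built-in `sqlite3` module instead of MySQL so that it runs anywhere, and the `TenantDatabase` class, the `invoices` table, and the `tenant_id` column are hypothetical names.

```python
import sqlite3


class TenantDatabase:
    """Hypothetical per-tenant handle over one shared physical table."""

    def __init__(self, conn, tenant_id):
        self._conn = conn
        self._tenant_id = tenant_id

    def add_invoice(self, amount):
        # Every write is tagged with the owning tenant.
        self._conn.execute(
            "INSERT INTO invoices (tenant_id, amount) VALUES (?, ?)",
            (self._tenant_id, amount),
        )

    def invoices(self):
        # Every read is scoped to the owning tenant, hiding other tenants' rows.
        cursor = self._conn.execute(
            "SELECT amount FROM invoices WHERE tenant_id = ? ORDER BY id",
            (self._tenant_id,),
        )
        return [row[0] for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices ("
    "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, amount INTEGER NOT NULL)"
)
c1 = TenantDatabase(conn, "C1")
c2 = TenantDatabase(conn, "C2")
c1.add_invoice(100)
c2.add_invoice(250)
c1.add_invoice(75)
```

Here `c1.invoices()` returns only the two amounts written by client C1, even though all three rows live in the same table. In a real deployment the scoping would need to be enforced below the application, which is the role the paper assigns to the virtualization layer.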

[Figure: Database Tier for SaaS Applications With ParElastic. Diagram labels: SaaS Application; Clients C1, C2, C3, and C4; ParElastic; C1’s, C2’s, C3’s, and C4’s Virtual Databases; two MySQL Servers, each with a Database.]


High-Core-Density and High-Performance Dedicated Hardware

In addition to cloud-based deployments, ParElastic is highly applicable to deployment on dedicated hardware in the data center. This scenario illustrates how ParElastic allows users to fully exploit advanced hardware that is now becoming increasingly affordable.

Conventional database technologies are either single-threaded or dedicate no more than a single thread of execution to any query. These databases are unable to fully exploit newer hardware with high core densities, huge amounts of RAM, and high-performance interconnections to RAM and local storage. Their capabilities are further severely curtailed due to locking and other IPC mechanisms.

Because ParElastic builds a virtual database on top of a collection of free-running database instances, multiple MySQL instances can be launched on the same physical server, and all of them can be brought to bear on a single query. The fact that parallel databases perform better and scale to a greater degree than do single-image databases is well-established.

Consider a simple example of a table that has a billion rows and occupies 10 gigabytes on disk. A single MySQL server performing a sequential scan of this entire table would be bottlenecked by the speed at which it can process the blocks of data coming off the disk. With four MySQL instances, each operating on its own quarter of the data, the operation could in theory be completed in a quarter of the time while utilizing more cores and maximizing utilization of the I/O bandwidth.
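The arithmetic behind this example can be made explicit with a small model. The 10-gigabyte table size comes from the example above; the use of four instances and the per-instance processing rate of 100 MB/s are assumptions chosen only to make the numbers concrete. The model is idealized: it assumes the table splits evenly and that instances do not contend for shared disk bandwidth or pay any coordination cost.

```python
def ideal_scan_seconds(table_bytes, bytes_per_second_per_instance, instances):
    """Ideal scan time when the table is split evenly across parallel instances."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    slice_bytes = table_bytes / instances  # each instance scans one slice
    return slice_bytes / bytes_per_second_per_instance


TABLE_BYTES = 10 * 10**9  # the 10 GB table from the example
RATE = 100 * 10**6  # ASSUMPTION: each instance processes 100 MB per second

single = ideal_scan_seconds(TABLE_BYTES, RATE, 1)
four = ideal_scan_seconds(TABLE_BYTES, RATE, 4)
```

Under these assumptions the single-instance scan takes 100 seconds and the four-instance scan takes 25 seconds, a quarter of the time. Real speedups would be lower wherever the instances share an I/O path that is already saturated.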

In this deployment, a query received by ParElastic is collaboratively executed by a collection of MySQL instances all running on the same physical server. Because each server is executing on its section of the data, it can do so very effectively while exploiting the full power of the advanced hardware available on the system. In systems that have GPU-based acceleration, the database instances can also exploit those technologies.

This deployment scenario illustrates the power of parallelism and how ParElastic can leverage increasingly affordable and increasingly powerful hardware at a fraction of the cost of high-end parallel database servers like IBM DB2 Parallel Edition or Oracle Exadata.

[Figure: High Core Density and High Performance Dedicated Hardware. Diagram labels: Single MySQL Server (increased memory, processing and storage); ParElastic Database Virtualization Engine; four MySQL Instances.]


Read and Write Scaling

An immediate benefit of the ParElastic Database Virtualization Engine is that it can scale both reads and writes. While systems that are scaled based on replication, caches, or content delivery networks (CDNs) can provide improved read performance, they introduce severe complications in write-intensive scenarios, such as inconsistent and stale data read from read slaves, reduced write throughput, and increased latency.

The ParElastic Database Virtualization Engine distributes I/Os over a number of databases, with each database instance performing reads and writes on its own dedicated table space. The distribution of data across multiple storage sites in the parallel database enables linear scalability in read and write throughput.

Cloud Bursting

“Cloud bursting” is a term commonly used to refer to the practice of using cloud-based resources to address periodic spikes in demand. Traditional database technologies cannot effectively leverage the cloud to handle these periodic spikes.

Consider the case of a database deployment with ParElastic controlling a collection of MySQL databases in the data center. When presented with a sudden spike in demand, ParElastic can provision additional compute instances in the cloud and bring them to bear on the increased demand. When the spike in demand abates, the cloud-based resources can be released.

This approach enables customers to create highly optimized configurations for their databases in their own data center without provisioning for occasional peak loads. Enterprises can maximize utilization of the data center infrastructure while easily addressing occasional spikes in load by utilizing on-demand resources in the cloud.
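The bursting decision can be sketched as a split between fixed data-center capacity and on-demand cloud capacity. The function names, the capacity units, and the per-instance capacity are assumptions made for this illustration; the paper does not specify how ParElastic sizes a burst.

```python
def place_capacity(demand, datacenter_capacity):
    """Split demand between the data center and overflow sent to the cloud."""
    local = min(demand, datacenter_capacity)
    cloud = max(0, demand - datacenter_capacity)
    return {"datacenter": local, "cloud": cloud}


def cloud_instances_needed(demand, datacenter_capacity, per_instance_capacity):
    """Number of cloud instances required to absorb the overflow.

    All arguments are integers in the same unit (for example, queries per
    second). Returns 0 when the data center can handle the demand alone.
    """
    if per_instance_capacity <= 0:
        raise ValueError("per_instance_capacity must be positive")
    overflow = place_capacity(demand, datacenter_capacity)["cloud"]
    return -(-overflow // per_instance_capacity)  # integer ceiling division
```

For a data center sized for 1,000 units, a spike to 1,500 units leaves an overflow of 500; with cloud instances rated at 200 units each, three instances would be provisioned and then released once demand falls back below 1,000.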

[Figure: Cloud Bursting. Diagram labels: Application, Standard Interfaces, ParElastic Database Virtualization Engine, MySQL Databases in Data Center, MySQL Databases in Cloud.]


Corporate SQL Grid™

Large corporations often have site licenses for their database products that are installed on hundreds of servers within their data center. Frequently, at specific times of the day, week, month, or year, a very small subset of those database servers encounters very high loads and becomes a bottleneck. At the same time, a large amount of unused database capacity is available on other servers but cannot be brought to bear on the load at hand. System administrators have to provision expensive hardware to augment the processing capabilities of that small subset of servers in order to cope with the peak load.

In this scenario, ParElastic can help by building a corporate “SQL Grid” — a shared pool of database resources that can all be brought to bear on the periodic load. It is, in many ways, similar to the cloud-bursting scenario, but rather than bursting into the cloud, the system bursts onto other database servers found within the corporate data center.

Because all the servers reside within the security envelope of the corporation, the safety of the data can be ensured using existing access-control mechanisms. Available database instances on servers that are otherwise unused can be leveraged to address the spike in load on any one of the servers. Because the SQL grid exposed by ParElastic uses the underlying semantics of the database, applications can work largely unmodified.

[Figure: Corporate SQL Grid. Diagram labels: Corporate LAN, ParElastic SQL Grid.]

For more information about ParElastic, please contact us at [email protected].

125 CambridgePark Drive, Suite 400, Cambridge, MA 02140

www.parelastic.com

© 2013 ParElastic Corporation. This document is provided for informational purposes only, and the contents hereof are subject to change without notice. While reasonable efforts have been made to ensure correctness, this document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document, and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.

The ParElastic logo is a trademark of ParElastic Corporation. Other names may be trademarks of their respective owners.