
  • 7/31/2019 EB-6592 Cloud Computing Models for Data Warehousing

    Cloud Computing Models for Data Warehousing

    2012 Technology White Paper


    TABLE OF CONTENTS

    Executive Summary
    Cloud Computing Concepts
        Utility Computing
        Cloud Definitions
            Core Characteristics of a Cloud
            Cloud Deployment Models
        The Benefits Reported for Cloud Computing
    The Promise of Cloud Computing for Data Warehousing
        Lower Costs
        Faster Delivery
        Performance and Scalability
        Agility
    Database Workloads and Their Fit with Cloud Infrastructure
        Shared-nothing Databases are Required to Support BI in the Cloud
        Shared-nothing Databases: Necessary but Not Sufficient
    Public Versus Private Clouds for BI and Analytic Databases
        Challenges with the Public Cloud Today
        Benefits of Private Cloud over the Traditional Server Model
        Cloud Adoption Preferences
    Private Clouds and the Data Warehouse in Action
        Creating a Consolidated Data Platform
        Costs, Budgeting and Planning
        Managing Performance
        Agility
    Conclusion and Recommendations
    About the Author


    Executive Summary

    Cloud computing is creating a new era for IT by providing a set of services that appears to have infinite capacity, immediate deployment and high availability at trivial cost. It is the result of computing and communications technology evolving from a high-value asset into a simple commodity. In that evolution, the focus shifts from computing as a physical thing in a data center to computing as a service, like electricity, that is accessible from the nearest network connection.

    Today most organizations are looking at the cloud as a way to lower data center and IT provisioning costs. While cost reduction is a real benefit, there is more value in the increased speed, flexibility and ease of delivery in cloud environments. The only way to gain these advantages is to change the approach and practices for delivering applications and data. The real change in IT is a change in how work is done rather than the addition of a new technology.

    Early worries about loss of control over the environment are being outweighed by the combination of lower costs, faster deployments and simpler scalability. Even so, not all deployments are moving to public cloud providers. Many organizations are adopting private clouds for some applications for technical, performance and regulatory reasons.

    Database workloads are a particularly challenging area for the cloud environment. As a rule, cloud deployments beyond a moderate scale favor shared-nothing database architectures designed to run transparently in a multi-node environment. Even when these databases are available, the performance and scalability of relational query workloads can suffer in the public cloud.

    We are still in an early period of standardization and of designing software to run in the cloud. Not all workloads are suitable for deployment on a collection of small virtualized commodity servers today. Business intelligence and analytic database workloads fall into this category, which makes careful analysis of their fit with both public and private cloud options important.

    There are other reasons for not using public cloud environments. Data privacy and security regulations can prevent an organization from using a public cloud. Data movement and data management between internal systems and the cloud may be enough of a challenge to eliminate any speed or cost advantages associated with using the cloud.

    Private clouds offer a solution to these challenges. A private cloud is like a single-tenant version of a public cloud. Its dedicated nature resolves the privacy and regulatory difficulties and can resolve some of the technical challenges with BI workloads. There are tradeoffs between public and private clouds that make hybrid solutions likely for the next five to ten years.

    Teradata Active Data Warehouse Private Cloud is the first real example on the market of a private cloud for data warehousing workloads. It embodies the key elements of self-service, pay-for-use and elastic growth and shrinkage of resources.


    Cloud Computing Concepts

    Utility Computing

    Cloud computing is a model for delivering IT platform infrastructure. It's a shift from the idea of computing platforms as hardware and software products to the idea of computing platforms as a service used by applications, much as a household appliance uses electricity as a service. This utility computing model parallels the evolution of the electric industry.

    In the early days of electricity there was no electric grid. Many small electric companies started with their own generators and wires running directly to customers, with the biggest demand being for street lights. Organizations and individuals wanting a reliable and controlled electric supply installed their own generation equipment sized to their needs.

    Generation and transmission technology matured, going through a commoditization process. Standards developed for electricity, and the electric market consolidated into a smaller number of suppliers with interconnected service. The availability of electricity as a service meant there was no longer a need for private generators. It's rare to find an individual with their own home generator today.

    Electricity available as a metered service produced three kinds of savings for organizations. They saved on capital assets, since generators and transformers were no longer needed. They saved on the resources needed to supply the generators (coal or oil). They also saved an equally large amount on operations and engineering labor for the people who maintained the equipment.

    In similar fashion, the IT industry grew, spread and has been consolidating around a small number of large platform suppliers. The IT market today is much like the market for electricity a hundred years ago: organizations that want reliable and controlled computing buy and run their own equipment.

    Cloud computing is the inevitable result of the commoditization of hardware and communications technologies. The combination of computing power and the ability to access it from anywhere means there is less need to buy and maintain hardware, much as universal access to electricity reduced the need for private generators.

    The important aspect of computing is the work done by applications and the data they generate. Computing platforms have become a commodity service that can be delivered from a remote location. This is a fundamental disruption to the IT industry, and we are still at the beginning of it.

    Figure 1: Private home generator circa 1918


    Cloud Definitions

    It's important to define terms before going further. The most important is cloud computing, which the National Institute of Standards and Technology (NIST) defines as "a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction."1

    Core Characteristics of a Cloud

    The key elements NIST defines for a cloud, which differentiate it from a cluster of servers in a data center or rented hardware at a hosting provider, are:

    On-demand self-service. A consumer should be able to acquire computing resources as needed without requiring human interaction with the service provider.

    Network accessibility. The capabilities provided should be available over a network using standard client software that is independent of any underlying hardware.

    Resource pooling. The computing resources are allocated from a shared pool in a way that is transparent to the consumers of the service. The resources can be dynamically reassigned based on demand and have no strict dependence on physical location.

    Elasticity. Capacity should be dynamically provisioned so that it can grow or shrink on demand, and should appear as if it comes from an unlimited pool of resources.

    Measured service. Resources should be delivered in a "pay-for-use" model where the consumer is charged based on actual use of resources. The consumer should have the ability to monitor and control resource use, making the billing process transparent.

    Cloud Deployment Models

    Cloud architecture is presented as being either public or private, which implies both who has access and where it is located. The difference most commonly described is that public clouds pool resources and share them across many organizations, while private clouds are dedicated to a single organization.

    The tradeoff is important, because private clouds don't have the benefit of pooling resources beyond a single firm. This dedicated nature implies higher costs, since those costs can't be shared across multiple organizations. There is also a hard limit to the scalability of the infrastructure, beyond which more physical resources need to be added.

    The other differentiator between public and private clouds is placement. An organization can put a private cloud on-site. It can own the environment but place it at a managed-hosting provider. It can also obtain dedicated infrastructure from a cloud service provider.

    The use of a private cloud is driven by the need to maintain control of the service delivery environment. This could be due to industry requirements, regulatory controls or specific performance requirements. Most organizations face data privacy or security regulations that prevent them from locating data in uncertified facilities or locations.

    1 The NIST Definition of Cloud Computing, http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf


    The Benefits Reported for Cloud Computing

    The core benefits of a cloud over a traditional server environment vary by need and by the type of systems deployed. Figure 2 lists the benefits that IT decision makers reported as reasons for moving to the cloud.2

    Figure 2: Benefits cited as reasons for moving to a cloud deployment.

    The reasons fall easily into two categories: cost savings or time-to-value. Cost is the biggest single factor for most people, but the real justifications are more complex and involve several combined benefits.

    A large part of the reduction in costs comes from shared services and metered billing. Because the public cloud provider pools its resources, it can keep all equipment at a higher utilization than a traditional server farm. That reduces the per-unit cost below what is normally possible in a dedicated data center. In a private cloud the resources are limited to one company, but they may still be pooled across lines of business or departments, providing some of the same benefits.

    The ability to turn off or reduce the resources dedicated to an application when they are not needed reduces the cost of running that application. In the case of a data warehouse, the hardware is usually sized to the peak workload, and the warehouse uses fewer resources than the peak most of the time. Paying only for the resources that are needed, and only when they are needed, can drastically reduce costs.
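    The effect of sizing to the peak can be shown with a small calculation. The Python sketch below is a minimal illustration under stated assumptions and is not a pricing model. The hourly demand profile and the per-node-hour rate are hypothetical numbers. Both models are charged the same rate, so the only difference is how much capacity is billed.

```python
# Hypothetical hourly demand for a data warehouse over one day, in nodes.
# These values are illustrative assumptions, not measurements.
HOURLY_DEMAND = [2, 2, 2, 2, 3, 4, 6, 8, 10, 10, 9, 8,
                 8, 9, 10, 9, 7, 5, 4, 3, 3, 2, 2, 2]

# Assumed price per node-hour, applied identically to both models.
RATE_PER_NODE_HOUR = 1.0


def peak_sized_cost(demand, rate):
    """Traditional model: capacity is fixed at the peak for every hour."""
    return max(demand) * len(demand) * rate


def pay_for_use_cost(demand, rate):
    """Metered model: pay only for the nodes actually used in each hour."""
    return sum(demand) * rate


def average_utilization(demand):
    """Fraction of the peak-sized capacity that is actually used."""
    return sum(demand) / (max(demand) * len(demand))


if __name__ == "__main__":
    fixed = peak_sized_cost(HOURLY_DEMAND, RATE_PER_NODE_HOUR)
    metered = pay_for_use_cost(HOURLY_DEMAND, RATE_PER_NODE_HOUR)
    print(f"peak-sized: {fixed:.0f} node-hours billed")
    print(f"pay-for-use: {metered:.0f} node-hours billed")
    print(f"utilization of the peak-sized system: "
          f"{average_utilization(HOURLY_DEMAND):.0%}")
```

    With this profile, the peak-sized system bills 240 node-hours and pay-for-use bills 130, because the warehouse runs near its peak for only a few hours. In practice, a metered rate usually differs from the amortized cost of owned hardware. Under the same simple model, pay-for-use comes out cheaper only while the ratio of the metered rate to the owned rate stays below the average utilization.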

    The ability to provision resources immediately, without the procurement, delivery or setup time involved in the traditional model, is a key element in speeding up projects. The reduced costs and faster provisioning have a secondary benefit. Many small projects that were too hard to deliver in the required timeframe, or that couldn't meet the ROI hurdle, become viable in the new environment.

    2 Source of data: IBM global survey of IT and line-of-business decision makers.

    [Figure 2 data: the share of respondents citing each benefit ranged from 39% to 50%. Benefits listed: resolve problems related to updating/upgrading; able to scale IT resources to meet needs; relieve pressure on internal resources; rapid deployment; able to take advantage of latest functionality; reduce IT support needs; lower outside maintenance costs; lower labor costs; software license savings; hardware savings; pay only for what we use. Categories: cost reduction; reduce time to value.]


    The Promise of Cloud Computing for Data Warehousing

    Organizations often look at the IT department first when cutting costs. IT is an obvious starting point because of the combination of high capital costs, high labor costs and the challenge of explaining the value of IT.

    The business intelligence (BI) group is an area of increasing expense in many organizations. According to multiple CIO surveys, BI and analytics have been among the top five IT spending priorities for several years. With increasing spending comes increased scrutiny.

    Lower Costs

    Cloud computing is seen as a potentially inexpensive alternative to conventional data warehouse servers. The lower costs are due to a number of factors. The first is the economy of scale a cloud provider gains when purchasing and pooling resources across a large customer base. Pooling lets the provider achieve higher utilization than an internal data center, so the provider's per-unit cost can be lower than the per-unit cost in IT.

    Second, incremental growth costs less in the cloud. In the traditional model, scaling up a data warehouse involves adding expensive server resources, upgrading to a larger server or adding nodes to a cluster. Because this is a capital cost, it is usually planned far in advance, and more resources are purchased than are immediately needed. In the cloud, the incremental cost is limited to the resources needed at a point in time and is paid after use.

    The elastic environment of cloud computing translates into cost savings. The model is usage-based, so resources can be reduced when they aren't needed and increased when required. This same elasticity applies to development and test environments, which can be shut down when not needed. In a traditional environment, the hardware and software for these environments must be purchased up front.

    There is no need for hardware upgrades in the public cloud, which simplifies operations. Upgrades are normally driven by aging physical assets and the need for more capacity or performance. The complexity of managing upgrades disappears when the provider handles them as part of the service.

    If the service provider changes the hardware, the change is transparent to the virtual machine layer running on top of it. If IT needs more capacity or performance, resources are added on demand, bypassing the need for upgrades.

    The cost of operations is lower in the public cloud because the provider delivers operations as part of the service. There are no data center, systems, storage or network administration costs. Only the DBA or developer needs to be involved, to determine capacity and performance needs.

    The proper cost comparison between traditional and cloud models is not just hardware. It is the total picture of data center and operations costs associated with the environment.


    Faster Delivery

    Using the cloud for database infrastructure removes the procurement and provisioning barriers that slow projects. Imagine if, instead of waiting months for budget approval, order processing, shipment, setup and configuration, you could start development on projects immediately and deploy them into production with no operational delays.

    If it took less than an hour to add the capacity needed, what would you do differently? For a start, resources dedicated to provisioning could be moved to more productive work. Project delays caused by capacity and performance would no longer be a problem.

    On-demand self-service enables faster provisioning of resources for a data warehouse, whether for the initial install or for adding resources to gain capacity or performance. The benefit isn't limited to the production environment. Ongoing projects in a BI program can be delivered more quickly, since development and testing environments can be created and removed as needed.

    The pay-for-use model accelerates the entire procurement process. Because the cloud is treated as a service, it is paid for as an operational expense rather than a capital expense. This removes the need for a capital budget for infrastructure, simplifying the IT budgeting process.

    Unexpected BI requests and unplanned projects are common in most organizations, and they usually require additional hardware resources. The need for a capital acquisition can slow or stall a request while budget changes are allocated and approved. If costs can be expensed, the group making the request can pay for them without going through a slow budget approval process.

    Performance and Scalability

    Performance and scalability are the two biggest challenges faced by most data warehouse DBAs. Performance management is even more challenging for operational BI workloads with strict performance service level agreements (SLAs). It involves a lot of effort managing workloads, tuning, and possibly segregating some work to a separate environment.

    The approach to performance management in the cloud is different. The cloud allows hardware resources to fluctuate with demand. A dynamic model of computing resource delivery replaces the static hardware model of the traditional environment.

    In principle, there is no need to size a public cloud environment for a planned peak capacity to meet an SLA. Resources are effectively unlimited, so the DBA could specify an SLA and let the environment adjust resources automatically as the system runs. This would remove the need to throttle work that interferes with the performance of critical workloads. However, public cloud database offerings today lack this type of SLA management.
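    The kind of SLA-driven adjustment described above can be sketched as a simple sizing rule. This is a hypothetical policy written in Python for illustration only. `nodes_for_sla` and its parameters are invented names and do not correspond to a feature of any product. The rule assumes that query latency falls in proportion to the number of nodes (ideal linear scaling), which real BI workloads only approximate.

```python
import math


def nodes_for_sla(current_nodes: int, observed_latency_s: float,
                  sla_latency_s: float, min_nodes: int = 1,
                  max_nodes: int = 64) -> int:
    """Hypothetical rule: pick the node count expected to meet an SLA.

    Assumes ideal linear scaling, i.e. latency * nodes stays constant
    for a given query mix. min_nodes and max_nodes model the floor and
    the capacity ceiling of the environment.
    """
    if current_nodes < 1:
        raise ValueError("current_nodes must be at least 1")
    if observed_latency_s <= 0 or sla_latency_s <= 0:
        raise ValueError("latencies must be positive")
    # Total work per query, expressed in node-seconds.
    work_node_seconds = observed_latency_s * current_nodes
    needed = math.ceil(work_node_seconds / sla_latency_s)
    # Clamp to what the environment allows. The same rule also shrinks
    # the system when it is running faster than the SLA requires.
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    # 4 nodes answering in 30 s against a 10 s SLA: scale out.
    print(nodes_for_sla(4, 30.0, 10.0))
    # 8 nodes answering in 2 s against a 10 s SLA: scale in.
    print(nodes_for_sla(8, 2.0, 10.0))
```

    The two example calls return 12 and 2. The rule scales in as well as out, so it models the elastic behavior the text describes rather than only adding capacity. The `max_nodes` ceiling stands in for the finite capacity of any real environment.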

    An additional tool for the DBA is the ability to provide feedback to departments that consume more resources than expected. The consumers can then decide whether the work they are doing is worth the additional cost of dynamic resources.


    This changes the nature of the discussion about performance. It becomes a discussion about the cost of doing a set of work. The cost remains the same whether the work is done on one machine in a hundred hours or on a hundred machines in one hour. IT can discuss the value of the work with users, and they can make rational decisions about work, cost and timeliness.
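    The arithmetic behind that claim is simple if pricing is linear in node-hours and the work parallelizes perfectly. Both are assumptions, since real queries rarely scale perfectly. Here is a minimal Python sketch with a hypothetical rate:

```python
def node_hours(nodes: int, hours: float) -> float:
    """Resources consumed: number of nodes times wall-clock hours."""
    return nodes * hours


def job_cost(nodes: int, hours: float, rate_per_node_hour: float) -> float:
    """Cost under an assumed linear, per-node-hour price."""
    return node_hours(nodes, hours) * rate_per_node_hour


if __name__ == "__main__":
    rate = 0.5  # hypothetical price per node-hour
    # The same 100 node-hours of work, done slowly or quickly.
    slow = job_cost(nodes=1, hours=100, rate_per_node_hour=rate)
    fast = job_cost(nodes=100, hours=1, rate_per_node_hour=rate)
    print(slow, fast)  # equal cost; only the elapsed time differs
```

    Under these assumptions, choosing between the two runs is a question of timeliness rather than cost. If scaling is imperfect, the parallel run consumes more node-hours and costs more.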

    Agility

    The combination of increased speed, lower cost and the elastic nature of the cloud translates into agility for the data warehouse. Faster turnaround of BI projects means a more responsive BI group that is better able to address unplanned requests.

    A BI organization is measured by its ability to handle the normal workload and meet unexpected demands. Every organization has unplanned projects, and many smaller projects have a time limit after which they are no longer valuable.

    Today, one-off projects and those with an unclear benefit are hard to justify because of the capital budgeting process and the time and effort needed to provide resources. In a cloud environment they can be built at lower cost. If they don't deliver the expected benefit, they can be shut down quickly with no sunk cost in hardware or software licenses.

    Fast provisioning for development and new production workloads, combined with the lower cost of doing so, allows the BI group to complete more of these projects. It also allows the BI group to give some control over resources and priorities to others in the organization, supporting more agile BI development practices.


    Database Workloads and Their Fit with Cloud Infrastructure

    Using the public cloud is not a simple matter of migrating applications. Some applications benefit more than others, and some are more appropriate for private than for public cloud deployment.

    The most important factor in deciding the suitability of the public cloud is the workload. Workloads have different characteristics that make them more or less suited to today's typical cloud environment. There are three primary system workloads in IT.

    OLTP. Transaction processing is a mixed read-write workload that ranges from lightly to very write-intensive. OLTP requires low-latency response, accesses small amounts of data at one time and has predictable access patterns with few, if any, complex joins between tables.

    Business intelligence. BI workloads are read-intensive, with writes usually done during off-hours or in ways that don't compete with queries. Quick response times are desired, but they are not typically in the sub-second range that OLTP requires. Data access patterns are less predictable, queries read large amounts of data, and they can involve many complex joins.

    Analytics. Analytic workloads are both compute-intensive and read-intensive. They are similar in many ways to BI, except that their access patterns are more predictable. They generally access entire datasets at one time, sometimes with complex joins before the computations. Most analytic workloads run in batch mode, with the output used downstream by BI or other applications.

    The success of large-scale, extremely high-concurrency workloads at Web and online startups demonstrates how well the cloud can support transaction processing. These workloads are simpler to run at scale because an individual transaction has a small data volume and low complexity and is easy to isolate, which simplifies the back-end database. It is easier to gain scalability by spreading the back-end work across virtual servers in the cloud.

    Shared-nothing Databases are Required to Support BI in the Cloud

    Business intelligence and analytic database workloads sit at an intersection of requirements that makes them harder to run in the public cloud. BI queries normally retrieve some, but not all, of the data. This selectivity poses challenges for brute-force cloud processing models.

    The query needs of BI can't be met by the cloud databases available today because they are either single-node or non-relational. Single-node databases can't scale in a cloud environment.

    A public cloud is more like a collection of equally sized small nodes that are used as building blocks. Public clouds offer very limited room to increase the resources of a single node compared with on-premises hardware. If a database can't grow past the boundary of a single cloud node, the system has an inherent performance limit.

    The non-relational or "NoSQL" databases are designed more like object stores, with limited or no SQL support or ability to join tables. Typical BI queries require joins across many tables, and they process significantly more data than a single transaction does. These databases are designed mostly to address OLTP problems, much like the standard relational databases in use today.


    Another complication is the mismatch between BI tools and cloud databases. Most cloud databases are non-relational, making them incompatible with the SQL-based query, reporting and dashboard tools in use today.

    Architects face a technical problem when moving to a cloud environment. Relational databases are the primary choice to support BI workloads, but conventional relational databases are a poor match for public cloud infrastructure. These databases are designed to run on a single system or in a shared-disk cluster, so the way to increase their performance for larger data volumes is to make the servers larger.

    The optimal database architecture for cloud deployment is a shared-nothing database. The massively parallel processing (MPP) database model matches the architecture of a cloud environment. Shared-nothing relational databases, unlike their traditional counterparts, are designed to function in a distributed hardware environment such as the cloud.
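    To make the shared-nothing idea concrete, the Python sketch below hash-distributes rows across a set of simulated nodes. Each node aggregates only its own partition, and a coordinator then merges the partial results. This is a toy model under stated assumptions, not a description of how any particular MPP database works. The "nodes" are just Python lists, and `node_for_key` uses a SHA-256 hash of the distribution key as a stand-in for a database's own hash function.

```python
import hashlib
from collections import defaultdict


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a distribution key to a node with a stable hash.

    Python's built-in hash() is salted per process for strings, so a
    digest is used here to keep placement reproducible.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def distribute(rows, num_nodes):
    """Place each (key, amount) row on exactly one node."""
    partitions = [[] for _ in range(num_nodes)]
    for key, amount in rows:
        partitions[node_for_key(key, num_nodes)].append((key, amount))
    return partitions


def local_sum_by_key(partition):
    """Work done by one node using only the data it owns."""
    totals = defaultdict(float)
    for key, amount in partition:
        totals[key] += amount
    return dict(totals)


def parallel_sum_by_key(rows, num_nodes):
    """SELECT key, SUM(amount) ... GROUP BY key, shared-nothing style."""
    partials = [local_sum_by_key(p) for p in distribute(rows, num_nodes)]
    # Rows were distributed on the grouping key, so every key lives on
    # exactly one node and the coordinator can merge without re-summing.
    result = {}
    for partial in partials:
        result.update(partial)
    return result


if __name__ == "__main__":
    sales = [("east", 10.0), ("west", 5.0), ("east", 2.5), ("north", 1.0)]
    print(parallel_sum_by_key(sales, num_nodes=3))
```

    No node reads another node's data, so capacity grows by adding nodes rather than by enlarging one server. That is the property that lets this architecture map onto a pool of small cloud nodes. The merge step is this simple only because the rows were distributed on the grouping key. Grouping on a different column would require a redistribution or a second aggregation step.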

    Shared-nothing Databases: Necessary but Not Sufficient

    Cloud computing is more than a hardware platform. The concept includes the ability to provision easily, to adjust resources dynamically as needed, and to pay for use rather than paying up front for hardware and software.

    A data warehouse platform that is truly a cloud service should include all these capabilities.

    Provisioning public cloud resources may be simple and inexpensive, but extending a database across more nodes usually is not. Adding resources to a database in the public cloud can require extensive work by administrators to redistribute data. Self-service data and resource provisioning is required to speed up projects.
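    A naive placement rule shows why adding nodes can mean moving data. The Python sketch assumes rows are placed with `hash(key) mod N`, a simplifying assumption, since real MPP databases use more elaborate schemes. It measures how many rows land on a different node when a cluster grows from N to N + 1 nodes. The customer keys are synthetic.

```python
import hashlib


def node_for_key(key: str, num_nodes: int) -> int:
    """Naive placement: stable hash of the key modulo the node count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def fraction_moved(keys, old_nodes: int, new_nodes: int) -> float:
    """Share of rows whose home node changes after a resize."""
    if not keys:
        return 0.0
    moved = sum(
        1 for k in keys
        if node_for_key(k, old_nodes) != node_for_key(k, new_nodes)
    )
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"customer-{i}" for i in range(10_000)]
    for n in (4, 8, 16):
        share = fraction_moved(keys, n, n + 1)
        print(f"{n} -> {n + 1} nodes: {share:.0%} of rows move")
```

    With modulo placement, roughly N/(N + 1) of the rows move on each resize: about 80% when going from 4 to 5 nodes. Yet the new node ends up holding only about 1/(N + 1) of the data. Placement schemes such as consistent hashing reduce the movement to roughly that 1/(N + 1) share. Some redistribution remains unavoidable, though, which is why self-service growth depends on the database automating it.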

    Database licensing presents a second obstacle to provisioning. Most MPP database vendors today have some form of scalable pricing based on data volume or nodes. The concept of elasticity challenges the way vendors sell their products. An elastic model implies that resources can grow and shrink, but current vendor licensing assumes that resources and costs can only grow.
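    A few months of node counts show the gap between grow-only licensing and elastic use. In the Python sketch below, the monthly profile, the price and the "ratchet" license rule are all hypothetical simplifications, not any vendor's actual terms. The ratchet rule bills at the highest node count reached so far.

```python
from itertools import accumulate

# Hypothetical nodes in use each month: growth, then a seasonal decline.
MONTHLY_NODES = [4, 6, 10, 8, 5, 4]


def grow_only_license(monthly_nodes, price_per_node_month):
    """Licensed capacity ratchets up to the high-water mark, never down."""
    licensed = list(accumulate(monthly_nodes, max))
    return sum(licensed) * price_per_node_month


def usage_based(monthly_nodes, price_per_node_month):
    """Metered model: billed on the nodes actually used each month."""
    return sum(monthly_nodes) * price_per_node_month


if __name__ == "__main__":
    price = 1.0  # hypothetical price per node-month
    print("grow-only:", grow_only_license(MONTHLY_NODES, price))
    print("usage-based:", usage_based(MONTHLY_NODES, price))
```

    Here the ratcheted license bills 50 node-months, while only 37 are actually used. Once the system has grown to 10 nodes, shrinking it back brings no savings under this rule.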

    Most database vendors do not allow customers to pay based on actual use. These vendors lack the concept of an abstract service whose usage can be metered. Without it, they have no way to measure use of the database and charge for it. They are trying to sell products in a service delivery world. By contrast, Teradata Active Data Warehouse Private Cloud provides flexible pricing options that can grow and shrink based on resource usage.


    Public Versus Private Clouds for BI and Analytic Databases

    The public cloud solves many problems, but it introduces new challenges as well. Platform requirements vary with workload characteristics, data sensitivity and business requirements. Understanding the challenges is important in determining which cloud deployment model, public or private, can deliver the greatest benefit.

    Challenges with the Public Cloud Today

    Performance for BI Workloads. Public cloud infrastructure is dramatically different from the conventional hardware environment used for data warehousing. The cloud is built with uniform commodity components rather than high-end servers. This means there are no high-speed interconnects between nodes and no high-speed I/O subsystems, and direct-attached high-speed disks are rare.

    These differences can slow performance for BI workloads. Data warehouse databases are usually configured to work with hardware designed for heavy I/O workloads, something not normally considered in public clouds. Even if these databases are able to run in the cloud, they probably will not perform well.

    Public clouds are multi-tenant. A node in the public cloud is most likely not a dedicated server but one of several virtual machines running on a single server. The memory, CPU and storage are dedicated, but the virtual machines generally share the same internal bus, network hardware and I/O channels.

    Most databases expect hardware to be dedicated rather than shared. This can lead to hidden conflicts when an unknown virtual machine saturates the underlying server's shared resources, causing one node of the database to run slower.
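    A simple model shows the effect of one contended node on a parallel query. Every node scans its own share of the data, and the query finishes only when the last node reports back. All numbers in this Python sketch are hypothetical: the row counts, the scan rate, and the 2.5x slowdown for the node that shares hardware with a busy neighbor.

```python
def node_times(rows_per_node, rows_per_second, slowdown=None):
    """Seconds each node needs to scan its partition.

    slowdown maps a node index to a factor > 1 for nodes whose shared
    bus, network or I/O channels are being saturated by another tenant.
    """
    slowdown = slowdown or {}
    return [
        rows / rows_per_second * slowdown.get(i, 1.0)
        for i, rows in enumerate(rows_per_node)
    ]


def query_elapsed(per_node_seconds):
    """Nodes run in parallel; the coordinator waits for the slowest one."""
    return max(per_node_seconds)


if __name__ == "__main__":
    rows = [1_000_000] * 8  # evenly distributed data on 8 nodes
    healthy = node_times(rows, rows_per_second=100_000)
    noisy = node_times(rows, rows_per_second=100_000, slowdown={5: 2.5})
    print("all nodes dedicated:", query_elapsed(healthy), "s")
    print("one contended node:", query_elapsed(noisy), "s")
```

    In this model, a slowdown on one node out of eight stretches the whole query from 10 to 25 seconds, even though the other seven nodes sit idle after 10 seconds. Contention that would be a minor nuisance for a single-server application is therefore a serious problem for a shared-nothing database.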

    Private clouds can deliver a better-performing option than public clouds. The hardware can be configured for heavy database workloads, either by IT or by a vendor. In an appliance model, the vendor has already specified the proper hardware configuration.

    Legal and Regulatory Challenges with Data. The use of a private cloud is often driven by the need to maintain stricter control over data. Multiple organizations' data can be mingled in the same database, creating a liability if the public cloud provider accidentally exposes that data or the systems that access it.

    Privacy laws in many countries regulate where data can be stored or moved. In a public cloud, it's not normally apparent where the data is stored. This makes it difficult, if not impossible, for an organization to dictate that its data remain within a specific country's borders. Use of a public cloud will be bound by these regulations.

    Using a private cloud allows an organization to control the physical location of the infrastructure and who has network access to the systems. This provides most of the public cloud computing benefits in situations where regulatory controls dictate that data reside only in certain locales or data centers, or be stored in non-shared databases.


    Data Management and Integration. The public cloud adds complexity to data management. Most data integration and data management tools in use today are not designed for compatibility with cloud infrastructure and communication standards. They also share the same licensing incompatibilities as databases when deployed in a cloud environment.

    The location of data that is loaded into the database is another consideration. When the data to be loaded is largely external, data movement is less of a challenge. When the data originates inside a data center and has to be sent out over typical bandwidth-constrained network connections, data movement can be an obstacle.

    In the case of ETL-style workloads, where data can move back and forth many times, performance can suffer. Costs can also increase, as some cloud providers add charges for data movement into or out of the cloud.

    Security Requirements. The public-access nature of the cloud and the use of multi-tenant cloud databases are a problem for some IT organizations. Most databases assume they are running inside a private data center, not exposed as an endpoint on the Internet. They haven't been designed for open public exposure.

    This creates additional compliance and security costs because there are more components to monitor and more exceptions to standard practices that must be managed. The software and personnel costs can outweigh the benefits of running the database externally. According to one survey, 75 percent of financial services companies said that concerns about data security and privacy were the biggest obstacle to using a public cloud.3

    Benefits of Private Cloud over the Traditional Server Model

    The ability to run BI workloads in a public cloud is limited today. Until public cloud infrastructure can be configured to take into account the specialized needs of BI and analytic query workloads, performance at scale will be challenging. The regulatory, security and privacy concerns will prevent some organizations from using the public cloud.

    For this reason, many organizations are using private clouds while the public cloud technologies and practices mature and standardize. The private cloud offers the control that is needed over the environment to meet these regulatory, security and data management concerns while still delivering the performance, cost and scalability benefits.

    IT departments are concerned about resource utilization in the data center. A data warehouse or mart has highly variable workloads that can leave an expensive server idle for long periods. With multiple marts, the inefficiency is even more pronounced as multiple servers are underused. A private cloud offers more efficient use of these resources.

    With a private cloud, the data warehouse environment can be sized to the baseline workload in order to maintain much higher server utilization. When the workload increases, the elastic facilities allow the environment to maintain a constant high utilization while scaling up, and then shrink back to the baseline as the workload subsides.
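As a rough illustration of why sizing to the baseline raises utilization, the sketch below compares average utilization for an environment sized to its peak with one sized to a baseline that adds elastic capacity only when demand exceeds it. It is a minimal sketch under stated assumptions: the hourly demand profile is hypothetical, capacity is measured in abstract units, and elastic capacity is assumed to track demand exactly, which real platforms only approximate.

```python
"""Compare peak-sized fixed capacity with baseline sizing plus elastic bursts.

All demand figures are hypothetical; capacity is in abstract "units".
"""


def utilization_fixed(demand: list[float], capacity: float) -> float:
    """Average utilization when the same fixed capacity is held every hour."""
    if capacity < max(demand):
        raise ValueError("fixed capacity must cover the peak demand")
    return sum(demand) / (capacity * len(demand))


def utilization_elastic(demand: list[float], baseline: float) -> float:
    """Average utilization when capacity is the baseline, plus just enough
    elastic capacity to cover any hour whose demand exceeds the baseline."""
    provisioned = [max(baseline, d) for d in demand]
    return sum(demand) / sum(provisioned)


# Hypothetical hourly demand for one day: quiet overnight, a midday spike.
hourly_demand = [20.0] * 8 + [60.0] * 4 + [100.0] * 2 + [60.0] * 4 + [20.0] * 6

print(f"peak-sized: {utilization_fixed(hourly_demand, 100.0):.0%}")  # peak-sized: 40%
print(f"elastic: {utilization_elastic(hourly_demand, 40.0):.0%}")     # elastic: 77%
```

Lowering the hypothetical baseline toward the overnight minimum pushes the elastic figure higher still, at the cost of relying more heavily on burst capacity.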

    3 IBM global CIO survey, 2010.


    A private cloud provides the ability to consolidate multiple data marts onto a single platform and maintain utilization rates of more than 90% for all of the environments, taking advantage of the elastic capability to keep performance consistent. This provides significant cost savings from efficient use, as well as reducing license and maintenance costs by running a leaner data warehouse environment.

    Figure 3 compares some of the key attributes of the traditional server-based and private cloud models. A private cloud can deliver many of the public cloud benefits for a data warehouse while avoiding some of the limitations of the public cloud.

    | Attribute | Traditional server model | Private cloud |
    |---|---|---|
    | Initial purchase time | weeks to months | weeks to months |
    | Initial install time | days to weeks after receiving hardware | days to weeks after receiving hardware |
    | Time for incremental purchase | weeks to months (including CapEx approvals) | minutes to hours* |
    | Startup costs | High: servers, storage, network, software, resources to configure the environment | Moderate: servers, storage, network, software, resources to configure the environment** |
    | Cost model | CapEx, typically TCO or ROI justification; depreciation over extended period | Mix of CapEx and OpEx, monthly fee based on use*** |
    | Incremental scale costs | High | Low**** |
    | Performance for BI/DW workloads | High | High |
    | Control over data location and access | High | High |

    Figure 3: Comparison of key attributes in the deployment models.

    There are some aspects of private cloud where the answer depends on the context:

    * Incremental capacity is added at cloud speed unless a hard physical boundary is reached and more physical hardware is needed; then there is a larger additional cost for most cloud capacity.

    ** Startup costs vary. A white-box hardware environment requiring full purchase and configuration is similar to the traditional model, and therefore high. A vendor appliance format for a private cloud offering is lower.

    *** CapEx for the initial environment; OpEx for the ongoing scale and elastic properties.

    **** Scaling is low cost until a hardware boundary is reached and another capacity purchase is required.


    Cloud Adoption Preferences

    Due to the limitations of the public cloud, adoption for data warehouses is still low. Private cloud use for some workloads is growing faster than the same use in the public cloud, and varies by industry.

    Financial services is an example of an industry with many different workloads on large volumes of data. Figure 4 shows the preference of decision makers in financial services for deployment of the three core workloads.4

    Figure 4: Stated deployment preferences for different workloads in the financial services industry.

    The financial services industry tends to favor options that allow for more control over data because of the many security, privacy and legal requirements it faces. The industry profile is more likely to match mainstream IT behavior than the many examples of cloud use reported by Web startups and online businesses. The innovators and early adopters in the technology industry don't share the technical and organizational barriers that many IT organizations face.

    4 Source of data: IBM global survey of IT and line-of-business decision makers.

    [Figure 4 chart. Workloads: data mining, text mining, or other analytics; data warehouses or data marts; transactional databases. Series: prefer not to use cloud; private cloud preference; public cloud preference.]


    Private Clouds and the Data Warehouse in Action

    This section describes the experiences of several organizations moving from traditional server-based data warehousing to an Analytic Private Cloud that delivers scalability, stable performance and self-service provisioning, and permits more flexible deployment and payment options.

    Creating a Consolidated Data Platform

    Data warehousing was historically built around the idea of a single reporting system rather than the concept of delivering a platform for multiple uses. The primary goal of data warehousing combines centralized management of important data with the capability to build systems on top of the data warehouse that use the data.

    Our industry has not paid as much attention to the latter. Organizations need more than a passive repository. They need a platform that enables multiple, different uses of data. A platform addresses data infrastructure that will support different workloads, data latencies and performance requirements.

    One IT executive describes the difference between the system-oriented view of the data warehouse and the view of the data warehouse as a platform as being like the difference between using a server and using the cloud. A platform allows multiple, sometimes incompatible uses. "The idea of a cloud gives us something we didn't have with the traditional model. The cloud allows us to build an enterprise data warehouse but without the constraint of enforcing a single universal data model to solve all needs," he said. This allows them to have an enterprise data warehouse, but also to attach data marts to it within the same platform.

    The private cloud provides an alternative to building many marts scattered across the organization. There is still a central data model, but also the ability to manage other arrangements of the data, or data that is not included in the core data model. "The [Teradata] Data Lab lets us support separate environments for different groups that are integrated and managed within a single platform," he said.

    It may appear to be easier to give different groups their own data marts focused on their specific needs because the incremental cost of expanding a difficult-to-scale single database is too high. As one BI director says, "If you have a central data warehouse with performance problems, then adding to it is unlikely, and instead you add another mart."

    The challenge with this divided approach is not the cost of managing single systems, but the total cost of the BI environment. While individual marts address specific needs, they introduce redundancy and complexity. They make integrating data harder because it is stored in silos, introducing data reconciliation problems.

    Most of those who consolidated multiple data marts onto a Teradata platform were less concerned about gaining better performance than they were about reducing the cost and complexity of their BI environment. "We want a central clearing house for data without the messy moving around of data to lots of databases we have today," said one IT manager of his goal. "Building a [single] information repository is where the [Teradata Analytic Private] cloud is especially valuable."


    Organizations need better ways to manage and deliver data. The challenge is that the more you make data a centralized resource in a traditional server model, the more difficult it is to support, and the higher costs go. Creating multiple data silos as a way around these problems adds more complexity and cost. Private clouds offer a better way to deliver this environment.

    Costs, Budgeting and Planning

    Cost savings are still the biggest driver for people looking at cloud computing alternatives. This is a reaction to the rising costs from data growth and the increasing cost of performance as the data grows.

    The incremental growth of infrastructure is reduced during budget-tightening periods. "Capital budgets are very competitive," said one BI manager. "When times are tougher, we get asked whether we've done everything we can to squeeze out more performance before they'll approve any new money. If we're at the absolute limit, then you can add resources."

    Worries about the cost of a desired level of performance and the incremental cost of scale are symptomatic of a way of thinking: the data warehouse as a product rather than as a service. The metered payment model of the cloud lessens these concerns for IT. As one BI manager described it, "You buy a thing with limited capacity and you're constrained by it, instead of buying as much or as little capacity as you need at one time."

    The new model for planning data warehouse infrastructure resembles the way we use electricity. A product-centric model means diligent capacity planning and up-front payments, while a service model means paying for what you need when you need it.

    The cloud pay-for-use model allows companies to shift the mix of money from capital expenses (CapEx) to operating expenses (OpEx). Capital budgets require longer-term planning and are done annually at most firms.

    This is at odds with the dynamic nature of information use, where capacity and performance needs fluctuate throughout the day, month to month and over annual cycles. "For us it's about not having to commit large amounts of capital up-front for data projects that might not last," said one BI executive.

    "Getting CapEx is really tough. Avoiding that is the number one benefit for us," reported another BI executive. "Every time we need to upgrade our systems [during the year] we needed to purchase hardware. This took months every time. Now we don't have to go to the executive committee. We pay for what we consume."

    The benefit isn't solely consumption-based pricing. Consumers can provision on demand in small increments rather than the large increments of a server-based environment. This allows resources to be committed a little at a time.

    With the cloud, responsibility for much of the BI platform budget is pushed to the business departments as part of their operating expenses. This has the added benefit of faster management approval for BI projects. There are no high up-front costs, no capital budget approval process, and each department funds only what it uses.

    A budgeting challenge pointed out by one BI manager was the way costs are allocated for projects shared by several departments. "If the priorities for one department change and they drop out, everything changes. We can't buy what we planned, and we have to redo our justification and ROI model. This takes weeks." The flexible pricing models of Teradata's Analytic Private Cloud allow purchases like this to be easily readjusted.

    A benefit of the on-demand and pay-for-use model is transparency for the data warehouse budget and better visibility into use. Most organizations allocate the total CapEx cost across departments. In this model, it's common for some departments to complain that they are being unfairly hit with more than their fair share of costs.

    Delivering pay-for-use requires better monitoring in the cloud environment, and better monitoring can prove or disprove departments' claims of unfairness.

    Metered billing aligns payment with consumption. "One department was paying 50% of the data warehouse budget under the old allocation model," reported one BI manager. "The problem is they used about 20% of the resources while another department paid 20% and used 40%. Both groups were surprised when we showed them the real numbers."
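The comparison the BI manager describes can be sketched as a reconciliation of allocated charges against metered use. Departments A and B use the percentages quoted above; department C and the budget figure are hypothetical, added only so that both sets of shares sum to 100%. Percentages are kept as integers so the arithmetic is exact.

```python
"""Reconcile a fixed cost allocation against metered (usage-based) shares.

Departments A and B follow the quoted percentages; department C and the
budget are hypothetical.
"""

budget = 1_000_000  # hypothetical annual data warehouse cost

# Percent of the budget each department is charged under the old allocation.
allocated_pct = {"A": 50, "B": 20, "C": 30}

# Percent of measured resource consumption, e.g. from platform usage monitoring.
measured_pct = {"A": 20, "B": 40, "C": 40}


def overcharge(dept: str) -> int:
    """Amount the old allocation charged above metered use (negative = undercharged)."""
    return budget * (allocated_pct[dept] - measured_pct[dept]) // 100


for dept in allocated_pct:
    charged = budget * allocated_pct[dept] // 100
    metered = budget * measured_pct[dept] // 100
    print(f"{dept}: charged {charged:,}, metered {metered:,}, difference {overcharge(dept):+,}")
# A: charged 500,000, metered 200,000, difference +300,000
# B: charged 200,000, metered 400,000, difference -200,000
# C: charged 300,000, metered 400,000, difference -100,000
```

Because both sets of shares cover the whole budget, the differences always sum to zero: metering redistributes cost between departments rather than changing the total.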

    Managing Performance

    As one IT executive stated, "Performance management is everyone's biggest issue here." The challenge with performance is that data warehouse workloads are characterized by unpredictable demand. "Unexpectedly spiking workloads are the major problem," he said.

    The only solution in a server-based model is to size the environment for the expected peak workload and adjust the resources annually or, if lucky, a few times per year. On-demand and automatic resource provisioning and the elastic nature of the Teradata Analytic Private Cloud can help with unplanned demand.

    BI applications, operational BI and management dashboards often have performance SLAs. The goal of the SLAs is to keep the user experience consistent. It's simpler to maintain consistency in a cloud environment by allowing resources to fluctuate in order to hold performance at a specified level.

    This is even more important when data marts are consolidated onto the same platform. The marts can't have a negative impact on performance for existing users, and a mart shouldn't perform worse than it did when it was standalone. Elastic on-demand resources can be used to support this consolidation of workloads.

    "Scale and performance in minutes, not months" is how a DBA manager describes it. There is a choice with elasticity. One can keep costs steady with a performance ceiling, or keep performance steady with variable costs. The only option in non-cloud environments is the former.

    "We see consumption jump immediately after an upgrade because of the backlog of things that could impact performance for other more important applications," he said. "By the end of the year, the warehouse is hitting a performance ceiling, and we have to delay or stop new requests. That problem goes away with the Teradata [Active Data Warehouse] cloud."

    New BI workloads are a source of growing performance demands. Exploratory analysis and data discovery can require large volumes of data. Unlike reporting, they need to deliver the data interactively in real time. Unlike dashboards, their data access is highly unpredictable.


    This holds true for analytics projects as well. Many analytics projects require high consumption of resources for periods of a few days. These projects use enough resources to slow everyone else down, or they are moved lower in workload priority so they don't affect other users of the warehouse. Either way, someone is hurt by the performance problem.

    In a private cloud, the elastic resources and self-service provisioning support these new uses. Performance can be maintained at a steady level by allowing resources to fluctuate based on the current workloads in the warehouse.

    Agility

    There are several aspects to agility. Speed of deployment is a large factor in BI projects because of the long lead times associated with hardware procurement and setup. The on-demand capability of a cloud allows a team to quickly develop, test and deploy projects that would require coordination across several operations groups in a traditional environment, taking weeks or months longer.

    "On-demand resources mean we can deliver projects as needed," said an IT manager using Teradata Data Labs. "We can set up the environment for a project overnight. If the project wasn't valuable enough to the sponsors, we can end it and the resources are just given up with no penalty or cost."

    In a central data warehouse, the higher-ROI projects get done first. A side effect is the creation of a set of projects that get perpetually pushed down in priority because they don't offer enough of the right kind of benefits to justify working on them. Speed to deploy and to expand resources in a Teradata Active Data Warehouse Private Cloud removes bottlenecks that halt smaller projects.

    Many business projects are not planned the year or more in advance that the data warehouse team needs for planning hardware and software purchases. Departments build these unplanned projects themselves when the BI team can't deliver them. "This is what really happens to unplanned projects here. If they are important enough, they get done independently outside the data warehouse as independent data marts," reported a DBA.

    These data silos recreate the problems of irreconcilable metrics, integration complexity and cost that a data warehouse is supposed to remove. On-demand provisioning and an elastic platform allow IT to deliver these projects.

    Provisioning delays don't exist in an on-demand cloud, allowing the BI team to deliver projects when needed. With a scalable on-demand platform like the Teradata Active Data Warehouse Private Cloud, it's also possible for departments to do self-funded projects but build them on the warehouse.

    "We had massive growth in consumption after moving to a more flexible platform because we were always hemmed in by capital investment in the past," said one IT executive. "Now we no longer have that artificial limit on what to do with limited resources."

    On a Teradata Active Data Warehouse Private Cloud, the BI group can provide a platform for data marts or one-off projects that is identical to the data warehouse, simplifying the overall environment.

    The result is faster project delivery, which means the ability to complete more projects and the ability to do smaller projects that were often not valuable enough to justify the months of effort to develop and deploy. Many of these small projects are also temporary in nature, or the benefit is uncertain until they've been done as a proof of concept.

    These examples of organizations deploying various flavors of self-service provisioning, on-demand capacity and elastic models demonstrate key benefits. They show how the data

    warehouse might evolve into a data platform capable of serving a broader set of needs while

    simplifying the environment.


    Conclusion and Recommendations

    Our current assumptions about the data warehouse environment are changing. With the existing product view, hardware and software platforms can be more expensive, take a long time to purchase, take a long time to provision once purchased, are available in large increments, require up-front payment and are assets that are depreciated over a long term.

    The computing-as-a-service view inverts these assumptions. The cloud view is that the platform is an inexpensive commodity service, purchased and provisioned immediately, paid for in small increments in arrears, and treated as an expense rather than a depreciable asset.

    The public cloud is still evolving both technically and procedurally. It's maturing as a platform for applications and computational processing. As a platform for BI and analytic database workloads, the public cloud is still immature.

    Private clouds for data warehouse workloads deliver many of the benefits of public clouds without some of the problems. Successfully adopting cloud computing for BI and analytic database workloads requires a shift in technology and methodology. Discussions with organizations using elements of a private cloud model highlight several key findings.

    A private cloud is the better option today for gaining the scalability, elasticity, and performance management capabilities cloud computing provides. The private cloud also offers data warehouse teams a more controlled environment than the public cloud for learning how to change development and management practices while the public cloud matures from today's early state.

    An environment without pay-for-use software licensing doesn't offer the cost and deployment benefits that a private cloud should offer, making licensing an important element to evaluate.

    Paying for use is a major change in the organization. A key recommendation is to continue with the existing budget allocations while running the new payment model in parallel. This allows everyone to understand the OpEx budget impact, how processes must change, and how much business units actually use the data warehouse.

    Good usage monitoring by the platform is key. While important for billing, ongoing monitoring provides an early warning of unexpected spikes in use. This helps with managing SLAs as well as with avoiding end-of-month billing surprises for departments.

    Capacity planning and performance management change, becoming simpler, but don't go away. Monitoring provides the baseline for planning both expense budgets and resource needs.
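A minimal sketch of how usage monitoring might provide that early warning, assuming the platform exposes daily metered usage per department: flag days whose usage jumps well above the trailing average, and project the month-end charge from month-to-date consumption. The usage figures, the 7-day window, the 1.5x threshold and the unit price are all hypothetical, and the straight-line projection is deliberately naive.

```python
"""Early warning from usage monitoring: flag spikes and project the month-end bill.

Daily usage, window, threshold and unit price are hypothetical; a real
platform would supply metered usage from its own monitoring.
"""

from statistics import mean


def spike_days(daily_usage: list[float], window: int = 7, factor: float = 1.5) -> list[int]:
    """Return indexes of days whose usage exceeds `factor` times the mean of
    the preceding `window` days. Days without a full trailing window are skipped."""
    flagged = []
    for i in range(window, len(daily_usage)):
        baseline = mean(daily_usage[i - window:i])
        if daily_usage[i] > factor * baseline:
            flagged.append(i)
    return flagged


def projected_month_bill(daily_usage: list[float], days_in_month: int, price_per_unit: float) -> float:
    """Straight-line projection: month-to-date average daily usage times days in month."""
    return mean(daily_usage) * days_in_month * price_per_unit


# Hypothetical month-to-date usage (compute units per day) for one department.
usage = [100, 105, 98, 102, 100, 97, 99, 101, 240, 230, 104]

print(spike_days(usage))                            # [8, 9]
print(round(projected_month_bill(usage, 30, 2.0)))  # 7505
```

Day 9 is still flagged even though day 8's spike is already inside its trailing window; a production alert would likely need smoothing or suppression rules tuned to the department's real workload.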

    There are enough differences between the private cloud and the traditional model that transition planning is vital to success. There are no cookbooks for the best way to implement a private cloud for data warehousing. One must learn how people will use the new environment, and then adjust development, deployment and administration practices to follow.

    We are still in an early stage of the market. The benefits have been demonstrated by early

    adopters. As the cloud market matures, data warehousing will be an important beneficiary.


    About the Author

    Mark Madsen is a research analyst focused on analytics, business intelligence and information management. Mark is an award-winning architect and former CTO whose work has been featured in numerous industry publications. He is an international speaker and author. For more information, or to contact Mark, visit http://ThirdNature.net.

    Third Nature is a research and consulting firm focused on new practices and emerging technology for business intelligence, analytics and information management. The goal of the company is to help organizations learn how to take advantage of new information-driven management practices and applications. We offer consulting, education and research services to support business and IT organizations and technology vendors.

    Teradata is the world's largest company focused on analytic data solutions through integrated data warehousing, big data analytics, and business applications. Only Teradata gives organizations the advantage to transform data across the organization into actionable insights empowering leaders to think boldly and act decisively for the best decisions possible. Visit teradata.com.

    Third Nature Inc., 2012. All rights reserved.

    Teradata and the Teradata logo are registered trademarks of Teradata Corporation and/or its affiliates in the U.S.

    and worldwide. No part of this document may be reproduced in any form or incorporated into any information

    retrieval system, electronic or mechanical, without the permission of the copyright owner. Inquiries regarding

    permission or use of material contained in this document should be addressed to:

    Third Nature, Inc.

    PO Box 1166

    Rogue River, OR 97537

    EB-6592 > 0412
