CLOUD COMPUTING: BIG DATA IS THE FUTURE OF IT

Winter 2009 | Ping Li

Cloud computing has been generating considerable hype these days. Every participant in the datacenter and IT ecosystem, from hardware vendors and ISVs to SaaS providers and Web 2.0 companies, has been rolling out cloud initiatives and strategies; start-ups and incumbents are equally active.

Cloud computing promises to transform IT infrastructure and deliver scalability, flexibility, and efficiency, as well as new services and applications that were previously unthinkable.

Despite all of this activity, cloud computing remains as amorphous today as its name suggests. However, one critical trend shines through the cloud: Big Data. Indeed, it is the core driver in cloud computing and will define the future of IT.

BIG DATA: THE PERFECT STORM

Cloud computing has been driven fundamentally by the need to process an exploding quantity of data. Data is no longer measured in gigabytes but in exabytes as we are "Approaching the Zettabyte Era."1 Moreover, data types (structured, semi-structured, or unstructured) continue to proliferate at an alarming rate as more information is digitized, from family pictures to historical documents to genome mapping to financial transactions to utility metering. The list is truly unbounded. But today, data is not only being generated by users and applications. It is increasingly machine-generated, and such data, growing exponentially, is leading the charge in the Big Data world. In a recent article, The Economist called this phenomenon the "Data Deluge" (http://www.economist.com/opinion/displaystory.cfm?story_id=15579717).

One can argue that Web 2.0 companies have been pushing the upper bounds of large-scale data processing more than anyone. That being said, this data explosion is not sparing any vertical industry (financial, health care, biotech, advertising, energy, telecom, etc.). All are grappling with this perfect storm. Below are just a few stats:

- Two years ago, Google was processing more than 400PB of data per month in just one application
- The New York Times is processing an 11-million-story archive dating back to 1851
- eBay processes more than 50TB/day in its data warehouse
- CERN is processing 2GB/second for its most recent particle accelerator
- Facebook crunches 15TB/day into a 2.5PB data warehouse

Without question, data represents the competitive advantage of any enterprise, and every organization is now encumbered with the task of storing, managing, analyzing, and extracting value from this exponential data growth as inexpensively as possible.

Previous computing platform transitions had technology dislocations similar to cloud computing, but along different dimensions. The shift from mainframe to client-server was fueled by disruptive innovation in computing horsepower that enabled distributed microprocessing environments. The subsequent shift to web applications/web services during the last decade was enabled by the open networking of applications and services through the internet buildout. While cloud computing will leverage these prior waves of technology (computing and networking), it will also embrace deep innovations in storage/data management to tackle big data.

Along these lines, many of the early uses of cloud computing have focused less on computing and more on storage. For example, a significant portion of the initial applications on AWS primarily leveraged just S3, with the applications themselves executing behind the firewall. Popular storage applications, like Jungle Disk and SmugMug, were early AWS customers. This explosion of data has driven enterprises (and consumers, for that matter) to find cheap, on-demand storage in unlimited quantities, which cloud storage promises to deliver. Until now, massive tape archives in the middle of nowhere (like Iron Mountain) have been the only means to achieve that cheap storage. However, enterprises today need more; they need quick-access data retrieval for multiple reasons, from compliance to business analytics. It is simply no longer sufficient to have cold data; rather, it needs to be online and resilient (and cheap, of course). Hence the accelerating shift towards storing every piece of data in memory or on disk (Data Domain smartly rode this trend).

The need to balance data availability/usability and cost-effectiveness has prompted significant innovation in both on-premise and hosted cloud storage: cloud storage systems (Caringo, EMC Atmos, and ParaScale, to name just a few) and flash-based storage systems (Fusion IO, Nimble Storage, Pliant, etc.) are just some current examples. Furthermore, hierarchical storage management (HSM, which has always sounded great but has been implemented only rarely) will become an important element in storage workflows. Enterprises will require the seamless capability to move data across different tiers of storage (both on-premise and into the cloud) based on policy and data type to optimize retrieval costs. As cloud computing matures, true cloud applications will be (re)written to leverage hierarchical and cloud-like storage tiers, retrieving data dynamically from different storage layers.
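To make policy-driven tiering more concrete, here is a minimal Python sketch of an HSM placement rule. The tier names, the per-data-type idle-time limits, and the compliance flag are hypothetical assumptions chosen for illustration, not any vendor's behavior: each object stays on the fastest tier whose idle-time limit it has not exceeded, and anything idle past every limit is archived unless compliance requires it to remain online.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical storage tiers, ordered fastest/most expensive to slowest/cheapest.
TIERS = ["flash", "disk", "cloud"]
ARCHIVE_TIER = "tape"

# Hypothetical policy: maximum idle time allowed on each tier, per data type.
POLICY = {
    "transaction": [timedelta(days=7), timedelta(days=90), timedelta(days=365)],
    "log": [timedelta(days=1), timedelta(days=30), timedelta(days=180)],
}
DEFAULT_POLICY = [timedelta(days=3), timedelta(days=60), timedelta(days=365)]


@dataclass
class DataObject:
    name: str
    data_type: str                 # illustrative labels such as "transaction" or "log"
    last_access: datetime
    compliance_hold: bool = False  # must stay online, never archived to tape


def choose_tier(obj: DataObject, now: datetime) -> str:
    """Return the fastest tier whose idle-time limit the object has not exceeded."""
    idle = now - obj.last_access
    limits = POLICY.get(obj.data_type, DEFAULT_POLICY)
    for tier, limit in zip(TIERS, limits):
        if idle <= limit:
            return tier
    # Idle past every limit: archive cold data unless compliance requires online access.
    return "cloud" if obj.compliance_hold else ARCHIVE_TIER
```

Under these made-up limits, a transaction record last read 30 days ago would land on the disk tier. A real HSM system would likely also weigh object size, retrieval fees, and access frequency rather than a single idle timer.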

1 Source: "Approaching the Zettabyte Era." Cisco, 16 June 2008. http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-481374_ns827_Networking_Solutions_White_Paper.html

A NEW CLOUD STACK

In order for cloud computing to become a mainstream approach, a new cloud stack (like the mainframe and OSI stacks before it) will likely emerge. Just like in prior computing platform transitions (client/server, web services, etc.), core platform capabilities such as security, access control, application management, virtualization, systems management, provisioning, and availability will be a prerequisite before IT organizations are able to adopt the cloud completely.

Clearly, this stack will take a different form than prior platform layers in order to embrace a cloud environment. Simply replicating the current computing stack but allowing it to reside off-premise will not achieve the scale, capabilities, and economies of cloud computing. In particular, this new cloud framework needs the ability to process data at ever greater orders of magnitude, and to do so at a fraction of the cost, by leveraging commodity, multi-threaded servers for storage and computing. In many ways, this cloud stack has already been implemented, albeit in a primitive form, at large-scale internet datacenters.

The challenge of processing terabytes of data daily at Google, Facebook, and Amazon drove them to adopt a new data architecture, one that is essentially Martian to traditional enterprise datacenter architects. No longer are ACID-compliant relational databases back-ending transactional applications. Internet datacenters quickly encountered the scaling limitations of SQL databases as the volume of data exploded. Instead, high-performance, scalable/distributed non-SQL data stores are being developed internally and implemented at scale. Bigtable and Cassandra are among the many variants, and this "non-database database" trend has proliferated to the point of having its own conference: NoSQL. Database caching layers (e.g., NorthScale's Memcached) are also being implemented to further drive application performance, and caching is now accepted as a standard tier in datacenters.
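A caching tier like the one described above is commonly used in a cache-aside pattern: the application checks the cache first and queries the database only on a miss, then stores the result for later reads. The sketch below assumes an in-process dictionary with a time-to-live as a stand-in for a memcached cluster; it does not speak the memcached protocol or use any real client library, and `load_user` is a hypothetical database call.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for a memcached-style caching tier (not the memcached protocol)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: evict lazily on read
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (value, self.clock() + self.ttl)


def get_user(user_id: int, cache: TTLCache, load_user: Callable[[int], Any]) -> Any:
    """Cache-aside read: try the cache, fall back to the database, then populate the cache."""
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is None:               # miss or expired entry
        value = load_user(user_id)  # hypothetical slow database query
        cache.set(key, value)
    return value
```

The injectable `clock` exists only so expiry can be exercised deterministically; with a 60-second TTL, repeated reads within a minute hit the cache and the database is queried again only after the entry expires.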

Managing non-transactional data has become even more daunting. From log files to clickstream data to web indexing, internet datacenters are collecting massive volumes of data that need to be processed cheaply in order to drive monetization value. Hadoop is an open source data management framework that has become widely deployed for massively parallel computation and distributed file systems in a cloud environment. Hadoop has allowed the largest web properties (Yahoo!, LinkedIn, Facebook, etc.) to store and analyze any data in near real-time at a fraction of the cost that traditional data management and data warehouse approaches could even contemplate. Although the framework has its roots in internet datacenters, Hadoop is quickly penetrating broader enterprise use cases. The diverse set of participants at Hadoop World NYC, hosted by Cloudera, clearly points to this trend.
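The programming model behind Hadoop is MapReduce: a map function emits key/value pairs, the framework groups them by key (the shuffle), and a reduce function aggregates each group. The single-process Python sketch below counts page views per URL in clickstream logs; it illustrates the model only and is not Hadoop's Java API, and the three-field log format is an assumption.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (url, 1) per click; assumes a hypothetical 'timestamp user url' log format."""
    fields = line.split()
    if len(fields) == 3:  # skip malformed records instead of failing the job
        yield fields[2], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Aggregate all values emitted for one key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    # In Hadoop, map and reduce tasks would run in parallel across many nodes
    # over data in HDFS; here every phase runs sequentially in one process.
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())
```

Because each map call sees one record and each reduce call sees one key, both phases can be spread over many machines without coordination beyond the shuffle, which is what lets this style of job scale on commodity hardware.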

SECURING THE CLOUD

Given this data-intensive nature, any widely adopted cloud computing platform will inevitably need to account for richer security requirements. The security challenges will be focused less on point network and data-level security, although high-bandwidth encryption solutions and sophisticated key management will be needed to match massively parallel computational cloud environments. Instead, the primary security challenges will stem from control. User authentication will become increasingly challenging as applications are federated outside the firewall because of SaaS adoption. In addition, managing and reconciling user identities across the individual user directories of each SaaS/cloud application will present further security issues. Much as web applications in the '90s created an SSO layer, cloud computing is essentially abstracting a web services interface for infrastructure IT, and it will demand a similar unified authentication/entitlement layer.
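As a rough illustration of what a unified authentication/entitlement layer does, the sketch below has a central identity provider sign a set of user claims once, and any cloud application that trusts it verifies the signature, expiry, and entitlement locally instead of keeping its own user directory. The token format is hypothetical (it is not SAML, OpenID, or any product's API), and it uses an HMAC shared secret only for brevity; real federations generally rely on public-key signatures so that relying applications never hold the signing key.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical secret shared by the identity provider and relying apps (demo only).
SECRET = b"demo-secret-not-for-production"


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def _sign(payload: str) -> str:
    return _b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())


def issue_token(user, entitlements, ttl=3600, now=None):
    """Identity provider: sign a claim set once; every cloud app can verify it."""
    now = time.time() if now is None else now
    claims = {"sub": user, "ent": sorted(entitlements), "exp": now + ttl}
    payload = _b64(json.dumps(claims, sort_keys=True).encode())
    return f"{payload}.{_sign(payload)}"


def verify_token(token, required, now=None):
    """Relying app: check signature, expiry, and entitlement; return the user or None."""
    now = time.time() if now is None else now
    try:
        payload, sig = token.split(".")
    except ValueError:
        return None  # not in payload.signature form
    if not hmac.compare_digest(sig, _sign(payload)):
        return None  # tampered or signed with another key
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if now >= claims["exp"] or required not in claims["ent"]:
        return None
    return claims["sub"]
```

`hmac.compare_digest` is used for the signature check so the comparison time does not leak how many characters matched.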

In addition to federated user authentication, cloud computing will also require data authentication and security. Imperva's database firewall is an example of an increasingly important cloud security product. As applications reside in different public and private clouds, it will be critical for cloud applications to be able to talk to each other. This will drive the need for data authentication and policy control over the volumes of data flowing between cloud applications. Moreover, given the multi-tenancy paradigm of cloud environments, policy granularity will be paramount to ensure security and compliance. Data integration across cloud platforms will be more of an obstacle than application integration, as applications have become more open/standard. Standard data APIs will emerge as part of the new cloud stack to allow disparate environments to talk to each other and to avoid vendor lock-in. Data migration challenges are perhaps the greatest factor today locking users into a particular cloud platform.
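A standard data API amounts to a provider-neutral interface that each cloud's storage service implements, so that tooling such as migration can be written once against the interface rather than once per vendor. The Python sketch below assumes a hypothetical three-method `ObjectStore` protocol and an in-memory stand-in for a provider; neither corresponds to a real vendor API.

```python
from typing import Dict, Iterator, Protocol


class ObjectStore(Protocol):
    """Hypothetical provider-neutral data API (not any real vendor's interface)."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def keys(self) -> Iterator[str]: ...


class InMemoryStore:
    """Stand-in for one provider's storage service (illustrative only)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def keys(self) -> Iterator[str]:
        return iter(list(self._objects))  # snapshot so writes during iteration are safe


def migrate(src: ObjectStore, dst: ObjectStore) -> int:
    """Copy every object from src to dst using only the shared API; return the count."""
    moved = 0
    for key in src.keys():
        dst.put(key, src.get(key))
        moved += 1
    return moved
```

Because `migrate` depends only on the protocol, swapping in a different provider's adapter would not require changing it; that decoupling is the lock-in argument in miniature.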

Over time, these APIs and layers will harden and become tailored to particular applications, depending on use case and workload. The adoption of these new frameworks will ultimately make cloud computing safe and broaden its penetration into enterprises of all sizes.

WHAT'S BREWING IN A CLOUD?

Despite constant comparisons to grid and utility computing, cloud computing has the potential to address a much broader set of applications and use cases than the limited HPC environments traditionally served by grid computing. This breadth is engendered by a new set of underlying technology forces. Virtualization technologies, high-powered commodity servers, low-cost/high-bandwidth connectivity, concurrent/multi-threaded programming models, and open source software stacks are all building blocks that can deliver the high performance and scalability of grid/utility computing but, importantly, do so on commodity resources.

These technology drivers enable applications and users to be abstracted cleanly from particular IT infrastructure resources (computing, storage, networking, etc.) in new and powerful ways; location agnosticism and multi-tenancy are two critical elements, among others. Unlike traditional HPC grid environments, which were designed for a specific application in a single company, cloud computing enables disparate applications and entities to harness a shared pool of resources. In addition, applications can be broken up in the cloud: for example, computing resources may reside on the client while the data is accessed portably from multiple cloud locations.

Many different definitions of cloud computing have surfaced. Rather than posit yet another, it is more useful to note several characteristics resident in any cloud instance: (i) self-provisioning (by the user, developer, or IT); (ii) elasticity (on-demand allocation of any computing, storage, and networking resources); (iii) multi-anything (multi-user, multi-application, multi-session, etc.); and (iv) portability (applications are abstracted from physical infrastructure and can be migrated easily). These capabilities allow enterprises to shift IT resources from capex to opex, a usage-based model that is particularly appealing given recent economic constraints.

These cloud prerequisites will yield a powerful set of use cases, beyond grid computing, that are unique to cloud platforms. Cloud computing will reach its full potential when a whole new set of applications (never possible before) is created that is purpose-built for the cloud. For example, one can envision powerful collaboration applications that enable internal enterprise users and external users to cooperate seamlessly, something previously impossible with users and data isolated on disparate enterprise islands. It is likely that these innovative applications will require new programming models and potentially languages yet to be hardened.

STILL IN THE EARLY DAYS

Despite the high energy surrounding cloud computing and early cloud successes such as Amazon Web Services, cloud computing for enterprise services is definitely still in its formative stages. Consumers, in contrast, have already adopted cloud computing technologies. One could argue that the services of web companies like Google, Yahoo!, Facebook, and Salesforce are examples of consumers leveraging cloud computing. These Web 2.0/SaaS offerings clearly exhibit the core cloud characteristics outlined above and, in turn, are delivering new, value-added services previously considered unthinkable. Interestingly, this time the consumers, via their use of Web 2.0 services, have been teaching enterprises, typically the early technology adopters, the effectiveness of cloud computing.

Today, enterprise use of cloud computing sits at opposite ends of the spectrum: (i) Web 2.0 start-ups seeking to launch applications quickly and cheaply, and (ii) compute-intensive enterprises that need batch processing for bursty, large-scale applications. Although these users are driving the early adoption of cloud technology, it is unlikely that these limited use cases will establish cloud computing as a pervasive platform. Instead, cloud computing will need to penetrate mainstream IT infrastructure gradually and offer a broader set of enterprise applications.

It is important to note here that these Web 2.0 start-ups represent a powerful trend: developers driving cloud computing adoption. Many early users of cloud computing are developers launching applications without the involvement of IT (a Web 2.0 start-up typically has no IT department at all). Increasingly, empowering developers and line-of-business owners to innovate and deploy new applications without the shackles of IT will be a motivating driver for cloud adoption. No longer do users need IT's blessing and time to get their job done.

This developer-centric nature was a primary motivator of VMware's strategic acquisition of SpringSource. In addition to inheriting significant Java technology, VMware now has a distinct opportunity to bring SpringSource's dominant Java developer mindshare onto VMware's private cloud platform. Amazon Web Services has experienced tremendous success from its developer-centric platform APIs. Unlike traditional hosting providers that cater to IT/operations, Amazon went after developers first and has only recently begun to add the functionality that will appeal to broader enterprise IT.

Within enterprises, there are early signs of developers (QA environments, batch processing, and prototyping) and line-of-business/departmental users leveraging cloud computing. It is not uncommon for new platform technologies to start at the fringes of IT before mainstream adoption takes place.

Unlike traditional three-tier enterprise datacenters, the internet datacenters of Facebook, Google, etc. were not encumbered by legacy enterprise stacks, applications, and IT rules, which enabled them to be built from the ground up with cloud stacks that elastically handle large-scale consumer transactions for multiple applications. Therefore, and unsurprisingly, Amazon's internet datacenters were easily adapted to make it the first and leading public cloud computing provider. It will certainly take significant time and effort for enterprise IT infrastructure gatekeepers to evolve their current architectures to embrace a new cloud platform. Luckily, enterprises can reap the technology innovation of internet datacenters (much of which is open source) to accelerate this transition.

MORE THAN ONE FLAVOR

Analogies have been drawn between cloud computing and public utilities (electric, gas, etc.), where the value is all about economies of scale. According to this hypothesis, the world will have only a few cloud providers that reach maximum efficient scale. It is quite unlikely that this will happen. Multiple cloud models will emerge depending on the user, the workload, and the application. For example, certain developers will prefer to interface with a cloud provider at a higher level of abstraction, such as Google App Engine, as opposed to a more bare-metal API, such as Rackspace. Alternatively, an application may choose to run on MSFT Azure to leverage SQL/MSFT services, or on Salesforce's Force.com for CRM integration and distribution advantages. Today, one can break cloud platforms into roughly two camps: developer-centric (Amazon, MSFT) and IT-centric (EMC, VMware). Cloud platforms will remain distinct and diverse as long as they continue to deliver unique value-add for their particular use cases and users.

To drive this cloud diversity point further, the concept of a "cloud within a cloud" is also emerging, where distinct services, such as data warehousing, can be built atop a more generic cloud platform to provide a higher-layer cloud service.

In addition, private clouds behind the firewall present yet another flavor of cloud computing, as enterprises leverage the benefits of cloud frameworks while maintaining the security/control and compliance of their internal datacenters. Lastly, hybrid clouds that bridge private and public clouds on a permanent or temporary basis (the latter also known as "cloud bursting") will come to fruition for certain applications or as a migration path for enterprises. Several start-ups (Cirtas, CloudSwitch, and Zetta among them) are building products that make the cloud safe for enterprises. Innovation will abound to solve the specific issues in all of these various cloud environments.
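Cloud bursting can be expressed as a simple placement policy: fill private capacity first and overflow the rest to a public cloud, except for workloads that must stay behind the firewall. The sketch below is a hypothetical first-fit policy with made-up capacity units and job names; production systems would also have to account for data transfer costs, latency, and security policy.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple


def place_jobs(
    jobs: Iterable[Tuple[str, int]],
    private_capacity: int,
    private_only: FrozenSet[str] = frozenset(),
) -> Dict[str, List[str]]:
    """First-fit: use private capacity, burst overflow to public, queue pinned jobs."""
    placement: Dict[str, List[str]] = {"private": [], "public": [], "queued": []}
    used = 0
    for name, units in jobs:
        if used + units <= private_capacity:
            placement["private"].append(name)
            used += units
        elif name in private_only:
            placement["queued"].append(name)  # compliance: may not leave the firewall
        else:
            placement["public"].append(name)  # burst to the public cloud
    return placement
```

With 100 units of private capacity and jobs sized 40, 30, 50, and 20 in that order, the 50-unit job overflows while the later 20-unit job still fits privately; marking the 50-unit job as private-only queues it instead of bursting it.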

LOOKING AHEAD

To further parse all this, I hosted a cloud computing panel with an esteemed group of technology thought leaders at Accel's 15th Stanford Technology Symposium. Needless to say, these panelists had plenty of deep insights, opinions, and predictions about cloud computing.

The panel brought together technologists who view cloud computing through distinctly different lenses: private cloud innovators, public cloud providers, cloud-enabling technology solutions, and cloud infrastructure applications. In wrapping up the panel session, I asked each speaker to conjure up a single prediction for cloud computing in the next few years. Here's what the experts said:

Jonathan Bryce, CTO/Founder, Mosso (Rackspace): "I think cloud computing is going to be a mindshift; it's going to take a while. But I think an economy like this is actually a huge opportunity for entrepreneurs... I think this is a time when resources are scarce, that's when great businesses end up getting built. And I think part of what's going to enable some of those businesses is cloud computing, and being able to get started with a lower barrier to entry, lower price point, all of those kind of things..."

Mike Olson, CEO/Co-founder, Cloudera: "I think that a lot of what's been said around here about data is really right on. I predict that in the next 10 years, computer science as computer science isn't really going to be the place that smart young guys are going to find tremendously rewarding careers. I think that the application of these new compute systems to large data in the sciences will advance humankind substantially. I think that science will be done maybe not even in the lab on the wet bench anymore, but with data, with computer systems looking at vast amounts of data."

Raghu Ramakrishnan, Chief Scientist for Audience and Research Fellow, Yahoo! Research: "So a lot of the companies that are out there today, Yahoo!, Facebook, Google, they're all exposing data APIs. Imagine what's going to happen once large clouds are routinely available to build their own application and you start aggregating your own data, and you have the opportunity to fuse that with all the data that's out there. Someone's going to figure out the next big thing, by taking 2 + 2 and coming up with 20."

Mike Schroepfer, VP Engineering, Facebook: "...one of the things that is going to happen is that people are going to figure out that we need a more blended workload between the cloud and the client. We've been operating kind of in the cycle of reincarnation in computer science, moved toward most of the computing happening in the cloud, and my browser effectively being its own terminal. You know, in the last 2 or 3 years, the speed and capability of browsers has been outpacing that of most chips. You're seeing 2x to 4x improvements in core performance on the engines and VMs in those browsers year on year, which is way outpacing the speed of chip design... So I believe that there will be a couple of people who will figure out ways to blend computation and storage on the client more gracefully with that on the server, but still provide you with all of the benefits of basically access to my data anywhere I need, and the kind of reliability of the cloud."

Jayshree Ullal, President and CEO, Arista Networks: "Well, there's a technology impact, but I actually think it's going to really make CIOs rethink their jobs. Today, you can have a server administrator, an application administrator, a network administrator, and they're all silos, but you need your general practitioner. And that's really missing right now in the cloud. So if I had to make a prediction, less on the technology, more on the operational side, I would say for the deployment of this, it's got to be a generalized IT person, whether that's the CIO or somebody he or she appoints..."

Rich Wolski, Professor of Computer Science, University of California, Santa Barbara, and CTO/Founder, Eucalyptus Systems: "...there's another revolution coming that's going to intersect the cloud revolution, and that has to do with data simulation... pretty much everything you own is going to be trying to send you data. And you're going to need, personally, a great deal of storage and compute capacity to be able to deal with that. I think the cloud is going to make that revolution that much quicker to come to us."

These predictions depict cloud computing as still being in its formative phases, but also as a fundamental breakthrough in datacenter and IT infrastructure that will emerge in the years to come. Despite the current macro headwinds, deep innovation and market opportunities in cloud computing will persist. Once this economic storm passes, I'm convinced the sun will shine through, and cloud computing is sure to have many silver linings.

Ping Li is a partner at Accel Partners in Palo Alto and focuses primarily on Information Technology infrastructure and digital media platforms.
