Page 1: Joel Christner (jec2160) COMS-E6125 Spring 2010 Web Enhanced Information Management

An Examination of Cloud Storage Architectures for Scalable Internet and Cloud Computing Applications

Joel Christner (jec2160)
COMS-E6125 Spring 2010

Web Enhanced Information Management
Professor Kaiser

Agenda

• What are Cloud Computing and Cloud Storage?
• Why Cloud Storage?
• How Cloud Storage Works
• Comparison of Cloud Storage Architectures
• Summary

What Are Cloud Computing and Cloud Storage?

Terms and Definitions
• Cloud computing: “General term for anything that involves delivering hosted services over the Internet. These services are broadly divided into three categories: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS).”
– http://www.searchcloudcomputing.com
– Generally accepted “public cloud” definition
• Cloud storage is a component of cloud computing: an elastic, on-demand, scalable platform for storing and retrieving data

Architectures

• Three forms of clouds (architectures):
– Public cloud – infrastructure owned by an external entity, usage-based billing
– Private cloud – infrastructure owned by the company itself, with chargeback to internal consumers
– Hybrid cloud – public cloud with isolated resources and secure connectivity (virtual data center extension)

Attributes
• Attributes of clouds:
– Virtualized – logical resources are abstracted from physical resources to enable mobility (VM migration), and logical access and integration points are abstracted from physical ones
– Elasticity – virtualization enables elasticity (add or remove processors, memory, disk capacity)
– Scalability – virtualization, exposed through management and access middleware, makes it simple to add or remove resources dynamically
– Pay-as-you-Grow – built from low-cost commodity components that can be added or removed as capacity needs change (via virtualization and abstraction)

Why Cloud Storage?

Computing Evolution

• Computing has shifted over the last three decades: centralized -> distributed -> centralized
– Mainframes with dumb terminals
– Distributed computing and the workgroup
– Centralized and consolidated data centers

Storage Evolution
• Storage has followed this shift as well, but remains the most costly element of any enterprise
– Mainframe – costly and single-vendor, but completely monolithic, yielding the easiest data management and protection (single point)
– Personal computers – cheap, modular, fully distributed; data is difficult to manage and protect
– Workgroup servers – cheap, modular, data still distributed, difficult to manage and protect
– Centralized servers and storage networks – expensive, modular, simpler data management and protection

Storage Network Fabrics
• Organizations are still struggling to consolidate their data (data is still distributed), but the following storage network fabrics are in use today:
– Storage area network (SAN) – block volume access over a shared network (Fibre Channel, Internet SCSI, Fibre Channel over Ethernet)
– Network attached storage (NAS) – filesystem protocol access over a shared network (Common Internet File System, Network File System, both of which use IP and generally Ethernet)
• Contrast with DAS (direct-attached storage), where disks attach to a single server rather than a shared network

Storage Capital Cost Elements
• Capital cost elements
– Disk (in the workstation, in the server, in shared storage arrays)
– Over-provisioned capacity (idle until used)
– Storage array controllers (providing volume management and value-added capabilities over shared disk)
– Storage network infrastructure (FC/Ethernet switches, HBAs/NICs, multipathing and failover software)
– Data protection hardware and software (backup application, servers, tape libraries and automation, tapes)
– Software licenses (snapshots, replication)
• High vendor profit margins = high cost

Storage Operational Cost Elements

• Operational cost elements
– Real estate (storage systems are large)
– Facilities (power, space, cooling)
– Failed component replacement (tapes, drives)
– Off-site storage (Iron Mountain)
– Volume provisioning, allocation, resizing, data migration, and ongoing management
– People (salaries, benefits)

Benefits of Cloud Storage
• Uses commodity components, eliminating the most costly elements of traditional storage capital costs
• Virtualization and abstraction eliminate the most costly and time-consuming elements of traditional storage operational costs
• Eliminates scalability issues associated with existing storage arrays (maximum drives, maximum capacity)
• Public cloud storage enables pay-as-you-grow capacity
• Private cloud storage enables chargeback
• Hybrid cloud storage offers cost close to that of public cloud storage, with private cloud performance and security
• Store virtually anything (user information, image files, documents, dynamic page structure, binaries, code files) flexibly and at the lowest cost

How Cloud Storage Works

Cloud Storage System Components

• Access software
– Integrated or binary (emulating SCSI)
– Access via HTTP RESTful APIs or SOAP
• Control servers
– Core of the system, built on databases (relational or NoSQL)
– Hold consumer authentication credentials
– Manage registration and removal of metadata and storage servers
– Management interface
• Metadata servers
– Stateless, caching to scale the control servers
– Consumer authentication, session key management
– Object location management
– Read/write request routing among storage servers
• Storage servers
– Handle read/write requests (GET/PUT/POST/DELETE)
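The storage-server role above can be illustrated with a minimal in-memory sketch that maps each HTTP verb to an object operation. Everything here is hypothetical: the `StorageServer` class, flat string object names, (status code, body) return tuples, and especially the create-only treatment of POST are assumptions, since real object stores define POST in different ways.

```python
from typing import Dict, Optional, Tuple

# Hypothetical sketch: responses are (status_code, body) tuples, not real HTTP messages.
Response = Tuple[int, bytes]


class StorageServer:
    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}  # object name -> object data

    def handle(self, method: str, key: str, body: Optional[bytes] = None) -> Response:
        """Dispatch one read/write request to the matching object operation."""
        if method == "GET":
            if key not in self._objects:
                return 404, b""
            return 200, self._objects[key]
        if method == "PUT":
            # PUT creates or replaces the object (idempotent).
            created = key not in self._objects
            self._objects[key] = body or b""
            return (201 if created else 200), b""
        if method == "POST":
            # Assumed create-only semantics; real systems define POST differently.
            if key in self._objects:
                return 409, b""
            self._objects[key] = body or b""
            return 201, b""
        if method == "DELETE":
            if self._objects.pop(key, None) is None:
                return 404, b""
            return 204, b""
        return 405, b""  # method not allowed


if __name__ == "__main__":
    server = StorageServer()
    print(server.handle("PUT", "photos/cat.jpg", b"\xff\xd8"))  # (201, b'')
    print(server.handle("GET", "photos/cat.jpg"))  # (200, b'\xff\xd8')
    print(server.handle("DELETE", "photos/cat.jpg"))  # (204, b'')
    print(server.handle("GET", "photos/cat.jpg"))  # (404, b'')
```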

Cloud Storage System Architecture

[Figure: cloud storage system architecture]
• Application servers – access via HTTP RESTful API or SOAP API
• Server load balancing – in front of the metadata servers
• 1..n metadata servers – scale-out and stateless, IO optimized for metadata; authentication, session keys, locate object, read/write request routing
• 1..n storage servers – scale-out, capacity optimized for data storage; read/write
• N+N control servers – HA, no scale-out
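The request path in the figure can be sketched as follows, assuming the application talks to metadata servers directly (no SLB) and each storage server is just an in-memory dict. The `ControlServer` and `MetadataServer` classes, hash-based placement, and `secrets.token_hex` session keys are hypothetical choices for illustration; the point is that a warm metadata-server cache answers object lookups without touching the control server.

```python
import hashlib
import secrets
from typing import Dict, List, Optional


class ControlServer:
    """Hypothetical authoritative store of credentials and object locations."""

    def __init__(self, credentials: Dict[str, str]) -> None:
        self.credentials = credentials        # user name -> secret
        self.locations: Dict[str, int] = {}   # object name -> storage server index
        self.lookups = 0                      # location queries that reached this server

    def locate(self, name: str) -> Optional[int]:
        self.lookups += 1
        return self.locations.get(name)


class MetadataServer:
    """Hypothetical front end: authenticates, issues session keys, caches locations.

    Its state is soft: losing it only forces a new login and refills the
    cache from the control server.
    """

    def __init__(self, control: ControlServer, storage: List[Dict[str, bytes]]) -> None:
        self.control = control
        self.storage = storage                # one dict per storage server
        self.sessions: Dict[str, str] = {}    # session key -> user name
        self.cache: Dict[str, int] = {}       # object name -> storage server index

    def login(self, user: str, secret: str) -> str:
        """Check credentials held by the control server; return a session key."""
        if self.control.credentials.get(user) != secret:
            raise PermissionError("authentication failed")
        session = secrets.token_hex(16)
        self.sessions[session] = user
        return session

    def _require(self, session: str) -> None:
        if session not in self.sessions:
            raise PermissionError("invalid session key")

    def write(self, session: str, name: str, data: bytes) -> None:
        self._require(session)
        # Placement by hashing the object name is an assumed policy.
        idx = int(hashlib.sha256(name.encode()).hexdigest(), 16) % len(self.storage)
        self.storage[idx][name] = data        # route the write to one storage server
        self.control.locations[name] = idx    # persist the location centrally
        self.cache[name] = idx

    def read(self, session: str, name: str) -> bytes:
        self._require(session)
        idx = self.cache.get(name)
        if idx is None:                       # cache miss: ask the control server
            idx = self.control.locate(name)
            if idx is None:
                raise KeyError(name)
            self.cache[name] = idx
        return self.storage[idx][name]        # route the read to that storage server


if __name__ == "__main__":
    control = ControlServer({"alice": "s3cret"})
    storage = [{} for _ in range(3)]          # three storage servers
    meta_a = MetadataServer(control, storage)
    meta_b = MetadataServer(control, storage)  # a second, cold metadata server
    meta_a.write(meta_a.login("alice", "s3cret"), "docs/report.pdf", b"%PDF")
    session_b = meta_b.login("alice", "s3cret")
    print(meta_b.read(session_b, "docs/report.pdf"))  # b'%PDF'
    print(control.lookups)  # 1
```

Only the cold metadata server's first read reaches the control server; repeated reads are served from its cache, which is the offload the next slide relies on.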

Why More Scalable?
• Metadata servers scale the control servers through caching where appropriate
• Brick-based approach to adding IO or storage capacity
– Simply add more metadata servers or storage servers
• Server load balancing (SLB) provides load balancing, scale, and HA for metadata servers
• Metadata servers provide load balancing for storage servers
• Storage servers may have a replication policy for data high availability (copy objects across storage servers)
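A minimal sketch of two of these mechanisms, assuming round-robin load balancing across metadata servers and a replication factor of 2, with replicas placed on consecutive storage servers starting from a hash of the object name. `round_robin` and `ReplicatingStore` are hypothetical names, and the placement rule is an illustrative choice rather than how any particular product works.

```python
import hashlib
import itertools
from typing import Dict, Iterator, List


def round_robin(metadata_servers: List[str]) -> Iterator[str]:
    """Hypothetical minimal SLB: hand out metadata servers in rotation (no health checks)."""
    return itertools.cycle(metadata_servers)


class ReplicatingStore:
    """Hypothetical replication policy: copy each object to `replicas` distinct servers."""

    def __init__(self, num_servers: int, replicas: int = 2) -> None:
        if not 1 <= replicas <= num_servers:
            raise ValueError("replicas must be between 1 and num_servers")
        self.servers: List[Dict[str, bytes]] = [{} for _ in range(num_servers)]
        self.replicas = replicas

    def placement(self, name: str) -> List[int]:
        """Indices of the storage servers that hold copies of `name`."""
        n = len(self.servers)
        start = int(hashlib.sha256(name.encode()).hexdigest(), 16) % n
        return [(start + i) % n for i in range(self.replicas)]

    def put(self, name: str, data: bytes) -> None:
        for idx in self.placement(name):
            self.servers[idx][name] = data    # write every replica

    def get(self, name: str) -> bytes:
        for idx in self.placement(name):
            if name in self.servers[idx]:     # first surviving replica wins
                return self.servers[idx][name]
        raise KeyError(name)


if __name__ == "__main__":
    slb = round_robin(["meta-1", "meta-2", "meta-3"])
    print([next(slb) for _ in range(4)])  # ['meta-1', 'meta-2', 'meta-3', 'meta-1']
    store = ReplicatingStore(num_servers=4, replicas=2)
    store.put("logs/day1", b"entries")
    store.servers[store.placement("logs/day1")[0]].clear()  # lose one storage server
    print(store.get("logs/day1"))  # b'entries'
```

Because placement here is a simple modulo of the server count, adding a storage server would remap most objects; production systems often use consistent hashing to limit how much data moves when bricks are added.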

Not Infinitely Scalable, But…
• Traditional enterprise storage systems have a host of scalability limitations:
– Number of trays behind the controller
– Number of disks behind the controller
– Number of connected hosts
– Number of network interfaces
– Number of configurable volumes and snapshots
• Cloud storage system scalability is limited by:
– IOPS capacity of the control servers, and the percentage of requests offloaded to the metadata servers
– Control server database capacity for metadata and object locations
– Number of metadata servers behind an SLB
– IOPS capacity per metadata server
– Storage capacity per storage server
• In general, cloud storage is considered multiple orders of magnitude more scalable than traditional enterprise storage
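The interplay between control-server IOPS, metadata-server offload, and metadata-tier size can be made concrete with a back-of-envelope model. The model is an assumption, not from the slides: if a fraction h of requests is answered from metadata-server caches, only (1 - h) of the total request rate R reaches the control servers, so R ≤ C / (1 - h) for control capacity C, and R ≤ N × m for N metadata servers of m IOPS each. `max_request_rate` and the example numbers are hypothetical.

```python
import math


def max_request_rate(control_iops: float, offload: float,
                     metadata_servers: int, iops_per_metadata_server: float) -> float:
    """Hypothetical ceiling on total requests/s under the assumed two-bound model."""
    if not 0.0 <= offload <= 1.0:
        raise ValueError("offload must be a fraction between 0 and 1")
    # Control-server bound: (1 - offload) * R <= control_iops
    control_bound = math.inf if offload == 1.0 else control_iops / (1.0 - offload)
    # Metadata-tier bound: R <= N * per-server IOPS (assumes perfect SLB balancing)
    metadata_bound = metadata_servers * iops_per_metadata_server
    return min(control_bound, metadata_bound)


if __name__ == "__main__":
    print(max_request_rate(10_000, 0.75, 8, 20_000))  # 40000.0
    print(max_request_rate(10_000, 0.95, 8, 20_000))  # 160000
```

With 75% offload the control servers cap the system at 40,000 requests per second no matter how many metadata servers are added; at 95% offload the metadata tier becomes the bottleneck, which can then be relieved by adding metadata servers.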

Raw $/GB Comparison

• Traditional midrange enterprise storage (such as EMC’s Clariion) averages approximately $8/GB in capital costs alone

• Scales to hundreds of TBs
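For scale, the $8/GB figure works out as follows at two points in the “hundreds of TBs” range. This assumes decimal units (1 TB = 1,000 GB) and treats $8/GB as exact; 100 TB and 500 TB are illustrative capacities, and `capital_cost` is a hypothetical helper.

```python
DOLLARS_PER_GB = 8.0  # approximate midrange capital cost from the slide
GB_PER_TB = 1_000     # assumed decimal terabyte


def capital_cost(capacity_tb: float, dollars_per_gb: float = DOLLARS_PER_GB) -> float:
    """Hypothetical helper: capital cost in dollars for the given raw capacity."""
    return capacity_tb * GB_PER_TB * dollars_per_gb


if __name__ == "__main__":
    for tb in (100, 500):
        print(f"{tb} TB -> ${capital_cost(tb):,.0f}")
    # 100 TB -> $800,000
    # 500 TB -> $4,000,000
```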

Challenges with Cloud Storage
• Access methods
– Enterprise applications expect SCSI access to the underlying disk infrastructure and overlay block devices with their own file systems
– Cloud storage systems expose capacity via programmatic APIs (RESTful, SOAP), requiring translation
– Not an issue for home-grown applications
• Security
– Cloud storage systems, particularly in public clouds, do not encrypt data
– Even if the cloud provider encrypted data, the data would remain vulnerable due to chain of custody when the cloud provider owns the key material
• Others
– Cloud storage performance lags behind raw block device access, particularly in public and virtual private cloud scenarios, due to WAN bandwidth, latency, and packet loss
– Cloud storage systems use replication for high availability but provide no snapshots for enterprise backup systems
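The “translation” needed for applications that expect block access can be sketched as an adapter that exposes reads and writes at byte offsets over an object store supporting only whole-object GET and PUT. This is a hypothetical illustration: a dict stands in for the cloud object store, the volume is split into fixed 4 KiB chunks stored as objects named `<volume>/<chunk index>`, and unwritten regions read back as zeros; `BlockToObjectAdapter` is not a real product's API.

```python
from typing import Dict

CHUNK_SIZE = 4096  # assumed bytes per backing object (illustrative)


class BlockToObjectAdapter:
    """Hypothetical block-device view over a whole-object GET/PUT store."""

    def __init__(self, store: Dict[str, bytes], volume: str) -> None:
        self.store = store    # stand-in for a RESTful object store
        self.volume = volume

    def _key(self, chunk: int) -> str:
        return f"{self.volume}/{chunk}"

    def _get_chunk(self, chunk: int) -> bytearray:
        # GET the whole object; missing chunks behave like zero-filled blocks.
        return bytearray(self.store.get(self._key(chunk), bytes(CHUNK_SIZE)))

    def write(self, offset: int, data: bytes) -> None:
        """Read-modify-write every chunk the byte range touches."""
        pos = 0
        while pos < len(data):
            chunk, within = divmod(offset + pos, CHUNK_SIZE)
            n = min(CHUNK_SIZE - within, len(data) - pos)
            buf = self._get_chunk(chunk)
            buf[within:within + n] = data[pos:pos + n]
            self.store[self._key(chunk)] = bytes(buf)  # PUT the whole object back
            pos += n

    def read(self, offset: int, length: int) -> bytes:
        out = bytearray()
        while len(out) < length:
            chunk, within = divmod(offset + len(out), CHUNK_SIZE)
            n = min(CHUNK_SIZE - within, length - len(out))
            out += self._get_chunk(chunk)[within:within + n]
        return bytes(out)


if __name__ == "__main__":
    store: Dict[str, bytes] = {}
    disk = BlockToObjectAdapter(store, "vol1")
    disk.write(4090, b"hello world")  # spans two backing objects
    print(sorted(store))              # ['vol1/0', 'vol1/1']
    print(disk.read(4090, 11))        # b'hello world'
```

Even this toy version shows why performance suffers over a WAN: unless chunks are cached locally, a small block write becomes a whole-object GET followed by a PUT, so each write costs at least two round trips to the provider.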

Summary

Summary
• Cloud storage architectures decrease the capital and operational expenses of today’s enterprise and Internet businesses

• Cloud storage eliminates the majority of complexity and limitations associated with traditional storage (capacity limits, data migration, volume management)

• Cloud storage virtually eliminates the system-level scalability limitations associated with traditional storage

• Cloud storage has a series of challenges that limit its applicability in existing application environments, but remains a good fit in homegrown application environments

• Innovation in the cloud storage space will improve usability (translation appliances and software), security, and performance

TCP FIN