Cloud computing: state-of-the-art and research challenges
Kaushik Padmanabhan, David Stewart
Cloud Computing
• Shared computer processing resources and data.
• Hosting and delivering services over the Internet.
• Removes the need to maintain computing infrastructure in-house.
• Bulk storage of data.
• Retrieval from any place.
Industry Point of View
• Eliminates the requirement for users to plan ahead for provisioning.
• Lets enterprises start small and increase resources only as demand rises.
Industries as Competitors
a) Google
b) Amazon
c) Microsoft
d) EMC
e) HP
Issue
• Still in its infancy (security, size constraints, etc.)
Traditional Service Providers
The traditional service-provider role is split in two:
• Infrastructure providers: manage cloud platforms and lease resources according to a usage-based pricing model.
• Service providers: rent resources from one or many infrastructure providers to serve the end users.
Features of Cloud Computing
a) No up-front investment
• Pay-as-you-go pricing model.
• A service provider does not need to invest in the infrastructure.
• It simply rents resources from the cloud according to its own needs and pays for the usage.
b) Lowering operating cost
• Resources in a cloud environment can be rapidly allocated and de-allocated on demand.
• Capacity no longer has to be provisioned for peak load, which provides large savings.
c) Highly scalable
• A service provider can easily expand its service to large scales in order to handle a rapid increase in service demands, such as the flash-crowd effect.
• This model is sometimes called surge computing.
d) Easy access
• Accessible over Internet connections.
e) Reducing business risks and maintenance expenses
• A service provider shifts business risks (such as hardware failure) to infrastructure providers, who often have better expertise in that field.
• This lowers maintenance and labor costs.
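The savings claimed for pay-as-you-go pricing can be made concrete with a small calculation. The sketch below is illustrative only: the demand profile and the hourly rate are made-up numbers, not figures from the paper, and the function names are hypothetical.

```python
def pay_as_you_go_cost(hourly_demand, rate_per_server_hour):
    """Cost when servers are rented hour by hour to match demand."""
    return sum(hourly_demand) * rate_per_server_hour


def peak_provisioned_cost(hourly_demand, rate_per_server_hour):
    """Cost when capacity is fixed at the peak demand for the whole period."""
    return max(hourly_demand) * len(hourly_demand) * rate_per_server_hour


# Assumed demand: 2 servers for 22 hours, then a 2-hour surge needing 10.
demand = [2] * 22 + [10] * 2
rate = 0.5  # assumed price per server-hour

elastic = pay_as_you_go_cost(demand, rate)
fixed = peak_provisioned_cost(demand, rate)
print(f"pay-as-you-go: {elastic:.2f}, provisioned for peak: {fixed:.2f}")
```

With this profile the peak-provisioned setup pays for 240 server-hours while only 64 are actually used, which is the gap that on-demand allocation removes.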
Things discussed in paper
• Survey of cloud computing.
• Key concepts.
• Architectural principles.
• State-of-the-art implementations.
• Research challenges.
Related technologies
Grid Computing
• Distributed computing paradigm that coordinates networked resources to achieve a common computational objective.
• Cloud computing is similar to Grid computing in that it also employs distributed resources to achieve application-level objectives.
Utility Computing
• Providing resources on-demand and charging customers based on usage rather than a flat rate.
• Cloud computing can be perceived as a realization of utility computing.
Virtualization
• Abstracts away the details of physical hardware and provides virtualized resources for high-level applications.
• Virtualization forms the foundation of cloud computing, as it provides the capability of pooling computing resources from clusters of servers and dynamically assigning or reassigning virtual resources to applications on-demand.
Autonomic Computing
• Autonomic computing aims at building computing systems capable of self-management without human interaction.
• Cloud computing also exhibits certain autonomic features.
Cloud-Computing Architecture
• The architecture of cloud computing is more modular than that of traditional hosting environments such as server farms.
• Each layer is loosely coupled with the layers above and below.
• Similar to the design of the OSI layers.
Has 4 layers:
a) Hardware layer
b) Infrastructure layer
c) Platform layer
d) Application layer
Hardware Layer
• Responsible for managing the physical resources of the cloud, such as servers, routers, switches, power and cooling systems.
• Implemented in data centers.
• A data center usually contains thousands of servers that are organized in racks and interconnected through switches, routers or other fabrics.
Issues
• Hardware configuration.
• Fault tolerance.
• Traffic management.
• Power and cooling resource management.
Infrastructure layer
• Also known as the virtualization layer.
• Creates a pool of storage and computing resources by partitioning the physical resources using virtualization technologies such as Xen, KVM and VMware.
• Essential component of cloud computing, since dynamic resource assignment occurs here.
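The pooling and dynamic assignment described for the infrastructure layer can be sketched in a few lines. This is a toy model under stated assumptions: `Host`, `ResourcePool` and the first-fit placement rule are hypothetical names and choices made for illustration, and nothing here calls the Xen, KVM or VMware APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A physical server with spare capacity (hypothetical model)."""
    name: str
    free_cpus: int
    free_mem_gb: int
    vms: list = field(default_factory=list)


class ResourcePool:
    """Pools several hosts and hands out VM-sized slices on demand."""

    def __init__(self, hosts):
        self.hosts = list(hosts)

    def allocate(self, vm_name, cpus, mem_gb):
        """Place the VM on the first host with enough room; return its name or None."""
        for host in self.hosts:
            if host.free_cpus >= cpus and host.free_mem_gb >= mem_gb:
                host.free_cpus -= cpus
                host.free_mem_gb -= mem_gb
                host.vms.append((vm_name, cpus, mem_gb))
                return host.name
        return None

    def release(self, vm_name):
        """Return a VM's resources to the pool; True if the VM was found."""
        for host in self.hosts:
            for vm in host.vms:
                if vm[0] == vm_name:
                    host.vms.remove(vm)
                    host.free_cpus += vm[1]
                    host.free_mem_gb += vm[2]
                    return True
        return False


pool = ResourcePool([Host("h1", 4, 8), Host("h2", 8, 16)])
print(pool.allocate("web", cpus=4, mem_gb=8))  # placed on the first host that fits
```

Releasing a VM makes its slice available again, which is the de-allocation half of the on-demand behavior the slides describe.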
Platform layer
• Built on top of the infrastructure layer.
• Consists of operating systems and application frameworks.
• Minimizes the burden of deploying applications directly into VM containers.
Application layer
• Highest level of the hierarchy.
• Consists of the actual cloud applications.
• Applications can leverage the automatic-scaling feature to achieve better performance, higher availability and lower operating cost.
Business model
• Hardware and platform-level resources are provided as services on an on-demand basis.
• Clouds offer services that can be grouped into three categories:
(a) Software as a service (SaaS), (b) Platform as a service (PaaS), (c) Infrastructure as a service (IaaS).
Infrastructure as a Service (IaaS)
• On-demand provisioning of infrastructural resources, usually in terms of VMs.
• The cloud owner who offers IaaS is called an IaaS provider.
• Examples include Amazon EC2, GoGrid and Flexiscale.
Platform as a Service (PaaS)
• Providing platform layer resources, including operating system support and software development frameworks.
• Examples include Google App Engine, Microsoft Windows Azure and Force.com.
Software as a Service (SaaS)
• Providing on-demand applications over the Internet.
• Examples include Salesforce.com, Rackspace and SAP Business ByDesign.
• In general, SaaS providers act as service providers, while IaaS and PaaS providers are called infrastructure providers.
Types of clouds
• Clouds come in different types, depending on the issues to be considered in a cloud environment:
- Public clouds
- Private clouds
- Hybrid clouds
- Virtual Private Cloud
Public clouds
• A cloud in which service providers offer their resources as services to the general public.
Issue
• Lack fine-grained control over data, network and security settings, which hampers their effectiveness in many business scenarios.
Private clouds
• Also known as internal clouds.
• Designed for exclusive use by a single organization.
• Built and managed by the organization or by external providers.
• Offer the highest degree of control over performance, reliability and security.
Issue
• Often criticized for being similar to traditional proprietary server farms and for not providing benefits such as no up-front capital costs.
Hybrid clouds
• Combination of public and private cloud models that tries to address the limitations of each approach.
• Part of the service infrastructure runs in private clouds while the remaining part runs in public clouds.
• Offer more flexibility than either public or private clouds alone.
• Provide tighter control and security over application data compared to public clouds.
Issue
• Requires carefully determining the best split between public and private cloud components.
Virtual Private Cloud
• An alternative solution to addressing the limitations of both public and private clouds is called Virtual Private Cloud (VPC).
• A platform running on top of public clouds.
• Leverages virtual private network (VPN) technology that allows service providers to design their own topology and security settings such as firewall rules.
Cloud computing characteristics
Salient features that differ from traditional service computing:
• Multi-tenancy
• Shared resource pooling
• Geo-distribution and ubiquitous network access
• Service oriented
• Dynamic resource provisioning
• Self-organizing
• Utility-based pricing
Architectural design of data centers
• A layered approach is the basic foundation of the network architecture design.
• The basic layers of a data center are:
a) Core layer
b) Aggregation layer
c) Access layer
Core layer
• Provides connectivity to multiple aggregation switches.
Aggregation layer
• Provides important functions such as domain service, location service and server load balancing.
Access layer
• Servers in racks physically connect to the network here.
• Typically 20 to 40 servers per rack, each connected to an access switch with a 1 Gbps link.
• Access switches usually connect to two aggregation switches for redundancy with 10 Gbps links.
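The access-layer figures quoted for this design (20 to 40 servers per rack at 1 Gbps each, two 10 Gbps uplinks) imply an oversubscription ratio at the access switch. The sketch below works it out, assuming both uplinks carry traffic at the same time (an active/standby pair would halve the uplink capacity); the function name is hypothetical.

```python
def oversubscription_ratio(servers_per_rack, server_link_gbps, uplinks, uplink_gbps):
    """Ratio of server-facing bandwidth to uplink bandwidth on one access switch."""
    downstream = servers_per_rack * server_link_gbps
    upstream = uplinks * uplink_gbps
    return downstream / upstream


for servers in (20, 40):
    ratio = oversubscription_ratio(servers, server_link_gbps=1, uplinks=2, uplink_gbps=10)
    print(f"{servers} servers per rack -> {ratio:.1f}:1")
```

A ratio above 1:1 means that servers cannot all send traffic out of the rack at full line rate simultaneously, which is why uniform high capacity appears as a design goal.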
Data center design (cont.)
● Uniform high capacity
○ The maximum traffic rate between servers should be limited only by the capacity of the servers’ network-interface cards
○ Any host should be able to communicate with any other host in the network at full bandwidth
● Rapid and free VM migration
○ Moving VMs between machines for heat management, power distribution and statistical multiplexing
● Resiliency to server failures and link outages
● Scalability to a large number of servers
○ Allow incremental expansion
● Backward compatibility with devices running Ethernet and/or IP
Modular data centers (MDCs)
● Shipping containers filled with servers, network equipment
● Good for interactive (latency-sensitive) applications
● Provides redundancy
Image source: https://commons.wikimedia.org/wiki/File:IBMPortableModularDataCenter.jpg
Distributed file system over clouds
● Google File System (GFS)
○ Divides files into 64 MB chunks, which may be spread across geo-diverse servers
○ Files are usually read and appended to - rarely overwritten or shrunk
○ Designed to enable high throughput, low latency, and resistance to server failures
● Hadoop Distributed File System (HDFS)
○ Open source, with a design based on GFS
○ Data is also served over HTTP, allowing web browsers and other clients to access content
○ Data nodes can communicate with each other to rebalance and replicate data
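The fixed-size chunking that GFS applies to files can be illustrated with simple arithmetic. The sketch below is not GFS or HDFS code: the 64 MB chunk size comes from the slide, while the replica count of 3, the round-robin placement and all function names are assumptions made for the example.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as stated for GFS


def chunk_count(file_size):
    """Number of chunks needed for a file of file_size bytes (ceiling division)."""
    return -(-file_size // CHUNK_SIZE)


def locate(offset):
    """Map a byte offset to (chunk index, offset inside that chunk)."""
    return divmod(offset, CHUNK_SIZE)


def place_chunks(n_chunks, servers, replicas=3):
    """Assign each chunk to `replicas` servers round-robin (assumed policy).

    Needs replicas <= len(servers) so that each copy lands on a distinct server.
    """
    return {
        i: [servers[(i + r) % len(servers)] for r in range(replicas)]
        for i in range(n_chunks)
    }


size = 200 * 1024 * 1024  # a 200 MB file
n = chunk_count(size)
print(n, locate(130 * 1024 * 1024))
print(place_chunks(n, ["s1", "s2", "s3", "s4"]))
```

Keeping several copies of each chunk on different servers is what lets the file system survive individual server failures.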
Distributed app framework over clouds
● Computation- and data-intensive jobs
○ Financial trend analysis, animation
● Google MapReduce - one Master, many workers
○ Master allocates work to nodes that are physically close to the required data
○ This reduces traffic on the network backbone, which relieves bottlenecks and improves throughput
○ If an individual task fails, it’s rescheduled
○ If the Master fails, all in-progress tasks are lost - the Master records its progress so the job can restart once it’s restored
● Hadoop MapReduce - similar open-source project
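The map, shuffle and reduce phases, together with the rule that a failed task is rescheduled, can be shown in a single-process sketch. This is not Google's or Hadoop's API: every function name below is hypothetical, the "cluster" is an ordinary loop, and the retry limit of 3 is an assumption.

```python
from collections import defaultdict


def map_words(document):
    """Map step: emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum all counts emitted for one key."""
    return word, sum(counts)


def run_with_retry(task, arg, max_attempts=3):
    """Re-run a failed task, standing in for the Master rescheduling it."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(arg)
        except Exception:
            if attempt == max_attempts:
                raise


def map_reduce(documents, mapper, reducer):
    """Run map over every split, group values by key, then reduce each group."""
    intermediate = defaultdict(list)
    for doc in documents:
        for key, value in run_with_retry(lambda d: list(mapper(d)), doc):
            intermediate[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(intermediate.items()))


result = map_reduce(["the cloud", "the grid and the cloud"], map_words, reduce_counts)
print(result)
```

In a real deployment the splits would live on different nodes and the Master would prefer workers close to each split, which is the locality point made above.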
Commercial products - Microsoft Windows Azure
Taken from source paper.
Research challenges - issues to be addressed
● Automated service provisioning
○ How to translate service level objectives (QoS requirements) into CPU and memory requirements
○ Requires demand prediction so that resources can be adjusted ahead of fluctuations
○ Must also be able to react to fluctuations that happen before predictions are available
● VM migration
○ Enables load balancing and robust, responsive provisioning in data centers
○ Xen and VMware live migration achieves downtimes ranging from tens of milliseconds to about one second
○ Main benefit is to avoid workload hotspots - currently, hotspot detection isn’t agile enough to respond to sudden workload changes
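The automated service provisioning bullets call for both prediction and reaction. One way to combine them is sketched below, under heavy assumptions: the service level objective is reduced to a fixed requests-per-second capacity per server, the predictor is a plain moving average, and the 20% headroom is arbitrary. None of these choices come from the paper, and all names are hypothetical.

```python
import math


def servers_needed(requests_per_sec, capacity_per_server):
    """Translate a load level into a server count (at least one server)."""
    return max(1, math.ceil(requests_per_sec / capacity_per_server))


def forecast(history, window=3):
    """Predict the next load as the mean of the last `window` samples."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def plan_capacity(history, current_load, capacity_per_server, headroom=1.2):
    """Provision for the larger of predicted and observed load, plus headroom.

    Taking the maximum is the reactive part: a surge that the forecast has
    not caught up with yet still raises the target.
    """
    predicted = forecast(history) if history else current_load
    target_load = max(predicted, current_load) * headroom
    return servers_needed(target_load, capacity_per_server)


history = [100, 120, 140]  # assumed past load in requests per second
print(plan_capacity(history, current_load=110, capacity_per_server=50))
print(plan_capacity(history, current_load=400, capacity_per_server=50))
```

The second call models a flash crowd: the moving average still reflects the old load, so the observed value drives the decision.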
Issues to be addressed (cont.)
● Server consolidation
○ Maximize resource utilization, minimize energy consumption
○ Need to do this without causing server congestion
○ Optimal consolidation is computationally hard - it is often formulated as a variant of bin packing, which is NP-hard
● Energy management
○ In 2006, US data centers consumed 1.5% of total generated energy; powering and cooling account for about 53% of data center operating costs
○ Predicted to increase 0.3% annually
○ Energy-efficient hardware allows lower CPU speeds, turning off unused components
○ Need to balance energy savings with performance
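Server consolidation, listed above, is commonly approached with packing heuristics because finding the optimal placement is expensive. The sketch below uses first-fit decreasing on a single resource dimension. It is a stand-in heuristic chosen for illustration, not a method taken from the paper; real consolidation also has to weigh memory, network and migration cost. Loads are integer CPU percentages and the capacity value is an assumption.

```python
def first_fit_decreasing(vm_loads, server_capacity):
    """Pack VM loads onto as few servers as the heuristic manages.

    Loads are placed largest first, each on the first server that still has
    room. Returns a list of servers, each a list of the loads placed on it.
    """
    servers = []
    for load in sorted(vm_loads, reverse=True):
        if load > server_capacity:
            raise ValueError(f"load {load} exceeds server capacity {server_capacity}")
        for server in servers:
            if sum(server) + load <= server_capacity:
                server.append(load)
                break
        else:
            # No existing server has room, so power on another one.
            servers.append([load])
    return servers


loads = [50, 30, 20, 70, 10, 40]  # assumed CPU demand per VM, in percent
placement = first_fit_decreasing(loads, server_capacity=100)
print(len(placement), placement)
```

Setting `server_capacity` below 100 leaves headroom on every server, which is one simple way to trade some energy savings for a lower risk of congestion.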
Issues to be addressed (cont.)
● Traffic management and analysis
○ Methods used in ISPs and enterprise networks aren’t efficient for use with data centers
○ Data centers differ in link density, number of servers and flow patterns
● Data security - confidentiality and auditability
○ Service providers don’t have access to the physical security of data centers, so they rely on the infrastructure provider for this
○ Auditability is usually accomplished through remote attestation, but VM migration makes this insufficient on its own
○ Needs trust mechanisms - hardware TPM, secure VM monitors
Issues to be addressed (cont.)
● Software frameworks
○ Resource consumption and performance of MapReduce jobs vary based on the type of app
○ VMs allocated to each node may have heterogeneous characteristics (e.g., bandwidth)
○ Optimize through efficient scheduling and configuration to mitigate bottlenecks
○ Challenges include performance modeling of Hadoop jobs and adaptive scheduling
○ MapReduce could in principle be made energy-aware, but this hasn’t been explored
Issues to be addressed (cont.)
● Storage and data management technologies
○ MapReduce and related frameworks (Hadoop, Dryad) run on Internet-scale file systems
○ These systems have compatibility issues with legacy file systems and apps
● Cloud architectures
○ Current “large data center, centralized operation” schemes are expensive to build and power
○ Small centers are easier to power, cheaper to build, and more geo-diverse
○ End users could donate resources - heterogeneity, churn events, incentive schemes are unexplored problems
Conclusion
• Cloud computing has recently emerged as a compelling paradigm for managing and delivering services over the Internet.
• The rise of cloud computing is rapidly changing the landscape of information technology, and ultimately turning the long-held promise of utility computing into a reality.