lec0-cloud computing.ppt

Upload: ishan01us

Post on 19-Feb-2018


TRANSCRIPT

  • 7/23/2019 lec0-cloud computing.ppt

    1/50

    Cloud Computing

    2/50

    Evolution of Computing with Network (1/2)

    Network Computing
      Network is computer (client-server)
      Separation of functionalities

    Cluster Computing
      Tightly coupled computing resources: CPU, storage, data, etc.
      Usually connected within a LAN
      Managed as a single resource
      Commodity, open source

    3/50

    Evolution of Computing with Network (2/2)

    Grid Computing
      Resource sharing across several domains
      Decentralized, open standards
      Global resource sharing

    Utility Computing
      Don't buy computers, lease computing power
      Upload, run, download
      Ownership model

    4/50

    The Next Step: Cloud Computing

    Service and data are in the cloud, accessible with any device connected to the cloud with a browser

    A key technical issue for developers: Scalability

    Services are not known geographically

    5/50

    Applications on the Web

    6/50

    Applications on the Web

    7/50

    Cloud Computing

    Definition
      Cloud computing is a concept of using the internet to allow people to access technology-enabled services.
      It allows users to consume services without knowledge of, or control over, the technology infrastructure that supports them.
      - Wikipedia

    8/50

    Major Types of Cloud

    Compute and Data Cloud
      Amazon Elastic Compute Cloud (EC2), Google MapReduce, Science clouds
      Provide platform for running science code
    Host Cloud
      Google AppEngine
      Highly-available, fault tolerance, robustness for web capability

    Services are not known geographically

    9/50

    Cloud Computing Example - Amazon EC2

    http://aws.amazon.com/ec2

    10/50

    Cloud Computing Example - Google AppEngine

    Google AppEngine API
      Python runtime environment
      Datastore API
      Images API
      Mail API
      Memcache API
      URL Fetch API
      Users API
    A free account can use up to 500 MB storage, enough CPU and bandwidth for about 5 million page views a month
    http://code.google.com/appengine/

    11/50

    Cloud Computing

    Advantages
      Separation of infrastructure maintenance duties from application development
      Separation of application code from physical resources
      Ability to use external assets to handle peak loads
      Ability to scale to meet user demands quickly
      Sharing capability among a large pool of users, improving overall utilization

    Services are not known geographically

    12/50

    Cloud Computing Summary

    Cloud computing is a kind of network service and is a trend for future computing
    Scalability matters in cloud computing technology
    Users focus on application development
    Services are not known geographically

    13/50

    Counting the numbers vs. Programming model

    Personal Computer: One to One
    Client/Server: One to Many
    Cloud Computing: Many to Many

    14/50

    What Powers Cloud Computing in Google?

    Commodity Hardware
      Performance: single machine not interesting
      Reliability
        Most reliable hardware will still fail: fault-tolerant software needed
        Fault-tolerant software enables use of commodity components
      Standardization: use standardized machines to run all kinds of applications

    15/50

    What Powers Cloud Computing in Google?

    Infrastructure Software
      Distributed storage: Distributed File System (GFS)
      Distributed semi-structured data system: BigTable
      Distributed data processing system: MapReduce

    What are the common issues of all these software systems?

    16/50

    Google File System

    Files broken into chunks (typically 64 MB)
    Chunks replicated across three machines for safety (tunable)
    Data transfers happen directly between clients and chunkservers
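
    The chunking and replication idea on this slide can be sketched as follows. This is a minimal illustration, not GFS's actual placement policy: the round-robin placement, the function names split_into_chunks and place_replicas, and the chunkserver names are all assumptions. The 64 MB chunk size and the replication factor of three are the typical GFS defaults.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the typical chunk size
REPLICAS = 3                   # default replication factor (tunable)


def split_into_chunks(file_size, chunk_size=CHUNK_SIZE):
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    return [(off, min(chunk_size, file_size - off))
            for off in range(0, file_size, chunk_size)]


def place_replicas(num_chunks, servers, replicas=REPLICAS):
    """Assign each chunk to `replicas` distinct servers.

    Round-robin placement is an assumption made for this sketch only.
    """
    if replicas > len(servers):
        raise ValueError("need at least as many servers as replicas")
    return {
        chunk_id: [servers[(chunk_id + r) % len(servers)]
                   for r in range(replicas)]
        for chunk_id in range(num_chunks)
    }


# Hypothetical example: a 200 MB file stored on four chunkservers.
chunks = split_into_chunks(200 * 1024 * 1024)
placement = place_replicas(len(chunks), ["cs1", "cs2", "cs3", "cs4"])
```

    In this example the file becomes three full chunks plus one 8 MB tail chunk, and every chunk lands on three different chunkservers.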

    17/50

    GFS Usage @ Google

    200+ clusters
    Filesystem clusters of up to 5000+ machines
    Pools of 10000+ clients
    5+ Petabyte Filesystems
    All in the presence of frequent HW failure

    18/50

    BigTable

    Data model: (row, column, timestamp) -> cell contents
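
    A toy, in-memory sketch of the (row, column, timestamp) -> cell contents data model. It only illustrates the shape of the mapping. The class name SparseTable, its put/get methods and the sample row key are hypothetical and are not BigTable's API.

```python
class SparseTable:
    """Sparse map from (row, column, timestamp) to cell contents."""

    def __init__(self):
        # Only cells that were written are stored, so the map stays sparse.
        self._cells = {}  # (row, column) -> {timestamp: value}

    def put(self, row, column, timestamp, value):
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before `timestamp`.

        With no timestamp, return the latest version; return None when
        no matching cell exists.
        """
        versions = self._cells.get((row, column))
        if not versions:
            return None
        if timestamp is None:
            return versions[max(versions)]
        eligible = [t for t in versions if t <= timestamp]
        return versions[max(eligible)] if eligible else None


table = SparseTable()
table.put("com.example/index", "contents", 1, "<html>v1</html>")
table.put("com.example/index", "contents", 5, "<html>v2</html>")
latest = table.get("com.example/index", "contents")
as_of_3 = table.get("com.example/index", "contents", timestamp=3)
```

    Keeping several timestamped versions per cell is what lets a reader ask for the contents "as of" a given time.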

    19/50

    BigTable

    Distributed multi-level sparse map
      Fault-tolerant, persistent
    Scalable
      Thousands of servers
      Terabytes of in-memory data
      Petabytes of disk-based data
    Self-managing
      Servers can be added/removed dynamically
      Servers adjust to load imbalance

    20/50

    Why not just use commercial DB?

    Scale is too large or cost is too high for most commercial databases
    Low-level storage optimizations help performance significantly
      Much harder to do when running on top of a database layer
    Also fun and challenging to build large-scale systems

    21/50

    BigTable Summary

    Data model applicable to broad range of clients
      Actively deployed in many of Google's services
    System provides high-performance storage system on a large scale
      Self-managing
      Thousands of servers
      Millions of ops/second
      Multiple GB/s reading/writing
    Currently ~500+ BigTable cells
      Largest BigTable cell manages ~3 PB of data spread over several thousand machines

    22/50

    Distributed Data Processing

    Problem: How to count words in the text files?
      Input files: N text files
      Size: multiple physical disks
    Processing phase 1: launch M processes
      Input: N/M text files
      Output: partial results of each word's count
    Processing phase 2: merge M output files of step 1
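
    The two phases can be sketched in a few lines. This is a single-process stand-in under stated assumptions: in-memory strings replace the N text files, the M "processes" run one after another, and the names phase1_count, phase2_merge and word_count are made up for illustration.

```python
from collections import Counter


def phase1_count(texts):
    """Phase 1 worker: count the words in its share of the input texts."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def phase2_merge(partials):
    """Phase 2: merge the M partial results into one total count."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(texts, m):
    """Split N texts into M groups, count each group, then merge."""
    groups = [texts[i::m] for i in range(m)]      # roughly N/M texts per group
    partials = [phase1_count(g) for g in groups]  # sequential stand-in for M processes
    return phase2_merge(partials)


# Hypothetical input: three tiny "files" processed by M = 2 workers.
files = ["the cloud is big", "the grid is open", "cloud cloud"]
result = word_count(files, m=2)
```

    The merge step works because word counts are additive, so partial results can be combined in any order.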

    23/50

    Pseudo Code of WordCount

    24/50

    Task Management

    Logistics
      Decide which computers to run phase 1, make sure the files are accessible (NFS-like or copy)
      Similar for phase 2
    Execution:
      Launch the phase 1 programs with appropriate command line flags, re-launch failed tasks until phase 1 is done
      Similar for phase 2
    Automation: build task scripts on top of existing batch system
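
    A minimal sketch of the "re-launch failed tasks until the phase is done" loop. Tasks are plain Python callables here rather than programs started with command line flags. The names run_phase and flaky_len and the max_attempts limit are assumptions added for illustration.

```python
def run_phase(tasks, run_task, max_attempts=3):
    """Run every task, re-launching the failed ones until all succeed.

    tasks:    dict mapping task id -> argument for run_task
    run_task: callable that raises an exception when a task fails
    """
    results = {}
    pending = dict(tasks)
    attempts = 0
    while pending:
        attempts += 1
        if attempts > max_attempts:
            raise RuntimeError(f"tasks still failing: {sorted(pending)}")
        for task_id, arg in list(pending.items()):
            try:
                results[task_id] = run_task(arg)
            except Exception:
                continue  # keep the task pending so the next round retries it
            del pending[task_id]
    return results


# Hypothetical flaky task: the input "b" fails once before succeeding.
failures_left = {"b": 1}


def flaky_len(name):
    if failures_left.get(name, 0) > 0:
        failures_left[name] -= 1
        raise IOError("simulated crash")
    return len(name)


results = run_phase({"t1": "a", "t2": "b"}, flaky_len)
```

    The same loop would be reused for phase 2, which is the repetitive bookkeeping that the slide suggests automating with task scripts.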

    25/50

    Technical issues

    File management: where to store files?
      Store all files on the same file server -> Bottleneck
      Distributed file system: opportunity to run locally
    Granularity: how to decide N and M?
    Job allocation: assign which task to which node?
      Prefer local job: knowledge of file system
    Fault-recovery: what if a node crashes?
      Redundancy of data
      Crash-detection and job re-allocation necessary
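
    One way to picture "prefer local job" together with re-allocation after a crash is a small greedy allocator. This is a sketch under assumptions, not how any particular scheduler works: the function name allocate, the least-loaded tie-breaking rule and the node names are invented for the example.

```python
def allocate(task_locations, live_nodes):
    """Assign each task to a node, preferring nodes that hold its input.

    task_locations: dict mapping task -> list of nodes storing a replica
    live_nodes:     nodes currently known to be alive
    """
    if not live_nodes:
        raise ValueError("no live nodes available")
    load = {node: 0 for node in live_nodes}
    assignment = {}
    for task, replicas in task_locations.items():
        # Local candidates are live nodes that already store the input.
        local = [n for n in replicas if n in load]
        # If every replica holder has crashed, fall back to any live node.
        candidates = local if local else list(load)
        chosen = min(candidates, key=lambda n: (load[n], n))
        assignment[task] = chosen
        load[chosen] += 1
    return assignment


# Hypothetical cluster: n4 has crashed, so part-2 must run remotely.
locations = {
    "part-0": ["n1", "n2"],
    "part-1": ["n1", "n3"],
    "part-2": ["n4"],
}
assignment = allocate(locations, live_nodes=["n1", "n2", "n3"])
```

    Here part-0 and part-1 run where their data lives, while part-2 is re-allocated to the least loaded surviving node. A real system would read it from another replica, which is why data redundancy matters.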

    26/50

    MapReduce

    A simple programming model that applies to many data-intensive computing problems
    Hide messy details in MapReduce runtime library
      Automatic parallelization
      Load balancing
      Network and disk transfer optimization
      Handling of machine failures
      Robustness
    Easy to use

    27/50

    MapReduce Programming Model

    Borrowed from functional programming
      map(f, [x1, ..., xm, ...]) = [f(x1), ..., f(xm), ...]
      reduce(f, x1, [x2, x3, ...]) = reduce(f, f(x1, x2), [x3, ...])
                                   = ...
                                   (continue until the list is exhausted)

    Users implement two functions
      map (in_key, in_value) -> (key, value) list
      reduce (key, [value1, ..., valuem]) -> f_value
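
    The functional-programming origin can be tried directly in Python. The recursive my_reduce below follows the slide's definition step by step; its name is made up for this sketch. It is checked against functools.reduce, which takes its arguments in a different order (function, sequence, initial value).

```python
from functools import reduce


def my_reduce(f, x1, xs):
    """reduce(f, x1, [x2, x3, ...]) = reduce(f, f(x1, x2), [x3, ...])."""
    if not xs:  # the list is exhausted, so x1 holds the final result
        return x1
    return my_reduce(f, f(x1, xs[0]), xs[1:])


# map applies f to every element independently.
squares = list(map(lambda x: x * x, [1, 2, 3, 4]))

# reduce folds the list into a single value, starting from x1 = 0.
total = my_reduce(lambda a, b: a + b, 0, squares)
builtin_total = reduce(lambda a, b: a + b, squares, 0)
```

    Because map treats each element independently, it can be spread over many machines. That independence is the property the MapReduce model builds on.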

    28/50

    MapReduce - New Model and System

    Two phases of data processing
      Map: (in_key, in_value) -> {(key_j, value_j) | j = 1...k}
      Reduce: (key, [value_1, ..., value_m]) -> (key, f_value)

    29/50

    MapReduce Version of Pseudo Code

    No File I/O
    Only data processing logic

    30/50

    Example - WordCount (1/2)

    Input is files with one document per record
    Specify a map function that takes a key/value pair
      Key = document URL
      Value = document contents
    Output of map function is key/value pairs. In our case, output (w, "1") once per word in the document

    31/50

    Example - WordCount (2/2)

    MapReduce library gathers together all pairs with the same key (shuffle/sort)
    The reduce function combines the values for a key. In our case, compute the sum
    Output of reduce paired with key and saved
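
    The WordCount example can be sketched end to end with a tiny in-memory driver. This is a single-process illustration and not Google's or Hadoop's API: run_mapreduce, map_fn and reduce_fn are hypothetical names, and the grouping dictionary stands in for the library's shuffle/sort step.

```python
from collections import defaultdict


def map_fn(doc_url, contents):
    """Map: emit (w, "1") once per word in the document."""
    for word in contents.split():
        yield (word, "1")


def reduce_fn(word, values):
    """Reduce: combine all values for one key; here, compute the sum."""
    return str(sum(int(v) for v in values))


def run_mapreduce(records, map_fn, reduce_fn):
    """Toy driver: map every record, group by key, then reduce each key."""
    groups = defaultdict(list)
    for in_key, in_value in records:
        for key, value in map_fn(in_key, in_value):
            groups[key].append(value)  # stand-in for shuffle/sort
    # The output of reduce is paired with its key.
    return {key: reduce_fn(key, groups[key]) for key in sorted(groups)}


# Hypothetical input: (document URL, document contents) records.
docs = [
    ("http://example.com/a", "to be or not to be"),
    ("http://example.com/b", "to do"),
]
counts = run_mapreduce(docs, map_fn, reduce_fn)
```

    The user-written part is only map_fn and reduce_fn. Everything in run_mapreduce is what the library would provide, in distributed form.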

    32/50

    MapReduce Framework

    For certain classes of problems, the MapReduce framework provides:
      Automatic and efficient parallelization/distribution
      I/O scheduling: run mapper close to input data
      Fault-tolerance: restart failed mapper or reducer tasks on the same or different nodes
      Robustness: tolerate even massive failures, e.g. during large-scale network maintenance, once lost 1800 out of 2000 machines
      Status/monitoring

    33/50

    Task Granularity And Pipelining

    Fine granularity tasks: many more map tasks than machines
      Minimizes time for fault recovery
      Can pipeline shuffling with map execution
      Better dynamic load balancing
    Often use 200,000 map/5000 reduce tasks with 2000 machines
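
    The load-balancing benefit of fine-grained tasks can be illustrated with a small simulation. Everything in it is an assumption chosen for the example: four workers, one of them half as fast, a fixed total of 100 units of work, and a scheduler where an idle worker simply pulls the next task.

```python
import heapq


def makespan(task_sizes, speeds):
    """Simulate idle workers pulling tasks; return when the last task ends."""
    # Heap of (time the worker becomes free, worker index).
    free_at = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for size in task_sizes:
        t, w = heapq.heappop(free_at)  # the earliest idle worker takes the task
        done = t + size / speeds[w]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, w))
    return finish


speeds = [1.0, 1.0, 1.0, 0.5]          # the last worker is a straggler
coarse = makespan([25.0] * 4, speeds)  # one big task per machine
fine = makespan([1.0] * 100, speeds)   # many more tasks than machines
```

    With one task per machine the job waits 50 time units for the slow worker. With 100 small tasks the fast workers absorb the extra work and the job ends at 29. The same reasoning explains the shorter fault recovery: a crashed machine loses only small tasks, which can be spread across all the others.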

    43/50

    MapReduce: Uses at Google

    Typical configuration: 200,000 mappers, 500 reducers on 2,000 nodes
    Broad applicability has been a pleasant surprise
      Quality experiences, log analysis, machine translation, ad-hoc data processing
    Production indexing system: rewritten with MapReduce
      ~10 MapReductions, much simpler than old code

    44/50

    MapReduce Summary

    MapReduce is proven to be a useful abstraction
    Greatly simplifies large-scale computation at Google
    Fun to use: focus on problem, let library deal with messy details

    45/50

    Data Playground

    MapReduce + BigTable + GFS = Data playground
      Substantial fraction of internet available for processing
      Easy-to-use teraflops/petabytes, quick turn-around
      Cool problems, great colleagues

    47/50

    Open Source Cloud Software: Project Hadoop

    Google published papers on GFS ('03), MapReduce ('0

    48/50

    Industrial Interest in Hadoop

    Yahoo! hired core Hadoop developers
      Announced that their Webmap is produced on a Hadoop cluster with 2000 hosts (dual/quad cores) on Feb. 19, 2008.
    Amazon EC2 (Elastic Compute Cloud) supports Hadoop
      Write your mapper and reducer, upload your data and program, run and pay by resource utilization
      TIFF-to-PDF conversion of 11 million scanned New York Times articles (1851-1922) done in 24 hours on Amazon S3/EC2 with Hadoop on 100 EC2 machines
      Many Silicon Valley startups are using EC2 and starting to use Hadoop for their coolest ideas on internet-scale data
    IBM announced "Blue Cloud," which will include Hadoop among other software components

    49/50

    AppEngine

    Run your application on Google infrastructure and data centers
      Focus on your application, forget about machines, operating systems, web server software, database setup/maintenance, load balancing, etc.
    Opened for public sign-up on 2008/5/28
    Python API to Datastore and Users
    Free to start, pay as you expand
    http://code.google.com/appengine/

    50/50

    Summary

    Cloud computing is about scalable web applications and data processing needed to make apps interesting
    Lots of commodity PCs: good for scalability and cost
    Build web applications to be scalable from the start
      AppEngine allows developers to use Google's scalable infrastructure and data centers
      Hadoop enables scalable data processing