
Page 1: Desktop Grids

December 8 & 9, 2005, Austin, TX
SURA Cyberinfrastructure Workshop Series: Grid Technology: The Rough Guide

Desktop Grids

Ashok Adiga
Texas Advanced Computing Center
{[email protected]}

Page 2: Desktop Grids


Topics

• What makes Desktop Grids different?
• What applications are suitable?
• Three Solutions:
  – Condor
  – United Devices Grid MP
  – BOINC

Page 3: Desktop Grids


Compute Resources on the Grid

• Traditional: SMPs, MPPs, clusters, …
  – High speed, reliable, homogeneous, dedicated, expensive (but getting cheaper)
  – High-speed interconnects
  – Up to 1000s of CPUs
• Desktop PCs and workstations
  – Low speed (but improving!), heterogeneous, unreliable, non-dedicated, inexpensive
  – Generic connections (Ethernet)
  – 1000s-10,000s of CPUs
  – Grid compute power increases as desktops are upgraded

Page 4: Desktop Grids


Desktop Grid Challenges

• Unobtrusiveness
  – Harness underutilized computing resources without impacting the primary desktop user
• Added security requirements
  – Desktop machines are typically not in a secure environment
  – Must protect the desktop and the program from each other (sandboxing)
  – Must ensure secure communications between grid nodes
• Connectivity characteristics
  – Not always connected to the network (e.g. laptops)
  – Might not have a fixed identifier (e.g. dynamic IP addresses)
• Limited network bandwidth
  – Ideal applications have a high compute-to-communication ratio
  – Data management is critical to performance

Page 5: Desktop Grids


Desktop Grid Challenges (cont’d)

• Job scheduling on heterogeneous, non-dedicated resources is complex
  – Must match application requirements to resource characteristics
  – Meeting QoS targets is difficult since a program might have to share the CPU with other desktop activity
• Desktops are typically unreliable
  – The system must detect and recover from node failures
• Scalability issues
  – The software has to manage thousands of resources
  – Conventional application licensing is not set up for desktop grids

Page 6: Desktop Grids


Application Feasibility

• Only some applications map well to desktop grids:
  – Coarse-grain data parallelism
  – Parallel chunks are relatively independent
  – High computation-to-communication ratios
  – Non-intrusive behavior on the client device
    • Small memory footprint on the client
    • Limited I/O activity
  – Executable and data sizes are dictated by the available bandwidth

Page 7: Desktop Grids


Typical Applications

• Desktop grids naturally support data-parallel applications (a minimal workunit sketch follows this list):
  – Monte Carlo methods
  – Large database searches
  – Genetic algorithms
  – Exhaustive search techniques
  – Parametric design
  – Asynchronous iterative algorithms
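As a concrete illustration of how small such a data-parallel chunk can be, here is a hypothetical Monte Carlo workunit sketched as a shell script; the script name, point count, and output convention are assumptions, not from the talk. Each desktop runs it with a different seed, and the chunks share nothing.

#!/bin/sh
# mc_pi_workunit.sh <seed> -- hypothetical Monte Carlo workunit (sketch).
# Draws one million random points in the unit square and prints how many
# fall inside the quarter circle; workunits are fully independent.
awk -v seed="$1" 'BEGIN {
    srand(seed)
    n = 1000000; hits = 0
    for (i = 0; i < n; i++) {
        x = rand(); y = rand()
        if (x*x + y*y <= 1.0) hits++
    }
    print hits, n    # tiny partial result returned to the server
}'

A final job sums the two columns over all returned results and reports pi ≈ 4 × hits / n. Each workunit returns a few bytes after millions of operations, exactly the high compute-to-communication ratio desktop grids need.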

Page 8: Desktop Grids


Condor

• Condor manages pools of workstations and dedicated clusters to create a distributed high-throughput computing (HTC) facility
  – Created at the University of Wisconsin
  – Project established in 1985
• Initially targeted at scheduling clusters, providing functions such as:
  – Queuing
  – Scheduling
  – Priority schemes
  – Resource classifications
• Later extended to manage non-dedicated resources:
  – Sandboxing
  – Job preemption

Page 9: Desktop Grids


Why use Condor?

• Condor has several unique mechanisms, such as:
  – ClassAd matchmaking
  – Process checkpoint/restart/migration
  – Remote system calls
  – Grid awareness
  – Glideins
• Support for multiple “universes”
  – Vanilla, Java, MPI, PVM, Globus, …
• Very simple to install, manage, and use (see the sketch below)
  – A natural environment for application developers
• Free!
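To back up the “very simple to use” claim, here is a minimal vanilla-universe submit description file for a hypothetical executable myprog (the POVray slide later in this talk shows a fuller example of the same format):

# myprog.sub -- minimal submit description file (hypothetical job)
universe   = vanilla
executable = myprog
output     = myprog.out
error      = myprog.err
log        = myprog.log
queue

Submit it with condor_submit myprog.sub and monitor it with condor_q.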

Page 10: Desktop Grids


Typical Condor Pool

[Diagram: the Central Manager runs a master, collector, negotiator, schedd, and startd. Submit-only nodes run a master and schedd; execute-only nodes run a master and startd; regular nodes run a master, schedd, and startd. Arrows distinguish ClassAd communication pathways from spawned processes.]

Page 11: Desktop Grids


Condor ClassAds

• ClassAds are at the heart of Condor
• ClassAds:
  – are a set of uniquely named expressions; each expression is called an attribute
  – combine query and data
  – are semi-structured: no fixed schema
  – are extensible

Page 12: Desktop Grids


Sample ClassAd

MyType = "Machine"
TargetType = "Job"
Machine = "froth.cs.wisc.edu"
Arch = "INTEL"
OpSys = "SOLARIS251"
Disk = 35882
Memory = 128
KeyboardIdle = 173
LoadAvg = 0.1000
Requirements = TARGET.Owner=="smith" || LoadAvg<=0.3 && KeyboardIdle>15*60
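Matchmaking pairs machine ads like the one above with job ads. A plausible job ClassAd that would match it might look like this (the values are illustrative, not from the talk):

MyType = "Job"
TargetType = "Machine"
Owner = "smith"
Cmd = "/home/smith/sim"
Requirements = TARGET.Arch == "INTEL" && TARGET.OpSys == "SOLARIS251" && TARGET.Memory >= 64
Rank = TARGET.Memory

The negotiator considers the pair matched only when each ad’s Requirements expression evaluates to true against the other ad; here the machine accepts the job because its owner is "smith", regardless of current load.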

Page 13: Desktop Grids


Condor Flocking

• Central managers can allow schedds from other pools to submit to them.

[Diagram: a submit machine’s schedd talks to the collector and negotiator of its own central manager (CONDOR_HOST), and can also flock to the collectors and negotiators of the Pool-Foo and Pool-Bar central managers.]

A configuration sketch follows.
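Flocking is configured rather than programmed. A minimal sketch, assuming hypothetical host names: FLOCK_TO on the submitting side lists remote central managers to try when the local pool is busy, and the remote pool grants access with FLOCK_FROM.

# condor_config on the submit machine in the local pool
FLOCK_TO = cm.pool-foo.edu, cm.pool-bar.edu

# condor_config on Pool-Foo's central manager
FLOCK_FROM = submit.local-pool.edu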

Page 14: Desktop Grids


Example: POVray on UT Grid Condor

[Rendered image with timings: was 2 h 17 min, now 15 min; individual slices render in 5-8 min.]

Page 15: Desktop Grids


Parallel POVray on Condor

A. Submitting POVray to the Condor pool via a Perl script
   1. Automated creation of image “slices”
   2. Automated creation of Condor submit files
   3. Automated creation of a DAG file
   4. Using DAGman for job-flow control
B. Multiple-architecture support
   1. Executable = povray.$$(OpSys).$$(Arch)
C. Post-processing with a C executable
   1. “Stitching” the image slices back together into one image file
   2. Using “xv” to display the image on the user’s desktop
      • Alternatively, transferring the image file back to the user’s desktop

A shell sketch of the slice-generation step follows.
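The talk drove step A with a Perl script; below is a minimal shell sketch of just the slice-generation idea, assuming a 640×480 render split into the 13 bands that the DAG on the following slides submits. POV-Ray’s Start_Row/End_Row INI options select each band.

#!/bin/sh
# make_slices.sh -- emit one POV-Ray .ini per horizontal image slice (sketch)
WIDTH=640; HEIGHT=480; SLICES=13
i=0
while [ "$i" -lt "$SLICES" ]; do
    START=$(( i * HEIGHT / SLICES + 1 ))
    END=$(( (i + 1) * HEIGHT / SLICES ))
    cat > "glasschess_$i.ini" <<EOF
Input_File_Name=glasschess.pov
Output_File_Name=glasschess_$i.ppm
Width=$WIDTH
Height=$HEIGHT
Start_Row=$START
End_Row=$END
EOF
    i=$(( i + 1 ))
done

A companion loop would emit the matching povray_submit_N.cmd files, one per slice.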

Page 16: Desktop Grids


POVray Submit Description File

Universe = vanilla
Executable = povray.$$(OpSys).$$(Arch)
Requirements = (Arch == "INTEL" && OpSys == "LINUX") || \
               (Arch == "INTEL" && OpSys == "WINNT51") || \
               (Arch == "INTEL" && OpSys == "WINNT52")
transfer_files = ONEXIT
Input = glasschess_0.ini
Error = Errfile_0.err
Output = glasschess_0.ppm
transfer_input_files = glasschess.pov, chesspiece1.inc
arguments = glasschess_0.ini
log = glasschess_0_condor.log
notification = NEVER
queue

Page 17: Desktop Grids


DAGman Job Flow

[Diagram: jobs A0, A1, A2, …, An run in parallel, each declared PARENT of job B, so B acts as a barrier; a script performs pre-processing prior to executing job B.]

Page 18: Desktop Grids


DAGman Submission Script

# Filename: povray.dag
Job A0 ./submit/povray_submit_0.cmd
Job A1 ./submit/povray_submit_1.cmd
Job A2 ./submit/povray_submit_2.cmd
Job A3 ./submit/povray_submit_3.cmd
Job A4 ./submit/povray_submit_4.cmd
Job A5 ./submit/povray_submit_5.cmd
Job A6 ./submit/povray_submit_6.cmd
Job A7 ./submit/povray_submit_7.cmd
Job A8 ./submit/povray_submit_8.cmd
Job A9 ./submit/povray_submit_9.cmd
Job A10 ./submit/povray_submit_10.cmd
Job A11 ./submit/povray_submit_11.cmd
Job A12 ./submit/povray_submit_12.cmd
Job B barrier_job_submit.cmd
PARENT A0 CHILD B
PARENT A1 CHILD B
PARENT A2 CHILD B
PARENT A3 CHILD B
PARENT A4 CHILD B
PARENT A5 CHILD B
PARENT A6 CHILD B
PARENT A7 CHILD B
PARENT A8 CHILD B
PARENT A9 CHILD B
PARENT A10 CHILD B
PARENT A11 CHILD B
PARENT A12 CHILD B
Script PRE B postprocessing.sh glasschess

Submit the DAG with:

$ condor_submit_dag povray.dag

postprocessing.sh (stitches the slices and displays the result):

#!/bin/sh
./stitchppms glasschess > glasschess.ppm 2> /dev/null
rm *_*.ppm *.ini Err* *.log povray.dag.*
/usr/X11R6/bin/xv $1.ppm

Barrier job B’s executable (a no-op placeholder):

#!/bin/sh
/bin/sleep 1

Page 19: Desktop Grids


United Devices Grid MP

• A commercial product that aggregates unused cycles on desktop machines to provide a computing resource
• Originally designed for non-dedicated resources
  – Security, non-intrusiveness, scheduling, …
  – Screensaver/graphical GUI on the client desktop
• Support for multiple clients
  – Windows, Linux, Mac, AIX, & Solaris clients

Page 20: Desktop Grids


How Grid MP™ Works

[Diagram: users and an administrator interact with central Grid MP Services, which dispatch work to Grid MP Agents on clusters, workstations/desktops, and servers. Workload types shown: low-latency parallel jobs, large sequential jobs, large data-parallel jobs.]

• User (via web browser interface, command line interface, or XML web services API):
  – Submits jobs
  – Monitors job progress
  – Processes results
• Grid MP Services:
  – Authenticate users and devices
  – Dispatch jobs based on priority
  – Monitor and reschedule failed jobs
  – Collect job results
• Grid MP Agent:
  – Advertises capability
  – Launches jobs
  – Executes jobs securely
  – Returns results
  – Caches data for reuse

Page 21: Desktop Grids


UD Management Features

• Enterprise features make it easier to convince traditional IT organizations and individual desktop users to install the software
  – Browser-based administration tools allow local management and policy specification for
    • Devices
    • Users
    • Workloads
  – Single-click install of the client on PCs
    • Easily customizable to work with software management packages

Page 22: Desktop Grids


Grid MP™ Provisioning Example

[Diagram: a root administrator provisions Grid MP Services; device group administrators manage Device Groups X, Y, and Z, whose cycles are allocated to User Groups A and B under per-group policies:]

• Device Group X
  – User Groups A = 50%, B = 25%
  – Usage: 8am-5pm, 2 hr cut-off
  – Runnable application list …
• Device Group Y
  – User Group B = 100%
  – Usage: 24 hrs, 1 hr cut-off
  – Runnable application list …
• Device Group Z
  – User Groups A = 50%, B = 50%
  – Usage: 6pm-8am, 8 hr cut-off
  – Runnable application list …

Page 23: Desktop Grids


Application Types Supported

• Batch jobs
  – Use the mpsub command to run a single executable on a single remote desktop
• MPI jobs
  – Use the ud_mpirun command to run an MPI job across a set of desktop machines
• Data-parallel jobs
  – A single job consists of several independent workunits that can be executed in parallel
  – The application developer must create program modules and write application scripts to create workunits

Page 24: Desktop Grids


Hosted Applications

• Hosted applications are easier to manage
  – Provide users with a managed application
  – Great for applications that are run frequently but rarely updated
  – Data-parallel applications fit the hosted scenario best
  – Users do not have to deal with application maintenance; only the developer does
• Grid MP is optimized for running hosted applications
  – Applications and data are cached at client nodes
  – Affinity scheduling minimizes data movement by re-using cached executables and data
  – A hosted application can be run across multiple platforms by registering executables for each platform

Page 25: Desktop Grids


Example: Reservoir Simulation

• Landmark’s VIP product benchmarked on Grid MP
• Workload consisted of 240 simulations for 5 wells
  – Sensitivities investigated include:
    • 2 PVT cases
    • 2 fault connectivity cases
    • 2 aquifer cases
    • 2 relative permeability cases
    • 5 combinations of 5 wells
    • 3 combinations of vertical permeability multipliers
  – (2 × 2 × 2 × 2 × 5 × 3 = 240 combinations)
  – Each simulation packaged as a separate piece of work
• A similar reservoir simulation application has been developed at TACC (with Dr. W. Bangerth, Institute for Geophysics)

Page 26: Desktop Grids


Example: Drug Discovery

• Think & LigandFit applications
  – Internet project, in partnership with Oxford University, to model interactions between proteins and potential drug molecules
  – Virtual screening of drug molecules to reduce time-consuming, expensive lab testing by 90%
  – Drug database of 3.5 billion candidate molecules
  – Over 350K active computers participating all over the world

Page 27: Desktop Grids


Think

• Code developed at Oxford University
• Application characteristics:
  – Typical input data file: < 1 KB
  – Typical output file: < 20 KB
  – Typical execution time: 1000-5000 minutes
  – Floating-point intensive
  – Small memory footprint
  – Fully resolved executable is ~3 MB in size

Page 28: Desktop Grids


Grid MP: POVray Application Portal

Page 29: Desktop Grids


BOINC

• Berkeley Open Infrastructure for Network Computing (BOINC)
  – Open-source follow-on to SETI@home
  – General architecture supports multiple applications
  – Targets volunteer resources, not enterprise desktops/workstations
  – More information at http://boinc.berkeley.edu
• Currently being used by several internet projects

Page 30: Desktop Grids


Structure of a BOINC project

[Diagram: a scheduling server (C++) and several data servers (HTTP) front the BOINC database (MySQL) and web interfaces (PHP); back-end daemons handle work generation, retry generation, result validation, result processing, and garbage collection.]

• Ongoing tasks:
  – Monitor server correctness
  – Monitor server performance
  – Develop and maintain applications

Page 31: Desktop Grids


BOINC

• No enterprise management tools
  – Focus on the “volunteer grid”
    • Provides incentives (points, teams, website)
    • Basic browser interface to set usage preferences on PCs
    • Support for the user community (forums)
• Simple interface for job management
  – The application developer creates scripts to submit jobs and retrieve results
• Provides a sandbox on the client
• No encryption: uses redundant computing to prevent spoofing (a toy illustration follows)
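BOINC’s real validators are per-project C++ daemons; purely to illustrate the redundant-computing idea, here is a toy shell sketch that accepts a canonical result when at least two of three replicas agree byte-for-byte (the file names are assumptions):

#!/bin/sh
# validate_wu.sh -- toy 2-of-3 majority vote over replica outputs (sketch)
R0=result_0.out; R1=result_1.out; R2=result_2.out
if cmp -s "$R0" "$R1" || cmp -s "$R0" "$R2"; then
    cp "$R0" canonical.out    # R0 agrees with at least one other replica
elif cmp -s "$R1" "$R2"; then
    cp "$R1" canonical.out
else
    echo "no quorum; issue another replica" >&2
    exit 1
fi

A cheater would have to control a majority of a workunit’s replicas to slip a forged result past the vote.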

Page 32: Desktop Grids


Projects using BOINC

• Climateprediction.net: study climate change
• Einstein@home: search for gravitational signals emitted by pulsars
• LHC@home: improve the design of the CERN LHC particle accelerator
• Predictor@home: investigate protein-related diseases
• Rosetta@home: help researchers develop cures for human diseases
• SETI@home: look for radio evidence of extraterrestrial life
• Cell Computing: biomedical research (Japanese; requires nonstandard client software)
• World Community Grid: advance our knowledge of human disease (requires version 5.2.1 or greater)

Page 33: Desktop Grids


SETI@home

• Analysis of radio telescope data from Arecibo
  – SETI: search for narrowband signals
  – Astropulse: search for short broadband signals
• Per workunit: 0.3 MB in, ~4 CPU hours, 10 KB out
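Those figures make the earlier bandwidth argument concrete: a workunit moves roughly 0.31 MB in total while consuming about 4 CPU hours (14,400 s), an average of only ~20 bytes/s per host, so even a modem-connected desktop stays compute-bound.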

Page 34: Desktop Grids


Climateprediction.net

• Climate change study (Oxford University)
  – Met Office model (FORTRAN, 1M lines)
• Input: ~10 MB executable, 1 MB data
• Output per workunit:
  – 10 MB summary (always uploaded)
  – 1 GB detail file (archived on the client, may be uploaded)
• CPU time: 2-3 months (can’t migrate)
  – Trickle messages
  – Preemptive scheduling

Page 35: Desktop Grids


Why use Desktop Grids?

• Desktop grid solutions are typically complete & standalone
  – Easy to set up and manage
  – A good entry vehicle for trying out grids
• They use existing (but underutilized) resources
  – The number of desktops/workstations on a campus (or in an enterprise) is typically an order of magnitude greater than that of traditional compute resources
  – The power of the grid grows over time as new, faster desktops are added
• The typically large number of resources on a desktop grid enables new approaches to solving problems