
Volunteer Computing with BOINC

Dr. David P. Anderson
University of California, Berkeley

SC10
Nov. 14, 2010

Goals

• Explain volunteer computing
• Teach how to create a volunteer computing project using BOINC
• Target audience: high-throughput computing users
• Technical skills:
  – Basic Linux/Apache sysadmin
  – Familiarity with PHP, SQL and XML
  – C/C++ (optional)

Outline

• Why use volunteer computing?
• Basic concepts of BOINC
• Developing BOINC applications
• (15 minute break)
• Deploying a BOINC server
• Deploying applications
• Submitting jobs
• Organizational issues

Part 1:

Why use volunteer computing?

The Consumer Digital Infrastructure

• 1 billion PCs
  – current GPUs: 1 TeraFLOPS (1,000 ExaFLOPS total)
  – Storage: ~1,000 Exabytes
• Commodity Internet: 10-1,000 Mbps to home
• Consumers pay for
  – hardware
  – sysadmin
  – network costs
  – electricity

Volunteer computing

• PC owners donate computing resources to projects (e.g., computational science)
• Applications run at zero priority while the PC is in use, and/or while the PC is not in use

Examples

Project          Start  Where        Area            Peak #hosts
GIMPS            1994                math            10,000
distributed.net  1995                cryptography    100,000
SETI@home I      1999   UCB          SETI            600,000
Folding@home     1999   Stanford     biology         200,000
United Devices   2002   commercial   biomedicine     200,000
CPDN             2003   Oxford       climate change  150,000
LHC@home         2004   CERN         physics         60,000
Predictor@home   2004   Scripps      biology         100,000
WCG              2004   commercial   biomedicine     200,000
Einstein@home    2005   LIGO         astrophysics    200,000
SETI@home II     2005   UCB          SETI            850,000
Rosetta@home     2005   U. Wash      biology         100,000
SIMAP            2005   T.U. Munich  bioinformatics  10,000
...              ...    ...          ...             ...

Current status

• ~50 projects
• 500,000 volunteers
• 800,000 computers

[Figure: bar chart by processor type. Categories: NVIDIA, CPU, PS3 (Cell), ATI. Bar labels: 4.6, 2.4, 2.2, 1.2. Axis: 0 to 5; units not shown.]

[Figure: computing paradigms arranged by # processors (1, 100, 1000, 10K-1M) and by single job vs. multiple jobs. High-performance computing: cluster (MPI), supercomputer. High-throughput computing: cluster (batch), Grid, commercial cloud, volunteer computing.]

Volunteer computing is different

• You don’t buy resources; you ask for them
• Resources are:
  – heterogeneous
  – sporadically available and connected
  – untrusted and not private
  – behind firewalls/NATs/proxies

Part 2:

Basic concepts of BOINC

About BOINC

• Funded by NSF since 2002
• Open-source (LGPL)
• Based at UC Berkeley
• Few staff, but lots of volunteers
  – software testing
  – translation
  – documentation
  – support (email lists, message boards, Skype)

Volunteers and projects

[Figure: volunteers on one side, projects (CPDN, LHC@home, WCG) on the other, linked by attachments]

BOINC software overview

[Figure: the volunteer host (client, apps, screensaver, GUI) communicates over HTTP with the project server (scheduler, data server, daemons, MySQL)]

BOINC scheduler

[Figure: applications have app versions (Win32, Win32 N-core, Win32 + NVIDIA, Win64, Mac OS X); jobs have instances]

Sent by the client to the scheduler:

• HW, SW description
• existing workload
• per resource type:
  – # of instances requested
  – # of seconds requested

Returned by the scheduler:

• app version descriptions
• job descriptions

Job replication

• Job instances may fail or return wrong results
• Job replication: do 2, see if they agree
  – “agree” may be fuzzy
• Homogeneous replication
  – numerical equivalence of hosts
• Adaptive replication
  – reduce replication for hosts that seem trustworthy
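The idea of replication with fuzzy agreement can be sketched in a few lines of Python. This is only an illustration, not BOINC's validator: the function names, the representation of a result as a list of floats, and the tolerance are all assumptions made for the example. Only the notion of a minimum quorum comes from the slides.

```python
import math

def results_agree(a, b, rel_tol=1e-6):
    """Fuzzy comparison: two results agree if every value matches within rel_tol."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )

def find_canonical(results, min_quorum=2, rel_tol=1e-6):
    """Return the first result that at least min_quorum instances agree on, else None."""
    for i, candidate in enumerate(results):
        # Count the candidate itself plus every other instance that agrees with it.
        matches = 1 + sum(
            results_agree(candidate, other, rel_tol)
            for j, other in enumerate(results)
            if j != i
        )
        if matches >= min_quorum:
            return candidate
    return None

# Two instances agree up to rounding; a third returned a wrong value.
print(find_canonical([[1.0, 2.0], [1.0, 2.0000001], [1.0, 5.0]]))
# Two instances that disagree: no quorum yet, so another instance would be needed.
print(find_canonical([[1.0, 2.0], [1.0, 5.0]]))
```

When no quorum is reached, the natural next step is to issue another instance of the job and check again.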

The job pipeline

work generator → BOINC → validator → assimilator

The BOINC data model

• App versions, job inputs, and job outputs can each consist of arbitrarily many files
• Each file has a physical name (unique, immutable); each reference to a file has a “logical name”
• Files have various attributes (e.g., sticky)
• Each file can have one or more URLs, and is transferred via HTTP
• App version files are digitally signed

What kinds of jobs can BOINC handle?

• Pretty much anything you’d run on a Grid
• Bag of tasks (but IPC support soon)
• Short/long jobs
• Data intensive, up to a point
• Geared towards
  – Few apps, many jobs (high startup cost per app)
  – Jobs with high slack time

Part 3:

Application development for BOINC

The BOINC runtime environment

[Figure: processes and files in the BOINC runtime environment]

Native BOINC applications

• boinc_init()
  – create runtime system thread
• boinc_finish()
  – write finish file
• boinc_resolve_filename(logical, physical)
• boinc_fraction_done(x)

Checkpointing

• bool boinc_time_to_checkpoint()
  – call when in checkpointable state
• boinc_checkpoint_done()
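The checkpoint/resume pattern behind these calls can be simulated in plain Python. This sketch does not use the BOINC API (which is C/C++); `run_job`, the JSON state file, and the fixed checkpoint interval are hypothetical stand-ins chosen for illustration. The comments mark where a native application would call boinc_time_to_checkpoint() and boinc_checkpoint_done().

```python
import json
import os
import tempfile

def run_job(n_steps, state_path, checkpoint_every=100, stop_after=None):
    """Sum the integers 0..n_steps-1, resuming from state_path if it exists.

    stop_after simulates the job being killed after that many steps.
    Returns the final total, or None if the run was interrupted.
    """
    step, total = 0, 0
    if os.path.exists(state_path):
        # Resume from the last checkpoint instead of starting over.
        with open(state_path) as f:
            saved = json.load(f)
        step, total = saved["step"], saved["total"]

    executed = 0
    while step < n_steps:
        if stop_after is not None and executed >= stop_after:
            return None
        total += step
        step += 1
        executed += 1
        # Stand-in for boinc_time_to_checkpoint(): here, a fixed step interval.
        if step % checkpoint_every == 0:
            # Write to a temporary file and rename it, so an interruption
            # never leaves a half-written checkpoint behind.
            tmp = state_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump({"step": step, "total": total}, f)
            os.replace(tmp, state_path)
            # Stand-in for boinc_checkpoint_done().
    return total

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "state.json")
    first = run_job(1000, path, stop_after=250)  # interrupted; last checkpoint at step 200
    second = run_job(1000, path)                 # resumes from step 200 and finishes
    print(first, second)
```

The work done between the last checkpoint and the interruption (steps 200 to 249 here) is repeated on resume, which is why the checkpoint has to be taken in a consistent state.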

The BOINC wrapper

• Can use for legacy apps
• XML input file lists sub-jobs
  – executable, input files
• What it does:
  – interfaces to BOINC client
  – copies files to/from slot directory
  – runs executables
  – does checkpointing at sub-job level

Building app versions

• Linux
  – gcc
• Windows
  – Visual Studio
  – MinGW (gcc)
• Mac OS X
  – Xcode

Multithread apps

• boinc_init_parallel()
• Allows suspend/resume of all threads
  – Unix: fork/exec
  – Windows: direct thread control

GPU app versions

• Develop for NVIDIA or ATI, with CUDA, CAL, OpenCL, etc. (BOINC supplies samples)
• Each version has a “plan class”
• For each plan class, supply a function that determines
  – can the app run on this host? (hardware, driver version, etc.)
  – what resources will it use? (#CPUs, #GPUs, GPU RAM, etc.)
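A plan class function answers the two questions above for a given host. The Python sketch below only illustrates that shape; in BOINC the function lives in the server's C++ code, and the `Host` and `Usage` types, the field names, and the thresholds here are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Host:
    """Hypothetical description of a volunteer host."""
    gpu_vendor: Optional[str]  # e.g. "NVIDIA", "ATI", or None if no GPU
    gpu_ram_mb: int
    driver_version: int
    n_cpus: int

@dataclass
class Usage:
    """Resources one job instance is expected to use."""
    n_cpus: float
    n_gpus: float
    gpu_ram_mb: int

def cuda_plan_class(host: Host) -> Optional[Usage]:
    """Return expected resource usage if the app version can run on host, else None."""
    min_driver = 19038    # illustrative threshold, not taken from BOINC
    min_gpu_ram_mb = 256  # illustrative threshold, not taken from BOINC
    if host.gpu_vendor != "NVIDIA":
        return None
    if host.driver_version < min_driver or host.gpu_ram_mb < min_gpu_ram_mb:
        return None
    # The GPU does most of the work; a fraction of one CPU feeds it.
    return Usage(n_cpus=0.1, n_gpus=1.0, gpu_ram_mb=min_gpu_ram_mb)

print(cuda_plan_class(Host("NVIDIA", 512, 19500, 4)))
print(cuda_plan_class(Host("ATI", 1024, 19500, 4)))
```

Returning the resource usage together with the yes/no answer lets the scheduler account for fractional CPU use and GPU memory when deciding how many instances a host can run.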

VM apps

• Develop apps on your favorite OS
• Create a VirtualBox VM image
• App version consists of
  – VM wrapper (supplied by BOINC)
  – VM image
  – app executable

Part 4:

Deploying a BOINC server

Hardware options

• Native Linux host
  – download/compile BOINC software
• BOINC server VM (VMware/Debian)
• BOINC Amazon EC2 image

Components of a project

• Master URL
• Name
• MySQL database
• Directory hierarchy
• A set of daemon processes and cron jobs

Processes

[Figure: server processes (work generator, feeder, scheduler, transitioner, validator, assimilator, file deleter, DB purger) arranged around the MySQL DB; clients connect to the scheduler]

Project directory hierarchy

apps/            application files
bin/             daemon programs
cgi-bin/         BOINC scheduler and upload CGI
config.xml       configuration file
download/        downloadable files
html/            web site; master URL points here
keys/            keys for code signing, upload auth
log_(hostname)   daemon log files
project.xml      list of platforms and apps
upload/          uploaded files

BOINC database

• platform
• app
• app_version
• user
• host
• workunit
• result
• ...

Creating a project

• make_project name
• creates
  – directory hierarchy
  – DB
  – mods for httpd.conf
  – crontab entry

Project configuration and control

• config.xml
  – scheduling and other options
  – list of daemons
  – list of periodic tasks
• project control
  – bin/start: start daemons, enable scheduler
  – bin/stop: stop daemons, disable scheduler
  – bin/status

Scaling a BOINC server

• Components can run on different machines sharing a file system
• Each component can be distributed
• MySQL server is typically the bottleneck
• 1 server machine can issue ~100K jobs/day; 4 machines can issue > 1 million

Part 5:

Deploying applications

Adding an application

• edit project.xml

<app>
    <name>multi_thread</name>
    <user_friendly_name>Test multi-thread apps</user_friendly_name>
</app>

• run bin/xadd

Adding an application version

• Create application version directory

apps/uppercase/
    uppercase_6.14_windows_intelx86__cuda.exe/
        uppercase_6.14_windows_intelx86__cuda.exe
        graphics_app=uppercase_graphics_6.14_windows_intelx86.exe
        logo.jpg
        Helvetica.txf

• Sign files on offline computer
• run bin/update_versions

Part 6:

Submitting jobs

Describing job inputs

• Input template file

<file_info>
    <number>0</number>
</file_info>
<workunit>
    <file_ref>
        <file_number>0</file_number>
        <open_name>in</open_name>
    </file_ref>
    <target_nresults>1</target_nresults>
    <min_quorum>1</min_quorum>
    <command_line>-cpu_time 60</command_line>
    <rsc_fpops_bound>446797000000000</rsc_fpops_bound>
    <rsc_fpops_est>279248000000000</rsc_fpops_est>
</workunit>
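To get a feel for the numbers in an input template, the following Python sketch parses the template shown on this slide and converts the FLOP estimate into a run time. It is a reading aid, not part of BOINC: the helper names are hypothetical, and the host speed of 1 GFLOPS is an assumed figure. The template has two top-level elements, so the sketch wraps it in a dummy root element before parsing.

```python
import xml.etree.ElementTree as ET

TEMPLATE = """
<file_info>
    <number>0</number>
</file_info>
<workunit>
    <file_ref>
        <file_number>0</file_number>
        <open_name>in</open_name>
    </file_ref>
    <target_nresults>1</target_nresults>
    <min_quorum>1</min_quorum>
    <command_line>-cpu_time 60</command_line>
    <rsc_fpops_bound>446797000000000</rsc_fpops_bound>
    <rsc_fpops_est>279248000000000</rsc_fpops_est>
</workunit>
"""

def read_workunit_params(template_text):
    """Extract the replication and FLOP-count settings from an input template."""
    root = ET.fromstring("<template>" + template_text + "</template>")
    wu = root.find("workunit")
    return {
        "target_nresults": int(wu.findtext("target_nresults")),
        "min_quorum": int(wu.findtext("min_quorum")),
        "rsc_fpops_est": float(wu.findtext("rsc_fpops_est")),
        "rsc_fpops_bound": float(wu.findtext("rsc_fpops_bound")),
    }

def hours_at(fpops, host_flops):
    """Run time in hours for a job of fpops operations on a host doing host_flops per second."""
    return fpops / host_flops / 3600.0

params = read_workunit_params(TEMPLATE)
HOST_FLOPS = 1e9  # assumed host speed (1 GFLOPS), for illustration only

# Sanity checks worth making before submitting jobs.
assert params["rsc_fpops_bound"] >= params["rsc_fpops_est"]
assert params["target_nresults"] >= params["min_quorum"]

print("estimated hours:", round(hours_at(params["rsc_fpops_est"], HOST_FLOPS), 1))
print("bound / estimate:", round(params["rsc_fpops_bound"] / params["rsc_fpops_est"], 2))
```

In this template the bound is about 1.6 times the estimate, which leaves headroom before a job is treated as having run too long.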

Describing job outputs

• Output template file

<file_info>
    <name><OUTFILE_0/></name>
    <generated_locally/>
    <upload_when_present/>
    <max_nbytes>5000000</max_nbytes>
    <url><UPLOAD_URL/></url>
</file_info>
<result>
    <file_ref>
        <file_name><OUTFILE_0/></file_name>
        <open_name>out</open_name>
    </file_ref>
</result>

Submitting a job

• Stage input files

cp test_files/12ja04aa `bin/dir_hier_path 12ja04aa`

• Submit job

create_work --appname A --wu_name B --wu_template C --result_template D

Part 7:

Organizational issues

Single-scientist projects

• Need to:
  – Port apps
  – Get publicity
  – Interface with public
  – Maintain servers
• Not many research groups have the resources
• And it creates a lot of competing “brands”

Umbrella projects

• Example: IBM World Community Grid

[Figure labels: project; publicity, web development, sysadmin, app porting]

The Berkeley@home model

• A university has

– scientists

– a powerful “brand”

– PR resources

– IT infrastructure

– lots of alumni (UCB: 500,000)

Hubs

• nanoHUB: “science portal” for nanoscience

– social network + “app store”

– sharing of ideas, data, software

– computational portal

• HUBzero: generalization to other areas

– currently ~20 hubs

• Integration of BOINC with HUBzero

– each hub has a volunteer computing project
