Volunteer Computing with BOINC
Dr. David P. Anderson, University of California, Berkeley
SC10, Nov. 14, 2010
Goals
• Explain volunteer computing
• Teach how to create a volunteer computing project using BOINC
• Target audience: High-throughput computing users
• Technical skills:
– Basic Linux/Apache sysadmin
– familiarity with PHP, SQL and XML
– C/C++ (optional)
Outline
• Why use volunteer computing?
• Basic concepts of BOINC
• Developing BOINC applications
• (15 minute break)
• Deploying a BOINC server
• Deploying applications
• Submitting jobs
• Organizational issues
Part 1:
Why use volunteer computing?
The Consumer Digital Infrastructure
• 1 billion PCs
– current GPUs: 1 TeraFLOPS (1,000 ExaFLOPS total)
– Storage: ~1,000 Exabytes
• Commodity Internet: 10-1,000 Mbps to home
• Consumers pay for
– hardware
– sysadmin
– network costs
– electricity
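The aggregate figure is a straight multiplication. A quick check, assuming every one of the 1 billion PCs carries a 1 TeraFLOPS GPU (a peak upper bound, not a measurement):

```python
# Back-of-the-envelope check of the aggregate compute figure.
# Assumption: each PC contributes exactly one 1-TeraFLOPS GPU.
NUM_PCS = 10**9          # 1 billion PCs
FLOPS_PER_GPU = 10**12   # 1 TeraFLOPS
EXA = 10**18


def aggregate_exaflops(num_pcs: int = NUM_PCS,
                       flops_per_gpu: int = FLOPS_PER_GPU) -> float:
    """Total peak FLOPS across all PCs, expressed in ExaFLOPS."""
    return num_pcs * flops_per_gpu / EXA


if __name__ == "__main__":
    print(f"{aggregate_exaflops():,.0f} ExaFLOPS")
```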
Volunteer computing
• PC owners donate computing resources to projects (e.g., computational science)
• Applications run at zero priority while PC in use, and/or while PC is not in use
Examples
Project          Start  Where       Area            Peak #hosts
GIMPS            1994               math            10,000
distributed.net  1995               cryptography    100,000
SETI@home I      1999   UCB         SETI            600,000
Folding@home     1999   Stanford    biology         200,000
United Devices   2002   commercial  biomedicine     200,000
CPDN             2003   Oxford      climate change  150,000
LHC@home         2004   CERN        physics         60,000
Predictor@home   2004   Scripps     biology         100,000
WCG              2004   commercial  biomedicine     200,000
Einstein@home    2005   LIGO        astrophysics    200,000
SETI@home II     2005   UCB         SETI            850,000
Rosetta@home     2005   U. Wash     biology         100,000
SIMAP            2005   T.U. Munich bioinformatics  10,000
...              ...    ...         ...             ...
Current status
• ~50 projects
• 500,000 volunteers
• 800,000 computers
[Figure: bar chart by processor type. Categories: NVIDIA, CPU, PS3 (Cell), ATI. Data labels: 4.6, 2.4, 2.2, 1.2. Axis range 0 to 5; units not shown.]
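[Figure: computing platforms placed by number of processors (1, 100, 1000, 10K-1M) and by workload (single job vs. multiple jobs), spanning high-performance computing and high-throughput computing. Platforms shown: cluster (MPI), supercomputer, cluster (batch), Grid, commercial cloud, volunteer computing.]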
Volunteer computing is different
• You don’t buy resources; you ask for them
• Resources are:
– heterogeneous
– sporadically available and connected
– untrusted and not private
– behind firewalls/NATs/proxies
Part 2:
Basic concepts of BOINC
About BOINC
• Funded by NSF since 2002
• Open-source (LGPL)
• Based at UC Berkeley
• Few staff, but lots of volunteers
– software testing
– translation
– documentation
– support (email lists, message boards, Skype)
Volunteers and projects
[Figure: volunteers on one side and projects (CPDN, LHC@home, WCG) on the other, linked by attachments.]
BOINC software overview
[Figure: a volunteer host (client, apps, screensaver, GUI) communicating over HTTP with a project server (scheduler, MySQL, data server, daemons).]
BOINC scheduler
[Figure: applications, their app versions (Win32, Win32 N-core, Win32 + NVIDIA, Win64, Mac OS X), jobs, and job instances.]
• Request from client to scheduler:
– HW, SW description
– existing workload
– per resource type: # of instances requested, # of seconds requested
• Reply from scheduler to client:
– app version descriptions
– job descriptions
Job replication
• Job instances may fail or return wrong results
• Job replication: do 2, see if they agree
– “agree” may be fuzzy
• Homogeneous replication
– numerical equivalence of hosts
• Adaptive replication
– reduce replication for hosts that seem trustworthy
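The "do 2, see if they agree" idea can be sketched as a quorum check with a fuzzy comparison. This is an illustrative Python sketch, not BOINC's validator code; the function names, the relative tolerance, and the representation of a result as a list of floats are all assumptions.

```python
import math
from typing import Optional, Sequence


def results_agree(a: Sequence[float], b: Sequence[float],
                  rel_tol: float = 1e-5) -> bool:
    """Fuzzy comparison: same length and every pair of values is close."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


def find_canonical(results: Sequence[Sequence[float]],
                   min_quorum: int = 2) -> Optional[int]:
    """Return the index of the first result that at least `min_quorum`
    results agree with (counting itself), or None if there is no quorum."""
    for i, candidate in enumerate(results):
        matches = sum(1 for other in results if results_agree(candidate, other))
        if matches >= min_quorum:
            return i
    return None
```

With two replicas that differ only by rounding noise, `find_canonical` returns index 0; with two replicas that disagree it returns `None`, which is the point at which a project would issue another instance of the job.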
The job pipeline
[Figure: the job pipeline: work generator, BOINC, validator, assimilator.]
The BOINC data model
• App versions, job inputs, and job outputs can each consist of arbitrarily many files
• Each file has a physical name (unique, immutable); each reference to a file has a “logical name”
• Files have various attributes (e.g., sticky)
• Each file can have one or more URLs, and is transferred via HTTP
• App version files are digitally signed
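The split between logical and physical names can be illustrated with a small resolver. This Python sketch assumes a link-file convention in which the job's working directory holds a stub file, named by the logical name, whose first line points at the physical file; the directory layout, file names, and helper names are hypothetical. In a real application this job is done by boinc_resolve_filename() in the C/C++ API.

```python
import re
import tempfile
from pathlib import Path

# Assumed stub format: <soft_link>relative/path/to/physical_file</soft_link>
_SOFT_LINK = re.compile(r"<soft_link>(.*?)</soft_link>")


def resolve_filename(logical: str, work_dir: Path) -> Path:
    """Map a logical name in the working directory to the physical file."""
    link_file = work_dir / logical
    if not link_file.is_file():
        return link_file          # nothing to resolve (e.g. output not yet created)
    with link_file.open("r", errors="replace") as f:
        first_line = f.readline()
    match = _SOFT_LINK.search(first_line)
    if match is None:
        return link_file          # an ordinary file, not a link stub
    return work_dir / match.group(1)


def demo() -> str:
    """Build a throwaway layout and read an input file via its logical name."""
    with tempfile.TemporaryDirectory() as root:
        project_dir = Path(root) / "projects" / "example"
        work_dir = Path(root) / "slots" / "0"
        project_dir.mkdir(parents=True)
        work_dir.mkdir(parents=True)
        # Physical name: unique and immutable. Logical name: "in".
        (project_dir / "wu_0001_input").write_text("input data\n")
        (work_dir / "in").write_text(
            "<soft_link>../../projects/example/wu_0001_input</soft_link>\n"
        )
        return resolve_filename("in", work_dir).read_text()
```

The application only ever opens "in"; which physical file that refers to can change from job to job without touching the application.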
What kinds of jobs can BOINC handle?
• Pretty much anything you’d run on a Grid
• Bag of tasks (but IPC support soon)
• Short/long jobs
• Data intensive, up to a point
• Geared towards
– Few apps, many jobs (high startup cost per app)
– Jobs with high slack time
Part 3:
Application development for BOINC
The BOINC runtime environment
[Figure: the processes and files that make up the runtime environment.]
Native BOINC applications
• boinc_init()
– create runtime system thread
• boinc_finish()
– write finish file
• boinc_resolve_filename(logical, physical)
• boinc_fraction_done(x)
Checkpointing
• bool boinc_time_to_checkpoint()
– call when in checkpointable state
• boinc_checkpoint_done()
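The checkpointing pattern behind these two calls can be sketched as a resumable loop. This is a Python illustration of the general technique, not the BOINC API (which is C/C++): a fixed iteration interval stands in for boinc_time_to_checkpoint(), and the state file name, JSON format, and `stop_after` parameter (which simulates the host shutting down) are assumptions.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def load_checkpoint(path: Path) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if path.exists():
        return json.loads(path.read_text())
    return {"next_i": 0, "total": 0}


def save_checkpoint(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash mid-write
    never leaves a half-written checkpoint behind."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)


def run(path: Path, n: int, checkpoint_every: int = 100,
        stop_after: Optional[int] = None) -> Optional[int]:
    """Sum of squares of 0..n-1, resumable from the last checkpoint."""
    state = load_checkpoint(path)
    done_this_run = 0
    for i in range(state["next_i"], n):
        if stop_after is not None and done_this_run >= stop_after:
            return None  # interrupted; work since the last checkpoint is lost
        state["total"] += i * i
        state["next_i"] = i + 1
        done_this_run += 1
        # Stand-in for boinc_time_to_checkpoint(): a fixed interval.
        if state["next_i"] % checkpoint_every == 0:
            save_checkpoint(path, state)  # a real app then calls boinc_checkpoint_done()
    return state["total"]


def demo() -> int:
    """Interrupt a run partway through, then resume it to completion."""
    with tempfile.TemporaryDirectory() as d:
        ckpt = Path(d) / "state.json"
        assert run(ckpt, 1000, stop_after=250) is None
        return run(ckpt, 1000)
```

The loop only checkpoints at a point where its state is fully consistent, which is what "call when in checkpointable state" asks of the application.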
The BOINC wrapper
• Can use for legacy apps
• XML input file lists sub-jobs
– executable, input files
• What it does:
– interfaces to BOINC client
– copies files to/from slot directory
– runs executables
– does checkpointing at sub-job level
Building app versions
• Linux
– gcc
• Windows
– Visual Studio
– MinGW (gcc)
• Mac OS X
– Xcode
Multithread apps
• boinc_init_parallel()
• Allows suspend/resume of all threads
– Unix: fork/exec
– Windows: direct thread control
GPU app versions
• Develop for NVIDIA or ATI, with CUDA, CAL, OpenCL, etc. (BOINC supplies samples)
• Each version has a “plan class”
• For each plan class, supply a function that determines
– can app run on this host? (hardware, driver version, etc.)
– what resources will it use? (#CPUs, #GPUs, GPU RAM, etc.)
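A plan class function answers both questions at once: it either rejects the host or returns the resources the app version would use. The Python sketch below shows the shape of such a function; the `Host` and `Usage` types, the field names, and every threshold are hypothetical, and BOINC's own plan class functions live in the C++ scheduler code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    gpu_vendor: Optional[str]  # e.g. "nvidia", "ati", or None
    gpu_count: int
    gpu_ram_mb: int
    driver_version: int


@dataclass
class Usage:
    ncpus: float
    ngpus: float
    gpu_ram_mb: int


# Hypothetical requirements for an example CUDA app version.
MIN_DRIVER_VERSION = 19038
MIN_GPU_RAM_MB = 256


def plan_cuda_example(host: Host) -> Optional[Usage]:
    """Return expected resource usage, or None if the host cannot run the app."""
    if host.gpu_vendor != "nvidia" or host.gpu_count < 1:
        return None
    if host.driver_version < MIN_DRIVER_VERSION:
        return None
    if host.gpu_ram_mb < MIN_GPU_RAM_MB:
        return None
    # Uses one GPU plus a small fraction of a CPU to feed it.
    return Usage(ncpus=0.1, ngpus=1.0, gpu_ram_mb=MIN_GPU_RAM_MB)
```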
VM apps
• Develop apps on your favorite OS
• Create a VirtualBox VM image
• App version consists of
– VM wrapper (supplied by BOINC)
– VM image
– app executable
Part 4:
Deploying a BOINC server
Hardware options
• Native Linux host
– download/compile BOINC software
• BOINC server VM (VMware/Debian)
• BOINC Amazon EC2 image
Components of a project
• Master URL
• name
• MySQL database
• Directory hierarchy
• A set of daemon processes and cron jobs
Processes
[Figure: server processes (work generator, validator, assimilator, feeder, scheduler, transitioner, file deleter, DB purger) arranged around the MySQL DB, with clients contacting the scheduler.]
Project directory hierarchy
apps/            application files
bin/             daemon programs
cgi-bin/         BOINC scheduler and upload CGI
config.xml       configuration file
download/        downloadable files
html/            web site; master URL points here
keys/            keys for code signing, upload auth
log_(hostname)   daemon log files
project.xml      list of platforms and apps
upload/          uploaded files
BOINC database
platform
app
app_version
user
host
workunit
result
...
Creating a project
• make_project name creates
– directory hierarchy
– DB
– mods for httpd.conf
– crontab entry
Project configuration and control
• config.xml
– scheduling and other options
– list of daemons
– list of periodic tasks
• project control
– bin/start: start daemons, enable scheduler
– bin/stop: stop daemons, disable scheduler
– bin/status
Scaling a BOINC server
• Components can run on different machines sharing a file system
• Each component can be distributed
• MySQL server is typically the bottleneck
• 1 server machine can issue ~100K jobs/day; 4 machines can issue > 1 million
Part 5:
Deploying applications
Adding an application
• edit project.xml
<app>
    <name>multi_thread</name>
    <user_friendly_name>Test multi-thread apps</user_friendly_name>
</app>
• run bin/xadd
Adding an application version
• Create application version directory
apps/uppercase/
    uppercase_6.14_windows_intelx86__cuda.exe/
        uppercase_6.14_windows_intelx86__cuda.exe
        graphics_app=uppercase_graphics_6.14_windows_intelx86.exe
        logo.jpg
        Helvetica.txf
• Sign files on offline computer
• run bin/update_versions
Part 6:
Submitting jobs
Describing job inputs
Input template file:
<file_info>
    <number>0</number>
</file_info>
<workunit>
    <file_ref>
        <file_number>0</file_number>
        <open_name>in</open_name>
    </file_ref>
    <target_nresults>1</target_nresults>
    <min_quorum>1</min_quorum>
    <command_line>-cpu_time 60</command_line>
    <rsc_fpops_bound>446797000000000</rsc_fpops_bound>
    <rsc_fpops_est>279248000000000</rsc_fpops_est>
</workunit>
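The two rsc_fpops values in the input template are counts of floating-point operations: an estimate and an upper bound. Dividing by a host's speed turns them into a runtime estimate and a runtime limit. The Python sketch below shows that arithmetic; the 1 GFLOPS host speed is a hypothetical example value, and the exact formula the scheduler and client apply may include further correction factors.

```python
# Values taken from the input template.
RSC_FPOPS_EST = 279_248_000_000_000
RSC_FPOPS_BOUND = 446_797_000_000_000


def seconds_for(fpops: float, host_flops: float) -> float:
    """Time to execute `fpops` operations at `host_flops` operations per second."""
    return fpops / host_flops


if __name__ == "__main__":
    host_flops = 1e9  # hypothetical host: 1 GFLOPS sustained
    est = seconds_for(RSC_FPOPS_EST, host_flops)
    limit = seconds_for(RSC_FPOPS_BOUND, host_flops)
    print(f"estimated runtime: {est / 3600:.1f} h")
    print(f"runtime limit:     {limit / 3600:.1f} h")
    print(f"bound / estimate:  {RSC_FPOPS_BOUND / RSC_FPOPS_EST:.2f}")
```

Setting the bound too close to the estimate risks aborting legitimate jobs on slow hosts; setting the estimate far off skews how much work a host is sent.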
Describing job outputs
Output template file:
<file_info>
    <name><OUTFILE_0/></name>
    <generated_locally/>
    <upload_when_present/>
    <max_nbytes>5000000</max_nbytes>
    <url><UPLOAD_URL/></url>
</file_info>
<result>
    <file_ref>
        <file_name><OUTFILE_0/></file_name>
        <open_name>out</open_name>
    </file_ref>
</result>
Submitting a job
• Stage input files
cp test_files/12ja04aa `bin/dir_hier_path 12ja04aa`
• Submit job
create_work --appname A --wu_name B --wu_template C --result_template D
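A work generator repeats these two steps for every job. The Python sketch below only builds the commands rather than running them, so it can be tried without a BOINC server. The option names follow the create_work line above; the bin/ prefix on create_work, the trailing list of input file names, and the example app and template names are assumptions to check against your BOINC version.

```python
import shlex
from typing import List, Sequence


def stage_command(src: str, filename: str) -> str:
    """Shell command that copies an input file into the download hierarchy."""
    return f"cp {shlex.quote(src)} `bin/dir_hier_path {shlex.quote(filename)}`"


def create_work_args(appname: str, wu_name: str, wu_template: str,
                     result_template: str,
                     input_files: Sequence[str] = ()) -> List[str]:
    """Argument list for one create_work invocation (run from the project root)."""
    return [
        "bin/create_work",
        "--appname", appname,
        "--wu_name", wu_name,
        "--wu_template", wu_template,
        "--result_template", result_template,
        *input_files,  # assumption: input file names follow the options
    ]


if __name__ == "__main__":
    # Hypothetical app and template names, for illustration only.
    for name in ["12ja04aa", "12ja04ab"]:
        print(stage_command(f"test_files/{name}", name))
        print(shlex.join(create_work_args(
            "uppercase", f"wu_{name}",
            "templates/uppercase_in", "templates/uppercase_out", [name])))
```

To actually submit, the argument list can be passed to `subprocess.run(args, check=True)` from the project directory.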
Part 7:
Organizational issues
Single-scientist projects
• Need to:
– Port apps
– Get publicity
– interface with public
– maintain servers
• Not many research groups have the resources
• And it creates a lot of competing “brands”
Umbrella projects
• Example: IBM World Community Grid
[Figure: the project provides publicity, web development, sysadmin, and app porting.]
The Berkeley@home model
• A university has
– scientists
– a powerful “brand”
– PR resources
– IT infrastructure
– lots of alumni (UCB: 500,000)
Hubs
• nanoHUB: “science portal” for nanoscience
– social network + “app store”
– sharing of ideas, data, software
– computational portal
• HUBzero: generalization to other areas
– currently ~20 hubs
• Integration of BOINC with HUBzero
– each hub has a volunteer computing project