The software infrastructure of SETI@home II David P. Anderson Space Sciences Laboratory U.C. Berkeley


Page 1: The software infrastructure of  SETI @ home II

The software infrastructure of SETI@home II

David P. Anderson
Space Sciences Laboratory
U.C. Berkeley

Page 2: The software infrastructure of  SETI @ home II

Public-resource computing

[Diagram labels: home PCs, business, academic, "your computers"]

Advantages:
• scale
• free
• growth
• public education
• no institutional policy issues

Challenges:
• low bandwidth at client
• costly bandwidth at server
• firewall/NAT issues
• sporadic connection
• untrustworthy, insecure clients
• server security
• heterogeneity
• must recruit participants

Page 3: The software infrastructure of  SETI @ home II

Achievements of SETI@home

• 1,000,000 years of CPU time in 3 years
• Sustained 30 TeraFLOPs
• 1.5E21 floating-point operations
• 3,600,000 users in 226 countries
• 40 Terabytes of data processed
• 3 billion “events” detected
• Solved scaling, security problems

Page 4: The software infrastructure of  SETI @ home II

SETI@home II

• Broadband pulse search on existing data
• Parkes observatory: Southern sky
• Multi-beam receivers
• Wider frequency band
• Use KL transform
• Data archival on clients

Page 5: The software infrastructure of  SETI @ home II

SETI@home software shortcomings

• Monolithic client and server
• Limited communication model
• Limited computation/data model
• Ad hoc accounting model

Page 6: The software infrastructure of  SETI @ home II

PRC platform goals

[Diagram labels: projects (Research lab X, University Y, Public project Z), applications, resource pool]

• Participants install one program, select projects, specify constraints
• Projects are autonomous
• Advantages of a shared platform:
  – Better instantaneous resource utilization
  – Better long-term resource utilization
  – Faster/cheaper for projects, software is better
  – Easier for projects to get participants
  – Participants learn more

Page 7: The software infrastructure of  SETI @ home II

Distributed computing platforms

• Academic and open-source
  – Globus
  – Cosm
  – XtremWeb
  – JXTA
• Commercial
  – Entropia
  – United Devices
  – Avaki

Page 8: The software infrastructure of  SETI @ home II

BOINC (Berkeley Open Infrastructure for Network Computing)

• Overall structure
• Storage model
• Computation model
• Programming interface
• Operational interface
• Participant’s view

Page 9: The software infrastructure of  SETI @ home II

Overall structure

[Architecture diagram]

• Project:
  – Scheduling server (C++)
  – BOINC DB (MySQL)
  – Project work manager
  – Data servers (HTTP)
  – Web interfaces (PHP)
• Participant:
  – Core agent (C++)
  – App agents
• Other diagram label: lib

Page 10: The software infrastructure of  SETI @ home II

Storage model

• Files: input, output, executables
• Created by client or project
• Files are immutable
• File transfer by HTTP
• File attributes:
  – Name
  – URL list
  – Persistent
  – Upload-when-present
  – Executable
  – MD5 checksum
  – Digital signature

<file_info>
    <name>protein_db.12</name>
    <persistent/>
    <url>http://a.b/c</url>
    <url>ftp://x.y/z</url>
    <md5_cksum>fw7398h</md5_cksum>
    <nbytes>4782747</nbytes>
</file_info>
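The file attributes listed on this slide map naturally onto a small record type. What follows is a minimal sketch in Python, not BOINC's own code: it parses a `<file_info>` element and checks a downloaded payload against the declared size and MD5 checksum. The names `FileInfo`, `parse_file_info` and `verify` are hypothetical, and the payload and checksum in the usage lines are made-up illustration data.

```python
import hashlib
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileInfo:
    """Hypothetical in-memory form of a <file_info> element."""
    name: str
    urls: List[str] = field(default_factory=list)
    persistent: bool = False
    md5_cksum: Optional[str] = None
    nbytes: Optional[int] = None


def parse_file_info(xml_text: str) -> FileInfo:
    """Read the attributes shown on the slide out of one <file_info> element."""
    root = ET.fromstring(xml_text)
    nbytes = root.findtext("nbytes")
    md5 = root.findtext("md5_cksum")
    return FileInfo(
        name=root.findtext("name", default="").strip(),
        urls=[u.text.strip() for u in root.findall("url") if u.text],
        persistent=root.find("persistent") is not None,
        md5_cksum=md5.strip() if md5 is not None else None,
        nbytes=int(nbytes) if nbytes is not None else None,
    )


def verify(info: FileInfo, payload: bytes) -> bool:
    """Accept a payload only if its size and MD5 digest match the description."""
    if info.nbytes is not None and len(payload) != info.nbytes:
        return False
    if info.md5_cksum is not None:
        return hashlib.md5(payload).hexdigest() == info.md5_cksum
    return True


# Usage with made-up data; the checksum is computed here, not taken from the slide.
payload = b"example payload"
xml_text = f"""<file_info>
    <name>protein_db.12</name>
    <persistent/>
    <url>http://a.b/c</url>
    <url>ftp://x.y/z</url>
    <md5_cksum>{hashlib.md5(payload).hexdigest()}</md5_cksum>
    <nbytes>{len(payload)}</nbytes>
</file_info>"""
info = parse_file_info(xml_text)
```

Because files are immutable, a checksum recorded once stays valid for the life of the file, which is what makes a check like `verify` cheap to apply on every transfer.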

Page 11: The software infrastructure of  SETI @ home II

File management

• Implicit
  – Executables, input and output files are transferred pursuant to computation
• Explicit
  – Clients report persistent files
  – Scheduling server maintains DB of files on active hosts
  – Project can request upload, download, delete

Page 12: The software infrastructure of  SETI @ home II

Workunits

• Represents inputs to a computation
• Components:
  – Cmdline args, environment vars
  – Expected resource usage
  – Description of input files

<file_info>
    <name>out123</name>
    <url>http://…</url>
</file_info>
<workunit>
    <file_assoc>
        <file_name>out123</file_name>
        <app_name>input</app_name>
    </file_assoc>
</workunit>

Page 13: The software infrastructure of  SETI @ home II

Results

• Represents results of a computation
• Components:
  – Which host did the computation
  – Exit status
  – Stderr output
  – CPU time
  – Output file description

Template:

<file_info>
    <name>out123</name>
    <generated_locally/>
    <upload_when_present/>
    <url>http://…</url>
</file_info>
<result>
    <file_assoc>
        <file_name>out123</file_name>
        <fd>1</fd>
    </file_assoc>
</result>

Actual:

<file_info>
    <name>out123</name>
    <url>http://…</url>
    <md5_cksum>182aed847</md5_cksum>
</file_info>

Page 14: The software infrastructure of  SETI @ home II

Work sequences (long computations with big footprints)

• Results can be linked into sequences
• Result is sent to host that handled predecessor
• If result times out, sequence is shifted to another host

[Diagram labels: upload state, check for abort]
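The two placement rules for work sequences can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the scheduler's actual logic: `SeqResult`, `choose_host` and `reassign_if_timed_out` are hypothetical names, time is a plain number of seconds, and "another host" is taken to mean the first candidate that differs from the current one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SeqResult:
    """One result in a work sequence (hypothetical record)."""
    name: str
    host: Optional[str] = None       # host the result was sent to, if any
    sent_at: Optional[float] = None  # time the result was sent, in seconds
    done: bool = False


def choose_host(sequence: List[SeqResult], index: int, candidates: List[str]) -> str:
    """Send a result to the host that handled its predecessor, if there is one."""
    if index > 0 and sequence[index - 1].host is not None:
        return sequence[index - 1].host
    return candidates[0]


def reassign_if_timed_out(sequence: List[SeqResult], index: int, now: float,
                          timeout: float, candidates: List[str]) -> Optional[str]:
    """If the result has timed out, shift it (and so the sequence) to another host."""
    r = sequence[index]
    if r.done or r.sent_at is None or now - r.sent_at <= timeout:
        return r.host  # still within its time limit, or already finished
    for h in candidates:
        if h != r.host:
            r.host, r.sent_at = h, now
            return h
    return r.host  # no alternative host available


# Usage with made-up hosts and times.
hosts = ["hostB", "hostA"]
seq = [SeqResult("r0", host="hostA", sent_at=0.0, done=True), SeqResult("r1")]
first_choice = choose_host(seq, 1, hosts)
seq[1].host, seq[1].sent_at = first_choice, 10.0
still_there = reassign_if_timed_out(seq, 1, now=50.0, timeout=100.0, candidates=hosts)
after_timeout = reassign_if_timed_out(seq, 1, now=200.0, timeout=100.0, candidates=hosts)
```

Since later results follow `sequence[index - 1].host`, moving one timed-out result is enough to move the rest of the sequence with it.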

Page 15: The software infrastructure of  SETI @ home II

Hosts and scheduling

• Host measurements
  – CPU performance (integer/FP/memory)
  – RAM, cache, disk free/total
  – On/idle/connected statistics
  – Network bandwidth statistics
• Workunit properties
  – RAM/disk/computation requirements
• Scheduling policy
  – Client: project quotas; high/low water marks
  – Server: workunit feasibility test; prioritization
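One way to read the scheduling policy is as two small checks, one on each side. The Python sketch below is an assumption-laden illustration rather than BOINC's algorithm: the field names, the use of a per-workunit deadline, and the rule "request enough work to refill to the high-water mark" are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Subset of the host measurements (hypothetical field names)."""
    fpops_per_sec: float    # measured floating-point rate
    ram_bytes: float
    disk_free_bytes: float
    on_fraction: float      # fraction of time the host is on and available


@dataclass
class Workunit:
    """Subset of the workunit properties (hypothetical field names)."""
    est_fpops: float        # estimated computation requirement
    ram_bytes: float
    disk_bytes: float
    deadline_secs: float    # assumed time limit; not stated on the slide


def feasible(h: Host, wu: Workunit) -> bool:
    """Server side: can this host hold the workunit and finish it in time?"""
    if wu.ram_bytes > h.ram_bytes or wu.disk_bytes > h.disk_free_bytes:
        return False
    effective_rate = h.fpops_per_sec * h.on_fraction
    if effective_rate <= 0:
        return False
    return wu.est_fpops / effective_rate <= wu.deadline_secs


def work_to_request(queued_secs: float, low_water_secs: float,
                    high_water_secs: float) -> float:
    """Client side: below the low-water mark, ask for enough to reach the high one."""
    if queued_secs >= low_water_secs:
        return 0.0
    return high_water_secs - queued_secs


# Usage with made-up numbers.
host = Host(fpops_per_sec=1e9, ram_bytes=512e6, disk_free_bytes=10e9, on_fraction=0.5)
small = Workunit(est_fpops=1e13, ram_bytes=64e6, disk_bytes=1e8, deadline_secs=86400.0)
large = Workunit(est_fpops=1e15, ram_bytes=64e6, disk_bytes=1e8, deadline_secs=86400.0)
```

The gap between the two water marks keeps a sporadically connected client from contacting the server every time a single workunit finishes.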

Page 16: The software infrastructure of  SETI @ home II

Accounting and result validation

• Standardized unit of credit (CPeUro?)
  – CPU time * (int+FP+mem)
• Result validation (optional):
  – Compare redundant results, flag incorrect results
• Granted credit:
  – Minimum of claimed credit among correct results
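The credit rule is compact enough to write out directly. The Python sketch below follows the slide's formulas for claimed and granted credit; how "correct" is decided is not specified on the slide, so the sketch assumes, purely for illustration, that the output returned by a strict majority of hosts is the correct one. `Benchmarks`, `Report`, `claimed_credit` and `validate_and_grant` are hypothetical names.

```python
from collections import Counter
from typing import List, NamedTuple, Optional, Tuple


class Benchmarks(NamedTuple):
    """Per-host benchmark scores (integer, floating point, memory)."""
    int_score: float
    fp_score: float
    mem_score: float


class Report(NamedTuple):
    """One redundant result as reported by a host."""
    host: str
    output: str      # stand-in for the result's output data
    claimed: float   # credit claimed by the host


def claimed_credit(cpu_secs: float, b: Benchmarks) -> float:
    """Claimed credit = CPU time * (int + FP + mem)."""
    return cpu_secs * (b.int_score + b.fp_score + b.mem_score)


def validate_and_grant(reports: List[Report]) -> Tuple[Optional[float], List[str]]:
    """Compare redundant results; return (granted credit, hosts flagged as incorrect).

    Assumption: the output shared by a strict majority is treated as correct.
    Without a majority, nothing is granted or flagged.
    """
    if not reports:
        return None, []
    canonical, votes = Counter(r.output for r in reports).most_common(1)[0]
    if votes * 2 <= len(reports):
        return None, []
    correct = [r for r in reports if r.output == canonical]
    flagged = [r.host for r in reports if r.output != canonical]
    # Granted credit is the minimum claim among the correct results.
    return min(r.claimed for r in correct), flagged


# Usage with made-up reports.
reports = [
    Report("host_a", "x", 120.0),
    Report("host_b", "x", 100.0),
    Report("host_c", "y", 500.0),
]
granted, flagged = validate_and_grant(reports)
```

Taking the minimum over correct results means an inflated claim gains nothing unless every agreeing host inflates its claim as well.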

Page 17: The software infrastructure of  SETI @ home II

Programming interfaces

• Application
  – May be multi-file; any executable
  – API for interaction with core client (optional)
  – Checkpoint/restart: MFILE class
  – Graphics: render to shared memory
• Software development tools
  – Version management
  – Web-based bug tracking

Page 18: The software infrastructure of  SETI @ home II

Operational interfaces

• Operations
  – Add/manage app versions
  – Create workunits/results
  – Query results
  – Query client problems
• Interfaces
  – C++ libraries
  – Scriptable apps
  – Web-based

Page 19: The software infrastructure of  SETI @ home II

Participant preferences

• Examples:
  – Work only while computer idle
  – Confirm before connecting
  – Don’t work if running on batteries
  – High, low water marks
  – Limits on disk space, bandwidth
  – Application-specific preferences
  – List of projects + authenticators + % allocation
• Edited via Web interface
• Can define multiple “preference sets”
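A preference set like the one listed here can be pictured as a small record that the client consults before doing work. The Python sketch below is only an illustration: the field names, the `HostState` record, and the proportional split of CPU time by % allocation are assumptions, and the project URLs are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Preferences:
    """Hypothetical preference set; only a few of the listed options are modeled."""
    work_only_when_idle: bool = True
    work_on_batteries: bool = False
    # Project URL -> % allocation (authenticators omitted in this sketch).
    project_shares: Dict[str, float] = field(default_factory=dict)


@dataclass
class HostState:
    """What the client currently observes about the machine."""
    user_idle: bool
    on_batteries: bool


def may_compute(p: Preferences, s: HostState) -> bool:
    """Apply the idle and battery preferences to the current host state."""
    if p.work_only_when_idle and not s.user_idle:
        return False
    if s.on_batteries and not p.work_on_batteries:
        return False
    return True


def cpu_seconds_per_project(p: Preferences, total_secs: float) -> Dict[str, float]:
    """Split available CPU time in proportion to each project's % allocation."""
    total_share = sum(p.project_shares.values())
    if total_share <= 0:
        return {}
    return {url: total_secs * share / total_share
            for url, share in p.project_shares.items()}


# Usage with placeholder project URLs.
prefs = Preferences(project_shares={
    "https://project-a.example": 75.0,
    "https://project-b.example": 25.0,
})
split = cpu_seconds_per_project(prefs, 3600.0)
```

Multiple "preference sets" would then simply be several such records, with the client selecting one per machine or location.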

Page 20: The software infrastructure of  SETI @ home II

Participation

• Initial project registration:
  – Create account on project web site
  – Authenticator is emailed
  – Install core client, enter authenticator
• Subsequent projects:
  – Create account on project web site
  – Authenticator is emailed
  – Add project to preferences on home site

Page 21: The software infrastructure of  SETI @ home II

Core client

• Goals
  – Concurrent communicate/compute
  – Obey user preferences
  – Application, screensaver or service
  – Multi-platform; multiprocessor-capable
• FSM structure

[Diagram labels: main loop, poll, file transfers, HTTP transactions, scheduler requests, running applications (wait()), active sockets (select())]
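The FSM structure suggests a single-threaded main loop that repeatedly polls a set of state machines, each of which advances by at most one step per poll. The Python sketch below shows that pattern with a simulated file transfer; it is not the core client's code. `FileTransfer` and `run_until_idle` are hypothetical names, and the counter stands in for the socket readiness that a real client would obtain from `select()`.

```python
from typing import List


class FileTransfer:
    """Toy state machine: connecting -> transferring -> done."""

    def __init__(self, name: str, chunks: int) -> None:
        self.name = name
        self.remaining = chunks   # simulated chunks still to move
        self.state = "connecting"

    def poll(self) -> bool:
        """Advance by at most one step; return True if any progress was made."""
        if self.state == "connecting":
            self.state = "transferring"
            return True
        if self.state == "transferring":
            self.remaining -= 1
            if self.remaining <= 0:
                self.state = "done"
            return True
        return False  # done: nothing left to do


def run_until_idle(fsms: List[FileTransfer], max_passes: int = 1000) -> int:
    """Main loop: poll every FSM each pass; stop when a pass makes no progress."""
    passes = 0
    while passes < max_passes:
        progressed = [f.poll() for f in fsms]  # poll all, no short-circuiting
        passes += 1
        if not any(progressed):
            break
    return passes


# Usage: two simulated transfers advance in interleaved steps.
t1 = FileTransfer("input_file", chunks=2)
t2 = FileTransfer("executable", chunks=3)
passes = run_until_idle([t1, t2])
```

Because no `poll()` call blocks, communication and computation can overlap in one thread, which is one way to meet the "concurrent communicate/compute" goal without threads.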

Page 22: The software infrastructure of  SETI @ home II

Conclusion

• BOINC features
  – Multiproject, multi-app open PRC platform
  – Simple/small but general
• BOINC status
  – Mostly feature-complete
  – Client runs on Linux, Solaris, Windows, Mac OS X
  – http://boinc.sourceforge.net
• Projects:
  – SETI@home Arecibo (later this year)
  – Other SETI@home (Parkes etc.)
  – Climate modeling, other science projects
  – Genetic art