Parallel Computing using Condor on Windows PCs Peng Wang and Corey Shields Research and Academic Computing Division University Information Technology Services Indiana University

Parallel Computing using Condor on Windows PCs

Peng Wang and Corey Shields
Research and Academic Computing Division
University Information Technology Services

Indiana University

Problem Description

Turn the roughly 2,000 Windows desktop systems in the STC labs into a parallel scientific computer

Discussion

- Parallel applications need coordination through message passing
- MPI does not handle ephemeral processes well
- Multiplexing communication among processes
- Ports brokered among multiple parallel sessions

What do we have?

- Condor NT, vanilla universe
  - match-making, file transfer, fair sharing, job submission, suspension, preemption, restart, security
- Test application – fastDNAml-p
  - Parallel application, master-worker model, small granularity of work
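A master-worker model with small work units suits machines that can disappear at any moment: the master only has to hand a lost unit to another worker. The following is a minimal sketch of that bookkeeping, not the fastDNAml-p implementation; the class and method names are hypothetical.

```python
from collections import deque


class Master:
    """Hands out small tasks and re-queues the task of any worker that vanishes."""

    def __init__(self, tasks):
        self.pending = deque(tasks)   # tasks not yet handed out
        self.in_flight = {}           # worker id -> task it is working on
        self.results = {}             # task -> result

    def request_work(self, worker):
        """Give the next pending task to a worker, or None if nothing is pending."""
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.in_flight[worker] = task
        return task

    def submit_result(self, worker, result):
        """Record the result for the task this worker was holding."""
        task = self.in_flight.pop(worker)
        self.results[task] = result

    def worker_lost(self, worker):
        """Put the lost worker's task back in the queue (if it held one)."""
        task = self.in_flight.pop(worker, None)
        if task is not None:
            self.pending.append(task)

    def done(self):
        return not self.pending and not self.in_flight
```

Because each task is small, a preempted worker costs little: only the one unit it was holding is repeated.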

How we did it

- Simple Message Brokering Library (SMBL)
- Process and Port Manager (PPM)
- A mechanism for users to submit jobs (web portal)

SMBL

- An I/O multiplexing server in charge of message delivery for each parallel session (serializes communication)
- The SMBL client library implements selected MPI-like calls
- Both the server and the client library are based on a TCP socket abstraction library
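The idea of a single I/O multiplexing server that relays every message can be sketched as below. This is an illustration only and not SMBL's code or wire format: the `Broker` class, the three-field header, and the helper names are all assumptions. It uses Python's standard `selectors` module so that one thread serves all connected ranks in arrival order.

```python
import selectors
import socket
import struct

# Hypothetical frame header: destination rank, source rank, payload length.
HEADER = struct.Struct("!III")


def pack_message(dest, src, payload):
    """Frame a message as a fixed header followed by the payload bytes."""
    return HEADER.pack(dest, src, len(payload)) + payload


def recv_exact(sock, n):
    """Read exactly n bytes, or return None if the peer closed the socket."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            return None
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def read_message(sock):
    """Read one framed message; returns (dest, src, payload) or None on close."""
    header = recv_exact(sock, HEADER.size)
    if header is None:
        return None
    dest, src, length = HEADER.unpack(header)
    payload = recv_exact(sock, length)
    if payload is None:
        return None
    return dest, src, payload


class Broker:
    """Single-threaded relay: one socket per rank, messages forwarded one at a time."""

    def __init__(self):
        self.selector = selectors.DefaultSelector()
        self.by_rank = {}

    def register(self, rank, sock):
        self.by_rank[rank] = sock
        self.selector.register(sock, selectors.EVENT_READ, data=rank)

    def drop(self, rank):
        # A worker that vanished (for example, was preempted) is simply forgotten.
        sock = self.by_rank.pop(rank)
        self.selector.unregister(sock)
        sock.close()

    def step(self, timeout=1.0):
        """Relay every message that is ready now; return how many were forwarded."""
        forwarded = 0
        for key, _ in self.selector.select(timeout):
            message = read_message(key.fileobj)
            if message is None:
                self.drop(key.data)
                continue
            dest, src, payload = message
            target = self.by_rank.get(dest)
            if target is not None:  # messages to departed ranks are discarded
                target.sendall(pack_message(dest, src, payload))
                forwarded += 1
        return forwarded
```

Since every message passes through the one relay loop, communication is serialized, and a rank that disappears only removes one entry from the table instead of breaking the whole session.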

Process and Port Manager

- Assigns a port to each SMBL server process
- Starts the SMBL server and application processes on demand
- Directs workers to their servers
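The port-brokering part of this role can be sketched as a small table that maps each parallel session to its own server port. This is an assumed design for illustration, not the PPM's actual code; the class name, method names, and port range are hypothetical, and process start-up is left out.

```python
class PortManager:
    """Gives each parallel session its own server port and tells workers where to connect."""

    def __init__(self, first_port, last_port):
        # Stored in descending order so that pop() hands out the lowest free port.
        self.free = list(range(last_port, first_port - 1, -1))
        self.session_port = {}

    def open_session(self, session_id):
        """Assign a port to a session's server, reusing it if already assigned."""
        if session_id in self.session_port:
            return self.session_port[session_id]
        if not self.free:
            raise RuntimeError("no free ports for a new parallel session")
        port = self.free.pop()
        self.session_port[session_id] = port
        return port

    def direct_worker(self, session_id):
        """Return the port a worker of this session should connect to, or None."""
        return self.session_port.get(session_id)

    def close_session(self, session_id):
        """Release the session's port so a later session can use it."""
        port = self.session_port.pop(session_id)
        self.free.append(port)
```

For example, `PortManager(9000, 9099)` could serve up to 100 concurrent sessions, each with a separate message server.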

The Portal

- Apache based
- PHP web interface
- Creates and submits the Condor submit files
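A generated submit file for one parallel session might look like the output of the sketch below. The portal itself is written in PHP and its real files are not shown in the slides, so this Python function, the executable name, and the argument string are assumptions. The submit commands used (`universe`, `executable`, `arguments`, `should_transfer_files`, `when_to_transfer_output`, `output`, `error`, `log`, `queue`) are standard in current HTCondor; the Condor NT release of the time may have used different file-transfer settings.

```python
def make_submit_file(executable, arguments, n_workers):
    """Return the text of a vanilla-universe submit description for n_workers jobs."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"arguments = {arguments}",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        # $(Process) expands to 0, 1, 2, ... so each worker gets its own files.
        "output = worker.$(Process).out",
        "error = worker.$(Process).err",
        "log = session.log",
        f"queue {n_workers}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical worker binary told where its message server listens.
    print(make_submit_file("worker.exe", "broker.example.org 9000", 4))
```

The resulting text would then be handed to `condor_submit`.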

The Big Picture

The shaded box indicates components hosted on multiple desktop computers

Statistics

Chart legend: red = total owner, blue = total idle, green = total Condor

Scalability Issues

- Needed a big server
- Adjusted condor_config:
    MAX_JOBS_RUNNING 1000
    SHADOW_SIZE_ESTIMATE 900KB
    MAX_STARTD_LOG 640KB
- Lost workers because of the per-process file descriptor limit (1024)
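A rough back-of-the-envelope check shows why these numbers push toward a big server. The sketch below assumes that each running job has one shadow process of about SHADOW_SIZE_ESTIMATE on the submit machine, and that a server process holds one socket per worker plus a few descriptors for other uses; the reserved count of 24 is an arbitrary illustrative value, and the function names are hypothetical.

```python
def shadow_memory_mb(max_jobs_running, shadow_size_kb):
    """Estimated memory for all shadow processes, in MB (1 MB = 1024 KB)."""
    return max_jobs_running * shadow_size_kb / 1024


def max_worker_sockets(fd_limit, reserved):
    """Sockets left for workers once other descriptors (logs, listeners) are set aside."""
    return max(fd_limit - reserved, 0)


if __name__ == "__main__":
    # Values from the slide: MAX_JOBS_RUNNING 1000, SHADOW_SIZE_ESTIMATE 900KB.
    print(shadow_memory_mb(1000, 900))
    # Per-process limit from the slide (1024); 24 reserved is an assumption.
    print(max_worker_sockets(1024, 24))
```

Under these assumptions the shadows alone need a little under 900 MB, and a single process cannot hold many more than about a thousand worker connections before it hits the descriptor limit.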

Summary

- Built a large parallel scientific computing facility using Condor
- Built a parallel message passing library to deal with ephemeral resources
- Built a port broker to handle multiple parallel sessions
- Built a web portal
- It is open source, visit: http://smbl.sourceforge.net