parallel computing using condor on windows pcs peng wang and corey shields research and academic...
TRANSCRIPT
![Page 1: Parallel Computing using Condor on Windows PCs Peng Wang and Corey Shields Research and Academic Computing Division University Information Technology Services](https://reader035.vdocuments.site/reader035/viewer/2022081809/5697bfd41a28abf838cacf60/html5/thumbnails/1.jpg)
Parallel Computing using Condor on Windows PCs
Peng Wang and Corey ShieldsResearch and Academic Computing DivisionUniversity Information Technology Services
Indiana University
![Page 2: Parallel Computing using Condor on Windows PCs Peng Wang and Corey Shields Research and Academic Computing Division University Information Technology Services](https://reader035.vdocuments.site/reader035/viewer/2022081809/5697bfd41a28abf838cacf60/html5/thumbnails/2.jpg)
Problem Description
Turn Windows desktop systems in STC labs (around 2000) into a parallel scientific computer
![Page 3: Parallel Computing using Condor on Windows PCs Peng Wang and Corey Shields Research and Academic Computing Division University Information Technology Services](https://reader035.vdocuments.site/reader035/viewer/2022081809/5697bfd41a28abf838cacf60/html5/thumbnails/3.jpg)
DiscussionParallel applications need coordination through message passingMPI does not handle ephemeral processes wellMultiplexing communication among processesPorts brokered among multiple parallel sessions
![Page 4: Parallel Computing using Condor on Windows PCs Peng Wang and Corey Shields Research and Academic Computing Division University Information Technology Services](https://reader035.vdocuments.site/reader035/viewer/2022081809/5697bfd41a28abf838cacf60/html5/thumbnails/4.jpg)
What do we have ?Condor NT, vanilla universe
match-making, file transfer, fair sharing, job submission, suspension, preemption, restart, securityTest application – fastDNAml-p
Parallel application, master-worker model, small granularity of work
![Page 5: Parallel Computing using Condor on Windows PCs Peng Wang and Corey Shields Research and Academic Computing Division University Information Technology Services](https://reader035.vdocuments.site/reader035/viewer/2022081809/5697bfd41a28abf838cacf60/html5/thumbnails/5.jpg)
How we did itSimple Message Brokering Library (SMBL)Process and Port Manager (PPM)A mechanism for users to submit jobs (web portal)
![Page 6: Parallel Computing using Condor on Windows PCs Peng Wang and Corey Shields Research and Academic Computing Division University Information Technology Services](https://reader035.vdocuments.site/reader035/viewer/2022081809/5697bfd41a28abf838cacf60/html5/thumbnails/6.jpg)
SMBLAn IO multiplexing server in charge of message delivery for each parallel session (serialize communication)SMBL client library implements selected MPI-like callsBoth the server and the client library are based on a TCP socket abstraction library
![Page 7: Parallel Computing using Condor on Windows PCs Peng Wang and Corey Shields Research and Academic Computing Division University Information Technology Services](https://reader035.vdocuments.site/reader035/viewer/2022081809/5697bfd41a28abf838cacf60/html5/thumbnails/7.jpg)
Process and Port ManagerAssigns port to each of the SMBL server processstart the SMBL server and application processes on demanddirect workers to their servers
![Page 8: Parallel Computing using Condor on Windows PCs Peng Wang and Corey Shields Research and Academic Computing Division University Information Technology Services](https://reader035.vdocuments.site/reader035/viewer/2022081809/5697bfd41a28abf838cacf60/html5/thumbnails/8.jpg)
The PortalApache basedPHP web interfaceCreates and submits the condor submit files
![Page 9: Parallel Computing using Condor on Windows PCs Peng Wang and Corey Shields Research and Academic Computing Division University Information Technology Services](https://reader035.vdocuments.site/reader035/viewer/2022081809/5697bfd41a28abf838cacf60/html5/thumbnails/9.jpg)
The Big Picture
The shaded box indicates components hosted on multiple desktop computers
![Page 10: Parallel Computing using Condor on Windows PCs Peng Wang and Corey Shields Research and Academic Computing Division University Information Technology Services](https://reader035.vdocuments.site/reader035/viewer/2022081809/5697bfd41a28abf838cacf60/html5/thumbnails/10.jpg)
Statistics
Red: total owner Blue: total idle Green: total Condor
![Page 11: Parallel Computing using Condor on Windows PCs Peng Wang and Corey Shields Research and Academic Computing Division University Information Technology Services](https://reader035.vdocuments.site/reader035/viewer/2022081809/5697bfd41a28abf838cacf60/html5/thumbnails/11.jpg)
Scalability Issues
Needed big serverAdjusted condor_config
MAX_JOBS_RUNNING 1000
SHADOW_SIZE_ESTIMATE 900KB MAX_STARTD_LOG 640KB
Lost workers because of per-process file descriptor limit (1024)
![Page 12: Parallel Computing using Condor on Windows PCs Peng Wang and Corey Shields Research and Academic Computing Division University Information Technology Services](https://reader035.vdocuments.site/reader035/viewer/2022081809/5697bfd41a28abf838cacf60/html5/thumbnails/12.jpg)
SummaryBuilt a large parallel scientific computing facility using CondorBuilt parallel message passing library to deal with ephemeral resources Built port broker to handle multiple parallel sessionsBuilt web portalIt is open source, visit: http://smbl.sourceforge.net