1
Alexandru V. Staicu (1), Jacek R. Radzikowski (1), Kris Gaj (1), Nikitas Alexandridis (2), Tarek El-Ghazawi (2)
(1) George Mason University, (2) George Washington University
Effective Use of Networked Reconfigurable Resources
http://ece.gmu.edu/lucite
2
Problem:
• Reconfigurable resources expensive and underutilized
• Many of these resources available over the network
• It is desirable to leverage networked reconfigurable resources to help other users within the same organization
3
Approach: Adapt and use a Job Management System
[Figure: Tasks 1, 2, 3 pass from the Submission Host to the Master Host, which sends individual tasks to Execution Hosts 1, 2, 3]
4
Approach:
• Select the most suitable existing Job Management System (JMS)
- identify and define functional requirements
- rank known systems according to these requirements
- identify which JMS is the easiest to extend
• Extend this JMS to recognize and utilize reconfigurable resources
- add new dynamic resources
- configure scheduling to be based on these new resources
5
Networked Reconfigurable Resource Management System
[Figure: Submission Host, Master Host, and Execution Hosts 1, 2, 3 handling Tasks 1, 2, 3, with FPGA boards added]
6
Heterogeneous network with FPGA-based accelerators
[Figure: hosts (Dell, HP, Gateway, Sparc 10, Sparc 20) and FPGA platforms (WILDFORCE, WILDSTAR, SLAAC, SLAAC Research Reference Platform) connected through a Myrinet SAN/LAN switch and 100 Mbps Ethernet intelligent hubs]
7
Functional units of a typical Job Management System
[Figure: block diagram]
Units: User Server, Job Scheduler, Resource Manager (Resource Monitor, Job Dispatcher)
Labels: jobs & their requirements; resource requirements; scheduling policies; available resources; resource allocation and job execution
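The units named on this slide can be sketched as a toy program. This is only an illustration under stated assumptions, not the design of any of the investigated systems: the class and method names, the resource names (`cpus`, `fpga_boards`), and the first-fit scheduling policy are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    resources: dict  # e.g. {"cpus": 2, "fpga_boards": 1} (illustrative names)
    busy: bool = False


@dataclass
class Job:
    name: str
    requirements: dict  # minimum amount needed of each resource


class ResourceMonitor:
    """Reports which hosts are currently available."""

    def __init__(self, hosts):
        self.hosts = hosts

    def available(self):
        return [h for h in self.hosts if not h.busy]


class JobScheduler:
    """Assumed policy: first available host that meets every requirement."""

    def select(self, job, hosts):
        for host in hosts:
            if all(host.resources.get(key, 0) >= amount
                   for key, amount in job.requirements.items()):
                return host
        return None


class JobDispatcher:
    """Allocates the host; a real system would also start the job there."""

    def dispatch(self, job, host):
        host.busy = True
        return (job.name, host.name)


class UserServer:
    """Accepts jobs with their requirements and drives the other units."""

    def __init__(self, monitor, scheduler, dispatcher):
        self.monitor = monitor
        self.scheduler = scheduler
        self.dispatcher = dispatcher
        self.pending = []

    def submit(self, job):
        host = self.scheduler.select(job, self.monitor.available())
        if host is None:
            self.pending.append(job)  # no suitable host right now
            return None
        return self.dispatcher.dispatch(job, host)


hosts = [Host("host1", {"cpus": 2}),
         Host("host2", {"cpus": 2, "fpga_boards": 1})]
server = UserServer(ResourceMonitor(hosts), JobScheduler(), JobDispatcher())
print(server.submit(Job("task1", {"fpga_boards": 1})))  # ('task1', 'host2')
```

Keeping the scheduling policy in its own class mirrors the separation on the slide: the policy can change without touching monitoring or dispatching.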
8
Classification of Investigated Systems (1)
• Centralized JMS: LSF, CODINE, PBS, Condor, RES
• Distributed JMS w/o a Central Scheduler: Globus, Legion, NetSolve
• Distributed Operating System: MOSIX
9
Classification of Investigated Systems (2)
• Parameter Study Scheduler: AppLES
• Resource Monitor and Forecaster: NWS
• Distributed Computing Interface: Compaq DCE
10
Operating system, flexibility, user interface
              LSF   Codine  PBS      CONDOR  RES
Distribution  com   pub     pub/com  pub     gov
Source code
OS Support (Solaris, Linux, Tru64, NT)
User Interface (GUI & CLI, or CLI)
11
Scheduling and Resource Management
LSF Codine PBS CONDOR RES
Batch jobs
Interactive jobs
Parallel jobs
Accounting
12
Efficiency and Utilization
LSF Codine PBS CONDOR RES
Stage-in and stage-out
Timesharing
Process migration
Dynamic load balancing
Scalability
13
Fault Tolerance and Security
LSF Codine PBS CONDOR RES
Checkpointing
Daemon fault recovery
Authentication
Authorization
14
Documentation and Technical Support
LSF Codine PBS CONDOR RES
Documentation
Technical support
15
JMS features supporting extension to reconfigurable hardware
• capability to define new dynamic resources
• strong support for stage-in and stage-out
- configuration bitstreams
- executable code
- input/output data
• support for Windows NT and Linux
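To make the stage-in and stage-out requirement concrete, here is a minimal sketch of what a job on reconfigurable hardware would need moved to and from an execution host. It assumes plain file copies on a shared or local filesystem; the `ReconfigurableJob` type and both function names are hypothetical and do not correspond to any JMS interface.

```python
from dataclasses import dataclass, field
from pathlib import Path
import shutil


@dataclass
class ReconfigurableJob:
    bitstream: Path    # FPGA configuration bitstream
    executable: Path   # host-side executable code
    inputs: list = field(default_factory=list)   # input data files
    outputs: list = field(default_factory=list)  # output file names expected in the work dir

    def stage_in_files(self):
        return [self.bitstream, self.executable, *self.inputs]


def stage_in(job, work_dir):
    """Copy everything the job needs into the execution host's work dir."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy(src, work_dir)) for src in job.stage_in_files()]


def stage_out(job, work_dir, dest_dir):
    """Copy output files back; outputs that were never produced are skipped."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in job.outputs:
        src = Path(work_dir) / name
        if src.exists():
            copied.append(Path(shutil.copy(src, dest_dir)))
    return copied
```

The bitstream is staged in alongside the executable and input data, which is why a JMS with weak staging support is harder to extend to FPGA boards.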
16
Ranking of Centralized Job Management Systems (1)
Capability to define new dynamic resources:
- Excellent: LSF, PBS, CODINE
- More difficult: CONDOR, RES
Stage-in and stage-out:
- Excellent: LSF, PBS
- Limited: CONDOR
- No: CODINE, RES
17
Ranking of Centralized Job Management Systems (2)
Overall suitability to extend to reconfigurable hardware:
1. LSF
2. CODINE
3. PBS
4. CONDOR
5. RES
LSF, CODINE, PBS: without changing the JMS source code
CONDOR, RES: requires changes to the JMS source code
18
Extension of LSF to reconfigurable hardware (1)
Operation of LSF
[Figure: numbered steps 1-13 connecting the following components]
• Submission host: bsub app, LIM, Batch API
• Master host: MLIM, MBD, queue
• Execution host: SBD, Child SBD, LIM, RES, User job
• Load information exchanged with other hosts
LIM – Load Information Manager
MLIM – Master LIM
MBD – Master Batch Daemon
SBD – Slave Batch Daemon
RES – Remote Execution Server
19
Extension of LSF to reconfigurable hardware (2)
[Figure: numbered steps 1-13 connecting the LSF components, plus step 14 with the label "Status of the board"]
• Submission host: bsub app, LIM, Batch API
• Master host: MLIM, MBD, queue
• Execution host: SBD, Child SBD, LIM, RES, User job, ELIM, ACS API, FPGA board
• Load information exchanged with other hosts
ELIM – External Load Information Manager
ACS API – Adaptive Computing Systems API
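The ELIM role on this slide can be illustrated with a short sketch. It follows the convention documented for LSF external load indices, in which an ELIM periodically writes one line to standard output containing the number of indices followed by name and value pairs. Everything else is an assumption: `query_board_status` is a hypothetical stand-in for a query made through the ACS API, and the resource names `fpga_free` and `fpga_configured` are illustrative, not taken from the slides.

```python
import sys
import time


def query_board_status():
    """Hypothetical stand-in for a board query made through the ACS API.

    Returns illustrative resource names mapped to numeric values.
    """
    return {"fpga_free": 1, "fpga_configured": 0}


def format_elim_report(indices):
    """Build one report line: '<count> <name1> <value1> <name2> <value2> ...'."""
    parts = [str(len(indices))]
    for name, value in indices.items():
        parts += [name, str(value)]
    return " ".join(parts)


def run_elim(interval_seconds=15, iterations=None, out=sys.stdout):
    """Report board status periodically; iterations=None loops forever."""
    count = 0
    while iterations is None or count < iterations:
        out.write(format_elim_report(query_board_status()) + "\n")
        out.flush()  # the reader is on the other end of a pipe
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval_seconds)
```

Calling `run_elim()` with no arguments loops indefinitely, as a load information manager is expected to. Once such a resource is defined in the LSF configuration, a job could request a board with a resource requirement string, for example `bsub -R "select[fpga_free > 0]" app`, where the resource name is again an assumption.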
20
Conclusions (1)
• 12 systems evaluated using 25 functional requirements + the suitability of extension to support reconfigurable hardware
• LSF, CODINE, PBS, and Condor ranked the highest in the functional requirements
• LSF, CODINE, and PBSPro found easy to extend without changes to their source code
• LSF most suitable to support reconfigurable hardware
21
Conclusions (2)
• General software architecture of the extended system developed
• Experimental developments, verification and performance evaluation of the extended system in progress