Grid Computing in Hong Kong. Dr. Cho-Li Wang, Systems Research Group, Department of Computer Science and Information Systems, The University of Hong Kong


Grid Computing in Hong Kong

Dr. Cho-Li Wang
Systems Research Group

Department of Computer Science and Information Systems
The University of Hong Kong


Agenda

Grid computing – a simple picture
The Hong Kong Grid
SRG Projects
  SLIM, ODGPC
  G-JavaMPI
  JESSICA2
  LOTS DSM for Grid
Summary and Conclusion


Grid Computing : A Simple Picture

Grid computing: access to remote resources via standard protocols for cross-domain collaboration

Resources offered by resource providers to end users: CPU power, memory, network, storage, …, data, services, …

Much like “utilities” in our daily lives – electricity, water, etc.

Advantages:
  Cost-effectiveness
  Platform extensibility
  Convenience (P&P)


Grid Computing in Hong Kong -- The Hong Kong Grid

The experimental grid in HK

Supported under HKU Foundation Seed Grant

http://www.hkgrid.org/


The Hong Kong Grid (HKGrid)

Goals: to construct and make available a grid test bed

to facilitate the development of grid middleware and applications by local industry and institutions in Hong Kong and their partners in the region

to demonstrate the benefits of adopting grid technologies and to showcase any outstanding results of development or application

HKGrid provides a platform for its members to experiment with various research prototypes and pilot applications


HKGrid - Current constituents

Institution | Computing facilities
City University of HK | Service gateway (2-way Xeon SMP)
HK Baptist University | 2-way Xeon SMP x 64 (#300 in TOP500, 6/2003)
HK University of Science and Technology | 4-way SMP cluster
The HK Polytechnic University | Service gateway (2-way Xeon SMP)
The HK Institute of HPC | Service gateway (2-way Xeon SMP)
HKU – Computer Centre | 2-way Xeon SMP x 128 (#240 in TOP500, 11/2003)
HKU – Department of CSIS | Pentium 4 x 300 (#340 in TOP500, 6/2003)

Theoretical maximum computing power: 4 Tflop/s


Grid Point Monitoring with Ganglia

URL: http://gideon.csis.hku.hk/status/


HKU Grid Point: Grid and Cluster Software

Gatekeeper: gideon.csis.hku.hk (remote job submission)

Gideon, Ostrich, Srgdell, Real

Grid middleware: Globus Toolkit (GT) 2.0, 2.4, 3.0.1
Job scheduling (local job scheduler): OpenPBS 2.3.16, Maui 3.2.5
Programming: HPF, Fortran 90, C, C++, Java with MPI, JESSICA2 (HKU)
Communication Lib (IPC / network communication): MPICH-G2 1.2.3


Main Computing Facilities: HKU-CSIS Gideon 300 Cluster


Research Projects in HKGrid

HKBU: Knowledge Grid (autonomous grid service composition)
HKPU: Peer-to-peer (P2P) grid, meta scheduler, fault tolerance
HKUST: Development of sensor grid infrastructure
HKU:
  ETI: Modelling of Air Quality in Hong Kong (E-Business Technology Institute with the Environmental Protection Department, HKSAR)
  Computer Centre: HKU campus grid; scientific applications running across the ApGrid
  CSIS: Robust Speech Recognition (J. Wu and Dr. Q. Huo)
  CSIS: Simulation for the DNA Shuffling Experiment (W.H. Hon and Dr. T.W. Lam)
  CSIS: Approximate String Matching on DNA Sequences (L.L. Cheng)
  CSIS: Whole Genome Alignment via Mutation-Sensitive Sequence Similarity (H.L. Chan, N. Lu, and Dr. T.W. Lam)
  ME: Parallel Simulation of Turbulent Flow Model (Dr. C.H. Liu, Dept. of Mechanical Engineering)
  CSIS: HKU Grid Point (863 Project: China National Grid)
  CSIS: Asia-Pacific Grid
  …


HKGrid – Connections

Links to China National Grid (CNGrid) and Asia-Pacific Grid (ApGrid) via CERNET and APAN

Internet2 connection to the Abilene backbone at Chicago, USA

Acts as a gateway to other, larger grids


China National Grid (CNGrid) : 863 Project

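China National Grid Participants:
  Shanghai Supercomputer Center
  Institute of Computing Technology, Chinese Academy of Sciences
  The University of Hong Kong (CSIS)
  Xi'an Jiaotong University
  University of Science and Technology of China
  National University of Defense Technology
  Institute of Applied Physics, Chinese Academy of Sciences
  Tsinghua University

Supporting software:

VEGA (织女星) grid management system: dynamic service deployment, single sign-on, data replication, and performance monitoring. Developed by the Institute of Computing Technology, Chinese Academy of Sciences. V.1.0 released 8

The grid system software developed by the Institute of Computing Technology, Chinese Academy of Sciences, has connected the grid nodes of the Institute, Huazhong University of Science and Technology, and The University of Hong Kong via VEGA_GOS …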


ApGrid / PRAGMA Testbed

10 countries, 21 organizations, 22 clusters, 853 CPUs


ApGrid Demo at the HKU School Open Day (Oct. 2003)


Grid Research at HKU-CSIS

SRG Projects:
  SLIM + ODGPC
  G-JavaMPI
  JESSICA2
  LOTS DSM


Our Goal

Utility computing: to aggregate and make use of distributed computing resources transparently

Traditional means: to utilize the dedicated HPC facilities distributed across institutions
  Performance and reliability are key

Pervasive means: any user can be a resource provider (e.g., idle PCs), a consumer, or both
  Convenience and security are key

To construct an advanced grid computing platform to accommodate utility-like computing via traditional and “pervasive” means


Research at HKU – An Advanced Grid Computing Platform

[Diagram: the advanced grid computing platform (AGP)]

Components: G-JavaMPI, JESSICA, LOTS, ODGPC, SLIM

Objectives: performance and reliability; user’s convenience; convenient system administration

Research issues: load balancing; single-system image (programming environment); grid point construction, including on-demand grid point construction (ODGPC)


SLIM

Single Linux Image Management


SLIM

Utility computing decouples computing platforms (resources) and computing logic (applications)

I.e., a single platform can run completely different applications

Problem: different applications demand different execution environments (OS, shared libraries, supporting apps, etc.)

The hassle of managing execution environments (EEs) on the resource provider side offsets the benefits of resource sharing

SLIM is a network service for constructing and managing EEs and disseminating them to remote computing platforms


SLIM – System design

How does it work?
  A node sends an EE specification across the network to find the Boot server
  The Boot server delivers the requested Linux kernel
  The Image server constructs an EE by collecting shared libraries, user data, etc.
  The Linux kernel boots and contacts the Image server to “mount” the EE via a file synchronization protocol such as NFS
  Aggressive caching techniques are deployed to optimize performance
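The boot sequence on this slide can be sketched as a toy, in-memory simulation. It is an illustration only: the names `EESpec`, `BootServer`, `ImageServer` and `boot_node`, the spec fields, and the kernel and library names are all hypothetical and are not SLIM's actual interfaces. No real network boot or NFS traffic is involved, and a plain dictionary stands in for the caching layer.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EESpec:
    """Hypothetical execution-environment (EE) specification sent by a node."""
    kernel: str
    libraries: tuple
    user: str


@dataclass
class BootServer:
    """Toy stand-in for the Boot server: maps kernel names to kernel images."""
    kernels: dict

    def deliver_kernel(self, spec):
        return self.kernels[spec.kernel]


@dataclass
class ImageServer:
    """Toy stand-in for the Image server: assembles an EE and caches it."""
    library_store: dict
    user_data: dict
    cache: dict = field(default_factory=dict)
    builds: int = 0

    def construct_ee(self, spec):
        if spec in self.cache:          # cache hit: reuse the assembled EE
            return self.cache[spec]
        self.builds += 1
        ee = {
            "libs": {name: self.library_store[name] for name in spec.libraries},
            "home": self.user_data.get(spec.user, {}),
        }
        self.cache[spec] = ee
        return ee


def boot_node(spec, boot_server, image_server):
    """Follow the slide's steps: fetch the kernel, then 'mount' the EE."""
    kernel = boot_server.deliver_kernel(spec)
    ee = image_server.construct_ee(spec)
    return {"kernel": kernel, "root": ee}


boot = BootServer(kernels={"linux-2.4": b"kernel-image"})
images = ImageServer(
    library_store={"libc": "libc.so.6", "mpich": "libmpich.a"},
    user_data={"alice": {"job.sh": "#!/bin/sh"}},
)
spec = EESpec(kernel="linux-2.4", libraries=("libc", "mpich"), user="alice")

first = boot_node(spec, boot, images)
second = boot_node(spec, boot, images)  # same spec: served from the cache
print(sorted(first["root"]["libs"]), images.builds)
```

Two nodes that request the same specification share one assembled EE, which is the effect the caching bullet describes in the simplest possible form.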


SLIM – Ongoing and future work

SLIM has been managing:
  the HKU-CSIS grid point (350 nodes) for various grid research projects
  an additional 300+ lab machines for teaching purposes (different courses have different requirements)

Future work:
  Overcoming the challenges in deploying SLIM over broadband links
  Realizing “pervasive utility computing”


On-Demand Grid Point Construction (ODGPC)

[Diagram: four numbered panels showing a SLIM server and its clients; labels: OS image with /usr/local/gt3.2, DHCP, TFTP, CA server, certificate]

1. Software installation at SLIM server
2. Client boots and obtains kernel
3. OS image/App disseminated
4. Process to generate certificates


SLIM and ODGPC Performance Evaluation

Boot up 100 machines (Linux + GT3): 6 minutes
Generate certificates for 100 machines (Step 4): 30 minutes
Total time: 6 + 30 = 36 minutes

256 PCs: < 5 minutes (OS only)
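As a small worked example, the timing figures on this slide can be broken down further. The per-machine figure below is only an average: the slide does not say whether certificates are generated one at a time or in parallel.

```python
# Figures taken from the slide (100 machines, Linux + GT3).
machines = 100
boot_minutes = 6
cert_minutes = 30

total_minutes = boot_minutes + cert_minutes

# Average certificate-generation time per machine, in seconds.
cert_seconds_per_machine = cert_minutes * 60 / machines

# Fraction of the total construction time spent on certificates.
cert_share = cert_minutes / total_minutes

print(f"total: {total_minutes} min")
print(f"certificate generation per machine: {cert_seconds_per_machine:.0f} s")
print(f"share of time spent on certificates: {cert_share:.0%}")
```

On these numbers, certificate generation accounts for most of the construction time, while booting the machines is the smaller part.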


SLIM – Key references

http://www.csis.hku.hk/~cmlee/slim/

C.M. Lee, R.S.C. Ho, D.H.F. Hung, C.L. Wang, and F.C.M. Lau, “Managing Execution Environments for Utility Computing,” Network Research Workshop 2004 (with APAN 2004), March, 2004.

(LinuxPilot 2004/04)


G-JavaMPI

A grid-enabled Java-MPI system with dynamic load-balancing via process migration


G-JavaMPI

A grid-enabled implementation of Java binding of MPI, supporting efficient MPI communication among distributed Java processes

Supports transparent Java process migration (through JVMDI) within and across grid points for balancing CPU and network loads

Communication-aware process migration policies based on:
  the application’s communication pattern
  available network bandwidth on grid overlays


G-JavaMPI – System design

[Diagram: three grid points connected over a WAN, each with a Gatekeeper and an LS; steps are labelled (1), (2), (3) and (1*), (2*), (3*)]

Migrating: restarting a new process through a Globus remote job request with delegated user credentials and Java-MPI job credentials

Java-MPI communication: some legacy messages are redirected during migration

A migration module (M) resides in each JVM


G-JavaMPI – Ongoing and future work

The migration mechanism has been implemented

Future work targets process migration policies

Goal: to offset performance pitfalls caused by heterogeneity through dynamic process migration

Sources of heterogeneity in grids: CPU, network, runtime environments, etc.

CPU and network heterogeneities cause long “blocking” periods in cooperative processes, thus limiting the system throughput

G-JavaMPI aims to detect and eliminate “blocking” through process migration (e.g. to migrate a “bottleneck” process to a faster node, etc.)
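The idea of moving a “bottleneck” process to a faster node can be sketched as a simple decision rule. This is a minimal sketch under stated assumptions and not G-JavaMPI's actual policy: the function `pick_migration`, its parameters, and the cost model are hypothetical. It considers CPU speed only, whereas the communication-aware policies mentioned in the talk would also weigh network bandwidth.

```python
def pick_migration(step_seconds, node_of, node_speed, idle_nodes,
                   migration_cost, remaining_steps):
    """Return (process, target_node), or None if migration does not pay off.

    step_seconds:    process -> compute seconds per iteration on its node
    node_of:         process -> node currently hosting it
    node_speed:      node -> relative CPU speed (higher is faster)
    idle_nodes:      candidate target nodes
    migration_cost:  one-off seconds needed to move a process
    remaining_steps: iterations still to run
    """
    # The slowest process keeps its peers blocked at each synchronization.
    bottleneck = max(step_seconds, key=step_seconds.get)
    current = node_of[bottleneck]

    faster = [n for n in idle_nodes if node_speed[n] > node_speed[current]]
    if not faster:
        return None
    target = max(faster, key=node_speed.get)

    # Estimated per-iteration time after the move, scaled by relative speed.
    new_time = step_seconds[bottleneck] * node_speed[current] / node_speed[target]

    # After the move, an iteration lasts as long as the slowest process.
    others = [t for p, t in step_seconds.items() if p != bottleneck]
    new_step = max([new_time] + others)

    gain = (step_seconds[bottleneck] - new_step) * remaining_steps
    return (bottleneck, target) if gain > migration_cost else None


step_seconds = {"p0": 2.0, "p1": 2.0, "p2": 6.0}
node_of = {"p0": "a", "p1": "b", "p2": "c"}
node_speed = {"a": 1.0, "b": 1.0, "c": 0.5, "d": 1.5}

decision = pick_migration(step_seconds, node_of, node_speed, ["d"],
                          migration_cost=30.0, remaining_steps=100)
print(decision)
```

The rule migrates only when the time saved over the remaining iterations exceeds the one-off cost of the move, so a job that is nearly finished stays where it is.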


G-JavaMPI – Key references

L. Chen, C.L. Wang, and F.C.M. Lau, “A Grid Middleware for Distributed Java Computing with MPI Binding and Process Migration Supports,” Journal of Computer Science and Technology (China), Vol. 18, No. 4, July 2003, pp. 505-514.

L. Chen, C.L. Wang, F.C.M. Lau, and R.K.K. Ma, “A Grid Middleware for Distributed Java Computing with MPI Binding and Process Migration Supports,” International Workshop on Grid and Cooperative Computing (GCC-2002), December 26-28, 2002, Hainan, China, pp. 640-652.


JESSICA2 : A Java-Enabled Single-System Image Computing Architecture

JESSICA2 is a distributed Java Virtual Machine (DJVM) which consists of a group of extended JVMs running on a distributed environment to support true parallel execution of a multithreaded Java application.

Java threads can freely move across node boundaries and execute in parallel to achieve more scalable high-performance computing using clusters

The JESSICA2 DJVM provides standard JVM services that are compliant with the Java language specification, as if running on a single machine: a Single System Image (SSI).


JESSICA2 Architecture

[Diagram: a multithreaded Java program running on six JESSICA2 JVMs (one master, five workers), with thread migration on top of a Global Object Space; labels: JIT Compiler Mode, Portable Java Frame]


JESSICA2 Main Features

Transparent Java thread migration
  Runtime capturing and restoring of thread execution context
  No source code modification; no bytecode instrumentation (preprocessing); no new API introduced
  Enables dynamic load balancing on clusters

Full-speed computation
  JITEE: cluster-aware bytecode execution engine
  Operates in Just-In-Time (JIT) compilation mode
  Zero cost if no migration

Transparent remote object access
  Global Object Space: a shared global heap spanning all cluster nodes
  Adaptive migrating home protocol for memory consistency, plus various optimizing schemes
  I/O redirection
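The general idea behind a migrating-home protocol can be illustrated with a toy model: each shared object has a home node that holds its master copy, and the home moves to a node that keeps writing to it, so later writes become local. The slide does not describe JESSICA2's actual rules. The class `SharedObject` and the fixed `threshold` of consecutive remote writes are assumptions made purely for illustration.

```python
class SharedObject:
    """Toy model of an object whose home node can migrate to a frequent writer."""

    def __init__(self, home, threshold=3):
        self.home = home
        self.threshold = threshold   # assumed tuning knob, not from the slide
        self.last_writer = None
        self.streak = 0              # consecutive remote writes by last_writer
        self.remote_updates = 0      # writes that had to be sent to the home

    def write(self, node):
        if node == self.home:
            # Writes at the home node are local and reset the streak.
            self.last_writer, self.streak = None, 0
            return
        self.remote_updates += 1
        if node == self.last_writer:
            self.streak += 1
        else:
            self.last_writer, self.streak = node, 1
        if self.streak >= self.threshold:
            # One node dominates the writes: move the home to it.
            self.home = node
            self.last_writer, self.streak = None, 0


obj = SharedObject(home="node0")
for _ in range(10):
    obj.write("node3")
print(obj.home, obj.remote_updates)
```

In this run only the first three writes travel to the original home; once the home has moved to `node3`, the remaining writes are local.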


Ray Tracing on JESSICA2 (64 PCs)

Linux 2.4.18-3 kernel (Redhat 7.3)

64 nodes: 108 seconds

1 node: 4402 seconds (about 1.2 hours)

Speedup = 4402/108 ≈ 40.76
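The speedup on this slide, and the parallel efficiency it implies, can be checked with a few lines of arithmetic. Efficiency is the usual speedup divided by the number of nodes; it is not a figure reported on the slide.

```python
t_one_node = 4402.0   # seconds on 1 node (from the slide)
t_64_nodes = 108.0    # seconds on 64 nodes (from the slide)
nodes = 64

speedup = t_one_node / t_64_nodes
efficiency = speedup / nodes   # fraction of ideal linear speedup

print(f"speedup    = {speedup:.2f}")
print(f"efficiency = {efficiency:.1%}")
```

On these figures the 64-node run reaches roughly 64% of ideal linear speedup.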


JESSICA – Key references

W.Z. Zhu, C.L. Wang, and F.C.M. Lau, “A Lightweight Solution for Transparent Java Thread Migration in Just-in-Time Compilers,” The 2003 International Conference on Parallel Processing (ICPP-2003), Taiwan, October 6-10, 2003, pp. 465-472.

W.Z. Zhu, C.L. Wang and F.C.M. Lau, “JESSICA2: A Distributed Java Virtual Machine with Transparent Thread Migration Support,” IEEE Fourth International Conference on Cluster Computing (CLUSTER 2002), Chicago, USA, September 23-26, 2002, pp. 381-388. 

M.J.M. Ma, C.L. Wang, F.C.M. Lau. “JESSICA: Java-Enabled Single-System-Image Computing Architecture,” Journal of Parallel and Distributed Computing, Vol. 60, No. 10, October 2000, pp. 1194-1222.


LOTS: Large Object Space on Grid

[Diagram: five nodes, each a stack of LOTS over OS over H/W, sharing one Large Global Object Space]

A large software distributed memory system for the Grid
Provides a global object space larger than the process space (4 GB on a 32-bit CPU)
Uses the local hard disk to store objects that have not been used recently
Scope Consistency + Home Migration to reduce redundant data traffic
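The idea of keeping recently used objects in memory and moving the rest to local disk can be sketched with a small least-recently-used store. This is not LOTS code: the class `SpillingObjectStore`, its `capacity` parameter, and the one-pickle-file-per-object layout are assumptions for illustration, and keys are assumed to be safe to use as file names.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class SpillingObjectStore:
    """Keeps at most `capacity` objects in memory; older ones go to local disk."""

    def __init__(self, capacity, spill_dir):
        self.capacity = capacity
        self.spill_dir = spill_dir
        self.in_memory = OrderedDict()   # least recently used first

    def _path(self, key):
        return os.path.join(self.spill_dir, f"{key}.pkl")

    def put(self, key, value):
        self.in_memory[key] = value
        self.in_memory.move_to_end(key)
        self._evict()

    def get(self, key):
        if key not in self.in_memory:
            # The object was swapped out: load it back from disk.
            with open(self._path(key), "rb") as f:
                self.in_memory[key] = pickle.load(f)
            os.remove(self._path(key))
        self.in_memory.move_to_end(key)
        self._evict()
        return self.in_memory[key]

    def _evict(self):
        # Write the least recently used objects to disk until we fit again.
        while len(self.in_memory) > self.capacity:
            old_key, old_value = self.in_memory.popitem(last=False)
            with open(self._path(old_key), "wb") as f:
                pickle.dump(old_value, f)


with tempfile.TemporaryDirectory() as spill_dir:
    store = SpillingObjectStore(capacity=2, spill_dir=spill_dir)
    for i in range(4):
        store.put(f"obj{i}", [i] * 3)
    on_disk = sorted(os.listdir(spill_dir))
    restored = store.get("obj0")
    print(on_disk, restored, list(store.in_memory))
```

With a capacity of two, the two oldest objects end up on disk, and reading one of them brings it back into memory while the next least recently used object is written out in its place.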


Summary

Performance
  G-JavaMPI and JESSICA establish extensible grid platforms (good for computation-intensive applications)
  Process/thread migration enables performance optimization and load balancing
  LOTS supports a shared memory programming environment on a large object space (easier to develop data grid applications)

Reliability
  G-JavaMPI migrates processes from failed machines
  SLIM helps construct platforms for failover

Convenience
  G-JavaMPI, JESSICA, and LOTS enable users to harness distributed resources via traditional means
  SLIM and ODGPC simplify grid point management


Conclusion

Grid/utility computing are relatively new paradigms that deserve further investigation

We address the performance, reliability, and user convenience issues in grid/utility computing

Our advanced grid computing platform (consisting of G-JavaMPI, JESSICA2, LOTS, and SLIM/ODGPC) is geared for deployment in the HKGrid, to ease the adoption of Grid technologies.


Q&A

Thank you!

The SRGers (Photo: 12/2003)


Reference

• Hong Kong Grid: http://www.hkgrid.org/
• Grid Computing Research Portal: http://grid.csis.hku.hk/
• The HKU Systems Research Group: http://www.srg.csis.hku.hk
• VEGA Project: http://vega.ict.ac.cn/
• The HK Supercomputing Directory: http://www.hkhpc.org/~SuperDir/