Service, Grid Service and Workflow Xian-He Sun Scalable Computing Software Laboratory Illinois Institute of Technology sun@iit.edu Nov. 30, 2006 Fermi Laboratory


Page 1: Service, Grid Service and Workflow Xian-He Sun Scalable Computing Software Laboratory Illinois Institute of Technology sun@iit.edu Nov. 30, 2006 Fermi

Service, Grid Service and Workflow

Xian-He Sun

Scalable Computing Software Laboratory

Illinois Institute of Technology

sun@iit.edu

Nov. 30, 2006 Fermi Laboratory

Page 2:

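Scalable Computing Software (SCS) Lab.

• Parallel Computers at SCS
• Pervasive Computing Environments at SCS
• Distributed Optical Testbed (Grid)

[Figure: network map of the Distributed Optical Testbed. Site and network labels: NU-E, UIC, ANL, NCSA/UIUC, U of C, NU-C, Star Tap, IIT, OMNI, I-WIRE]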

Page 3:

Grid and Utility Computing

Mimic the electrical power grid

• Reduced Complexity & Cost
• Higher Quality of Service
• Increased Productivity
• Increased Efficiency
• Improved Resiliency

Page 4:

Service Oriented Computing

Convergence of core technology standards allows a common base for business and technology services

[Figure: Grid standards (GT1, GT2, OGSI) and Web standards (HTTP, WSDL, SOAP, WS-*) started far apart in applications and technology and have been converging (WSRF). Other labels: XML, BPEL, WS-I compliant technology stack]

• Internet computing: Web service

• Grid computing: Grid service, which is merging with Web services

• Pervasive computing: Human centered service

• Mobile computing: Phone service

Computing as a service

Page 5:

Challenge: Computing as a Service

• SOC is about separation, sharing, and workflow
• Sharing (service/resource)
  – Modeling
  – Scheduling: system vs. application, replica vs. consistency
  – QoS: external tasks vs. local jobs
  – Security
• Separation (service)
  – Abstraction: personalized service
  – Primary service: automatic coding
  – Separation of concerns
  – Separation of resources: virtualization
• Workflow Management
• Information Service

Page 6:

Service Oriented Architecture (SOA)

• SOA is a software architecture in which services are the key building blocks
• SOA is essentially an application development style using services
• It is a set of principles or patterns for developing applications using services

The concept of SOA comes from software research

SOA has developed from more than 30 years of IT experience

Page 7:

What is SOA? – more detail

• An architecture that implements business functionality as a set of shared, reusable services
• A way of designing a software system and its surrounding environment to provide services to end-user applications, to executable business processes, or to other services through published and discoverable service interfaces
• Aggregation of components for a business driver
• Extended bus with shared services
• The service interface is defined separately from the implementation, which provides service encapsulation and platform/language independence

Page 8:

The General Service Oriented Architecture (SOA)

• Service Provider
  – Provides a stateless, location-transparent business service
• Service Registry
  – Allows service consumers to locate service providers that meet required criteria
• Service Consumer
  – Uses service providers to complete business processes

[Figure: Publish-Find-Bind mechanism. The Service Provider publishes to the Service Registry, the Service Requestor finds providers in the registry, and the requestor then binds to the provider]
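
The publish-find-bind interaction can be sketched in a few lines of Python. This is only an illustration: `ServiceRegistry`, `ServiceRecord`, and `echo_service` are hypothetical names, and the in-memory list stands in for a real registry.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ServiceRecord:
    """One registry entry: a name, searchable properties, and a callable endpoint."""
    name: str
    properties: Dict[str, str]
    endpoint: Callable[[str], str]


class ServiceRegistry:
    """Hypothetical in-memory registry (an assumption, not a real registry API)."""

    def __init__(self) -> None:
        self._records: List[ServiceRecord] = []

    def publish(self, record: ServiceRecord) -> None:
        # Publish: a provider advertises its service description.
        self._records.append(record)

    def find(self, **criteria: str) -> List[ServiceRecord]:
        # Find: return the providers whose properties match every criterion.
        return [
            r for r in self._records
            if all(r.properties.get(k) == v for k, v in criteria.items())
        ]


def echo_service(payload: str) -> str:
    # A stateless provider: the result depends only on the request.
    return f"echo:{payload}"


registry = ServiceRegistry()
registry.publish(ServiceRecord("echo", {"kind": "text", "site": "iit"}, echo_service))

matches = registry.find(kind="text")   # Find
service = matches[0].endpoint          # Bind: obtain the provider's endpoint
result = service("hello")              # invoke the provider directly
print(result)
```

After the bind step the consumer talks to the provider directly, so the registry is only on the lookup path.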

Page 9:

What is a Web Service?

• A software component
• Identified by a unique URI
• That can be discovered by other software components
• Web services are a stack of emerging standards that describe a service-oriented, component-based architecture

Page 10:

Key Players

• Do you know me?
  – Described by WSDL
• Do you want to find me?
  – Discovered in UDDI
• Do you want to communicate with me?
  – Communicate through SOAP/XML
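
As a rough illustration of the "communicate through SOAP/XML" step, the sketch below wraps one call in a SOAP 1.1 envelope and unwraps it again using only the Python standard library. The operation name `GetQuote`, its `symbol` parameter, and the `urn:example:quote` namespace are made up for the example; a real service would take them from its WSDL description.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:quote"  # hypothetical application namespace


def build_request(operation: str, params: dict) -> bytes:
    """Wrap one operation call in a SOAP 1.1 Envelope/Body."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def parse_request(message: bytes):
    """Recover the operation name and parameters on the receiving side."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    call = body[0]                                   # first child of Body is the call
    operation = call.tag.split("}", 1)[1]            # strip the "{namespace}" prefix
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


message = build_request("GetQuote", {"symbol": "IIT"})
operation, params = parse_request(message)
print(operation, params)
```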

Page 11:

Web Service Components

[Figure: the Service Provider registers its Service Contract (WSDL) with the Service Registry (UDDI); the Service Consumer (Client) finds the service in the registry and binds to the provider's Service over SOAP]

Page 12:

Grid Computing

• Infrastructure (“middleware” & “services”) for establishing, managing, and evolving multi-organizational federations

• Mechanisms for creating and managing workflow within such federations

• Three key criteria
  – Coordinates distributed resources …
  – using standard, open, general-purpose protocols and interfaces …
  – to deliver non-trivial qualities of service.

Page 13:

Data Grids for High Energy Physics

[Figure: tiered data grid for high energy physics]

• Online System
  – There is a “bunch crossing” every 25 nsecs
  – There are 100 “triggers” per second
  – Each triggered event is ~1 MByte in size
• Offline Processor Farm, ~20 TIPS
• Tier 0: CERN Computer Centre
• Tier 1: FermiLab (~4 TIPS), France Regional Centre, Italy Regional Centre, Germany Regional Centre
• Tier 2: Caltech (~1 TIPS) and four Tier2 Centres (~1 TIPS each)
• Institutes (~0.25 TIPS), with a physics data cache
  – Physicists work on analysis “channels”
  – Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server
• Tier 4: Physicist workstations
• Link bandwidths shown: ~PBytes/sec, ~100 MBytes/sec (twice), ~622 Mbits/sec (three times, once as “~622 Mbits/sec or Air Freight (deprecated)”), ~1 MBytes/sec

1 TIPS is approximately 25,000 SpecInt95 equivalents

www.griphyn.org www.ppdg.net www.eu-datagrid.org
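
The figures on this slide are mutually consistent, which a few lines of arithmetic make explicit. The sketch below uses only the numbers stated on the slide; the variable names are illustrative.

```python
# Numbers taken from the slide.
BUNCH_CROSSING_INTERVAL_NS = 25   # one bunch crossing every 25 ns
TRIGGERS_PER_SECOND = 100         # triggered events kept per second
EVENT_SIZE_MBYTES = 1             # ~1 MByte per triggered event
SPECINT95_PER_TIPS = 25_000       # 1 TIPS ~ 25,000 SpecInt95 equivalents
FARM_TIPS = 20                    # offline processor farm

# Crossings per second: 1e9 ns / 25 ns.
crossings_per_second = 1_000_000_000 // BUNCH_CROSSING_INTERVAL_NS

# Data rate after triggering: events/s times MBytes/event.
trigger_rate_mbytes = TRIGGERS_PER_SECOND * EVENT_SIZE_MBYTES

# How many crossings occur for every event that is kept.
crossings_per_trigger = crossings_per_second // TRIGGERS_PER_SECOND

# Farm capacity expressed in SpecInt95 equivalents.
farm_specint95 = FARM_TIPS * SPECINT95_PER_TIPS

print(crossings_per_second)   # 40000000
print(trigger_rate_mbytes)    # 100
print(crossings_per_trigger)  # 400000
print(farm_specint95)         # 500000
```

The 100 MBytes/sec result matches the ~100 MBytes/sec link label in the figure.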

Page 14:

The Emergence of Open Grid Standards

[Figure: timeline from 1990 to 2010 (horizontal axis: 1990, 1995, 2000, 2005, 2010) against increased functionality and standardization (vertical axis)]

• Custom solutions
• Globus Toolkit: de facto standard, single implementation
• Open Grid Services Arch: real standards, multiple implementations
• Managed shared virtual systems
• Other labels: Internet standards; Web services, etc.; Computer science research

Page 15:

Open Grid Services Architecture

• Everything is a service
• A standard substrate: the Grid service
  – A Grid service is a Web service
  – Standard interfaces and behaviors that address key distributed system issues: naming, service state, lifetime, notification
• Supports standard service specifications
  – Agreement, data access & integration, workflow, security, policy, diagnostics, etc.
  – Target of current & planned GGF efforts
• Supports arbitrary application-specific services based on these & other definitions
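
To make the four distributed-system issues concrete (naming, service state, lifetime, notification), here is a toy sketch of a stateful service instance. It is not the OGSA or Globus Toolkit interface; `StatefulService` and its methods are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
from typing import Callable, Dict, List


class StatefulService:
    """Hypothetical service instance with a handle, state, a lifetime, and subscribers."""

    def __init__(self, handle: str, expires_at: float) -> None:
        self.handle = handle                  # naming: a unique handle for this instance
        self.state: Dict[str, str] = {}       # service state kept between calls
        self.expires_at = expires_at          # lifetime: invalid once this time is reached
        self._subscribers: List[Callable[[str, str], None]] = []

    def is_alive(self, now: float) -> bool:
        return now < self.expires_at

    def extend_lifetime(self, new_expiry: float) -> None:
        # Clients keep an instance alive by pushing its expiry forward.
        self.expires_at = max(self.expires_at, new_expiry)

    def subscribe(self, callback: Callable[[str, str], None]) -> None:
        # notification: register interest in state changes
        self._subscribers.append(callback)

    def set_state(self, key: str, value: str, now: float) -> None:
        if not self.is_alive(now):
            raise RuntimeError(f"{self.handle} has expired")
        self.state[key] = value
        for notify in self._subscribers:
            notify(key, value)


events = []
svc = StatefulService("job-42", expires_at=10.0)
svc.subscribe(lambda key, value: events.append((key, value)))
svc.set_state("status", "running", now=1.0)
svc.extend_lifetime(20.0)
alive_later = svc.is_alive(now=15.0)
print(events, alive_later)
```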

Page 16:

SOA and Web Service

• SOA is mostly defined and explained conceptually, with some accompanying implementations
• Web services are a stack of emerging standards that describe a service-oriented, component-based architecture
• Web services are a limited form of SOA, but they are the best practical solution available so far
• SOA and Web services are still evolving together
• Web services in their current form cannot support all computing services

Page 17:

Grid and Web Service

• Grid? What is the Grid?
  – Standard, technology, infrastructure, application
  – Globus or general distributed computing?
• Standard
  – Merging with Web service
• Application
  – Large scientific application vs. light business application
• Technology
  – Resource sharing vs. service sharing, resource sharing vs. pay for service, coordinate virtual organizations vs. create VOs (very hard), stateful vs. stateless

Page 18:

Workflow and LQCD Workflow

• All SOC needs workflow management
• Is LQCD computing SOC?
• Does LQCD need to follow the Web service standard?
  – If yes, we need to support Grid service (GT4)
  – If no, we do not

[Figure label: Information Service]

Page 19:

LQCD Workflow System

[Figure: architecture of the LQCD workflow system]

• Build Time (user)
  – Users
  – Workflow template identification & generation Tools
  – Workflow Design
• Run Time (system)
  – Workflow Instantiation
  – Workflow Enactment Service: Workflow Scheduling, Data Movement, Fault Management
  – Workflow Execution & Control
• Interaction with computing Resources
  – LQCD Middleware
  – Resources
• Interaction with Information Services
  – Information Services: Performance Info Service, Reliability Info Service
• Other label: workflow change
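
One way to picture what workflow scheduling has to do is to treat the workflow as a dependency graph and release tasks in waves as their predecessors finish. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the task names are hypothetical and are not taken from the LQCD workflow system itself.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "generate_configs": set(),
    "propagator_light": {"generate_configs"},
    "propagator_heavy": {"generate_configs"},
    "correlators": {"propagator_light", "propagator_heavy"},
}


def execution_waves(dependencies):
    """Group tasks into waves; every task in a wave can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()                      # raises CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # tasks whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)                  # mark the wave as finished
    return waves


waves = execution_waves(workflow)
for i, wave in enumerate(waves):
    print(i, wave)
```

A real enactment service would dispatch each wave to resources and call `done` as individual tasks complete, rather than waiting for the whole wave.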

Page 20:

Workflow Management Systems

• Comparison Functionality
  – Workflow template identification & generation Tools
  – Workflow specification
  – Workflow scheduling & rescheduling
  – Fault Management
  – Data Movement
  – Interaction with monitor system
• Target Systems
  – Askalon
  – Kepler
  – Grid Physics Network

Page 21:

Current Result: the GHS System

The GHS (Grid Harvest Service) system

• GHS is a long-term, application-level performance evaluation and task scheduling system specially designed to handle resource availability issues when solving large-scale applications.
• Resource unavailability can be due to contention or due to faults. The two causes require different performance modeling and prediction.
• Supports rescheduling

Page 22:

GHS System Design Structure

[Figure: layered design structure of GHS. Labels, grouped as they appear:]

• Task Partition, Task Scheduling, Task Rescheduling, Task Execution
• Application Monitoring
• Reservation, Compete, Best-Effort
• CPU, Network, Memory
• Computation, Communication
• Layers: Scheduling, Prediction, Modeling, Measurement
• Resource Management, System Monitoring
• System-level Prediction, Application-level Prediction

Page 23:

Rescheduling Algorithm

• Measure the prediction error of the system utilization, PU(k)
• PU(k) > threshold? If NO, run the application until the next monitor period
• If YES, find the best machine or machine set for task reallocation
• Calculate the expectation of T(reassign) and T(original): E(R) and E(O)
• E(O) - E(R) > 0? If NO, run the application until the next monitor period
• If YES, perform task reallocation, then run the application until the next monitor period

Reasons for rescheduling

• Availability pattern change
• Fault tolerance
• New jobs arrive
• Multi-campaign
• New milestones
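
The decision made in each monitor period can be sketched as a single function. This is a reading of the flowchart, not GHS code: `reschedule_step` and `Decision` are hypothetical names, and "best machine" is assumed here to mean the candidate with the smallest expected completion time.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Decision:
    reallocate: bool
    target: Optional[str] = None


def reschedule_step(
    prediction_error: float,                  # PU(k), measured this monitor period
    threshold: float,
    expected_original: float,                 # E(O): expected time if the task stays put
    candidates: Sequence[str],
    expected_on: Callable[[str], float],      # expected time on a candidate machine
) -> Decision:
    """One pass through the flowchart for a single monitor period."""
    # PU(k) <= threshold: the prediction still holds, keep running.
    if prediction_error <= threshold or not candidates:
        return Decision(False)
    # Assumption: "best" means lowest expected completion time.
    best = min(candidates, key=expected_on)
    expected_reassigned = expected_on(best)   # E(R)
    # Reallocate only if E(O) - E(R) > 0, i.e. moving is expected to pay off.
    if expected_original - expected_reassigned > 0:
        return Decision(True, best)
    return Decision(False)


times = {"node-a": 120.0, "node-b": 80.0}     # made-up expected times in seconds
move = reschedule_step(0.4, 0.2, 100.0, list(times), times.get)
stay = reschedule_step(0.1, 0.2, 100.0, list(times), times.get)
print(move, stay)
```

Whatever the decision, the application then runs until the next monitor period, where the same check is repeated.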

Page 24:

Automated Deployment of Meta-task

• APST software
  – AppLeS scheduling
  – NWS prediction
• Integrating GHS prediction and scheduling into APST
  – Modify the MetricType and ServiceType data structures in the Meta-data Bookkeeper
  – Add GHS server to provide information service
  – Add GhsMetataskSched()
  – Modify XmlFile parser in the Controller component

Page 25:

Software Released

• http://www.meta.cs.iit.edu/~ghs
• GHS 1.0
  – Functionalities for performance prediction, measurement, task allocation, and task scheduling
• GHS-APST 1.0
  – Integrates GHS prediction and scheduling into APST execution management
  – Adds GHS server and GHS daemons for performance data collection and inquiry
  – Unchanged user interface
    • apstd –heuristc=ghs
• Tested on SunOS 5.9 and Linux 2.4.20
• Releases cover contention availability; fault availability is a work in progress.