Service, Grid Service and Workflow
Xian-He Sun
Scalable Computing Software Laboratory
Illinois Institute of Technology
Nov. 30, 2006 Fermi Laboratory
Scalable Computing Software (SCS) Lab.
Parallel Computers at SCS
Figure: Distributed Optical Testbed (Grid) linking NU-E, UIC, ANL, NCSA/UIUC, U of C, NU-C, StarTap, and IIT (networks: OMNI, I-WIRE)
Pervasive Computing Environments at SCS
Grid and Utility Computing
• Mimic the electrical power grid
• Benefits: reduced complexity and cost, higher quality of service, increased productivity, increased efficiency, improved resiliency
Service Oriented Computing
• Convergence of core technology standards allows a common base for business and technology services
Figure: Grid standards (GT1, GT2, OGSI) and Web standards (HTTP, WSDL, SOAP, WS-*) started far apart in applications and technology and have been converging (WSRF) toward a WS-I compliant technology stack (XML, BPEL)
• Internet computing: Web service
• Grid computing: Grid service, which is merging with Web services
• Pervasive computing: Human centered service
• Mobile computing: Phone service
Computing as a service
Information Service
Challenge: Computing as a Service
• SOC is about separation, sharing, and workflow
Sharing (service/resource)
• Modeling
• Scheduling: system vs. application, replica vs. consistency
• QoS: external tasks vs. local jobs
• Security
Separation (service)
• Abstraction: personalized service
• Primary service: automatic coding
• Separation of concern
• Separation of resource: virtualization
Workflow Management
Service Oriented Architecture (SOA)
• SOA is a software architecture in which services are the key building blocks
• SOA is basically an application development style using services
• It is a set of principles or patterns for developing applications from services
• The concept of SOA comes from software research
• SOA grew out of more than 30 years of IT experience
What is SOA? (More Detail)
• An architecture that implements business functionality as a set of shared, reusable services
• A way of designing a software system and its surrounding environment to provide services to end-user applications, to executable business processes, or to other services through published and discoverable service interfaces
• Aggregation of components for a business driver
• Extended bus with shared services
• Service interface defined separately from the implementation, providing service encapsulation and platform/language independence
The General Service Oriented Architecture (SOA)
• Service Provider: provides a stateless, location-transparent business service
• Service Registry: allows service consumers to locate service providers that meet required criteria
• Service Consumer: uses service providers to complete business processes
Figure: Publish-Find-Bind mechanism. The Service Provider publishes to the Service Registry, the Service Requestor finds the service in the registry, and the Service Requestor binds to the Service Provider.
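The publish-find-bind cycle can be sketched with an in-memory registry. This is only an illustration under assumptions: `ServiceDescription`, `ServiceRegistry`, the attribute-matching rule, and the "EchoService" are hypothetical stand-ins, not UDDI's actual API, and "binding" is reduced to calling a local function instead of a remote endpoint.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ServiceDescription:
    """What a provider publishes (hypothetical): a name, searchable attributes, and an endpoint."""
    name: str
    attributes: dict[str, str]
    endpoint: Callable[[str], str]


@dataclass
class ServiceRegistry:
    """In-memory stand-in for a service registry such as a UDDI directory."""
    entries: list[ServiceDescription] = field(default_factory=list)

    def publish(self, desc: ServiceDescription) -> None:
        # Provider side: make the service discoverable.
        self.entries.append(desc)

    def find(self, **criteria: str) -> list[ServiceDescription]:
        # Requestor side: return every service whose attributes match all criteria.
        return [d for d in self.entries
                if all(d.attributes.get(k) == v for k, v in criteria.items())]


registry = ServiceRegistry()
registry.publish(ServiceDescription(
    name="EchoService",
    attributes={"category": "utility"},
    endpoint=lambda msg: f"echo: {msg}",
))

# Find a matching provider, then bind to it (here: call its endpoint directly).
matches = registry.find(category="utility")
result = matches[0].endpoint("hello") if matches else None
print(result)  # echo: hello
```

The registry only brokers discovery; after the find step, the requestor talks to the provider directly, which is what keeps the provider location-transparent to the registry's clients.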
What is a Web Service?
• A software component
• Identified by a unique URI
• Can be discovered by other software components
• Web services are a stack of emerging standards that describe a service-oriented, component-based architecture
Key Players
• Do you know me? Described by WSDL
• Do you want to find me? Discovered in UDDI
• Do you want to communicate with me? Communicate through SOAP/XML
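To make the "communicate through SOAP/XML" role concrete, the sketch below builds a SOAP 1.1 request envelope with Python's standard `xml.etree.ElementTree`. The envelope namespace is the real SOAP 1.1 one; the service namespace `urn:example:jobs` and the `GetJobStatus` operation are hypothetical, and the sketch only builds the message rather than sending it over HTTP.

```python
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_soap_request(service_ns: str, operation: str, params: dict[str, str]) -> bytes:
    """Wrap one operation call in a SOAP 1.1 Envelope/Body, as a client would before POSTing it."""
    ET.register_namespace("soap", SOAP_ENV_NS)
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    op = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        # Each parameter becomes a child element of the operation element.
        ET.SubElement(op, f"{{{service_ns}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8")


# Hypothetical operation and namespace, for illustration only.
request = build_soap_request("urn:example:jobs", "GetJobStatus", {"jobId": "42"})
print(request.decode("utf-8"))
```

In a full Web service the operation name, parameter names, and namespace would come from the service's WSDL description rather than being hard-coded.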
Web Service Components
Figure: Web service components. Service Provider (Service, Service Contract in WSDL), Service Registry (UDDI), and Service Consumer (Client); interactions: Register, Find, Bind (SOAP).
Grid Computing
• Infrastructure ("middleware" and "services") for establishing, managing, and evolving multi-organizational federations
• Mechanisms for creating and managing workflow within such federations
• Three key criteria:
  – Coordinates distributed resources …
  – using standard, open, general-purpose protocols and interfaces …
  – to deliver non-trivial qualities of service.
Data Grids for High Energy Physics
Figure: tiered data grid (approximate bandwidths)
• Into the Online System: ~PBytes/sec
• Online System → Offline Processor Farm (~20 TIPS): ~100 MBytes/sec
• → Tier 0: CERN Computer Centre: ~100 MBytes/sec
• Tier 0 → Tier 1 regional centres (FermiLab ~4 TIPS; France, Italy, and Germany Regional Centres): ~622 Mbits/sec or air freight (deprecated)
• Tier 1 → Tier 2 Centres (~1 TIPS each, e.g., Caltech): ~622 Mbits/sec
• Tier 2 → Institutes (~0.25 TIPS each): ~622 Mbits/sec
• Institute physics data cache → Tier 4 physicist workstations: ~1 MBytes/sec
• 1 TIPS is approximately 25,000 SpecInt95 equivalents
There is a "bunch crossing" every 25 ns.
There are 100 "triggers" per second.
Each triggered event is ~1 MByte in size.
Physicists work on analysis "channels".
Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.
www.griphyn.org www.ppdg.net www.eu-datagrid.org
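The numbers on this slide are consistent with each other: 100 triggers per second of ~1 MByte events gives the ~100 MBytes/sec flowing into the offline farm. The short calculation below works this out; the per-day volume assumes continuous running, which is a simplification (real accelerator duty cycles are lower).

```python
# Figures quoted on the slide
bunch_crossing_interval_s = 25e-9   # one bunch crossing every 25 ns
trigger_rate_hz = 100               # 100 triggered events per second
event_size_bytes = 1e6              # ~1 MByte per triggered event

crossing_rate_hz = 1 / bunch_crossing_interval_s             # ~40 MHz of bunch crossings
rejection_factor = crossing_rate_hz / trigger_rate_hz        # crossings per recorded event
recorded_rate_bytes_s = trigger_rate_hz * event_size_bytes   # data kept after triggering
per_day_tb = recorded_rate_bytes_s * 86_400 / 1e12           # assumes continuous running

print(f"{crossing_rate_hz / 1e6:.0f} MHz, 1 in {rejection_factor:,.0f} kept, "
      f"{recorded_rate_bytes_s / 1e6:.0f} MB/s, {per_day_tb:.2f} TB/day")
# 40 MHz, 1 in 400,000 kept, 100 MB/s, 8.64 TB/day
```

The trigger is what makes the data grid feasible at all: it discards all but roughly one in 400,000 crossings before anything leaves the online system.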
The Emergence of Open Grid Standards
Figure: functionality and standardization increasing over time (1990 to 2010)
• Custom solutions (computer science research)
• Globus Toolkit: de facto standard, single implementation; Internet standards
• Open Grid Services Architecture: real standards, multiple implementations; Web services, etc.
• Managed shared virtual systems
Open Grid Services Architecture
• Everything is a service
• A standard substrate: the Grid service
  – A Grid service is a Web service
  – Standard interfaces and behaviors that address key distributed system issues: naming, service state, lifetime, notification
• Supports standard service specifications
  – Agreement, data access & integration, workflow, security, policy, diagnostics, etc.
  – Target of current & planned GGF efforts
• Supports arbitrary application-specific services based on these & other definitions
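The four distributed-system issues listed for the Grid service substrate (naming, service state, lifetime, notification) can be illustrated with a toy service instance. This is a hypothetical sketch, not the OGSI or WSRF interface: the class name, the `urn:uuid` handle format, and the soft-state lifetime model (the instance expires unless a client extends its termination time) are assumptions chosen for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GridServiceInstance:
    """Toy model (hypothetical) of naming, state, lifetime, and notification for one service instance."""
    termination_time: float  # absolute time after which the instance may be destroyed
    handle: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")  # unique name
    state: dict[str, object] = field(default_factory=dict)                   # service state
    subscribers: list[Callable[[str, object], None]] = field(default_factory=list)

    def set_termination_time(self, new_time: float) -> None:
        # Soft-state lifetime: clients keep the instance alive by extending this.
        self.termination_time = new_time

    def is_alive(self, now: float) -> bool:
        return now < self.termination_time

    def subscribe(self, callback: Callable[[str, object], None]) -> None:
        self.subscribers.append(callback)

    def update(self, key: str, value: object) -> None:
        # A state change pushes a notification to every subscriber.
        self.state[key] = value
        for notify in self.subscribers:
            notify(key, value)


svc = GridServiceInstance(termination_time=100.0)
events = []
svc.subscribe(lambda key, value: events.append((key, value)))
svc.update("jobStatus", "RUNNING")
expired = not svc.is_alive(now=150.0)   # lifetime ran out
svc.set_termination_time(200.0)         # a client extends it
print(expired, svc.is_alive(now=150.0), events)
```

The state and lifetime here are exactly what plain stateless Web services lack, which is why these behaviors had to be standardized on top of them.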
SOA and Web Service
• SOA is mostly defined and explained, with only some accompanying implementations
• Web services are a stack of emerging standards that describe a service-oriented, component-based architecture
• Web services are a limited form of SOA, but they are the best practical solution available so far
• SOA and Web services are still evolving together
• In their current form, Web services cannot support all computing services
Grid and Web Service
• Grid? What is the Grid?
  – Standard, technology, infrastructure, or application?
  – Globus, or general distributed computing?
• Standard
  – Merging with Web services
• Application
  – Large scientific applications vs. light business applications
• Technology
  – Resource sharing vs. service sharing
  – Resource sharing vs. pay for service
  – Coordinating virtual organizations vs. creating VOs (very hard)
  – Stateful vs. stateless
Information Service
Workflow and LQCD Workflow
• All SOC requires workflow management
• Is LQCD (lattice QCD) computing SOC?
• Does LQCD need to follow Web service standards?
• If yes, we need to support Grid services (GT4)
• If no, we do not
LQCD Workflow System
Figure: LQCD workflow system components
• Build time (user): users carry out workflow design with workflow template identification & generation tools
• Run time (system): the Workflow Enactment Service (workflow instantiation, workflow scheduling, data movement, fault management) performs workflow execution & control and handles workflow change
• Interaction with information services: Performance Info Service, Reliability Info Service
• Interaction with computing resources: through the LQCD middleware to the resources
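At run time, workflow instantiation and scheduling amount to releasing tasks in dependency order. The sketch below models a workflow as a DAG and uses Python's `graphlib.TopologicalSorter` (Python 3.9+) to produce "waves" of tasks whose dependencies are satisfied. The task names are a hypothetical LQCD-style pipeline, not the actual LQCD workflow, and tasks are marked done immediately instead of being dispatched to resources.

```python
from graphlib import TopologicalSorter

# Hypothetical LQCD-style workflow: each task maps to the set of tasks it depends on.
workflow = {
    "generate_configurations": set(),
    "fix_gauge": {"generate_configurations"},
    "compute_smeared_sources": {"generate_configurations"},
    "compute_propagators": {"fix_gauge"},
    "contract_correlators": {"compute_propagators", "compute_smeared_sources"},
    "analyze": {"contract_correlators"},
}


def enact(workflow: dict[str, set[str]]) -> list[list[str]]:
    """Release tasks in dependency order; tasks in the same wave could run in parallel."""
    ts = TopologicalSorter(workflow)
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        # A real enactment service would dispatch these to resources, monitor them,
        # and handle faults or rescheduling; here they are marked done at once.
        ts.done(*ready)
    return waves


for i, wave in enumerate(enact(workflow)):
    print(i, wave)
```

Keeping the DAG separate from execution is what lets the system side change a running workflow (reschedule or retry a task) without the user redesigning it.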
Workflow Management Systems
• Functionality compared:
  – Workflow template identification & generation tools
  – Workflow specification
  – Workflow scheduling & rescheduling
  – Fault management
  – Data movement
  – Interaction with the monitoring system
• Target systems:
  – Askalon
  – Kepler
  – Grid Physics Network
Current Result: the GHS System
• GHS (Grid Harvest Service) is a long-term, application-level performance evaluation and task scheduling system designed to handle resource availability issues when solving large-scale applications
• Resource availability can be reduced by contention or by faults; the two causes require different performance modeling and prediction
• Supports rescheduling
GHS System Design Structure
Figure: GHS system design structure
• Layers: scheduling, prediction, modeling, measurement
• Scheduling: task partition, task scheduling, task rescheduling, task execution, application monitoring
• Prediction: system-level prediction, application-level prediction
• Modeling: computation, communication
• Measurement: CPU, network, memory; system monitoring
• Resource management: reservation, compete, best-effort
Rescheduling Algorithm
1. Measure the prediction error of the system utilization, PU(k).
2. Is PU(k) > threshold? If NO, go to step 6.
3. Find the best machine or machine set for task reallocation.
4. Calculate the expectations of T(reassign) and T(original): E(R) and E(O).
5. Is E(O) - E(R) > 0? If yes, reallocate the task; if NO, leave it where it is.
6. Run the application until the next monitor period, then return to step 1.
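One monitor period of this flowchart can be sketched as a decision function. The sketch rests on assumptions not given on the slide: PU(k) is taken as the absolute error between predicted and measured utilization, each candidate's E(R) is assumed to already include migration cost, and the "best" machine set is simply the one with the smallest E(R). GHS's actual prediction models are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    machines: tuple[str, ...]  # a machine or machine set the task could move to
    expected_time: float       # E(R); assumed to include migration cost


def utilization_prediction_error(predicted: float, measured: float) -> float:
    """PU(k), assumed here to be the absolute error of the utilization prediction."""
    return abs(predicted - measured)


def reschedule(pu_k: float, threshold: float, expected_original: float,
               candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the new placement, or None to keep running until the next monitor period."""
    if pu_k <= threshold or not candidates:
        return None                                         # prediction still good: no action
    best = min(candidates, key=lambda c: c.expected_time)   # best machine or machine set
    if expected_original - best.expected_time > 0:          # E(O) - E(R) > 0
        return best                                         # reallocate the task
    return None                                             # moving would not pay off


pu = utilization_prediction_error(predicted=0.30, measured=0.65)
choice = reschedule(pu, threshold=0.2, expected_original=120.0,
                    candidates=[Candidate(("node1",), 110.0),
                                Candidate(("node2", "node3"), 70.0)])
print(choice)
```

The threshold test keeps the scheduler from paying the cost of searching for new machines while its predictions are still accurate; only a large prediction error triggers the more expensive E(O) vs. E(R) comparison.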
Reasons for rescheduling
• Availability pattern change
• Fault tolerance
• New jobs arrive
• Multi-campaign
• New milestones
Automated Deployment of Meta-task
• APST (AppLeS Parameter Sweep Template) software
  – AppLeS scheduling
  – NWS (Network Weather Service) prediction
• Integrating GHS prediction and scheduling into APST
  – Modify the MetricType and ServiceType data structures in the Meta-data Bookkeeper
  – Add a GHS server to provide the information service
  – Add GhsMetataskSched()
  – Modify the XmlFile parser in the Controller component
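A meta-task is a set of independent tasks, so once a predictor supplies an expected execution time for each (task, machine) pair, a scheduler can map tasks greedily. As an illustration only, the sketch below uses the classic min-min heuristic; it is not GhsMetataskSched(), whose logic is not described here, and the predicted times are made-up inputs.

```python
def min_min(etc: dict[str, dict[str, float]]) -> tuple[dict[str, str], dict[str, float]]:
    """Map independent tasks to machines with the min-min heuristic.

    etc[task][machine] is the predicted execution time of task on machine.
    Returns (task -> machine assignment, machine -> finish time).
    """
    ready = {m: 0.0 for times in etc.values() for m in times}  # when each machine is free
    unassigned = set(etc)
    assignment: dict[str, str] = {}
    while unassigned:
        # Find each unassigned task's best completion time over all machines,
        # then commit the task whose best completion time is smallest.
        best = None
        for task in sorted(unassigned):
            for machine in sorted(etc[task]):
                completion = ready[machine] + etc[task][machine]
                if best is None or completion < best[0]:
                    best = (completion, task, machine)
        completion, task, machine = best
        assignment[task] = machine
        ready[machine] = completion
        unassigned.remove(task)
    return assignment, ready


# Made-up predicted execution times (seconds) for three tasks on two machines.
predicted = {"t1": {"A": 2.0, "B": 4.0},
             "t2": {"A": 3.0, "B": 1.0},
             "t3": {"A": 5.0, "B": 6.0}}
assignment, finish = min_min(predicted)
print(assignment, max(finish.values()))
```

The quality of such a mapping depends directly on the predicted times, which is where a long-term, availability-aware predictor like GHS would plug in.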
Software Released
• http://www.meta.cs.iit.edu/~ghs
• GHS 1.0
  – Functionality for performance prediction, measurement, task allocation, and task scheduling
• GHS-APST 1.0
  – Integrates GHS prediction and scheduling into APST execution management
  – Adds a GHS server and GHS daemons for performance data collection and inquiry
  – Unchanged user interface
• apstd –heuristc=ghs
• Tested on SunOS 5.9 and Linux 2.4.20
• The releases cover contention-related availability; fault-related availability is a work in progress.