Post on 22-Dec-2015
1
Making Services Fault Tolerant
Pat Chan, Michael R. Lyu
Department of Computer Science and Engineering, The Chinese University of Hong Kong
Miroslaw Malek
Department of Computer Science and Engineering, Humboldt University Berlin
2
Outline
  Introduction
  Problem Statement
  Methodologies for Web Service Reliability
  New Reliable Web Service Paradigm
  Road Map for Experiment
  Experimental Results and Discussion
  Conclusion
3
Introduction
  Service-oriented computing is becoming a reality.
  Service-oriented architectures (SOA) are based on a simple model of roles.
  The problems of service dependability, security and timeliness are becoming critical.
  We propose experimental settings and offer a roadmap to dependable Web services.
4
Problem Statement
  Fault-tolerant techniques
    Replication
    Diversity
  Replication is an efficient way to build reliable systems through time or space redundancy.
    It increases the availability of distributed systems.
    Key components are re-executed or replicated.
    It protects against hardware malfunctions and transient system faults.
  Another efficient technique is design diversity.
    Software systems or services are designed independently by different programming teams.
    This helps defend against permanent software design faults.
  We focus on the analysis of replication techniques applied to Web services.
  A generic Web service system with spatial as well as temporal replication is proposed and investigated.
5
Methodologies for reliable Web services -- Redundancy
  Spatial redundancy
    Static redundancy: all replicas are active at the same time, and voting takes place to obtain a correct result.
    Dynamic redundancy: one replica is active at a time while the others are kept in an active or standby state.
  Temporal redundancy
    Redundancy in time
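The two forms of redundancy above can be sketched in a few lines of Python. This is a minimal illustration, not the mechanism used in the experiments: the function names `majority_vote` and `retry` are hypothetical, replicas are modelled as plain callables, and a raised exception stands in for a failed or unreachable replica.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def majority_vote(replicas: Sequence[Callable[[], Hashable]]) -> Optional[Hashable]:
    """Static spatial redundancy: call every replica and vote on the results.

    Returns the value a strict majority of ALL replicas agrees on, or None
    when no value reaches a strict majority.
    """
    results = []
    for call in replicas:
        try:
            results.append(call())
        except Exception:
            continue  # a crashed replica contributes no vote
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    # The majority is measured against all replicas, not only the survivors.
    return value if count > len(replicas) // 2 else None


def retry(call: Callable[[], Hashable], attempts: int = 3) -> Optional[Hashable]:
    """Temporal redundancy: re-execute the same call up to `attempts` times."""
    for _ in range(attempts):
        try:
            return call()
        except Exception:
            continue  # transient fault assumed; try again
    return None
```

Voting masks a wrong answer from a minority of replicas, while retry only helps against transient faults, since a permanent fault fails on every attempt.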
6
Methodologies for reliable Web services -- Diversity
  Protects redundant systems against common-mode failures.
  With different designs and implementations, common failure modes will probably cause different error effects.
  Examples: N-version programming, recovery blocks…
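As an illustration of design diversity, the following is a minimal sketch of a recovery block: independently written variants are tried in order, and the first result that passes an acceptance test is returned. Everything here is hypothetical and chosen only for the example: the `recovery_block` helper, the integer square root task, and the deliberately faulty variant `isqrt_buggy`.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def recovery_block(
    variants: Sequence[Callable[[int], T]],
    acceptance_test: Callable[[int, T], bool],
    x: int,
) -> T:
    """Try each variant in order; return the first result that passes the test."""
    for variant in variants:
        try:
            result = variant(x)
        except Exception:
            continue  # a crashing variant is skipped
        if acceptance_test(x, result):
            return result
    raise RuntimeError("all variants failed the acceptance test")


def isqrt_buggy(n: int) -> int:
    """Hypothetical variant with a design fault: wrong for most inputs."""
    return n // 2


def isqrt_newton(n: int) -> int:
    """Independent variant: integer square root by Newton iteration."""
    if n < 2:
        return n
    r = n
    while r * r > n:
        r = (r + n // r) // 2
    return r


def is_isqrt(n: int, r: int) -> bool:
    """Acceptance test: r is the floor of the square root of n."""
    return r >= 0 and r * r <= n < (r + 1) * (r + 1)
```

A call such as `recovery_block([isqrt_buggy, isqrt_newton], is_isqrt, 10)` rejects the faulty variant's answer and falls through to the second one. N-version programming differs in that all versions run and their outputs are voted on instead of being checked one at a time.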
7
Failure Response Stages of Web Services
  Fault confinement
  Fault detection
  Diagnosis
  Fail-over
  Reconfiguration
  Recovery
  Restart
  Repair
  Reintegration
8
[Figure: diagram of the failure response stages. Labels: Fault Confinement, Fault Detection, Failover, Diagnosis, Online, Offline, Reconfiguration, Recovery, Restart, Repair, Reintegration]
9
Proposed Paradigm
[Figure: architecture of the proposed paradigm. Labels: Replication Manager, Web service selection algorithm, WatchDog, UDDI Registry, WSDL, Web Service / IIS / Application / Database (repeated three times), Client, Port, Application, Database]
1. Create Web services
2. Select the primary Web service (PWS)
3. Register
4. Look up
5. Get WSDL
6. Invoke Web service
7. Keep checking the availability of the PWS
8. If the PWS fails, reselect the PWS
9. Update the WSDL
10
Work Flow of the Replication Manager
[Figure: flowchart. The RM sends a message to the Web service. If it gets a reply, it keeps checking. If it does not get a reply, it reselects a primary Web service and maps the new address to the WSDL. If all services have failed, the system fails.]
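The replication manager workflow can be sketched as a small polling loop. This is a toy model under stated assumptions, not the system used in the experiments: the `ReplicationManager` class, its `probes` mapping, and the "first replica that answers" selection rule are all hypothetical, and a boolean health check stands in for the message the RM sends to each Web service.

```python
from typing import Callable, Dict, Optional


class ReplicationManager:
    """Toy replication manager: tracks a primary and fails over when a probe fails."""

    def __init__(self, probes: Dict[str, Callable[[], bool]]):
        # probes maps a replica address to a health check (hypothetical stand-in
        # for the message the RM sends to the Web service).
        self.probes = probes
        self.primary: Optional[str] = next(iter(probes), None)
        self.wsdl_address: Optional[str] = self.primary  # address published to clients

    def _alive(self, address: str) -> bool:
        try:
            return self.probes[address]()
        except Exception:
            return False  # no reply counts as a failure

    def poll_once(self) -> Optional[str]:
        """One watchdog cycle. Returns the current primary, or None on system failure."""
        if self.primary is not None and self._alive(self.primary):
            return self.primary  # got a reply: nothing to do
        # No reply: reselect a primary among the other replicas (assumed rule:
        # take the first one that answers).
        for address in self.probes:
            if address != self.primary and self._alive(address):
                self.primary = address
                self.wsdl_address = address  # map the new address to the WSDL
                return address
        self.primary = None  # all services failed
        self.wsdl_address = None
        return None
```

In a deployment, `poll_once` would run at the polling frequency, and updating `wsdl_address` corresponds to republishing the WSDL so that clients reach the new primary.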
11
Road Map for Experiment Research
  Redundancy in time
  Redundancy in space
    Sequentially
    Parallel
    Majority voting using N-modular redundancy
    Diversified version of different services
12
Experiments
A series of experiments is designed and performed to evaluate the reliability of the Web service: a single service without replication, a single service with retry or reboot, and a service with spatial replication.
We also perform retry or failover when the Web service is down.
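Combining retry with failover on the client side can be sketched as follows. This is a minimal illustration with hypothetical names (`invoke_hybrid`, `retries_per_replica`); a raised exception stands in for a client timeout, and real calls would go over the network with an explicit timeout.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def invoke_hybrid(replicas: Sequence[Callable[[], T]], retries_per_replica: int = 2) -> T:
    """Retry each replica a few times, then fail over to the next one."""
    last_error: Exception = RuntimeError("no replicas configured")
    for call in replicas:                      # failover: move across replicas
        for _ in range(retries_per_replica):   # retry: repeat on the same replica
            try:
                return call()
            except Exception as exc:
                last_error = exc               # remember the failure, keep going
    raise last_error
```

With `retries_per_replica=1` this degenerates to pure failover, and with a single replica it degenerates to pure retry, so the same helper covers each configuration.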
13
Summary of the experiments
                             None   Retry/Reboot   Failover   Both (hybrid)
Single service, no retry     0      --             --         --
Single service with retry    --     1              --         --
Single service with reboot   --     2              --         --
Spatial replication          --     --             3          4
14
Parameters of the Experiments
Parameter                          Current setting/metric
Request frequency                  1 req/min
Polling frequency                  5 ms
Number of replicas                 5
Client timeout period for retry    10 s
Failure rate λ                     # failures/hour
Load (profile of the program)      % or load function
Reboot time                        10 min
Failover time                      1 s
15
Experimental Results
Experiments over 360-hour periods (43200 reqs)
Number of failures
         Normal   Server busy   Server reboots periodically
Exp 0    4928     6130          6492
Exp 1    2210     2327          2658
Exp 2    2561     3160          3323
Exp 3    1324     1711          1658
Exp 4    1089     1148          1325
Retry: 11.97% to 4.93%
Reboot: 11.97% to 6.44%
Failover: 11.97% to 3.56%
Retry and Failover: 11.97% to 2.59%
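To compare the techniques on a common scale, the percentages above can be turned into relative reductions. The sketch below assumes that each pair of percentages is a failure rate before and after applying the technique; the slide does not state how the percentages were derived, and the helper name `relative_reduction` is only illustrative.

```python
# Percentages as listed on the slide; assumed to be failure rates in percent.
baseline = 11.97
with_technique = {
    "retry": 4.93,
    "reboot": 6.44,
    "failover": 3.56,
    "retry and failover": 2.59,
}


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the baseline failures that the technique removes."""
    return (before - after) / before


reductions = {
    name: relative_reduction(baseline, pct) for name, pct in with_technique.items()
}
for name, r in reductions.items():
    print(f"{name}: {r:.1%} fewer failures than the baseline")
```

Read this way, the hybrid of retry and failover removes the largest share of the baseline failures, and reboot removes the smallest.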
Reliability Model Parameters
ID    Description                                        Value
λn    Network failure rate                               0.02
λ*    Web service failure rate                           0.228
λ1    Resource problem rate                              0.142
λ2    Entry point failure rate                           0.150
μ*    Web service repair rate                            0.286
μ1    Resource problem repair rate                       0.979
μ2    Entry point failure repair rate                    0.979
C1    Probability that the RM responds on time           0.9
C2    Probability that the server reboots successfully   0.9
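The reliability model that uses these parameters is not shown in this transcript. As a rough illustration of how a failure rate and a repair rate combine, the sketch below applies the textbook two-state (up/down) Markov model, whose steady-state availability is μ / (λ + μ). This is an assumption made for the example and not the authors' model: it ignores λn, C1 and C2, treats each component in isolation, and assumes each λ and μ pair shares the same time unit.

```python
def steady_state_availability(failure_rate: float, repair_rate: float) -> float:
    """Availability of a two-state (up/down) Markov model: mu / (lambda + mu)."""
    return repair_rate / (failure_rate + repair_rate)


# (failure rate, repair rate) pairs taken from the parameter table.
components = {
    "web service": (0.228, 0.286),   # lambda*, mu*
    "resource": (0.142, 0.979),      # lambda1, mu1
    "entry point": (0.150, 0.979),   # lambda2, mu2
}

availability = {
    name: steady_state_availability(lam, mu) for name, (lam, mu) in components.items()
}
for name, a in availability.items():
    print(f"{name}: availability {a:.3f} under the two-state assumption")
```

Under this simplification a component is more available the larger its repair rate is relative to its failure rate, which is why the resource and entry point figures come out higher than the Web service figure.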
23
Conclusion
Surveyed replication and design diversity techniques for reliable services.
Proposed a hybrid approach to improving the availability of Web services.
Carried out a series of experiments to evaluate the availability and reliability of the proposed Web service system.
N-version programming may finally become commercially viable in service environments.