Page 1: Giray Kömürcü

1

ACTIVE FAULT TOLERANT SYSTEM for OPEN DISTRIBUTED COMPUTING

(Autonomic and Trusted Computing 2006)

Giray Kömürcü

OPEN DISTRIBUTED SYSTEMS

• One of the most successful structures designed by the computing community

• Have side-effects such as:

– Unanticipated runtime events

– Reconfiguration burdens due to environmental changes

– Increasing complexity that limits development

OPEN DISTRIBUTED SYSTEMS

• Reliability depends on both failures and performance

• The required reliability has to be maintained

• Fluctuations in the environment and its unpredictability give rise to a set of complex requirements

ACTIVE FAULT-TOLERANT MODEL

• Exploits the knowledge of pre-fault behaviour to predict environmental faults and failures

• Reduces the unpredictable nature of failures up to a certain limit

• Provides a proactive approach to achieving the required reliability

ACTIVE FAULT-TOLERANT MODEL

• Tolerates current failures that could not be predicted

• Maintains user-specified reliability through proper replication strategies

• Uses the information extracted from the system

ACTIVE FAULT-TOLERANT MODEL

PROACTIVE APPROACH of AFT MODEL

• Designs a mechanism to forecast faults and failures

• If AFT predicts a high chance of system failure, it takes the necessary steps to avoid it

• The aim is to use the available information about suspected failures to provide the required reliability
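The proactive loop can be pictured as a predictor fed by monitoring data. This is a minimal sketch, assuming an exponentially weighted moving average (EWMA) of per-interval fault observations and a fixed action threshold; `FailurePredictor`, `alpha` and `threshold` are hypothetical names, and the paper's actual prediction mechanism may differ.

```python
from dataclasses import dataclass


@dataclass
class FailurePredictor:
    """Hypothetical forecaster: an EWMA of observed faults.
    The AFT paper does not prescribe this method."""

    alpha: float = 0.3      # weight of the newest observation
    threshold: float = 0.5  # predicted failure level that triggers proactive action
    estimate: float = 0.0   # current smoothed failure estimate

    def observe(self, failed: bool) -> None:
        # Blend the latest monitoring interval (1.0 = a fault was seen) into the estimate
        self.estimate = self.alpha * float(failed) + (1 - self.alpha) * self.estimate

    def should_act(self) -> bool:
        # Proactive step: react before the predicted failure actually happens
        return self.estimate >= self.threshold


predictor = FailurePredictor()
for failed in [False, True, True, True]:
    predictor.observe(failed)
print(round(predictor.estimate, 3), predictor.should_act())  # 0.657 True
```

A smaller `alpha` reacts more slowly but is less easily triggered by a single transient fault.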

REAL-TIME APPROACH of AFT MODEL

• Some failures cannot be predicted before they actually occur

• Based on real-time decision-making and reconfiguration according to current failures

• First identifies failures, then tolerates them through adaptation strategies
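For the real-time path, a common way to identify a failure that was not predicted is a heartbeat timeout, after which the group is reconfigured without the suspected replica. The sketch below assumes that technique; `detect_failed`, `reconfigure` and the timeout value are hypothetical, not taken from the AFT paper.

```python
def detect_failed(last_heartbeat: dict[str, float], now: float, timeout: float) -> set[str]:
    """Identify replicas whose last heartbeat is older than `timeout` seconds."""
    return {node for node, seen in last_heartbeat.items() if now - seen > timeout}


def reconfigure(members: list[str], failed: set[str]) -> list[str]:
    """Tolerate the failure by dropping suspected replicas from the active group."""
    return [m for m in members if m not in failed]


heartbeats = {"r1": 10.0, "r2": 4.0, "r3": 9.5}  # last time each replica was heard from
failed = detect_failed(heartbeats, now=10.0, timeout=3.0)
print(reconfigure(["r1", "r2", "r3"], failed))  # ['r1', 'r3']
```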

AFT STRATEGIES

• Replication is a complex function

• It involves the replication degree, replica placement, replication protocol, and communication between replicas

• A single replication strategy is not enough to achieve the required reliability

ADJUSTING the DEGREE of REPLICATION

• The AFT model can achieve an optimal degree of replication

• AFT policy may increase the degree of replication if a failure becomes more probable

• AFT policy may decrease the degree of replication if a member leaves the system or to reduce communication costs
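One way to make "optimal degree" concrete: if replica failures are assumed independent, each with probability p, and the service survives while at least one replica is up, then n replicas give reliability 1 - p^n. The sketch picks the smallest n that meets a user-specified target, so a higher p (a failure becoming more probable) raises the degree. The independence assumption and `required_replicas` are illustrative, not the paper's model.

```python
def required_replicas(p_fail: float, target: float, max_replicas: int = 10) -> int:
    """Smallest n with 1 - p_fail**n >= target, assuming independent replica failures."""
    for n in range(1, max_replicas + 1):
        if 1 - p_fail ** n >= target:
            return n
    return max_replicas  # target unreachable within the cap: use as many as allowed


print(required_replicas(0.1, 0.995))  # 3
print(required_replicas(0.3, 0.995))  # 5: failure more probable, so more replicas
```

The `max_replicas` cap reflects the opposite pressure on the slide: every extra replica adds communication cost.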

MIGRATION of CURRENT REPLICAS

• Reliability depends not just on the number of replicas, but also on their placement

• Prime concern: which nodes should host replicas

• Criteria considered: workload, storage capacity, bandwidth, and reliability of each server
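The placement criteria above can be combined into a single score per candidate node. This is a minimal sketch, assuming normalized metrics and a weighted sum; the `Node` fields, the weights and `choose_hosts` are hypothetical, not the paper's placement policy.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    workload: float      # current load, 0.0 (idle) .. 1.0 (saturated)
    free_storage: float  # fraction of storage still free, 0.0 .. 1.0
    bandwidth: float     # normalized available bandwidth, 0.0 .. 1.0
    reliability: float   # estimated probability the node stays up, 0.0 .. 1.0


# Hypothetical weights; a real policy would tune these to user requirements
WEIGHTS = {"workload": 0.25, "free_storage": 0.15, "bandwidth": 0.2, "reliability": 0.4}


def score(node: Node) -> float:
    # Lower workload is better, so it contributes (1 - workload)
    return (WEIGHTS["workload"] * (1 - node.workload)
            + WEIGHTS["free_storage"] * node.free_storage
            + WEIGHTS["bandwidth"] * node.bandwidth
            + WEIGHTS["reliability"] * node.reliability)


def choose_hosts(nodes: list[Node], k: int) -> list[str]:
    """Return the names of the k best-scoring nodes to host replicas."""
    return [n.name for n in sorted(nodes, key=score, reverse=True)[:k]]


nodes = [
    Node("a", workload=0.9, free_storage=0.5, bandwidth=0.5, reliability=0.99),
    Node("b", workload=0.2, free_storage=0.8, bandwidth=0.9, reliability=0.95),
    Node("c", workload=0.1, free_storage=0.9, bandwidth=0.8, reliability=0.60),
]
print(choose_hosts(nodes, 2))  # ['b', 'c']
```

Node "a" is the most reliable server but loses because it is nearly saturated, which is why reliability alone is not a sufficient placement criterion.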

SHIFTING into a SUITABLE REPLICATION PROTOCOL ADAPTIVELY

PRIMARY COPY REPLICATION

• Every data update is sent to the primary copy first

• Updates are propagated to back-up nodes asynchronously

• Efficient in terms of communication when many write operations occur

• Suffers from the single-point-of-failure problem
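A toy, single-process model of primary-copy replication. `propagate()` stands in for the asynchronous background push to the back-up nodes; in a real system it would run on another thread or node. The class and method names are hypothetical.

```python
from collections import deque


class PrimaryCopy:
    """Toy primary-copy replication: writes go to the primary, backups catch up later."""

    def __init__(self, n_backups: int) -> None:
        self.primary: dict[str, str] = {}
        self.backups: list[dict[str, str]] = [{} for _ in range(n_backups)]
        self.pending: deque = deque()  # (key, value) updates not yet propagated

    def write(self, key: str, value: str) -> None:
        # Every update is applied at the primary first and acknowledged immediately
        self.primary[key] = value
        self.pending.append((key, value))

    def propagate(self) -> None:
        # Stands in for the asynchronous push of queued updates to the back-up nodes
        while self.pending:
            key, value = self.pending.popleft()
            for backup in self.backups:
                backup[key] = value


pc = PrimaryCopy(n_backups=2)
pc.write("x", "1")
print(pc.backups)  # [{}, {}]
pc.propagate()
print(pc.backups)  # [{'x': '1'}, {'x': '1'}]
```

Anything still in `pending` when the primary crashes is lost, which is the single-point-of-failure problem in miniature.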

READ-ONE WRITE-ALL REPLICATION

• Updates can be initiated at any replica in the system and are written to all replicas

• Important when information has to be replicated immediately

• Efficient when dealing with failures

• Slow when a significant number of write operations is needed
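A toy model of the textbook read-one write-all rule, assuming a strict variant in which a write is accepted only when every replica is reachable; the class and method names are hypothetical.

```python
from typing import Optional


class ReadOneWriteAll:
    """Toy ROWA: a read is served by any single replica, a write must reach all of them."""

    def __init__(self, n_replicas: int) -> None:
        self.replicas: list[dict[str, str]] = [{} for _ in range(n_replicas)]
        self.up: list[bool] = [True] * n_replicas

    def write(self, key: str, value: str) -> bool:
        # Write-all: refuse the update unless every replica is reachable
        if not all(self.up):
            return False
        for replica in self.replicas:
            replica[key] = value
        return True

    def read(self, key: str) -> Optional[str]:
        # Read-one: any live replica already holds the latest committed value
        for replica, alive in zip(self.replicas, self.up):
            if alive:
                return replica.get(key)
        return None


r = ReadOneWriteAll(3)
r.write("x", "1")
r.up[0] = False                     # one replica becomes unreachable
print(r.read("x"), r.write("x", "2"))  # 1 False
```

Reads keep working while any replica is alive, but each write touches every replica, which is why the protocol slows down under write-heavy load.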

MAJORITY REPLICATION

• It is an intermediate solution between Primary Copy and Read-One Write-All (ROWA) replication

• May be done in a pair-wise manner

• Protocol selection is based on the trade-off between reliability and communication cost
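Majority replication is commonly realised with quorums: every read and write contacts more than half of the replicas, so any two quorums share at least one replica. The sketch below assumes versioned values on a single data item and one sequential client; it is a generic quorum example rather than necessarily the paper's variant, it does not model the pair-wise scheme, and all names are hypothetical.

```python
import random
from typing import Optional


class MajorityReplication:
    """Toy majority (quorum) replication of one data item with version numbers."""

    def __init__(self, n_replicas: int, seed: int = 0) -> None:
        self.store: list[tuple[int, Optional[str]]] = [(0, None)] * n_replicas
        self.quorum = n_replicas // 2 + 1
        self.rng = random.Random(seed)

    def _pick_quorum(self) -> list[int]:
        # Any majority of replica indices; two majorities always overlap
        return self.rng.sample(range(len(self.store)), self.quorum)

    def read(self) -> Optional[str]:
        # The highest version seen in any majority is the latest committed write
        _, value = max((self.store[i] for i in self._pick_quorum()), key=lambda vv: vv[0])
        return value

    def write(self, value: str) -> None:
        # Learn the current version from one majority, then install version + 1 on another
        latest = max(self.store[i][0] for i in self._pick_quorum())
        for i in self._pick_quorum():
            self.store[i] = (latest + 1, value)


m = MajorityReplication(5)
for v in ["a", "b", "c"]:
    m.write(v)
print(m.read())  # c
```

Each operation contacts 3 of 5 replicas instead of 1 (primary copy) or all 5 (ROWA writes), which is the intermediate cost the slide refers to.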

SHIFTING into a SUITABLE REPLICATION PROTOCOL ADAPTIVELY

RELAXED vs STRICT

• The choice of message synchronization depends on the network traffic generated by replication and on communication overheads

• Relaxed:

– A set of updates is sent in a single message within a time period

– Less traffic

– Consistency is guaranteed only at certain points

– Loss of work is higher during a failure

– Not always consistent, but efficient

• Strict:

– Each update is sent in its own message

– More traffic

– Consistent at every point

– Consistent but expensive
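The traffic and loss trade-off between the two modes can be counted directly. This sketch assumes relaxed mode flushes one batched message at the end of each fixed period, while strict mode sends every update immediately; the function names and timings are hypothetical.

```python
def strict_messages(update_times: list[float]) -> int:
    # Strict: every update travels in its own message
    return len(update_times)


def relaxed_messages(update_times: list[float], period: float) -> int:
    # Relaxed: updates falling in the same time window share one message
    return len({int(t // period) for t in update_times})


def work_lost_on_crash(update_times: list[float], period: float, crash_time: float) -> int:
    """Relaxed mode loses the updates of the current, not-yet-flushed window."""
    window_start = (crash_time // period) * period
    return sum(1 for t in update_times if window_start <= t <= crash_time)


ts = [0.1, 0.4, 0.9, 1.2, 1.3, 2.7]  # update timestamps in seconds
print(strict_messages(ts), relaxed_messages(ts, 1.0), work_lost_on_crash(ts, 1.0, 1.5))  # 6 3 2
```

Relaxed mode halves the message count here, but a crash at t = 1.5 s loses two unsent updates that strict mode would already have replicated.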

DESIGN of AFT MODEL ON JUICE OBJECT

• Juice Model: Model for each replica

– Based on adaptable object model

– Reconfigures its internal object at run time

– Consists of five internal elements

DESIGN of AFT MODEL ON JUICE OBJECT

• AFT provides adaptation facilities built on the Juice Object model

• Components: Adaptation Handler (AH), Replication Handler (RH), Underlying System Information Evaluator (USIE), and Client Member Information Evaluator (CMIE)

AFT FRAMEWORK: Collection of Information

• USIE runs on each replica to collect local resource information: resource usage patterns and information about underlying system failures

• Each machine holds a monitor object

Collection of Information

• CMIE handles both the current replica's information and the most recently connected client's information (message failure rate, response time, network latency)

• This information is gathered from the communicator of the Juice Model
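A minimal sketch of the kind of per-client record a CMIE-like component might keep. The class, its fields and units are assumptions, not the paper's data structure; network latency is omitted for brevity.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ClientMetrics:
    """Hypothetical per-client statistics for a CMIE-like evaluator."""

    sent: int = 0
    failed: int = 0
    response_times: list[float] = field(default_factory=list)  # seconds, delivered messages only

    def record(self, delivered: bool, response_time: float) -> None:
        # Called by the communicator after each message exchange
        self.sent += 1
        if delivered:
            self.response_times.append(response_time)
        else:
            self.failed += 1

    @property
    def message_failure_rate(self) -> float:
        return self.failed / self.sent if self.sent else 0.0

    @property
    def mean_response_time(self) -> float:
        return mean(self.response_times) if self.response_times else 0.0
```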

Collection of Information

Information Analysis

• The Adaptation Handler (AH) analyses suspected or known system faults and failures using the available information

• Predicts future faults and estimates the current reliability of the system

• Carries out a cost-benefit analysis considering user requirements

• If needed, the AH selects the best strategy: number of replicas, placement, and replication protocol
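The final selection step can be read as a constrained choice: among candidate strategies, take the cheapest one whose estimated reliability meets the user's requirement. The sketch assumes the AH has already produced the estimates; `Strategy`, `select_strategy` and the example numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    replicas: int
    protocol: str       # e.g. "primary-copy", "rowa", "majority"
    reliability: float  # estimated by the AH from USIE/CMIE data (hypothetical here)
    cost: float         # estimated communication/resource cost (hypothetical units)


def select_strategy(candidates: list[Strategy], required: float) -> Optional[Strategy]:
    """Cheapest candidate whose estimated reliability meets the user requirement."""
    feasible = [s for s in candidates if s.reliability >= required]
    return min(feasible, key=lambda s: s.cost) if feasible else None


candidates = [
    Strategy(3, "primary-copy", 0.97, 3.0),
    Strategy(3, "majority", 0.995, 5.0),
    Strategy(5, "rowa", 0.999, 9.0),
]
print(select_strategy(candidates, required=0.99).protocol)  # majority
```

Returning `None` when nothing is feasible leaves the decision (for example, keeping the current configuration) to the caller.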

Information Analysis

• Selecting a suitable protocol requires the agreement of all AHs in the replica group

• One randomly chosen member collects the votes of the replicas

• Replicas switch to the new protocol simultaneously according to the decision
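A sketch of the vote collection, assuming (as the slide states) that a switch needs every AH to agree and that the coordinator is drawn at random from the group. The function name and return shape are hypothetical.

```python
import random
from typing import Optional


def collect_votes(votes: dict[str, str], rng: random.Random) -> tuple[str, Optional[str]]:
    """A randomly chosen member gathers every AH's proposed protocol.

    Returns (coordinator, decision). The decision is None unless all votes
    agree, in which case every replica switches to it at the same time.
    """
    coordinator = rng.choice(sorted(votes))
    proposals = set(votes.values())
    decision = proposals.pop() if len(proposals) == 1 else None
    return coordinator, decision


coordinator, decision = collect_votes(
    {"r1": "majority", "r2": "majority", "r3": "majority"}, random.Random(1)
)
print(decision)  # majority
```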

Execution of New Strategy

• The AH notifies the Replication Handler (RH) to replace itself with the new object

• Since the model is based on two configuration levels, switching between strategies does not lead to inconsistencies
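Replacing the replication object at run time resembles the strategy pattern. The sketch assumes a lock around each operation so an update never observes a half-finished swap; it does not model the paper's two configuration levels, and all names are hypothetical.

```python
import threading
from typing import Protocol


class ReplicationProtocol(Protocol):
    def replicate(self, update: str) -> str: ...


class Strict:
    def replicate(self, update: str) -> str:
        return f"send {update} now"  # one message per update


class Relaxed:
    def replicate(self, update: str) -> str:
        return f"batch {update}"  # queued for the next periodic message


class ReplicationHandler:
    """Hypothetical RH that delegates to a replaceable protocol object."""

    def __init__(self, protocol: ReplicationProtocol) -> None:
        self._protocol = protocol
        self._lock = threading.Lock()

    def replicate(self, update: str) -> str:
        with self._lock:  # updates never run while the protocol is being replaced
            return self._protocol.replicate(update)

    def replace(self, new_protocol: ReplicationProtocol) -> None:
        with self._lock:
            self._protocol = new_protocol


rh = ReplicationHandler(Strict())
print(rh.replicate("u1"))  # send u1 now
rh.replace(Relaxed())
print(rh.replicate("u2"))  # batch u2
```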

CONCLUSION

• Describes the design of the AFT model, which allows the user to specify reliability and performance requirements

• AFT employs a combination of proactive and real-time fault-tolerant approaches in open distributed systems

• The proactive approach exploits the knowledge from USIE and CMIE to warn against probable faults, reduce failures, and significantly increase performance

CONCLUSION

• The real-time approach deals with current faults

• A single replication protocol cannot cope with environmental fluctuations

• AFT uses three main strategies to fulfil the needs of the system

• AFT allows the system to reconfigure and execute under different situations, and is therefore tightly integrated with environmental changes

REFERENCE

• Lanka R., Oda K., Yoshida T.: ACTIVE FAULT TOLERANT SYSTEM for OPEN DISTRIBUTED COMPUTING. Autonomic and Trusted Computing, (2006)

QUESTIONS?

THANK YOU FOR LISTENING