1
Fault-Tolerant Computing Systems #4: Reliability and Availability
Pattara Leelaprute, Computer Engineering Department, Kasetsart [email protected]
2
Reliability and Availability
Reliability: the probability that a system survives until time t
(it has not failed before t).
Availability: the probability that a system is working properly at
time t.
3
Preliminaries of Probability
Discrete sample space: tossing a coin; the sample space is {head, tail}.
Continuous sample space: how long the PC stays up after a reboot; the sample space is {t | t > 0}.
Random variable: a function mapping each element of the sample space to
a real number. Ex. heads = 1, tails = 0
4
Preliminaries
Random variable: a function mapping each element of the sample space to a real number.
CDF (cumulative distribution function): $F_X(t) = \Pr[X \le t]$
When X is a time to failure, $F_X(t)$ is the probability that the system has gone down by time t.
pdf (probability density function): $f(t) = dF(t)/dt$
Expected value (mean): $E[X] = \int_0^\infty t\, f(t)\, dt$ (for $X \ge 0$)
The expected value, or mean, of a random variable is the average outcome of the random experiment.
5
Exponential Distribution
The most commonly used distribution function in reliability modeling.
CDF: $F(t) = 1 - e^{-\lambda t}$
pdf: $f(t) = \lambda e^{-\lambda t}$
Mean: $E[X] = 1/\lambda$
Memoryless property: let $Y = X - t$ be the remaining life at time t. Then $G_t(y) = \Pr[Y \le y \mid X > t] = 1 - e^{-\lambda y}$
The distribution of the remaining life of a component does not depend on how long it has been working.
The component does not AGE! (The remaining life of X does not depend on the time that has passed.)
[Figure: plots of $F(t) = 1 - e^{-2t}$ and $f(t) = 2e^{-2t}$, i.e. $\lambda = 2$]
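The memoryless property follows directly from the exponential CDF. As a short worked derivation (a sketch, writing $\lambda$ for the rate parameter of the distribution and using only the definition of conditional probability):

```latex
\begin{aligned}
G_t(y) &= \Pr[Y \le y \mid X > t]
        = \frac{\Pr[t < X \le t + y]}{\Pr[X > t]} \\
       &= \frac{F(t+y) - F(t)}{1 - F(t)}
        = \frac{e^{-\lambda t} - e^{-\lambda (t+y)}}{e^{-\lambda t}}
        = 1 - e^{-\lambda y}
\end{aligned}
```

The result does not depend on t, which is exactly the statement that the component does not age.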
6
Reliability
Reliability: the probability that a system survives until time t.
$R(t) = \Pr[X > t] = 1 - F(t)$
X: the random variable that represents the time to failure of the system (the life of the system).
R(t): the probability that the system survives until time t.
[Figure: time axis marking 0, t, and X; curves $F(t) = 1 - e^{-2t}$ (exponential distribution) and $R(t) = e^{-2t}$]
7
Reliability
$R(t) = \Pr[X > t] = 1 - F(t)$
$R(0) = 1$: the system is initially working.
$R(\infty) = 0$: no system has an infinite lifetime.
[Figure: time axis marking 0, t, and X; curves $F(t) = 1 - e^{-2t}$ (exponential distribution) and $R(t) = e^{-2t}$ (reliability)]
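The two boundary conditions can be checked numerically for the exponential case. The following is a minimal sketch, assuming an exponential lifetime; the function name `reliability` and the parameter name `lam` are illustrative choices, and `lam = 2.0` simply mirrors the curves in the figure.

```python
import math


def reliability(t: float, lam: float) -> float:
    """R(t) = Pr[X > t] = 1 - F(t) for an exponential lifetime with rate lam."""
    if t < 0:
        raise ValueError("t must be non-negative")
    return math.exp(-lam * t)


lam = 2.0  # rate used in the slide's plots; any positive value works

# R(0) = 1 (the system starts out working) and R(t) -> 0 as t grows.
for t in (0.0, 0.5, 1.0, 5.0, 50.0):
    print(f"R({t}) = {reliability(t, lam):.6e}")
```

The printed values start at 1 and decrease toward 0, matching $R(0) = 1$ and $R(\infty) = 0$.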
8
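Failure Rate
$f(t)\,\Delta t$: the probability that a fault will occur in the interval $[t, t+\Delta t]$.
$f(t)\,\Delta t / R(t)$: the probability that a fault occurs in $[t, t+\Delta t]$, given that the system is working properly at time t.
Failure rate: $f(t) / R(t)$. For the exponential example in the figure, $f(t)/R(t) = 2e^{-2t} / e^{-2t} = 2$, a constant.
[Figure: curves $F(t) = 1 - e^{-2t}$ (exponential distribution), $R(t) = e^{-2t}$ (reliability), and $f(t) = 2e^{-2t}$ (probability density of a fault); the area $f(t)\,\Delta t$ marks the probability that a fault will occur in the interval $[t, t+\Delta t]$]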
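To make the link between the failure rate and the conditional fault probability concrete, here is a small sketch for the exponential case. All function names are illustrative, and `lam = 2.0` is taken from the plotted example; the code compares the exact conditional probability of a fault in a short interval with the approximation "failure rate times interval length".

```python
import math


def pdf(t: float, lam: float) -> float:
    """Exponential density f(t) = lam * exp(-lam * t)."""
    return lam * math.exp(-lam * t)


def reliability(t: float, lam: float) -> float:
    """R(t) = exp(-lam * t)."""
    return math.exp(-lam * t)


def failure_rate(t: float, lam: float) -> float:
    """Failure rate f(t) / R(t)."""
    return pdf(t, lam) / reliability(t, lam)


def conditional_fault_prob(t: float, dt: float, lam: float) -> float:
    """Exact Pr[t < X <= t + dt | X > t] = (R(t) - R(t + dt)) / R(t)."""
    return (reliability(t, lam) - reliability(t + dt, lam)) / reliability(t, lam)


lam = 2.0
dt = 1e-4
for t in (0.0, 1.0, 3.0):
    rate = failure_rate(t, lam)
    exact = conditional_fault_prob(t, dt, lam)
    print(f"t={t}: rate={rate:.6f}, exact={exact:.6e}, rate*dt={rate * dt:.6e}")
```

For every t the rate stays at `lam`, and the exact conditional probability is close to `rate * dt` when `dt` is small.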
9
Bathtub Curve
Failure rate: $f(t) / R(t)$
Bathtub curve: the general failure rate observed in empirical data collected from mechanical and electronic components.
When the lifetime distribution F(t) of a system is exponential, the system has a constant failure rate (see previous slide).
[Figure: bathtub curve, failure rate versus time, in three stages]
1. Initial stage: inherent defects, faulty design
2. Constant failure rate
3. Last stage: faults caused by age
10
MTTF (Mean Time To Failure)
$\mathrm{MTTF} = E[X] = \int_0^\infty t\, f(t)\, dt = \int_0^\infty R(t)\, dt$
E[X]: the expected value of the random variable X, which represents the time until a fault occurs in the system.
When $R(t) = e^{-\lambda t}$ (X is exponentially distributed): failure rate $= \lambda$, MTTF $= 1/\lambda$
[Figure: time axis from 0 with the expected value marked]
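The identity $\mathrm{MTTF} = \int_0^\infty R(t)\, dt$ can be sanity-checked numerically. The sketch below assumes an exponential lifetime with a hypothetical rate `lam = 2.0` and approximates the integral with a hand-written trapezoidal rule, truncating the infinite upper limit at a point where R(t) is negligible; `mttf_numeric` is an illustrative helper, not a library function.

```python
import math


def mttf_numeric(reliability, upper: float, steps: int = 100_000) -> float:
    """Trapezoidal approximation of the integral of R(t) over [0, upper]."""
    h = upper / steps
    total = 0.5 * (reliability(0.0) + reliability(upper))
    for k in range(1, steps):
        total += reliability(k * h)
    return total * h


lam = 2.0  # hypothetical failure rate

# R(20) = exp(-40) is negligible, so truncating the integral at 20 is harmless here.
estimate = mttf_numeric(lambda t: math.exp(-lam * t), upper=20.0)
print(f"numeric MTTF = {estimate:.8f}, 1/lam = {1.0 / lam:.8f}")
```

The numeric estimate agrees with $1/\lambda$ to within the discretization error of the trapezoidal rule.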
11
Availability
The probability that a system is working properly at time t.
Availability is a measure that is frequently used for describing the behavior of the system.
*If the system has no repair or replacement, availability is equal to reliability R(t).
R(t): the probability that no failures have occurred during the whole period (0, t).
[Figure: timeline alternating between operational periods $X_i$, $X_{i+1}$, $X_{i+2}$ and under-repair periods $U_i$, $U_{i+1}$; each operational period ends when the system fails and each repair period ends when it is repaired]
12
Availability
Instantaneous availability: $A(t) = \Pr[\text{the component is functioning correctly at time } t]$
Steady-state availability (the general meaning of "availability"): $A = \lim_{t \to \infty} A(t)$
13
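Availability
When $X_i$ and $U_i$ are exponentially distributed:
$F_{X_i}(t) = 1 - e^{-\lambda t}$, $F_{U_i}(t) = 1 - e^{-\mu t}$
Instantaneous availability: $A(t) = \left(\mu + \lambda e^{-(\lambda+\mu)t}\right) / (\lambda + \mu)$
Steady-state availability: $A = \lim_{t \to \infty} A(t) = \mu / (\lambda + \mu)$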
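A short sketch can show how the instantaneous availability settles to its steady-state value. It assumes exponentially distributed up times and repair times and a system that is working at t = 0; `lam` (rate of the up times) and `mu` (rate of the repair times) are names chosen here, and the numeric values are hypothetical.

```python
import math


def availability(t: float, lam: float, mu: float) -> float:
    """A(t) = (mu + lam * exp(-(lam + mu) * t)) / (lam + mu), with A(0) = 1."""
    s = lam + mu
    return (mu + lam * math.exp(-s * t)) / s


lam = 0.001  # hypothetical: one failure per 1000 hours on average
mu = 0.1     # hypothetical: one repair per 10 hours on average

steady = mu / (lam + mu)  # limit of A(t) as t grows
for t in (0.0, 10.0, 50.0, 200.0):
    print(f"A({t}) = {availability(t, lam, mu):.6f}")
print(f"steady state = {steady:.6f}")
```

A(t) starts at 1 and decays exponentially, at rate `lam + mu`, toward `mu / (lam + mu)`.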
14
MTTR (Mean Time To Repair)
MTTR = $E[U_i]$
$U_i$: the random variable that represents the downtime for the i-th repair or replacement. $E[U_i]$: the expected value of $U_i$.
MTTF (mean time to failure): MTTF = $E[X_i]$
$X_i$: the random variable that represents the duration of the i-th functioning period. $E[X_i]$: the expected value of $X_i$.
Steady-state availability: $A = \mathrm{MTTF} / (\mathrm{MTTF} + \mathrm{MTTR}) = \mu / (\lambda + \mu)$ (the second form holds when $X_i$ and $U_i$ are exponentially distributed with parameters $\lambda$ and $\mu$ respectively, so that MTTF $= 1/\lambda$ and MTTR $= 1/\mu$)
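As a worked example of the steady-state formula, the sketch below computes availability from a hypothetical MTTF and MTTR (both in hours), confirms that the rate form gives the same number, and converts the result into expected downtime per year. The figures and names are illustrative assumptions, not data from the lecture.

```python
HOURS_PER_YEAR = 8760.0  # 365 days * 24 hours


def steady_state_availability(mttf: float, mttr: float) -> float:
    """A = MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


mttf_hours = 1000.0  # hypothetical mean time to failure
mttr_hours = 10.0    # hypothetical mean time to repair

a = steady_state_availability(mttf_hours, mttr_hours)

# Same quantity expressed with rates: lam = 1/MTTF, mu = 1/MTTR.
lam, mu = 1.0 / mttf_hours, 1.0 / mttr_hours
a_from_rates = mu / (lam + mu)

downtime_hours_per_year = (1.0 - a) * HOURS_PER_YEAR
print(f"A = {a:.6f} (rate form: {a_from_rates:.6f})")
print(f"expected downtime = {downtime_hours_per_year:.2f} hours/year")
```

Because only the ratio of MTTR to MTTF matters, the same availability results from failing rarely with slow repairs or failing often with fast repairs.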