TRANSCRIPT
A Data-Driven Model for Software Reliability Prediction
Author: Jung-Hua Lo
IEEE International Conference on Granular Computing (2012)
Young Taek Kim
KAIST SE Lab.
9/4/2013
Introduction
Background
Overall Approach
Detailed Process
Experimental Results
Conclusion
Discussion
Contents
2 / 31
Definition of SW Reliability
Probability of failure-free operation of a software product in a specified environment for a specified time.
SRM (Software Reliability Model)
To estimate how reliable the software is now.
To predict the reliability in the future.
Two categories of SRMs
Analytical Models: NHPP SRMs
Data-Driven Models: ARIMA, SVM
SW Reliability Prediction
3 / 31
Introduction Detailed Process Background Experimental Results Conclusion Discussion Overall Approach
Data Driven Model
4 / 31
Limitations of Analytical Models
• Software behavior changes during the testing phase:
the assumption that "all faults are independent & equally detectable"
is violated by real data sets.
Data Driven Models
• Far fewer impractical assumptions:
developed directly from collected failure data.
• Easy to make abstractions and generalizations of the SW failure
process:
via regression or time-series analysis.
5 / 31
Motivation
Problems
Actual SW failure data sets are rarely purely linear or purely nonlinear
No single general model is suitable for all situations
Proposed Solution
Hybrid strategy combining linear and nonlinear prediction models
• ARIMA model: good performance in predicting linear data
• SVM model: successful application to nonlinear data
Statistical properties (mean, variance, covariance, etc.) are all constant over time.
6 / 31
Stationarity
(1) E(y_t) = μ for all t
(2) Var(y_t) = E[(y_t − μ)²] = σ² for all t
(3) Cov(y_t, y_{t+k}) = γ_k for all t
[Figure: a non-stationary series, with (μ₁, σ₁², γ₁) ≠ (μ₂, σ₂², γ₂) across two time windows, becomes stationary after differencing: (μ₁, σ₁², γ₁) = (μ₂, σ₂², γ₂)]
7 / 31
ACF (Autocorrelation Function)
The correlation between observations at different distances apart (lag)
r_k = Σ_{t=k+1}^{n} (y_t − ȳ)(y_{t−k} − ȳ) / Σ_{t=1}^{n} (y_t − ȳ)²
where ȳ = (1/n) Σ_{t=1}^{n} y_t
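The sample ACF above can be sketched in a few lines of plain Python. This is a minimal illustration of the formula, not code from the paper; `acf` is an illustrative helper name.

```python
def acf(y, k):
    """Sample autocorrelation at lag k:
    r_k = sum_{t=k+1..n} (y_t - ybar)(y_{t-k} - ybar) / sum_{t=1..n} (y_t - ybar)^2
    (slide uses 1-based t; Python indexing below is 0-based)."""
    n = len(y)
    ybar = sum(y) / n
    num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, n))
    den = sum((y[t] - ybar) ** 2 for t in range(n))
    return num / den
```

For example, for y = (1, 2, 3, 4, 5) the mean is 3, the denominator is 10, and r₁ = 4/10 = 0.4.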
8 / 31
PACF (Partial ACF)
The degree of association between y_t and y_{t−k} when the effects of the intervening time lags 1, 2, 3, …, k−1 are removed.
where
r_kk = r₁, if k = 1
r_kk = (r_k − Σ_{j=1}^{k−1} r_{k−1,j} · r_{k−j}) / (1 − Σ_{j=1}^{k−1} r_{k−1,j} · r_j), if k = 2, 3, …
and r_kj = r_{k−1,j} − r_kk · r_{k−1,k−j} for j = 1, 2, …, k−1.
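The recursion above (the Durbin–Levinson scheme) can be sketched as follows; `pacf_from_acf` is an illustrative name, and `r` is the list of autocorrelations with r[0] = 1.

```python
def pacf_from_acf(r, kmax):
    """Partial autocorrelations r_kk for k = 1..kmax from the
    autocorrelations r[0..kmax], via the recursion on the slide."""
    phi = {(1, 1): r[1]}                      # r_11 = r_1
    for k in range(2, kmax + 1):
        num = r[k] - sum(phi[(k - 1, j)] * r[k - j] for j in range(1, k))
        den = 1 - sum(phi[(k - 1, j)] * r[j] for j in range(1, k))
        phi[(k, k)] = num / den
        for j in range(1, k):                 # update r_kj for the next step
            phi[(k, j)] = phi[(k - 1, j)] - phi[(k, k)] * phi[(k - 1, k - j)]
    return [phi[(k, k)] for k in range(1, kmax + 1)]
```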
Removing Non-stationarity
Differencing
Differenced series: y′_t = y_t − y_{t−1}
[Figure: PACF before and after differencing]
9 / 31
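First-order differencing is a one-liner; a minimal sketch (illustrative helper name):

```python
def difference(y):
    """First-order differencing: y'_t = y_t - y_{t-1}."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

# Differencing removes a linear trend: [2, 4, 6, 8] -> [2, 2, 2] (constant mean)
```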
10 / 31
3 Prediction Models for Stationary Data
AR (Auto Regressive)
Model
MA (Moving Average)
Model
ARMA (Auto Regressive & Moving Average)
Model
• Use past values in forecast
• AR(p) 𝑦𝑡 = α1𝑦𝑡−1 + α2𝑦𝑡−2 + ⋯+α𝑝𝑦𝑡−𝑝 + 𝜀𝑡
• Use past residuals (random events) in
forecast
• MA(q) 𝑦𝑡 = 𝜀𝑡 + 𝛽1𝜀𝑡−1 + ⋯+ 𝛽𝑞𝜀𝑡−𝑞
• Combination of AR & MA
• ARMA(p, q)
𝑦𝑡 = α1𝑦𝑡−1 + α2𝑦𝑡−2 + ⋯+α𝑝𝑦𝑡−𝑝 + 𝜀𝑡
+𝛽1𝜀𝑡−1 + ⋯+ 𝛽𝑞𝜀𝑡−𝑞
Background Detailed Process Introduction Experimental Results Conclusion Discussion Overall Approach
11 / 31
AR (Auto Regressive) Model (1/2)
AR(p)
𝑦𝑡 = α1𝑦𝑡−1 + α2𝑦𝑡−2 + ⋯+α𝑝𝑦𝑡−𝑝 + 𝜀𝑡
α𝑖: 𝐴𝑢𝑡𝑜𝑐𝑜𝑟𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛 𝑐𝑜𝑒𝑓𝑓𝑖𝑐𝑖𝑒𝑛𝑡
𝜀𝑡: 𝑒𝑟𝑟𝑜𝑟 𝑎𝑡 𝑡
Selection of a model
ACF decreasing exponentially
• Directly: 0 < α < 1
• Oscillating pattern: −1 < α < 0
PACF identifying the order
of AR model
[Figure: Autocorrelation Function for AR(1) data series (with 5% significance limits): exponentially decreasing (oscillating)]
[Figure: Partial Autocorrelation Function for AR(1) data series (with 5% significance limits): PACF cuts off at lag 1 → AR(1)]
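An AR(1) fit and one-step forecast can be sketched with an ordinary least-squares estimate of α (no intercept, for simplicity). These helper names are illustrative, not from the paper:

```python
def fit_ar1(y):
    """Least-squares estimate of alpha in y_t = alpha * y_{t-1} + eps_t."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den

def forecast_ar1(y, alpha):
    """One-step-ahead forecast: y_hat_{t+1} = alpha * y_t."""
    return alpha * y[-1]
```

For the noise-free series (1, 2, 4, 8) the estimate is α = 2 and the forecast is 16.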
12 / 31
MA (Moving Average) Model (1/2)
MA(q)
𝑦𝑡 = 𝜀𝑡 + 𝛽1𝜀𝑡−1 + ⋯+ 𝛽𝑞𝜀𝑡−𝑞
𝛽𝑖:𝑀𝐴 𝑝𝑎𝑟𝑎𝑚𝑒𝑡𝑒𝑟
𝜀𝑡: 𝑒𝑟𝑟𝑜𝑟 𝑎𝑡 𝑡
Example
Year  Sales (B$)  MA(3)
2000  1000        –
2001  1500        –
2002  1250        –
2003   900        1250
2004  1600        1217
2005   950        1250
2006  1650        1150
2007  1750        1400
2008  1200        1450
2009  2000        1533
2010  2100        1650
2011   –          1767
e.g., MA(3) for 2003 = (1000 + 1500 + 1250) / 3 = 1250
[Figure: Sales (B$) and MA(3) plotted over 2000–2011]
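The table's MA(3) column is a simple three-period moving average of past sales (note this sliding-average illustration is distinct from the MA(q) error model defined above). A minimal sketch with an illustrative helper name:

```python
def ma3_forecast(sales):
    """Forecast the next value as the mean of the last three observations,
    as in the MA(3) column of the table."""
    return sum(sales[-3:]) / 3
```

E.g., the 2003 entry is (1000 + 1500 + 1250) / 3 = 1250, and the 2004 entry rounds to 1217.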
Selection of a model
ACF identifying the order of the MA model
PACF decreasing exponentially
• Directly: 0 < a < 1
• Oscillating pattern: −1 < a < 0
13 / 31
MA (Moving Average) Model (2/2)
[Figure: Autocorrelation Function for MA(1) data series (with 5% significance limits): ACF cuts off at lag 1 → MA(1)]
[Figure: Partial Autocorrelation Function for MA(1) data series (with 5% significance limits): exponentially decreasing (oscillating)]
14 / 31
ARMA Model
ARMA(p,q) = AR(p) + MA(q)
𝑦𝑡 = α1𝑦𝑡−1 + α2𝑦𝑡−2 + ⋯ + α𝑝𝑦𝑡−𝑝 + 𝜀𝑡 + 𝛽1𝜀𝑡−1 + ⋯ + 𝛽𝑞𝜀𝑡−𝑞
Procedures for model identification
• Guideline to determine p, q for ARMA
ARIMA Model
Auto Regressive Integrated Moving Average
(by Box and Jenkins, 1970)
Linear model for forecasting time-series data: future values are a linear function of several past observations.
ARIMA(p, d, q)
• Auto-Regression of order p
• Integrated: differencing of order d (extends the model to non-stationary time series)
• Moving Average of order q
15 / 31
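The three pieces of ARIMA(p, d, q) can be sketched end-to-end for the simplest non-trivial case, ARIMA(1,1,0): difference once (d = 1), fit AR(1) on the differences by least squares, then integrate the forecast back. This is an illustrative minimal sketch, not the paper's estimation procedure:

```python
def arima_110_forecast(y):
    """Minimal ARIMA(1,1,0) sketch: difference once, fit AR(1) on the
    differences (least squares, no intercept), then undo the differencing."""
    d = [y[t] - y[t - 1] for t in range(1, len(y))]        # d = 1
    num = sum(d[t] * d[t - 1] for t in range(1, len(d)))   # AR(1) estimate
    den = sum(d[t - 1] ** 2 for t in range(1, len(d)))
    alpha = num / den
    return y[-1] + alpha * d[-1]                           # integrate back
```

For the series (1, 3, 7, 15, 31), the differences (2, 4, 8, 16) give α = 2, so the next difference is forecast as 32 and the next value as 63.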
Proposed by Vladimir N. Vapnik (1995)
An algorithm (or recipe) for maximizing a particular mathematical function with respect to a given collection of data
4 Key Concepts:
Separating hyperplane
Maximum-margin hyperplane
Soft margin
Kernel function
SVM (Support Vector Machine)
16 / 31
Separating Hyperplane
17 / 31
f(x, w, b) = sign(w · x + b)
• w · x + b > 0 → denotes +1
• w · x + b < 0 → denotes −1
Separating Hyperplane (= Classifier)
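The classifier on the slide is just the sign of a dot product plus a bias; a minimal sketch (illustrative helper name):

```python
def classify(x, w, b):
    """Separating-hyperplane classifier: f(x, w, b) = sign(w . x + b)."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s > 0 else -1
```

E.g., with w = (1, 1) and b = −1, the point (1, 1) falls on the positive side and (0, 0) on the negative side.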
Maximum Margin
18 / 31
denotes +1
denotes -1
Support vectors are the data points that the margin pushes up against. Only the support vectors are needed to specify the separating hyperplane!
f(x, w, b) = sign(w · x + b)
[Figure: support vectors x⁺ and x⁻ on the margin boundaries; M = margin width]
19 / 31
Nonlinear SVMs
Datasets that are linearly separable with some noise work out great.
But what are we going to do if the dataset is just too hard?
How about… mapping data to a higher-dimensional space:
Kernel Function (1/2)
[Figure: 1-D data on the x-axis that is not linearly separable; after mapping x → (x, x²) it becomes separable]
20 / 31
Nonlinear SVMs: Feature Spaces
General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is linearly separable.
Definition of Kernel Function: some function that corresponds to an inner product in some expanded feature space.
Kernel Function (2/2)
Φ: x → φ(x)
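The kernel definition can be checked numerically: the standard quadratic kernel K(x, z) = (x · z)² equals the inner product of the explicit feature map φ(x) = (x₁², √2·x₁x₂, x₂²). This is a well-known identity used here as an illustration; the function names are assumptions:

```python
import math

def poly_kernel(x, z):
    """K(x, z) = (x . z)^2 for 2-D inputs."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    """Explicit feature map whose inner product reproduces K."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
```

For x = (1, 2), z = (3, 4): K(x, z) = 11² = 121, and φ(x) · φ(z) = 9 + 48 + 64 = 121 as well — the kernel computes the feature-space inner product without ever forming φ explicitly.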
21 / 31
Genetic Algorithm
Search & Optimization technique
By J. Holland, 1975
Based on Darwin's principle of natural selection
Basic operations
Crossover
Mutation
[Flowchart: Create initial, random population (potential solutions) → Evaluate fitness of each individual → Optimal or "good" solution found? If yes: END; if no: Selection (cull weak individuals) → Crossover → Mutation → re-evaluate]
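The loop above can be sketched as a toy GA maximizing a 1-D function. This is a minimal illustration with assumed parameters (population 20, Gaussian mutation, elitism), not the paper's GA configuration:

```python
import random

def ga_maximize(fitness, lo, hi, pop_size=20, generations=100, seed=0):
    """Toy genetic algorithm over a single real-valued gene, with elitism:
    selection -> crossover (blend) -> mutation (gaussian) -> re-evaluate."""
    rng = random.Random(seed)
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]   # initial population
    best = max(pop, key=fitness)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                     # selection
        children = []
        while len(children) < pop_size - 1:
            a, b = rng.sample(parents, 2)
            w = rng.random()
            child = w * a + (1 - w) * b                    # crossover (blend)
            child += rng.gauss(0, 0.3)                     # mutation
            children.append(min(hi, max(lo, child)))
        pop = [best] + children                            # elitism keeps the best
        best = max(pop, key=fitness)
    return best
```

E.g., maximizing f(x) = −(x − 3)² over [−10, 10] converges near x = 3.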
Overall Approach (1/2)
22 / 31
[Flowchart, ARIMA branch: Data Set → Model Identification → Model Estimation → "Is model checking satisfied?" (No: repeat) → Yes: Trained ARIMA Model (Linear Forecasting).
SVM branch: Nonlinear Residual + Initial Parameters → Random Initial Population (Chromosome 1 … Chromosome N) → Training SVM Model → Trained SVM Model → Fitness Evaluation → "Stop criteria met?" (No: Genetic Operations, retrain) → Yes: Trained SVM Model (Nonlinear Forecasting).
The two forecasts are summed (+) into the Software Reliability Prediction]
[Flowchart: the same combined diagram as on (1/2), showing the ARIMA (linear) branch and the SVM (nonlinear) branch summed (+) into the Software Reliability Prediction]
Overall Approach (2/2)
23 / 31
Xt = Lt + Nt
Xt : Time series data
Lt : Linear part of time series data
Nt : Nonlinear part of time series data
After ARIMA model processing, we can get L̂_t and ε_t:
L̂_t: predicted value of the ARIMA model
ε_t: residual at time t from the linear model, ε_t = X_t − L̂_t
Finally, the residuals (𝜀𝑡) will be modeled by the SVM model with GA (Genetic Algorithm).
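The decomposition and recombination steps above amount to two one-line operations; a minimal sketch with illustrative names and made-up numbers:

```python
def residual(x_t, l_hat_t):
    """Residual fed to the SVM stage: eps_t = X_t - L_hat_t."""
    return x_t - l_hat_t

def hybrid_forecast(linear_pred, residual_pred):
    """Final prediction: ARIMA's linear forecast plus the SVM's
    forecast of the nonlinear residual."""
    return linear_pred + residual_pred
```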
ARIMA Process (1/2)
24 / 31
Stationarize input data - Differencing, determine d - ACF, PACF checking
Determination of the values of p and q - ACF, PACF checking
        MA(q)                 AR(p)                 ARMA(p,q)
ACF     Cuts off after lag q  Tails off             Tails off
PACF    Tails off             Cuts off after lag p  Tails off
MLE (Maximum Likelihood Estimation) - Find a set of parameters θ1, θ2, …, θk to maximize L(θ1, θ2, …, θk) = f(x1, x2, …, xN; θ1, θ2, …, θk)
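The MLE step can be illustrated in the simplest setting: estimating a Gaussian mean by picking the candidate θ with the largest log-likelihood (a grid-search sketch with illustrative names, not the ARIMA likelihood itself):

```python
import math

def gaussian_log_likelihood(data, mu, sigma=1.0):
    """log L(mu) = sum_t log f(x_t; mu, sigma) for a fixed-variance Gaussian."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2)
        - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in data
    )

def mle_mean(data, candidates):
    """Pick the candidate mu that maximizes the likelihood."""
    return max(candidates, key=lambda mu: gaussian_log_likelihood(data, mu))
```

For data (1, 2, 3) the likelihood is maximized at μ = 2, the sample mean, as expected.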
Data Set
Model Identification
Parameter Estimation
Is model checking
satisfied?
No
SW Reliability Prediction
Yes
ARIMA Process (2/2)
25 / 31
Data Set
Model Identification
Parameter Estimation
Is model checking
satisfied?
No
SW Reliability Prediction
Yes
Residual randomness check
- Residuals of a well-fitted model will be random and follow the normal distribution
- Check ACF and PACF
SVM Process (1/2)
26 / 31
o Due to the randomness of the input data, a random initial population is selected
- e.g., C, ε, σ
o The data set is divided into two parts: training & testing data
[Flowchart: Nonlinear Residual + Initial Parameters → Random Initial Population (Chromosome 1 … Chromosome N) → Training SVM Model → Trained SVM Model → Fitness Evaluation → "Stop criteria met?" (No: Genetic Operations; Yes: Trained SVM Model, Nonlinear Forecasting)]
SVM Process (2/2)
27 / 31
o The higher the fitness value, the greater the chance of survival
o High-fitness candidate chromosomes are retained and combined to produce new offspring
[Flowchart: Nonlinear Residual + Initial Parameters → Random Initial Population (Chromosome 1 … Chromosome N) → Training SVM Model → Trained SVM Model → Fitness Evaluation → "Stop criteria met?" (No: Genetic Operations; Yes: Trained SVM Model, Nonlinear Forecasting)]
o GA is applied to the SVM parameter search
- No theoretical method exists for determining a kernel function and its parameters
- No a priori knowledge for setting the kernel parameter C
o Applied GA operations
- Crossover operation
- Mutation operation
Experimental Results (1/2)
Collected data: cumulative number of failures, 𝑥𝑖 , at time 𝑡𝑖
Data Set (DS-1) • RADC (Rome Air Development Center) Project reported by Musa
• 21 weeks tested, 136 observed failures
Output: predicted value, 𝑥𝑖+1, using (𝑥1, 𝑥2,…, 𝑥𝑖)
[Figures: goodness-of-fit curves and relative-error curves]
28 / 31
Experimental Results (2/2)
Collected data: cumulative number of failures, 𝑥𝑖 , at time 𝑡𝑖
Data Set (DS-2) • 28 weeks SW test, 234 observed failures
Output: predicted value, 𝑥𝑖+1, using (𝑥1, 𝑥2,…, 𝑥𝑖)
[Figures: goodness-of-fit curves and relative-error curves]
29 / 31
Conclusion
Proposed a hybrid methodology for forecasting software reliability:
exploits the unique strengths of the ARIMA model and the SVM model
Test results
showed improved prediction performance
30 / 31
Discussion
Pros
Provides a possible solution to the difficulty of SRM selection
Improves SW reliability prediction performance
Cons
Does not present detailed test methods (e.g., stop criteria for the SVM, parameter estimation criteria for ARIMA)
31 / 31
Thank you!