ICEAC 2011
Asymmetrical and Lower Bounded Support Vector Regression for Power Estimation

Mel Stockman#1, Mariette Awad#2, Rahul Khanna*3
# Electrical and Computer Engineering Department, American University of Beirut, Beirut, [email protected]
* Intel Corporation, Oregon, USA
Abstract — In an energy aware environment, designers frequently
turn to advanced power reduction techniques such as power
shutoff and multi-supply-voltage architectures. In order to
implement these techniques, it is important that power estimates
be made. Power prediction is a critical necessity as chip sizes
continually decrease and the desire for low power consumption is
a foremost design objective. For such predictions, it is crucial to
avoid underestimating power since reliability issues and possible chip damage might occur. It becomes necessary to eliminate or
strictly limit underestimations by relaxing accuracy constraints
while decreasing the likelihood that the estimation undershoots
the actual value. Our novel approach, Asymmetrical and Lower
Bounded Support Vector Regression modifies the Support
Vector Regression technique by Vapnik and provides accurate
prediction while maintaining a low number of underestimates.
We tested our approach on two different power data sets and
achieved accuracy rates of 5.72% and 5.06% relative percentage
error while keeping the number of underestimates below 2.81%
and 1.74%.
Keywords: Asymmetrical Loss Function, Support Vector
Regression, Bounded Function
I. INTRODUCTION
Power prediction is a critical necessity as chip sizes shrink
and the desire for low power consumption is a foremost
design objective. Estimates of power consumption are used in
both chip design (to implement such techniques as power
shutoff (PSO) and multi-supply-voltage (MSV)) as well as in production environments and data centers (to determine
cooling and UPS needs). In the design area, under-predicting
power could lead to chip reliability issues, poorer than
anticipated application performance, unavailable processor
resources, needlessly powering down chip components, etc.
In a production environment, under-predicting power could lead to insufficient cooling of data centers or inadequate UPS
power supply. In these cases, it is crucial to minimize
underestimates even at the risk of reducing the accuracy of the
estimation.
Support Vector Regression (SVR), as proposed by Vapnik [1], has proven to be an effective tool in real-value function
estimation. The usual approach trains using a symmetrical loss
function which equally penalizes both high and low
misestimates. Using Vapnik's ε-insensitive approach, a
flexible tube of minimal radius is formed symmetrically
around the estimated function such that the absolute values of
errors less than a certain threshold are ignored both above
and below the estimate. In this manner, points outside the
tube are penalized but those within the tube, either above or
below the function, receive no penalty.
However, when dealing with damaging consequences, it is
essential to restrict the loss function so that underestimates are
eliminated as much as possible. In this paper we propose
modifying Vapnik's ε-insensitive loss function to be
asymmetrical in order to limit underestimates. The ε-tube is
cut in half and aligned to limit the function without allowing
points to fall below it. We bound the ε-tube from beneath and
apply a much higher penalty on estimates which lie under the
function. This leads to an asymmetric loss function for
training whereby a greater penalty is applied when the
misestimate is lower than when it is higher.
The remainder of this paper is organized as follows:
Section 2 discusses prior research. Section 3 gives a brief
overview of regular support vector regression (SVR). Section
4 explains our asymmetrical approach. Section 5 explains the two different data sets. Section 6 presents our experimental
results. Section 7 compares SVR and ALB-SVR, and
Section 8 provides our conclusion.
II. PRIOR RESEARCH
Although we are not aware of prior work specifically
addressing our approach, we survey in this section some
related work available in the literature. The authors in [2] use an
asymmetric ε-insensitive loss function in Support Vector
Quantile Regression (SVQR) in an attempt to decrease the
number of support vectors. They alter the insensitiveness according to the quantile and achieve a sparser model.
Our work differs from theirs in that their aim was to decrease the number of support vectors while maintaining the same
accuracy as regular SVQR, whereas our approach specifically
seeks to limit underestimates at a possible cost to accuracy.
Asymmetrical loss functions are discussed in [3], where the authors study different loss functions for Bayes parameter
estimation. They use a two-sided quadratic loss function and a
quasi-quadratic s-loss function, compare the two, and
derive results illustrating that this modified version shows a
smaller increase of loss and can be used in real-world
situations where overestimation and underestimation have
different importance. The authors in [4] study Bayesian risk
analysis and replace the quadratic loss function with an
asymmetric loss function to derive a general class of functions
which approach infinity near the origin to limit underestimates.
In [5], the authors present a maximum margin classifier
which bounds misclassification for each class differently, thus
allowing for different tolerance levels. In [6], the authors use
a smoothing strategy to modify the typical SVR approach into
an unconstrained problem, thereby solving only a system of linear equations rather than a convex quadratic program. In
[7], three different loss functions are compared for economic
tolerance design: Taguchi's quadratic loss function, the Inverted
Normal Loss Function and the Revised Inverted Normal Loss Function.
III. SUPPORT VECTOR REGRESSION
In Vapnik's ε-insensitive SVR [1], a real value y' is predicted as:

y' = w · x + b    (1)

for training data (x_i, t_i), i = 1, …, n, x_i ∈ ℝ^d, t_i ∈ ℝ,
using a tube bounded by ±ε ∀i, as shown in Fig. 1. The
penalty function assigns a penalty only if the
predicted value y_i is more than ε away from the actual
value t_i (i.e. |y_i − t_i| ≥ ε). Those data points which lie
outside the ε-tube are Support Vectors (SVs) and are given
the same penalty whether they lie above (ξ_i^+) or below (ξ_i^−)
the tube (ξ_i^+ > 0, ξ_i^− > 0 ∀i):

t_i ≤ y_i + ε + ξ_i^+    (2)

t_i ≥ y_i − ε − ξ_i^−    (3)
The accuracy of the estimation is then measured by the loss function E_ε(y_i, t_i), as shown in Fig. 2:

E_ε(y_i, t_i) = 0,                if |y_i − t_i| ≤ ε
E_ε(y_i, t_i) = |y_i − t_i| − ε,  otherwise    (4)
The empirical risk is:

R_emp = (1/n) Σ_{i=1}^{n} E_ε(y_i, t_i)    (5)
Figure 1. SVR with ε-insensitive tube
Figure 2. ε-insensitive loss function
leading to the SVR error function:

C Σ_{i=1}^{n} (ξ_i^+ + ξ_i^−) + (1/2)‖w‖²    (6)

which should be minimized subject to the constraints ξ_i^+ ≥ 0, ξ_i^− ≥ 0 ∀i and (2) and (3).
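As a small illustrative sketch (not the paper's code), the symmetric ε-insensitive loss (4) and the empirical risk (5) can be written directly in NumPy:

```python
import numpy as np

def eps_insensitive_loss(y, t, eps):
    """Symmetric epsilon-insensitive loss, Eq. (4): zero inside the
    tube of radius eps, linear outside it."""
    r = np.abs(y - t)
    return np.where(r <= eps, 0.0, r - eps)

def empirical_risk(y, t, eps):
    """Empirical risk, Eq. (5): mean loss over the training set."""
    return eps_insensitive_loss(y, t, eps).mean()

y = np.array([1.0, 2.0, 3.0])   # predictions
t = np.array([1.05, 2.5, 2.0])  # actual values
print(empirical_risk(y, t, 0.1))  # mean of [0, 0.4, 0.9]
```

Only the second and third points contribute, since the first lies inside the ε-tube.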
IV. ASYMMETRICAL AND LOWER BOUNDED SUPPORT VECTOR
REGRESSION
Our novel approach, Asymmetrical and Lower Bounded
Support Vector Regression (ALB-SVR), modifies the SVR
loss function and corresponding error function such that the
epsilon tube is only above the function as shown in Figs. 3
and 4. The penalty parameter C is split into C+ and C− so that
different penalties can be applied to the upper and lower
mispredictions. Equations (3), (4) and (6) are modified as follows:

t_i ≥ y_i − ξ_i^−    (7)

E'_ε(y_i, t_i) = 0,                if 0 ≤ t_i − y_i ≤ ε
E'_ε(y_i, t_i) = (t_i − y_i) − ε,  if t_i − y_i > ε
E'_ε(y_i, t_i) = (y_i − t_i),      otherwise    (8)

C+ Σ_{i=1}^{n} ξ_i^+ + C− Σ_{i=1}^{n} ξ_i^− + (1/2)‖w‖²    (9)
Figure 3. ALB-SVR with ε-insensitive tube
Figure 4. ALB-SVR loss function
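A minimal NumPy sketch of the piecewise loss (8) follows; note that the C+/C− weighting enters through the error function (9), not through the loss itself:

```python
import numpy as np

def alb_loss(y, t, eps):
    """ALB-SVR loss, Eq. (8): zero only when the actual value t lies
    in the half-tube [y, y + eps]; linear (minus eps) above the tube;
    immediately linear below it (no insensitive zone on the lower side)."""
    d = t - y
    return np.where((d >= 0) & (d <= eps), 0.0,
                    np.where(d > eps, d - eps, -d))

y = np.array([1.0, 1.0, 1.0])
t = np.array([1.05, 1.5, 0.8])   # inside half-tube, above tube, below it
print(alb_loss(y, t, 0.1))       # zero, 0.4, and 0.2 respectively
```

The third point shows the asymmetry: an overestimate of 0.2 is penalized immediately, whereas a symmetric tube would have forgiven half of it.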
Introducing Lagrange multipliers μ_i^+ ≥ 0, μ_i^− ≥ 0, α_i^+ ≥ 0, α_i^− ≥ 0 ∀i:
L = C+ Σ_{i=1}^{n} ξ_i^+ + C− Σ_{i=1}^{n} ξ_i^− + (1/2)‖w‖²
    − Σ_{i=1}^{n} (μ_i^+ ξ_i^+ + μ_i^− ξ_i^−)
    − Σ_{i=1}^{n} α_i^+ (ε + ξ_i^+ + y_i − t_i)
    − Σ_{i=1}^{n} α_i^− (ξ_i^− − y_i + t_i)    (10)

which leads to:

∂L/∂w = 0  ⇒  w = Σ_{i=1}^{n} (α_i^+ − α_i^−) x_i    (11)

∂L/∂b = 0  ⇒  Σ_{i=1}^{n} (α_i^+ − α_i^−) = 0    (12)

∂L/∂ξ_i^+ = 0  ⇒  μ_i^+ = C+ − α_i^+    (13)

∂L/∂ξ_i^− = 0  ⇒  μ_i^− = C− − α_i^−    (14)
Substituting (11) and (12) and maximizing L_D with respect to α_i^+ and α_i^− (α_i^+ ≥ 0, α_i^− ≥ 0 ∀i), where:

L_D = Σ_{i=1}^{n} (α_i^+ − α_i^−) t_i − ε Σ_{i=1}^{n} α_i^+
      − (1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} (α_i^+ − α_i^−)(α_j^+ − α_j^−) x_i · x_j    (15)

Since μ_i^+ ≥ 0 and μ_i^− ≥ 0, equations (13) and (14) imply α_i^+ ≤ C+ and α_i^− ≤ C−. Thus we need to find

max over α^+, α^− of  Σ_{i=1}^{n} (α_i^+ − α_i^−) t_i − ε Σ_{i=1}^{n} α_i^+
      − (1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} (α_i^+ − α_i^−)(α_j^+ − α_j^−) x_i · x_j    (16)

subject to 0 ≤ α_i^+ ≤ C+ and 0 ≤ α_i^− ≤ C− ∀i, and Σ_{i=1}^{n} (α_i^+ − α_i^−) = 0.

Substituting (11) into (1):

y' = Σ_{i=1}^{n} (α_i^+ − α_i^−) x_i · x' + b    (17)

A support vector x_s can be found at an index s where 0 < α_s^+ < C+ and α_s^− = 0 (or 0 < α_s^− < C− and α_s^+ = 0), and b can be derived by:

b = t_s − ε − Σ_{i∈SV} (α_i^+ − α_i^−) x_i · x_s    (18)
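Given a dual solution (the α's and b), the prediction (17) with an RBF kernel in place of the dot product can be sketched as below; the coefficient values here are hypothetical placeholders for illustration, not a trained model:

```python
import numpy as np

def rbf(x1, x2, g):
    """RBF kernel K(x1, x2) = exp(-g * ||x1 - x2||^2)."""
    return np.exp(-g * np.sum((x1 - x2) ** 2))

def predict(x_new, X, alpha_plus, alpha_minus, b, g):
    """Eq. (17) with a kernel: y' = sum_i (a_i^+ - a_i^-) K(x_i, x') + b."""
    coef = alpha_plus - alpha_minus
    return sum(c * rbf(xi, x_new, g) for c, xi in zip(coef, X)) + b

# Hypothetical dual coefficients, for illustration only.
X = np.array([[0.0], [1.0], [2.0]])
ap = np.array([0.5, 0.0, 0.2])
am = np.array([0.0, 0.3, 0.0])
print(predict(np.array([1.0]), X, ap, am, b=0.1, g=1.0))
```

Only training points with a nonzero coefficient (the SVs) contribute to the sum.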
V. DATA SETS
A. Gladiator Data Set
The Gladiator data set from [8] consists of 640 samples of
6 attributes of telemetry data from a distributed set of physical
and logical sensors as shown in Table I along with the
corresponding power in milliwatts.
TABLE I GLADIATOR DATA SET ATTRIBUTES

CPU1 Vtt1   termination, misc I/O power
CPU1 Vcc1   core power
CPU1 Vsa    system agent, uncore, I/O power
CPU2 Vtt1   termination, misc I/O power
CPU2 Vcc1   core power
CPU2 Vsa    system agent, uncore, I/O power
B. RAPL Data Set
The data set taken from [9,10] consists of 17765 samples of
5 attributes of memory activity counters as described in Table
II with the actual corresponding power consumed in watts as
measured directly by a memory power riser.
TABLE II MEMORY POWER MODEL ATTRIBUTES

Activity       Units
Activate (A)   nJ/Activate
Read (R)       nJ/Read
Write (W)      nJ/Write
CKE = High     mW
CKE = Low      mW
VI. EXPERIMENTS AND RESULTS
We modified the code in LIBSVM [11] for ALB-SVR.
For all experiments, we normalized the data and took the
average of 10 runs of 3-fold cross validation. Using an RBF
kernel, we performed a grid search combined with heuristic experimentation for both SVR and ALB-SVR to find the best
meta-parameters ε, g, C+ and C−. Table III and Figs. 5-8
show the results of SVR and ALB-SVR. The table shows
the values of the C+, C−, ε and g meta-parameters
along with the number of SVs, the total number of
iterations needed for the algorithm to converge to the
solution, the relative percentage error, and the percentage
of estimated data points falling below the actual values. As can be seen, the number of underestimates for the SVR is
around 50% which is due to the fact that SVR centers the
epsilon tube around the data. ALB-SVR positions the half
tube under the data allowing only a small number of estimated
points to fall below the actual values. This effectively
limits underestimation with only a slight decrease in the
accuracy of ALB-SVR. The accuracy is necessarily lower than
that of SVR because the estimation is now skewed higher.
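The repeated-cross-validation grid search described above can be sketched as follows. Note that `train_alb_svr` is a hypothetical callable standing in for the authors' modified LIBSVM (the actual binding is not given in the paper), so it is passed in as a parameter:

```python
import itertools
import numpy as np

def grid_search(X, t, train_alb_svr, eps_grid, g_grid, cp_grid, cn_grid,
                runs=10, folds=3, seed=0):
    """Average percentage relative error over `runs` repetitions of
    `folds`-fold cross validation for every (eps, g, C+, C-) combination,
    returning the best parameter tuple and its mean error."""
    rng = np.random.default_rng(seed)
    best = (None, np.inf)
    for eps, g, cp, cn in itertools.product(eps_grid, g_grid, cp_grid, cn_grid):
        errs = []
        for _ in range(runs):
            idx = rng.permutation(len(X))
            for f in range(folds):
                test = idx[f::folds]                 # every folds-th index
                train = np.setdiff1d(idx, test)      # remaining indices
                model = train_alb_svr(X[train], t[train], eps, g, cp, cn)
                y = model.predict(X[test])
                errs.append(np.mean(np.abs(t[test] - y) / t[test]) * 100)
        if np.mean(errs) < best[1]:
            best = ((eps, g, cp, cn), np.mean(errs))
    return best
```

The paper combines such a search with heuristic experimentation; this sketch only shows the exhaustive part.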
Model performance is evaluated by computing the percentage relative error as:

%error = (100/n) Σ_{i=1}^{n} |t_i − y_i| / t_i    (19)
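As a minimal sketch, metric (19) and the underestimate percentage reported in Table III can be computed as:

```python
import numpy as np

def percent_relative_error(y, t):
    """Eq. (19): mean of |t - y| / t over the data set, as a percentage."""
    return 100.0 * np.mean(np.abs(t - y) / t)

def percent_underestimates(y, t):
    """Percentage of predictions that fall below the actual value."""
    return 100.0 * np.mean(y < t)

t = np.array([10.0, 20.0, 40.0])
y = np.array([11.0, 19.0, 44.0])
print(percent_relative_error(y, t))   # about 8.33
print(percent_underestimates(y, t))   # about 33.33 (one of three)
```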
The percentage relative error for ALB-SVR on the
Gladiator data set was 5.72% and for the RAPL data set it was
5.06%. This is only slightly higher than the SVR which had
1.72% and 1.82% on the respective datasets. This is
acceptable since we have drastically minimized the number of
underestimates. As can also be seen, the number of support
vectors is greater in ALB-SVR than in SVR.
VII. COMPARISON OF SVR AND ALB-SVR
Comparing ALB-SVR to SVR allows us to look at the
tradeoffs involved with using this technique.
A. Empirical Risk
By substituting the new loss function, ALB-SVR's
empirical risk becomes:

R'_emp = (1/n) Σ_{i=1}^{n} E'_ε(y_i, t_i)    (20)

The maximum additional empirical risk of ALB-SVR over SVR can be
computed to be:

(1/n) [ Σ_{i: 0 < y_i − t_i ≤ ε} (y_i − t_i) + Σ_{i: y_i − t_i > ε} ε ]    (21)

since a point that previously fell in the removed lower half-tube now incurs a loss of up to ε.
B. Number of Support Vectors (SV) and Convergence
In SVR, support vectors (SVs) are those points which lie
outside the epsilon tube. The smaller the value of ε, the more points lie outside the tube and hence the more
SVs there are. In ALB-SVR, we have essentially cut the epsilon tube in
half; we no longer have the lower epsilon bound. Therefore,
for the same g and ε parameters, more points lie outside
the tube and there is a larger number of SVs. This
increase in the number of SVs indicates that using ALB-SVR has some negative effects on the complexity of the estimating
function. However, as seen in Table III, the Gladiator data set
did not show a significant increase in SVs. This may be due to
the fact that the data set is relatively small. As can also be
seen in Table III, the number of iterations was smaller in
ALB-SVR, indicating that the algorithm converged faster;
this may offset the larger number of SVs.
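The effect of halving the tube on the SV count can be illustrated with synthetic residuals (an illustration of the counting argument, not the paper's data): every point outside the full tube is also outside the half-tube, so the half-tube can only leave more points outside.

```python
import numpy as np

# Synthetic residuals t_i - y_i of a fitted estimate.
rng = np.random.default_rng(1)
d = rng.normal(0.0, 1.0, 1000)
eps = 0.5

sym_out = np.sum(np.abs(d) > eps)       # SVR: outside the full tube
half_out = np.sum((d < 0) | (d > eps))  # ALB-SVR: outside the half-tube
print(sym_out, half_out)                # half_out is never smaller
```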
For our ALB-SVR model, we used a grid search and
heuristics to determine optimal meta-parameters. We
achieved the goal of limiting the underestimates to 2.81% for
the Gladiator data set and 1.74% for the RAPL data set, as
compared to 50.33% and 57.54% for SVR.
VIII. CONCLUSION AND FUTURE WORK
We have shown our novel approach, ALB-SVR, to be an
effective technique to bound an estimation such that
underestimates are greatly limited. This comes at the expense
of accuracy, but is nevertheless helpful for applications, such
as power estimation, which are highly sensitive to such
mispredictions. We tested our approach on two different power
data sets and achieved accuracy rates of 5.72% and 5.06%
relative percentage error while keeping the number of
underestimates below 2.81% and 1.74%. Future work will
include different data sets and techniques for more accurately
selecting the meta-parameters, as well as improving the percentage error.
ACKNOWLEDGEMENTS
This work is partly supported by MER, a partnership between
Intel Corporation and King Abdul-Aziz City for Science and
Technology (KACST) to conduct and promote research in the
Middle East, and by the University Research Board at the
American University of Beirut.
REFERENCES
[1] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[2] K. Seok, D. Cho, C. Hwang, and J. Shim, "Support vector quantile regression using asymmetric e-insensitive loss function," in Proc. 2nd Int. Conf. Education Technology and Computer (ICETC), vol. 1, pp. V1-438-V1-439, June 2010.
[3] H. Schabe, "Bayes estimates under asymmetric loss," IEEE Transactions on Reliability, vol. 40, no. 1, pp. 63-67, Apr. 1991.
[4] J. G. Norstrom, "The use of precautionary loss functions in risk analysis," IEEE Transactions on Reliability, vol. 45, no. 3, pp. 400-403, Sep. 1996.
[5] J. Saketha Nath and C. Bhattacharyya, "Maximum margin classifiers with specified false positive and false negative error rates," in Proc. SDM Conference, Minneapolis, 2007.
[6] Yuh-Jye Lee, Wen-Feng Hsieh, and Chien-Ming Huang, "ε-SSVR: a smooth support vector machine for ε-insensitive regression," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 5, pp. 678-685, May 2005.
[7] Jeh-Nan Pan and Jianbiao Pan, "A comparative study of various loss functions in the economic tolerance design," in Proc. IEEE Int. Conf. Management of Innovation and Technology, vol. 2, pp. 783-787, June 2006.
[8] Intel document, "Gladiator Telemetry Harness for Energy Efficient Computing," 2010.
[9] H. David, E. Gorbatov, U. Hanebutte, R. Khanna, and C. Le, "RAPL: memory power estimation and capping," in Proc. Int. Symp. Low Power Electronics and Design (ISLPED), pp. 14-15, Aug. 2010.
[10] M. Stockman, M. Awad, R. Khanna, C. Le, H. David, E. Gorbatov, and U. Hanebutte, "A novel approach to memory power estimation using machine learning," in Proc. Int. Conf. Energy Aware Computing (ICEAC), pp. 1-3, Dec. 2010.
[11] C.-C. Chang and C.-J. Lin, LIBSVM: a library for support vector machines, 2001. http://www.csie.ntu.edu.tw/~cjlin/libsvm.
TABLE III COMPARATIVE RESULTS OF SVR VS. ALB-SVR

#  Data       Model     C+        C−    ε     g        # SV   # Iter   % relative error   % out of bound
1  Gladiator  SVR       512       -     16    0.23     319    2723     1.72               50.33
2  Gladiator  ALB-SVR   32768     32    1     0.00039  320    417      5.72               2.81
3  RAPL       SVR       512       -     706   0.10     2786   571649   1.82               57.54
4  RAPL       ALB-SVR   1000000   10    706   0.2      4932   58220    5.06               1.74

(For the SVR rows, the C+ column holds the single penalty parameter C.)
Figure 5. Power estimates for Gladiator Data with SVR (Model #1)
Figure 6. Power estimates for Gladiator Data with ALB-SVR (Model #2)
Figure 7. Power estimates for RAPL Data with SVR (Model #3)
Figure 8. Power estimates for RAPL Data with ALB-SVR (Model #4)