
Asymmetrical and Lower Bounded Support Vector Regression for Power Estimation

Mel Stockman #1, Mariette Awad #2, Rahul Khanna *3

# Electrical and Computer Engineering Department, American University of Beirut, Beirut, Lebanon
1 [email protected]
2 [email protected]

* Intel Corporation, Oregon, USA
3 [email protected]

Abstract— In an energy-aware environment, designers frequently turn to advanced power reduction techniques such as power shutoff and multi-supply-voltage architectures. In order to implement these techniques, power estimates must be made. Power prediction is a critical necessity as chip sizes continually decrease and low power consumption is a foremost design objective. For such predictions, it is crucial to avoid underestimating power, since reliability issues and possible chip damage might occur. It becomes necessary to eliminate or strictly limit underestimations by relaxing accuracy constraints while decreasing the likelihood that the estimation undershoots the actual value. Our novel approach, Asymmetrical and Lower Bounded Support Vector Regression (ALB-SVR), modifies the Support Vector Regression technique by Vapnik and provides accurate prediction while maintaining a low number of underestimates. We tested our approach on two different power data sets and achieved accuracy rates of 5.72% and 5.06% relative percentage error while keeping the number of underestimates below 2.81% and 1.74%, respectively.

Keywords: Asymmetrical Loss Function, Support Vector Regression, Bounded Function

I.  INTRODUCTION 

Power prediction is a critical necessity as chip sizes shrink and the desire for low power consumption is a foremost design objective. Estimates of power consumption are used both in chip design (to implement such techniques as power shutoff (PSO) and multi-supply-voltage (MSV)) and in production environments and data centers (to determine cooling and UPS needs). In the design area, under-predicting power could lead to chip reliability issues, poorer than anticipated application performance, unavailable processor resources, needlessly powering down chip components, etc. In a production environment, under-predicting power could lead to insufficient cooling of data centers or inadequate UPS power supply. In these cases, it is crucial to minimize underestimates even at the risk of reducing the accuracy of the estimation.

Support Vector Regression (SVR) as proposed by Vapnik [1] has proven to be an effective tool in real value function estimation. The usual approach trains using a symmetrical loss function which equally penalizes both high and low misestimates. Using Vapnik's ε-insensitive approach, a flexible tube of minimal radius is formed symmetrically around the estimated function such that the absolute values of errors less than a certain threshold ε are ignored both above and below the estimate. In this manner, points outside the tube are penalized, but those within the tube, either above or below the function, receive no penalty.

However, when dealing with damaging consequences, it is essential to restrict the loss function so that underestimates are eliminated as much as possible. In this paper we propose modifying Vapnik's ε-insensitive loss function to be asymmetrical in order to limit underestimates. The ε-tube is cut in half and aligned to limit the function without allowing points to fall below it. We bound the ε-tube from beneath and apply a much higher penalty on estimates which lie under the function. This leads to an asymmetric loss function for training whereby a greater penalty is applied when the misestimate is lower than when it is higher.

The remainder of this paper is organized as follows: Section II discusses prior research. Section III gives a brief overview of regular support vector regression (SVR). Section IV explains our asymmetrical approach. Section V describes the two data sets. Section VI presents our experimental results. Section VII compares SVR and ALB-SVR, and Section VIII provides our conclusion.

II.  PRIOR RESEARCH 

Although we are not aware of prior work specifically addressing our approach, we survey in this section some related work available in the literature. The authors in [2] use an asymmetric ε-insensitive loss function in Support Vector Quantile Regression (SVQR) in an attempt to decrease the number of support vectors. They alter the insensitivity according to the quantile and achieve a sparser model. Our work differs from theirs in that their aim was to decrease the number of support vectors while maintaining the same accuracy as a regular SVQR, while our approach specifically seeks to limit underestimates at a possible cost to accuracy. Asymmetrical loss functions are discussed in [3], where the authors study different loss functions for Bayes parameter estimation. They use a 2-sided quadratic loss function and a quasi-quadratic s-loss function, compare the two, and derive results to illustrate that the modified version shows a smaller increase of loss and can be used in real-world situations where overestimation and underestimation have different importance. The authors in [4] study Bayesian risk analysis and replace the quadratic loss function with an asymmetric loss function to derive a general class of functions which approach infinity near the origin to limit underestimates. In [5], the authors present a maximum margin classifier which bounds misclassification for each class differently, thus allowing for different tolerance levels. In [6], the authors use a smoothing strategy to modify the typical SVR approach into an unconstrained problem, thereby solving only a system of linear equations rather than a convex quadratic program. In [7], three different loss functions are compared for economic tolerance design: Taguchi's quadratic loss function, the Inverted Normal Loss Function, and the Revised Inverted Normal Loss Function.

III. SUPPORT VECTOR REGRESSION

In Vapnik's ε-insensitive SVR [1], a real value $y'$ is predicted as:

$$y' = \mathbf{w} \cdot \mathbf{x} + b \qquad (1)$$

from training pairs $(\mathbf{x}_i, t_i)$, $i = 1, \dots, n$, $\mathbf{x}_i \in \Re^d$, $t_i \in \Re$, using a tube bounded by $\pm\varepsilon$ $\forall i$ as shown in Fig. 1. The penalty function is characterized by only assigning a penalty if the predicted value $y_i$ is more than $\varepsilon$ away from the actual value $t_i$ (i.e. $|t_i - y_i| \ge \varepsilon$). Those data points which lie outside the ε-tube are Support Vectors (SVs) and are given the same penalty whether they lie above ($\xi_i^+$) or below ($\xi_i^-$) the tube ($\xi_i^+ > 0$, $\xi_i^- > 0$ $\forall i$):

$$t_i \le y_i + \varepsilon + \xi_i^+ \qquad (2)$$

$$t_i \ge y_i - \varepsilon - \xi_i^- \qquad (3)$$

The accuracy of the estimation is then measured by the loss function $E_\varepsilon(t, y)$ as shown in Fig. 2:

$$E_\varepsilon(t, y) = \begin{cases} 0 & |t - y| \le \varepsilon \\ |t - y| - \varepsilon & \text{otherwise} \end{cases} \qquad (4)$$

The empirical risk is:

$$R_{emp} = \frac{1}{n} \sum_{i=1}^{n} E_\varepsilon(t_i, y_i) \qquad (5)$$

Figure 1. SVR with ε-insensitive tube

Figure 2. ε-insensitive loss function

leading to the SVR error function:

$$C \sum_{i=1}^{n} \left( \xi_i^+ + \xi_i^- \right) + \frac{1}{2} \|\mathbf{w}\|^2 \qquad (6)$$

which should be minimized subject to the constraints $\xi_i^+ \ge 0$, $\xi_i^- \ge 0$ $\forall i$ and (2) and (3).
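To make the symmetric loss concrete, the following is a minimal sketch (plain NumPy; the function names are ours, not from the paper or from LIBSVM) of the ε-insensitive loss of Eq. (4) and the empirical risk of Eq. (5):

```python
import numpy as np

def eps_insensitive_loss(t, y, eps):
    # Eq. (4): zero inside the tube |t - y| <= eps, linear outside it.
    return np.maximum(np.abs(t - y) - eps, 0.0)

def empirical_risk(t, y, eps):
    # Eq. (5): mean eps-insensitive loss over the n training points.
    return np.mean(eps_insensitive_loss(t, y, eps))

# Toy check: the first point lies inside the tube and contributes nothing.
t = np.array([1.00, 2.00, 3.00])   # actual values t_i
y = np.array([1.05, 2.50, 2.00])   # predicted values y_i
print(empirical_risk(t, y, eps=0.1))  # (0 + 0.4 + 0.9) / 3 ~= 0.433
```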

IV. ASYMMETRICAL AND LOWER BOUNDED SUPPORT VECTOR REGRESSION

Our novel approach, Asymmetrical and Lower Bounded Support Vector Regression (ALB-SVR), modifies the SVR loss function and corresponding error function such that the epsilon tube is only above the function, as shown in Figs. 3 and 4. The penalty parameter $C$ is split into $C^+$ and $C^-$ so that different penalties can be applied to the upper and lower mispredictions. Equations (3), (4) and (6) are modified as follows:

$$t_i \ge y_i - \xi_i^- \qquad (7)$$

$$E_\varepsilon^-(t, y) = \begin{cases} 0 & 0 \le t - y \le \varepsilon \\ (t - y) - \varepsilon & t - y > \varepsilon \\ y - t & \text{otherwise} \end{cases} \qquad (8)$$

$$C^+ \sum_{i=1}^{n} \xi_i^+ + C^- \sum_{i=1}^{n} \xi_i^- + \frac{1}{2} \|\mathbf{w}\|^2 \qquad (9)$$

Figure 3. ALB-SVR with ε-insensitive tube

Figure 4. ALB-SVR loss function
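As an illustrative sketch of the asymmetry (following our reading of Eq. (8), where the ε allowance applies on one side of the function only; function names are ours), the one-sided loss and the split penalty terms of Eq. (9) could be coded as:

```python
import numpy as np

def alb_loss(t, y, eps):
    # Eq. (8) as reconstructed above: the half-tube tolerates deviations
    # on one side up to eps; the other side is penalized immediately.
    d = t - y
    loss = np.zeros_like(d)
    above = d > eps          # outside the half-tube
    below = d < 0.0          # on the bounded side: no tolerance at all
    loss[above] = d[above] - eps
    loss[below] = -d[below]
    return loss

def alb_penalty_terms(t, y, eps, c_plus, c_minus):
    # The two slack sums of Eq. (9), weighted by the split penalties
    # C+ and C- so the two sides of the function cost differently.
    d = t - y
    xi_plus = np.maximum(d - eps, 0.0)   # slack beyond the half-tube, Eq. (2)
    xi_minus = np.maximum(-d, 0.0)       # slack on the bounded side, Eq. (7)
    return c_plus * xi_plus.sum() + c_minus * xi_minus.sum()
```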

Introducing Lagrange multipliers $\lambda_i^+ \ge 0$, $\lambda_i^- \ge 0$, $\mu_i^+ \ge 0$, $\mu_i^- \ge 0$ $\forall i$:

$$L = C^+ \sum_{i=1}^{n} \xi_i^+ + C^- \sum_{i=1}^{n} \xi_i^- + \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{n} \left( \mu_i^+ \xi_i^+ + \mu_i^- \xi_i^- \right) - \sum_{i=1}^{n} \lambda_i^+ \left( \varepsilon + \xi_i^+ + y_i - t_i \right) - \sum_{i=1}^{n} \lambda_i^- \left( \xi_i^- - y_i + t_i \right) \qquad (10)$$

which leads to:

$$\frac{\partial L}{\partial \mathbf{w}} = 0 \;\Rightarrow\; \mathbf{w} = \sum_{i=1}^{n} (\lambda_i^+ - \lambda_i^-)\, \mathbf{x}_i \qquad (11)$$

$$\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} (\lambda_i^+ - \lambda_i^-) = 0 \qquad (12)$$

$$\frac{\partial L}{\partial \xi_i^+} = 0 \;\Rightarrow\; \mu_i^+ = C^+ - \lambda_i^+ \qquad (13)$$

$$\frac{\partial L}{\partial \xi_i^-} = 0 \;\Rightarrow\; \mu_i^- = C^- - \lambda_i^- \qquad (14)$$

Substituting (11) and (12) and maximizing $L_D$ with respect to $\lambda_i^+$ and $\lambda_i^-$ ($\lambda_i^+ \ge 0$, $\lambda_i^- \ge 0$ $\forall i$), where:

$$L_D = \sum_{i=1}^{n} (\lambda_i^+ - \lambda_i^-)\, t_i - \varepsilon \sum_{i=1}^{n} \lambda_i^+ - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\lambda_i^+ - \lambda_i^-)(\lambda_j^+ - \lambda_j^-)\, \mathbf{x}_i \cdot \mathbf{x}_j \qquad (15)$$

Since $\mu_i^+ \ge 0$ and $\mu_i^- \ge 0$, (13) and (14) imply $\lambda_i^+ \le C^+$ and $\lambda_i^- \le C^-$. Thus we need to find

$$\max_{\lambda^+,\, \lambda^-} \; \sum_{i=1}^{n} (\lambda_i^+ - \lambda_i^-)\, t_i - \varepsilon \sum_{i=1}^{n} \lambda_i^+ - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\lambda_i^+ - \lambda_i^-)(\lambda_j^+ - \lambda_j^-)\, \mathbf{x}_i \cdot \mathbf{x}_j \qquad (16)$$

subject to $0 \le \lambda_i^+ \le C^+$, $0 \le \lambda_i^- \le C^-$ $\forall i$ and $\sum_{i=1}^{n} (\lambda_i^+ - \lambda_i^-) = 0$. Substituting (11) into (1):

$$y' = \sum_{i=1}^{n} (\lambda_i^+ - \lambda_i^-)\, \mathbf{x}_i \cdot \mathbf{x}' + b \qquad (17)$$

Support Vectors (SVs) $\mathbf{x}_s$ are found at the indices where $0 < \lambda_s^+ < C^+$ (or $0 < \lambda_s^- < C^-$) and $\xi_s^+ = 0$ (or $\xi_s^- = 0$), and $b$ can then be derived by:

$$b = t_s - \varepsilon - \sum_{i \in SV} (\lambda_i^+ - \lambda_i^-)\, \mathbf{x}_i \cdot \mathbf{x}_s \qquad (18)$$
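Given multipliers λ+ and λ− from the dual (16), the predictor of Eq. (17) is an ordinary kernel expansion over the support vectors. Below is a sketch using an RBF kernel (as in the experiments later in the paper); the function names and the kernelized form are ours:

```python
import numpy as np

def rbf_kernel(xi, xj, g):
    # k(xi, xj) = exp(-g * ||xi - xj||^2), the kernel used in the experiments.
    return np.exp(-g * np.sum((xi - xj) ** 2))

def predict(x_new, X_sv, lam_plus, lam_minus, b, g):
    # Eq. (17): y' = sum_i (lam_i^+ - lam_i^-) k(x_i, x') + b,
    # where the sum runs over the support vectors X_sv.
    coef = lam_plus - lam_minus
    return sum(c * rbf_kernel(xi, x_new, g)
               for c, xi in zip(coef, X_sv)) + b
```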

V. DATA SETS

A. Gladiator Data Set

The Gladiator data set from [8] consists of 640 samples of 6 attributes of telemetry data from a distributed set of physical and logical sensors, as shown in Table I, along with the corresponding power in milliwatts.

TABLE I
GLADIATOR DATA SET ATTRIBUTES

Attribute | Description
CPU1 Vtt1 | termination, misc IO power
CPU1 Vcc1 | core power
CPU1 Vsa  | system agent, uncore, I/O power
CPU2 Vtt1 | termination, misc IO power
CPU2 Vcc1 | core power
CPU2 Vsa  | system agent, uncore, I/O power

B. RAPL Data Set

The data set taken from [9,10] consists of 17765 samples of 5 attributes of memory activity counters, as described in Table II, with the actual corresponding power consumed in watts as measured directly by a memory power riser.

TABLE II
MEMORY POWER MODEL ATTRIBUTES

Activity     | Units
Activate (A) | nJ/Activate
Read (R)     | nJ/Read
Write (W)    | nJ/Write
CKE = High   | mW
CKE = Low    | mW

VI. EXPERIMENTS AND RESULTS

We modified the code in LIBSVM [11] for ALB-SVR. For all experiments, we normalized the data and took the average of 10 runs of 3-fold cross validation. Using an RBF kernel, we performed a grid search combined with heuristic experimentation for both SVR and ALB-SVR to find the best meta parameters ε, g, C+ and C−. Table III and Figs. 5-8 show the results of SVR and ALB-SVR. The table shows the various values of the C+, C−, ε and g meta parameters along with the number of SVs, the total number of iterations needed for the algorithm to converge to the solution, the relative percentage error, and the percentage of estimated datapoints falling below the actual values. As can be seen, the number of underestimates for SVR is around 50%, which is due to the fact that SVR centers the epsilon tube around the data. ALB-SVR positions the half tube under the data, allowing only a small number of estimated points to fall below the actual values. This effectively limits underestimation, with a slight decrease in the accuracy of ALB-SVR. The accuracy is necessarily less than that of SVR because the estimation is now skewed higher.

Model performance is evaluated by computing the percentage relative error as:

$$E = \frac{\sum_{i=1}^{n} |t_i - y_i|}{\sum_{i=1}^{n} t_i} \times 100 \qquad (19)$$

The percentage relative error for ALB-SVR on the Gladiator data set was 5.72% and for the RAPL data set it was 5.06%. This is only slightly higher than SVR, which had 1.72% and 1.82% on the respective data sets. This is acceptable since we have drastically minimized the number of underestimates. As can also be seen, the number of support vectors is greater in ALB-SVR than in SVR.
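For reference, both reported metrics — the percentage relative error of Eq. (19) and the "% out of bound" column of Table III — can be computed as follows (a sketch; the array names are ours, and Eq. (19) follows our reconstruction above):

```python
import numpy as np

def percent_relative_error(t, y):
    # Eq. (19): total absolute error relative to total actual power, x100.
    return 100.0 * np.sum(np.abs(t - y)) / np.sum(t)

def percent_underestimates(t, y):
    # "% out of bound": share of estimates falling below the actual values.
    return 100.0 * np.mean(y < t)
```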

VII. COMPARISON OF SVR AND ALB-SVR

Comparing ALB-SVR to SVR allows us to look at the tradeoffs involved in using this technique.

A. Empirical Risk

By substituting the new loss function, ALB-SVR's empirical risk becomes:

$$R_{emp}^- = \frac{1}{n} \sum_{i=1}^{n} E_\varepsilon^-(t_i, y_i) \qquad (20)$$

The maximum additional empirical risk of ALB-SVR over SVR can be computed per training point as:

$$E_\varepsilon^-(t_i, y_i) - E_\varepsilon(t_i, y_i) = \begin{cases} y_i - t_i & 0 < y_i - t_i \le \varepsilon \\ \varepsilon & y_i - t_i > \varepsilon \end{cases} \qquad (21)$$

so the additional empirical risk is at most ε per training point; points on the tube side of the function incur the same loss under both models.

B. Number of Support Vectors (SVs) and Convergence

In SVR, support vectors (SVs) are those points which lie outside the epsilon tube. The smaller the value of ε, the more points lie outside the tube and hence the more SVs there are. In ALB-SVR, we have essentially cut the epsilon tube in half: we no longer have the lower epsilon bound. Therefore, for the same g and ε parameters, more points lie outside the tube and there is a larger number of SVs. This increase in the number of SVs indicates that using ALB-SVR has some negative effects on the complexity of the estimating function. However, as seen in Table III, the Gladiator data set did not show a significant increase in SVs. This may be due to the fact that the data set is relatively small. As can also be seen in Table III, the number of iterations was smaller in ALB-SVR, indicating that the algorithm converged faster; this may offset the larger number of SVs when using this approach.

For our ALB-SVR model, we used a grid search and heuristics to determine optimal meta parameters. We achieved the goal of limiting the underestimates to 2.81% for the Gladiator data set and 1.74% for the RAPL data set, as compared to 50.33% and 57.54% for SVR.

VIII. CONCLUSION AND FUTURE WORK

We have shown our novel approach, ALB-SVR, to be an effective technique to bound an estimation such that underestimates are greatly limited. This comes at the expense of accuracy, but is nevertheless helpful for applications which are highly sensitive to such mispredictions, such as power estimation. We tested our approach on two different power data sets and achieved accuracy rates of 5.72% and 5.06% relative percentage error while keeping the number of underestimates below 2.81% and 1.74%. Future work will include different data sets and techniques for more accurately selecting the meta parameters, as well as improving the percentage error.

ACKNOWLEDGEMENTS

This work is partly supported by MER, a partnership between Intel Corporation and King Abdul-Aziz City for Science and Technology (KACST) to conduct and promote research in the Middle East, and by the University Research Board at the American University of Beirut.

REFERENCES

[1] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[2] K. Seok, D. Cho, C. Hwang, and J. Shim, "Support vector quantile regression using asymmetric e-insensitive loss function," 2nd International Conference on Education Technology and Computer (ICETC), vol. 1, pp. V1-438-V1-439, June 2010.
[3] H. Schabe, "Bayes estimates under asymmetric loss," IEEE Transactions on Reliability, vol. 40, no. 1, pp. 63-67, Apr. 1991.
[4] J. G. Norstrom, "The use of precautionary loss functions in risk analysis," IEEE Transactions on Reliability, vol. 45, no. 3, pp. 400-403, Sep. 1996.
[5] J. Saketha Nath and C. Bhattacharyya, "Maximum Margin Classifiers with Specified False Positive and False Negative Error Rates," Proceedings of the SDM Conference, Minneapolis, 2007.
[6] Y.-J. Lee, W.-F. Hsieh, and C.-M. Huang, "ε-SSVR: a smooth support vector machine for ε-insensitive regression," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 5, pp. 678-685, May 2005.
[7] J.-N. Pan and J. Pan, "A Comparative Study of Various Loss Functions in the Economic Tolerance Design," IEEE International Conference on Management of Innovation and Technology, vol. 2, pp. 783-787, June 2006.
[8] Intel document, "Gladiator Telemetry Harness for Energy Efficient Computing," 2010.
[9] H. David, E. Gorbatov, U. Hanebutte, R. Khanna, and C. Le, "RAPL: Memory Power Estimation and Capping," International Symposium on Low Power Electronics and Design (ISLPED), pp. 14-15, August 2010.
[10] M. Stockman, M. Awad, R. Khanna, C. Le, H. David, E. Gorbatov, and U. Hanebutte, "A Novel Approach to Memory Power Estimation Using Machine Learning," International Conference on Energy Aware Computing (ICEAC), pp. 1-3, December 2010.
[11] C.-C. Chang and C.-J. Lin, LIBSVM: a library for support vector machines, 2001. http://www.csie.ntu.edu.tw/~cjlin/libsvm

TABLE III
COMPARATIVE RESULTS OF SVR VS. ALB-SVR

# | Data      | Model   | C+      | C- | ε   | RBF kernel param g | # SV | # Iter | % relative error | % out of bound
1 | Gladiator | SVR     | 512     | -  | 16  | 0.23               | 319  | 2723   | 1.72             | 50.33
2 | Gladiator | ALB-SVR | 32768   | 32 | 1   | 0.00039            | 320  | 417    | 5.72             | 2.81
3 | RAPL      | SVR     | 512     | -  | 706 | 0.10               | 2786 | 571649 | 1.82             | 57.54
4 | RAPL      | ALB-SVR | 1000000 | 10 | 706 | 0.2                | 4932 | 58220  | 5.06             | 1.74

(For SVR, which uses a single penalty C, the value is listed under C+.)


Figure 5. Power estimates for Gladiator Data with SVR (Model #1)

Figure 6. Power estimates for Gladiator Data with ALB-SVR (Model #2)


Figure 7. Power estimates for RAPL Data with SVR (Model #3)

Figure 8. Power estimates for RAPL Data with ALB-SVR (Model #4)