[ieee 2012 13th international conference on optimization of electrical and electronic equipment...

Load estimation for distribution systems using clustering techniques

Gheorghe Grigoraş, Florina Scarlatache, Gheorghe Cârţină “Gheorghe Asachi” Technical University of Iasi, Romania

E-mail: [email protected]

Abstract- In electrical distribution systems, apart from the usual measurements at substations, there is little information about the state of the network. As a result, there is at any moment a generalized uncertainty about the power demand conditions and therefore about the network loading, voltage level, power losses, etc. It is therefore important to establish accurately the load of the nodes of the distribution system. An alternative to settlement based on metered demand is load profile based settlement. In this approach, each node (distribution substation) without appropriate meters is assigned a representative load profile, so typical load profiles must be determined. In the first part of the paper, the K-means clustering method is used to determine the typical load profiles. In the second part, an improved approach to the load estimation of electric distribution systems, based on the obtained typical load profiles, is proposed. To demonstrate the effectiveness of load simulation of the distribution system based on the typical load profiles of the nodes, a practical example is presented.

I. INTRODUCTION

Estimation and forecasting are of great importance in management. “Proper management of a system is action based on prediction,” said W. Edwards Deming [1].

The estimation of the loads on different parts of the distribution system is one of the most important requirements for efficient operation of electric distribution systems, representing the basis for the system state estimation and for technical and economic calculations. Thus, load estimation, and especially peak load demand, influences various aspects of distribution system planning, such as transformer and conductor sizing, capacitor bank placement and so on. Planning engineers use load estimation to predict load shapes on the distribution system [2]. For this purpose, electric companies need accurate load data for the customers they supply. Several factors influence the customer’s load [3]:
• customer factor: type of consumption, electric heating, size of building etc.;
• time factor: time of day, day of week, time of year;
• climate factor: temperature, humidity etc.;
• other electric loads correlated to the target load;
• previous load values and load curve patterns.

For an electric customer, the behavior is represented by a load profile giving the electric power consumption for every period of time. The availability of such data depends on the type of customer. Generally, small customers (such as residential ones) are poorly described, since a communicating meter is too expensive relative to their consumption: for these customers there are only a few points of the curve every year. For larger customers, a communicating meter is often available for several reasons: the billing is done every month, the consumption is high and justifies the investment in a communicating meter, and a detailed record of consumption is necessary because prices depend on the period. The analyzed load curves may correspond to individual customer curves or to aggregates over an electric substation [4].

Distribution system planners traditionally approach load modeling by estimating maximum demand values in conjunction with coincidence factors at various network levels. Although this approach has been adequate, it has several shortfalls [5][6]:
• The typical load behavior during off-peak periods is unknown.
• There are inherent inaccuracies due to the use of diversity and coincidence factors, which must be guessed or linked to billing information, if available.
• Energy calculations, especially losses, are not very accurate.
• Voltages at various network positions are unknown.
• Load profile dependent initiatives such as Demand Side Management (DSM) cannot be accurately modeled and evaluated.

These problems can be solved more efficiently using load profiles associated with the nodes of electric distribution networks. The power required by consumers is represented by a time series, called a load profile. Today, the concept of "load profile" acquires a new dimension, due to modern approaches and broad prospects for effective use. Load profiles are here assumed to be described by the same number of points p (24 points for a daily curve measured hourly). Each profile is thus considered as a point in a p-dimensional Euclidean space. The Euclidean distance is used as a measure of dissimilarity between profiles. If this distance is not appropriate for the raw profiles, a preliminary step normalizes the profiles so that the Euclidean distance becomes appropriate [4].

The important advantages of performing load profile based studies, from the viewpoint of operating and planning electric distribution systems, are [5][6]:
• Accurate energy and energy loss calculations can be performed at any point in the network.
• Network loadings and voltages are known for all time intervals.
• Transformer tap settings can be optimized for both peak and off-peak periods.
• Line drop compensation can be accurately modeled.
• Profile dependent initiatives such as DSM can be accurately modeled and evaluated. The effect of Time of Use (TOU) tariffs can be simulated.

In the following, an improved approach to the load estimation of electric distribution systems, based on the typical load profiles of the nodes, is proposed. The typical load profiles of the nodes of the analyzed distribution system are obtained using the K-means clustering method.

978-1-4673-1653-8/12/$31.00 ©2012 IEEE

II. K-MEANS CLUSTERING METHOD

K-means clustering is an algorithm that classifies or groups objects, based on their attributes/features, into K groups (K is a positive integer). The grouping is done by minimizing the sum of squares of distances between the data and the corresponding cluster centroid [5], [6], [8]:

$$\min(E) = \min\left( \sum_{i=1}^{K} \sum_{x \in C_i} d(x, z_i) \right) \qquad (1)$$

where $z_i$ is the center of cluster $C_i$, while $d(x, z_i)$ is the Euclidean distance between a point $x$ and $z_i$.

Thus, the criterion function E attempts to minimize the distance of each point from the center of the cluster to which the point belongs. More specifically, the algorithm begins by initializing a set of K cluster centers. Then, it assigns each object of the dataset to the cluster whose center is the nearest, and recomputes the centers. The process continues until the centers of the clusters stop changing. The final clusters therefore depend on the initial cluster centers chosen and on the value of K. These aspects are illustrated in Fig. 1.

Fig. 1. Influence of the choice of the initial cluster number (panels a and b)

The steps of the algorithm are the following [7] - [10]:

• Step 1. Choose K initial cluster centres $z_1^{(0)}, z_2^{(0)}, \ldots, z_K^{(0)}$;

• Step 2. At the k-th iterative step, distribute the samples {x} among the K clusters using the relation:

$$x \in C_i^{(k)} \quad \text{if} \quad d\left(x, z_i^{(k)}\right) < d\left(x, z_j^{(k)}\right), \qquad i = 1, 2, \ldots, K; \; i \neq j \qquad (2)$$

where $C_i^{(k)}$ denotes the set of samples whose cluster centre is $z_i^{(k)}$.

• Step 3. Compute the new cluster centres $z_i^{(k+1)}$, i = 1, 2, …, K. The new cluster centre is given by:

$$z_i^{(k+1)} = \frac{1}{n_i} \sum_{x \in C_i^{(k)}} x, \qquad i = 1, 2, \ldots, K \qquad (3)$$

where $n_i$ is the number of objects in $C_i^{(k)}$.

• Step 4. Repeat steps 2 and 3 until convergence is achieved, that is until a pass through the training sample causes no new assignments.
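Steps 1 to 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names (`euclidean`, `k_means`), the random choice of initial centres among the samples, the iteration cap and the toy data are all hypothetical.

```python
import math
import random


def euclidean(x, z):
    """Euclidean distance d(x, z) between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))


def k_means(samples, k, seed=0, max_iter=100):
    """Cluster `samples` (a list of equal-length tuples) into k groups.

    Returns (centres, labels), where labels[n] is the cluster index of samples[n].
    """
    rng = random.Random(seed)
    # Step 1: choose K initial centres (assumption: K distinct samples at random).
    centres = [tuple(s) for s in rng.sample(samples, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each sample to the cluster with the nearest centre.
        new_labels = [
            min(range(k), key=lambda i: euclidean(x, centres[i])) for x in samples
        ]
        # Step 4: stop when a full pass causes no new assignments.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each centre as the mean of its members.
        for i in range(k):
            members = [x for x, lab in zip(samples, labels) if lab == i]
            if members:  # an empty cluster keeps its previous centre
                centres[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centres, labels


# Toy data: two well-separated groups of 2-D points (illustrative values only).
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centres, labels = k_means(data, k=2)
```

For load profiles, each sample would be a 24-value tuple instead of a 2-D point. Because the result depends on the initial centres, a practical run would typically repeat the procedure with several seeds.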

To determine the optimal number of clusters Kopt, the following algorithm can be used:

1. Determine the maximum number of clusters Kmax, which should satisfy the condition 2 ≤ Kmax ≤ n, where n is the number of clustered objects in the database.

2. Apply the K-means clustering method with a given K (2 ≤ K ≤ Kmax) to the set of objects in the database.

3. Evaluate the quality of the obtained partition. In this paper, this is done using the silhouette global coefficient.

4. Increase the number of clusters, up to Kmax, to see whether the K-means clustering method finds a better grouping of the data (that is, repeat steps 2 and 3).

5. Select the number of clusters (Kopt) that yields the best value of the silhouette global coefficient.

Evaluating and assessing the results of the K-means algorithm is the main subject of cluster validity. In cluster analysis, the following properties of clusters are examined: density, size and form of the clusters, separability of the clusters, and robustness of the classification. There are three main approaches to cluster validation [11] – [19]:

• external tests – the results of the classification of the input data are compared with the results of the classification of data not participating in the basic classification.

• internal tests – only the input data are used to evaluate the classification quality. They are used to validate individual clusters and the results of hierarchical and iterative classification.

• relative tests – several different classifications of one set of data, obtained with the same classification algorithm but different parameters, are compared.


Internal cluster validation tests are the more popular ones in the practice of cluster analysis. Among these, the test based on the Silhouette Global Index is one of the most widely used. It calculates the silhouette width for each sample, the average silhouette width for each cluster and the overall average silhouette width for the whole data set. With this approach, each cluster can be represented by a so-called silhouette, which is based on the comparison of its tightness and separation. The average silhouette width is used to evaluate the clustering validity and to decide the optimal number of clusters.

$$SC = \frac{1}{K} \sum_{j=1}^{K} S_j \qquad (4)$$

where $S_j$, the silhouette local coefficient, is defined as:

$$S_j = \frac{1}{n_j} \sum_{i=1}^{n_j} s_i \qquad (5)$$

and $s_i$, the silhouette width index for object i, is:

$$s_i = \frac{b_i - a_i}{\max\{a_i, b_i\}} \qquad (6)$$

$a_i$ – mean distance between object i and the objects of the same class j;

$b_i$ – minimum mean distance between object i and the objects in the class closest to class j.

In (6), if object i is the single object of a cluster, then the silhouette $s_i = 0$.
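The coefficients in (4) to (6) can be computed directly from a labelled data set. The sketch below is only an illustration: the function names and the toy data are hypothetical, the Euclidean distance is assumed, and at least two clusters are required.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def silhouette_widths(samples, labels):
    """Silhouette width s_i of every object (needs at least two clusters)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # single-object cluster: s_i = 0
            widths.append(0.0)
            continue
        # a_i: mean distance to the other objects of the same cluster
        a = sum(euclidean(samples[i], samples[j]) for j in own) / len(own)
        # b_i: smallest mean distance to the objects of any other cluster
        b = min(
            sum(euclidean(samples[i], samples[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def silhouette_global(samples, labels):
    """Global coefficient SC: mean over clusters of the per-cluster mean S_j."""
    widths = silhouette_widths(samples, labels)
    per_cluster = {}
    for s, lab in zip(widths, labels):
        per_cluster.setdefault(lab, []).append(s)
    local = [sum(v) / len(v) for v in per_cluster.values()]  # S_j for each cluster
    return sum(local) / len(local)


# Toy data (illustrative values only): a sensible and a scrambled partition.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
good = silhouette_global(data, [0, 0, 0, 1, 1, 1])
bad = silhouette_global(data, [0, 1, 0, 1, 0, 1])
```

On this toy set the sensible partition scores close to 1 and the scrambled one scores far lower, which is the behaviour the index relies on when comparing partitions obtained for different values of K.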

Reference [17] proposes the following interpretation of the SC coefficient:
• 0.71 – 1.0: a strong structure has been found;
• 0.51 – 0.7: a reasonable structure has been found;
• 0.26 – 0.5: the structure is weak and could be artificial;
• < 0.25: no substantial structure has been found.

Cluster validity checking is one of the most important issues in cluster analysis related to the inherent features of the data set under concern. It aims at the evaluation of clustering results and the selection of the scheme that best fits the underlying data.

III. DETERMINATION OF TYPICAL LOAD PROFILES

An approach to the determination of the daily load profiles of the nodes of electric distribution systems is presented. For this purpose, the K-means clustering method is applied to classify the profiles of the nodes into coherent groups – typical load profiles (TLP). The results demonstrate the ability of the proposed method to overcome problems concerning the formation of load profiles of the nodes of distribution systems.

By knowing the load profile of the nodes, electrical companies can simplify the demand determination for the supply zone of these nodes. Thus, they can devise better and more efficient marketing strategies. An alternative to settlement based on metered demand is load profile based settlement. In this approach, a representative load profile is assigned to each node of the electric distribution system. For that reason it is necessary to determine daily load profiles.

The load diagram of the nodes is reconstructed using the normalized load profile and their daily (or monthly or yearly, depending on the case) energy consumption. The sampling interval of the load curve data is 1 hour, so the typical load profile is represented by 24 load values over the day.

The shape of the load profiles is influenced by the type of node and also by the type of day or the season of the year. Because a large number of load profiles for various nodes creates unnecessary problems in handling them, they can be grouped into coherent groups, since some similarities exist between load profiles. For each coherent group a typical load profile, TLP, is determined. Further, a TLP is assigned to each node, using the steps depicted in Fig. 2 [6][20].

Fig. 2. Diagram of the typical load profile determination

The algorithm is based on the load profiling process. The major steps are:

1. Measurements: In this step a representative sample of the set of load profiles is identified, and the most relevant attributes to be measured and the cadence of data collection are defined. Finally, the collected data are gathered in a large database.

2. Data cleaning and pre-processing: In real problems like this one, which involve a large number of measurements spread over a large geographic area and collected during a considerable period of time, different kinds of problems affect the quality of the database. The most relevant and frequent are communication problems, outages, equipment failures and irregular, atypical behavior of some consumers. The result is a very large database with problems such as noise, missing values and outliers. These data (after being cleaned, pre-processed and reduced) are used as input to the clustering process [23][24].

3. Classification: The K-means method is used for this classification. For every node in the database, the normalized load profile is determined using a suitable normalizing factor (average power, peak power or energy over the surveyed period). Then, using a hierarchical clustering method, the normalized load profiles are refined so as to discard the unrepresentative profiles. The typical load profile for each class is obtained by averaging the values for each hour.

4. Assignation: Finally, a typical load profile is assigned to each node.

IV. LOAD ESTIMATION OF DISTRIBUTION SYSTEM

An improved way to perform load estimation of electric distribution systems is load simulation based on the typical load profiles of the nodes. The method is based on the following hypotheses presented in [25] - [27]:
• The mean load corresponding to a cluster of nodes of the distribution system, in any hour of the analyzed period, is approximately proportional to the energy consumption of those nodes;
• The loads for any hour of the analyzed period have a statistical distribution that can be regarded as normal.

Using these two facts, the load estimation of any electric distribution system, at any hour, is given by the following formula:

$$P_S^h = \sum_{i=1}^{N_C} n_i \, W_{med,i} \, p_i^h + \sqrt{\sum_{i=1}^{N_C} n_i \left( W_{med,i} \, \sigma_i^h \right)^2}, \quad [\mathrm{kW}] \qquad (7)$$

where:
$P_S^h$ - the load of the distribution system at the hour h, [kW];
$n_i$ - the number of nodes in cluster i;
$W_{med,i}$ - the average energy consumption of the nodes in cluster i, [kWh];
$p_i^h$ - the average load factor of the nodes in cluster i, [kW/kWh];
$\sigma_i^h$ - the enhancement load factor, [kW/kWh];
$N_C$ - the number of clusters corresponding to the nodes of the distribution system.

The hourly values of the factors $p_i^h$ and $\sigma_i^h$ are obtained from the typical load profiles of the different types of nodes. The typical load profiles of the nodes of the distribution system are obtained using the K-means clustering method.
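As an illustration of how formula (7) is evaluated for one hour, the sketch below uses the cluster sizes and mean energies from Table I and the hour-1 coefficients from Table II of the case study. Two points are assumptions rather than statements from the paper: the second term of (7) is read here as the square root of the sum of n_i (W_med,i σ_i)², and, since no σ values are listed in the text, only the first term is actually evaluated. The names `system_load`, `N_NODES`, `W_MED` and `P_HOUR1` are hypothetical.

```python
import math

# Cluster data from Table I: number of nodes n_i and mean daily energy W_med,i [kWh].
N_NODES = {"C1": 6, "C2": 14, "C3": 14}
W_MED = {"C1": 4218.08, "C2": 10529.69, "C3": 7097.68}

# Hour-1 coefficients p_i [kW/kWh] from Table II.
P_HOUR1 = {"C1": 0.042, "C2": 0.034, "C3": 0.034}


def system_load(n, w_med, p, sigma=None):
    """Estimate the system load [kW] at one hour from per-cluster data.

    First term: sum_i n_i * W_med,i * p_i.
    Second term (ASSUMED form): sqrt(sum_i n_i * (W_med,i * sigma_i)^2).
    The second term is skipped when sigma is None.
    """
    mean_term = sum(n[c] * w_med[c] * p[c] for c in n)
    if sigma is None:
        return mean_term
    spread_term = math.sqrt(sum(n[c] * (w_med[c] * sigma[c]) ** 2 for c in n))
    return mean_term + spread_term


# First term only, because the sigma values are not given in the text.
mean_only = system_load(N_NODES, W_MED, P_HOUR1)
print(round(mean_only, 2))
```

The first term alone gives roughly 9454 kW for hour 1, somewhat below the estimate of 9518.06 kW reported for that hour in Table III. The σ term and the rounding of the tabulated coefficients are not reproduced here.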

V. CASE STUDY

In the first step of the study, the active power profiles of the nodes of an electric distribution system in Romania were processed to obtain the typical load profiles. The distribution system has 34 nodes. The sampling period of the load profiles is 1 hour, so each load profile is represented by 24 load values over the day. The active power profiles of the considered nodes were normalized with respect to their daily energy, using the following formula:

$$p_i^h = \frac{P_i^h}{W_i}, \qquad h = 1, \ldots, 24; \; i = 1, \ldots, 34 \qquad (8)$$

where:
$p_i^h$ - the normalized values of the active power, [kW/kWh];
$P_i^h$ - the active power demanded by the ith node at hour h, [kW];
$W_i$ - the active energy consumed by the ith node, [kWh].

Normalization was made in relation to the daily energy because it is always known: it is registered by the measurement devices placed in each node. The average and maximum active power cannot be used as normalization factors because they are not known.
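The normalization in (8) and the hour-by-hour averaging that yields a typical load profile can be sketched as follows. The two 24-hour profiles are made-up values, not measured data, and the function names are hypothetical. The daily energy is computed here as the sum of the hourly powers, which assumes that each value is the mean power over its hour; in the paper, the daily energy is read from the meter.

```python
def normalize_profile(hourly_kw):
    """Divide each hourly load [kW] by the daily energy [kWh]."""
    # Assumption: with 1-hour sampling, the daily energy in kWh equals
    # the sum of the 24 hourly mean powers in kW.
    daily_energy = sum(hourly_kw)
    return [p / daily_energy for p in hourly_kw]


def typical_profile(normalized_profiles):
    """Average the normalized profiles of one cluster hour by hour."""
    count = len(normalized_profiles)
    return [sum(hour_values) / count for hour_values in zip(*normalized_profiles)]


# Two made-up 24-hour profiles [kW]; illustrative values only.
node_a = [40.0] * 8 + [80.0] * 8 + [60.0] * 8
node_b = [20.0] * 8 + [30.0] * 8 + [50.0] * 8

norm_a = normalize_profile(node_a)
norm_b = normalize_profile(node_b)
tlp = typical_profile([norm_a, norm_b])
```

Under this assumption every normalized profile sums to 1, so profiles of nodes with very different energy consumption become comparable in shape before clustering.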

Next, the algorithm for obtaining the optimal number of clusters was applied. First, the maximum number of clusters Kmax was calculated (Kmax = 6). Then, the K-means clustering method was applied to the set of normalized active power profiles for each K (2 ≤ K ≤ Kmax). Finally, the silhouette global coefficient was calculated to assess the partition quality. Because the silhouette global coefficient has its highest value for K = 3, this is the optimal solution of the clustering process (Fig. 3). For this solution, the silhouette plot is presented in Fig. 4.

Fig. 3. The silhouette global coefficient for different values of the number of clusters

Fig. 4. The silhouette plot for Kopt = 3

Table I indicates the characteristics of each obtained cluster. The largest clusters are C2 and C3, which together account for about 82 % of the active power profiles of the nodes. In terms of electric energy consumption, the highest mean value is registered for cluster C2 and the lowest for cluster C1.

TABLE I
THE RESULTS OF THE CLUSTERING PROCESS

Cluster   No. of nodes   (%)     Wmed (kWh)   WTotal (kWh)
C1        6              17.64   4218.08      330.39
C2        14             41.18   10529.69     865.12
C3        14             41.18   7097.68      631.10

After aggregation of the normalized active power profiles of each cluster, the typical load profiles were determined by averaging the values for each hour.

The typical load profiles corresponding to the three obtained clusters (C1, C2 and C3) are presented in Figs. 5 – 7 and Table II.

Fig. 5. Typical loading profile for C1 cluster (p in kW/kWh versus hour of day, 1 to 24)

Fig. 6. Typical loading profile for C2 cluster (p in kW/kWh versus hour of day, 1 to 24)

Fig. 7. Typical loading profile for C3 cluster (p in kW/kWh versus hour of day, 1 to 24)

The significance of the coefficients p is the following: they transform the energy consumed by the average member of the cluster into the average power demanded by it. These coefficients lead to the typical load profiles corresponding to the active powers.

TABLE II
THE HOURLY COEFFICIENTS CORRESPONDING TO THE TYPICAL LOAD PROFILES C1, C2 AND C3

Hour   pC1 (kW/kWh)   pC2 (kW/kWh)   pC3 (kW/kWh)
1      0.042          0.034          0.034
2      0.037          0.029          0.030
3      0.031          0.026          0.027
4      0.029          0.024          0.026
5      0.029          0.023          0.025
6      0.029          0.023          0.025
7      0.029          0.026          0.027
8      0.029          0.032          0.033
9      0.029          0.039          0.040
10     0.030          0.042          0.042
11     0.033          0.046          0.045
12     0.037          0.049          0.049
13     0.041          0.052          0.051
14     0.047          0.056          0.054
15     0.050          0.058          0.056
16     0.050          0.058          0.055
17     0.051          0.057          0.054
18     0.051          0.053          0.051
19     0.053          0.050          0.049
20     0.053          0.047          0.046
21     0.057          0.048          0.048
22     0.060          0.046          0.048
23     0.054          0.043          0.045
24     0.048          0.040          0.040

In the second step of the study, using the information from Tables I and II and relation (7), the hourly load of the analyzed distribution system was obtained.

Table III and Fig. 8 present the real and estimated hourly values of the load of the analyzed distribution system.

TABLE III
THE REAL AND ESTIMATED HOURLY VALUES OF THE LOAD CORRESPONDING TO THE DISTRIBUTION SYSTEM

Hour    Preal (kW)   Pest (kW)    |Preal - Pest|·100 / Preal (%)
1       9543.20      9518.06      0.26
2       8304.30      8255.41      0.59
3       7432.00      7458.48      0.36
4       6897.50      6935.51      0.55
5       6669.60      6677.66      0.12
6       6703.50      6727.30      0.36
7       7273.10      7343.84      0.97
8       8824.00      9155.61      3.76
9       10314.90     10876.01     5.44
10      11019.80     11608.21     5.34
11      11982.20     12609.39     5.23
12      12933.70     13594.56     5.11
13      13662.10     14300.07     4.67
14      14702.00     15278.61     3.92
15      15345.20     15907.99     3.67
16      15234.50     15793.66     3.67
17      14969.20     15470.06     3.35
18      14227.60     14598.58     2.61
19      13491.50     13710.13     1.62
20      12928.30     13037.30     0.84
21      13264.40     13328.44     0.48
22      13047.60     12991.86     0.43
23      12222.80     12204.47     0.15
24      11098.60     11118.06     0.18
Total   272091.60    278499.28    2.35

Fig. 8. The real and estimated load profiles of the analyzed distribution system (Ps in kW versus hour of day, 1 to 24; series: real profile and estimated profile)

Table III shows that most of the hourly estimation errors are below 5 % (20 of the 24 values, approximately 83 %) and that the overall estimation error, computed on the daily totals, is 2.35 %. Hourly estimation errors above 5 % are recorded in the interval between hours 9 and 12.
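The error figures quoted from Table III can be rechecked with a few lines of Python. The two lists are copied from the table; the variable names are hypothetical, and the mean of the hourly errors is computed only for comparison with the error on the totals.

```python
# Real and estimated hourly loads [kW], copied from Table III (hours 1 to 24).
P_REAL = [
    9543.20, 8304.30, 7432.00, 6897.50, 6669.60, 6703.50, 7273.10, 8824.00,
    10314.90, 11019.80, 11982.20, 12933.70, 13662.10, 14702.00, 15345.20,
    15234.50, 14969.20, 14227.60, 13491.50, 12928.30, 13264.40, 13047.60,
    12222.80, 11098.60,
]
P_EST = [
    9518.06, 8255.41, 7458.48, 6935.51, 6677.66, 6727.30, 7343.84, 9155.61,
    10876.01, 11608.21, 12609.39, 13594.56, 14300.07, 15278.61, 15907.99,
    15793.66, 15470.06, 14598.58, 13710.13, 13037.30, 13328.44, 12991.86,
    12204.47, 11118.06,
]

# Hourly relative error in percent: |Preal - Pest| * 100 / Preal.
hourly_err = [abs(r - e) * 100.0 / r for r, e in zip(P_REAL, P_EST)]

# Share of hours with an error below 5 % and the hours that exceed it.
share_below_5 = sum(1 for err in hourly_err if err < 5.0) / len(hourly_err)
hours_above_5 = [h for h, err in enumerate(hourly_err, start=1) if err > 5.0]

# Error on the daily totals (the "Total" row) and mean of the hourly errors.
total_err = abs(sum(P_REAL) - sum(P_EST)) * 100.0 / sum(P_REAL)
mean_hourly_err = sum(hourly_err) / len(hourly_err)
```

With these data the error on the daily totals and the mean of the 24 hourly errors are two different quantities; the 2.35 % in the Total row corresponds to the former.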

VI. CONCLUSIONS

The estimation of the loads on different parts of the distribution system is one of the most important requirements for efficient operation of electric distribution systems, representing the basis for the system state estimation and for technical and economic calculations.

In the paper, an improved approach for the load estimation of the electric distribution systems, based on the typical load profiles of the nodes was proposed. The typical load profiles of the nodes from analyzed distribution system are obtained using the K-means clustering method.

A comparison of the obtained results with the real registered data indicates that the overall estimation error is 2.35 %.

REFERENCES

[1] A. Leca, Energetic Management Principles (in Romanian), Ed. Tehnică, Bucharest, Romania, 1997.

[2] R.P. Broadwater, Al. Sargent, A. Yarali, H. Shaalan, and J. Nazarko, Estimating Substation Peaks from Load Research Data, IEEE Trans. on Power Delivery, vol. 12, pp. 451-456, 1997.

[3] A. Seppälä, Load research and load estimation in electricity distribution, [Online]. Available: www.enease.fi/asepthes.pdf.

[4] G. Hebrail, Practical Data Mining in Large Utility Company, [Online]. Available: http://www.imfres.enst.fr/~hebrail/publications/hdr/Compstat_2001.pdf.

[5] C. G. Carter-Brown, Load Profile Modeling for Integrated Energy Planning, in Proc. of Domestic Use of Electrical Energy Conference, 1999, pp. 13 - 18.

[6] G. Cartina, G. Grigoraş, and E.C. Bobric, Clustering Techniques in Fuzzy Modeling. Power System Applications (in Romanian), VENUS Publishing House, Iaşi, Romania, 2005.

[7] S. Ray, and R.H. Turi, “Determination of Number of Clusters in K-Means Clustering and Application in Colour Image Segmentation”, Proc. of the 4th International Conference on Advances in Pattern Recognition and Digital Techniques, Calcutta, India, pp. 137-143, 1999.

[8] I. Yatskiv, and L. Gusarova, “The Methods of Cluster Analysis Results Validation”, Proc. of International Conference RelStat’04, 2004, pp. 75 - 80.

[9] G. Grigoras, G. Cartina, M. Istrate, Fl. Rotaru, The Efficiency of the Clustering Techniques in the Energy Losses Evaluation from Distribution Networks, International Journal of Mathematical Models and Methods in Applied Sciences, vol. 5, pp. 133-140, 2011.

[10] N.R. Pal, and J.C. Bezdek, “On Clustering Validity for the Fuzzy C-mean model”, IEEE Trans. Fuzzy Systems, vol. 3, pp. 370 – 379, 1995.

[11] A.D. Gordon, Classification, 2nd ed., Chapman & Hall, New York, USA, 1999.

[12] M.R. Rezaee, B.P.F. Lelieveldt, and J.H.C. Reiber, “A New Cluster Validity Index for the Fuzzy C-means”, Pattern Recognition Letters, vol. 19, pp. 237 – 246, 1998.

[13] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, “On Clustering Validation Techniques”, Journal of Intelligent Information Systems, vol. 17, no. 2/3, pp. 107 – 145, 2001.

[14] P. Berkhin, Survey of clustering data mining techniques. Technical report, Accrue Software, San Jose, California, [Online]. Available: http://citeseer.nj.nec.com/berkhin02survey.html.

[15] M. Holgersson, “The limited value of cophenetic correlation as a clustering criterion”, Pattern Recognition, vol. 10, no. 4, pp. 287 – 295, 1978.

[16] The Matlab website. Matlab and Simulink for Technical Computing, [Online]. Available: http://www.mathworks.com/help/toolbox/stats/cophenet.html.

[17] P. J. Rousseeuw, “Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis”, Journal of Computational and Applied Mathematics, vol. 20, pp. 53 – 65, 1987.

[18] A.K. Jain, M.N. Murty, and P.J. Flynn, “Data Clustering: A Review”, ACM Computing Surveys, vol. 31, no. 3, pp. 264 – 323, Sept. 1999.

[19] SAS Institute Inc, JMP Statistics and Graphics Guide. Version 3, Cary, NC, USA, 1995.

[20] S. Gasperic, D. Gerbek, and F. Gubina, Determination of the Consumers’ Load Profiles. [Online]. Available: www.telmark.org/2002Sep/2-5_Gasperic.pdf.

[21] G. Chicco, R. Napoli, F. Piglione, P. Postolache, M. Scutariu, and C. Toader, (2002) A Review of Concepts and Techniques for Emergent Customer Categorization. [Online]. Available: www.telmark.org/2002Sep/2-4_Chicco.pdf.

[22] V. Miranda, J. Pereira, and J. Saraiva, “Load Allocation in DMS with a Fuzzy State Estimator”, IEEE Transactions on Power Systems, vol. 15, no. 2, pp. 529 – 534, 2000.

[23] L. R. Garcia-Escudero, A. Gordaliza, “A proposal for robust curve clustering”, Journal of Classification, vol. 22, no. 2, pp. 185 – 201, 2005.

[24] G. Cartina, G. Grigoras, E.C. Bobric, Robust Load Models for Customer’s Consumption, Proc. of 2nd International Conference on Modern Power Systems, Cluj, Romania, pp. 121 – 124, 2008.

[25] British Electricity Boards, “Report on the Design of Low Voltage Underground Networks for New Housing”, ACE Report No. 105, 1986.

[26] British Electricity Boards, “Report on the Computer Program DEBUTE for the Design of LV Radial Networks. Part 1 - General Considerations”, Part 2 – Program User Guide, Report No. 115, 1988.

[27] A. R. Hileman, Probability and Statistics for Power Systems Engineers, Westinghouse Electric Corporation, Pittsburgh, 1986.
