short-term load forecasting based on big data technology · distribution transformer load data of...

LOGO

ShortShort--term load term load forecasting based on big forecasting based on big

data technologydata technology

School of Electrical EngineeringBeijing Jiaotong University

Dr. Jinghan He

1

OutlineOutline

1 Background

2 Solutions

3 Case study

2

4 Conclusion

Overview

Short term load forecasting refers to forecasting load curve of coming several hours ,one day or several days ahead. It is not only ensure power system to operate safely and economically, and alsolay the foundation of formulating electricity dispatching schedule and transactions.

Short-term load forecasting accuracy has been listed as an important assessment indicator in Grid Corporation. With the development of power market, the requirement of forecasting accuracy, timeliness, reliability and intelligence become higher.

3

Current Situation

•At present, short-term load forecasting mainly targets the total load, or a deeper level substation bus load, then obtained the total load by accumulating all the bus load.

Object

•The core of power load forecasting is algorithm. Most of the current methods can be divided into classical methods, traditional forecasting methods and intelligent forecasting methods.

Method

4

Forecasting algorithms

The research of short-term load forecasting has a very long history. Many scholars have put forward different algorithms. The table below shows some of their advantages and weakness.

The research of short-term load forecasting has a very long history. Many scholars have put forward different algorithms. The table below shows some of their advantages and weakness.

Advantages & WeaknessForecasting algorithms

Advantages： Simple principle ， Fast calculationWeakness：Weather and other factors are not considered

Regression analysis . etc

Artificial Neural Network . etc

Classical Method

Traditional Method

Intelligent Method

Exponential Smoothing, Time series .etc

Advantages： Easy ApplicationWeakness：Difficult to find Regression parameters

Advantages： Able to handle non-linear and stochastic problemWeakness：Model training need much of time ; The existence of local minimum points, etc.

5

IssuesIssues

6

The existing algorithms ordinarily treat all users load as a whole namely system load. But this kind of method must ignore the individual load properties.

The existing algorithms ordinarily treat all users load as a whole namely system load. But this kind of method must ignore the individual load properties.

OutlineOutline

1 Background

2 Solutions

3 Case study

7

4 Conclusion

Methodology

Massive user load dataForecasting Forecasting

modelmodel

Research Research objectobject

Normal approach: System load

New approach: User load

User electricity User electricity behaviorbehavior Different users Different

electricity law

Different forecasting method

Research object

Forecasting model

Relationship between

influential factors and load

Proper model for

Each kind of load type

Data Mining

8

SolutionSolution FrameworkFramework

2 c

9

Step 1:Improved Hierarchical Cluster Technique

STEP 1STEP 1

Cluster 1 Cluster2EXAMPLE

Classify daily load curves of each user into several types using improved hierarchical cluster technique.

Classify daily load curves of each user into several types using improved hierarchical cluster technique.

Selection of proper distance algorithm is the key for clustering technique:

21 212

1 max

( )n

k k

k

x xdx

10

Step 2:Grey Relation Analysis

For different users, critical influential factors are not the same. For example, temperature could be a major influential factor for residential loads. But temperature may have little impact on industrial loads.

Grey association analysis is applied to determine the critical influential factors for each individual load.

For different users, critical influential factors are not the same. For example, temperature could be a major influential factor for residential loads. But temperature may have little impact on industrial loads.

Grey association analysis is applied to determine the critical influential factors for each individual load.

STEP 2STEP 2

Data process Calculate correlation Date process

11

STEP 3STEP 3

Step 3:Establish Cart Decision Tree

Based on cluster analysis and association analysis results, a decision tree is developed to establish relationship between clustering results and critical factors.

Based on cluster analysis and association analysis results, a decision tree is developed to establish relationship between clustering results and critical factors.

As shown in the figure :Input : Critical influential factors determined by association analysis;The load patterns determined by hierarchical clustering analysis are used as leave node labels.Output:

Decision tree = classification rules.

12

Step 4:Choose Appropriate SVM Parameters

• For each type of load patterns, corresponding data set is chosen (cluster results) to train SVM model in order to determine SVM parameter shown in the figure below ;

• Therefore, a set of SVM models are developed for each load. Once the forecasting day’s load pattern is determined according to the decision tree, the corresponding SVM forecasting model with appropriate parameters is used.

• For each type of load patterns, corresponding data set is chosen (cluster results) to train SVM model in order to determine SVM parameter shown in the figure below ;

• Therefore, a set of SVM models are developed for each load. Once the forecasting day’s load pattern is determined according to the decision tree, the corresponding SVM forecasting model with appropriate parameters is used.

STEP 4STEP 4

EXAMPLE

Target day:2013/1/2

Target User: #123

STEP 1 M load patterns

M load data set

STEP 2Critical influential

factors

STEP 3

Classification rules

M SVM model parameters

Use M load data set as training set

Historical load data of #123

M SVM models

Input factor value

Max Temp 21℃

Ave Humid 54.3%Ave Temp 12.75℃Day Type Saturday

Forecasting value2013/1/2 Classification rules

Select one model

Finish Forecasting one user

STEP 5STEP 5

Step 5:Forecasting Total System Load

• The total system load is forecasted based on aggregation of individual load’s forecasting results as shown in Figure below;

• Repeat the four steps for each user, then aggregate all of them.

• The total system load is forecasted based on aggregation of individual load’s forecasting results as shown in Figure below;

• Repeat the four steps for each user, then aggregate all of them.

(i)1

n

total loss useri

L L l

The forecasted total load can be calculated by adding all of the forecasted individual load together with line loss .

The forecasted total load can be calculated by adding all of the forecasted individual load together with line loss .

14

OutlineOutline

1 Background

2 Solutions

3 Case study

15

4 Conclusion

Case StudyCase Study

This area has 16 220(kV) substations. We use one distribution transformer load data of the yellow area to demonstrate our method.

The distribution map of main substations in research area

16

Historical load data:2012/1/1~2012/12/31Forecasting :2013/1/25~2013/1/31

Historical load data:2012/1/1~2012/12/31Forecasting :2013/1/25~2013/1/31

Hierarchical cluster resultsHierarchical cluster results

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 240

2

4

6

8

10

12

14

Time/h

Load

/Mw

All daily curves from 1/1/2012 to 20/11/2012

Daily load curves in 2012Cluster results of 2012

• The historical daily load curves in 2012 are classified into 6 clusters；• Each color in the right page represents a cluster; • The historical daily load curves in 2012 are classified into 6 clusters；• Each color in the right page represents a cluster;

23 26 27 25 24 22275 95122 21 1365 28276 71121 94134274358 2 29123 72 96183 8113232 85 14175351344323 15 88106 19 57302 36155 43295330 78246337 20 64182316218364221 3 30 16 89114 73 4 31 32 77 37124 5 63 11 17 10109 62100 18 33110120 44 79107296 74 97176 9101 13 41 76 90273205105 45115 75 83119357 48 58 7211 50260 61108 99288 47149329 49 52 84 91 92139 12 87 93204197239 6129 42145 39118 86141 53233206 70 34 35136 40356111137225 51153 69102164126285 56127 67112140253 55117152104138150169161144130360362 82159 98309 59297128241279 60103267151245270271269325353272361181254268207338 46168263342343259321262312216346228249158234278236188193220194258347229264265292293320318322192195201210306199305131165282341308132143167244177291186311319332304191237256284301328336212310352238290283255327348349230251261345 54 81116202257174339324335214163224326 66133315 68171314178209226 80160286142125154289148280170185208298146252266287189340350248299162300196294355187333313334147231190172173213235215227242281166203217354179243363223180250157200307 38156240198303 651841353312193172472222773590

10

20

30

40

50

60

70

80

90

17

C1

C2

C3 C4 C5 C6


1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 240

2

4

6

8

10

12

14

Time/h

Load

/Mw

Cluster 1

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 240

2

4

6

8

10

12

14

Time/h

Load

/Mw

Cluster 2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 240

2

4

6

8

10

12

14

Time/h

Load

/Mw

Cluster 3

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 240

2

4

6

8

10

12

14

Time/h

Load

/Mw

Cluster 4

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 240

2

4

6

8

10

12

14

Time/h

Load

/Mw

Cluster 5

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 240

2

4

6

8

10

12

14

Time/h

Load

/Mw

Cluster 6

Cluster1 Cluster2

Cluster3 Cluster4

Cluster5 Cluster6

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 240

2

4

6

8

10

12

14

Time/h

Load

/Mw

All daily curves from 1/1/2012 to 20/11/2012

Cluster results

All daily load curves in 2012

18


Day type C1 C2 C3 C4 C5 C6

Monday 4 0 14 27 12 3

Tuesday 0 0 24 33 2 4

Wednesday 2 2 34 18 2 2

Thursday 2 1 31 20 2 2

Friday 0 1 39 21 0 1

Saturday 0 21 6 3 1 2

Sunday 2 22 4 2 0 7

Typical day type in 6 clustersThe sixth cluster：Special holiday New Year’s Day (30/12/2012,1/1/2012), Spring Festival (21/1/2012~28/1/2012), Qingming Festival(3/4/2012~4/4/2012), Labor Day(1/5/2012), National Day(30/9/2012~2/10/2012).

In the table，3rd cluster and 4th cluster are mainly workday. 2nd cluster are mainly weekends,. 5th cluster are mainly Monday.

Further analysis, 6th cluster are mainly special holiday, and the 1st cluster are mainly the day after special holiday;

Conclusion：cluster results can reflect that day type have strong influence on load curve shape.

In the table，3rd cluster and 4th cluster are mainly workday. 2nd cluster are mainly weekends,. 5th cluster are mainly Monday.

Further analysis, 6th cluster are mainly special holiday, and the 1st cluster are mainly the day after special holiday;

Conclusion：cluster results can reflect that day type have strong influence on load curve shape.

The first cluster：The day after special holiday:

(2/1/2012), (29/1/2012), (5/4/2012), (2/5/2012),

19

Grey association analysisGrey association analysis

Use daily load data of Jiute distribution substation and local weather information as inputs. Output the grey correlation of influential factors like below:

Use daily load data of Jiute distribution substation and local weather information as inputs. Output the grey correlation of influential factors like below:

Max temp Ave humidity

Ave temp day type wind velocity

Average precipitation

0.7512 0.6588 0.5346 0.4472 0.2214 0.2137

Gray Correlation of 6 influential factors:

• Sequence theses 6 factors ， Excluding wind velocity and average precipitation which have weak correlation;

• Conclusion：The results can reflect the correlation between different factors and load. These four critical influential factors are inputs of CART decision tree.

• Sequence theses 6 factors ， Excluding wind velocity and average precipitation which have weak correlation;

• Conclusion：The results can reflect the correlation between different factors and load. These four critical influential factors are inputs of CART decision tree.

20

Establishment of classification rulesEstablishment of classification rules

21

Set 6 cluster label as decision tree leaf node. According to the grey correlation, set four critical factors as attribute set

to split. CART decision tree algorithm is used to build the classification rules.

Set 6 cluster label as decision tree leaf node. According to the grey correlation, set four critical factors as attribute set

to split. CART decision tree algorithm is used to build the classification rules.

27/1/2013Input factor value

Max Temp 21℃

Ave Humid 54.3%Ave Temp 12.75℃Day Type Saturday

Cluster：2

Date Classification results

Date Classification results

25/1/2013 Cluster 3 29/1/2013 Cluster 5

26/1/2013 Cluster 4 30/1/2013 Cluster 1

27/1/2013 Cluster 2 31/1/2013 Cluster 6

28/1/2013 Cluster 1

Classification results of the forecasting daysClassification results of the forecasting days

Forecasting modelForecasting model

Forecasting model is SVM model：1.Use 6 cluster load data set to train 6 corresponding SVM models;2.Accoding to the classification results of forecasting days , find the

corresponding SVM model to finish forecasting;3.Add up all distribution substation forecasting results to get the total

load;

Forecasting model is SVM model：1.Use 6 cluster load data set to train 6 corresponding SVM models;2.Accoding to the classification results of forecasting days , find the

corresponding SVM model to finish forecasting;3.Add up all distribution substation forecasting results to get the total

load;

Cluster 1 0.001 20 110

Cluster2 0.0015 37 601

Cluster3 0.0071 87 115

Cluster4 0.0032 20 341

Cluster5 0.0016 28 200

Cluster6 0.0092 32 123

2 c where ，， is main factors of SVM model.

The parameter combination in left is find by Genetic Algorithms.

2 cParameter combination for each clusterParameter combination for each cluster

22

Implemented on Hadoop PlatformImplemented on Hadoop Platform

23

The former four steps are just one distribution transformer’s forecasting process. To get system load forecasting result. We need to repeat this for all distribution transformers and then add them up by big data technology.

The former four steps are just one distribution transformer’s forecasting process. To get system load forecasting result. We need to repeat this for all distribution transformers and then add them up by big data technology.

Our short-term forecasting framework was implemented on Hadoop platform using four PC servers. Each PC server has two E5-2630V2 CPU and 500G memory. The table below shows the computation time.

Tasks Size Completion Time

Cluster(offline) 1.2 million consumers (1 yrs) 24 mins

Forecasting(online) 1.2 million consumers (daily) 110 secs

Capability of data

processing

Forecasting resultsForecasting resultsJiute distribution substation load Total System load

Max relative error：5.20%Min relative error：0.39%Average relative error：1.61%

Max relative error：1.35%Min relative error ：0.07%Average relative error ：0.57%

24

0 5 10 15 200

5

10

Time/h

Load

/MW

29/4/2013 Monday

Oringinal Load New Approach Traditional Approach

0 5 10 15 200

5

10

Time/h

Rel

ativ

e E

rror/%

New Approach Traditional Approach

0 5 10 15 201600

1800

2000

2200

2400

Time/h

Load

/MW

29/4/2013 Monday

Oringinal Load New Approach Traditional Approach

0 5 10 15 200

2

4

Time/h

Rel

ativ

e E

rror/%

New Approach Traditional Approach

Add all distribution

transformers’forecasting

results

The traditional approach which use SVM model and only focus on system load: Max relative error: 3.36% Min relative error:0.51% Ave relative error:1.68%.

It means the proposed method is better than traditional approach.

The traditional approach which use SVM model and only focus on system load: Max relative error: 3.36% Min relative error:0.51% Ave relative error:1.68%.

It means the proposed method is better than traditional approach.

OutlineOutline

1 Background

2 Solutions

3 Case study

25

4 Conclusion

ConclusionConclusion

26

• The proposed new framework of performing short-term load forecasting using smart meter data based on big data technologies.

• The proposed new framework of performing short-term load forecasting using smart meter data based on big data technologies.

Model selection

Step4:

Each load patterns

One SVM Model

model parameter selection

The optimal parameter combination

Decision trees

Step3:

Classification rules

Association analysis

Critical influential factors

Step2:

Rainfall

HumidityTemperature

Rainfall

Cluster

Different load patterns

Step1:

Forecast

Step5:

Forcasting results

Each individual user’s load

System load data

Add up

Step 1to 4 are carried out offline. In distribution EMS real-time application, once

the date, forecasting weather data are available, the decision tree can find out which cluster the individual load belongs to.

Then the appropriate forecasting model is selected to forecast individual load.

Finally, the system load is forecasted. Case studies using a practical distribution system

smart meter data indicate that the proposed forecasting method can not only guarantee prediction accuracy within the predefined range but also help distribution system operators gain better understanding of which individual load causes forecasting error.

Conclusion:

Thank You!Thank You!

27

short-term load forecasting based on big data technology · distribution transformer load data of...

Documents