short-term load forecasting based on big data technology · distribution transformer load data of...
TRANSCRIPT
LOGO
ShortShort--term load term load forecasting based on big forecasting based on big
data technologydata technology
School of Electrical EngineeringBeijing Jiaotong University
Dr. Jinghan He
1
OutlineOutline
1 Background
2 Solutions
3 Case study
2
4 Conclusion
Overview
Short term load forecasting refers to forecasting load curve of coming several hours ,one day or several days ahead. It is not only ensure power system to operate safely and economically, and alsolay the foundation of formulating electricity dispatching schedule and transactions.
Short-term load forecasting accuracy has been listed as an important assessment indicator in Grid Corporation. With the development of power market, the requirement of forecasting accuracy, timeliness, reliability and intelligence become higher.
3
Current Situation
•At present, short-term load forecasting mainly targets the total load, or a deeper level substation bus load, then obtained the total load by accumulating all the bus load.
Object
•The core of power load forecasting is algorithm. Most of the current methods can be divided into classical methods, traditional forecasting methods and intelligent forecasting methods.
Method
4
Forecasting algorithms
The research of short-term load forecasting has a very long history. Many scholars have put forward different algorithms. The table below shows some of their advantages and weakness.
The research of short-term load forecasting has a very long history. Many scholars have put forward different algorithms. The table below shows some of their advantages and weakness.
Advantages & WeaknessForecasting algorithms
Advantages: Simple principle , Fast calculationWeakness:Weather and other factors are not considered
Regression analysis . etc
Artificial Neural Network . etc
Classical Method
Traditional Method
Intelligent Method
Exponential Smoothing, Time series .etc
Advantages: Easy ApplicationWeakness:Difficult to find Regression parameters
Advantages: Able to handle non-linear and stochastic problemWeakness:Model training need much of time ; The existence of local minimum points, etc.
5
IssuesIssues
6
The existing algorithms ordinarily treat all users load as a whole namely system load. But this kind of method must ignore the individual load properties.
The existing algorithms ordinarily treat all users load as a whole namely system load. But this kind of method must ignore the individual load properties.
OutlineOutline
1 Background
2 Solutions
3 Case study
7
4 Conclusion
Methodology
Massive user load dataForecasting Forecasting
modelmodel
Research Research objectobject
Normal approach: System load
New approach: User load
User electricity User electricity behaviorbehavior Different users Different
electricity law
Different forecasting method
Research object
Forecasting model
Relationship between
influential factors and load
Proper model for
Each kind of load type
Data Mining
8
SolutionSolution FrameworkFramework
2 c
9
Step 1:Improved Hierarchical Cluster Technique
STEP 1STEP 1
Cluster 1 Cluster2EXAMPLE
Classify daily load curves of each user into several types using improved hierarchical cluster technique.
Classify daily load curves of each user into several types using improved hierarchical cluster technique.
Selection of proper distance algorithm is the key for clustering technique:
21 212
1 max
( )n
k k
k
x xdx
10
Step 2:Grey Relation Analysis
For different users, critical influential factors are not the same. For example, temperature could be a major influential factor for residential loads. But temperature may have little impact on industrial loads.
Grey association analysis is applied to determine the critical influential factors for each individual load.
For different users, critical influential factors are not the same. For example, temperature could be a major influential factor for residential loads. But temperature may have little impact on industrial loads.
Grey association analysis is applied to determine the critical influential factors for each individual load.
STEP 2STEP 2
Data process Calculate correlation Date process
11
STEP 3STEP 3
Step 3:Establish Cart Decision Tree
Based on cluster analysis and association analysis results, a decision tree is developed to establish relationship between clustering results and critical factors.
Based on cluster analysis and association analysis results, a decision tree is developed to establish relationship between clustering results and critical factors.
As shown in the figure :Input : Critical influential factors determined by association analysis;The load patterns determined by hierarchical clustering analysis are used as leave node labels.Output:
Decision tree = classification rules.
12
Step 4:Choose Appropriate SVM Parameters
• For each type of load patterns, corresponding data set is chosen (cluster results) to train SVM model in order to determine SVM parameter shown in the figure below ;
• Therefore, a set of SVM models are developed for each load. Once the forecasting day’s load pattern is determined according to the decision tree, the corresponding SVM forecasting model with appropriate parameters is used.
• For each type of load patterns, corresponding data set is chosen (cluster results) to train SVM model in order to determine SVM parameter shown in the figure below ;
• Therefore, a set of SVM models are developed for each load. Once the forecasting day’s load pattern is determined according to the decision tree, the corresponding SVM forecasting model with appropriate parameters is used.
STEP 4STEP 4
EXAMPLE
Target day:2013/1/2
Target User: #123
STEP 1 M load patterns
M load data set
STEP 2Critical influential
factors
STEP 3
Classification rules
M SVM model parameters
Use M load data set as training set
Historical load data of #123
M SVM models
Input factor value
Max Temp 21℃
Ave Humid 54.3%Ave Temp 12.75℃Day Type Saturday
Forecasting value2013/1/2 Classification rules
Select one model
Finish Forecasting one user
STEP 5STEP 5
Step 5:Forecasting Total System Load
• The total system load is forecasted based on aggregation of individual load’s forecasting results as shown in Figure below;
• Repeat the four steps for each user, then aggregate all of them.
• The total system load is forecasted based on aggregation of individual load’s forecasting results as shown in Figure below;
• Repeat the four steps for each user, then aggregate all of them.
(i)1
n
total loss useri
L L l
The forecasted total load can be calculated by adding all of the forecasted individual load together with line loss .
The forecasted total load can be calculated by adding all of the forecasted individual load together with line loss .
14
OutlineOutline
1 Background
2 Solutions
3 Case study
15
4 Conclusion
Case StudyCase Study
This area has 16 220(kV) substations. We use one distribution transformer load data of the yellow area to demonstrate our method.
The distribution map of main substations in research area
16
Historical load data:2012/1/1~2012/12/31Forecasting :2013/1/25~2013/1/31
Historical load data:2012/1/1~2012/12/31Forecasting :2013/1/25~2013/1/31
Hierarchical cluster resultsHierarchical cluster results
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 240
2
4
6
8
10
12
14
Time/h
Load
/Mw
All daily curves from 1/1/2012 to 20/11/2012
Daily load curves in 2012Cluster results of 2012
• The historical daily load curves in 2012 are classified into 6 clusters;• Each color in the right page represents a cluster; • The historical daily load curves in 2012 are classified into 6 clusters;• Each color in the right page represents a cluster;
23 26 27 25 24 22275 95122 21 1365 28276 71121 94134274358 2 29123 72 96183 8113232 85 14175351344323 15 88106 19 57302 36155 43295330 78246337 20 64182316218364221 3 30 16 89114 73 4 31 32 77 37124 5 63 11 17 10109 62100 18 33110120 44 79107296 74 97176 9101 13 41 76 90273205105 45115 75 83119357 48 58 7211 50260 61108 99288 47149329 49 52 84 91 92139 12 87 93204197239 6129 42145 39118 86141 53233206 70 34 35136 40356111137225 51153 69102164126285 56127 67112140253 55117152104138150169161144130360362 82159 98309 59297128241279 60103267151245270271269325353272361181254268207338 46168263342343259321262312216346228249158234278236188193220194258347229264265292293320318322192195201210306199305131165282341308132143167244177291186311319332304191237256284301328336212310352238290283255327348349230251261345 54 81116202257174339324335214163224326 66133315 68171314178209226 80160286142125154289148280170185208298146252266287189340350248299162300196294355187333313334147231190172173213235215227242281166203217354179243363223180250157200307 38156240198303 651841353312193172472222773590
10
20
30
40
50
60
70
80
90
17
C1
C2
C3 C4 C5 C6
Hierarchical cluster resultsHierarchical cluster results
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 240
2
4
6
8
10
12
14
Time/h
Load
/Mw
Cluster 1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 240
2
4
6
8
10
12
14
Time/h
Load
/Mw
Cluster 2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 240
2
4
6
8
10
12
14
Time/h
Load
/Mw
Cluster 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 240
2
4
6
8
10
12
14
Time/h
Load
/Mw
Cluster 4
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 240
2
4
6
8
10
12
14
Time/h
Load
/Mw
Cluster 5
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 240
2
4
6
8
10
12
14
Time/h
Load
/Mw
Cluster 6
Cluster1 Cluster2
Cluster3 Cluster4
Cluster5 Cluster6
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 240
2
4
6
8
10
12
14
Time/h
Load
/Mw
All daily curves from 1/1/2012 to 20/11/2012
Cluster results
All daily load curves in 2012
18
Hierarchical cluster resultsHierarchical cluster results
Day type C1 C2 C3 C4 C5 C6
Monday 4 0 14 27 12 3
Tuesday 0 0 24 33 2 4
Wednesday 2 2 34 18 2 2
Thursday 2 1 31 20 2 2
Friday 0 1 39 21 0 1
Saturday 0 21 6 3 1 2
Sunday 2 22 4 2 0 7
Typical day type in 6 clustersThe sixth cluster:Special holiday New Year’s Day (30/12/2012,1/1/2012), Spring Festival (21/1/2012~28/1/2012), Qingming Festival(3/4/2012~4/4/2012), Labor Day(1/5/2012), National Day(30/9/2012~2/10/2012).
In the table,3rd cluster and 4th cluster are mainly workday. 2nd cluster are mainly weekends,. 5th cluster are mainly Monday.
Further analysis, 6th cluster are mainly special holiday, and the 1st cluster are mainly the day after special holiday;
Conclusion:cluster results can reflect that day type have strong influence on load curve shape.
In the table,3rd cluster and 4th cluster are mainly workday. 2nd cluster are mainly weekends,. 5th cluster are mainly Monday.
Further analysis, 6th cluster are mainly special holiday, and the 1st cluster are mainly the day after special holiday;
Conclusion:cluster results can reflect that day type have strong influence on load curve shape.
The first cluster:The day after special holiday:
(2/1/2012), (29/1/2012), (5/4/2012), (2/5/2012),
19
Grey association analysisGrey association analysis
Use daily load data of Jiute distribution substation and local weather information as inputs. Output the grey correlation of influential factors like below:
Use daily load data of Jiute distribution substation and local weather information as inputs. Output the grey correlation of influential factors like below:
Max temp Ave humidity
Ave temp day type wind velocity
Average precipitation
0.7512 0.6588 0.5346 0.4472 0.2214 0.2137
Gray Correlation of 6 influential factors:
• Sequence theses 6 factors , Excluding wind velocity and average precipitation which have weak correlation;
• Conclusion:The results can reflect the correlation between different factors and load. These four critical influential factors are inputs of CART decision tree.
• Sequence theses 6 factors , Excluding wind velocity and average precipitation which have weak correlation;
• Conclusion:The results can reflect the correlation between different factors and load. These four critical influential factors are inputs of CART decision tree.
20
Establishment of classification rulesEstablishment of classification rules
21
Set 6 cluster label as decision tree leaf node. According to the grey correlation, set four critical factors as attribute set
to split. CART decision tree algorithm is used to build the classification rules.
Set 6 cluster label as decision tree leaf node. According to the grey correlation, set four critical factors as attribute set
to split. CART decision tree algorithm is used to build the classification rules.
27/1/2013Input factor value
Max Temp 21℃
Ave Humid 54.3%Ave Temp 12.75℃Day Type Saturday
Cluster:2
Date Classification results
Date Classification results
25/1/2013 Cluster 3 29/1/2013 Cluster 5
26/1/2013 Cluster 4 30/1/2013 Cluster 1
27/1/2013 Cluster 2 31/1/2013 Cluster 6
28/1/2013 Cluster 1
Classification results of the forecasting daysClassification results of the forecasting days
Forecasting modelForecasting model
Forecasting model is SVM model:1.Use 6 cluster load data set to train 6 corresponding SVM models;2.Accoding to the classification results of forecasting days , find the
corresponding SVM model to finish forecasting;3.Add up all distribution substation forecasting results to get the total
load;
Forecasting model is SVM model:1.Use 6 cluster load data set to train 6 corresponding SVM models;2.Accoding to the classification results of forecasting days , find the
corresponding SVM model to finish forecasting;3.Add up all distribution substation forecasting results to get the total
load;
Cluster 1 0.001 20 110
Cluster2 0.0015 37 601
Cluster3 0.0071 87 115
Cluster4 0.0032 20 341
Cluster5 0.0016 28 200
Cluster6 0.0092 32 123
2 c where , , is main factors of SVM model.
The parameter combination in left is find by Genetic Algorithms.
2 cParameter combination for each clusterParameter combination for each cluster
22
Implemented on Hadoop PlatformImplemented on Hadoop Platform
23
The former four steps are just one distribution transformer’s forecasting process. To get system load forecasting result. We need to repeat this for all distribution transformers and then add them up by big data technology.
The former four steps are just one distribution transformer’s forecasting process. To get system load forecasting result. We need to repeat this for all distribution transformers and then add them up by big data technology.
Our short-term forecasting framework was implemented on Hadoop platform using four PC servers. Each PC server has two E5-2630V2 CPU and 500G memory. The table below shows the computation time.
Tasks Size Completion Time
Cluster(offline) 1.2 million consumers (1 yrs) 24 mins
Forecasting(online) 1.2 million consumers (daily) 110 secs
Capability of data
processing
Forecasting resultsForecasting resultsJiute distribution substation load Total System load
Max relative error:5.20%Min relative error:0.39%Average relative error:1.61%
Max relative error:1.35%Min relative error :0.07%Average relative error :0.57%
24
0 5 10 15 200
5
10
Time/h
Load
/MW
29/4/2013 Monday
Oringinal Load New Approach Traditional Approach
0 5 10 15 200
5
10
Time/h
Rel
ativ
e E
rror/%
New Approach Traditional Approach
0 5 10 15 201600
1800
2000
2200
2400
Time/h
Load
/MW
29/4/2013 Monday
Oringinal Load New Approach Traditional Approach
0 5 10 15 200
2
4
Time/h
Rel
ativ
e E
rror/%
New Approach Traditional Approach
Add all distribution
transformers’forecasting
results
The traditional approach which use SVM model and only focus on system load: Max relative error: 3.36% Min relative error:0.51% Ave relative error:1.68%.
It means the proposed method is better than traditional approach.
The traditional approach which use SVM model and only focus on system load: Max relative error: 3.36% Min relative error:0.51% Ave relative error:1.68%.
It means the proposed method is better than traditional approach.
OutlineOutline
1 Background
2 Solutions
3 Case study
25
4 Conclusion
ConclusionConclusion
26
• The proposed new framework of performing short-term load forecasting using smart meter data based on big data technologies.
• The proposed new framework of performing short-term load forecasting using smart meter data based on big data technologies.
Model selection
Step4:
Each load patterns
One SVM Model
model parameter selection
The optimal parameter combination
Decision trees
Step3:
Classification rules
Association analysis
Critical influential factors
Step2:
Rainfall
HumidityTemperature
Rainfall
Cluster
Different load patterns
Step1:
Forecast
Step5:
Forcasting results
Each individual user’s load
System load data
Add up
Step 1to 4 are carried out offline. In distribution EMS real-time application, once
the date, forecasting weather data are available, the decision tree can find out which cluster the individual load belongs to.
Then the appropriate forecasting model is selected to forecast individual load.
Finally, the system load is forecasted. Case studies using a practical distribution system
smart meter data indicate that the proposed forecasting method can not only guarantee prediction accuracy within the predefined range but also help distribution system operators gain better understanding of which individual load causes forecasting error.
Conclusion:
Thank You!Thank You!
27