Dr. Rifat Edizkan
Eskişehir Osmangazi University
January 18, 2015, Renewable Energy Systems Winter School-2015
SOLAR RADIATION MODELING
USING HIDDEN MARKOV MODELS
Outline
Introduction
Discrete Markov Process
Hidden Markov Model
Three Problems
HMM Model for Daily Solar Radiation
Introduction
Accurate modeling and forecasting of solar and wind data is important for energy planning. Mathematical models for solar radiation are used to determine its statistical properties and long-term behavior.
Mathematical Models
Trigonometric model
Cosine wave correlation
Stochastic models
Parametric model using neural networks (NN)
2D approach for solar radiation forecasting
Artificial Neural Network (ANN)
Hidden Markov Model
Machine Learning Algorithms
Discrete Markov Models
• Set of states: {s1, s2, …, sN}
• The process moves from one state to another, generating a sequence of states: s_i1, s_i2, …, s_ik, …
• Markov chain property: the probability of each subsequent state depends only on the previous state:
  P(s_ik | s_i1, s_i2, …, s_ik-1) = P(s_ik | s_ik-1)
• To define a Markov model, the following probabilities have to be specified: transition probabilities a_ij = P(s_i | s_j) and initial probabilities π_i = P(s_i)
• The output of the process is the set of states at each instant of time
Calculation of sequence probability
By the Markov chain property, the probability of a state sequence can be found by the formula:
P(s_i1, s_i2, …, s_ik) = P(s_ik | s_i1, …, s_ik-1) · P(s_i1, …, s_ik-1)
                       = P(s_ik | s_ik-1) · P(s_i1, …, s_ik-1)
                       = P(s_ik | s_ik-1) · P(s_ik-1 | s_ik-2) · … · P(s_i2 | s_i1) · P(s_i1)
A Markov chain is similar to a finite state automaton with probabilities of transitioning from one state to another.
Transitions from state to state occur at discrete time intervals.
The model can be in only one state at any given time.
[Figure: five-state diagram (S1–S5) annotated with example transition probabilities.]
Elements of Markov Model (Chain)
• clock t = {1, 2, 3, … T}
• N states Q = {1, 2, 3, … N} the single state j at time t is referred to as qt
• N events E = {e1, e2, e3, …, eN}
• initial probabilities πj = P[q1 = j], 1 ≤ j ≤ N
• transition probabilities aij = P[qt = j | qt-1 = i], 1 ≤ i, j ≤ N
• the (potentially) occupied state at time t is called qt
• a state can be referred to by its index, e.g. qt = j
• 1 event corresponds to 1 state: at each time t, the occupied state outputs ("emits") its corresponding event.
• a Markov model is a generator of events.
• each event is discrete and has a single output.
• in a typical finite-state machine, actions occur at transitions, but in most Markov models, actions occur at each state.
Transition Probabilities: no assumptions (full probabilistic description of system): P[qt = j | qt-1= i, qt-2= k, … , q1=m]
usually use first-order Markov Model:
P[qt = j | qt-1= i] = aij
first-order assumption: transition probabilities depend only on previous state (and time)
aij obeys the usual rules:
  aij ≥ 0 for all i, j
  Σ_{j=1..N} aij = 1 for all i
(the sum of probabilities leaving a state = 1; the process must leave a state)
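These constraints are easy to check for any candidate transition matrix; a minimal sketch in Python with numpy, using an arbitrary 3-state example matrix (the numbers are illustrative only):

```python
import numpy as np

# Arbitrary example transition matrix for a 3-state model
A = np.array([[0.70, 0.25, 0.05],
              [0.40, 0.50, 0.10],
              [0.20, 0.70, 0.10]])

# a_ij >= 0, and each row (probabilities leaving a state) must sum to 1
valid = bool(np.all(A >= 0) and np.allclose(A.sum(axis=1), 1.0))
print(valid)  # True
```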
Initial Probabilities:
• probabilities of starting in each state at time 1
• denoted by πj
• πj = P[q1 = j], 1 ≤ j ≤ N
• Σ_{j=1..N} πj = 1
3-State Markov Model of the Weather
[Figure: three-state weather diagram (S1 = rain, S2 = clouds, S3 = sun) annotated with the transition probabilities.]
S1 = event1 = rain
S2 = event2 = clouds
S3 = event3 = sun

A = {aij} =
  | 0.70 0.25 0.05 |
  | 0.40 0.50 0.10 |
  | 0.20 0.70 0.10 |

π1 = 0.5, π2 = 0.4, π3 = 0.1

What is the probability of {rain, rain, rain, clouds, sun, clouds, rain}?
Obs. = {r, r, r, c, s, c, r}
S = {S1, S1, S1, S2, S3, S2, S1}
time = {1, 2, 3, 4, 5, 6, 7} (days)
P = P[S1] P[S1|S1] P[S1|S1] P[S2|S1] P[S3|S2] P[S2|S3] P[S1|S2]
  = 0.5 · 0.7 · 0.7 · 0.25 · 0.1 · 0.7 · 0.4
  = 0.001715
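The weather-chain computation can be reproduced directly; a minimal sketch in Python with numpy, where state indices 0, 1, 2 stand for S1, S2, S3:

```python
import numpy as np

pi = np.array([0.5, 0.4, 0.1])      # initial probabilities
A = np.array([[0.70, 0.25, 0.05],   # transition matrix from the slide
              [0.40, 0.50, 0.10],
              [0.20, 0.70, 0.10]])

def sequence_probability(states):
    """P(S) = pi[s1] * a[s1,s2] * ... * a[s_{k-1},s_k]."""
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev, cur]
    return p

S = [0, 0, 0, 1, 2, 1, 0]                 # {S1, S1, S1, S2, S3, S2, S1}
print(round(sequence_probability(S), 6))  # 0.001715
```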
3-State Markov Model of The Weather (cont.)
Hidden Markov Model
A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. It is a probabilistic model for sequential or temporal data.
A Markov process is a random process usually characterized as memoryless: the next state depends only on the current state and not on the sequence of events that preceded it. Each state in the system emits an observable output.
An HMM is a doubly embedded stochastic process with an underlying stochastic process that is not observable (it is hidden), but can only be observed through another set of stochastic processes that produce the sequence of observations (Rabiner, 1989).
L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech
Recognition," Proc. of the IEEE, Vol.77, No.2, pp.257--286, 1989.
HMM
• more than 1 event associated with each state.
• all events have some probability of emitting at each state.
• given a sequence of outputs, we can’t determine exactly the state sequence.
• We can compute the probabilities of different state sequences given an output sequence.
Doubly stochastic (probabilities of both emitting events and
transitioning between states); exact state sequence is
“hidden.”
HMM
The observation is a probabilistic function (discrete or continuous) of the state, instead of a one-to-one correspondence with the state.
• Each state randomly generates one of M observations (or visible states)
HMM Example
HMM Generation Process
Hidden Information
Elements of HMM
• clock t = {1, 2, 3, … T}
• N states Q = {1, 2, 3, … N}
• M events E = {e1, e2, e3, …, eM}
• initial probabilities πj = P[q1 = j], 1 ≤ j ≤ N
• transition probabilities aij = P[qt = j | qt-1 = i], 1 ≤ i, j ≤ N
• observation probabilities bj(k) = P[ot = ek | qt = j], 1 ≤ k ≤ M; also written bj(ot)
• A = matrix of aij values, B = set of observation probabilities, π = vector of πj values
Entire model: λ = (A, B, π)
Basic Problems for HMM
1. Given λ, how to compute P(O|λ) for an observation sequence O = O1O2…OT? Evaluation problem
2. Given an observation sequence O = O1O2…OT and λ, how to choose a state sequence Q = q1q2…qT that maximizes P(O, Q|λ)? Most probable path decoding
3. How to estimate λ = (A, B, π) so as to maximize P(O|λ)? Parameter re-estimation:
Baum-Welch (expectation maximization)
Problem 1: P(O|λ)?
What is probability of a set of observations and a specific state sequence, given an
HMM?
The joint probability of O and q is
P(O, q | λ) = P(O | q, λ) · P(q | λ)
Assuming independence of the observations,
P(O | q, λ) = Π_{t=1..T} P(ot | qt, λ) = b_q1(o1) · b_q2(o2) · … · b_qT(oT)
P(q | λ) = π_q1 · a_q1q2 · a_q2q3 · … · a_q(T-1)qT
Summing over all possible state sequences:
P(O | λ) = Σ_{all q} P(O, q | λ) = Σ_{all q} P(O | q, λ) · P(q | λ)
         = Σ_{q1,…,qT} π_q1 b_q1(o1) · a_q1q2 b_q2(o2) · … · a_q(T-1)qT b_qT(oT)
This requires O(2T·N^T) calculations! Impractical for large values of T (long utterances). Solution: use an inductive procedure (similar to the Viterbi search) called the Forward Procedure.
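For small models the exhaustive sum can be written down directly; a sketch with numpy, reusing the 3-state weather model with a deterministic emission matrix (an assumption chosen so the result is checkable by hand), which makes the N^T cost explicit:

```python
import numpy as np
from itertools import product

def prob_obs_brute_force(pi, A, B, obs):
    """P(O|lambda) by summing P(O,q|lambda) over all N**T state sequences."""
    N, T = len(pi), len(obs)
    total = 0.0
    for q in product(range(N), repeat=T):   # enumerates N**T sequences
        p = pi[q[0]] * B[q[0], obs[0]]
        for t in range(1, T):
            p *= A[q[t-1], q[t]] * B[q[t], obs[t]]
        total += p
    return total

pi = np.array([0.5, 0.4, 0.1])
A = np.array([[0.70, 0.25, 0.05],
              [0.40, 0.50, 0.10],
              [0.20, 0.70, 0.10]])
B = np.eye(3)                # state i always emits event i (Markov chain case)
obs = [0, 0, 0, 1, 2, 1, 0]  # {rain, rain, rain, clouds, sun, clouds, rain}
print(round(prob_obs_brute_force(pi, A, B, obs), 6))  # 0.001715
```

With the deterministic emission matrix only one state sequence contributes, so the sum reduces to the single-path probability computed earlier.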
Forward Procedure
Span a lattice of N states and T times.
Keep the sum of the probabilities of all the paths coming into each state i at time t.
Forward Probability:
• Define the variable αt(i) = P(o1 o2 … ot, qt = i | λ), "the probability of observations o1 through ot and being in state i at time t, given our HMM"
Compute αt(i) and P(O|λ) with the following procedure:
Initialization: α1(i) = πi bi(o1), 1 ≤ i ≤ N
Induction: αt+1(j) = [Σ_{i=1..N} αt(i) aij] bj(ot+1), 1 ≤ t ≤ T-1, 1 ≤ j ≤ N
Termination: P(O|λ) = Σ_{i=1..N} αT(i)
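The forward procedure translates almost line for line into code; a sketch with numpy, again using the weather model with a deterministic emission matrix so the result matches the earlier hand computation:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward procedure: returns P(O | lambda)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                    # initialization
    for t in range(1, T):                           # induction
        alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]
    return float(alpha[-1].sum())                   # termination

pi = np.array([0.5, 0.4, 0.1])
A = np.array([[0.70, 0.25, 0.05],
              [0.40, 0.50, 0.10],
              [0.20, 0.70, 0.10]])
B = np.eye(3)                # deterministic emission: state i emits event i
obs = [0, 0, 0, 1, 2, 1, 0]
print(round(forward(pi, A, B, obs), 6))  # 0.001715
```

Each induction step costs O(N^2), so the whole procedure is O(T·N^2) instead of the O(2T·N^T) of the exhaustive sum.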
Backward Procedure
Idea
Span a lattice of N states and T times
Keep the sum of the probabilities of all the outgoing paths at each state i at time t.
Backward Procedure:
• Define the variable βt(i) = P(ot+1 ot+2 … oT | qt = i, λ), "the probability of observations ot+1 through oT, given that we're in state i at time t, and given our HMM"
Compute βt(i) with the following procedure:
Initialization: βT(i) = 1, 1 ≤ i ≤ N
Induction: βt(i) = Σ_{j=1..N} aij bj(ot+1) βt+1(j), t = T-1, T-2, …, 1, 1 ≤ i ≤ N
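A matching sketch of the backward recursion with numpy; terminating with Σi πi bi(o1) β1(i) recovers the same P(O|λ) as the forward procedure, which is a useful consistency check (model and observations as in the earlier weather example, with a deterministic emission matrix):

```python
import numpy as np

def backward(pi, A, B, obs):
    """Backward procedure; returns P(O | lambda) via beta_1."""
    T, N = len(obs), len(pi)
    beta = np.ones((T, N))                 # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):         # induction, t = T-1, ..., 1
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return float(np.sum(pi * B[:, obs[0]] * beta[0]))

pi = np.array([0.5, 0.4, 0.1])
A = np.array([[0.70, 0.25, 0.05],
              [0.40, 0.50, 0.10],
              [0.20, 0.70, 0.10]])
B = np.eye(3)
obs = [0, 0, 0, 1, 2, 1, 0]
print(round(backward(pi, A, B, obs), 6))  # 0.001715
```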
Problem 2: Most probable path decoding
Given a model λ and an observation sequence O = O1O2…OT, find the state sequence Q = q1q2…qT that maximizes P(O, Q | λ):
Q* = argmax_Q P(O, Q | λ) = argmax_Q P(O | Q, λ) P(Q | λ)
Viterbi Algorithm
Analysis for internal processing result
The best, the most likely state sequence
Internal segmentation
The best sequence score is defined as:
δt(i) = max_{q1,q2,…,qt-1} P[q1 q2 … qt-1, qt = i, o1 o2 … ot | λ]
the best score along a single path, up to time t, ending in state i
Viterbi Algorithm
(1) Initialization:
δ1(i) = πi bi(o1), 1 ≤ i ≤ N
ψ1(i) = 0
(2) Recursion:
δt(j) = max_{1≤i≤N} [δt-1(i) aij] · bj(ot), 2 ≤ t ≤ T, 1 ≤ j ≤ N
ψt(j) = argmax_{1≤i≤N} [δt-1(i) aij], 2 ≤ t ≤ T, 1 ≤ j ≤ N
Viterbi Algorithm
(3) Termination:
P* = max_{1≤i≤N} δT(i)
qT* = argmax_{1≤i≤N} δT(i)
(4) Backtracking:
qt* = ψt+1(qt+1*), t = T-1, T-2, …, 1
Usually this algorithm is done in the log domain, to avoid underflow errors.
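A sketch of the full algorithm in the log domain with numpy. With the deterministic-emission weather model (the same illustrative setup as before), the decoded path recovers the state sequence of the hand-worked example:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Log-domain Viterbi: returns (best state path, best log probability)."""
    T, N = len(obs), len(pi)
    with np.errstate(divide="ignore"):     # allow log(0) = -inf
        logpi, logA, logB = np.log(pi), np.log(A), np.log(B)
    delta = logpi + logB[:, obs[0]]        # (1) initialization
    psi = np.zeros((T, N), dtype=int)
    for t in range(1, T):                  # (2) recursion
        scores = delta[:, None] + logA     # delta_{t-1}(i) + log a_ij
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta.argmax())]           # (3) termination
    for t in range(T - 1, 0, -1):          # (4) backtracking
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta.max())

pi = np.array([0.5, 0.4, 0.1])
A = np.array([[0.70, 0.25, 0.05],
              [0.40, 0.50, 0.10],
              [0.20, 0.70, 0.10]])
B = np.eye(3)
obs = [0, 0, 0, 1, 2, 1, 0]
path, logp = viterbi(pi, A, B, obs)
print(path)  # [0, 0, 0, 1, 2, 1, 0]
```

Working with log probabilities turns the products into sums, so the scores no longer underflow for long observation sequences.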
Problem 3: Parameter Reestimation
There is no closed-form optimal way to do this, so find a local maximum.
Baum-Welch algorithm (EM)
- an iterative procedure that locally maximizes P(O|λ)
- convergence proven
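The Baum-Welch updates can be written in terms of the forward and backward variables αt(i) and βt(i); this is the standard formulation from Rabiner (1989), stated here since the detailed slides are figures:

```latex
\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{P(O\mid\lambda)}, \qquad
\xi_t(i,j) = \frac{\alpha_t(i)\,a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j)}{P(O\mid\lambda)}

\bar{\pi}_i = \gamma_1(i), \qquad
\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)}, \qquad
\bar{b}_j(k) = \frac{\sum_{t:\,o_t=e_k}\gamma_t(j)}{\sum_{t=1}^{T}\gamma_t(j)}
```

Each iteration replaces λ with the re-estimated model and is guaranteed not to decrease P(O|λ).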
EM Algorithm for Training
Types of HMM
Discrete HMM (DHMM)
Continuous Density HMM (CDHMM)
Discrete HMMs operate on quantized data or symbols.
CDHMMs operate on continuous data. Common distribution functions:
Gaussian, Poisson, and mixtures of Gaussians.
Features = observations = data points = ot
Features are vectors of real numbers.
Vector Quantization
For HMMs, compute the probability that observation ot is generated by each state j. Here, there are two states, red and blue:
[Figure: vectors in a 2-D feature space (feature value 1 vs. feature value 2 for state j), one cluster per state.]
• bj(k) = (number of vectors with codebook index k in state j) / (number of vectors in state j)
• One way of creating such a smooth model is to use a mixture of Gaussian probability density functions (pdf).
• The detail of the model is related to the number of Gaussian components
• This Gaussian Mixture Model (GMM) is characterized by (a) the number of components, (b) the mean and standard deviation of each component, (c) the weight (height) of each component
Continuous Probability Distribution
• Typical HMMs for speech are continuous-density HMMs.
• Use Gaussian Mixture Models (GMMs) to estimate the "probability" of "emitting" each observation ot given the speech category (state).
Gaussian Mixture Models
[Figure: a GMM curve over feature values ot; the vertical axis is the "probability".]
• Features are the observations; the "probability" of a feature = bj(ot)
Gaussian Mixture Models
Equations for GMMs:
bj(ot) = Σ_{k=1..M} cjk · N(ot; μjk, σjk)
where M = number of mixture components (different from the number of events) and cjk = mixture weights.

(a) single-dimensional case:
N(ot; μjk, σjk) = (1 / (√(2π) σjk)) · e^{-(ot - μjk)² / (2σjk²)}

(b) multi-dimensional case: n is the dimension of the feature vector; μjk becomes a vector, σjk becomes the covariance matrix Σjk (T = transpose, not end time):
N(ot; μjk, Σjk) = (1 / ((2π)^{n/2} |Σjk|^{1/2})) · e^{-(ot - μjk)^T Σjk^{-1} (ot - μjk) / 2}

Assume Σ is a diagonal matrix:
|Σ| = Π_{i=1..n} σii²
Σ^{-1} = diag(1/σ11², 1/σ22², 1/σ33², …)
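A sketch of evaluating bj(ot) for the diagonal-covariance case with numpy; the weights, means, and variances below are made-up illustration values, checked against the standard normal density at 0:

```python
import numpy as np

def gmm_b(o, c, mu, var):
    """b_j(o_t) for a diagonal-covariance GMM.
    c: (M,) mixture weights, mu: (M, n) means, var: (M, n) diagonal variances."""
    n = mu.shape[1]
    diff = o - mu                                        # (M, n)
    expo = np.exp(-0.5 * np.sum(diff**2 / var, axis=1))  # exponent term
    norm = (2 * np.pi) ** (-n / 2) / np.sqrt(np.prod(var, axis=1))
    return float(np.sum(c * norm * expo))

# single 1-D component, mean 0, variance 1: standard normal density at 0
b = gmm_b(np.array([0.0]), np.array([1.0]),
          np.array([[0.0]]), np.array([[1.0]]))
print(round(b, 4))  # 0.3989
```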
Gaussian Mixture Models
Comparing continuous (GMM) and discrete (VQ) HMMs:
• Continuous HMMs: assume independence of features for a diagonal covariance matrix; require a large number of components to represent an arbitrary function; a large number of parameters = relatively slow, can't always train well.
• Discrete HMMs: quantization errors at boundaries; rely on how well VQ partitions the space; sometimes problems estimating probabilities when an unusual input vector was not seen in training.
HMM Topologies
• Ergodic (fully-connected)
• Bakis (left-to-right)
[Figures: a 3-state ergodic (fully-connected) model with π1 = 0.4, π2 = 0.2, π3 = 0.4, and a 4-state Bakis (left-to-right) model with π1 = 1.0, π2 = π3 = π4 = 0.0, each annotated with example transition probabilities.]
HMM Topologies
• Many varieties are possible:
• Topology defined by the state transition matrix (If an element of this matrix is zero, there is no transition between those two states).
[Figure: a 6-state example topology (S1–S6) with π1 = 0.5, π4 = 0.5, and all other πi = 0.0, annotated with example transition probabilities.]
For example, for a 4-state left-to-right model:
A = | a11 a12 a13 0   |
    | 0   a22 a23 a24 |
    | 0   0   a33 a34 |
    | 0   0   0   a44 |
HMM Topologies
• The topology must be specified in advance by the system designer
• Common use in speech is to have one HMM per phoneme, and three states per phoneme. Then, the phoneme-level HMMs can be connected to form word-level HMMs
π1 = 1.0, π2 = 0.0, π3 = 0.0
[Figure: three-state left-to-right phoneme HMMs (A1–A2–A3, B1–B2–B3, T1–T2–T3) connected in sequence to form a word-level HMM, annotated with example transition probabilities.]
Some HMM Applications
Stock market
Speech recognition
Digital signal processing
Bioinformatics
Hocaoğlu, F.O. 2011. Stochastic approach for daily solar radiation modeling. Solar Energy, 85, 278-287.
Stochastic approach for daily solar radiation modeling
Hidden Markov Model (HMM) has also been used for the prediction of solar radiation (Hocaoğlu, 2011). In the study, the temperature is taken as the observation sequence while the solar radiation is assumed as the hidden states and the Viterbi algorithm is used as a tool for the prediction of solar radiation.
Solar Radiation Data
Hourly measured solar radiation and temperature data from the İzmir and Antalya regions in 2005.
The data are obtained from the Turkish State Meteorological Service (DMI) for the year 2005.
Preprocessing
Interpolation is applied if the solar radiation data exceed the extraterrestrial radiation.
Interpolation is used for estimating missing temperature and solar radiation values.
The hourly measured data are converted to daily data by taking daily averages of the hourly data.
Scaling in Viterbi Algorithm
For a long observation sequence, the score value in the Viterbi algorithm tends to go to zero. This problem is overcome by scaling the probabilities: at each time step the scores are divided by the largest score, so that the maximum is always 1:
δ̄t(j) = δt(j) / max_i δt(i)
The temperature values are taken as the observation process that affects the hidden process of solar radiation.
Both the solar and temperature data are quantized, and different HMMs are built.
After the model is constructed, the temperature data is applied to the model and the daily solar radiations are predicted.
In an HMM, at any time t the observation state of the model is assumed to be known, but the state of the system is taken as an unknown variable to be estimated.
The aim of the study is to find the most probable solar radiation state sequence when a temperature sequence is observed.
Daily solar radiation measured data from (a) Antalya, and (b) İzmir
HMM Approach for solar radiation modeling
Hidden states: solar radiation
Observation sequence: temperature values
A discrete HMM is used.
Solar radiation and temperature data are quantized.
Quantization Step
Q(1): minimum value of radiation or temperature
R: measured values of radiation or temperature
DSN: desired number of states for solar radiation, or desired number of quantization levels for temperature
step size = (max(R) - min(R)) / DSN
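The quantization and its inverse can be sketched as follows with numpy; the mapping back to bin centers in `dequantize` is an assumption, since the slide only gives the step-size formula:

```python
import numpy as np

def quantize(R, dsn):
    """Uniformly quantize measurements R into dsn levels (0 .. dsn-1)."""
    step = (R.max() - R.min()) / dsn          # step = (max(R) - min(R)) / DSN
    idx = ((R - R.min()) / step).astype(int)
    return np.minimum(idx, dsn - 1), step     # clamp the maximum value

def dequantize(idx, r_min, step):
    """Inverse quantization: map level indices back to bin centers."""
    return r_min + (idx + 0.5) * step

R = np.array([0.0, 5.0, 10.0])
idx, step = quantize(R, 2)
print(idx.tolist(), step)  # [0, 1, 1] 5.0
```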
Observations are known at each state.
State of the system is hidden. Most probable sequence is
determined by Viterbi algorithm.
[Figures: daily measured solar radiation and daily measured temperature, Antalya region (2005).]
[Figures: daily measured temperature and daily measured solar radiation, İzmir region (2005).]
Using the modified Viterbi algorithm, the solar radiation states are obtained from each model.
These states are then remapped to their original numerical values using inverse quantization.
The performances of the models are compared using mean
absolute percentage error (MAPE), mean absolute bias error
(MABE), root mean squared error (RMSE) and correlation
coefficient (r) criteria.
HMMs used in the study
HMM-1 to HMM-5: the number of hidden (solar radiation) states is set to 20, and the number of quantization levels for the temperature data is increased from 20 to 40 in steps of 5.
HMM-6 to HMM-10: the number of states is increased from 25 to 45 in steps of 5; the number of quantization levels for temperature is set to 25.
Results
The model accuracy is improved by increasing the number of observation levels.
A large number of observation levels is generally advised in this study.
It is also observed that increasing the number of states has a positive effect on the model accuracy.
(a) Antalya (b) İzmir
Model Generated and measured data for (a) Antalya, (b) İzmir
The prediction error plots obtained from HMM-10 for Antalya and Izmir data are shown below.
HMM-10 Number of states for solar radiation: 45
Number of levels for temperature : 25
Antalya İzmir
The best results with minimum error are obtained from HMM-10 for both regions.
The robustness of the proposed HMM-10 is tested with data recorded from two different cities of Turkey: Kayseri and Konya.
Solar radiation is modeled with an HMM.
The most probable states are obtained from the temperature data (the events).
The proposed approach models the data with reasonable accuracy.
The proposed model can easily be used to model solar radiation data obtained from any region in the world by changing its parameters.
Another study uses an HMM to cluster the data vectors according to their shapes:
Saurabh Bhardwaj, Vikrant Sharma, Smriti Srivastava, O.S.
Sastry, B. Bandyopadhyay, S.S. Chandel, J.R.P. Gupta,
Estimation of solar radiation using a combination of
Hidden Markov Model and generalized Fuzzy model,
Solar Energy, Volume 93, July 2013, Pages 43-54.
References
[1] L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. of the IEEE, Vol. 77, No. 2, pp. 257-286, 1989.
[2] Hocaoğlu, F.O. 2011. Stochastic approach for daily solar radiation modeling. Solar Energy, 85, 278-287.
[3] Hosom, J.P. 2010. Speech Recognition with Hidden Markov Models, lecture slides.
[4] Cho, S.J. 2005. Introduction to Hidden Markov Model and Its Application, Samsung Advanced Institute of Technology (SAIT).
[5] Liu, X.S. Hidden Markov Model, lecture slides.
THANK YOU