Dr. Rifat Edizkan
Eskişehir Osmangazi University
January 18, 2015, Renewable Energy Systems Winter School-2015
SOLAR RADIATION MODELING
USING HIDDEN MARKOV MODELS
Outline
Introduction
Discrete Markov Process
Hidden Markov Model
Three Problems
HMM Model for Daily Solar Radiation
Introduction
Accurate modeling and forecasting of solar and wind data is important for energy planning. Mathematical models for solar radiation are used to determine its statistical properties and long-term behavior.
Mathematical Models
Trigonometric model
Cosine wave correlation
Stochastic models
Parametric model using neural networks (NN)
2D approach for solar radiation forecasting
Artificial Neural Network (ANN)
Hidden Markov Model
Machine Learning Algorithms
Discrete Markov Models
• Set of states: {s1, s2, …, sN}
• The process moves from one state to another, generating a sequence of states: s_i1, s_i2, …, s_ik, …
• Markov chain property: the probability of each subsequent state depends only on the previous state:
  P(s_ik | s_i1, s_i2, …, s_ik-1) = P(s_ik | s_ik-1)
• To define a Markov model, the following probabilities have to be specified: transition probabilities a_ij = P(s_i | s_j) and initial probabilities π_i = P(s_i)
• The output of the process is the set of states at each instant of time
Calculation of sequence probability
By the Markov chain property, the probability of a state sequence can be found by the formula:
P(s_i1, s_i2, …, s_ik) = P(s_ik | s_i1, …, s_ik-1) · P(s_i1, …, s_ik-1)
                       = P(s_ik | s_ik-1) · P(s_i1, …, s_ik-1)
                       = P(s_ik | s_ik-1) · P(s_ik-1 | s_ik-2) · … · P(s_i2 | s_i1) · P(s_i1)
A Markov chain is similar to a finite state automaton with probabilities of transitioning from one state to another.
Transitions from state to state occur at discrete time intervals.
The model can be in only one state at any given time.
[Figure: five-state diagram (S1–S5) annotated with example transition probabilities.]
Elements of Markov Model (Chain)
• clock t = {1, 2, 3, … T}
• N states Q = {1, 2, 3, … N} the single state j at time t is referred to as qt
• N events E = {e1, e2, e3, …, eN}
• initial probabilities πj = P[q1 = j], 1 ≤ j ≤ N
• transition probabilities aij = P[qt = j | qt-1 = i], 1 ≤ i, j ≤ N
• the (potentially) occupied state at time t is called qt
• a state can be referred to by its index, e.g. qt = j
• 1 event corresponds to 1 state: at each time t, the occupied state outputs ("emits") its corresponding event.
• a Markov model is a generator of events.
• each event is discrete and has a single output.
• in a typical finite-state machine, actions occur at transitions, but in most Markov models, actions occur at each state.
Transition Probabilities: no assumptions (full probabilistic description of system): P[qt = j | qt-1= i, qt-2= k, … , q1=m]
usually use first-order Markov Model:
P[qt = j | qt-1= i] = aij
first-order assumption: transition probabilities depend only on previous state (and time)
aij obeys the usual rules:
  aij ≥ 0 for all i, j
  Σ_{j=1..N} aij = 1 for all i
(the sum of probabilities leaving a state = 1; the process must leave a state)
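These constraints are easy to check for any candidate transition matrix; a minimal sketch in Python with numpy, using an arbitrary 3-state example matrix (the numbers are illustrative only):

```python
import numpy as np

# Arbitrary example transition matrix for a 3-state model
A = np.array([[0.70, 0.25, 0.05],
              [0.40, 0.50, 0.10],
              [0.20, 0.70, 0.10]])

# a_ij >= 0, and each row (probabilities leaving a state) must sum to 1
valid = bool(np.all(A >= 0) and np.allclose(A.sum(axis=1), 1.0))
print(valid)  # True
```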
Initial Probabilities:
• probabilities of starting in each state at time 1
• denoted by πj
• πj = P[q1 = j], 1 ≤ j ≤ N
• Σ_{j=1..N} πj = 1
3-State Markov Model of the Weather
[Figure: three-state weather diagram (S1 = rain, S2 = clouds, S3 = sun) annotated with the transition probabilities.]
S1 = event1 = rain
S2 = event2 = clouds
S3 = event3 = sun

A = {aij} =
  | 0.70 0.25 0.05 |
  | 0.40 0.50 0.10 |
  | 0.20 0.70 0.10 |

π1 = 0.5, π2 = 0.4, π3 = 0.1

What is the probability of {rain, rain, rain, clouds, sun, clouds, rain}?
Obs. = {r, r, r, c, s, c, r}
S = {S1, S1, S1, S2, S3, S2, S1}
time = {1, 2, 3, 4, 5, 6, 7} (days)
P = P[S1] P[S1|S1] P[S1|S1] P[S2|S1] P[S3|S2] P[S2|S3] P[S1|S2]
  = 0.5 · 0.7 · 0.7 · 0.25 · 0.1 · 0.7 · 0.4
  = 0.001715
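The weather-chain computation can be reproduced directly; a minimal sketch in Python with numpy, where state indices 0, 1, 2 stand for S1, S2, S3:

```python
import numpy as np

pi = np.array([0.5, 0.4, 0.1])      # initial probabilities
A = np.array([[0.70, 0.25, 0.05],   # transition matrix from the slide
              [0.40, 0.50, 0.10],
              [0.20, 0.70, 0.10]])

def sequence_probability(states):
    """P(S) = pi[s1] * a[s1,s2] * ... * a[s_{k-1},s_k]."""
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev, cur]
    return p

S = [0, 0, 0, 1, 2, 1, 0]                 # {S1, S1, S1, S2, S3, S2, S1}
print(round(sequence_probability(S), 6))  # 0.001715
```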
3-State Markov Model of The Weather (cont.)
Hidden Markov Model
A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. It is a probabilistic model for sequential or temporal data.
A Markov process is a random process usually characterized as memoryless: the next state depends only on the current state and not on the sequence of events that preceded it. Each state in the system emits an observable output.
An HMM is a doubly embedded stochastic process with an underlying stochastic process that is not observable (it is hidden), but can only be observed through another set of stochastic processes that produce the sequence of observations (Rabiner, 1989).
L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech
Recognition," Proc. of the IEEE, Vol.77, No.2, pp.257--286, 1989.
HMM
• more than 1 event associated with each state.
• all events have some probability of emitting at each state.
• given a sequence of outputs, we can’t determine exactly the state sequence.
• We can compute the probabilities of different state sequences given an output sequence.
Doubly stochastic (probabilities of both emitting events and
transitioning between states); exact state sequence is
“hidden.”
HMM
The observation is a probabilistic function (discrete or continuous) of the state, instead of a one-to-one correspondence with the state.
• Each state randomly generates one of M observations (or visible states)
HMM Example
HMM Generation Process
Hidden Information
Elements of HMM
• clock t = {1, 2, 3, … T}
• N states Q = {1, 2, 3, … N}
• M events E = {e1, e2, e3, …, eM}
• initial probabilities πj = P[q1 = j], 1 ≤ j ≤ N
• transition probabilities aij = P[qt = j | qt-1 = i], 1 ≤ i, j ≤ N
• observation probabilities bj(k) = P[ot = ek | qt = j], 1 ≤ k ≤ M; also written bj(ot)
• A = matrix of aij values, B = set of observation probabilities, π = vector of πj values
Entire model: λ = (A, B, π)
Basic Problems for HMM
1. Given λ, how to compute P(O|λ) for an observation sequence O = O1O2…OT? Evaluation problem
2. Given an observation sequence O = O1O2…OT and λ, how to choose a state sequence Q = q1q2…qT that maximizes P(O, Q|λ)? Most probable path decoding
3. How to estimate λ = (A, B, π) so as to maximize P(O|λ)? Parameter re-estimation:
Baum-Welch (expectation maximization)
Problem 1: P(O|λ)?
What is probability of a set of observations and a specific state sequence, given an
HMM?
The joint probability of O and q is
P(O, q | λ) = P(O | q, λ) · P(q | λ)
Assuming independence of the observations,
P(O | q, λ) = Π_{t=1..T} P(ot | qt, λ) = b_q1(o1) · b_q2(o2) · … · b_qT(oT)
P(q | λ) = π_q1 · a_q1q2 · a_q2q3 · … · a_q(T-1)qT
Summing over all possible state sequences:
P(O | λ) = Σ_{all q} P(O, q | λ) = Σ_{all q} P(O | q, λ) · P(q | λ)
         = Σ_{q1,…,qT} π_q1 b_q1(o1) · a_q1q2 b_q2(o2) · … · a_q(T-1)qT b_qT(oT)
This requires O(2T·N^T) calculations! Impractical for large values of T (long utterances). Solution: use an inductive procedure (similar to the Viterbi search) called the Forward Procedure.
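For small models the exhaustive sum can be written down directly; a sketch with numpy, reusing the 3-state weather model with a deterministic emission matrix (an assumption chosen so the result is checkable by hand), which makes the N^T cost explicit:

```python
import numpy as np
from itertools import product

def prob_obs_brute_force(pi, A, B, obs):
    """P(O|lambda) by summing P(O,q|lambda) over all N**T state sequences."""
    N, T = len(pi), len(obs)
    total = 0.0
    for q in product(range(N), repeat=T):   # enumerates N**T sequences
        p = pi[q[0]] * B[q[0], obs[0]]
        for t in range(1, T):
            p *= A[q[t-1], q[t]] * B[q[t], obs[t]]
        total += p
    return total

pi = np.array([0.5, 0.4, 0.1])
A = np.array([[0.70, 0.25, 0.05],
              [0.40, 0.50, 0.10],
              [0.20, 0.70, 0.10]])
B = np.eye(3)                # state i always emits event i (Markov chain case)
obs = [0, 0, 0, 1, 2, 1, 0]  # {rain, rain, rain, clouds, sun, clouds, rain}
print(round(prob_obs_brute_force(pi, A, B, obs), 6))  # 0.001715
```

With the deterministic emission matrix only one state sequence contributes, so the sum reduces to the single-path probability computed earlier.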
Forward Procedure
Span a lattice of N states and T times.
Keep the sum of the probabilities of all the paths coming into each state i at time t.
Forward Probability:
• Define the variable αt(i) = P(o1 o2 … ot, qt = i | λ), "the probability of observations o1 through ot and being in state i at time t, given our HMM"
Compute αt(i) and P(O|λ) with the following procedure:
Initialization: α1(i) = πi bi(o1), 1 ≤ i ≤ N
Induction: αt+1(j) = [Σ_{i=1..N} αt(i) aij] bj(ot+1), 1 ≤ t ≤ T-1, 1 ≤ j ≤ N
Termination: P(O|λ) = Σ_{i=1..N} αT(i)
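The forward procedure translates almost line for line into code; a sketch with numpy, again using the weather model with a deterministic emission matrix so the result matches the earlier hand computation:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward procedure: returns P(O | lambda)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                    # initialization
    for t in range(1, T):                           # induction
        alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]
    return float(alpha[-1].sum())                   # termination

pi = np.array([0.5, 0.4, 0.1])
A = np.array([[0.70, 0.25, 0.05],
              [0.40, 0.50, 0.10],
              [0.20, 0.70, 0.10]])
B = np.eye(3)                # deterministic emission: state i emits event i
obs = [0, 0, 0, 1, 2, 1, 0]
print(round(forward(pi, A, B, obs), 6))  # 0.001715
```

Each induction step costs O(N^2), so the whole procedure is O(T·N^2) instead of the O(2T·N^T) of the exhaustive sum.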
Backward Procedure
Idea
Span a lattice of N states and T times
Keep the sum of the probabilities of all the outgoing paths at each state i at time t.
Backward Procedure:
• Define the variable βt(i) = P(ot+1 ot+2 … oT | qt = i, λ), "the probability of observations ot+1 through oT, given that we're in state i at time t, and given our HMM"
Compute βt(i) with the following procedure:
Initialization: βT(i) = 1, 1 ≤ i ≤ N
Induction: βt(i) = Σ_{j=1..N} aij bj(ot+1) βt+1(j), t = T-1, T-2, …, 1, 1 ≤ i ≤ N
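A matching sketch of the backward recursion with numpy; terminating with Σi πi bi(o1) β1(i) recovers the same P(O|λ) as the forward procedure, which is a useful consistency check (model and observations as in the earlier weather example, with a deterministic emission matrix):

```python
import numpy as np

def backward(pi, A, B, obs):
    """Backward procedure; returns P(O | lambda) via beta_1."""
    T, N = len(obs), len(pi)
    beta = np.ones((T, N))                 # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):         # induction, t = T-1, ..., 1
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return float(np.sum(pi * B[:, obs[0]] * beta[0]))

pi = np.array([0.5, 0.4, 0.1])
A = np.array([[0.70, 0.25, 0.05],
              [0.40, 0.50, 0.10],
              [0.20, 0.70, 0.10]])
B = np.eye(3)
obs = [0, 0, 0, 1, 2, 1, 0]
print(round(backward(pi, A, B, obs), 6))  # 0.001715
```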
Problem 2: Most probable path decoding
Given a model λ and an observation sequence O = O1O2…OT, find the state sequence Q = q1q2…qT that maximizes P(O, Q | λ):
Q* = argmax_Q P(O, Q | λ) = argmax_Q P(O | Q, λ) P(Q | λ)
Viterbi Algorithm
Analysis for internal processing result
The best, the most likely state sequence
Internal segmentation
The best sequence score is defined as:
δt(i) = max_{q1,q2,…,qt-1} P[q1 q2 … qt-1, qt = i, o1 o2 … ot | λ]
the best score along a single path, up to time t, ending in state i
Viterbi Algorithm
(1) Initialization:
δ1(i) = πi bi(o1), 1 ≤ i ≤ N
ψ1(i) = 0
(2) Recursion:
δt(j) = max_{1≤i≤N} [δt-1(i) aij] · bj(ot), 2 ≤ t ≤ T, 1 ≤ j ≤ N
ψt(j) = argmax_{1≤i≤N} [δt-1(i) aij], 2 ≤ t ≤ T, 1 ≤ j ≤ N
Viterbi Algorithm
(3) Termination:
P* = max_{1≤i≤N} δT(i)
qT* = argmax_{1≤i≤N} δT(i)
(4) Backtracking:
qt* = ψt+1(qt+1*), t = T-1, T-2, …, 1
Usually this algorithm is done in the log domain, to avoid underflow errors.
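A sketch of the full algorithm in the log domain with numpy. With the deterministic-emission weather model (the same illustrative setup as before), the decoded path recovers the state sequence of the hand-worked example:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Log-domain Viterbi: returns (best state path, best log probability)."""
    T, N = len(obs), len(pi)
    with np.errstate(divide="ignore"):     # allow log(0) = -inf
        logpi, logA, logB = np.log(pi), np.log(A), np.log(B)
    delta = logpi + logB[:, obs[0]]        # (1) initialization
    psi = np.zeros((T, N), dtype=int)
    for t in range(1, T):                  # (2) recursion
        scores = delta[:, None] + logA     # delta_{t-1}(i) + log a_ij
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta.argmax())]           # (3) termination
    for t in range(T - 1, 0, -1):          # (4) backtracking
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta.max())

pi = np.array([0.5, 0.4, 0.1])
A = np.array([[0.70, 0.25, 0.05],
              [0.40, 0.50, 0.10],
              [0.20, 0.70, 0.10]])
B = np.eye(3)
obs = [0, 0, 0, 1, 2, 1, 0]
path, logp = viterbi(pi, A, B, obs)
print(path)  # [0, 0, 0, 1, 2, 1, 0]
```

Working with log probabilities turns the products into sums, so the scores no longer underflow for long observation sequences.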
Problem 3: Parameter Reestimation
There is no closed-form optimal way to do this, so find a local maximum.
Baum-Welch algorithm (EM)
- an iterative procedure that locally maximizes P(O|λ)
- convergence proven
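The Baum-Welch updates can be written in terms of the forward and backward variables αt(i) and βt(i); this is the standard formulation from Rabiner (1989), stated here since the detailed slides are figures:

```latex
\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{P(O\mid\lambda)}, \qquad
\xi_t(i,j) = \frac{\alpha_t(i)\,a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j)}{P(O\mid\lambda)}

\bar{\pi}_i = \gamma_1(i), \qquad
\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)}, \qquad
\bar{b}_j(k) = \frac{\sum_{t:\,o_t=e_k}\gamma_t(j)}{\sum_{t=1}^{T}\gamma_t(j)}
```

Each iteration replaces λ with the re-estimated model and is guaranteed not to decrease P(O|λ).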
EM Algorithm for Training
Types of HMM
Discrete HMM (DHMM)
Continuous Density HMM (CDHMM)
Discrete HMMs operate on quantized data or symbols.
CDHMMs operate on continuous data. Common distribution functions:
Gaussian, Poisson, and mixtures of Gaussians.
Features = observations = data points = ot
Features are vectors of real numbers.
Vector Quantization
For HMMs, compute the probability that observation ot is generated by each state j. Here, there are two states, red and blue:
[Figure: vectors in a 2-D feature space (feature value 1 vs. feature value 2 for state j), one cluster per state.]
• bj(k) = (number of vectors with codebook index k in state j) / (number of vectors in state j)
• One way of creating such a smooth model is to use a mixture of Gaussian probability density functions (pdf).
• The detail of the model is related to the number of Gaussian components
• This Gaussian Mixture Model (GMM) is characterized by (a) the number of components, (b) the mean and standard deviation of each component, (c) the weight (height) of each component
Continuous Probability Distribution
• Typical HMMs for speech are continuous-density HMMs.
• Use Gaussian Mixture Models (GMMs) to estimate the "probability" of "emitting" each observation ot given the speech category (state).
Gaussian Mixture Models
[Figure: a GMM curve over feature values ot; the vertical axis is the "probability".]
• Features are the observations; the "probability" of a feature = bj(ot)
Gaussian Mixture Models
Equations for GMMs:
bj(ot) = Σ_{k=1..M} cjk · N(ot; μjk, σjk)
where M = number of mixture components (different from the number of events) and cjk = mixture weights.

(a) single-dimensional case:
N(ot; μjk, σjk) = (1 / (√(2π) σjk)) · e^{-(ot - μjk)² / (2σjk²)}

(b) multi-dimensional case: n is the dimension of the feature vector; μjk becomes a vector, σjk becomes the covariance matrix Σjk (T = transpose, not end time):
N(ot; μjk, Σjk) = (1 / ((2π)^{n/2} |Σjk|^{1/2})) · e^{-(ot - μjk)^T Σjk^{-1} (ot - μjk) / 2}

Assume Σ is a diagonal matrix:
|Σ| = Π_{i=1..n} σii²
Σ^{-1} = diag(1/σ11², 1/σ22², 1/σ33², …)
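A sketch of evaluating bj(ot) for the diagonal-covariance case with numpy; the weights, means, and variances below are made-up illustration values, checked against the standard normal density at 0:

```python
import numpy as np

def gmm_b(o, c, mu, var):
    """b_j(o_t) for a diagonal-covariance GMM.
    c: (M,) mixture weights, mu: (M, n) means, var: (M, n) diagonal variances."""
    n = mu.shape[1]
    diff = o - mu                                        # (M, n)
    expo = np.exp(-0.5 * np.sum(diff**2 / var, axis=1))  # exponent term
    norm = (2 * np.pi) ** (-n / 2) / np.sqrt(np.prod(var, axis=1))
    return float(np.sum(c * norm * expo))

# single 1-D component, mean 0, variance 1: standard normal density at 0
b = gmm_b(np.array([0.0]), np.array([1.0]),
          np.array([[0.0]]), np.array([[1.0]]))
print(round(b, 4))  # 0.3989
```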
Gaussian Mixture Models
Comparing continuous (GMM) and discrete (VQ) HMMs:
• Continuous HMMs: assume independence of features for a diagonal covariance matrix; require a large number of components to represent an arbitrary function; a large number of parameters = relatively slow, can't always train well.
• Discrete HMMs: quantization errors at boundaries; rely on how well VQ partitions the space; sometimes problems estimating probabilities when an unusual input vector was not seen in training.
HMM Topologies
• Ergodic (fully-connected)
• Bakis (left-to-right)
[Figures: a 3-state ergodic (fully-connected) model with π1 = 0.4, π2 = 0.2, π3 = 0.4, and a 4-state Bakis (left-to-right) model with π1 = 1.0, π2 = π3 = π4 = 0.0, each annotated with example transition probabilities.]
HMM Topologies
• Many varieties are possible:
• Topology defined by the state transition matrix (If an element of this matrix is zero, there is no transition between those two states).
[Figure: a 6-state example topology (S1–S6) with π1 = 0.5, π4 = 0.5, and all other πi = 0.0, annotated with example transition probabilities.]
For example, for a 4-state left-to-right model:
A = | a11 a12 a13 0   |
    | 0   a22 a23 a24 |
    | 0   0   a33 a34 |
    | 0   0   0   a44 |
HMM Topologies
• The topology must be specified in advance by the system designer
• Common use in speech is to have one HMM per phoneme, and three states per phoneme. Then, the phoneme-level HMMs can be connected to form word-level HMMs
π1 = 1.0, π2 = 0.0, π3 = 0.0
[Figure: three-state left-to-right phoneme HMMs (A1–A2–A3, B1–B2–B3, T1–T2–T3) connected in sequence to form a word-level HMM, annotated with example transition probabilities.]
Some HMM Applications
Stock market
Speech recognition
Digital signal processing
Bioinformatics
Hocaoğlu, F.O. 2011. Stochastic approach for daily solar radiation modeling. Solar Energy, 85, 278-287.
Stochastic approach for daily solar radiation modeling
Hidden Markov Model (HMM) has also been used for the prediction of solar radiation (Hocaoğlu, 2011). In the study, the temperature is taken as the observation sequence while the solar radiation is assumed as the hidden states and the Viterbi algorithm is used as a tool for the prediction of solar radiation.
Solar Radiation Data
Hourly measured solar radiation and temperature data from the İzmir and Antalya regions in 2005.
The data are obtained from the Turkish State Meteorological Service (DMI) for the year 2005.
Preprocessing
Interpolation is applied if the solar radiation data exceed the extraterrestrial radiation.
Interpolation is used for estimating missing temperature and solar radiation values.
The hourly measured data are converted to daily data by taking daily averages of the hourly data.
Scaling in Viterbi Algorithm
For a long observation sequence, the score value in the Viterbi algorithm tends to go to zero. This problem is overcome by scaling the probabilities: at each time step the scores are divided by the largest score, so that the maximum is always 1:
δ̄t(j) = δt(j) / max_i δt(i)
The temperature values are taken as the observation process that affects the hidden process of solar radiation.
Both the solar and temperature data are quantized, and different HMMs are built.
After the model is constructed, the temperature data is applied to the model and the daily solar radiations are predicted.
In an HMM, at any time t the observation state of the model is assumed to be known, but the state of the system is taken as an unknown variable to be estimated.
The aim of the study is to find the most probable solar radiation state sequence when a temperature sequence is observed.
Daily solar radiation measured data from (a) Antalya, and (b) İzmir
HMM Approach for solar radiation modeling
Hidden states: solar radiation
Observation sequence: temperature values
A discrete HMM is used.
Solar radiation and temperature data are quantized.
Quantization Step
Q(1): minimum value of radiation or temperature
R: measured values of radiation or temperature
DSN: desired number of states for solar radiation, or desired number of quantization levels for temperature
step size = (max(R) - min(R)) / DSN
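The quantization and its inverse can be sketched as follows with numpy; the mapping back to bin centers in `dequantize` is an assumption, since the slide only gives the step-size formula:

```python
import numpy as np

def quantize(R, dsn):
    """Uniformly quantize measurements R into dsn levels (0 .. dsn-1)."""
    step = (R.max() - R.min()) / dsn          # step = (max(R) - min(R)) / DSN
    idx = ((R - R.min()) / step).astype(int)
    return np.minimum(idx, dsn - 1), step     # clamp the maximum value

def dequantize(idx, r_min, step):
    """Inverse quantization: map level indices back to bin centers."""
    return r_min + (idx + 0.5) * step

R = np.array([0.0, 5.0, 10.0])
idx, step = quantize(R, 2)
print(idx.tolist(), step)  # [0, 1, 1] 5.0
```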
Observations are known at each state.
State of the system is hidden. Most probable sequence is
determined by Viterbi algorithm.
[Figures: daily measured solar radiation and daily measured temperature, Antalya region (2005).]
[Figures: daily measured temperature and daily measured solar radiation, İzmir region (2005).]
Using the modified Viterbi algorithm, the solar radiation states are obtained from each model.
These states are then remapped to their original numerical values using inverse quantization.
The performances of the models are compared using mean
absolute percentage error (MAPE), mean absolute bias error
(MABE), root mean squared error (RMSE) and correlation
coefficient (r) criteria.
HMMs used in the study
HMM-1 to HMM-5: the number of hidden (solar radiation) states is set to 20, and the number of quantization levels for the temperature data is increased from 20 to 40 in steps of 5.
HMM-6 to HMM-10: the number of states is increased from 25 to 45 in steps of 5; the number of quantization levels for temperature is set to 25.
Results
The model accuracy is improved by increasing the number of observation levels.
A large number of observation levels is generally advised in this study.
It is also observed that increasing the number of states has a positive effect on the model accuracy.
(a) Antalya (b) İzmir
Model Generated and measured data for (a) Antalya, (b) İzmir
The prediction error plots obtained from HMM-10 for Antalya and Izmir data are shown below.
HMM-10 Number of states for solar radiation: 45
Number of levels for temperature : 25
Antalya İzmir
The best results with minimum error are obtained from HMM-10 for both regions.
The robustness of the proposed HMM-10 is tested with data recorded from two different cities of Turkey: Kayseri and Konya.
Solar radiation is modeled with an HMM.
The most probable states are obtained from the temperature data (the events).
The proposed approach models the data with reasonable accuracy.
The proposed model can easily be used to model solar radiation data obtained from any region in the world by changing its parameters.
Another study uses an HMM to cluster the data vectors according to their shapes:
Saurabh Bhardwaj, Vikrant Sharma, Smriti Srivastava, O.S.
Sastry, B. Bandyopadhyay, S.S. Chandel, J.R.P. Gupta,
Estimation of solar radiation using a combination of
Hidden Markov Model and generalized Fuzzy model,
Solar Energy, Volume 93, July 2013, Pages 43-54.
References
[1] L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. of the IEEE, Vol. 77, No. 2, pp. 257-286, 1989.
[2] Hocaoğlu, F.O. 2011. Stochastic approach for daily solar radiation modeling. Solar Energy, 85, 278-287.
[3] Hosom, J.P. 2010. Speech Recognition with Hidden Markov Models, lecture slides.
[4] Cho, S.J. 2005. Introduction to Hidden Markov Model and Its Application, Samsung Advanced Institute of Technology (SAIT).
[5] Liu, X.S. Hidden Markov Model, lecture slides.
THANK YOU