
Definition of univariate B-splines

The B-splines are employed to specify the linguistic terms, and knots are chosen to be different from each other (periodical model). Visually, the selection of $k$ (the order of the B-splines) determines the following factors of the fuzzy sets for modeling the linguistic terms.

Assume $x$ is a general input variable of a control system that is defined on the universe of discourse $[x_1, x_m]$. Given a sequence of ordered parameters (knots) $x_1, x_2, \ldots$, the $i$-th B-spline $N_{i,k}$ of order $k$ (degree $k - 1$) is recursively defined as follows:

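$$
N_{i,k}(x) =
\begin{cases}
\begin{cases}
1 & \text{for } x \in [x_i, x_{i+1}) \\
0 & \text{otherwise}
\end{cases}
& \text{if } k = 1 \\[2ex]
\dfrac{x - x_i}{x_{i+k-1} - x_i}\, N_{i,k-1}(x) + \dfrac{x_{i+k} - x}{x_{i+k} - x_{i+1}}\, N_{i+1,k-1}(x)
& \text{if } k > 1
\end{cases}
\tag{1}
$$

with $i = 1, \ldots, m - k$.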
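The recursion (1) translates almost directly into code. The following is a minimal Python sketch, assuming pairwise distinct knots (so that no denominator is zero) and the 1-based indexing of the text. The function name `bspline_basis` and its argument order are illustrative choices, not part of the lecture.

```python
def bspline_basis(i, k, x, knots):
    """Value of N_{i,k}(x) according to recursion (1).

    i     -- 1-based index of the B-spline, as in the text
    k     -- order (degree k - 1)
    knots -- list [x_1, ..., x_m] of strictly increasing knots (assumption)
    """
    xi = lambda j: knots[j - 1]  # map the 1-based knot x_j onto the Python list
    if k == 1:
        # Order 1: indicator function of the half-open interval [x_i, x_{i+1})
        return 1.0 if xi(i) <= x < xi(i + 1) else 0.0
    left = (x - xi(i)) / (xi(i + k - 1) - xi(i)) * bspline_basis(i, k - 1, x, knots)
    right = (xi(i + k) - x) / (xi(i + k) - xi(i + 1)) * bspline_basis(i + 1, k - 1, x, knots)
    return left + right


# Order-3 B-spline over the uniform knots 0, 1, 2, 3, evaluated at its centre
print(bspline_basis(1, 3, 1.5, [0.0, 1.0, 2.0, 3.0]))  # 0.75
```

The plain recursion recomputes lower-order values several times; for the small orders used here (1 to 4) this is usually acceptable, but a table-based evaluation would be the natural choice for larger $k$.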

Applied Sensor Technology (Angewandte Sensorik), J. Zhang, Learning Methods (Lernmethoden), W4/2003, 21 January 2003


Therefore, $m$ knots $x_i$ ($i = 1, \ldots, m$) form $l = m - k$ B-splines (Figure 1).

Figure 1: Nine B-splines of order 3 defined over 12 non-uniformly distributed knots.


Examples of B-splines of order 1, 2, 3 and 4 with their knots are shown in Figure 2.

Figure 2: Non-uniform univariate B-splines of order 1 to 4 defined on a parameter $x$.

In each interval $[x_j, x_{j+1}]$, $k$ non-zero B-splines overlap.


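The example of order 3 is shown in Figure 3. By the definition above, order 3 corresponds to degree 2, i.e. to piecewise quadratic rather than cubic functions.

Figure 3: B-splines of order 3 over $[x_j, x_{j+1}]$, defined on a parameter $x$.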


Properties of B-Splines

The recursive definition is one basic feature of B-splines: it enables the generation of B-splines of arbitrary order, with incrementally increasing smoothness, for a given set of knots. The other most important properties of B-splines with respect to modeling and control are:

Partition of unity: $\sum_{i=0}^{l} N_{i,k}(x) = 1$.

Positivity: $N_{i,k}(x) \ge 0$ for all $x$.

Local support: $N_{i,k}(x) = 0$ for $x \notin [x_i, x_{i+k}]$.

$C^{k-2}$ continuity: If the knots $\{x_i\}$ are pairwise different from each other, then $N_{i,k}(x) \in C^{k-2}$, i.e., $N_{i,k}(x)$ is $(k - 2)$ times continuously differentiable.
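These properties can be checked numerically. The sketch below is an illustration under stated assumptions: it uses 1-based indices $i = 1, \ldots, l$ as in the recursive definition, a hypothetical non-uniform knot vector, and it tests the partition of unity only where $k$ B-splines actually overlap, i.e. on $[x_k, x_{l+1})$. Closer to the two ends of the knot sequence fewer B-splines are non-zero and the sum drops below 1.

```python
def basis(i, k, x, t):
    """N_{i,k}(x) from the recursive definition; i is 1-based, t holds distinct knots."""
    if k == 1:
        return 1.0 if t[i - 1] <= x < t[i] else 0.0
    a = (x - t[i - 1]) / (t[i + k - 2] - t[i - 1]) * basis(i, k - 1, x, t)
    b = (t[i + k - 1] - x) / (t[i + k - 1] - t[i]) * basis(i + 1, k - 1, x, t)
    return a + b


knots = [0.0, 1.0, 2.5, 3.0, 4.2, 5.0, 6.0, 7.5]  # m = 8 knots (hypothetical values)
k = 3                                             # order 3
l = len(knots) - k                                # l = m - k = 5 B-splines


def basis_values(x):
    """All l B-spline values at x, i.e. the membership degrees of x."""
    return [basis(i, k, x, knots) for i in range(1, l + 1)]


for x in (2.5, 3.7, 4.9):                         # points inside [x_k, x_{l+1}) = [2.5, 5.0)
    values = basis_values(x)
    print(x, round(sum(values), 12), min(values) >= 0.0)

# Local support: N_{1,3} vanishes to the right of x_4 = 3.0
print(basis(1, k, 3.5, knots))
```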


Lattice

Figure 4: The B-spline model, a two-dimensional illustration.


Each $n$-dimensional rectangle ($n > 1$) of the lattice is covered by the $j$-th multivariate B-spline $N^{j}_{k}(x)$, which is formed by taking the tensor product of $n$ univariate B-splines:

$$
N^{j}_{k}(x) = \prod_{j=1}^{n} N^{j}_{i_j,k_j}(x_j) \tag{2}
$$

Therefore the shape of each B-spline, and thus the shape of the multivariate ones (Figure 5), is implicitly set by their order and the given knot distribution on each input interval.
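The tensor product in (2) is simply a product of univariate values, one per input dimension. A minimal sketch, assuming distinct knots and 1-based indices; `tensor_basis` and its parameter names are hypothetical:

```python
from math import prod


def basis(i, k, x, t):
    """Univariate N_{i,k}(x); i is 1-based, t is a list of distinct knots."""
    if k == 1:
        return 1.0 if t[i - 1] <= x < t[i] else 0.0
    a = (x - t[i - 1]) / (t[i + k - 2] - t[i - 1]) * basis(i, k - 1, x, t)
    b = (t[i + k - 1] - x) / (t[i + k - 1] - t[i]) * basis(i + 1, k - 1, x, t)
    return a + b


def tensor_basis(indices, orders, x, knot_vectors):
    """Multivariate B-spline of (2): product over j of N^j_{i_j,k_j}(x_j)."""
    return prod(basis(i, k, xj, t)
                for i, k, xj, t in zip(indices, orders, x, knot_vectors))


# One order-2 and one order-3 B-spline, both on the knots 0, 1, 2, 3
t = [0.0, 1.0, 2.0, 3.0]
print(tensor_basis((1, 1), (2, 3), (1.0, 1.5), (t, t)))
```

Because each factor has local support, the product is non-zero only on the rectangle where all $n$ univariate B-splines are non-zero.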


(a) Tensor product of two univariate B-splines of order 2.

(b) Tensor product of one order-3 and one order-2 univariate B-spline.

(c) Tensor product of two univariate B-splines of order 3.

Figure 5: Bivariate B-splines formed by taking the tensor product of two univariate B-splines.


Fuzzy Controller of a MISO System - I

Conditions of B-spline Fuzzy Controllers:

• periodical B-spline basis functions as membership functions for inputs,

• fuzzy singletons as membership functions for outputs,

• “product” as fuzzy conjunctions,

• “centroid” as defuzzification method,

• addition of “virtual linguistic terms” at both ends of each input variable and

• extension of the rule base for the “virtual linguistic terms” by copying the output values of the “nearest” neighbourhood.


B-Spline Fuzzy Controller of a MISO System - II

For a MISO system with $n$ inputs $x_1, x_2, \ldots, x_n$, rules with $n$ conjunctive terms in the premise are given in the following form:

{Rule($i_1, i_2, \ldots, i_n$): IF ($x_1$ is $N^{1}_{i_1,k_1}$) and ($x_2$ is $N^{2}_{i_2,k_2}$) and . . . and ($x_n$ is $N^{n}_{i_n,k_n}$) THEN $y$ is $Y_{i_1 i_2 \ldots i_n}$},

where

• $x_j$: the $j$-th input ($j = 1, \ldots, n$),

• $k_j$: the order of the B-spline basis functions used for $x_j$,

• $N^{j}_{i_j,k_j}$: the $i_j$-th linguistic term of $x_j$ defined by B-spline basis functions,

• $i_j = 0, \ldots, m_j$, representing how finely the $j$-th input is fuzzy partitioned,

• $Y_{i_1 i_2 \ldots i_n}$: the control vertex (de Boor point) of Rule($i_1, i_2, \ldots, i_n$).


Fuzzy Controller of a MISO System - III

The output y of a MISO fuzzy controller is:

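$$
y = \frac{\displaystyle\sum_{i_1=0}^{m_1} \cdots \sum_{i_n=0}^{m_n} \Bigl( Y_{i_1,\ldots,i_n} \prod_{j=1}^{n} N^{j}_{i_j,k_j}(x_j) \Bigr)}{\displaystyle\sum_{i_1=0}^{m_1} \cdots \sum_{i_n=0}^{m_n} \prod_{j=1}^{n} N^{j}_{i_j,k_j}(x_j)} \tag{3}
$$

$$
= \sum_{i_1=0}^{m_1} \cdots \sum_{i_n=0}^{m_n} \Bigl( Y_{i_1,\ldots,i_n} \prod_{j=1}^{n} N^{j}_{i_j,k_j}(x_j) \Bigr) \tag{4}
$$

The step from (3) to (4) uses the partition of unity: the denominator of (3) equals 1.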

This is called a general NUBS hypersurface, which possesses the following properties:

• If the B-functions of order $k_1, k_2, \ldots, k_n$ are employed to specify the linguistic terms of the input variables $x_1, x_2, \ldots, x_n$, it can be guaranteed that the output variable $y$ is $(k_j - 2)$ times continuously differentiable with respect to the input variable $x_j$, $j = 1, \ldots, n$.

• If the input space is partitioned finely enough and at the correct positions, the interpolation with the B-spline hypersurface can reach a given precision.
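The controller output (4) can be sketched in a few lines. This is an illustration under assumptions, not the lecture's implementation: the control vertices are stored in a dictionary `Y` keyed by index tuples, indices run from 1 to $l_j = m_j - k_j$ as in the recursive definition (1) rather than from 0, the knots are distinct, and the input lies where the B-splines sum to 1 so that the normalisation in (3) can be dropped.

```python
from itertools import product
from math import prod


def basis(i, k, x, t):
    """Univariate N_{i,k}(x); i is 1-based, t is a list of distinct knots."""
    if k == 1:
        return 1.0 if t[i - 1] <= x < t[i] else 0.0
    a = (x - t[i - 1]) / (t[i + k - 2] - t[i - 1]) * basis(i, k - 1, x, t)
    b = (t[i + k - 1] - x) / (t[i + k - 1] - t[i]) * basis(i + 1, k - 1, x, t)
    return a + b


def controller_output(x, knot_vectors, orders, Y):
    """y = sum over all rules of Y[idx] * prod_j N^j_{i_j,k_j}(x_j), cf. (4)."""
    counts = [len(t) - k for t, k in zip(knot_vectors, orders)]  # l_j terms per input
    y = 0.0
    for idx in product(*(range(1, c + 1) for c in counts)):
        # Firing strength of Rule(idx): "product" as fuzzy conjunction
        w = prod(basis(i, k, xj, t)
                 for i, k, xj, t in zip(idx, orders, x, knot_vectors))
        y += Y[idx] * w
    return y


# Two inputs, order 2 each, two linguistic terms per input (hypothetical values)
t = [0.0, 1.0, 2.0, 3.0]
Y = {(1, 1): 0.0, (1, 2): 1.0, (2, 1): 2.0, (2, 2): 3.0}
print(controller_output((1.25, 1.5), (t, t), (2, 2), Y))
```

With order 2 the result is a bilinear interpolation of the four control vertices; higher orders give smoother surfaces over the same rule base.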


B-spline Type: SISO Systems

A SISO system with B-functions of order 2 ($X_i(x)$: firing strength of rule $i$; $y_i$: the contribution of rule $i$ to the output).


MISO Systems - A 2D Example

An example with two input variables ($x$ and $y$) and an output $z$. The control vertices of the output are $Z_1, Z_2, Z_3, Z_4$.

The linguistic terms of the inputs:

The linguistic terms of the output:


A 2D Example - The Rule Base

The rule base consists of four rules:

1) IF x is X1 and y is Y1 THEN z is Z1

2) IF x is X1 and y is Y2 THEN z is Z2

3) IF x is X2 and y is Y1 THEN z is Z3

4) IF x is X2 and y is Y2 THEN z is Z4


A 2D Example - Inference


A 2D Example - Defuzzification


Supervised Learning

Supervised learning assumes that a “teacher” provides the complete desired system output for each input datum.

Based on the complete set of these input/output vectors, B-spline type fuzzy controllers can be trained very rapidly.

Computing the parameters of such a B-spline fuzzy system is divided into two steps: one for the IF-part and one for the THEN-part.

Considering the granularity of the input space and the maximal point distribution of the control space, if known, the fuzzy sets can be generated using the recursive computation of B-spline basis functions.

The control vertices of the THEN-parts can be obtained automatically through a learning procedure.


Learning algorithm - I

Assume $\{(X, y_d)\}$ is a set of training data, where

• $X = (x_1, x_2, \ldots, x_n)$: the input data vector,

• $y_d$: the desired output for $X$.

The squared error is computed as:

$$
E = \frac{1}{2}(y_r - y_d)^2, \tag{5}
$$

where $y_r$ is the current real output value during training.


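The parameters to be found are the $Y_{i_1,i_2,\ldots,i_n}$ that make the error in (5) as small as possible, i.e.

$$
E = \frac{1}{2}(y_r - y_d)^2 \to \min. \tag{6}
$$

Each control vertex $Y_{i_1,\ldots,i_n}$ can be modified by using the gradient descent method. Since $y_r$ is given by (4), $\partial y_r / \partial Y_{i_1,\ldots,i_n} = \prod_{j=1}^{n} N^{j}_{i_j,k_j}(x_j)$, so that

$$
\Delta Y_{i_1,\ldots,i_n} = -\varepsilon \frac{\partial E}{\partial Y_{i_1,\ldots,i_n}} \tag{7}
$$

$$
= \varepsilon\,(y_d - y_r) \prod_{j=1}^{n} N^{j}_{i_j,k_j}(x_j) \tag{8}
$$

where $0 < \varepsilon \le 1$.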
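One learning step can be sketched as follows. Differentiating (5) with $y_r$ taken from (4) gives $\partial E / \partial Y = (y_r - y_d) \prod_j N(x_j)$, so the descent step moves each vertex in proportion to $(y_d - y_r)$ and to the firing strength of its rule. In this sketch the rules are flattened into a list and `weights` holds the precomputed products $\prod_j N^{j}_{i_j,k_j}(x_j)$; both names are assumptions made for illustration.

```python
def train_step(Y, weights, y_d, eps=0.5):
    """One gradient step on E = 1/2 (y_r - y_d)^2 for y_r = sum_i Y[i] * weights[i]."""
    y_r = sum(Y_i * w_i for Y_i, w_i in zip(Y, weights))
    # dE/dY_i = (y_r - y_d) * w_i, so -eps * dE/dY_i = eps * (y_d - y_r) * w_i
    return [Y_i + eps * (y_d - y_r) * w_i for Y_i, w_i in zip(Y, weights)]


# Two active rules with firing strengths 0.75 and 0.25, desired output 1.0
Y = [0.0, 0.0]
weights = [0.75, 0.25]
for step in range(20):
    Y = train_step(Y, weights, y_d=1.0)
y_r = sum(Y_i * w_i for Y_i, w_i in zip(Y, weights))
print(Y, y_r)
```

Only the vertices of rules that fire for the current input are changed, which is a direct consequence of the local support of the B-splines.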


The gradient descent method guarantees that the learning algorithm converges to the global minimum of the error function, because the second partial derivative of the quadratic error function with respect to $Y_{i_1,i_2,\ldots,i_n}$ is constant (it does not depend on the control vertices) and non-negative:

$$
\frac{\partial^2 E}{\partial Y_{i_1,\ldots,i_n}^2} = \Bigl( \prod_{j=1}^{n} N^{j}_{i_j,k_j}(x_j) \Bigr)^2 \ge 0. \tag{9}
$$

This means that the error function (5) is convex in the space of the $Y_{i_1,i_2,\ldots,i_n}$ and therefore possesses only one (global) minimum.


Immediate learning by self-evaluation

A fuzzy system can learn under supervision.

Such a learning process needs a teacher, i.e. for each input vector, the desired output should be known. Then the fuzzy controller attempts to interpolate these input/output vectors to provide a continuous (hyper-)surface for the whole control space.

In reality, it is not always simple to find the goal function of the output for a complex system. An unsupervised learning approach should therefore be developed.

Based on a B-spline fuzzy controller, the parameters to be learned are still mainly the control vertices of the “THEN” part.

The key problem of unsupervised learning with such a model is then how to modify the control vertices after each learning step, i.e. the change direction (+ or −) and the change magnitude.


Inspiration by Supervised Learning

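We first discuss a control system with $(X_1, X_2, \ldots, X_n)$ as input and $Y$ as output. Let us rewrite the modification of the control vertices for supervised learning:

$$
\begin{aligned}
\Delta y_{i_1,\ldots,i_m} &= -\varepsilon \frac{\partial E}{\partial y_{i_1,\ldots,i_m}} \\
&= \varepsilon\,(y_d - y_r) \cdot \prod_{j=1}^{m} X_{i_j,k_j}(x_j) \\
&= \operatorname{sign}(y_d - y_r) \cdot \varepsilon \cdot |y_r - y_d| \cdot \prod_{j=1}^{m} X_{i_j,k_j}(x_j)
\end{aligned}
\tag{10}
$$

$\operatorname{sign}(y_d - y_r)$ indicates the direction of the modification of $y_{i_1,\ldots,i_m}$ in each learning step, while the product $\varepsilon \cdot |y_r - y_d| \cdot \prod_{j=1}^{m} X_{i_j,k_j}(x_j)$ determines the magnitude of the modification.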


Evaluation Function - I

In unsupervised learning, it is usually possible to define an “evaluation function”. Such an evaluation function should describe how “good” the current system state $((x_1, x_2, \ldots, x_n), y)$ is.

For each input vector, an output is generated. With this output, the system transits to another state. The new state is compared with the old one; an adaptation is performed if necessary.

Assume the evaluation function, denoted by $V(\cdot)$, possesses a bigger value for a better state, i.e. for two states $s_t$ and $s_{t+1}$, if $s_t$ is better than $s_{t+1}$, then $V(s_t) \ge V(s_{t+1})$. The adaptation of the control vertices can be performed with a similar representation as in supervised learning.


Evaluation Function - II

Let us reconsider the modification of the control vertices through equation (10). State $s_t$ transits to $s_{t+1}$ by the output $y_r$. The desired state is $s_d$. We replace $y_r$ in (10) with $V(s_{t+1})$ and $y_d$ with $V(s_d)$.

Assume two system states $s_t$ and $s_{t+1}$, and $s_t$ is better than $s_{t+1}$, i.e. $V(s_t) \ge V(s_{t+1})$, where $V(\cdot)$ is the evaluation function.

We consider those systems for which a function $V(\cdot)$ can be found that fulfills the following condition:

Assume $s_t$ is the current state and $y$ an arbitrary output. With $y$ the system transits to the state $s_{t+1}$. If another output $y'$ fulfills $y \times y' \le 0$, and with $y'$ the system transits to $s'_{t+1}$, the following relation of the evaluation functions is valid:

$$
\bigl( V(s_{t+1}) - V(s_t) \bigr) \times \bigl( V(s'_{t+1}) - V(s_t) \bigr) \le 0. \tag{11}
$$


Modifying Control Vertices in Reinforcement Learning - I

At the moment $t$ the system has the state $s_t$. The ideal state at the moment $t + 1$ would be $s_d$.

With the controller output $y_r$ generated at the moment $t$, the system transits to the state $s_{t+1}$.

Considering the state transition from $s_t$ to $s_{t+1}$, the constellation of $s_t$, $s_{t+1}$ and $s_d$ can be:

[Figure panels (a), (b), (c)]


Modifying Control Vertices in Reinforcement Learning - II

(a): The system state becomes worse, i.e. the system acts incorrectly. According to the condition in (11) the change direction is $-\operatorname{sign}(y)$.

(b): The system acts in the correct direction. The value of the output should be enlarged. The change direction is then $\operatorname{sign}(y)$.

(c): This case is the inverse of case (b). The change direction should be $-\operatorname{sign}(y)$.


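These three cases can be synthesized by

$$
S = \operatorname{sign}\bigl(V(s_t) - V(s_{t+1})\bigr) \cdot \operatorname{sign}\bigl(V(s_{t+1}) - V(s_d)\bigr) \cdot \operatorname{sign}(y). \tag{12}
$$

The change of control vertices can finally be written as:

$$
\Delta y_{i_1,\ldots,i_m} = S \cdot \varepsilon \cdot |V(s_{t+1}) - V(s_d)| \cdot \prod_{j=1}^{m} X_{i_j,k_j}(x_j). \tag{13}
$$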
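The sign rule (12) and the update (13) can be sketched as below. The argument `weight` stands for the product $\prod_j X_{i_j,k_j}(x_j)$ of one rule; the function names and the numeric values are hypothetical and only serve to exercise the three cases (a), (b) and (c).

```python
def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def vertex_change(V_t, V_t1, V_d, y, weight, eps=0.1):
    """Change of one control vertex according to (12) and (13)."""
    S = sign(V_t - V_t1) * sign(V_t1 - V_d) * sign(y)
    return S * eps * abs(V_t1 - V_d) * weight


# Case (a): the state became worse (V dropped from -2 to -3), output y > 0
print(vertex_change(V_t=-2.0, V_t1=-3.0, V_d=-1.0, y=1.0, weight=0.5))   # negative
# Case (b): improvement that falls short of the desired value, y > 0
print(vertex_change(V_t=-2.0, V_t1=-1.5, V_d=-1.0, y=1.0, weight=0.5))   # positive
# Case (c): improvement that overshoots the desired value, y > 0
print(vertex_change(V_t=-2.0, V_t1=-0.5, V_d=-1.0, y=1.0, weight=0.5))   # negative
```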


Learning of Cart-Pole Balancing

The pendulum possesses an initial state $(\theta, \dot\theta)$. The task is to find the force $f$ to be exerted that brings the cart-pole system to the balanced final state $\theta = 0$ and $\dot\theta = 0$.

The inputs of the system are:

• angle: $\theta$ (°) $\in [-15, +15]$ and

• angular velocity: $\dot\theta$ (°/s) $\in [-20, +20]$.

Each of the two input variables is covered with 7 B-spline basis functions of order 3.

The output of the system is the force f to be exerted on the cart.


For learning we choose the evaluation function as:

$$
V(s_t) = V(\theta, \dot\theta) \stackrel{\text{def}}{=} -|2\theta + \dot\theta|,
$$

and the relation of the evaluation functions of the desired state $s_d$ and A: $V(s_d) \stackrel{\text{def}}{=} 0.5 \cdot V(s_t)$.
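Written out in code, the evaluation function and the desired value look as follows. This is a small sketch: angles are taken in degrees and angular velocities in degrees per second, as in the input ranges given for the cart-pole system, and the function names are illustrative.

```python
def evaluate(theta, theta_dot):
    """V(theta, theta_dot) = -|2 * theta + theta_dot|; 0 is the best possible value."""
    return -abs(2.0 * theta + theta_dot)


def desired_value(theta, theta_dot):
    """V(s_d) = 0.5 * V(s_t): the desired state halves the current deviation."""
    return 0.5 * evaluate(theta, theta_dot)


print(evaluate(-10.0, 10.0))       # -10.0
print(desired_value(-10.0, 10.0))  # -5.0
```

Since $V \le 0$ everywhere, halving $V(s_t)$ always yields a value at least as good as the current one, so the desired state is never worse than the present state.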


CP-Balancing: Control Surfaces

[Figure: control surfaces at the beginning, after 100 learning steps, and after 3000 learning steps]


CP-Balancing - Validation

The motion profiles of the pendulum from the starting state ($\theta = -10$, $\dot\theta = 10$):

[Figure: time profiles of the angle, the angular velocity, and the applied force]


Inverted Pendulum: I

Problem: balance the pendulum P by controlling the motor M.

Input: two state variables:

• angle $\theta$;

• angular velocity $\dot\theta$, taken as the difference $\Delta\theta_t = \theta_t - \theta_{t-1}$


Output: one control variable, the motor current → motor velocity $v$

Quantization of the three linguistic variables into seven fuzzy sets (linguistic terms) each:

{NB, NM, NS, Z, PS, PM, PB}

Example: rule (NM, Z; PM)

If the angle $\theta$ is in its medium negative range and the angular velocity $\Delta\theta$ is approximately zero,

then the motor velocity $v$ should be in its medium positive range.

The rule base in table form:


                          θ
          NB   NM   NS   Z    PS   PM   PB

     NB   -    -    -    PB   -    -    -
     NM   -    -    -    PM   -    -    -
     NS   -    -    -    PS   NS   -    -
Δθ   Z    PB   PM   PS   Z    NS   NM   NB
     PS   -    -    PS   NS   -    -    -
     PM   -    -    -    NM   -    -    -
     PB   -    -    -    NB   -    -    -


KHEPERA Miniature Robot

• Motorola 68331 microcontroller

• 128 KByte RAM, 128 KByte ROM

• connection to the outside world via a serial cable

• 2 stepper motors, 600 steps per revolution, i.e. one step corresponds to 1/12 mm

• 8 proximity sensors (infrared), Siemens SFH900, maximum sensing range 5 cm


KHEPERA: Sensors

Input for the controller: IR sensors

0: SL85, 1: SL45, mean of 2 and 3: SLR0,

4: SR45, 5: SR85

Sensor readings versus distance:


The Obstacle Avoidance Problem

Output: velocities of the left and right motors ⇒ robot velocity $v$, steering angle $s$

Goal: collision avoidance, i.e. driving around obstacles as “smoothly” as possible

Structure of the fuzzy controller:


Membership Functions of the Inputs and Outputs

IR sensor values:

Robot velocity $v$:


Steering angle $s$:


The Rules of the System: I

Evasive manoeuvre in free space when an obstacle is detected from the right:


Fuzzy input variables               Output variables

SL85   SL45   SLR0   SR45   SR85    speed   steer

vl     vl     vl     vl     low     high    n
vl     vl     vl     low    low     low     nm
vl     vl     low    low    low     low     nb
vl     low    low    low    low     low     nb
vl     vl     vl     vl     high    low     nm
vl     vl     vl     low    high    vl      nb
vl     vl     low    low    high    vl      nb
vl     vl     vl     high   high    vl      nb
vl     vl     high   high   high    vl      nb
vl     vl     vl     vl     vh      vl      nb
vl     vl     vl     low    vh      vl      nb
vl     vl     vl     high   vh      vl      nb
vl     vl     low    high   vh      vl      nb
vl     low    high   high   vh      vl      nb
vl     vl     vl     vh     vh      vl      nb
vl     vl     low    vh     vh      vl      nb
vl     vl     vh     vh     vh      vl      nb
vl     low    vh     vh     vh      vl      nb
low    high   vh     vh     vh      vl      nb


Autonomous Mobile Robots: 1

Goal: goal-directed driving and collision avoidance

Special features:

• fuzzification of the sensor signals;

(b) Laser range finder


Autonomous Mobile Robots: 2

• fuzzy rules for realizing behaviour patterns (“behaviors”);

GO → SC 1 rule

OP → SC 4 rules

GO → TC 3 rules

“Far” OP → TC 2 rules

“Near” OP → TC 2 rules

“Very close” OP → TC 3 rules

where SC (“speed control”) and TC (“turn control”) are functions of GO (“goal orientation”) and OP (“obstacle proximity”).


Autonomous Mobile Robots: 3

• representation of the “goal-tracking” behaviour:


• on-board VLSI chip

→ All rules can be processed in 30 µs.


Reinforcement Learning

In every control cycle the robot receives both sensor data and a reinforcement signal; it then executes an action that changes its state.

Reinforcement learning lies between supervised learning and unsupervised learning.

The robot agent can also learn from a “delayed reinforcement” signal. In this case an action of the robot agent is also rewarded if it led to the goal only indirectly. This can be the case when the action had to be executed in order to make further actions towards the goal state possible.


Skill Acquisition by a Robot

Skill acquisition: “Improvement of motor or cognitive abilities through training. Reading an instruction manual provides only the initial knowledge, which must then be successively improved and refined.”

(Carbonell et al., 1983)

Illustration of the reinforcement learning problem:


Markov Decision Process (MDP)

An MDP is given by

• a set $S$ of discrete states,

• a set $A$ of possible actions,

• a reward function $r_t = r(s_t, a_t)$,

• a successor function $s_{t+1} = \delta(s_t, a_t)$.

The functions $r$ and $\delta$ are part of the environment and are not necessarily known to the agent.


Graph of a Markov Decision Process


The Problem of Incomplete State Information

One also speaks of hidden states.

Example of incomplete state information:

[Figure panels a), b), c), d)]


Execution of the MDP

At every time step $t$ the agent goes through the following steps:

1. Determine the current state $s_t$.

2. Choose an action $a_t$.

3. Execute $a_t$.

4. Receive the reward $r_t = r(s_t, a_t)$.

In response to $a_t$, the environment moves to a new state $s_{t+1} = \delta(s_t, a_t)$.
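The agent-environment cycle above can be sketched for a deterministic environment. Everything concrete here is a hypothetical toy example: a three-state corridor whose successor function `delta` and reward function `reward` are stored as dictionaries, with state 2 as an absorbing goal.

```python
def run_episode(policy, delta, reward, s0, steps):
    """Run the four-step MDP cycle for a fixed number of time steps."""
    s, rewards = s0, []
    for _ in range(steps):
        a = policy(s)                     # steps 1-2: observe s_t, choose a_t
        rewards.append(reward[(s, a)])    # steps 3-4: execute a_t, receive r_t
        s = delta[(s, a)]                 # the environment moves to s_{t+1}
    return s, rewards


# Toy corridor 0 - 1 - 2 (assumed for illustration only)
delta = {(0, "left"): 0, (0, "right"): 1,
         (1, "left"): 0, (1, "right"): 2,
         (2, "left"): 2, (2, "right"): 2}
reward = {state_action: 0.0 for state_action in delta}
reward[(1, "right")] = 1.0                # reward only for entering the goal state

final_state, rewards = run_episode(lambda s: "right", delta, reward, s0=0, steps=3)
print(final_state, rewards)               # 2 [0.0, 1.0, 0.0]
```

In this sketch the agent never reads `delta` or `reward` directly; it only observes states and rewards, which mirrors the statement that $r$ and $\delta$ belong to the environment.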


Policy

A function $\pi : S \to A$ is called a policy.

It represents a strategy by which the agent selects an action $a = \pi(s)$ in a given state $s$.

The task is to learn this function $\pi$.


Cumulative Value

The cumulative value $V^{\pi}(s_t)$ is the accumulated reward that the agent obtains when it starts from a state $s_t$ and follows a policy $\pi$.

There are different definitions of $V^{\pi}(s_t)$, which take future rewards into account in different ways.


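Definitions of $V^{\pi}(s_t)$

• “Discounted cumulative reward”: $V^{\pi}(s_t) = \sum_{i=0}^{\infty} \gamma^{i} r_{t+i}$

• “Finite horizon reward”: $V^{\pi}(s_t) = \sum_{i=0}^{h} r_{t+i}$

• “Average reward”: $V^{\pi}(s_t) = \lim_{h \to \infty} \frac{1}{h} \sum_{i=0}^{h} r_{t+i}$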
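For a recorded reward sequence the three definitions can be evaluated directly. This is a sketch under an obvious assumption: only finitely many rewards are available, so the infinite sum and the limit are approximated by the rewards at hand. The function names are illustrative.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma^i * r_{t+i} over the recorded rewards (truncated infinite sum)."""
    return sum(gamma ** i * r for i, r in enumerate(rewards))


def finite_horizon_return(rewards, h):
    """Sum of r_{t+i} for i = 0, ..., h."""
    return sum(rewards[: h + 1])


def average_reward(rewards):
    """Mean of the recorded rewards, a finite stand-in for the limit h -> infinity."""
    return sum(rewards) / len(rewards)


rewards = [1.0, 1.0, 1.0, 1.0]
print(discounted_return(rewards[:3], gamma=0.5))  # 1.75
print(finite_horizon_return(rewards, h=2))        # 3.0
print(average_reward(rewards))                    # 1.0
```

The discount factor $\gamma$ with $0 \le \gamma < 1$ keeps the infinite sum finite for bounded rewards and weights early rewards more strongly than late ones.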


Optimal Policy

A policy that maximizes $V^{\pi}(s)$ for all states $s$ is called the optimal policy $\pi^{*}$:

$$
\pi^{*} \equiv \arg\max_{\pi} V^{\pi}(s), \quad \forall s
$$

The cumulative value of an optimal policy is also denoted by $V^{*}(s)$:

$$
V^{*}(s) \equiv V^{\pi^{*}}(s)
$$


Learning the Optimal Policy

From the definition of $V^{\pi}(s_t)$

$$
V^{\pi}(s_t) = \sum_{i=0}^{\infty} \gamma^{i} r_{t+i}
$$

it follows immediately for $\pi^{*}(s)$:

$$
\pi^{*}(s) = \arg\max_{a} \bigl[ r(s, a) + \gamma V^{*}(\delta(s, a)) \bigr]
$$

That is, the optimal policy can be learned by learning $V^{*}$, provided that $r$ and $\delta$ are known.

But this is often not the case!
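In the favourable case where $r$, $\delta$ and $V^{*}$ are all available, the arg max can be evaluated by a one-step look-ahead. The sketch below uses a hypothetical three-state corridor and a hand-computed $V^{*}$ for $\gamma = 0.9$; all names and values are assumptions made for illustration.

```python
def greedy_action(s, actions, reward, delta, V, gamma):
    """pi*(s) = argmax_a [ r(s, a) + gamma * V*(delta(s, a)) ]."""
    return max(actions, key=lambda a: reward[(s, a)] + gamma * V[delta[(s, a)]])


# Toy corridor 0 - 1 - 2 with absorbing goal state 2 (assumed for illustration)
actions = ("left", "right")
delta = {(0, "left"): 0, (0, "right"): 1,
         (1, "left"): 0, (1, "right"): 2,
         (2, "left"): 2, (2, "right"): 2}
reward = {state_action: 0.0 for state_action in delta}
reward[(1, "right")] = 1.0

gamma = 0.9
V_star = {0: 0.9, 1: 1.0, 2: 0.0}   # V*(1) = 1, V*(0) = gamma * V*(1), V*(2) = 0

print([greedy_action(s, actions, reward, delta, V_star, gamma) for s in (0, 1)])
# ['right', 'right']
```

Without access to `reward` and `delta` this look-ahead cannot be computed, which is exactly the difficulty the last sentence above points to.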


Model-Based or Model-Free?

Model-based reinforcement learning:

e.g. with dynamic programming.

Comparison with A* search.

Application example: e.g. collision-free path planning with a known representation of the environment.

Model-free reinforcement learning:

$r$ and $\delta$ are unknown.

⇒ Q-learning
