ol strategies learning human contr hcs model

16
Learning human control strategies system state human control environment uk () z 1 z 1 z 1 z 1 z 1 uk 1 ( ) uk n u 1 + ( ) xk () xk 1 ( ) xk n x 1 + ( ) zk () uk 1 + ( ) HCS Model environment model control system state

Upload: others

Post on 07-Feb-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Lear

ning

hum

an c

ontr

ol s

trat

egie

s

syst

em s

tate

hum

an c

ontr

ol

envi

ronm

ent

uk

()

z

1

z

1

z

1

z

1

z

1

uk

1

()

uk

n

u

1

+

()

xk

()

xk

1

()

xk

n

x

1

+

()

zk

()

uk

1

+

()

HCS Model

envi

ronm

ent

mod

el c

ontr

olsy

stem

sta

te

Human control strategy

Dynamic, reaction

skill, with reasonably well-defined I/O representation

• Control gains

• Control structure/approach

• Interaction with environment/surrounding

• No abstract or high-level reasoning

• Example: human driving

0 50 100 150 200 250 300-8000

-6000

-4000

-2000

0

2000

4000

0 50 100 150 200 250 300

App

lied

forc

e (N

)

Time (sec)Time (sec)

Stan Oliver

Learning human control strategy

• Difficult to model human control strategy

analytically

.

• Unknown individual controller properties

• Structure

• Order

• Human control strategy modeling complications

• Dynamic

• Stochastic — errors and inconsistencies

• Possibly discontinuous

• Possibly nonlinear

• Continuous learning approach: neural networks

HC

S m

odel

exa

mpl

e:

Ck

time

time

velocity lateral offset steering applied force

Gro

ucho

’s c

ontr

olC

k m

odel

con

trol

010

020

030

040

050

060

0405060708090

010

020

030

040

050

060

0

-4-2024

010

020

030

040

050

060

0-0

.2

-0.10

0.1

0.2

010

020

030

040

050

060

0-8

000

-600

0

-400

0

-200

00

2000

4000

010

020

030

040

050

060

0405060708090

010

020

030

040

050

060

0

-4-2024

010

020

030

040

050

060

0-0

.2

-0.10

0.1

0.2

010

020

030

040

050

060

0-8

000

-600

0

-400

0

-200

00

2000

4000

HC

S m

odel

exa

mpl

e #2

:

Ck

time

time

velocity lateral offset steering applied force

Har

po’s

con

trol

Ck

mod

el c

ontr

ol

010

020

030

040

050

060

070

0405060708090

010

020

030

040

050

060

070

0

-4-2024

010

020

030

040

050

060

070

0-0

.2

-0.10

0.1

0.2

010

020

030

040

050

060

070

0-8

000

-600

0

-400

0

-200

00

2000

4000

010

020

030

040

050

060

0405060708090

010

020

030

040

050

060

0

-4-2024

010

020

030

040

050

060

0-0

.2

-0.10

0.1

0.2

010

020

030

040

050

060

0-8

000

-600

0

-400

0

-200

00

2000

4000

Additional hidden units in

neural network

0 100 200 300 400 500 600 700-6000

-4000

-2000

0

2000

4000

0 100 200 300 400 500 600-8000

-6000

-4000

-2000

0

2000

4000

time

appl

ied

forc

eap

plie

d fo

rce

time

linear model

one additional hidden unit

Need for discontinuous learning

Cq

,

Ck

models are convergent (stable) strategies, but are not “similar.”

• In reduced dimensional plot, “switches” overlap “nonswitches.”

• Continuous neural network may have difficulty modeling discontinuous (e.g. switching) control strategies.

-3 -2 -1 0 1 2 3

0

1

2

3

4

-3 -2 -1 0 1 2 3

0

1

2

3

1st PC2n

d P

C1st PC

2nd

PC

Current command = brake Current command = gas

Hybrid continuous/discontinuous control

Continuous cascade network steering control

Signal-to-symbol

conversion

Road description

Stochastic action choice

Discontinuous HMM acceleration control

r

x

r

y

,{ }

λ

1

λ

2

λ

N

O

P O

∗ λ

1

( )

P A

i

O

∗( )

P O

∗ λ

2

( )

P O

∗ λ

N

( ) δ

φ

v

ξ

v

η

ω δ φ, , , ,{ }

v

ξ

v

η

ω δ φ, , , ,{ }

r

x

r

y

,{

}

O

O

P A

i

( )

Actions

• Applied force > 0 (gas is currently active):

• : do nothing.

• : increase applied force.

• : decrease applied force, but keep > 0.

• : switch to braking

• Applied force < 0 (brake is currently active):

• : do nothing

• : increase braking force

• : decrease braking force

• : switch to accelerator

A

1

A

2

A

3

A

4

A

5

A

6

A

7

A

8

Statistical model and priors

• Single-state HMMs (encode arbitrary statistical distribution of inputs).

• Priors: frequency of occurrence for action in human data set

P O

A

i

( )

P O

∗ λ

i

( )≡

A

i

P A

i

O

∗( )

P O

∗ λ

i

( )

P A

i

( )∝

HC

S m

odel

exa

mpl

e: h

ybri

d

time

time

velocity lateral offset steering applied force

Gro

ucho

’s c

ontr

olhy

brid

mod

el c

ontr

ol

010

020

030

040

050

060

0405060708090

010

020

030

040

050

060

0

-4-2024

010

020

030

040

050

060

0-0

.2

-0.10

0.1

0.2

010

020

030

040

050

060

0-8

000

-600

0

-400

0

-200

00

2000

4000

010

020

030

040

050

060

0405060708090

010

020

030

040

050

060

0

-4-2024

010

020

030

040

050

060

0-0

.2

-0.10

0.1

0.2

010

020

030

040

050

060

0-8

000

-600

0

-400

0

-200

00

2000

4000

HC

S m

odel

exa

mpl

e #2

: hyb

rid

time

time

velocity lateral offset steering applied force

Har

po’s

con

trol

hybr

id m

odel

con

trol

010

020

030

040

050

060

070

0405060708090

010

020

030

040

050

060

070

0

-4-2024

010

020

030

040

050

060

070

0-0

.2

-0.10

0.1

0.2

010

020

030

040

050

060

070

0-8

000

-600

0

-400

0

-200

00

2000

4000

010

020

030

040

050

060

070

0405060708090

010

020

030

040

050

060

070

0

-4-2024

010

020

030

040

050

060

070

0-0

.2

-0.10

0.1

0.2

010

020

030

040

050

060

070

0-8

000

-600

0

-400

0

-200

00

2000

4000

A c

lose

r lo

ok

200

210

220

230

240

-800

0

-600

0

-400

0

-200

00

2000

4000

PA

1

O

()

PA

2

O

()

PA

4

O

()

PA

5

O

()

PA

6

O

()

PA

8

O

()

t

sec

()

φ

N

()

switc

h (g

as to

bra

ke)

switc

h (b

rake

to g

as)

Sample Groucho turning maneuver

0 2 4 6 8 10 12

60

65

70

75

80

85

0 2 4 6 8 10 12

-4

-2

0

2

4

0 2 4 6 8 10 12-0.2

-0.1

0

0.1

0.2

0 2 4 6 8 10 12-8000

-6000

-4000

-2000

0

2000

4000

v

(mi/h

)

δ

(rad

)

φ

(N)

d

ξ

(m)

t (sec)t (sec)

t (sec) t (sec)

Human-to-model validation

• Quantify qualitative observations

• Hybrid model exhibits greater similarity than either of the purely NN-based modeling approaches.

Larry Moe Curly Harpo0

0.1

0.2

0.3

0.4

0.5

Hybrid model

Ck model

Cq model

sim

ilari

ty

Model-to-human classification

Table 1: Hybrid model-to-human matching

Larry Moe Groucho Harpo

Larry’s model

0.450

0.329 0.315 0.069

Moe’s model

0.126

0.555

0.338 0.217

Groucho’s model

0.152 0.377

0.457

0.206

Harpo’s model

0.013 0.134 0.127

0.578

Table 2:

Ck

model-to-human matching

Larry Moe Groucho Harpo

Larry’s model

0.161

0.166

0.118 0.157

Moe’s model

0.056

0.088

0.063 0.041

Groucho’s model

0.056 0.066

0.096

0.040

Harpo’s model

0.006 0.008

0.012

0.003

σ

σ