TRANSCRIPT
An Early Warning System For Ambient Assisted Living
Andrea Monacchi
School of Computer Science, Reykjavik University
Menntavegur 1, IS-101, Iceland
June 4th, 2012
Index
1 Introduction
2 Background
3 Related Work
4 Approach
5 Implementation
6 Evaluating the solution
7 Conclusions
Introduction Background Related Work Approach Implementation Evaluating the solution Conclusions
Motivation
Life expectancy increased significantly → more and more elderly people
Many elderly people live on their own.
may be affected by a cognitive or physical impairment
may need assistance to ensure their health, safety and well-being
Assistive technologies help reduce the costs of dedicated caregivers.
Motivation
Daily life activities at home can generate dangers that may lead to accidents.
People with impairments find it difficult to notice such situations.
Discovering dangers and warning users is important for preventing accidents.
Problem
Problem statement
Monitoring world changes:
being aware of the current context
predicting intentions leading to dangers
“Let’s suppose we have a way to recognize the current state and user’s goal.”
An early warning system is about:
Finding a safe path leading to the goal (i.e. simulating the user)
Disclosing dangers close to the user
Preventing dangers by alerting the user beforehand
Problem
One day in the future
It is like taking a look at the future to improve the present.
“Here is the thing about the future. Every time you look at it, it changes, because you looked at it, and that changes everything else.” (from the movie Next)
Research statement
Design a system that:
Gets a representation of the environment as input
Learns to evaluate states according to their danger level
Explores/interacts with the environment model
Stores its experience
Guides and alerts the user to prevent potential dangers
We need to:
Represent the environment in terms of properties
Implement a decision maker that evaluates the danger level
Evaluate the effectiveness of the system
Context-aware computing
Context is the way to produce unobtrusive systems.
Situational information (environment, user, ICT)
Understanding the human intent in order to act properly
Reducing interaction and disappearing into the environment
Context adaptation: (adaptive systems)
Planning agents
Machine learning agents
learning the user's preferences
tailored and adaptive service
Context prediction: (proactive systems)
Anticipating future contexts
Proactive adaptation of services
e.g. heating based on next activity
Specifying dynamical systems
Knowledge representation
e.g. Situation calculus, Event calculus, Fluent calculus
The Game Description Language
First order logic and purely axiomatic language
Deterministic and fully observable games (I), imperfect information (II)
Games as state machines
State: set of fluents (holding properties)
Each Player selects an action to modify the global state
Specification of multiagent societies as games
Declarative language: agents learn to behave from the rules
Specifying dynamical systems
GDL relations: an example
role(?r)       ?r is a player
init(?f)       ?f holds in the initial position
true(?f)       ?f holds in the current position
legal(?r,?m)   role ?r can perform the move ?m
does(?r,?m)    role ?r does move ?m
next(?f)       ?f holds in the next position
terminal       the state is terminal
goal(?r,?v)    role ?r gets the reward ?v
sees(?r,?p)    role ?r perceives ?p in the next turn
random         the random player

(role x) (role o)
(init (cell 1 1 b))
(⇐ (legal ?player (mark ?x ?y))
   (true (cell ?x ?y b))
   (true (control ?player)))
(⇐ (next (cell ?x ?y ?player))
   (does ?player (mark ?x ?y)))
(⇐ (goal ?player 100)
   (line ?player))
(⇐ terminal
   (role ?player)
   (line ?player))
(⇐ terminal
   (not open))
Specifying dynamical systems
Learning to make complex decisions
Decision making: making a choice among several alternatives.
The real environment is stochastic.
Acting may imply unexpected effects
The same behaviour may yield different scores
Deterministic planners (e.g. online replanning) may not be enough
Various solutions:
Markov Decision Processes (MDPs): MDP = (S, A, R, T)
Partially Observable MDPs: sensor model over (belief) states
Computing a policy:
Dynamic programming → complete transition model
Optimization methods → search for a policy
Reinforcement Learning
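The dynamic-programming option above can be illustrated with a minimal value-iteration sketch over a complete transition model T; the state names and dictionaries below are toy assumptions, not part of the thesis implementation.

```python
# Minimal value iteration for an MDP (S, A, R, T), assuming a complete
# transition model. T maps (state, action) to a list of (prob, successor);
# R maps (state, action, successor) to a reward. Toy sketch only.

def value_iteration(states, actions, T, R, gamma=0.95, theta=1e-6):
    V = {s: 0.0 for s in states}            # state-value estimates
    while True:
        delta = 0.0
        for s in states:
            if not actions(s):              # terminal state: value stays 0
                continue
            best = max(
                sum(p * (R.get((s, a, s2), 0.0) + gamma * V[s2])
                    for p, s2 in T[(s, a)])
                for a in actions(s)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:                   # converged
            return V
```

On a two-state toy chain where action go moves deterministically from s0 to a terminal goal with reward 1, this returns V(s0) = 1.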
Related work
RL-GGP: integrating GGP and reinforcement learning (Jocular+RL-Glue)
Assisted living with MDPs: a handwashing tutoring system
Prompting aids
Minimizing intrusiveness and maximizing completed handwashing
Questionnaire for system-caregiver comparison
Control tasks
Smart light control (e.g. MAVHOME)
Energy saving
Maximizing comfort and minimizing interaction
Visual and audio cues to notify dangers beforehand
Rule-based risk assessment
User study to understand how users perceive and react to notifications
Modeling a domestic environment
Simulating user’s behaviour
Classification of people’s actions in domestic environments:
Action                           Examples
Position changes                 left, right, forward and backward
Manipulation of passive objects  take an apple; hold an apple; release the apple
Interaction with active objects  switch a stove on/off; open/close a cupboard
Table: Actions in a domestic context
Game Description Language for modeling the domestic setting.
Using tools from the General Game Playing context
Leading expertise of Reykjavik University
Designing an early warning system
Guiding the user
Planning problem: finding a path of actions leading to the goal
Search driven by danger level: the path must avoid dangers
Environment is stochastic → Deterministic planners may not be enough
Probabilistic planning by means of MDPs and POMDPs
The solution is a behaviour/policy covering each state
Designing an early warning system
Computing a policy
TD learning → model-free algorithms (no prior knowledge of the environment model required)
Q-learning
Off-policy method
Action selection can be guided by a pseudorandom strategy (e.g. ε-greedy)
Thus more flexible, but less realistic and slower than on-policy methods
General way to perform planning in stochastic environments
Storing experience:
Tabular version (e.g. hash table)
Knowledge stored as entries (state, action) → value
Requires filling every entry → infeasible for large state spaces
Function approximator: allows the learner to generalize from experience
linear (e.g. weighted sum of features)
non-linear (e.g. neural network)
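A tabular Q-learning agent of the kind described above can be sketched as follows; the Corridor environment is a hypothetical toy stand-in for the game model, and the update is the standard off-policy rule with ε-greedy selection.

```python
import random

class Corridor:
    """Toy deterministic chain 0..3; reaching state 3 yields reward 1."""
    def reset(self):
        return 0
    def actions(self, s):
        return [] if s == 3 else ["left", "right"]
    def step(self, s, a):
        s2 = min(s + 1, 3) if a == "right" else max(s - 1, 0)
        return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

def q_learning(env, episodes=1000, alpha=0.2, gamma=0.95, epsilon=0.2):
    Q = {}                                   # (state, action) -> value
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            acts = env.actions(s)
            if random.random() < epsilon:    # explore
                a = random.choice(acts)
            else:                            # exploit current estimates
                a = max(acts, key=lambda x: Q.get((s, x), 0.0))
            s2, r, done = env.step(s, a)
            # off-policy: bootstrap on the best successor action
            best_next = max((Q.get((s2, a2), 0.0) for a2 in env.actions(s2)),
                            default=0.0)
            q = Q.get((s, a), 0.0)
            Q[(s, a)] = q + alpha * (r + gamma * best_next - q)
            s = s2
    return Q
```

After training, the greedy policy in each state prefers right, the shortest path to the goal.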
Designing an early warning system
Warning the user
Monitoring user’s sphere of protection
Finding dangerous states within sphere
Alerting the user when too close
Distance = number of actions to the risk
First action of each risky sequence
Variant of breadth-first search → limited depth
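The depth-limited breadth-first search sketched below illustrates this monitoring step; successors and is_danger are hypothetical callbacks standing in for the game-model reasoner.

```python
from collections import deque

def dangers_within(start, successors, is_danger, max_depth):
    """Find dangerous states at most max_depth actions from `start`,
    returning (state, distance, first action of the risky sequence)."""
    found = []
    visited = {start}
    queue = deque([(start, 0, None)])        # (state, depth, first action)
    while queue:
        s, depth, first = queue.popleft()
        if is_danger(s):
            found.append((s, depth, first))
            continue                          # stop expanding past a danger
        if depth == max_depth:
            continue                          # sphere-of-protection boundary
        for action, s2 in successors(s):
            if s2 not in visited:
                visited.add(s2)
                queue.append((s2, depth + 1, first if first else action))
    return found
```

Reporting the first action of each risky sequence, as above, lets the system warn the user about the very next step that moves toward a danger.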
Practical reasoning with GDL
An overview of the system
The system consists of:
Practical reasoning with GDL
Game dynamics as a state machine
Automatic reasoning tool: the General Game Playing Base package
Language modifications:
goal → reward
danger relation
appliance and object relations to enable certain rules
other agents as roles (e.g. telephone ringing)
Implementing a warning agent
QBox
Plenty of libraries and frameworks
However:
Need for a customizable tool
Simple implementation and learning experience
QBox library: TD(0), Q(0), Watkins's Q(λ), SARSA
The QBox logo
The QBox organization
Implementing a warning agent
The warning agent
Warning process: running an episode
Tabular Q(λ) agent + depth-limited breadth-first search
Experience stored in the brain is used to evaluate and guide the actual user's behaviour
The system returns:
Last action evaluation
Best action
Danger level
Action to avoid
Implementing a warning agent
The user interface
Testing the system:
Providing awareness of the current state
showing a view
using visual indicators
Simulating particular situations
Solution:
Virtual environments for simulating smart environments
Rapid prototyping technique in HCI
Flexible, fast and cheap
jMonkey engine
The GUI during a simulation
Implementing a warning agent
Evaluating the system: environment
The optimal policy is specified by going through the state space
Deviation increases for unexplored states and wrong orders
1 Experiment = 20 policies trained for 200 episodes
Results reported as charts (jFreeChart library)
AvgDev = (Σ_{k=1}^{N} dev_k) / N
ExpDev(%) = (AvgDev / AN) × 100
Acc(%) = 100 − ExpDev
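As a small sketch, the metrics can be computed directly; this assumes dev_k is the deviation of the k-th trained policy from the hand-specified optimal one and AN the number of actions considered, which is an interpretation of the formulas above.

```python
def accuracy(devs, an):
    """Acc(%) = 100 - ExpDev, with ExpDev(%) = (AvgDev / AN) * 100
    and AvgDev the mean of the per-policy deviations dev_k."""
    avg_dev = sum(devs) / len(devs)
    exp_dev = (avg_dev / an) * 100.0
    return 100.0 - exp_dev
```

For example, policies with an average deviation of 2 over AN = 10 actions give 80% accuracy.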
Implementing a warning agent
Evaluating the system: scenario
Domestic scenario as testing environment
User’s goal: cooking - using the pot and the stove
Danger: a flammable cleaning product
Implementing a warning agent
Evaluating the system: exploring the state space
Exploration of the state space: ε = 0.1, 0.3, 0.5, 0.7, 0.9, with exponential decay 0.9999.
Parameter              Value
α (learning rate)      0.2
α-decay                0.8
α-decay type           exponential (ensures convergence)
γ (discount factor)    0.95
λ (decay rate)         0.9
Implementing a warning agent
Evaluating the system: defining rewards
Rewards determine the system behaviour
A difficult task
May produce cycles in the policy
Main behaviours:
Take the bottle away from the danger (danger matters)
Stove set on without the pot (goal matters)
No-danger/Goal  Danger/No-Goal  No-danger/No-goal  Danger/Goal  Accuracy
-1.0            1.0             -0.01              0.0          44.09%
 0.0            1.0             -0.01              0.0          39.63%
 1.0            0.0             -0.01              0.0          71.01%
 1.0            -1.0            -0.01              0.0          84.35%
 0.0            -1.0            -0.01              0.0          75.88%
Table: Results for different reward functions
Results may be improved by increasing exploration
Implementing a warning agent
Assessing the interaction with users
Conclusions
The system is able to prevent users from getting too close to dangers
General solution: GDL definitions
Danger is evaluated automatically
Indicators report suggestions and warning notifications to users
Future work
Future work: learning to intervene
Need for a dynamic threshold to decide whether to intervene
Adapting to different preferences and awareness faculties
System trained by the end user accepting or rejecting the intervention
Tailored service
Lack of generality
Requires interaction with actual users
Future work
Implementing a function approximator and/or tile coding to scale the solution
Exploiting hierarchical approaches
Assigning rewards through apprenticeship learning
Taking habits into account for the exploration
Learning to intervene to minimize discomfort
Speeding up the reasoning process using FPGAs
Using virtual environments as time machines for simulating future events
Questions
Thanks for your attention.
“An early warning system for Ambient Assisted Living”
Andrea Monacchi
[email protected]
http://andreamonacchi.tk