TRANSCRIPT
An Early Warning System For Ambient Assisted Living
Andrea Monacchi
School of Computer Science, Reykjavik University
Menntavegur 1, IS-101, Iceland
June 4th, 2012
Index
1 Introduction
2 Background
3 Related Work
4 Approach
5 Implementation
6 Evaluating the solution
7 Conclusions
Introduction Background Related Work Approach Implementation Evaluating the solution Conclusions
Motivation
Life expectancy increased significantly → more and more elderly people
Many elderly people live on their own.
may be affected by a cognitive or physical impairment
may need assistance to ensure their health, safety and well-being
Assistive technologies help reduce the costs of dedicated caregivers.
Motivation
Daily life activities at home can generate dangers that may lead to accidents.
People with impairments find it difficult to notice such situations.
Discovering dangers and warning users is important for preventing accidents.
Problem
Problem statement
Monitoring world changes:
being aware of the current context
predicting intentions leading to dangers
“Let’s suppose we have a way to recognize the current state and user’s goal.”
An early warning system is about:
Finding a safe path leading to the goal (i.e. simulating the user)
Disclosing dangers close to the user
Preventing dangers by alerting the user beforehand
Problem
One day in the future
It is like taking a look at the future to improve the present.
“Here is the thing about the future. Every time you look at it, it changes, because you looked at it, and that changes everything else.” (from the movie Next)
Research statement
Design a system that:
Gets a representation of the environment as input
Learns to evaluate states according to their danger level
Explores/interacts with the environment model
Stores its experience
Guides and alerts the user to prevent potential dangers
We need to:
Represent the environment in terms of properties
Implement a decision maker that evaluates the danger level
Evaluate the effectiveness of the system
Context-aware computing
Context is the way to produce unobtrusive systems.
Situational information (environment, user, ICT)
Understanding the human intent in order to act properly
Reducing interaction and disappearing into the environment
Context adaptation: (adaptive systems)
Planning agents
Machine learning agents
learning the user's preferences
tailored and adaptive service
Context prediction: (proactive systems)
Anticipating future contexts
Proactive adaptation of services
e.g. heating based on next activity
Specifying dynamical systems
Knowledge representation
e.g. Situation calculus, Event calculus, Fluent calculus
The Game Description Language
First order logic and purely axiomatic language
Deterministic and fully observable games (I), imperfect information (II)
Games as state machines
State: set of fluents (holding properties)
Each Player selects an action to modify the global state
Specification of multiagent societies as games
Declarative language: agents learn to behave from the rules
Specifying dynamical systems
GDL relations: an example
role(?r)       ?r is a player
init(?f)       ?f holds in the initial position
true(?f)       ?f holds in the current position
legal(?r,?m)   role ?r can perform the move ?m
does(?r,?m)    role ?r does move ?m
next(?f)       ?f holds in the next position
terminal       the state is terminal
goal(?r,?v)    role ?r gets the reward ?v
sees(?r,?p)    role ?r perceives ?p in the next turn
random         the random player

(role x) (role o)
(init (cell 1 1 b))
(⇐ (legal ?player (mark ?x ?y))
   (true (cell ?x ?y b))
   (true (control ?player)))
(⇐ (next (cell ?x ?y ?player))
   (does ?player (mark ?x ?y)))
(⇐ (goal ?player 100)
   (line ?player))
(⇐ terminal
   (role ?player)
   (line ?player))
(⇐ terminal
   (not open))
Specifying dynamical systems
Learning to make complex decisions
Decision making: making a choice among several alternatives.
The real environment is stochastic.
Acting may imply unexpected effects
The same behaviour may yield different scores
Deterministic planners (e.g. online replanning) may not be enough
Various solutions:
Markov Decision Processes (MDPs): MDP = (S, A, R, T)
Partially Observable MDPs: sensor model over (belief) states
Computing a policy:
Dynamic programming → complete transition model
Optimization methods → search for a policy
Reinforcement Learning
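The dynamic-programming option above can be illustrated with a minimal value-iteration sketch over a complete transition model T; the state names and dictionaries below are toy assumptions, not part of the thesis implementation.

```python
# Minimal value iteration for an MDP (S, A, R, T), assuming a complete
# transition model. T maps (state, action) to a list of (prob, successor);
# R maps (state, action, successor) to a reward. Toy sketch only.

def value_iteration(states, actions, T, R, gamma=0.95, theta=1e-6):
    V = {s: 0.0 for s in states}            # state-value estimates
    while True:
        delta = 0.0
        for s in states:
            if not actions(s):              # terminal state: value stays 0
                continue
            best = max(
                sum(p * (R.get((s, a, s2), 0.0) + gamma * V[s2])
                    for p, s2 in T[(s, a)])
                for a in actions(s)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:                   # converged
            return V
```

On a two-state toy chain where action go moves deterministically from s0 to a terminal goal with reward 1, this returns V(s0) = 1.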
Related work
RL-GGP: integrating GGP and reinforcement learning (Jocular+RL-Glue)
Assisted living with MDPs: a handwashing tutoring system
Prompting aids
Minimizing intrusiveness and maximizing completed handwashing
Questionnaire for system-caregiver comparison
Control tasks
Smart light control (e.g. MAVHOME)
Energy saving
Maximizing comfort and minimizing interaction
Visual and audio cues to notify dangers beforehand
Rule-based risk assessment
User study to understand how users perceive and react to notifications
Modeling a domestic environment
Simulating user’s behaviour
Classification of people’s actions in domestic environments:
Action                           Examples
Position changes                 left, right, forward and backward
Manipulation of passive objects  take an apple; hold an apple; release the apple
Interaction with active objects  switch a stove on/off; open/close a cupboard
Table: Actions in a domestic context
Game Description Language for modeling the domestic setting.
Using tools from the General Game Playing context
Leading expertise of Reykjavik University
Designing an early warning system
Guiding the user
Planning problem: finding a path of actions leading to the goal
Search driven by danger level: the path must avoid dangers
Environment is stochastic → Deterministic planners may not be enough
Probabilistic planning by means of MDPs and POMDPs
The solution is a behaviour/policy covering each state
Designing an early warning system
Computing a policy
TD learning → model-free algorithms (no prior knowledge of the environment model required)
Q-learning
Off-policy method
Action selection can be guided by a pseudorandom strategy (e.g. ε-greedy)
Thus more flexible, but less realistic and slower than on-policy methods
General way to perform planning in stochastic environments
Storing experience:
Tabular version (e.g. hash table)
Knowledge stored as entries (state, action) → value
Requires filling every entry → infeasible for large state spaces
Function approximator: allows the learner to generalize from experience
linear (e.g. weighted sum of features)
non-linear (e.g. neural network)
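A tabular Q-learning agent of the kind described above can be sketched as follows; the Corridor environment is a hypothetical toy stand-in for the game model, and the update is the standard off-policy rule with ε-greedy selection.

```python
import random

class Corridor:
    """Toy deterministic chain 0..3; reaching state 3 yields reward 1."""
    def reset(self):
        return 0
    def actions(self, s):
        return [] if s == 3 else ["left", "right"]
    def step(self, s, a):
        s2 = min(s + 1, 3) if a == "right" else max(s - 1, 0)
        return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

def q_learning(env, episodes=1000, alpha=0.2, gamma=0.95, epsilon=0.2):
    Q = {}                                   # (state, action) -> value
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            acts = env.actions(s)
            if random.random() < epsilon:    # explore
                a = random.choice(acts)
            else:                            # exploit current estimates
                a = max(acts, key=lambda x: Q.get((s, x), 0.0))
            s2, r, done = env.step(s, a)
            # off-policy: bootstrap on the best successor action
            best_next = max((Q.get((s2, a2), 0.0) for a2 in env.actions(s2)),
                            default=0.0)
            q = Q.get((s, a), 0.0)
            Q[(s, a)] = q + alpha * (r + gamma * best_next - q)
            s = s2
    return Q
```

After training, the greedy policy in each state prefers right, the shortest path to the goal.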
Designing an early warning system
Warning the user
Monitoring user’s sphere of protection
Finding dangerous states within sphere
Alerting the user when too close
Distance = number of actions to the risk
First action of each risky sequence
Variant of breadth-first search → limited depth
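The depth-limited breadth-first search sketched below illustrates this monitoring step; successors and is_danger are hypothetical callbacks standing in for the game-model reasoner.

```python
from collections import deque

def dangers_within(start, successors, is_danger, max_depth):
    """Find dangerous states at most max_depth actions from `start`,
    returning (state, distance, first action of the risky sequence)."""
    found = []
    visited = {start}
    queue = deque([(start, 0, None)])        # (state, depth, first action)
    while queue:
        s, depth, first = queue.popleft()
        if is_danger(s):
            found.append((s, depth, first))
            continue                          # stop expanding past a danger
        if depth == max_depth:
            continue                          # sphere-of-protection boundary
        for action, s2 in successors(s):
            if s2 not in visited:
                visited.add(s2)
                queue.append((s2, depth + 1, first if first else action))
    return found
```

Reporting the first action of each risky sequence, as above, lets the system warn the user about the very next step that moves toward a danger.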
Practical reasoning with GDL
An overview of the system
The system consists of:
Practical reasoning with GDL
Game dynamics as a state machine
Automatic reasoning tool: the General Game Playing Base package
Language modifications:
goal → reward
danger relation
appliance and object relations to enable certain rules
other agents as roles (e.g. telephone ringing)
Implementing a warning agent
QBox
Plenty of libraries and frameworks
However:
Need for a customizable tool
Simple implementation and learning experience
QBox library: TD(0), Q(0), Watkins's Q(λ), SARSA
The QBox logo
The QBox organization
Implementing a warning agent
The warning agent
Warning process: running an episode
Tabular Q(λ) agent + depth-limited breadth-first search
Experience stored in the brain is used to evaluate and guide the actual user's behaviour
The system returns:
Last action evaluation
Best action
Danger level
Action to avoid
Implementing a warning agent
The user interface
Testing the system:
Providing awareness of the current state
showing a view
using visual indicators
Simulating particular situations
Solution:
Virtual environments for simulating smart environments
Rapid prototyping technique in HCI
Flexible, fast and cheap
jMonkey engine
The GUI during a simulation
Implementing a warning agent
Evaluating the system: environment
The optimal policy is specified by going through the state space
Deviation increases for unexplored states and wrong orders
1 Experiment = 20 policies trained for 200 episodes
Results reported as charts (jFreeChart library)
AvgDev = (Σ_{k=1}^{N} dev_k) / N
ExpDev(%) = (AvgDev / AN) × 100
Acc(%) = 100 − ExpDev
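As a small sketch, the metrics can be computed directly; this assumes dev_k is the deviation of the k-th trained policy from the hand-specified optimal one and AN the number of actions considered, which is an interpretation of the formulas above.

```python
def accuracy(devs, an):
    """Acc(%) = 100 - ExpDev, with ExpDev(%) = (AvgDev / AN) * 100
    and AvgDev the mean of the per-policy deviations dev_k."""
    avg_dev = sum(devs) / len(devs)
    exp_dev = (avg_dev / an) * 100.0
    return 100.0 - exp_dev
```

For example, policies with an average deviation of 2 over AN = 10 actions give 80% accuracy.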
Implementing a warning agent
Evaluating the system: scenario
Domestic scenario as testing environment
User’s goal: cooking - using the pot and the stove
Danger: a flammable cleaning product
Implementing a warning agent
Evaluating the system: exploring the state space
Exploration of the state space: ε = 0.1, 0.3, 0.5, 0.7, 0.9, with exponential decay 0.9999.
Parameter              Value
α (learning rate)      0.2
α-decay                0.8
α-decay type           exponential (ensures convergence)
γ (discount factor)    0.95
λ (decay rate)         0.9
Implementing a warning agent
Evaluating the system: defining rewards
Rewards determine the system behaviour
A difficult task
May produce cycles in the policy
Main behaviours:
Take the bottle away from the danger (danger matters)
Stove set on without the pot (goal matters)
No-danger/Goal  Danger/No-Goal  No-danger/No-goal  Danger/Goal  Accuracy
-1.0            1.0             -0.01              0.0          44.09%
 0.0            1.0             -0.01              0.0          39.63%
 1.0            0.0             -0.01              0.0          71.01%
 1.0            -1.0            -0.01              0.0          84.35%
 0.0            -1.0            -0.01              0.0          75.88%
Table: Results for different reward functions
Results may be improved by increasing exploration
Implementing a warning agent
Assessing the interaction with users
Conclusions
The system is able to prevent users from getting too close to dangers
General solution: GDL definitions
Danger is evaluated automatically
Indicators report suggestions and warning notifications to users
Future work
Future work: learning to intervene
Need for a dynamic threshold to decide whether to intervene
Adapting to different preferences and awareness faculties
System trained by the end user accepting or rejecting the intervention
Tailored service
Lack of generality
Requires interaction with actual users
Future work
Implementing a function approximator and/or tile coding to scale the solution
Exploiting hierarchical approaches
Assigning rewards through apprenticeship learning
Taking habits into account for the exploration
Learning to intervene to minimize discomfort
Speeding up the reasoning process using FPGAs
Using virtual environments as time machines for simulating future events
Questions
Thanks for your attention.
“An early warning system for Ambient Assisted Living”
Andrea Monacchi
[email protected]
http://andreamonacchi.tk