Cheap User Modeling for Adaptive Systems
Presented by: Frank Hines
Topics in CS, Spring 2011
Primary reference:Orwant, J. (1996). For want of a bit the user was lost: Cheap user modeling. IBM Systems Journal, 35(3,4), 398-416.
Limitless Information
100s of channels
One Size Fits All?
“People have limited cognitive time they want to spend on picking a movie.”
- Reed Hastings, CEO, Netflix
Information Overload!
Paradox of choice:
Increased dissatisfaction
Increased fatigue
Increased anxiety
Lowered productivity
Lowered concentration
Lowered quality
[Chart: Information Overload; decision-making quality vs. amount of information considered]
Can We Limit Ourselves to the Most Relevant Info?
[Diagram: Content {a,b,c,d,e,f} flows through Processing & Filtering, guided by User Models {U1, U2} that a Learning Toolbox builds from Sensors, yielding Presentation U1 {a,c,d,f} and Presentation U2 {b,c,e}]
User modeling is NOT strictly content filtering!
• Timing/Performance
• Prioritization
• Formatting
Doppelgänger
Overview
What is meant by adaptation?
What is a user model?
What can we predict?
Just how predictable are we?
Adaptation
Adaptation is a sign of intelligence
Adaptation in nature
Current Software
Usability vs. Personalization
Commonalities
Differences
Adaptation in Software
Newsmap
Jadeite
“One of the worst software design blunders in the annals of computing” – Smithsonian Magazine
Adaptation in Conversation
Human-Human interaction (discourse)
Human-Computer interaction
• Vocabulary (age)
• Speech volume (noise)
• Speech rate (time pressure)
• Syntactic structure (cultural affiliation)
• Topic (interests, knowledge)
Models
“The sciences do not try to explain, they hardly even try to interpret, they mainly make models.”
- John von Neumann
User Model
Models typically include:
Knowledge
Beliefs
Goals
Plans
Schedules
Behaviors
Abilities
Preferences
Framework to “simulate” a user & predict that user’s actions
A mathematical relationship among variables
NOT necessarily a cognitive representation
GRUNDY (Rich, 1979) - book recommendations from personality traits
What can we predict?
Events
Interest
Location
Behavior
How can we predict an event?
[Chart: Reading Times; observations plotted against time of day (24-hr clock: 0, 6, 12, 18, 24)]
f(n-1), f(n-1, n-2), …, f(n-1, n-2, …, n-j)
Linear Prediction
Discrete time series
Predicts future values from linear function of past values
Canonical example: tidal activity
Other examples:
Sunspots
Speech processing
Stock prices
Branch prediction
Oil detection
Given the past observations $\{s[n-1], s[n-2], \ldots, s[n-j]\}$, predict
$s[n] = f(s[n-1], s[n-2], \ldots, s[n-j])$
Linear Prediction
1. Compute the autocorrelation vector R:
$R(L) = \frac{1}{N} \sum_{n=0}^{N-1-L} s_n w_n \, s_{n+L} w_{n+L}$
2. Compute the predictor coefficients $a_k$ from the normal equations:
$\sum_{k=1}^{p} a_k R(L-k) = R(L)$
3. Compute the next observation:
$\hat{s}_n = \sum_{k=1}^{p} a_k s_{n-k}$
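Below is a minimal Python sketch of these three steps (illustrative, not Orwant's code). It assumes a rectangular window, w[n] = 1, and a direct linear solve, where the classical Levinson-Durbin recursion would be the cheaper choice:

```python
# Linear prediction sketch: autocorrelation -> normal equations -> prediction.
import numpy as np

def autocorrelation(s, max_lag):
    """Step 1: R(L) = (1/N) * sum_n s[n] * s[n+L], for L = 0..max_lag."""
    N = len(s)
    return np.array([np.dot(s[:N - L], s[L:]) / N for L in range(max_lag + 1)])

def predictor_coefficients(R, p):
    """Step 2: solve the normal equations sum_k a_k R(L-k) = R(L), L = 1..p."""
    T = np.array([[R[abs(L - k)] for k in range(1, p + 1)] for L in range(1, p + 1)])
    return np.linalg.solve(T, R[1:p + 1])

def predict_next(s, a):
    """Step 3: s_hat[n] = sum_{k=1..p} a_k * s[n-k]."""
    p = len(a)
    return float(np.dot(a, s[-1:-p - 1:-1]))

# Toy series: a noisy 24-hour cycle, like reading times logged over many days.
t = np.arange(240)
s = np.sin(2 * np.pi * t / 24) + 0.05 * np.random.default_rng(0).normal(size=240)
p = 8                                   # model order (number of past values used)
a = predictor_coefficients(autocorrelation(s, p), p)
print("predicted next observation:", predict_next(s, a))
```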
Correlation
[Figure: the series correlated with itself at no shift, shifted by one observation, shifted by two observations, …, shifted by n observations]
Use in Doppelgänger
Inter-arrival time
Session duration
Relevant news chosen and collated beforehand
Tailored to the length of time the user has available
Can determine when the user is expected to read email
Problems:
Confidence decreases as predictions advance into the future
How can we predict interest?
Sports articles: 4 out of 10 ‘Likes’
Article: 0    1    2    3    4    5    6    7    8    9
Rating:  Like Hate Hate Hate Like Like Hate Like Hate Hate

Technology articles: 9 out of 10 ‘Likes’
Article: 0    1    2    3    4    5    6    7    8    9
Rating:  Like Like Like Hate Like Like Like Like Like Like
News Topic Interest by Section
[Chart: topic interest (0 to 1) by section: General News, Sports, Editorials, Business/Finance, Classified, Comics, Food/Cooking, Movies, TV/Radio, Technology]
Beta Distribution
$\beta(x) = c(h,m)\, x^{h-1} (1-x)^{m-1}$ for $0 \le x \le 1$, where
$c(h,m) = \frac{(h+m-1)!}{(h-1)!\,(m-1)!}$
Mean rating: $E(x) = \frac{h}{h+m}$
Variance (confidence): $\sigma^2 = \frac{hm}{(h+m)^2 (h+m+1)}$
Describes the uncertainty of a probability
Based on Hits (h) and Misses (m)
The constant $c(h,m)$ normalizes the function so the area under the curve = 1
Rating and Confidence
[Figure: beta densities for H=1, M=1; H=2, M=2; H=5, M=5; H=10, M=10; H=5, M=25; H=25, M=5]
As observations increase, confidence (height) increases and variance (width) decreases
Rating skews relative to the hit/miss distribution
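A small pure-Python sketch of the figure above, computing the mean rating and variance for each (H, M) pair straight from the formulas on the previous slide:

```python
# Beta-distribution interest rating from hits and misses (no libraries needed).
def beta_rating(h, m):
    """h = hits ('Likes'), m = misses ('Hates'); returns (mean rating, variance)."""
    mean = h / (h + m)
    var = (h * m) / ((h + m) ** 2 * (h + m + 1))
    return mean, var

# The (H, M) pairs from the figure: equal evidence at growing volume,
# then the two skewed cases.
for h, m in [(1, 1), (2, 2), (5, 5), (10, 10), (5, 25), (25, 5)]:
    mean, var = beta_rating(h, m)
    print(f"H={h:2d} M={m:2d}  rating={mean:.2f}  variance={var:.4f}")
```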
Use in Doppelgänger
Measuring topical interest
Problems:
Equal weight on ratings over time
Binary classification of topics
Credit assignment when an item has multiple classifications
Binary yes/no feedback
How can we keep track of location/state?
We can use Markov Models
Markov Models
Directed graph:
Set of states
Initial probabilities
Transition probabilities
For each discrete time step, the state advances
Stationary random process
Markov property: no memory of past states traversed
[Diagram: four-state Markov chain (states 0–3) with transition probabilities .3, .5, .2, .9, .1, .4, .6, and 1.0 labeling the edges]
$P(s_{t+1} \mid s_t, s_{t-1}, \ldots, s_0) = P(s_{t+1} \mid s_t)$
For each state $i$: $\sum_j A_{ij} = 1$
Modeling a Student
[Table: probability transition matrix; rows and columns indexed by the states E, H, ST, T, SL]
$P(\text{Eat} \mid \text{Sleep}) = .1$
In general, $P(\vec{s}) = \prod_{t=1}^{T} A_{s_{t-1} s_t}$
$P(\text{Eat}, \text{Study}, \text{HangOut}, \text{HangOut}) = P(\text{Eat}) \cdot P(\text{Study} \mid \text{Eat}) \cdot P(\text{HangOut} \mid \text{Study}) \cdot P(\text{HangOut} \mid \text{HangOut})$
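A short Python sketch of the student chain. Only P(Eat | Sleep) = .1 appears on the slide; every other probability below is invented for illustration:

```python
# Markov chain sequence probability for the student example.
A = {  # A[s][s'] = P(next = s' | current = s); each row sums to 1
    "Eat":     {"Eat": 0.0, "Study": 0.6, "HangOut": 0.3, "Sleep": 0.1},
    "Study":   {"Eat": 0.2, "Study": 0.3, "HangOut": 0.4, "Sleep": 0.1},
    "HangOut": {"Eat": 0.3, "Study": 0.2, "HangOut": 0.3, "Sleep": 0.2},
    "Sleep":   {"Eat": 0.1, "Study": 0.4, "HangOut": 0.2, "Sleep": 0.3},
}
pi = {"Eat": 0.25, "Study": 0.25, "HangOut": 0.25, "Sleep": 0.25}  # initial

def sequence_probability(seq):
    """P(s_1..s_T) = P(s_1) * prod_t A[s_{t-1}][s_t] (Markov property)."""
    p = pi[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= A[prev][cur]
    return p

# 0.25 * P(Study|Eat) * P(HangOut|Study) * P(HangOut|HangOut)
print(sequence_probability(["Eat", "Study", "HangOut", "HangOut"]))
```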
Uses in Doppelgänger
Physical location tracking
Printing priority
Phone call routing
Pre-fetching content
Website page navigation
Media Lab Locations
What to do if we cannot observe the underlying states?
Can we infer state based on observable output?
Yes, we can use “Hidden” Markov Models!
We can use this technique to infer behavior
Hidden Markov Models
Hidden states
Symbol emission probabilities: for each state $i$, $\sum_x e_i(x) = 1$
Extremely Useful Technique
Speech recognition
Part-of-speech tagging
DNA sequencing
Biological particle identification
Too many other areas to list!
Questions We Can Ask
What is the probability of a symbol sequence? → Forward Algorithm (evaluation)
What is the most likely state sequence to generate a symbol sequence? → Viterbi Algorithm (decoding)
What are the most likely transition/emission probabilities that maximize a symbol sequence? → Baum-Welch Algorithm (learning)
Forward Algorithm
But there are exponentially many state sequences; how do we solve in polynomial time?
$P(x_1 x_2 \cdots x_T) = \sum_{s_1 \cdots s_T} \prod_{i=1}^{T} P(x_i \mid s_i)\, P(s_i \mid s_{i-1})$
Dynamic Programming: Forward Algorithm
[Trellis diagram: states $s_1$, $s_2$, $s_3$ over output symbols $x_1\,x_2\,x_3\,x_4$]
$P(x_1 x_2 x_3 x_4) = P(x_4 \mid s_1, t_4) + P(x_4 \mid s_2, t_4) + P(x_4 \mid s_3, t_4)$
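A compact Python sketch of the forward recursion on a toy 3-state, 2-symbol HMM (the matrices are invented, not Doppelgänger's learned parameters):

```python
# Forward algorithm: sum over all state paths in O(T * n^2) instead of O(n^T).
import numpy as np

A  = np.array([[.7, .2, .1],    # A[i, j]  = P(next state j | state i)
               [.3, .4, .3],
               [.2, .3, .5]])
E  = np.array([[.5, .5],        # E[i, x]  = P(symbol x | state i)
               [.9, .1],
               [.1, .9]])
pi = np.array([1/3, 1/3, 1/3])  # initial state probabilities

def forward(obs):
    """Probability of the whole symbol sequence obs (list of symbol indices)."""
    alpha = pi * E[:, obs[0]]           # alpha[i] = P(x_1, s_1 = i)
    for x in obs[1:]:
        alpha = (alpha @ A) * E[:, x]   # one column of the trellis at a time
    return alpha.sum()                  # sum over the final states

print(forward([0, 1, 1, 0]))  # probability of the symbol sequence x1..x4
```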
Viterbi Algorithm
Via dynamic programming (similar to Forward)
Instead of summing over all previous paths, only the max probability is stored
Store a backpointer at each step for path reconstruction
$s_1^* s_2^* \cdots s_T^* = \operatorname{argmax}_{s_1 \cdots s_T} \prod_{i=1}^{T} P(x_i \mid s_i)\, P(s_i \mid s_{i-1})$
[Trellis diagram: states $s_1$, $s_2$, $s_3$ over output symbols $x_1\,x_2\,x_3\,x_4$, with the maximum-probability path marked]
Most probable state sequence: $s_2, s_1, s_3, s_2$
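And a matching Viterbi sketch over the same toy model, storing a backpointer at each step as described above:

```python
# Viterbi algorithm: keep only the max-probability path into each state.
import numpy as np

def viterbi(obs, A, E, pi):
    """Return the most likely hidden-state sequence for the observed symbols."""
    delta = pi * E[:, obs[0]]             # best path probability ending in each state
    backpointers = []
    for x in obs[1:]:
        scores = delta[:, None] * A       # scores[i, j]: best path via i into j
        backpointers.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) * E[:, x]
    # walk the backpointers from the best final state back to the start
    path = [int(delta.argmax())]
    for bp in reversed(backpointers):
        path.append(int(bp[path[-1]]))
    return path[::-1]

A  = np.array([[.7, .2, .1], [.3, .4, .3], [.2, .3, .5]])
E  = np.array([[.5, .5], [.9, .1], [.1, .9]])
pi = np.array([1/3, 1/3, 1/3])
print(viterbi([0, 1, 1, 0], A, E, pi))  # state indices of the most probable path
```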
Use in Doppelgänger
[Diagram: hidden states and output symbols in Doppelgänger’s HMM]
Hacking HMM
Determine the “working” (i.e., psychological) state
Class of task being performed
More importantly, how much attention is demanded
What do we do if we do not have enough data about a particular user?
Substitute a small amount of information from many other users
Cluster Analysis
More computationally expensive than the previous tools
But doesn’t change as often
Useful when there is little/no info about a user
Based on correlations between users
Construct communities
Gather a few bits from many people
Similar to popular “collaborative filtering” techniques
K-Means Clustering
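A bare-bones k-means sketch in Python; the users, interest features, and cluster count below are all invented for illustration:

```python
# K-means: group users into communities by their interest vectors.
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Group the rows of X into k clusters by nearest centroid."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assignment step: nearest centroid (squared Euclidean distance)
        labels = ((X[:, None] - centroids) ** 2).sum(axis=-1).argmin(axis=1)
        # update step: each centroid moves to the mean of its members
        # (assumes no cluster empties, which holds for this toy data)
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

# Each row: one user's interest in [sports, technology, cooking].
users = np.array([[.9, .1, .2], [.8, .2, .1], [.1, .9, .3],
                  [.2, .8, .2], [.1, .2, .9], [.2, .1, .8]])
labels, centroids = kmeans(users, k=3)
print(labels)  # community assignment for each user
```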
Prediction Toolbox
Linear Prediction → Events
Beta Distribution → Interest
Markov Model → Location
Hidden Markov Model → Behavior
Cluster Analysis → When all else fails
Just how predictable are we?
Netflix competition (2006)
Improve the recommendation algorithm (Cinematch) by 10% for $1,000,000
Winner: BellKor’s Pragmatic Chaos
Solution: independent convergence, fusing 107 independent algorithmic predictions
The ‘Napoleon Dynamite’ Effect
“Human beings are very quirky and individualistic, and wonderfully idiosyncratic. And while I love that about human beings, it makes it hard to figure out what they like.”
- Reed Hastings, CEO of Netflix
Criticisms of Primary Article
Empirical evaluation of techniques?
vs. other techniques?
vs. other cheap or expensive techniques?
vs. non-adaptive systems?
Concessions:
Orwant’s motivation: galvanize cheap user modeling techniques
Techniques validated in other realms and in industry
References
Orwant, J. (1996). For want of a bit the user was lost: Cheap user modeling. IBM Systems Journal, 35(3,4), 398-416.
Makhoul, J. (1975). Linear Prediction: A Tutorial Review. Proceedings of the IEEE, 63(4), 561-580.
Rabiner, L.R. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2), 257-286.
Singh, V., Marinescu, D.C., & Baker, T.S. (2004). Image Segmentation for Automatic Particle Identification in Electron Micrographs Based on Hidden Markov Random Field Models and Expectation Maximization. Journal of Structural Biology, 145, 123-141.
Many other references not shown here
If interested, email me at [email protected]
Jon Orwant
Ph.D.
C.T.O.
Engineering Mgr.
Sharing Standards & Privacy
Protocol development
User Markup Language
Passive sensors as an invasion of privacy
Informed consent
Access to personal data
Accessor keywords
Access Control Lists