Cheap User Modeling for Adaptive Systems
Presented by: Frank Hines
Topics in CS, Spring 2011
Primary reference:Orwant, J. (1996). For want of a bit the user was lost: Cheap user modeling. IBM Systems Journal, 35(3,4), 398-416.
Limitless Information
100s of channels
One Size Fits All?
“People have limited cognitive time they want to spend on picking a movie.”
- Reed Hastings, CEO, Netflix
Information Overload!
Paradox of choice:
Increased dissatisfaction
Increased fatigue
Increased anxiety
Lowered productivity
Lowered concentration
Lowered quality
[Chart: Information Overload; decision-making quality vs. amount of information considered]
Can We Limit Ourselves to the Most Relevant Info?
[Diagram: Content {a,b,c,d,e,f} flows through Processing & Filtering, guided by User Models {U1, U2} that a Learning Toolbox builds from Sensors, yielding Presentation U1 {a,c,d,f} and Presentation U2 {b,c,e}]
User modeling is NOT strictly content filtering!
• Timing/Performance
• Prioritization
• Formatting
Doppelgänger
Overview
What is meant by adaptation?
What is a user model?
What can we predict?
Just how predictable are we?
Adaptation
Adaptation is a sign of intelligence
Adaptation in nature
Current Software
Usability vs. Personalization
Commonalities
Differences
Adaptation in Software
Newsmap
Jadeite
“One of the worst software design blunders in the annals of computing” – Smithsonian Magazine
Adaptation in Conversation
Human-Human interaction (discourse)
Human-Computer interaction
• Vocabulary (age)
• Speech volume (noise)
• Speech rate (time pressure)
• Syntactic structure (cultural affiliation)
• Topic (interests, knowledge)
Models
“The sciences do not try to explain, they hardly even try to interpret, they mainly make models.”
- John von Neumann
User Model
Models typically include:
Knowledge
Beliefs
Goals
Plans
Schedules
Behaviors
Abilities
Preferences
Framework to “simulate” a user & predict that user’s actions
A mathematical relationship among variables
NOT necessarily a cognitive representation
GRUNDY (Rich, 1979) - book recommendations from personality traits
What can we predict?
Events
Interest
Location
Behavior
How can we predict an event?
[Chart: Reading Times; observations plotted against time of day (24-hr clock: 0, 6, 12, 18, 24)]
f(n-1), f(n-1, n-2), …, f(n-1, n-2, …, n-j)
Linear Prediction
Discrete time series
Predicts future values from linear function of past values
Canonical example: tidal activity
Other examples:
Sunspots
Speech processing
Stock prices
Branch prediction
Oil detection
Given the past observations $\{s[n-1], s[n-2], \ldots, s[n-j]\}$, predict
$s[n] = f(s[n-1], s[n-2], \ldots, s[n-j])$
Linear Prediction
1. Compute the autocorrelation vector R:
$R(L) = \frac{1}{N} \sum_{n=0}^{N-1-L} s_n w_n \, s_{n+L} w_{n+L}$
2. Compute the predictor coefficients $a_k$ from the normal equations:
$\sum_{k=1}^{p} a_k R(L-k) = R(L)$
3. Compute the next observation:
$\hat{s}_n = \sum_{k=1}^{p} a_k s_{n-k}$
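Below is a minimal Python sketch of these three steps (illustrative, not Orwant's code). It assumes a rectangular window, w[n] = 1, and a direct linear solve, where the classical Levinson-Durbin recursion would be the cheaper choice:

```python
# Linear prediction sketch: autocorrelation -> normal equations -> prediction.
import numpy as np

def autocorrelation(s, max_lag):
    """Step 1: R(L) = (1/N) * sum_n s[n] * s[n+L], for L = 0..max_lag."""
    N = len(s)
    return np.array([np.dot(s[:N - L], s[L:]) / N for L in range(max_lag + 1)])

def predictor_coefficients(R, p):
    """Step 2: solve the normal equations sum_k a_k R(L-k) = R(L), L = 1..p."""
    T = np.array([[R[abs(L - k)] for k in range(1, p + 1)] for L in range(1, p + 1)])
    return np.linalg.solve(T, R[1:p + 1])

def predict_next(s, a):
    """Step 3: s_hat[n] = sum_{k=1..p} a_k * s[n-k]."""
    p = len(a)
    return float(np.dot(a, s[-1:-p - 1:-1]))

# Toy series: a noisy 24-hour cycle, like reading times logged over many days.
t = np.arange(240)
s = np.sin(2 * np.pi * t / 24) + 0.05 * np.random.default_rng(0).normal(size=240)
p = 8                                   # model order (number of past values used)
a = predictor_coefficients(autocorrelation(s, p), p)
print("predicted next observation:", predict_next(s, a))
```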
Correlation
[Figure: the series correlated with itself at no shift, shifted by one observation, shifted by two observations, …, shifted by n observations]
Use in Doppelgänger
Inter-arrival time
Session duration
Relevant news chosen and collated beforehand
Tailored to the length of time the user has available
Can determine when the user is expected to read email
Problems:
Confidence decreases as predictions advance into the future
How can we predict interest?
Sports articles: 4 out of 10 ‘Likes’
Article: 0    1    2    3    4    5    6    7    8    9
Rating:  Like Hate Hate Hate Like Like Hate Like Hate Hate

Technology articles: 9 out of 10 ‘Likes’
Article: 0    1    2    3    4    5    6    7    8    9
Rating:  Like Like Like Hate Like Like Like Like Like Like
News Topic Interest by Section
[Chart: topic interest (0 to 1) by section: General News, Sports, Editorials, Business/Finance, Classified, Comics, Food/Cooking, Movies, TV/Radio, Technology]
Beta Distribution
$\beta(x) = c(h,m)\, x^{h-1} (1-x)^{m-1}$ for $0 \le x \le 1$, where
$c(h,m) = \frac{(h+m-1)!}{(h-1)!\,(m-1)!}$
Mean rating: $E(x) = \frac{h}{h+m}$
Variance (confidence): $\sigma^2 = \frac{hm}{(h+m)^2 (h+m+1)}$
Describes the uncertainty of a probability
Based on Hits (h) and Misses (m)
The constant $c(h,m)$ normalizes the function so the area under the curve = 1
Rating and Confidence
[Figure: beta densities for H=1, M=1; H=2, M=2; H=5, M=5; H=10, M=10; H=5, M=25; H=25, M=5]
As observations increase, confidence (height) increases and variance (width) decreases
Rating skews relative to the hit/miss distribution
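A small pure-Python sketch of the figure above, computing the mean rating and variance for each (H, M) pair straight from the formulas on the previous slide:

```python
# Beta-distribution interest rating from hits and misses (no libraries needed).
def beta_rating(h, m):
    """h = hits ('Likes'), m = misses ('Hates'); returns (mean rating, variance)."""
    mean = h / (h + m)
    var = (h * m) / ((h + m) ** 2 * (h + m + 1))
    return mean, var

# The (H, M) pairs from the figure: equal evidence at growing volume,
# then the two skewed cases.
for h, m in [(1, 1), (2, 2), (5, 5), (10, 10), (5, 25), (25, 5)]:
    mean, var = beta_rating(h, m)
    print(f"H={h:2d} M={m:2d}  rating={mean:.2f}  variance={var:.4f}")
```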
Use in Doppelgänger
Measuring topical interest
Problems:
Equal weight on ratings over time
Binary classification of topics
Credit assignment when an item has multiple classifications
Binary yes/no feedback
How can we keep track of location/state?
We can use Markov Models
Markov Models
Directed graph:
Set of states
Initial probabilities
Transition probabilities
For each discrete time step, the state advances
Stationary random process
Markov property: no memory of past states traversed
[Diagram: four-state Markov chain (states 0–3) with transition probabilities .3, .5, .2, .9, .1, .4, .6, and 1.0 labeling the edges]
$P(s_{t+1} \mid s_t, s_{t-1}, \ldots, s_0) = P(s_{t+1} \mid s_t)$
For each state $i$: $\sum_j A_{ij} = 1$
Modeling a Student
[Table: probability transition matrix; rows and columns indexed by the states E, H, ST, T, SL]
$P(\text{Eat} \mid \text{Sleep}) = .1$
In general, $P(\vec{s}) = \prod_{t=1}^{T} A_{s_{t-1} s_t}$
$P(\text{Eat}, \text{Study}, \text{HangOut}, \text{HangOut}) = P(\text{Eat}) \cdot P(\text{Study} \mid \text{Eat}) \cdot P(\text{HangOut} \mid \text{Study}) \cdot P(\text{HangOut} \mid \text{HangOut})$
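A short Python sketch of the student chain. Only P(Eat | Sleep) = .1 appears on the slide; every other probability below is invented for illustration:

```python
# Markov chain sequence probability for the student example.
A = {  # A[s][s'] = P(next = s' | current = s); each row sums to 1
    "Eat":     {"Eat": 0.0, "Study": 0.6, "HangOut": 0.3, "Sleep": 0.1},
    "Study":   {"Eat": 0.2, "Study": 0.3, "HangOut": 0.4, "Sleep": 0.1},
    "HangOut": {"Eat": 0.3, "Study": 0.2, "HangOut": 0.3, "Sleep": 0.2},
    "Sleep":   {"Eat": 0.1, "Study": 0.4, "HangOut": 0.2, "Sleep": 0.3},
}
pi = {"Eat": 0.25, "Study": 0.25, "HangOut": 0.25, "Sleep": 0.25}  # initial

def sequence_probability(seq):
    """P(s_1..s_T) = P(s_1) * prod_t A[s_{t-1}][s_t] (Markov property)."""
    p = pi[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= A[prev][cur]
    return p

# 0.25 * P(Study|Eat) * P(HangOut|Study) * P(HangOut|HangOut)
print(sequence_probability(["Eat", "Study", "HangOut", "HangOut"]))
```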
Uses in Doppelgänger
Physical location tracking
Printing priority
Phone call routing
Pre-fetching content
Website page navigation
Media Lab Locations
What to do if we cannot observe the underlying states?
Can we infer state based on observable output?
Yes, we can use “Hidden” Markov Models!
We can use this technique to infer behavior
Hidden Markov Models
Hidden states
Symbol emission probabilities: for each state $i$, $\sum_x e_i(x) = 1$
Extremely Useful Technique
Speech recognition
Part-of-speech tagging
DNA sequencing
Biological particle identification
Too many other areas to list!
Questions We Can Ask
What is the probability of a symbol sequence? → Forward Algorithm (evaluation)
What is the most likely state sequence to generate a symbol sequence? → Viterbi Algorithm (decoding)
What are the most likely transition/emission probabilities that maximize a symbol sequence? → Baum-Welch Algorithm (learning)
Forward Algorithm
But there are exponentially many state sequences; how do we solve in polynomial time?
$P(x_1 x_2 \cdots x_T) = \sum_{s_1 \cdots s_T} \prod_{i=1}^{T} P(x_i \mid s_i)\, P(s_i \mid s_{i-1})$
Dynamic Programming: Forward Algorithm
[Trellis diagram: states $s_1$, $s_2$, $s_3$ over output symbols $x_1\,x_2\,x_3\,x_4$]
$P(x_1 x_2 x_3 x_4) = P(x_4 \mid s_1, t_4) + P(x_4 \mid s_2, t_4) + P(x_4 \mid s_3, t_4)$
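A compact Python sketch of the forward recursion on a toy 3-state, 2-symbol HMM (the matrices are invented, not Doppelgänger's learned parameters):

```python
# Forward algorithm: sum over all state paths in O(T * n^2) instead of O(n^T).
import numpy as np

A  = np.array([[.7, .2, .1],    # A[i, j]  = P(next state j | state i)
               [.3, .4, .3],
               [.2, .3, .5]])
E  = np.array([[.5, .5],        # E[i, x]  = P(symbol x | state i)
               [.9, .1],
               [.1, .9]])
pi = np.array([1/3, 1/3, 1/3])  # initial state probabilities

def forward(obs):
    """Probability of the whole symbol sequence obs (list of symbol indices)."""
    alpha = pi * E[:, obs[0]]           # alpha[i] = P(x_1, s_1 = i)
    for x in obs[1:]:
        alpha = (alpha @ A) * E[:, x]   # one column of the trellis at a time
    return alpha.sum()                  # sum over the final states

print(forward([0, 1, 1, 0]))  # probability of the symbol sequence x1..x4
```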
Viterbi Algorithm
Via dynamic programming (similar to Forward)
Instead of summing over all previous paths, only the max probability is stored
Store a backpointer at each step for path reconstruction
$s_1^* s_2^* \cdots s_T^* = \operatorname{argmax}_{s_1 \cdots s_T} \prod_{i=1}^{T} P(x_i \mid s_i)\, P(s_i \mid s_{i-1})$
[Trellis diagram: states $s_1$, $s_2$, $s_3$ over output symbols $x_1\,x_2\,x_3\,x_4$, with the maximum-probability path marked]
Most probable state sequence: $s_2, s_1, s_3, s_2$
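And a matching Viterbi sketch over the same toy model, storing a backpointer at each step as described above:

```python
# Viterbi algorithm: keep only the max-probability path into each state.
import numpy as np

def viterbi(obs, A, E, pi):
    """Return the most likely hidden-state sequence for the observed symbols."""
    delta = pi * E[:, obs[0]]             # best path probability ending in each state
    backpointers = []
    for x in obs[1:]:
        scores = delta[:, None] * A       # scores[i, j]: best path via i into j
        backpointers.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) * E[:, x]
    # walk the backpointers from the best final state back to the start
    path = [int(delta.argmax())]
    for bp in reversed(backpointers):
        path.append(int(bp[path[-1]]))
    return path[::-1]

A  = np.array([[.7, .2, .1], [.3, .4, .3], [.2, .3, .5]])
E  = np.array([[.5, .5], [.9, .1], [.1, .9]])
pi = np.array([1/3, 1/3, 1/3])
print(viterbi([0, 1, 1, 0], A, E, pi))  # state indices of the most probable path
```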
Use in Doppelgänger
[Diagram: hidden states and output symbols in Doppelgänger’s HMM]
Hacking HMM
Determine the “working” (i.e., psychological) state
Class of task being performed
More importantly, how much attention is demanded
What do we do if we do not have enough data about a particular user?
Substitute a small amount of information from many other users
Cluster Analysis
More computationally expensive than the previous tools
But doesn’t change as often
Useful when there is little/no info about a user
Based on correlations between users
Construct communities
Gather a few bits from many people
Similar to popular “collaborative filtering” techniques
K-Means Clustering
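A bare-bones k-means sketch in Python; the users, interest features, and cluster count below are all invented for illustration:

```python
# K-means: group users into communities by their interest vectors.
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Group the rows of X into k clusters by nearest centroid."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assignment step: nearest centroid (squared Euclidean distance)
        labels = ((X[:, None] - centroids) ** 2).sum(axis=-1).argmin(axis=1)
        # update step: each centroid moves to the mean of its members
        # (assumes no cluster empties, which holds for this toy data)
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

# Each row: one user's interest in [sports, technology, cooking].
users = np.array([[.9, .1, .2], [.8, .2, .1], [.1, .9, .3],
                  [.2, .8, .2], [.1, .2, .9], [.2, .1, .8]])
labels, centroids = kmeans(users, k=3)
print(labels)  # community assignment for each user
```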
Prediction Toolbox
Linear Prediction → Events
Beta Distribution → Interest
Markov Model → Location
Hidden Markov Model → Behavior
Cluster Analysis → When all else fails
Just how predictable are we?
Netflix competition (2006)
Improve the recommendation algorithm (Cinematch) by 10% for $1,000,000
Winner: BellKor’s Pragmatic Chaos
Solution: independent convergence, fusing 107 independent algorithmic predictions
The ‘Napoleon Dynamite’ Effect
“Human beings are very quirky and individualistic, and wonderfully idiosyncratic. And while I love that about human beings, it makes it hard to figure out what they like.”
- Reed Hastings, CEO of Netflix
Criticisms of Primary Article
Empirical evaluation of techniques?
vs. other techniques?
vs. other cheap or expensive techniques?
vs. non-adaptive systems?
Concessions:
Orwant’s motivation: galvanize cheap user modeling techniques
Techniques validated in other realms and in industry
References
Orwant, J. (1996). For want of a bit the user was lost: Cheap user modeling. IBM Systems Journal, 35(3,4), 398-416.
Makhoul, J. (1975). Linear Prediction: A Tutorial Review. Proceedings of the IEEE, 63(4), 561-580.
Rabiner, L.R. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2), 257-286.
Singh, V., Marinescu, D.C., & Baker, T.S. (2004). Image Segmentation for Automatic Particle Identification in Electron Micrographs Based on Hidden Markov Random Field Models and Expectation Maximization. Journal of Structural Biology, 145, 123-141.
Many other references not shown here
If interested, email me at [email protected]
Jon Orwant
Ph.D.
C.T.O.
Engineering Mgr.
Sharing Standards & Privacy
Protocol development
User Markup Language
Passive sensors as an invasion of privacy
Informed consent
Access to personal data
Accessor keywords
Access Control Lists