TRANSCRIPT
1
Context-Aware Middleware for Activity Recognition
Master's Thesis Defense: Radhika Dharurkar
Advisor: Dr. Tim Finin
Committee: Dr. Anupam Joshi, Dr. Yelena Yesha, Dr. Laura Zavala
2
Overview
• Motivation
• Problem Statement
• Related Work
• Approach
• Implementation
• Experiments and Results
• Contribution
• Limitations
• Future Work
• Conclusion
3
Mobile Market
• 5.3 billion mobile subscribers (77% of the world's population)
• Smartphone market: predicted 30% growth per year
• 85% of mobile handsets can access the mobile web
Pictures Courtesy: Mobile Youth
4
Motivation
• Enhance the user experience
  o A richer notion of context that includes functional and social aspects
    • Co-located social organizations
    • Nearby devices and people
    • Typical and inferred activities
    • Roles of the people
• A device that understands "geo-social location" and perhaps activity
• Systems run by service providers and administrators
  o Collaboration
  o Privacy
  o Trust
5
Motivation: Platys Project
Conceptual Place
• Tasks
• Semantic context modeling
• Mobility tracking
• Collaborative localization
• Privacy and information sharing
• Context representation, reasoning, and inference
• Activity recognition
6
Problem
• Predict the activity of the user using a smart phone
• Capture data from the different sensors present in the smart phone (atmospheric, transitional, temporal, etc.)
• Capture information about surrounding devices
• Capture statistics about phone usage (e.g. battery usage, call list)
• Capture information from other sources (e.g. calendar)
• Developed a prototype system that can predict roughly 10 activities with good precision
7
Platys Ontology
8
Activity Hierarchy
9
Related Work
• Roy Want, Veronica Falcao, Jon Gibbons. "The Active Badge Location System" (1992)
• Guanling Chen, David Kotz. "A survey of context-aware mobile computing research" (2000)
• Gregory D. Abowd, Anind K. Dey, Peter J. Brown, Nigel Davies, Mark Smith, and Pete Steggles. "Towards a better understanding of context and context-awareness" (1999)
• Stefano Mizzaro, Elena Nazzi, and Luca Vassena. "Retrieval of context-aware applications on mobile devices: how to evaluate?" (2008)
10
Related Work
• Nicholas D. Lane, Emiliano Miluzzo, Hong Lu, Daniel Peebles, Tanzeem Choudhury, and Andrew T. Campbell. "A Survey of Mobile Phone Sensing" (2010)
• Hong Lu, Jun Yang, Zhigang Liu, Nicholas D. Lane, Tanzeem Choudhury, Andrew T. Campbell. "The Jigsaw Continuous Sensing Engine for Mobile Phone Applications" (2010)
• Nathan Eagle, Alex (Sandy) Pentland, and David Lazer. "Inferring friendship network structure by using mobile phone data" (2009)
• Locale
• William G. Griswold, Patricia Shanahan, Steven W. Brown, Robert T. Boyer. "ActiveCampus", UCSD (2003)
• Harry Chen, Tim Finin, Anupam Joshi. COBRA context broker (2002)
11
Background: Context
Pictures Courtesy: 1) Mobile Youth 2) Zimmermann, A., Lorenz, A., Oppermann, R.: "An operational definition of context"
12
Approach
o Automatically extract data from various data sources with the help of a smart phone
o Provide context modeling
  • Representation of context as ontologies
  • Represent the contextual information in a database
o Learning and reasoning (a minimal sketch of this pipeline follows)
  • Supervised learning approach
  • Identify the feature set
  • Predict the activity of the user
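Since the experiments below use Weka classifiers, the learning-and-reasoning step can be illustrated with a minimal Weka sketch. This is an assumption-laden illustration, not the thesis implementation; the file name activity_features.arff is hypothetical:

```java
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ActivityPipeline {
    public static void main(String[] args) throws Exception {
        // Load labeled feature vectors (hypothetical file name).
        Instances data = DataSource.read("activity_features.arff");
        // The last attribute holds the activity label.
        data.setClassIndex(data.numAttributes() - 1);

        // Supervised learning: train a decision tree on the labeled instances.
        J48 tree = new J48();
        tree.buildClassifier(data);

        // Predict the activity of a new (here: the first) instance.
        double label = tree.classifyInstance(data.instance(0));
        System.out.println("Predicted: " + data.classAttribute().value((int) label));
    }
}
```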
13
Architecture
14
Data Collection
User Tagging
Sensor Values
15
Data Collection
16
Data Extraction and Cleanup
17
Extracting Features
18
Classification
19
Toy Experiment
• Data collected through a framework developed by an eBiquity member, which stored it in a MySQL DB
• We added data from Google Calendar
• Data collected for one student and one staff member
• Automated understanding of calendar data
• Manual cleanup of the data
• Labeled instances to find "Conceptual Place"
  o Student: 422 instances – Home, Lab, Class, Elsewhere
  o Staff member: 280 instances – Home vs. Office
20
Google Calendar
21
Toy Experiment
• Data collected through a framework developed by senior members (Tejas), which stored it in a MySQL DB
• Captured Google Calendar data
• Data collected for one student and one staff member
• Automated understanding of calendar data
• Manual cleanup of the data
• Labeled instances
  o Student: 422 instances – Home, Lab, Class, Elsewhere
  o Staff member: 280 instances – Home, Office
22
Toy Experiment: Captured Data

Sr. No  Captured Data
1       Device Id
2       Timestamp
3       Latitude
4       Longitude
5       Wi-Fi Status
6       Wi-Fi Count
7       Wi-Fi ID
8       Battery Status
9       Light
10      Proximity
11      Power Connected
12      User Present
13      Handset Plugged
14      Calendar Data
15      Temperature
23
Toy Experiment
[Chart: % Accuracy per classifier (Naïve Bayes, J48 trees, Random Trees, Bayes Net, Random Forest), Student vs. Post Doc]
24
Analysis
• Only a few activities, therefore good accuracy
• Sparse data, so proper training is not possible
• Presence of noise
• Artificially high decision value given to some of the information
• Overfitting
25
Experiment 1: Statistics
• Data collected through an application built for Android phones by Dr. Laura Zavala
• Added Bluetooth device capture functionality (a sketch of such a scanner follows this list)
• Data collected every 12 min for a duration of 1 min (with a notification)
• The last activity is saved if the user ignores the notification
• Collects data from different sources:
  o Sensors
  o Nearby Wi-Fi devices
  o Nearby Bluetooth devices (paired and not paired)
  o GPS coordinates, geo-location
  o Call history
  o User tagging for place and activity
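One way the Bluetooth capture could look on Android; a hedged sketch using the standard BluetoothAdapter/BroadcastReceiver APIs, with class and method names of my own invention:

```java
import android.bluetooth.BluetoothAdapter;
import android.bluetooth.BluetoothDevice;
import android.content.BroadcastReceiver;
import android.content.Context;
import android.content.Intent;
import android.content.IntentFilter;
import android.util.Log;

// Requires BLUETOOTH and BLUETOOTH_ADMIN permissions in the manifest.
public class BluetoothScanner extends BroadcastReceiver {
    @Override
    public void onReceive(Context context, Intent intent) {
        if (BluetoothDevice.ACTION_FOUND.equals(intent.getAction())) {
            BluetoothDevice device =
                    intent.getParcelableExtra(BluetoothDevice.EXTRA_DEVICE);
            // Paired vs. not paired, as recorded in the data set.
            boolean paired = device.getBondState() == BluetoothDevice.BOND_BONDED;
            Log.d("BluetoothScanner", device.getAddress() + " paired=" + paired);
        }
    }

    // Register the receiver and start a discovery scan for nearby devices.
    public static void startScan(Context context, BluetoothScanner receiver) {
        context.registerReceiver(receiver,
                new IntentFilter(BluetoothDevice.ACTION_FOUND));
        BluetoothAdapter.getDefaultAdapter().startDiscovery();
    }
}
```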
26
Experiment 1: Statistics
• Collected data for 2 users for 2 weeks continuously
• Captured fine-grained activities
  o 19 for the student
  o 14 for the staff member
• Parsing of raw text data
• Cleaning up the data
• Transformation of the data into feature vectors
• Use of discretization techniques for continuous attributes
27
Experiment 1: Accuracy
[Chart: % Accuracy per classifier (Naïve Bayes, J48 trees, Random Trees, Bayes Net, Random Forest), Student vs. Post Doc]
28
Experiment 1: Analysis
• Comparing with the toy experiment's accuracy:
  o Similar accuracy for Naïve Bayes and decision trees in the toy experiment
  o Big drop in accuracy for decision trees here
• In the toy experiment:
  o Overfitting
  o Noise
  o Missing data
• In this experiment:
  o We worked on cleanup
  o Discretization for sensor values
  o But timestamps, Wi-Fi IDs, and similar attributes are still each treated as a single feature
29
Confused Activities (Student Data)

Total  Main Activity              Confused With
54     Coffee/Snacks              Working/Studying 12; Sleeping 5
218    Working/Studying           Coffee/Snacks 5; Sleeping 8; Chatting 8
39     Reading                    Working/Studying 19; Sleeping 4
26     Cleaning                   Working/Studying 10; Sleeping 2
195    Sleeping                   Working/Studying 9
17     Cooking                    Working/Studying 5; Sleeping 3; Cleaning 2
49     Chatting/Talking on Phone  Working/Studying 14; Sleeping 2; Coffee/Snacks 2
6      Class-Listening            Class-TakingNotes 2
3      Talk-Listening             Class-TakingNotes 1; Working/Studying 1
1      Watching Movie             Sleeping 1
3      Dinner                     Working/Studying 3
9      Watching TV                Working/Studying 3; Sleeping 6
1      Shopping                   Working/Studying 1
30
Confused Activities (Staff Data)

Total  Main Activity     Confused With
525    Working/Studying  Other/Idle 9; Sleeping 4; Watching TV 6
9      Lunch             Working/Studying 3; Other/Idle 1
72     Sleeping          Working/Studying 19; Other/Idle 2
11     Cooking           Working/Studying 3; Sleeping 2
78     Other/Idle        Working/Studying 13; Walking 1
18     Watching TV       Working/Studying 7; Other/Idle 1
2      Shopping          Cooking 1
31
Experiment 2: Statistics
• Collected data for users for a month continuously
• Finer-grained activities captured
  o 19 for the student
• Some activities were hard to distinguish, so the set was reduced to a small set of 9 activities for prediction
• Parsing of raw text data
• Cleaned up the data
• Use of discretization techniques for continuous attributes
• Used a "bag of words" approach (see the sketch after this list) for:
  o Wi-Fi
  o Geo-location
  o Bluetooth
  o Timestamp
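A minimal sketch of the bag-of-words encoding for Wi-Fi IDs; the class name and structure here are illustrative assumptions, and Bluetooth IDs, geo-locations, and timestamp buckets would be encoded the same way:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class WifiBagOfWords {
    // Vocabulary of every Wi-Fi ID seen in the training data.
    private final Set<String> vocabulary = new LinkedHashSet<>();

    public void addObservation(List<String> wifiIds) {
        vocabulary.addAll(wifiIds);
    }

    // One binary feature per known Wi-Fi ID: 1 if visible in this sample.
    public int[] toFeatureVector(List<String> wifiIds) {
        int[] vector = new int[vocabulary.size()];
        int i = 0;
        for (String id : vocabulary) {
            vector[i++] = wifiIds.contains(id) ? 1 : 0;
        }
        return vector;
    }
}
```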
32
Experiment 2: Accuracy
[Chart: % Accuracy per classifier (Naïve Bayes, J48 trees, Bagging + J48 trees, LibSVM, LibLinear), comparing a 66% percentage split with 10-fold cross-validation]
33
Experiment 2: Confusion Matrix

   a   b   c   d   e   f   g   h   i   j   k   <-- classified as
 677   1   0   0   0   0   4   0   0   0   2 | a = Sleeping
   0 186   0   0  20   0   3   0   5   0   0 | b = Walking
   0   0  27   0   0   0   0   0   0   0   0 | c = In Meeting
   0   2   0  65   0   4   0   0   0   0   0 | d = Playing
   0  37   0   0  37   0   0   0   4   0   0 | e = Driving/Transporting
   0   0   0   2   0 146   1   0   0   2   0 | f = Class-Listening
   8   0   0   0   0   2  52   2   0   0   8 | g = Lunch
   9   0   0   0   0   0   8  11   0   0   0 | h = Cooking
   0  11   0   0   6   0   0   0  13   0   0 | i = Shopping
   0   2   0   0   0   5   0   0   0   7   0 | j = Talk-Listening
   5   0   0   0   0   0   1   0   0   0  34 | k = Watching Movie
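Reading the matrix row-wise and column-wise gives per-class recall and precision; for example, for Sleeping (row/column a):

\[
\text{recall} = \frac{677}{677+1+4+2} = \frac{677}{684} \approx 0.99,
\qquad
\text{precision} = \frac{677}{677+8+9+5} = \frac{677}{699} \approx 0.97
\]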
34
Experiment 2: Analysis
• Small set of activities analyzed
• On an individual basis
• Naïve Bayes performance dropped
  o More features included
  o Less feature independence
• Decision tree accuracy improved
  o Bag-of-words approach
  o Concept hierarchy
  o Conjunctions
• In line with related research:
  1) "Physical Activity Monitoring" by Aminian, Robert
  2) "Activity Recognition from User-Annotated Accelerometer Data" by Bao, Intille
• Recognition accuracy is highest for the decision tree classifier, which proved best for our model
35
Accuracy for Models
[Chart: % Accuracy (82–100) per classification task: 11 Activities; Stationary vs. Moving; 10 Activities; In Meeting vs. In Class; Home vs. School vs. Elsewhere; Home vs. School]
36
Small Subset of Activities
• These activities do not have simple characteristics and are easily confused with other activities:
  o Phone kept on a table while working, at lunch, or over coffee
  o Driving vs. walking at school
• No additional sensor data is available to capture some activities
• The model mostly relies on features like:
  o Wi-Fi IDs
  o Geographic location
  o Bluetooth IDs
  o Time of day
• Therefore it is hard to predict activities across users
  o E.g. In Class, Cooking (the model does not rely on sound levels)
37
General Model
38
Classifiers Evaluating Our Data

Machine Learning Algorithm  Evaluation Notes
Naïve Bayes classifier      Independence assumption
Support vector machines     Noise and missing values
Decision trees              Robust to errors, missing values, conjunctions
Random Trees                No pruning
Ensembles of classifiers    Reduces variance
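A hedged sketch of how such a comparison is run in Weka with 10-fold cross-validation. The file name and the classifier selection are illustrative; LibSVM and LibLinear require separate Weka packages and are omitted here:

```java
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.trees.J48;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CompareClassifiers {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("activity_features.arff");  // hypothetical file
        data.setClassIndex(data.numAttributes() - 1);

        Classifier[] models = { new NaiveBayes(), new J48(), new RandomForest() };
        for (Classifier model : models) {
            // 10-fold cross-validation, as in the experiments above.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(model, data, 10, new Random(1));
            System.out.printf("%s: %.1f%% correct%n",
                    model.getClass().getSimpleName(), eval.pctCorrect());
        }
    }
}
```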
39
Discretization
• Weka filters: unsupervised attribute filters (see the sketch below)
• Binning
• Concept hierarchy
• Division into intervals
• Smoothing the data
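A minimal sketch of the unsupervised binning filter in Weka; the bin count and file name are illustrative assumptions:

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;

public class DiscretizeSensors {
    public static void main(String[] args) throws Exception {
        Instances raw = DataSource.read("activity_features.arff");  // hypothetical file

        // Equal-width binning: divide each continuous sensor attribute
        // into 10 intervals, smoothing away small fluctuations.
        Discretize filter = new Discretize();
        filter.setBins(10);
        filter.setInputFormat(raw);
        Instances binned = Filter.useFilter(raw, filter);

        binned.setClassIndex(binned.numAttributes() - 1);
        System.out.println(binned.attribute(0));  // now a nominal attribute
    }
}
```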
40
Bagging with J48
• Ensemble learning algorithm
• Averaging over bootstrap samples reduces the error from variance, especially when small differences in the training set can produce big differences between hypotheses.
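In Weka terms this is the meta.Bagging classifier wrapped around J48; a minimal sketch, with the iteration count and file name as illustrative assumptions:

```java
import weka.classifiers.meta.Bagging;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BaggedTrees {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("activity_features.arff");  // hypothetical file
        data.setClassIndex(data.numAttributes() - 1);

        Bagging bagger = new Bagging();
        bagger.setClassifier(new J48());   // base learner: C4.5 decision tree
        bagger.setNumIterations(10);       // number of bootstrap replicates
        bagger.setBagSizePercent(100);     // each bag as large as the training set
        bagger.buildClassifier(data);      // predictions average the trees' votes
    }
}
```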
41
Example: J48 + Bagging (fragments of bagged trees)

Place = Home: Sleeping (9.0/2.0)
Place = ITE346: In Meeting (1.0)
Place = Outdoors
|  G1 = False
|  |  Morning = True: Walking (5.0/2.0)
|  |  Morning = False: Driving/Transporting (17.0/2.0)
|  G1 = True: Walking (2.0)
Place = Home
|  Evening = False: Sleeping (20.0)
|  Evening = True
|  |  noise = '(-inf-28.19588]': Cooking (0.0)
|  |  noise = '(28.19588-32.71862]': Cooking (2.0)
|  |  noise = '(32.71862-inf)': Watching Movie (1.0)
Place = Restaurant: Lunch (5.0)
Place = Movie Theater: Watching Movie (2.0)
Place = Elsewhere: Walking (1.0)
Place = ITE325: Talk-Listening (4.0)
Place = ITE3338/ITE377: In Meeting (2.0)
Place = Groceries store: Shopping (1.0)

loc2 = '(-inf-39.17259]': Watching Movie (2.0)
loc2 = '(39.17259-39.18528]': Sleeping (0.0)
loc2 = '(39.18528-39.19797]': Lunch (4.0)
loc2 = '(39.24873-39.26142]': Walking (9.0/2.0)

Afternoon = False
|  Evening = False
|  |  Place = Outdoors: Walking (1.0)
|  |  Place = Elsewhere: Sleeping (0.0)
|  Evening = True: Walking (4.0)
Afternoon = True
|  Wifi Id8 = True: In Meeting (3.0)
|  Wifi Id8 = False
|  |  Place = Home: Lunch (0.0)
|  |  Place = Restaurant: Lunch (4.0)
|  |  Place = Movie Theater: Watching Movie (2.0)
|  |  Place = Work/School: Working (1.0)
|  |  Place = ITE346: Lunch (0.0)
|  |  Place = Outdoors: Walking (1.0)
|  |  Place = ITE3338/ITE377: Lunch (0.0)

Wifi Id8 = True: In Meeting (6.0/1.0)
Wifi Id8 = False
|  Afternoon = False
|  |  Evening = False: Sleeping (24.0/1.0)
|  |  Evening = True: Walking (5.0)
|  Afternoon = True
|  |  Place = Work/School: Working (1.0)
|  |  Place = ITE346: Lunch (0.0)
|  |  Place = Outdoors: Walking (1.0)
|  |  Place = Home: Lunch (0.0)
|  |  Place = ITE3338/ITE377: Lunch (0.0)
42
Contribution
• Smart phone usage for mid-level activity recognition (supervised learning approach)
• High-level notion of context
• Accuracy of 88% over 9 activities for a single user
• Accuracy in line with other research:
  o Home vs. Work: 100%, compared to 95% accuracy in an MIT project using HMMs
  o Mid-level detailed activity recognition – Bao and Intille (MIT)
  o Highest recognition accuracy for the decision tree classifier – Bao and Intille (MIT)
• General model
43
Applications
[Chart: Activity Distribution over a Week – activities (1 Walking, 2 Working, 3 In Meeting, 4 Driving, 5 Other/Idle, 6 Watching TV, 7 Sleeping, 8 Cooking, 9 Talk-Listening, 10 Lunch, 11 Watching Movie, 12 Reading, 13 Shopping, 14 Coffee/Snacks) plotted per day, Mon–Sun]
44
Applications
[Chart: Weekday Activity Distribution – activities (1 Sleeping, 2 Studying, 3 Coffee/Snacks, 4 Reading, 5 Driving/Transporting, 6 Walking, 7 In Meeting, 8 Lunch, 9 Class-Listening, 10 Class-Taking Notes, 11 Chatting) plotted against the time of day, 0:00–22:48]
45
Applications
[Chart: Weekend Activity Distribution – activities (1 Sleeping, 2 Studying, 3 Coffee/Snacks, 4 Reading, 5 Walking, 6 Transporting, 7 Shopping, 8 Chatting, 9 Playing, 10 Other) plotted against the time of day, 0:00–22:48]
46
Applications
• Understand a user's pattern of activities
• Keep a check on time spent
  o Planner
  o Study schedules
  o Program meetings
• Update phone settings according to context
• Recommendation systems
• Locate a specific service nearby
• Adjust the user's presence status
• Update a user's calendar
47
Limitations
• Set of experiments
  o Duration of data capture
  o Number of users for capturing data
• Information captured only through the phone
• Audio and sound processing
• Training on data from different individuals for a general model
48
Future Work
• Robust general model
• Multiple feature sets for different kinds of predictions
• Role management
• Rules for some ground truths or profiles
• Collaborative activity inference
• Models that incorporate sequences of activities
49
Thank you
50
ES – Decision Trees
• Each node = an attribute
• Each leaf gives a classification result
• Root node = the attribute with the most information gain (Claude Shannon). If there are equal numbers of yeses and nos, entropy in that value is at a maximum, and so is the information needed:
  \( \mathrm{Info}([p_1, \dots, p_m]) = -\sum_{i=1}^{m} p_i \log_2 p_i \)
• For an attribute value with 2 yeses and 3 nos: \( I([2,3]) = -\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5} \approx 0.971 \) bits
• Average these over the attribute's values (weighted by frequency) and subtract from the information of the whole set to get the attribute's gain.
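A worked example, using the classic 14-instance weather data set from the Weka literature rather than thesis data: the whole set has 9 yeses and 5 nos, and the Outlook attribute splits it into subsets of [2,3], [4,0], and [3,2]:

\[
\mathrm{Info}([9,5]) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940 \text{ bits}
\]
\[
\mathrm{Info}_{\mathrm{Outlook}} = \tfrac{5}{14}(0.971) + \tfrac{4}{14}(0) + \tfrac{5}{14}(0.971) \approx 0.693,
\qquad \mathrm{Gain}(\mathrm{Outlook}) = 0.940 - 0.693 = 0.247 \text{ bits}
\]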
51
Classification via Decision Trees
• Effective with nominal data
• Pruning corrects potential overfitting
• Confidence factor = 0.25
• Minimum number of objects = 2
• Error estimate = (e + 1) / (N + m)
• Reduced error pruning: false
• Subtree raising: true
Reference: "Decision Tree Analysis using Weka" by Sam Drazin and Matt Montag
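The settings listed above map directly onto J48's parameters in Weka; a minimal configuration sketch, assuming the standard Weka API:

```java
import weka.classifiers.trees.J48;

public class ConfigureJ48 {
    public static J48 build() {
        J48 tree = new J48();
        tree.setConfidenceFactor(0.25f);    // pruning confidence factor
        tree.setMinNumObj(2);               // minimum number of objects per leaf
        tree.setReducedErrorPruning(false); // reduced error pruning off
        tree.setSubtreeRaising(true);       // subtree raising on
        return tree;
    }
}
```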