TRANSCRIPT
1
Context-Aware Middleware for Activity Recognition
Master's Thesis Defense: Radhika Dharurkar
Advisor: Dr. Tim Finin
Committee: Dr. Anupam Joshi, Dr. Yelena Yesha, Dr. Laura Zavala
2
Overview
• Motivation
• Problem Statement
• Related Work
• Approach
• Implementation
• Experiments and Results
• Contribution
• Limitations
• Future Work
• Conclusion
3
Mobile Market
• 5.3 billion mobile subscribers (77% of the world's population)
• Smartphone market: predicted 30% growth per year
• 85% of mobile handsets can access the mobile web
Pictures Courtesy: Mobile Youth
4
Motivation
• Enhance the user experience
  o A richer notion of context that includes functional and social aspects
    • Co-located social organizations
    • Nearby devices and people
    • Typical and inferred activities
    • Roles of the people
• A device that understands "geo-social location" and perhaps activity
• Systems run by service providers and administrators
  o Collaboration
  o Privacy
  o Trust
5
Motivation: Platys Project
Conceptual Place
• Tasks
• Semantic context modeling
• Mobility tracking
• Collaborative localization
• Privacy and information sharing
• Context representation, reasoning, and inference
• Activity recognition
6
Problem
• Predict the activity of the user using a smart phone
• Capture data from the different sensors present in the smart phone (atmospheric, transitional, temporal, etc.)
• Capture information about surrounding devices
• Capture statistics about phone usage (e.g. battery usage, call list)
• Capture information from other sources (e.g. calendar)
• Developed a prototype system that can predict roughly 10 activities with good precision
7
Platys Ontology
8
Activity Hierarchy
9
Related Work
• Roy Want, Veronica Falcao, Jon Gibbons. "The Active Badge Location System" (1992)
• Guanling Chen, David Kotz. "A survey of context-aware mobile computing research" (2000)
• Gregory D. Abowd, Anind K. Dey, Peter J. Brown, Nigel Davies, Mark Smith, and Pete Steggles. "Towards a better understanding of context and context-awareness" (1999)
• Stefano Mizzaro, Elena Nazzi, and Luca Vassena. "Retrieval of context-aware applications on mobile devices: how to evaluate?" (2008)
10
Related Work
• Nicholas D. Lane, Emiliano Miluzzo, Hong Lu, Daniel Peebles, Tanzeem Choudhury, and Andrew T. Campbell. "A Survey of Mobile Phone Sensing" (2010)
• Hong Lu, Jun Yang, Zhigang Liu, Nicholas D. Lane, Tanzeem Choudhury, Andrew T. Campbell. "The Jigsaw Continuous Sensing Engine for Mobile Phone Applications" (2010)
• Nathan Eagle, Alex (Sandy) Pentland, and David Lazer. "Inferring friendship network structure by using mobile phone data" (2009)
• Locale
• William G. Griswold, Patricia Shanahan, Steven W. Brown, Robert T. Boyer. "ActiveCampus", UCSD (2003)
• Harry Chen, Tim Finin, Anupam Joshi. COBRA context broker (2002)
11
Background: Context
Pictures Courtesy: 1) Mobile Youth 2) Zimmermann, A., Lorenz, A., Oppermann, R.: "An operational definition of context"
12
Approach
o Automatically extract data from various data sources with the help of a smart phone
o Provide context modeling
  • Representation of context as ontologies
  • Represent the contextual information in a database
o Learning and reasoning (a minimal sketch of this pipeline follows)
  • Supervised learning approach
  • Identify the feature set
  • Predict the activity of the user
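Since the experiments below use Weka classifiers, the learning-and-reasoning step can be illustrated with a minimal Weka sketch. This is an assumption-laden illustration, not the thesis implementation; the file name activity_features.arff is hypothetical:

```java
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ActivityPipeline {
    public static void main(String[] args) throws Exception {
        // Load labeled feature vectors (hypothetical file name).
        Instances data = DataSource.read("activity_features.arff");
        // The last attribute holds the activity label.
        data.setClassIndex(data.numAttributes() - 1);

        // Supervised learning: train a decision tree on the labeled instances.
        J48 tree = new J48();
        tree.buildClassifier(data);

        // Predict the activity of a new (here: the first) instance.
        double label = tree.classifyInstance(data.instance(0));
        System.out.println("Predicted: " + data.classAttribute().value((int) label));
    }
}
```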
13
Architecture
14
Data Collection
User Tagging
Sensor Values
15
Data Collection
16
Data Extraction and Cleanup
17
Extracting Features
18
Classification
19
Toy Experiment
• Data collected through a framework developed by an eBiquity member, which stored it in a MySQL DB
• We added data from Google Calendar
• Data collected for one student and one staff member
• Automated understanding of calendar data
• Manual cleanup of the data
• Labeled instances to find "Conceptual Place"
  o Student: 422 instances – Home, Lab, Class, Elsewhere
  o Staff member: 280 instances – Home vs. Office
20
Google Calendar
21
Toy Experiment
• Data collected through a framework developed by senior members (Tejas), which stored it in a MySQL DB
• Captured Google Calendar data
• Data collected for one student and one staff member
• Automated understanding of calendar data
• Manual cleanup of the data
• Labeled instances
  o Student: 422 instances – Home, Lab, Class, Elsewhere
  o Staff member: 280 instances – Home, Office
22
Toy Experiment: Captured Data

Sr. No  Captured Data
1       Device Id
2       Timestamp
3       Latitude
4       Longitude
5       Wi-Fi Status
6       Wi-Fi Count
7       Wi-Fi ID
8       Battery Status
9       Light
10      Proximity
11      Power Connected
12      User Present
13      Handset Plugged
14      Calendar Data
15      Temperature
23
Toy Experiment
[Chart: % Accuracy per classifier (Naïve Bayes, J48 trees, Random Trees, Bayes Net, Random Forest), Student vs. Post Doc]
24
Analysis
• Only a few activities, therefore good accuracy
• Sparse data, so proper training is not possible
• Presence of noise
• Artificially high decision value given to some of the information
• Overfitting
25
Experiment 1: Statistics
• Data collected through an application built for Android phones by Dr. Laura Zavala
• Added Bluetooth device capture functionality (a sketch of such a scanner follows this list)
• Data collected every 12 min for a duration of 1 min (with a notification)
• The last activity is saved if the user ignores the notification
• Collects data from different sources:
  o Sensors
  o Nearby Wi-Fi devices
  o Nearby Bluetooth devices (paired and not paired)
  o GPS coordinates, geo-location
  o Call history
  o User tagging for place and activity
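One way the Bluetooth capture could look on Android; a hedged sketch using the standard BluetoothAdapter/BroadcastReceiver APIs, with class and method names of my own invention:

```java
import android.bluetooth.BluetoothAdapter;
import android.bluetooth.BluetoothDevice;
import android.content.BroadcastReceiver;
import android.content.Context;
import android.content.Intent;
import android.content.IntentFilter;
import android.util.Log;

// Requires BLUETOOTH and BLUETOOTH_ADMIN permissions in the manifest.
public class BluetoothScanner extends BroadcastReceiver {
    @Override
    public void onReceive(Context context, Intent intent) {
        if (BluetoothDevice.ACTION_FOUND.equals(intent.getAction())) {
            BluetoothDevice device =
                    intent.getParcelableExtra(BluetoothDevice.EXTRA_DEVICE);
            // Paired vs. not paired, as recorded in the data set.
            boolean paired = device.getBondState() == BluetoothDevice.BOND_BONDED;
            Log.d("BluetoothScanner", device.getAddress() + " paired=" + paired);
        }
    }

    // Register the receiver and start a discovery scan for nearby devices.
    public static void startScan(Context context, BluetoothScanner receiver) {
        context.registerReceiver(receiver,
                new IntentFilter(BluetoothDevice.ACTION_FOUND));
        BluetoothAdapter.getDefaultAdapter().startDiscovery();
    }
}
```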
26
Experiment 1: Statistics
• Collected data for 2 users for 2 weeks continuously
• Captured fine-grained activities
  o 19 for the student
  o 14 for the staff member
• Parsing of raw text data
• Cleaning up the data
• Transformation of the data into feature vectors
• Use of discretization techniques for continuous attributes
27
Experiment 1: Accuracy
[Chart: % Accuracy per classifier (Naïve Bayes, J48 trees, Random Trees, Bayes Net, Random Forest), Student vs. Post Doc]
28
Experiment 1: Analysis
• Comparing with the toy experiment's accuracy:
  o Similar accuracy for Naïve Bayes and decision trees in the toy experiment
  o Big drop in accuracy for decision trees here
• In the toy experiment:
  o Overfitting
  o Noise
  o Missing data
• In this experiment:
  o We worked on cleanup
  o Discretization for sensor values
  o But timestamps, Wi-Fi IDs, and similar attributes are still each treated as a single feature
29
Confused Activities (Student Data)

Total  Main Activity              Confused With
54     Coffee/Snacks              Working/Studying 12; Sleeping 5
218    Working/Studying           Coffee/Snacks 5; Sleeping 8; Chatting 8
39     Reading                    Working/Studying 19; Sleeping 4
26     Cleaning                   Working/Studying 10; Sleeping 2
195    Sleeping                   Working/Studying 9
17     Cooking                    Working/Studying 5; Sleeping 3; Cleaning 2
49     Chatting/Talking on Phone  Working/Studying 14; Sleeping 2; Coffee/Snacks 2
6      Class-Listening            Class-TakingNotes 2
3      Talk-Listening             Class-TakingNotes 1; Working/Studying 1
1      Watching Movie             Sleeping 1
3      Dinner                     Working/Studying 3
9      Watching TV                Working/Studying 3; Sleeping 6
1      Shopping                   Working/Studying 1
30
Confused Activities (Staff Data)

Total  Main Activity     Confused With
525    Working/Studying  Other/Idle 9; Sleeping 4; Watching TV 6
9      Lunch             Working/Studying 3; Other/Idle 1
72     Sleeping          Working/Studying 19; Other/Idle 2
11     Cooking           Working/Studying 3; Sleeping 2
78     Other/Idle        Working/Studying 13; Walking 1
18     Watching TV       Working/Studying 7; Other/Idle 1
2      Shopping          Cooking 1
31
Experiment 2: Statistics
• Collected data for users for a month continuously
• Finer-grained activities captured
  o 19 for the student
• Some activities were hard to distinguish, so the set was reduced to a small set of 9 activities for prediction
• Parsing of raw text data
• Cleaned up the data
• Use of discretization techniques for continuous attributes
• Used a "bag of words" approach (see the sketch after this list) for:
  o Wi-Fi
  o Geo-location
  o Bluetooth
  o Timestamp
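A minimal sketch of the bag-of-words encoding for Wi-Fi IDs; the class name and structure here are illustrative assumptions, and Bluetooth IDs, geo-locations, and timestamp buckets would be encoded the same way:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class WifiBagOfWords {
    // Vocabulary of every Wi-Fi ID seen in the training data.
    private final Set<String> vocabulary = new LinkedHashSet<>();

    public void addObservation(List<String> wifiIds) {
        vocabulary.addAll(wifiIds);
    }

    // One binary feature per known Wi-Fi ID: 1 if visible in this sample.
    public int[] toFeatureVector(List<String> wifiIds) {
        int[] vector = new int[vocabulary.size()];
        int i = 0;
        for (String id : vocabulary) {
            vector[i++] = wifiIds.contains(id) ? 1 : 0;
        }
        return vector;
    }
}
```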
32
Experiment 2: Accuracy
[Chart: % Accuracy per classifier (Naïve Bayes, J48 trees, Bagging + J48 trees, LibSVM, LibLinear), comparing a 66% percentage split with 10-fold cross-validation]
33
Experiment 2: Confusion Matrix

   a   b   c   d   e   f   g   h   i   j   k   <-- classified as
 677   1   0   0   0   0   4   0   0   0   2 | a = Sleeping
   0 186   0   0  20   0   3   0   5   0   0 | b = Walking
   0   0  27   0   0   0   0   0   0   0   0 | c = In Meeting
   0   2   0  65   0   4   0   0   0   0   0 | d = Playing
   0  37   0   0  37   0   0   0   4   0   0 | e = Driving/Transporting
   0   0   0   2   0 146   1   0   0   2   0 | f = Class-Listening
   8   0   0   0   0   2  52   2   0   0   8 | g = Lunch
   9   0   0   0   0   0   8  11   0   0   0 | h = Cooking
   0  11   0   0   6   0   0   0  13   0   0 | i = Shopping
   0   2   0   0   0   5   0   0   0   7   0 | j = Talk-Listening
   5   0   0   0   0   0   1   0   0   0  34 | k = Watching Movie
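Reading the matrix row-wise and column-wise gives per-class recall and precision; for example, for Sleeping (row/column a):

\[
\text{recall} = \frac{677}{677+1+4+2} = \frac{677}{684} \approx 0.99,
\qquad
\text{precision} = \frac{677}{677+8+9+5} = \frac{677}{699} \approx 0.97
\]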
34
Experiment 2: Analysis
• Small set of activities analyzed
• On an individual basis
• Naïve Bayes performance dropped
  o More features included
  o Less feature independence
• Decision tree accuracy improved
  o Bag-of-words approach
  o Concept hierarchy
  o Conjunctions
• In line with related research:
  1) "Physical Activity Monitoring" by Aminian, Robert
  2) "Activity Recognition from User-Annotated Accelerometer Data" by Bao, Intille
• Recognition accuracy is highest for the decision tree classifier, which proved best for our model
35
Accuracy for Models
[Chart: % Accuracy (82–100) per classification task: 11 Activities; Stationary vs. Moving; 10 Activities; In Meeting vs. In Class; Home vs. School vs. Elsewhere; Home vs. School]
36
Small Subset of Activities
• These activities do not have simple characteristics and are easily confused with other activities:
  o Phone kept on a table while working, at lunch, or over coffee
  o Driving vs. walking at school
• No additional sensor data is available to capture some activities
• The model mostly relies on features like:
  o Wi-Fi IDs
  o Geographic location
  o Bluetooth IDs
  o Time of day
• Therefore it is hard to predict activities across users
  o E.g. In Class, Cooking (the model does not rely on sound levels)
37
General Model
38
Classifiers Evaluating Our Data

Machine Learning Algorithm  Evaluation Notes
Naïve Bayes classifier      Independence assumption
Support vector machines     Noise and missing values
Decision trees              Robust to errors, missing values, conjunctions
Random Trees                No pruning
Ensembles of classifiers    Reduces variance
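A hedged sketch of how such a comparison is run in Weka with 10-fold cross-validation. The file name and the classifier selection are illustrative; LibSVM and LibLinear require separate Weka packages and are omitted here:

```java
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.trees.J48;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CompareClassifiers {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("activity_features.arff");  // hypothetical file
        data.setClassIndex(data.numAttributes() - 1);

        Classifier[] models = { new NaiveBayes(), new J48(), new RandomForest() };
        for (Classifier model : models) {
            // 10-fold cross-validation, as in the experiments above.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(model, data, 10, new Random(1));
            System.out.printf("%s: %.1f%% correct%n",
                    model.getClass().getSimpleName(), eval.pctCorrect());
        }
    }
}
```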
39
Discretization
• Weka filters: unsupervised attribute filters (see the sketch below)
• Binning
• Concept hierarchy
• Division into intervals
• Smoothing the data
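A minimal sketch of the unsupervised binning filter in Weka; the bin count and file name are illustrative assumptions:

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;

public class DiscretizeSensors {
    public static void main(String[] args) throws Exception {
        Instances raw = DataSource.read("activity_features.arff");  // hypothetical file

        // Equal-width binning: divide each continuous sensor attribute
        // into 10 intervals, smoothing away small fluctuations.
        Discretize filter = new Discretize();
        filter.setBins(10);
        filter.setInputFormat(raw);
        Instances binned = Filter.useFilter(raw, filter);

        binned.setClassIndex(binned.numAttributes() - 1);
        System.out.println(binned.attribute(0));  // now a nominal attribute
    }
}
```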
40
Bagging with J48
• Ensemble learning algorithm
• Averaging over bootstrap samples reduces the error from variance, especially when small differences in the training set can produce big differences between hypotheses.
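In Weka terms this is the meta.Bagging classifier wrapped around J48; a minimal sketch, with the iteration count and file name as illustrative assumptions:

```java
import weka.classifiers.meta.Bagging;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BaggedTrees {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("activity_features.arff");  // hypothetical file
        data.setClassIndex(data.numAttributes() - 1);

        Bagging bagger = new Bagging();
        bagger.setClassifier(new J48());   // base learner: C4.5 decision tree
        bagger.setNumIterations(10);       // number of bootstrap replicates
        bagger.setBagSizePercent(100);     // each bag as large as the training set
        bagger.buildClassifier(data);      // predictions average the trees' votes
    }
}
```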
41
Example: J48 + Bagging (fragments of bagged trees)

Place = Home: Sleeping (9.0/2.0)
Place = ITE346: In Meeting (1.0)
Place = Outdoors
|  G1 = False
|  |  Morning = True: Walking (5.0/2.0)
|  |  Morning = False: Driving/Transporting (17.0/2.0)
|  G1 = True: Walking (2.0)
Place = Home
|  Evening = False: Sleeping (20.0)
|  Evening = True
|  |  noise = '(-inf-28.19588]': Cooking (0.0)
|  |  noise = '(28.19588-32.71862]': Cooking (2.0)
|  |  noise = '(32.71862-inf)': Watching Movie (1.0)
Place = Restaurant: Lunch (5.0)
Place = Movie Theater: Watching Movie (2.0)
Place = Elsewhere: Walking (1.0)
Place = ITE325: Talk-Listening (4.0)
Place = ITE3338/ITE377: In Meeting (2.0)
Place = Groceries store: Shopping (1.0)

loc2 = '(-inf-39.17259]': Watching Movie (2.0)
loc2 = '(39.17259-39.18528]': Sleeping (0.0)
loc2 = '(39.18528-39.19797]': Lunch (4.0)
loc2 = '(39.24873-39.26142]': Walking (9.0/2.0)

Afternoon = False
|  Evening = False
|  |  Place = Outdoors: Walking (1.0)
|  |  Place = Elsewhere: Sleeping (0.0)
|  Evening = True: Walking (4.0)
Afternoon = True
|  Wifi Id8 = True: In Meeting (3.0)
|  Wifi Id8 = False
|  |  Place = Home: Lunch (0.0)
|  |  Place = Restaurant: Lunch (4.0)
|  |  Place = Movie Theater: Watching Movie (2.0)
|  |  Place = Work/School: Working (1.0)
|  |  Place = ITE346: Lunch (0.0)
|  |  Place = Outdoors: Walking (1.0)
|  |  Place = ITE3338/ITE377: Lunch (0.0)

Wifi Id8 = True: In Meeting (6.0/1.0)
Wifi Id8 = False
|  Afternoon = False
|  |  Evening = False: Sleeping (24.0/1.0)
|  |  Evening = True: Walking (5.0)
|  Afternoon = True
|  |  Place = Work/School: Working (1.0)
|  |  Place = ITE346: Lunch (0.0)
|  |  Place = Outdoors: Walking (1.0)
|  |  Place = Home: Lunch (0.0)
|  |  Place = ITE3338/ITE377: Lunch (0.0)
42
Contribution
• Smart phone usage for mid-level activity recognition (supervised learning approach)
• High-level notion of context
• Accuracy of 88% over 9 activities for a single user
• Accuracy in line with other research:
  o Home vs. Work: 100%, compared to 95% accuracy in an MIT project using HMMs
  o Mid-level detailed activity recognition – Bao and Intille (MIT)
  o Highest recognition accuracy for the decision tree classifier – Bao and Intille (MIT)
• General model
43
Applications
[Chart: Activity Distribution over a Week – activities (1 Walking, 2 Working, 3 In Meeting, 4 Driving, 5 Other/Idle, 6 Watching TV, 7 Sleeping, 8 Cooking, 9 Talk-Listening, 10 Lunch, 11 Watching Movie, 12 Reading, 13 Shopping, 14 Coffee/Snacks) plotted per day, Mon–Sun]
44
Applications
[Chart: Weekday Activity Distribution – activities (1 Sleeping, 2 Studying, 3 Coffee/Snacks, 4 Reading, 5 Driving/Transporting, 6 Walking, 7 In Meeting, 8 Lunch, 9 Class-Listening, 10 Class-Taking Notes, 11 Chatting) plotted against the time of day, 0:00–22:48]
45
Applications
[Chart: Weekend Activity Distribution – activities (1 Sleeping, 2 Studying, 3 Coffee/Snacks, 4 Reading, 5 Walking, 6 Transporting, 7 Shopping, 8 Chatting, 9 Playing, 10 Other) plotted against the time of day, 0:00–22:48]
46
Applications
• Understand a user's pattern of activities
• Keep a check on time spent
  o Planner
  o Study schedules
  o Program meetings
• Update phone settings according to context
• Recommendation systems
• Locate a specific service nearby
• Adjust the user's presence status
• Update a user's calendar
47
Limitations
• Set of experiments
  o Duration of data capture
  o Number of users for capturing data
• Information captured only through the phone
• Audio and sound processing
• Training on data from different individuals for a general model
48
Future Work
• Robust general model
• Multiple feature sets for different kinds of predictions
• Role management
• Rules for some ground truths or profiles
• Collaborative activity inference
• Models that incorporate sequences of activities
49
Thank you
50
ES – Decision Trees
• Each node = an attribute
• Each leaf gives a classification result
• Root node = the attribute with the most information gain (Claude Shannon). If there are equal numbers of yeses and nos, entropy in that value is at a maximum, and so is the information needed:
  \( \mathrm{Info}([p_1, \dots, p_m]) = -\sum_{i=1}^{m} p_i \log_2 p_i \)
• For an attribute value with 2 yeses and 3 nos: \( I([2,3]) = -\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5} \approx 0.971 \) bits
• Average these over the attribute's values (weighted by frequency) and subtract from the information of the whole set to get the attribute's gain.
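A worked example, using the classic 14-instance weather data set from the Weka literature rather than thesis data: the whole set has 9 yeses and 5 nos, and the Outlook attribute splits it into subsets of [2,3], [4,0], and [3,2]:

\[
\mathrm{Info}([9,5]) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940 \text{ bits}
\]
\[
\mathrm{Info}_{\mathrm{Outlook}} = \tfrac{5}{14}(0.971) + \tfrac{4}{14}(0) + \tfrac{5}{14}(0.971) \approx 0.693,
\qquad \mathrm{Gain}(\mathrm{Outlook}) = 0.940 - 0.693 = 0.247 \text{ bits}
\]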
51
Classification via Decision Trees
• Effective with nominal data
• Pruning corrects potential overfitting
• Confidence factor = 0.25
• Minimum number of objects = 2
• Error estimate = (e + 1) / (N + m)
• Reduced error pruning: false
• Subtree raising: true
Reference: "Decision Tree Analysis using Weka" by Sam Drazin and Matt Montag
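The settings listed above map directly onto J48's parameters in Weka; a minimal configuration sketch, assuming the standard Weka API:

```java
import weka.classifiers.trees.J48;

public class ConfigureJ48 {
    public static J48 build() {
        J48 tree = new J48();
        tree.setConfidenceFactor(0.25f);    // pruning confidence factor
        tree.setMinNumObj(2);               // minimum number of objects per leaf
        tree.setReducedErrorPruning(false); // reduced error pruning off
        tree.setSubtreeRaising(true);       // subtree raising on
        return tree;
    }
}
```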