socialfusion: context-aware inference and recommendation...

SocialFusion: Context-Aware Inference and Recommendation By FusingMobile, Sensor, and Social Data

Aaron Beach1, Mike Gartrell1, Xinyu Xing1, Richard Han1, Qin Lv1, ShivakantMishra1, Karim Seada2

1University of Colorado at Boulder, 2Nokia Research Center Palo Alto

Contact: {aaron.beach, mike.gartrell, xinyu.xing}@colorado.edu

Department of Computer ScienceUniversity of Colorado at Boulder

Technical Report CU-CS-1059-09

December 2009

SocialFusion: Context-Aware Inference andRecommendation By Fusing Mobile, Sensor, and Social

Data

Aaron Beach1, Mike Gartrell1, Xinyu Xing1, Richard Han1, Qin Lv1, Shivakant Mishra1, Karim Seada2

1University of Colorado at Boulder, 2Nokia Research Center Palo Alto{aaron.beach, mike.gartrell, xinyu.xing}@colorado.edu

ABSTRACTMobile social networks are rapidly becoming an importantnew domain showcasing the power of mobile computing sys-tems. These networks combine mobile location informationwith social networking data to enable fully context-awareenvironments. This paper proposes SocialFusion, a frame-work to support context-aware inference and recommenda-tion by fusing together mobile, sensor, and social data. Weinvestigate a case study of SocialFusion that enables an ap-plication for group-based context-aware video. We showhow SocialFusion can be used to gather mobile, sensor, andsocial data, infer group descriptors, and then apply thesemeta-level descriptors to improve the recommendation ofvideo playback for a group of users viewing the same screen.

1. INTRODUCTIONMobile social networks are in their infancy, but have

the potential to revolutionize the field of mobile com-puting by enabling fully context-aware inference andrecommendation in ubiquitous computing environments.To date, a variety of independent research projects havebegun to show the power of mobile social networks, e.g.Serendipity [15], WhozThat [8], and CenceMe [29]. Webelieve it is an opportune time to evolve a more system-atic framework upon which researchers can build a newfoundation for mobile context-aware computing. Thispaper presents SocialFusion, a framework for systemati-cally fusing together diverse input streams from mobilesmartphones, sensor networks, and social networks sothat inferences can be made from the assembled datato extract contextual clues and recommendations canthen be made based on these inferences to invoke theappropriate decision(s) and/or action(s).

SocialFusion is motivated by the observation that thephenomenal growth of social networks, mobile smart-phones, and sensor networks provides an unparalleledopportunity to achieve a more comprehensive under-standing of the context surrounding an individual innearly any given environment. Mobile devices are nearlyubiquitous and can be leveraged to provide location

and announce a user’s identity/presence in a room orplace [8, 15]. In addition, mobile smartphones havebeen enriched with a multitude of sensors, such as ac-celerometers, microphones, cameras, and even digitalcompasses, that can be mined to infer actions [29] oreven orientation. Environmental sensor networks, bothindoor and outdoor, provide the ability to monitor light,temperature, humidity, wind, and other parameters thatfurther enrich our understanding of context. Finally,social networks like Facebook, MySpace, Twitter andLinkedIn provide invaluable contextual information con-cerning the preferences of individuals, e.g. their favoritehobbies, bands, and films, as well as their relationshipswith one another. Moreover, all three classes of mo-bile, sensor, and social data when temporally archivedprovide historical perspectives that further enhance theunderstanding of context. The confluence of the ex-plosive growth in social networks, sensor networks, andmobile computing provides us with the ideal opportu-nity to fuse together all of these diverse input streamsand thereby achieve an optimal understanding of indi-vidual and group context, which can be leveraged tocreate a new frontier of context-aware applications.

Our observation is that it is the combination of mo-bile, sensor, and social data that can help build bettercontext-aware applications, and that each data streamindependently only provides an incomplete picture ofsituational awareness. For example, recent mobile healthapplications have been deployed to help monitor dis-ease outbreaks, but currently only list reported diseasehot spots [3] rather than an individual’s exposure risk,i.e. how close they may have come into contact withother affected individuals. By enhancing such applica-tions with detailed location information obtained froman individual’s smartphone, the application would beable to more accurately assess whether the individualwas exposed to a disease, by comparing to other loca-tion traces of individuals known to have contracted thedisease. However, the granularity of location providedby mobile location services often only localizes an in-dividual to the scale of a room or lecture hall, whichmay be insufficient to identify whether two individu-

1

als were in close enough proximity to communicate thedisease. By integrating additional social networking in-formation, namely friendship information, then the ap-plication would have a stronger inference of potentialcontact, namely two friends are more likely to sit to-gether in the same meeting room and thus more likely tohave come into contact. Note that the friendship infor-mation alone would not provide enough context absentthe location. It is the combination of social network-ing information with mobile location information thatmakes a much stronger case that there was the poten-tial risk of disease exposure, and thereby substantiallyenhances the existing health application with contextawareness.

Multimedia audio and video applications can also ben-efit from the fusion of mobile, sensor, and social data.For example, a music jukebox player has been enhancedto play customized location-aware and preference-awaremusic, namely it is aware of both the identity of indi-viduals within a room, via their cell phones, as well asthe preferences of those individuals in the room, viatheir social networking preferences, e.g. the list of theirfavorite bands [8]. Similarly, a video player has been en-riched with context so that it plays trailers that are cus-tomized to the tastes of people viewing a shared videoscreen together [19]. Note that in these applications, anew facet of context awareness emerges, namely the re-search challenge is not only of inferring characteristicsabout an individual, e.g. their tastes, but even morechallenging how to infer meaningful characteristics of agroup of individuals. It is the collective tastes of the in-dividuals that would influence the recommendation ofa film or song to a group of people.

SocialFusion is designed to enable such applicationsby providing the framework whereby applications canobtain mobile, sensor, and social data, integrate them,infer context, and decide on a suitable course of ac-tion. The challenges addressed by SocialFusion includehow to collect diverse data streams, how to protect theprivacy of the collected data, how to analyze and fusetogether the assembled data to infer higher-level con-textual meaning, e.g., the characteristics of a group,and how to use what we have learned from inferenceto effectuate appropriate actions for different kinds ofcontext-aware applications.

Figure 1 illustrates SocialFusion’s multi-stage com-puting framework. The first stage collects together datafrom three major classes of data input streams, namelysocial networks, mobile phones, and sensor networks.The second stage incorporates inference functionalitywhose task is to fuse the data and thus derive higher-level contextual meaning in the form of descriptors fromthe raw data. These descriptors, combined with theoriginal data, are then supplied to a third stage consist-ing of a recommendation engine that decides what kind

Social Networks

SocialFusion

Mobile Data

Sensor Data

Applica7ons

Case Study: Group‐Based Mul7media Applica7on

Data Collec7on

Data Fusion/ Inference

Recommenda7on Engine Stage III

Stage II

Stage I

Figure 1: SocialFusion’s multi-stage comput-ing framework for inferring context and recom-mending contextual action by fusing mobile, sen-sor, and social data.

of context-aware action to take. The contributions ofSocialFusion consist of the following:

• introducing the concept of fusing together socialnetworks, mobile smartphones, and sensor networksto improve context awareness, especially involvingsocial situations

• designing a multi-stage platform for multi-dimensionalfusion consisting of data collection/management,data inference/fusion, and recommendation stages

• using the platform to implement a particular context-aware application which is a case study involvingrecommending films to show to a group of people

• showing how the additional context awareness sup-plied by SocialFusion’s diverse input streams im-proves a context-aware application’s actions, i.e.,how the addition of social networks and group de-scriptors improves the group recommendation gen-erated by a context-aware video application

2. RELATED WORKPrior research has addressed portions of the SocialFu-

sion vision of integrating social networks, mobile phones,and sensor networks, but as far as we are aware, no sin-gle project has addressed the full scope proposed herefor SocialFusion. In the space of mobile social networks,Serendipity links together a stand-alone social networkwith mobile devices [15], but is not integrated with ex-isting social networks like Facebook. WhozThat inte-grates Facebook with mobile phones to provide context-aware audio, but does not integrate with any sensors [8].CenceMe mines mobile data provided by iPhone sen-sors to infer user actions, but is not integrated with

2

social networks [28]. Early work on smart spaces oc-curred largely prior to the advent of social networks [22,16]. Commercial location-aware mobile social network-ing services such as Brightkite [1] and Loopt [2] oftendevise their own social networks, with little to no ex-ploitation of full context awareness as proposed here.

Recommendation algorithms and systems have re-ceived significant attention from both academia and in-dustry since the mid-1990s when collaborative filteringwas introduced. Recommendation systems are usuallyclassified into two categories: content-based recommen-dations and collaborative filtering based recommenda-tions. Content-based recommendation systems recom-mend an item to a user based on item description anduser’s interests [10, 25] and are useful recommendingwebpages, news articles, items for sale, etc.

Collaborative filtering based systems recommend itemsthat other similar users have preferred [23, 6]. Collabo-rative recommendation computes the similarity to otherusers rather than to other items. Several hybrid rec-ommendation systems combine both collaborative andcontent-based methods [7, 12].

The majority of the research in recommendation sys-tems has been focused on making recommendations toindividual users. Making recommendations to a groupof users with potentially competing interests introducesa whole set of new challenges. A few projects that haveaddressed the problem of making recommendations toa group of users include [18, 30]. SocialFusion providessupport for making group recommendations based onall these approaches by exploiting the advantages ofcontent-based, collaborative and hybrid methods. Inaddition, it enhances the current state-of-the-art by in-corporating detailed context information obtained byfusing data from online social networks, mobile smart-phones and sensor networks. In particular, this detailedcontext information aids in better identifying the char-acteristics of a group as well as the current environmen-tal conditions, which in turn results in making bettergroup recommendations.

An emerging body of work seeks to identify groupsfrom individual data traces by applying graph theoryclustering approaches. Standard solutions to the clus-tering problem include the K-means or Fuzzy C-meansalgorithms. Convoy detection seeks to identify a grouptraveling together by using density-based analysis oftrajectory databases [21]. Fuzzy clustering has beenused as a technique to track robotic objects [24]. Sen-sors in the environment or on phones, such as accelerom-eters, have been used to identify groups [34]. Section 5describes how we build upon each of these techniques.

The design of SocialFusion has focused its privacyconcerns on two major issues. The first issue is presencesharing or more generally the association of users witha specific context. The second issue has to do with the

release of personal information in general and whetherinformation deemed private by a user can still be usefulto the system without compromising that user’s privacy.

Previous work has dealt with the issue of anony-mous presence sharing between users through match-ing their shared points in time and space [14, 26, 9].This is achieved through announcing anonymous iden-tifiers which are resolved through a trusted broker sys-tem. This work has formed the basis for anonymouspresence sharing in SocialFusion.

The second problem of sharing personal or private in-formation without compromising the privacy of the in-dividual involves guaranteeing the disassociation of allpublicly released private data from its associated user.The private data must be “disassociated” in the sensethat, under some guarantee, it cannot be re-associatedwith its private source identity. This problem is closelyrelated to the K-anonymity problem in which a releasedpiece of data is K-anonymous if it maps to no fewerthan K different sources. Prior work on K-anonymityin social networking data has largely focused on devel-oping algorithms that anonymize only the social graphof friendships [27, 33, 31], or both friendship and userprofile data obtained from social networks [13]. [27, 33,31] primarily involve perturbation of the social graphstructure in a social network. Many of these approachesalso are designed for off-line anonymization, whereasSocialFusion must react quickly to the appearance ofa user in an environment. Instead, SocialFusion offersa new approach to anonymization that does not allowthe perturbation of social links nor the modification orgeneralization of a user’s data, and in addition sup-ports real-time mobile social networking applications.We sketched an approach in [9], but offer the completesolution and evaluation in this paper.

3. SOCIALFUSION FRAMEWORKFigure 2 provides a detailed example of SocialFusion’s

multi-stage computing framework. SocialFusion dividesits data flow into a sequence of three stages with clearlyseparated functionality in each stage. The frameworkis designed to be general, so that new modules can beinserted and built on top of existing modules in a hi-erarchical fashion. In this case, the individual mod-ules pictured were each developed for our case studyof a group-based context-aware video application, anddemonstrate how the components of each stage fit to-gether. The modules with solid borders have been com-pleted and integrated into the SocialFusion framework,while the modules with dotted borders/lines have yetto be integrated into the full framework.

The first stage collects preference data from Facebookand Netflix, and location data from mobile phones, andplaces the information into a set of databases. As thereare both location privacy and data privacy concerns

3

Group‐Based Video Applica2on

Stage III

Stage II

Stage I

Mobile Data

Sensor Data

Iden2fying A Group

Leader Dominated?

Domain Exper2se?

Social Connected‐

ness

Pairwise Similarity

K‐Anonymity Layer

Content‐Based Rec

Collabora2ve Filtering Rec

Hybrid Rec

Figure 2: A detailed example of how Social-Fusion’s multi-stage computing framework canbe leveraged t o support a group-based context-aware video application.

with releasing such data, SocialFusion incorporates ananonymization layer in Stage I that manages the releaseof any of the collected information. Any release of infor-mation from its databases must satisfy a K-anonymitycriterion. Thus, any subsequent stage may only operateon data that has been sufficiently anonymized. This ap-proach protects each user’s privacy as soon as possibleafter data collection.

Stage II then applies a hierarchical set of describers tothe assembled data in order to extract or infer contex-tual meaning. A describer is a module with an inferencealgorithm that is tailored for analyzing a specific char-acteristic of the assembled data. A describer modulegenerates a descriptor parameter that assesses the de-gree to which the characteristic is possessed by the ana-lyzed data. For example, we have built a describer mod-ule for our case study that analyzes collected individualmobile location traces as well as Facebook friendshipand preference information in order to extract groupsof individuals that are traveling together, generating agroup ID descriptor that identifies the individuals in thegroup.

As shown, describers can be hierarchically organizedin Stage II to base their inferences upon the inferencesof other prior describers. For example, each of the fourdescriber modules Leader, Expert, Social, and Similar-

ity depends for their inferences upon the group ID de-scriptor generated by the group identifier describer, i.e.in order for the Leader describer to analyze whether thesocial graph of relationships within a group reveals thepresence of a dominant social leader, it must first knowthe membership of the group, which is revealed in thegroup ID descriptor.

Once the describer modules have inferred sufficientcontextual clues from the assembled data in the formof descriptors, then Stage III uses these inferences torecommend a context-aware action. The recommenderswill have access to not just the descriptors but also thecollected raw data in the database via the anonymitylayer, though this is not shown in the figure for thesake of clarity. For example, a set of describers mayideally determine individual users’ current activity ortask, their emotional mood, their physical health, andtheir musical tastes. These descriptors can be used asinput to a recommendation algorithm that decides whatkind of music to play to a user, e.g. knowing a user isexercising and likes rock music, then the recommenderchooses an up-tempo rock tune to play for the individ-ual.

We expect that as data gathering, inference modules,and recommendation algorithms become more sophisti-cated that contextual detail will become ever more en-riched and the power of SocialFusion to transform ourenvironments and spaces with adaptive context aware-ness will be even more starkly demonstrated. As a re-sult, we have focused our initial investigation of So-cialFusion on a case study that involves rich contex-tual detail, namely a group of individuals, who notonly have their individual preferences, but also sophis-ticated social relationships to one another rich withcontextual detail that affect what kind of recommen-dation should best be made. Focusing on context-richgroup behavior gives SocialFusion the best opportunityto demonstrate its utility towards improving the cus-tomization of context-aware applications. Moreover,since groups represent a common mode of human in-teraction, then SocialFusion will be shown to addressa common case. However, focusing on group behav-ior also poses a greater challenge in that determining arecommendation for a group of individuals is inherentlymore difficult than determining a recommendation fora single individual, which is already a difficult problemin and of itself.

Our SocialFusion project makes headway in solvingthe group-based context-awareness problem, and usesa group-based video application to do so, as shown inFigure 1. The group descriptors generated by the StageII describers are then used as input in Stage III to theset of recommendation algorithms investigated in thiscase study, namely content-based recommendation al-gorithms, collaborative filtering recommendation algo-

4

rithms, and a hybrid of the two. These recommenda-tions are compared in our evaluation section againstground truth obtained from our user studies of groups.

4. COLLECTING AND MANAGING DATAThe power of SocialFusion comes from the combina-

tion of data obtained from a variety of sources. Social-Fusion organizes input data streams into three majorclasses, namely mobile data from smartphones, sensordata from fixed sensor networks, and social networkingdata from online social networks. In some cases, Social-Fusion requires support from the sources from whichthese data streams are generated, e.g. by installing amobile application on smartphones. In other cases, So-cialFusion uses the standard APIs provided by the datasource to collect the relevant as and when needed.

Mobile data from smartphones provides importantcontextual information of users as well as groups of usersas they move from one place to another. This includeslocation data, accelerometer data, digital compass data,as well as pictures and videos captured using phone’scamera. This data is collected by installing a mobileapplication on the smartphones of the users. The appli-cation retrieves a user’s location and other sensor valuesand passes on that information to SocialFusion. Sincethis information is time dependent and varies as usersmove, the mobile application passes on fresh values atregular intervals. Our application allows the user to ad-just the tradeoff between update frequency and batterypower.

Data from fixed sensors embedded in the local smartspace, e.g. temperature, humidity, and infrared sensorstogether with microphones and video cameras is alsoforwarded to SocialFusion at regular intervals. The ex-act mechanism to facilitate this is dependent on thenature of the smart space.

Social networking data is obtained from online so-cial networking websites such as Facebook, Twitter,LinkedIn, MySpace, etc.. In some cases, this requirespermission from the users, e.g. Facebook. In others,the data is openly available, e.g. twitter, and can beimported using the standard APIs. Some informationobtained from social networking websites, such as userpreferences is relatively static, and is updated only oncein a while. On the other hand, information such asfrequency of communication between users, e.g. Wallposts, is obtained more often.

4.1 Protecting PrivacyA clear concern is protecting the privacy of informa-

tion supplied by users of SocialFusion. We offer a multi-faceted solution. First, SocialFusion is predicated uponan opt-in approach. Users voluntarily opt into the sys-tem by downloading and installing the mobile applica-tion on their cell phones, agreeing during the sign up

procedure to reveal their social networking and loca-tion information. Data uploaded from the mobile ap-plication is encrypted via https. Data accessed fromsocial networks preserves the privacy policies of eachof those networks. In the case of sites like Twitter,all information is public knowledge, while other siteslike Facebook have elaborate privacy safeguards. How-ever, even though SocialFusion adheres to the individ-ual privacy policies, it is the combination of informa-tion across streams that both strengthens context andposes a greater challenge to privacy, i.e. correlating in-formation about an individual across different sites andstreams may collectively reveal more about an individ-ual than that individual wanted to reveal when restrict-ing access via independent filters on each site or stream.

To provide a more powerful safeguard for cross-streamprivacy, we provide an anonymity layer that adheresto the classic K-anonymity criterion described earlier.The following challenges must be overcome in providingthis anonymity guarantee: (1) accommodating the het-erogeneity of the input mobile, sensor, and social datastreams; (2) computing the anonymity criterion in nearreal time so that we can quickly decide whether therelease of some data is K-anonymous - this is impor-tant so that the smart space can adapt quickly to thepresence and preferences of a user; and (3) supportingsubsequent stages of SocialFusion, namely the inferenceand recommendation stages.

Prior work on K-anonymity seems unsuited becauseit typically distorts the data before release either by in-troducing a random perturbation or transformation intothe social graph [27, 33, 31], e.g. inserting a false nodeinto a graph or a false link, or by generalizing or ”fuzzi-fying” the information prior to public release [13], e.g.showing only an averaged value rather than a specificvalue. The problem this introduces is that distortingthe information prior to inference can impair the qual-ity of the inference. For example, inference that relieson identifying the leadership qualities of a group is verysensitive to the connectivity in the social graph, andintroducing false information can mislead the inferenceand generate inaccurate descriptors. The ripple effect isthat context-aware recommendations as well as outputactions will also be negatively affected.

Instead, we have developed a new approach to K-anonymity based on the principle of selective withhold-ing. In this approach, we do not distort the data inany matter, but rather decide only whether to selec-tively withhold the release of a piece of unperturbeddata, determining if said data release compromises theK-anonymity guarantee. This approach is intuitivelypreferable to distortion, since describer modules andrecommendation algorithms will operate only on cleanoriginal data, and in this way will not be misled withinferences based on falsified data, though in some cases

5

!"#$% &'(")'*+,-$*.% /01$*23% 4'503$3%

!"##$ %&'(")*+,$-#.))$ /00'$ 1$

2+'3$ 45)"-$-60-'+*$ /00'$ 7$

86.00'$ 45)"-$-60-'+*$ !69$ 7$

86'$ %&'(")*+,$-#.))$ %&+")$ 1$

!"##:$%&'(;$

86':$%&'(;$

!"##:$/00'$

2+'3:$/00'$

!"##:$1$

86':$1$

<=>$!<?@=$

1$

7$

Figure 3: SocialFusion selectively withholdsdata in order to satisfy K-anonymity. A directedgraph relating identity to the mobile, sensor,and social data generates a complex Boolean ex-pression that is then minimized to find K.

they may only see a partial view of the total data picturein order to preserve K-anonymity. We believe that ourselective withholding approach to K-anonymity breaksnew research ground.

Our selective withholding approach to K-anonymityanalyzes whether the release of a certain piece of data,e.g. a location or a social networking preference, willprovide anonymity up to a degree of K. Consider theexample shown in Figure 3. If the data set we wishto release is (Chemistry class, Chris, 1), then the onlypossibility is (Joe). However, if we withhold the dataitem (Chris) and only release (Chemistry class, 1), thenthe possible identities are (Joe) OR (Bill). Thus, we’veincreased the anonymity to K ≤ 2. This example showshow we could selectively withhold certain data items toincrease K to a desired value.

For more complex data sets, the Boolean expressionlinking the set of users to the set of data to be re-leased needs to be systematically derived, which canbe achieved using a directed graph that models the re-lationship between the data to be released and the userswho could be linked to that data. For any data set thatwe wish to release (d1, d2, ..., dn), we construct a col-umn of nodes for each di, where each node in columni consists of the pair (username, di). This identifies allpossible users who could be associated with the releaseof data item di. Next, we interconnect the nodes in acolumn with the nodes in the next column. This createsthe directed graph. The set of all truth cases would bethe superset of all paths from beginning nodes to theend nodes.

For example, before releasing (Chemistry class, Anne,1), we generate the directed graph shown in Figure 3consisting of three columns, one each for all users asso-ciated with the Chemistry class, Anne, and one course.A single path #1 through the graph corresponds to theBoolean expression (Bill AND Bill AND Bill), or just(Bill). A second zigzag path #2 corresponds to the

Boolean expression (Joe AND Bill AND Joe), or just(Joe, Bill). The union of all possible paths through thegraph gives us all possible combinations of users thatcould be associated with the release of the data (Chem-istry class, Anne, 1).

Since the full Boolean expression could be quite com-plex, we need to devise an algorithm to quickly re-duce the expression. We observe that logic minimiza-tion algorithms can accomplish this task. Several wellknown logic minimization algorithms exist, includingESPRESSO [11] and Quine-McCluskey. When appliedto our example graph in the figure, the simplified Booleanexpression elegantly reduces to (Bill) OR (Joe, Fred),thus illustrating the power of this technique. We deter-mine if the simplified Boolean expression meets some K-anonymity guarantee by counting the number of termsin the simplified expression. The number of terms inthe simplified expression equals the value K that wewish to compute.

5. PATTERN INFERENCE AND ACTUATIONOnce mobile, sensor, and social data has been col-

lected, stored efficiently and accessed through a K- anony-mous interface the data can then be processed by theapplication. However, the application is confronted witha sea of raw data. SocialFusion uses “inference mod-ules” to make sense of this “sea of data.”

In general, the task of making sense of the sea of dataconsists of describing patterns found in the data. Thesepatterns can then be used directly or help the applica-tion decide how to use the raw data. Pattern recog-nition must be done quickly and efficiently to supportreal-time applications. Also, the ability to find patternsthat incorporate disparate data types such as correlat-ing behavior information with location tags holds themost potential benefit for complicated mobile social net-working applications (such as a real-time group basedvideo recommendation system).

5.1 Frequent Pattern AnalysisOur first step is to identify frequent patterns in users’

mobile, sensor, and social data, including both frequentitemsets and frequent sequences. A frequent itemsets isa set of items that co-occur frequently. For example,“John”, “reading health-related news articles”, “morn-ing”, and “riding bus to work” may be a set of itemsthat co-occur many times in the mobile, sensor, andsocial data that we gather about John. A frequent se-quence is a set of items that often occur in a certain se-quence. For example, “pick up Tom”, “go to a footballgame”, and “ having dinner” may be a sequence of ac-tivities (items) that appear frequently and in the sameorder for John and Mike. SocialFusion leverages exist-ing frequent pattern analysis techniques developed inthe data mining community. While traditional frequent

6

pattern analysis techniques assume pre-defined and usu-ally homogeneous items, SocialFusion has to supportitems from different data sources (e.g., mobile, sensor,social) and of diverse entities (e.g., user, content, time,location, activity), as well as easy and flexible exten-sions to support new types of items. In addition, differ-ent items have to be associated with their correspondingtime information in SocialFusion, such that time-basedordering of items can be efficient calculated.

Due to the diversity in users’ interests and daily activ-ities, we first apply frequent pattern analysis techniques(both frequent itemsets and frequent sequences) on in-dividual users and individual data sources of each user,then combine those patterns to infer more complex pat-terns that span across multiple users and multiple datasources. This approach helps us to quickly prune outlarge amounts of noisy and infrequent items in each cat-egory and focus on identifying and combining the muchsmaller number of patterns that are truly frequent andcover multiple users and data sources.

5.2 Pattern-Based ActuationThe frequent itemset and sequence patterns inferred

above are indexed and maintained by the SocialFusionsystem, such that at run time, as up-to-date mobile,sensor, and social data are continuously captured, theSocialFusion system monitors the data and repeatedlyconducts the following tasks:1. Check the captured social fusion data against theinferred frequent patterns to determine if one or mul-tiple matches exist between current state of data andfrequent patterns.2. If such a match does exist, the system determinesfurther if an output action is needed and feasible, andthe type of action (e.g., individual vs. group, playing asong vs. recommending a movie).3. If the matched pattern indicates that a certain actionis desirable, the system then invokes the correspondingactuation agent, such as choosing and playing a songthat the user likes or recommending a movie to groupof friends in the same room.

Checking for matched patterns and actuating the cor-responding agents in SocialFusion are supported effi-ciently. Although large amounts of mobile, sensor, andsocial data continue to be captured and inserted intothe SocialFusion system, since the system has alreadyidentified the small number of frequent patterns of in-terest and patterns that would require certain actua-tions, only certain contexts need to be detected andrecommendations made, enabling very efficient outputactions. Through the fusion of a variety of mobile, sen-sor, and social data, our SocialFusion system not onlydetects more reliable and more comprehensive patterns,but also enables more accurate context/activity classifi-cation and context-aware recommendations. Such capa-

bilities and accuracies are not possible in previous sys-tems that rely on single or same-type data sources. Forinstance, while a previous system may be able to rec-ommend classic music and action movies to a user, So-cialFusion could recommend classic music when a useris walking on the street or an action movie when he iswith a group of friends on a Saturday evening.

5.3 Example: Real-Time Social InferenceA clear understanding or summary of social context

is at the core of SocialFusion’s Inference Engine. A setof inference modules have been implemented which canidentify and describe social groups in real-time. Usingtrajectory information, possible “convoys” are identifiedand then a particular query (group or user) is matchedto a “convoy” optimizing for social and accelerometerindicators such as social network friendship, correlatedmovements and physical orientation. This convoy isthen efficiently described as it relates to the particularquery. For instance, if the query targets movie infor-mation, an “expert descriptor” would be included withthe group description implying the relative expertise ofgroup members as related to movies.

Group identification.This can be framed as a frequent itemset problem.

For instance, when location updates are searched withinan appropriate temporal window trajectory clusteringor “convoy detection” effectively becomes a multi- di-mensional clustering problem in which time-stamped lo-cations are clustered within different segments of timeand frequently occurring clusters represent convoys. Thiscan be done through maintaining and modifying a “fuzzy”set of clusters across multiple time periods as done forobject tracking in many applications [24]. Because So-cialFusion uses many different data sources includingphone sensors and social information to more preciselyidentify social structure, fuzzy clustering works partic-ularly well for SocialFusion’s group identification as itis straightforward to perturb the clusters. For exam-ple, accelerometer readings from multiple phones canbe correlated to infer grouping [34]. Phone sensors cannot only be correlated with each other but can also beassociated with simple group behaviors or actions whichgive us more information about the state of the group.Furthermore, groupings can be optimized by evaluat-ing the social relations between group members suchas the friend-link density between users of Facebook orthe temporal contact graph gathered over long periodsof time which could detect long-term patterns such asmeetings that occur every third monday of the month.

The general approach taken by this project is to firstdo fast and efficient trajectory clustering or convoy de-tection resulting in a fuzzy clustering of all nearby clus-ters relevant to a query or individual. The relevant indi-

7

Table 1: Classification of Group Descriptorscontent social structure

single user Expert Leaderpair of users Similarity Social

vidual or individuals are then checked for other availablesources of information (Facebook, sensor data, long-term behavior modeling) with which the clusters aremodified and the maximally relevant cluster is chosen.

Group descriptors.In SocialFusion, a set of group describer modules

are implemented to capture the key characteristics of agroup and use these characteristics for group-based ac-tions (e.g., movie recommendation). Usually, multiplegroup describers can be used, and each group describeraims to characterize or summarize a particular aspectof a group. Specifically, the following group describermodules have been implemented in SocialFusion:

Expert Describer measures the relative expertise ofdifferent group members as related to a specific topic.The opinions of experts may be weighted more impor-tant than that of other group members. The describeralso summarizes a group’s absolute mean expertise, e.g.,a group of casual movie watchers or film critics.

Leader Describer measures the influence of individualmembers in a group based their social networks usingdegree or betweenness centrality metrics. For example,a person who has more friends in his/her social net-works and posts more frequently is considered a leaderand more influential in a group. The describer alsosummarizes how unbalanced the group members, suchas a group of equally-important peers or a hierarchycentered around a few group members.

Social Describer measures the strength of all pairwisemember social links in a group. Tighter and stronger so-cial relationships can be more influential within a group.Also, the describer summaries how densely or stronglya social group is connected, such as a group of highly-connected close friends or relative strangers.

Similarity Describer provides a relative weighting ofhow similar each pair of users are in terms of their con-tent interests. Closely-shared interests can be more in-fluential in a group. Also, the general interest diversityof an entire group can be summarized by this describer.

As shown in Table 5.3, the group descriptors gener-ated by the four describers above characterize groupsalong two major dimensions: (a) pairwise relations vs.individuals; and (b) content vs. social structure. BothLeader and Expert are single-user descriptors, whileSimilarity and Social are pairwise descriptors. On theother hand, both Expert and Similarity are content-based descriptors, while Leader and Social are basedon social structures. Depending on a group’s activitystate or purpose and how cohesive the group is, certaindescriptors may weight more than the others. For in-

stance, in some situations, experts may be more impor-tant and in other situations social relations may domi-nate group decisions. Our current work in SocialFusionis to develop and integrate useful metrics that not onlydescribe a group but also guide how descriptions of thegroup should be interpreted and utilized for upper-levelapplications.

6. CASE STUDY: GROUP-BASED MOVIERECOMMENDATION SYSTEM

To validate our design of the SocialFusion frameworkand to demonstrate how SocialFusion enables more pow-erful context-aware mobile computing, we present inthis section a case study of a group-based movie rec-ommendation system. This case study makes specificuse of the “fusion” capabilities of SocialFusion, captur-ing important mobile, sensor, and social data, inferringinherent group-based patterns and characteristics, andusing the inferred patterns for group-based actuation,i.e., recommending movies to a group of users who arephysically located in the same room.

Although various recommendation systems exist forindividual users [6], groups of users [20], and specificdata types, SocialFusion is the first framework thatenables group recommendation based on comprehen-sive mobile, sensor, and social data. Instead of a sim-ple approach that aggregates multiple users into a sin-gle “virtual” user, SocialFusion extracts and makes useof a number of group-based descriptors for better in-formed group recommendation. We have described inthe previous section how a coherent social group canbe identified and how different group descriptors canbe extracted from a group. In this section, we de-scribe in detail the content-based group recommenda-tion technique and collaborative-filtering-based grouprecommendation technique for group-based movie rec-ommendation, focusing on how different group descrip-tors may be utilized in the recommendation process toachieve the best group overall satisfaction.

6.1 Content-Based Group RecommendationContent-based recommendation techniques are based

on the assumption that if a user likes item A, then shewill like other items that are similar to item A. Inthe case of movies, by analyzing the attributes of dif-ferent movies (e.g., director, actors) and their similar-ities, movies of similar attributes can then be recom-mended based on users’ movie viewing history. To makecontent-based group recommendation using SocialFu-sion, we need to overcome the following key challenges:(1) obtaining individual members’ movie preferences;(2) generating group-based movie preferences; and (3)identifying dominating movie attributes for similaritymeasure. We address these challenges as follows.

First, to obtain the movie preferences of individual

8

members in a group, we examine these users’ Facbookprofiles, which are automatically captured and integratedinto the SocialFusion framework. A user’s Facebookprofile usually contains a favorite movie list which de-scribes the movies that the user likes the best. How-ever, these are just positive examples. To fully capturea user’s movie interest, we also need to obtain negativeexamples, i.e., the movies that a user does not like. Toaddress this problem, we introduce a mapping schemethat makes use of the Netflix data set [4], which con-tains over 100M ratings of nearly 18K movies by over480K users. On average, each user rates over 200 moviesusing the scale of 1 to 5 (5 is best). Due to the differ-ence in user’s rating criteria, we normalize each user’sratings as follows:

R′ =R− Umin

Umax − Umin. (1)

where R′ and R are the normalized and original ratings;Umax and Umin are the maximum and minimum ratingsof that user. Using the favorite movies listed in a Face-book user’s profile, we then map that user to a Netflixuser by computing the Jaccard similarity coefficient be-tween the Facebook user and a Netflix user and pick themost similar Netflix user. In a real usage scenario, itis possible for SocialFusion to automatically integrate auser’s Facebook account and Netflix account.

Next, we combine the movie preferences of individ-ual group members to generate a movie preference forthe entire group. We select the movies which have beenrated by most of the group members. More importantly,we harness two group-based satisfaction metrics: maxi-mizing satisfaction and minimizing misery. Let G be auser group, and ru,m be the rating of user u on moviem. Then the group-based satisfaction metrics are

rmax−satisfaction(G, m) =1|G|

∑u∈G

ru,m (2)

rmin−misery(G, m) = min{ru,m|u ∈ G} (3)

Based on the group ratings, movies are divided into twocategories: the ones that the group like or dislike. Thethreshold is set to be the mean plus standard deviationof all the movie ratings by the group.

Given a group’s liked and disliked movies, we obtainthe “plot key words” for each movie from a local copyof the IMDB data set 1. Since not all plot key wordsare important in terms of classifying the liked and dis-liked movies, we compute the expected information gainIG(w, M), which is the “classification power” of plotkey word w in a set of movies M , i.e., whether the pres-ence or absence of w helps us to determine if the grouplikes a movie or not:

IG(w, M) = I(M)−( |Mw||M |

·I(Mw)+|Mw||M |

·I(Mw

)(4)

1IMDB http://www.imdb.com/interfaces

Social Link and Activity

Dataset

Movie Preference

Dataset

Social Graph Processor

Group Describers

Group Recommender

System

Predicted or Actual Group Member Movie Ratings

Predicted Movie Ratings

for Group

Figure 4: Collaborative-filtering-based grouprecommendation workflow

where I(M), I(Mw), and I(Mw) are the entropy of theset of all movies, movies containing w, and movies notcontaining w, respectively. Here are some of the mostinformative plot key words obtained from a collectionof 80 movies:

Airplane Accident, Child, Tragic Villain, WWII, 1930s, 1940s,

Monkey, Single Mother, Mental Illness, Nightclub, Compassion,

Crushed To Death, Disney Animation Feature, Technology, ...

Once we have identified the most informative plot keywords, we can then make group movie recommendationusing standard machine learning techniques, includingID3, Naive Bayes and Support Vector Machine (SVM).

6.2 Collaborative-Filtering-Based GroupRecommendation

SocialFusion also uses collaborative filtering heuris-tics to generate recommendations for groups of users.We first use a standard collaborative filtering system,like Apache Mahout Taste, to generate individual pre-dicted movie ratings for each group member for a spec-ified set of movies. After obtaining these individualmovie ratings, we compute predicted movie ratings forthe group using several different heuristic functions. Wehave implemented five different recommendation meth-ods. Figure 4 depicts the general workflow. In eachmethod, we use an aggregation function to computea predicted movie rating for the group of users basedon the individual predicted or actual movie ratings ofeach group member. The first four methods use eachof the four descriptors (leader, expert, social, and simi-larity) independently. The fifth method, called the all-descriptors recommender, determines a predicted ratingfor a group of users by computing the weighted sum ofthe predicted group ratings output from the first fourrecommender systems.

A weighted sum function is used to compute a pre-dicted group rating in the leader and expert based rec-ommendation methods:

rG,m = k∑u∈G

desc(u) · ru,m (5)

where rG,m is the value of the predicted rating for groupG and movie m, user u is a member of G, desc(u) is

9

the descriptor value for u, and ru,m is u’s predictedrating for v. The multiplier k serves as a normalizingfactor and is defined as k = 1/

∑u∈G desc(u). Note that

this function is similar to the aggregation functions usedin many heuristic-based (memory-based) collaborativefiltering systems [6].

Describers that measure the relative strength of rela-tions between users need a way to emphasize the ratingsof two users in some way that would affect the expectedgroup rating. If the relationship is very strong, theshared rating of the users would be pushed away fromthe mean (toward the extremes of the rating range) andif the relationship is weak the shared rating would bepushed toward the mean. This is accomplished by usinga specialized sigmoid function which closely resemblesa Gompertz curve centered around the mean:

f(x) =S

1 + e−wt(6)

where S is the magnitude of the rating scale, w is anappropriate weight shaping the sigmoid function and

t =( r1r2

r2mean

) s12smean (7)

The variable t allowed the integration of user ratings(r1, r2) and the users’ relationship strength s12 in sucha way that when user ratings are greater than the meanrating rmean then the base of the exponent is greaterthan 1.0 and conversely when the ratings are less thanrmean the base is less than 1.0. Since this base is raisedto the users’ relationship strength s12 over the meanrelationship strength smean then the overall value of t isfurther from 1.0 when the users’ relationship strength isgreater than average and is nearer to 1.0 when the users’relationship strength is less than average. Thereforethe mean of many values of the sigmoid function willbe affected more by the fused ratings of users with astrong relationship than the fused ratings of users’ withweak relationships.

The similarity and social recommender systems bothuse the sigmoid function described above to compute apredicted movie rating for a group:

rG,v =

∑u1,u2∈G sigmoid(desc(u1, u2), r1, r2)

|Pairs(G)|(8)

where sigmoid(desc(u1, u2), r1, r2), desc(u1, u2) is thevalue of the similarity or social descriptor between usersu1 and u2, and |Pairs(G)| is the number of pairs ofusers in G with similarity/social descriptor values.

The all-descriptors recommender determines a pre-dicted movie rating for a group by computing the weightedsum of the predicted movie ratings generated from thefour single-descriptor recommender systems:

rG,v = a · rG,v,expert + b · rG,v,leader+c · rG,v,similarity + d · rG,v,social

(9)

where rG,v,descriptorName is the group movie rating pre-dicted by each single-descriptor recommender system.We determine the optimal weights a, b, c, d for a groupby performing a brute-force iterative search to find theweights which produce the minimum RMSE value.

7. EVALUATIONIn this section, we conduct a set of evaluations to val-

idate the design of the SocialFusion system and demon-strate its effectiveness in terms of capturing diverse groupcharacteristics and facilitating context-aware and group-based recommendation. Specifically, we start by de-scribing the implementation of the mobile system andan evaluation of the K-anonymity algorithm. We thenpresent a detailed analysis of various group descriptorsusing the Facebook and Netflix data sets. We continuewith a real user group study and some key observa-tions. Finally, we present the evaluation results of ourcontent-based and collaborative filtering based grouprecommendation techniques.

7.1 Mobile Client ImplementationWe have designed and implemented a mobile client

that allows users to anonymously share their presence [9]via Bluetooth. This mobile system is composed of amobile component (MC) that resides on a mobile de-vice and a stationary component (SC) that resides ona desktop or laptop PC. The MC, implemented in JavaME, allows a user to log in to his/her Facebook ac-count and initiate anonymous sharing of his/her Face-book ID. The SC, implemented in Java SE, detects auser’s shared Facebook ID and uses it to submit queriesto SocialFusion, such as retrieving the user’s movie pref-erences or accessing the user’s social networks or Face-book wallposts. The MC has been tested on Nokia N95smartphones. We are currently working on a new ver-sion of the MC that will run on Nokia N97 smartphonesand share sensor information from the accelerometer,GPS, magnetometer (compass), and camera with So-cialFusion.

7.2 K-AnonymityWe have implemented a prototype of SocialFusion’s

selective withholding K-anonymity algorithm and per-formed an initial evaluation of its behavior and scal-ability. Selective withholding consists of two compo-nents, namely the Boolean expression builder, and thenthe logic minimizer component that yields the value K.We use an open source Quine-McCluskey implementa-tion [5] to perform logic minimization over the Booleanexpression generated from the directed graph. We ranthe tests on a Macbook using a university Internet con-nection.

Our metrics were gathered using the Facebook ac-count of one of the participants, here called “user A”,

10

who has 222 Facebook friends and seven favorite movieslisted on his Facebook profile. A study has shown thatthe mean number of Facebook friends reported by thestudy participants was between 150 and 200 [17] . There-fore, we suppose that user A is a reasonable represen-tation of a typical or average Facebook user.

We conducted our evaluation of K-anonymity perfor-mance by having SocialFusion submit a query to Face-book requesting the list of favorite movies in the Face-book profile for user A. SocialFusion also gathers the fa-vorite movies lists for each of user A’s Facebook friends.SocialFusion then proceeds to construct a Boolean ex-pression representing the relationship between user A’sfriends and their favorite movies.

Figure 5 shows how the time required to minimizethe unsimplified Boolean expression in the SocialFu-sion logic minimizer component varies with the numberof terms in the unsimplified Boolean expression. Wesee from this plot that the logic minimizer componentscales reasonably well for unsimplified Boolean expres-sions containing up to about 450 terms. The maximumnumber of terms in the unsimplified Boolean expressionthat we saw from this data set of Facebook users andfavorite movies was 450 terms. We expect that 450-term Boolean expressions will account for many typicalusage scenarios, although this will vary based on thenumber of the user’s favorite movies that match withfriends’ favorite movies. There is a nonlinear relation-ship between the number of terms in the unsimplifiedBoolean expression and the number of terms in the sim-plified expression. Based on the results of our tests, wehave found that unsimplified Boolean expressions con-taining around 450 terms have up to about 20 termswhen simplified by the SocialFusion’s logic minimizer.20 terms in the simplified Boolean expression providesK-anonymity guarantees for k = 20. Thus, we haveshown that SocialFusion’s selective withholding is fea-sible for K-anonymity guarantees up to k = 20, whichincludes user groups as large as most social networkfriend lists (consisting of 200–300 friends).

Table 2 shows the run times of each component ofSocialFusion for the following conditions for user A:number of friends = 222, number of movie matches =13, number of friend matches = 7, number of termsin unsimplified expression = 339, number of terms insimplified expression = 7. The SocialFusion total runtime in table 2 is the time for our system to returnK-anonymous favorite movie preferences for a user. Inour tests, the run time of the social network data gath-erer component dominates the run time of SocialFusion.We expect that running SocialFusion with local accessto a user database would significantly reduce the runtime of this component. However, the current averagetotal SocialFusion run time of 1377 ms for our testsprovides acceptable performance for applications such

1

10

100

1000

0 100 200 300 400 500

Minim

iza'

on 'me (m

s)

Unsimplified number of terms

Figure 5: Time to minimize the unsimplifiedBoolean expression vs. number of terms in theunsimplified Boolean expression

Table 2: Run times of SocialFusion ComponentsComponent Mean Std-Dev (σ)data gatherer 852 msec 145 msecexpression builder 11 msec 1.7 mseclogic minimizer 297 msec 8.29 msectotal 1377 msec 134.7 msec

as context-aware video playback.

7.3 Group DescriptorsTo evaluate the different types of group descriptors,

we analyze a publicly available Facebook data set com-posed of over 60,000 Facebook users [32]. We selectedseveral groups by randomly choosing a starting usernode in the social graph constructed from this data setand then performing a breadth-first traversal to findconnected nodes. Figures 6 and Figure 7 show socialgraphs for two groups composed of five and ten users,respectively.

Summary group descriptor values of the two groupsare shown in Figure 8. The Expert summary value iscomputed as

∑u∈G |Mu|/|G|, where |Mu| is the num-

ber of favorite movies user u has. The Leader summaryvalue is computed as max(|Nu|)

(P

u∈G |Nu|)/|G| , where |Nu| is thenumber of friends u has in group G. The Similarity andSocial summary values are computed as

Pu1,u2∈G desc(u1,u2)

|Pairs(G)| ,where desc(u1, u2) is the value of the Similarity or Socialdescriptor between users u1 and u2 in G, and |Pairs(G)|is the number of user pairs in G with Similarity or Socialdescriptor values. From Figure 8, we can make the fol-lowing observations between the 5-user group and the10-user group: (1) the 5-user group has more exper-tise; (2) the 10-user group is more leader-based; (3) the5-user group is more socially cohesive; and (4) the 10-user group has some users with similar movie interests.These results demonstrate the heterogeneity of groupsand the importance of capturing group characteristicsfor better context-aware and group-based actions.

11

Figure 6: A social group com-posed of five Facebook users.

Figure 7: A social group com-posed of ten Facebook users.

Group Group Summarysize descriptor value

5-user Expert 8.205-user Leader 2.005-user Similarity 05-user Social 1.2010-user Expert 1.9010-user Leader 2.3710-user Similarity 0.017710-user Social 1.0

Figure 8: Group descriptor sum-mary values of social groups.

7.4 Group Study of Real UsersTo understand how a group of people reaches a de-

cision and how the group decision may be affected byvarious group characteristics, we have conducted a realuser group study consisting of 12 participants. Thereare 9 males and 3 females in the group, aged between18 and 30, and with different ethnic backgrounds. Ourstudy was conducted in two stages.

In the first stage, the participants were asked to rate100 movies on a scale of 1–5 or indicate that they hadnot seen a movie. On average, each participant has seenand rated 68.6% of the 100 movies, and each movie hasbeen rated by at least 70.0% of the participants. Eachparticipant was also asked to specify how often he/sheinteracts with other participants in the group. Eachparticipant knows some members in the group but notall of them, thus forming a connected social graph withmultiple hops between certain members. In the secondstage, all participants gathered in the same room andrated 20 movies. Each movie is rated as follows: (1)show the movie trailer; (2) participants write down theirown rating of the movie; (3) participants discuss themovie and vote on a single group rating for that movie;and (4) participants write down how they feel about thegroup decision and why they feel that way.

We make several important observations based on ourreal-user group study. First, certain individuals inter-acted much more than other members of the group, andtheir opinions seemed to influence other members in thegroup – a pattern captured by our Leader Describer.Second, certain members had not seen the movies andseek the opinion of others who have seen the movie ormany other movies – such expert-based effect is cap-tured by our Expert Describer. Third, when asked howthey feel about the group decisions, some participantspointed out that other members had similar tastes totheir own and that their opinion was highly affectedby these similar participants – this is captured by ourSimilarity Describer. It was also observed that certainmembers of the group knew each other very well and

Figure 9: Screenshot of a SocialFusion’s videoapplication playing a recommended film.

tended to talk directly to each other, forming a sharedopinion in the group – such social relationship basedeffect is captured by our Social Describer.

7.5 Group-Based Movie RecommendationNext, we evaluate the performance of group-based

movie recommendation, using either content-based orcollaborative filtering based group recommendation meth-ods. To evaluate the content-based methods, we use theindividuals’ movie ratings obtained in the first stage ofthe real user group study as the training data, and threesupervised learning algorithms (ID3, NaiveBayes, andSVM). We consider two group-based prediction meth-ods: maximizing-satisfaction (i.e., maximizing mean valueof predicted ratings) and minimizing-misery (i.e., maxi-mizing the minimum predicted rating). To evaluate thecollaborative filtering based methods, we use individualusers’ movie ratings to predict their group ratings, uti-lizing one or all the four group descriptors: leader, ex-pert, similarity, and social relationship. Figure 9 showsa movie trailer recommended to a SocialFusion user.

Content-based group recommendation.Table 3 shows the precision and recall of different

content-based methods. SVM achieves better perfor-

12

0

0.2

0.4

0.6

0.8

1

2 4 6 8 10 12 14 16 18 20Nor

mal

ized

mov

ie r

atin

g

Movie ID

Group preferenceMaximizing satisfaction

Minimizing misery

Figure 10: Comparison of actual group ratings, maximizing-satisfaction, and minimizing-misery.

0

20

40

60

80

100

20 40 60 80 100

Per

form

ance

(%

)

Number of movies

PrecisionRecall

Figure 11: Performance vs. number ofmovies in the training set.

Table 3: Performance of content-based grouprecommendation methods

ID3 NaiveBayes SVMmaximizing precision 50.0% 25.0% 65.6%satisfaction recall 30.0% 50.0% 60.0%minimizing precision 50.0% 50.0% 76.3%

misery recall 50.0% 10.0% 55.0%

mance than ID3 and NaiveBayes. More importantly,minimizing misery achieves better group recommenda-tion than maximizing satisfaction. This is also demon-strated in Figure 10. The Pearson correlation coeffi-cient between maximizing-satisfaction and actual grouprating is only 0.5927, while minimizing-misery achieves0.9485 correlation with actual group rating. This canbe explained by the relatively sparse group structure(social descriptor value of 118.6 vs. maximum score of365.0), i.e., the members in this group are less cohesivethan average (half of the max or 180.0). In this case,the users are more like strangers and minimizing-miserytends to be the dominating effect. We further investi-gate whether fewer number of movies in the training setdegrades the recommendation performance. As shownin Figure 11, when the number of training movies de-creases from 100 to 20, there is no significant perfor-mance degradation in terms of precision and recall.

Collaborative filtering based group recommendation.In the real user group study, we obtained both movie

ratings of individual users and group movie ratings. Toevaluate the collaborative filtering based group recom-mendation methods we have developed, we take themovie ratings of individual users and predict the grouprating using different group descriptors, then comparethe predicted group rating to the actual group rating.The metric we use is root mean squared error (RMSE):

RMSE =

√∑m∈M (rG,m − r′G,m)2

|M |(10)

which measures the difference between group G’s actualrating rG,m and predicted rating r′G,m for each movie min a set of movies M . Table 4 shows the RMSE values ofthe baseline (mean-based) approach and all five of ourcollaborative filtering methods. Of the single-descriptorapproaches, the similarity recommender has the best

Table 4: Performance of collaborative filteringbased group recommendation methods

Recommender RMSE Improvementover baseline

mean-based (baseline) 0.7658 –social 0.9304 -21.50%leader 0.8267 -7.95%expert 0.7554 1.36%similarity 0.7199 5.99%all-descriptors 0.7186 6.16%

accuracy. The all-descriptors approach achieves evenbetter accuracy, with the optimal weights a = 0.0168,b = 0, c = 0.92, and d = 0.0632. These weights showthat for this group, user similarity (c) has the mostimpact on the group’s movie ratings, while social rela-tionship (d) has a small but noticeable impact. Thisconfirms our observation that content similarity wasdominant in our user study group, which consisted ofloosely-connected individuals.

The 6.16% RMSE improvement over the baseline wouldbe significant in the Netflix Prize context 2, though ourproblem setting differs in its scale and group focus.

8. FUTURE WORKWe plan to integrate more components into Social-

Fusion, including input from fixed sensors, a hybridrecommendation engine, and the anonymization layerwith its privacy and security components. We alsoplan to explore other context-aware multimedia appli-cations beyond our case study. We intend to conductfurther user studies to evaluate SocialFusion’s usabil-ity, the impact of its recommendations on users’ deci-sions, and how privacy affects users’ attitudes towardsSocialFusion. Scalability and performance are other ar-eas that need more investigation. We further plan toincorporate an API so that other context-aware appli-cations can easily build upon our framework. We alsowill release SocialFusion as open source for the researchcommunity, so that other researchers can add their ownmodules for inference, recommendation, and context-aware experimentation.

2http://www.netflixprize.com/community/viewtopic.php?id=828.

13

9. CONCLUSIONSThis paper has presented SocialFusion, a new frame-

work for fusing together mobile, social and sensor net-works in order to support the next generation of increas-ingly sophisticated context-aware applications. Social-Fusion consists of 3 stages: first, a data gathering andmanagement stage, including a novel K-anonymizationalgorithm; next, an inference stage that fuses togetherthe diverse data streams using describer modules to ex-tract contextual clues called descriptors; finally, a rec-ommendation stage that leverages the rich assembleddata and descriptors to recommend a context-aware ac-tion. A case study of a group-based context-aware videoapplication illustrated how SocialFusion can improvecontext awareness, especially for groups of people.

10. REFERENCES[1] Brightkite. http://brightkite.com.[2] Loopt. http://www.loopt.com.[3] Mobile apps help people track h1n1, prevent flu, nov 25,

2009, usatoday. http://www.usatoday.com/news/health/2009-11-25-new_fluapps23_ST_N.htm.

[4] Netflix. http://www.netflix.com.[5] Quine-mccluskey algorithm (java).

http://en.literateprograms.org/Quine-McCluskey_algorithm_(Java).

[6] G. Adomavicius and A. Tuzhilin. Toward the nextgeneration of recommender systems: A survey of thestate-of-the-art and possible extensions. IEEE Transactionson Knowledge and Data Engineering, 17(6):734–749, 2005.

[7] M. Balabanovic and Y. Shoham. Fab: Content-based,collaborative recommendation. Communications of theACM, 40(3), 1997.

[8] A. Beach, M. Gartrell, S. Akkala, J. Elston, J. Kelley,K. Nishimoto, B. Ray, S. Razgulin, K. Sundaresan,B. Surendar, M. Terada, and R. Han. Whozthat? evolvingan ecosystem for context-aware mobile social networks.IEEE Network, 22(4):50–55, July-August 2008.

[9] A. Beach, M. Gartrell, and R. Han. Solutions to securityand privacy issues in mobile social networking. InWorkshop on Social Mobile Web (SMW09), associatedwith SocialCom09, 2009.

[10] D. Billsus, Pazzani, and J. Chen. A learning agent forwireless news access. In The International Conference onIntelligent User Interfaces, 2002.

[11] R. K. Brayton, A. L. Sangiovanni-Vincentelli, C. T.McMullen, and G. D. Hachtel. Logic MinimizationAlgorithms for VLSI Synthesis. Kluwer AcademicPublishers, Norwell, MA, USA, 1984.

[12] R. Burke. Hybrid recommender systems: Survey andexperiments. User Modeling and User-Adapted Interaction,12(4), 2002.

[13] A. Campan and T. Truta. A clustering approach for dataand structural anonymity in social networks. In PinKDD2008. ACM, August 2008.

[14] L. P. Cox, A. Dalton, and V. Marupadi. Smokescreen:flexible privacy controls for presence-sharing. In MobiSys’07: Proceedings of the 5th international conference onMobile systems, applications and services, pages 233–245,New York, NY, USA, 2007. ACM.

[15] N. Eagle and A. Pentland. Social serendipity: Mobilizingsocial software. IEEE Pervasive Computing, 4(2),April-June 2005.

[16] W. K. Edwards and R. Grinter. At home with ubiquitouscomputing: Seven challenges. In Proceedings of the 3rdInternational Conference on Ubiquitous Computing(Ubicomp 2001), pages 256–272, May 2001.

[17] N. Ellison, C. Steinfield, and C. Lampe. The benefits offacebook “friends:” social capital and college students’ useof online social network sites. Journal ofComputer-Mediated Communication, 12(4), 2007.http://jcmc.indiana.edu/vol12/issue4/ellison.html.

[18] I. Garcia, L. Sebastia, E. Onaindia, and C. Guzman. Arecommendation system for tourist activities. LNCS: E.Commerce and Web Technologies, 5692, 2009.

[19] C. M. Gartrell. Socialaware: Context-aware multimediapresentation via mobile social networks. Master’s thesis,University of Colorado at Boulder, December 2008.

[20] A. Jameson. More than the sum of its members: challengesfor group recommender systems. In Proc. of the workingconf. on Advanced visual interfaces (AVI 2004), pages48–54, Gallipoli, Italy, May 2004.

[21] H. Jeung, M. L. Yiu, X. Zhou, C. S. Jensen, and H. T.Shen. Discovery of convoys in trajectory databases. Proc.VLDB Endow., 1(1):1068–1080, 2008.

[22] T. Kindberg and J. Barton. A web-based nomadiccomputing system. Internet, August 2000.

[23] J. Konstan, B. Miller, D. Maltz, J. Herlocker, L. Gordon,and J. Riedl. Grouplens: Applying collaborative filtering tousenet news. Communications of the ACM, 40(3), 1997.

[24] P. X. Liu, M. Meng, and C. Hu. On-line data-driven fuzzyclustering with applications to real-time robotic tracking.In 2004 IEEE International Conference on Robotics andAutomation, April 2004.

[25] S. Loeb. Architecting personal delivery of multimediainformation. Communications of the ACM, 35(12), 1992.

[26] J. Manweiler, R. Scudellari, Z. Cancio, and L. P. Cox. Wesaw each other on the subway: secure, anonymousproximity-based missed connections. In HotMobile ’09:Proceedings of the 10th workshop on Mobile ComputingSystems and Applications, pages 1–6, New York, NY, USA,2009. ACM.

[27] G. Miklau, D. Jensen, P. Weis, and S. Srivastava.Anonymizing social networks. Technical Report TechnicalReport 07-19, Computer Science Department, University ofMassachusetts Amherst, March 2007.

[28] E. Miluzzo, N. D. Lane, S. B. Eisenman, and A. T.Campbell. Cenceme - injecting sensing presence into socialnetworking applications. In Proc.of the 2nd European Conf.on Smart Sensing and Context (EuroSSC 2007), Oct. 2007.

[29] E. Miluzzo, N. D. Lane, K. Fodor, R. Peterson, H. Lu,M. Musolesi, S. B. Eisenman, X. Zheng, and A. T.Campbell. Sensing meets mobile social networks: thedesign, implementation and evaluation of the cencemeapplication. In Proc. of the 6th ACM Conf. on EmbeddedNetwork Sensor Systems (SenSys 2008). ACM, Nov. 2008.

[30] M. O’Connor, D. Cosley, J. Konstan, and J. Riedl.Polylens:a recommender system for groups of users. In TheEuropean Conference on Computer-Supported CooperativeWork, 2001.

[31] B. Thompson and D. Yao. The union-split algorithm andcluster-based anonymization of social networks. InProceedings of the 4th International Symposium onInformation, Computer, and Communications Security(ASIACCS 2009), pages 218–227. ACM, March 2009.

[32] B. Viswanath, A. Mislove, M. Cha, and K. P. Gummadi.On the evolution of user interaction in facebook. InProceedings of the 2nd ACM SIGCOMM Workshop onSocial Networks (WOSN’09), August 2009.

[33] Q. Wei and Y. Lu. Preservation of privacy in publishingsocial network data. In Proceedings of the 2008International Symposium on Electronic Commerce andSecurity (ISECS 2008), pages 421–425. IEEE ComputerSociety, March 2008.

[34] M. Wirz, D. Roggen, and G. Troster. Decentralizeddetection of group formations from wearable accelerationsensors. In Proceedings of the 2009 IEEE InternationalConference on Social Computing, page IEEE Press, 2009.

14

socialfusion: context-aware inference and recommendation...

Documents