

    Sensing and Modeling Human Networks

    by

    Tanzeem Khalid Choudhury

    Thesis Proposal

    for the degree of Doctor of Philosophy at the

    Massachusetts Institute of Technology

    April 2003

    Thesis Advisor _________________________________________________________________

    Prof. Alex Pentland

    Dept. of Media Arts and Sciences

    Massachusetts Institute of Technology

    Thesis Reader _________________________________________________________________

    Prof. Thomas J. Allen

    Sloan School of Management

    Massachusetts Institute of Technology

    Thesis Reader _________________________________________________________________

    Prof. Michael Kearns

    Dept. of Computer and Information Science

    University of Pennsylvania


    Abstract

    In this thesis, I describe a plan to use wearable sensors and machine learning algorithms for sensing and modeling

    human networks. A human network is defined as the pattern of communication, advice or support which exists

    among the members of a social system.

    I introduce the “sociometer”, a specially designed wearable sensor package, built for measuring face-to-face

    interactions between people. The hypothesis of this thesis is that the pattern of communication within groups of

    people can be captured by wearable sensors worn by the groups for an extended period of time and that this structure

    can be recognized by machine learning methods. Furthermore, we can categorize different types of interactions we

    have with different types of people. First, I will derive methods to extract robust features from the sensor stream that

    will enable us to detect interactions. Then these features will be used to build models of the communication structure

    and the dynamics of group interactions.

    Knowledge of how people interact is important in many disciplines, e.g. organizational behavior, social network

    analysis and knowledge management applications such as expert finding. At present researchers mainly have to rely

    on questionnaires, surveys or diaries in order to obtain data on physical interactions between people. In this thesis, I

    hope to show that computational models of group interactions can be built using sensor measurements from the

    sociometer. The primary contributions of this thesis are a unique data set of sensor data collected from a group of 23

    individuals about their daily interactions, and a data-analysis pipeline for extracting relevant interaction structure

    from this data. The thesis will compare the results with survey data collected from the same group of people while

    they were wearing the sociometers. Evaluation of the computational models derived from sensor data will be done

    using various measures of centrality used in the social sciences. I will also compare and contrast the survey results

    with the results from the sociometer data.


    Table of Contents

1 INTRODUCTION

2 BACKGROUND AND MOTIVATION

3 THE SOCIOMETER: A WEARABLE DEVICE FOR UNDERSTANDING HUMAN NETWORKS

3.1 DESIGNING THE SOCIOMETER

4 DATA ANALYSIS: THE INFLUENCE MODEL

4.1 THE INFLUENCE MATRIX

4.2 LEARNING FOR THE INFLUENCE MODEL

4.2.1 Expectation-Maximization Method

4.2.2 The Constrained Gradient Descent Method

4.2.3 Performance of the Learning Algorithms

5 EXPERIMENTS USING THE SOCIOMETER

5.1 THE DATA COLLECTION METHOD

5.2 INITIAL ANALYSIS OF THE DATA

6 EVALUATING THESIS RESULTS

7 INTELLECTUAL CONTRIBUTIONS

8 TIME-LINE

9 RESOURCES

10 SHORT BIOGRAPHY

11 BIBLIOGRAPHY


"Social scientists, particularly economists, have a fatal attraction for working on theoretical propositions with mathematical models and avoiding the real world."

Richard Cyert (1994)

1 Introduction

In almost any social and work situation our decision-making is influenced by the actions of others around us. Who are the people we talk to? For how long and how often? How actively do we participate in the conversations? Answers to these questions have been used to understand the success and effectiveness of a work group or an organization as a whole. Can we identify the differences between people's interactions? Can we identify the individuals who talk to a large fraction of the group or community members? Such individuals, often referred to as connectors, have an important role in information diffusion [1]. Thus, learning the connection structure and nature of communication among people is important in trying to understand phenomena such as (i) diffusion of information, (ii) group problem solving, (iii) consensus building, and (iv) coalition formation. Although people rely heavily on email, telephone and other virtual means of communication, research shows that high-complexity information is mostly exchanged through face-to-face interaction [2]. Informal networks of collaboration within organizations coexist with the formal structure of the institution and can enhance the productivity of the formal organization [3]. Furthermore, the physical structure of an institution can either hinder or encourage communication. Usually the probability that two people communicate declines rapidly with the distance between their work locations [2, 4]. Being able to measure the relationship between communication networks and different environmental and organizational attributes will enable us to create better workplaces with improved communication and collaboration among their members.

Figure 1: Role of face-to-face interaction in high-complexity information exchange (from Prof. Allen's paper [2])


As wearable computers become more and more integrated into our daily lives, they provide the opportunity to use appropriate sensors to capture information about how groups of people interact over periods of weeks, months and even years. In this thesis, we use wearable sensor packages that are uniquely designed to collect auditory, motion and identification data from the people wearing them. However, just collecting data is not enough; we need a mechanism to extract patterns from the raw sensor data.

The hypothesis of this thesis is that the patterns of communication within a group of people can be captured by wearable sensors worn by the group and that this structure can be recognized by machine learning methods. Many of our interactions over periods of weeks and months are repetitive: I talk to my office mate every day, my advisor once a week, and so on. Hence, if we are able to collect sensor data from people over an extended period of time, it will be an ideal dataset for statistical pattern recognition algorithms to learn those repeatable interaction patterns. My first goal is to extract robust interaction features from the data, which capture information about whom we talk to, when we talk to them, and the duration and nature of our face-to-face interactions. These features can be used to learn the communication network of the group: are some people better connected than others? Do the structure of the organization and the physical layout impose barriers to communication? We believe that these techniques can complement and augment existing manual techniques for data collection and analysis. The results can be used for understanding human communication patterns studied in organizational behavior and social network analysis. Knowledge of people's communication networks can also be used to improve context-aware computing environments and to coordinate collaboration between group and community members. I would then like to build models of interaction that identify the different types of interaction a person has with members of his network.

In summary, I would like to build a complete system that enables a sensor-based approach to measuring interactions between groups of people. I will address issues ranging from user-acceptable sensor selection to sensor packaging to robust feature extraction and dynamic models of group interaction. The results of this thesis will be compared with measures used in social network analysis for understanding group behavior. The goal is to discover how much information about social network relationships can be derived by applying statistical machine learning techniques to data obtained from wearable sensors. I hope to lay the groundwork for being able to automatically study how different groups within social or business institutions connect, understand how information propagates between these groups, and analyze the effects of new policy or new technology on the group structure.

2 Background and Motivation

"We like to think of ourselves as individuals capable of making up our own minds about what we think is important, and how to live our lives. Particularly in the United States, the cult of the individual has gained a large and devoted following, governing both our intuitions and our institutions. Individuals are to be seen as independent entities, their decisions are to be treated as originating from within, and the outcomes they experience are to be considered indicators of their innate qualities and talents. It's a nice story, implying as it does not only the theoretically attractive notion that individuals can be modeled as rationally optimizing agents, but also the morally appealing message that each person is responsible for his or her own actions. However, there is a difference between holding someone accountable for their actions and believing that the explanation for those actions is entirely self-contained. Whether we are aware of it or not, we rarely, if ever, make decisions completely independently and in isolation. Often we are conditioned by our circumstances, our particular life histories and our culture. We also cannot help but be influenced by the mesmerizing pool of universally available, often media-driven information in which we continually swim. In determining the kind of person that we are and the background picture against which our lives play out, these generic influences determine both the expertise and the preferences we bring into any decision-making scenario. But once we are in the scenario, even our experience and predispositions may be insufficient to sway us entirely one way or the other. This is where the externalities – whether informational, coercive, market or coordination externalities – enter to play a crucial role. When push comes to shove, humans are fundamentally social creatures, and to ignore the role of social information in human decision making – to ignore the role of externalities – is to misconstrue the process by which we come to do the things we do." [5]

Social network analysis focuses on uncovering the patterns of people's interactions from data. Many believe that the success or failure of societies and organizations often depends on the patterning of their internal structure. This area of research tries to characterize the relationships among groups of people [6-9]. Based on network data gathered primarily from surveys or self-reports, the goal is not only to learn the topology but also to understand the roles and positions of the members of the network. The emphasis is on analysis techniques that can explain the behavior of the collective, be it an organization, a school or a community [10-13]. It is important to note that many of these techniques make very insightful discoveries by asking simple but relevant questions or queries about the nature of communication of groups.

Recently there have been significant research efforts in the 'science of networks': deriving models that can explain how information propagates, contagious diseases spread, and even terrorist organizations operate. A popular approach is to propose a model and then validate it based on how well the model explains various phenomena observed in the system [14-17]. Some work focuses on the generalizability of the models to different domains, from statistical physics to social networks to the worldwide web. Another important direction of this research is how one can use local/partial information about the network structure to propagate information or search for targets efficiently [18-20].

Statistical approaches to understanding the topology and dynamics of how groups of people exchange ideas or propagate information have been applied to online communities and the worldwide web [21, 22], where data is abundant and measurements of system parameters carry little uncertainty. But using similar techniques to study the informal networks that arise from face-to-face interactions has not been explored much. There are a couple of reasons for this: (i) online data is readily available for analysis, and (ii) it carries much less ambiguity than sensor measurements of face-to-face interactions. It may be trivial to figure out whether A sent an email to B, but quite difficult to figure out whether A spoke to B using sensors such as microphones and cameras. So there are two stages to studying face-to-face communication: (i) extracting reliable measurements from the sensors, and then (ii) using these features to extract meaningful parameters that explain the pattern of interactions.

There is also a long history of work in the social sciences aimed at understanding the interactions between individuals and influencing their behavior. In the psychology community, there are many instances of work studying these effects. At a simple level, in the work of Wells and Petty [23], the authors show how a speaker's confidence could be significantly influenced by repeated head nodding from the audience. At a more complex level, the structural relationships between people and their physical proximity can affect the collaboration pattern and the adoption of new ideas [4, 5]. Studies of this kind give us interesting insights into the workings of human dynamics. In many cases, the experimenters have been able to take quantitative measures of behavior changes by looking at task performance or questionnaire responses. However, there are important aspects of behavior which cannot be captured this way: how much one person is influencing another's behavior, the intensity of the interaction, and so on. Thus far, experimenters have relied on qualitative measures based on hand annotations ("Bill talked more to Susan during this session" or "Employee talks to W, Y and Z daily and never to S") or anecdotal incidents.

In the field of ubiquitous and pervasive computing, researchers have been working for years on sensor-based approaches to modeling humans and dealing with sensor uncertainty and noise. There has been work in trying to understand the identity and activity of people in meeting rooms, offices and even open public spaces [24-29]. The basic notion of context-aware computing is to give computers the ability to understand one's surroundings. However, so far the focus has been on the single-person perspective: understanding what and who is around the user of the system in order to better provide information or resources to that person. One important aspect often ignored is how others fit into a person's interaction network, which can be an important factor in designing a better context-aware agent or environment.

Some research in using sensors to support face-to-face communication comes close to being able to analyze how people tend to interact within a community. The work with active badges focuses mainly on the presence rather than the actions of people; for example, the Olivetti badge system developed by AT&T could identify a person's location in a building but not whether they were talking or with whom [30]. A few initiatives in learning social-network structure using sensors are beginning to emerge [31]; however, so far these systems only use proximity information to make inferences. In the interest of studying the interactions between humans and the influences of various experimental variables, we worked on an experimental setup we called the "Facilitator Room." Our goal was to use computer vision, speech processing, and machine learning tools to quantify how much one person is influencing another's behavior [32]. The main disadvantage of the Facilitator Room was the difficulty of obtaining natural interactions. If the goal is to map, explain and eventually influence human communication and interactions, the ability to get sensor measurements from people in their regular setting, without imposing any special constraints or requirements, is critical. In order to overcome the constraints imposed by the Facilitator Room, we built the 'sociometer' to measure interactions in groups without changing their daily patterns of interaction.


3 The Sociometer: A wearable device for understanding human networks

As far as we know, there are currently no sensor-based, data-driven methods for modeling face-to-face interactions within a community. This absence is probably due to the difficulty of obtaining reliable measurements from real-world interactions: one has to overcome the uncertainty in sensor measurements. This is in contrast to modeling virtual communities, where we can get unambiguous measurements about how people interact, such as the duration and frequency of exchanges (available from chat and email logs) and sometimes even detailed transcriptions of interactions [33, 34].

In this thesis, we present a pipeline extending from data collection methodologies to machine-learning techniques that infer the structure and relationships that exist in groups of people. Potentially much cheaper and more reliable than human-delivered questionnaires, data-driven approaches can also scale more easily to larger groups. Automatic discovery of the high-level group structures within an organization can also provide a sound basis for then exploring more fine-grained group interactions using questionnaires or interviews. In the following subsections we describe how we use wearable sensors to measure and build models of interactions.

    3.1 Designing the Sociometer

The first step towards reliably measuring communication is to have sensors that can capture interaction features. For example, we need to know who is talking to whom, and the frequency and duration of communication. To record the identities of the people in an interaction, we equip each person with an infra-red (IR) transceiver that sends out a unique ID for the person and receives the IDs of other people in her proximity. We use microphones to detect conversations.

The sociometer was designed in collaboration with Brian Clarkson [35] for the comfort of the wearer, aesthetics, and a placement of sensors that is optimal for getting reliable measurements of interactions. It is an adaptation of the hoarder board, a wearable data acquisition board designed by the electronic publishing and wearable computing groups at the Media Lab [36, 37]. The sociometer has an IR transceiver, a microphone, two accelerometers, on-board storage, and a power supply. The device stores its data locally on a 256MB compact flash card and is powered by four AAA batteries, which are enough to run it for 24 hours. Everything is packaged into a shoulder mount so that it can be worn all day without any discomfort.

The sociometer stores the following information for each individual:

(i) Information about people nearby (IR transceiver, sampled at 17Hz)

(ii) Speech information (microphone, 8KHz)

(iii) Motion information (accelerometer, 50Hz)

    Other sensors (e.g. light sensors, GPS etc.) can also be added in the future using the extension board.
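As a back-of-the-envelope check on these rates, the sketch below tallies per-sensor sample counts for an hour of recording; the `SociometerConfig` class is purely illustrative and not part of the sociometer itself.

```python
# Illustrative sketch of the sociometer's sampling rates (from the text);
# the class and method names here are hypothetical, for bookkeeping only.
from dataclasses import dataclass

@dataclass
class SociometerConfig:
    ir_rate_hz: float = 17.0       # IR transceiver: IDs of people nearby
    audio_rate_hz: float = 8000.0  # microphone: speech information
    accel_rate_hz: float = 50.0    # accelerometers: motion information

    def samples_per_hour(self) -> dict:
        """Rough per-sensor sample counts for one hour of recording."""
        return {
            "ir": int(self.ir_rate_hz * 3600),
            "audio": int(self.audio_rate_hz * 3600),
            "accel": int(self.accel_rate_hz * 3600),
        }

counts = SociometerConfig().samples_per_hour()
print(counts)  # audio dominates storage: 28.8 million samples per hour at 8KHz
```

Such a tally makes clear why on-board storage (the 256MB flash card) is sized primarily by the audio stream.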


    Figure 2 - The wearable sensor board

    Figure 3 - The shoulder mounted sociometer

The success of IR detection depends on the line of sight between the transmitter-receiver pair. The sociometer has four low-powered IR transmitters. Low-powered IR is the right choice because (i) we only detect people in close proximity, as opposed to far apart in a room (as with high-powered IR), and (ii) we detect people who are facing us, not people all around us (as with an RF transmitter). The IR transmitters in the sociometer create a cone-shaped region in front of the user where other sociometers can pick up the signal. The range of detection is approximately six feet, which is adequate for picking up face-to-face communication. The design and mounting of the sociometer places the microphone six inches below the wearer's mouth, which enables us to get good audio without a headset. The shoulder mounting also prevents the clothing and movement noise that one often gets from clip-on mics.

4 Data Analysis: The Influence Model

The step after data collection is building a computational model that can use the data to infer the dynamics of the individuals and their interconnections. The learnability and interpretability of a model greatly depend on its parameterization. The "Influence Model," developed as a generative model by Chalee Asavathiratham in his PhD dissertation [38], is a possible means of representing the influences a number of Markov chains have on each other. The model describes the connections between many Markov chains with a simple parameterization in terms of the "influence" each chain has on the others. We later found a similar model proposed by Saul and Jordan [39].


Asavathiratham showed how complex phenomena involving interactions between large numbers of chains, such as the up/down times of power stations across the US power grid, could be simulated with this simplified model. The Influence Model is a tractable framework for understanding the recurrent classes of the global system and its steady-state behavior by doing an eigenstructure analysis of the "influence matrix". This representation makes the analysis of global behavior possible, which otherwise would become intractable with an increasing number of individuals or agents.

This model seems very appropriate for the effects we are interested in. In Asavathiratham's description all states were observed, and he did not develop a mechanism for learning the parameters of the model; he assumed that they were known a priori. Learning the model parameters from observations is an important requirement in our case. We generalize the model with hidden states/observations and develop an algorithm for learning the parameters of this model from data. We then show the performance of this algorithm and describe the characteristics of the solution space using synthetic data.

The graphical model for the influence model is identical to that of the generalized N-chain coupled HMM, but there is one very important simplification. Instead of keeping the entire table $P(S_t^i \mid S_{t-1}^1, \ldots, S_{t-1}^N)$, we keep only the pairwise conditionals $P(S_t^i \mid S_{t-1}^j)$ and approximate the former with:

$$P(S_t^i \mid S_{t-1}^1, \ldots, S_{t-1}^N) = \sum_j \alpha_{ij}\, P(S_t^i \mid S_{t-1}^j)$$
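As a concrete sketch, the convex-combination approximation can be computed directly; the toy influence weights and transition tables below are made up purely for illustration.

```python
import numpy as np

def next_state_dist(i, prev_states, alpha, tables):
    """P(S_t^i | S_{t-1}^1..S_{t-1}^N) under the influence approximation:
    a convex combination, weighted by alpha[i, j], of the pairwise
    conditionals P(S_t^i | S_{t-1}^j).

    alpha:  (N, N) influence weights; each row sums to 1
    tables: tables[i][j] is a (Q, Q) matrix whose row l is P(S_t^i | S_{t-1}^j = l)
    """
    N = alpha.shape[0]
    return sum(alpha[i, j] * tables[i][j][prev_states[j]] for j in range(N))

# Toy example: N = 2 chains, Q = 2 states (all numbers illustrative).
rng = np.random.default_rng(0)
tables = [[rng.dirichlet(np.ones(2), size=2) for _ in range(2)] for _ in range(2)]
alpha = np.array([[0.7, 0.3],
                  [0.4, 0.6]])
p = next_state_dist(0, prev_states=[1, 0], alpha=alpha, tables=tables)
print(p)  # a valid distribution over chain 0's next state (sums to 1)
```

Because each row of alpha sums to one and each pairwise conditional is itself a distribution, the combination is automatically a valid distribution.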

In other words, we form our probability for the next state by taking a convex combination of the pairwise conditional probabilities for our next state given our previous state and the neighbors' previous states. As a result, we only have N Q×Q tables and N α parameters per chain, resulting in a total of NQ² + N² transition parameters, far fewer parameters than any of the above models. The real question, of course, is whether we have retained enough modeling power to determine the interactions between the participants. Asavathiratham refers to the α's as "influences," because they are constant factors that tell us how much the state transitions of a given chain depend on a given neighbor. It is important to realize the ramifications of these factors being constant: intuitively, it means that how much we are influenced by a neighbor is constant, but how we are influenced by it depends on its state. This simplification seems reasonable for the domain of human interactions and potentially for many other domains. Furthermore, it gives us a small set of interpretable parameters, the α values, which summarize the interactions between the chains. By estimating these parameters, we can gain an understanding of how much the chains influence each other.
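A quick sanity check on the parameter savings: the sketch below compares the naive table size of a fully coupled model (each chain conditioning on all N previous states) against the NQ² + N² total stated above, for a group the size of the one studied in this thesis. The helper functions are illustrative; the fully coupled count is just the raw table size, not a claim about any particular coupled-HMM variant.

```python
def full_coupled_params(N, Q):
    # Each chain conditions on all N previous states:
    # N tables, each of size Q^N x Q.
    return N * (Q ** N) * Q

def influence_params(N, Q):
    # Pairwise Q x Q tables plus the alpha influence weights:
    # NQ^2 + N^2, as stated in the text.
    return N * Q * Q + N * N

# N = 23 people (the group in this study), Q = 2 states per chain.
print(full_coupled_params(23, 2))  # grows exponentially in N
print(influence_params(23, 2))     # 23*4 + 529 = 621
```

Even for a modest 23-person group with binary states, the exponential table is hopeless to estimate from data, while the influence parameterization stays in the hundreds.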


    Figure 4 Graphs for (a) a generalized coupled HMM, (b) an Influence Model with hidden states, (c) an Influence

    Model with observed states.

4.1 The Influence Matrix

There is clearly a computational advantage of using the influence model framework in terms of the relatively small number of parameters that must be learned. Another benefit of this representation comes during the analysis of the global dynamics of the system. It has been shown in [38] that we can make statements about the recurrent states and the steady-state probabilities of the global system by analyzing the structure of the influence matrix.

The influence matrix H is defined as the Kronecker product [40] of the network influence matrix A and the local state transition matrix P:

$$H = A \otimes P$$
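A minimal numpy sketch of forming H, with toy A and P (in practice these matrices come from the learned parameters):

```python
import numpy as np

# Toy 2-chain network influence matrix A and 2-state local transition
# matrix P; both are row-stochastic, as in the influence model.
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

H = np.kron(A, P)  # the (N*Q) x (N*Q) influence matrix H = A kron P
print(H.shape)     # (4, 4)

# Eigenstructure analysis of H, as in [38], characterizes the recurrent
# states and steady-state behavior of the global system.
eigvals = np.linalg.eigvals(H)
print(np.max(np.abs(eigvals)))  # spectral radius is 1 for stochastic A and P
```

The Kronecker structure is what keeps this analysis tractable: the global matrix never has to be built from the exponentially large joint-state transition table.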

A state j of a Markov chain is recurrent if $P_j(T_j < \infty) = 1$, i.e. if the chain, when started in state j, returns to j with probability one.


Analysis of the influence matrix helps answer questions such as: what network structure aids the spread of information among the group? If we want consensus among the nodes, what kind of network graph will help achieve that, i.e. can we identify and modify the recurrent states of the network?

Our framework allows us to take a data-driven approach to modeling social dynamics. In the following section we describe how we learn individuals' states from observation data of their communication and use those to learn the network influence matrix. By doing so we will have learned all the parameters necessary to define the Influence Model. We will then be ready to understand the global properties of the system and how we may go about changing these properties.

    4.2 Learning for the Influence Model

The problem of estimating the Influence Model from data can be stated as follows. We are given sequences of observations, $\{x_t^i\}$, from each chain i. The goal is to estimate the amount of influence, $\alpha_{ij}$, that chain j has on chain i, along with the pairwise conditional probability distributions that describe this inter-chain influence, $P(S_t^i \mid S_{t-1}^j)$. In this section we develop methods for doing this and illustrate them with synthetic data.

    4.2.1 Expectation-Maximization Method

In Figure 4 we show the graphical model for the most general form of the Influence Model, with hidden states and continuous observations. Fitting this model to data requires us to maximize the likelihood of the Influence Model over its free parameters. The likelihood function can be readily written as:

$$P(S, X) = \prod_i P(S_0^i)\, P(x_0^i \mid S_0^i) \prod_i \prod_t P(x_t^i \mid S_t^i) \sum_j \alpha_{ij}\, P(S_t^i \mid S_{t-1}^j)$$

One possibility for estimating the parameters of this model is Expectation-Maximization. The E-step requires us to calculate $P(S \mid X)$, which in most cases amounts to applying the Junction Tree algorithm (exact inference) or other approximate inference algorithms. We will discuss the possibilities for doing inference on this model later. The M-step is specific to this model and requires maximizing the lower bound obtained in the E-step. Examining this expression we can see that the M-step for all the parameters except the $\alpha_{ij}$'s is only trivially different from that of the HMM [41]. However, we can readily write down the update equations for the $\alpha_{ij}$'s by noticing that they are mixture weights for N conditional probability tables, analogous to a mixture of Gaussians. The $\alpha_{ij}$ update equations are obtained by following the derivation of the M-step for a Gaussian mixture (i.e. introduce a hidden state to represent the "active" mixture component and then take an expectation over its sufficient statistics):


$$\alpha_{ij}^{new} = \frac{\sum_t \sum_k \sum_l P(c_t^i = j,\ S_t^i = k,\ S_{t-1}^j = l \mid X)}{\sum_t \sum_k \sum_l P(S_t^i = k,\ S_{t-1}^j = l \mid X)}$$

The event $c_t^i = j$ means that at time t chain i was influenced by chain j, and the event $S_t^i = k$ means that chain i was in state k at time t.

Unfortunately, exact inference in the Influence Model is computationally intractable because of the densely connected hidden variables [42]. Variational methods or other approximate inference techniques may offer tractable alternatives for learning the full model. However, in the next section we take a different approach to tackling the intractability problem.

    4.2.2 The Constrained Gradient Descent Method

Due to the difficulties involved in doing the inference required for the E-step, we decided to simplify the estimation problem by allowing the states S_t^i to be observed for each chain (see Figure 4). In practice, we found we could obtain reasonable state sequences by fitting an HMM to each chain's observations and performing a Viterbi decoding. The chain transition tables can then be easily estimated (by frequency counts) directly from these state sequences. Since our goal is to estimate the inter-chain influences (via the α_ij's), this "clamping" of the observation and chain transition parameters helps combat the overfitting problems of the full model.
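This counting step can be sketched directly. The per-chain HMM fitting and Viterbi decoding are assumed to have already produced integer state sequences; the add-one smoothing is an illustrative choice to avoid zero entries, not a detail of the proposal:

```python
import numpy as np

def count_transition_table(src_states, dst_states, n_states):
    """Estimate P(S_t^dst | S_{t-1}^src) by frequency counts (a sketch).

    src_states, dst_states: Viterbi-decoded state sequences for the
    influencing and influenced chains (pass the same sequence twice for
    a chain's self-transition table).
    """
    counts = np.ones((n_states, n_states))  # add-one smoothing (illustrative)
    for prev, cur in zip(src_states[:-1], dst_states[1:]):
        counts[prev, cur] += 1
    return counts / counts.sum(axis=1, keepdims=True)  # rows sum to 1
```

Each row of the resulting table is a conditional distribution over the destination chain's next state.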

We now have an unusual DBN where the observed nodes are strongly interconnected and the hidden states are not. This presents serious problems for inference, because marginalizing out the observed state nodes causes all the hidden states to become fully connected across all time and all chains. Unless we apply an approximation that can successfully decouple these nodes, a maximization procedure such as EM will not be tractable. However, there is a far simpler way to estimate the α_ij values in our observed scenario. Let us first examine how the likelihood function simplifies for the observed Influence Model:

P(S | {α_ij}) = Π_i P(S_0^i) Π_i Π_t Σ_j α_ij P(S_t^i | S_{t-1}^j)

Converting this expression to log likelihood and removing terms that are not relevant to maximization over α_ij yields:

α_ij* = argmax_{α_ij} Σ_i Σ_t log Σ_j α_ij P(S_t^i | S_{t-1}^j)

    We can further simplify this expression by keeping terms relevant to chain i :

α_ij* = argmax_{α_ij} Σ_t log Σ_j α_ij P(S_t^i | S_{t-1}^j)

This per-chain likelihood is concave in the α_ij's, which can easily be shown as follows. Collect the weights for chain i into a vector and define the corresponding vector of transition probabilities at time t:

α = [α_i1, ..., α_iN]^T,   B_t^i = [P(S_t^i | S_{t-1}^1), ..., P(S_t^i | S_{t-1}^N)]^T

Then the per-chain likelihood becomes f(α) = Σ_t log ⟨α, B_t^i⟩.

This is concave since for any 0 < w ≤ 1 and any α_0, α_1:

f((1-w)α_0 + wα_1) = Σ_t log ⟨(1-w)α_0 + wα_1, B_t^i⟩
                   = Σ_t log [ (1-w)⟨α_0, B_t^i⟩ + w⟨α_1, B_t^i⟩ ]
                   ≥ Σ_t [ (1-w) log ⟨α_0, B_t^i⟩ + w log ⟨α_1, B_t^i⟩ ]   (using Jensen's inequality)
                   = (1-w) f(α_0) + w f(α_1)

Now take the derivative w.r.t. α_ij:

∂f/∂α_ij = Σ_t P(S_t^i | S_{t-1}^j) / ( Σ_k α_ik P(S_t^i | S_{t-1}^k) ) = Σ_t P(S_t^i | S_{t-1}^j) / ⟨α, B_t^i⟩

Re-parameterize to absorb the normality constraint:

β_ij = α_ij (for j < N),   β_iN = 1 − Σ_{j<N} β_ij

β_ij* = argmax_{β_ij} Σ_t log [ Σ_{j<N} β_ij P(S_t^i | S_{t-1}^j) + (1 − Σ_{j<N} β_ij) P(S_t^i | S_{t-1}^N) ]

Now take the derivative w.r.t. β_ij:

∂f/∂β_ij = Σ_t [ P(S_t^i | S_{t-1}^j) − P(S_t^i | S_{t-1}^N) ] / [ Σ_{k<N} β_ik P(S_t^i | S_{t-1}^k) + (1 − Σ_{k<N} β_ik) P(S_t^i | S_{t-1}^N) ]

We need to maintain the following constraints on the β_ij's to fulfill the normality constraints on the α_ij's:

β_ij > 0,   1 − Σ_j β_ij > 0

These constraints can be enforced with the following log barrier functions:

β_ij* = argmax_{β_ij} Σ_t log [ Σ_j β_ij P(S_t^i | S_{t-1}^j) + (1 − Σ_j β_ij) P(S_t^i | S_{t-1}^N) ] + A log(1 − Σ_j β_ij) + B Σ_j log β_ij

We update the derivative to reflect these log barrier penalties:

∂f/∂β_ij = Σ_t [ P(S_t^i | S_{t-1}^j) − P(S_t^i | S_{t-1}^N) ] / [ Σ_k β_ik P(S_t^i | S_{t-1}^k) + (1 − Σ_k β_ik) P(S_t^i | S_{t-1}^N) ] − A / (1 − Σ_k β_ik) + B / β_ij

The gradient and the per-chain likelihood expression above are inexpensive to compute with appropriate rearranging of the conditional probability tables to form the B_t^i vectors. This, along with the facts that the per-chain likelihood is concave and the space of feasible α_ij's is convex, means that this optimization problem is a case of constrained gradient ascent with full 1-D search (see [43]). Furthermore, in all examples, 20 iterations were sufficient to ensure convergence.
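To illustrate the shape of this per-chain optimization, here is a Python sketch. It maximizes the same concave objective f(α) = Σ_t log ⟨α, B_t^i⟩, but uses an EM-style multiplicative update instead of the barrier method described above; the multiplicative form keeps α on the simplex automatically. This is an illustrative alternative, not the implementation used in the proposal:

```python
import numpy as np

def fit_alpha(B, iters=100):
    """Fit the influence weights for one chain (a sketch).

    B has shape (T, N): B[t, j] = P(S_t^i | S_{t-1}^j), read off the
    counted transition tables for the observed state sequences.  Each
    iteration reweights alpha by its average responsibility, a standard
    EM-style update for mixture weights that stays on the simplex.
    """
    T, N = B.shape
    alpha = np.full(N, 1.0 / N)            # uniform initialization
    for _ in range(iters):
        inner = B @ alpha                  # <alpha, B_t> for every t
        resp = alpha * B / inner[:, None]  # per-timestep responsibilities
        alpha = resp.mean(axis=0)          # reweight; rows of resp sum to 1
    return alpha
```

On data where one chain's table explains every transition best, the weight concentrates on that chain, mirroring the lockstep experiment below.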

    4.2.3 Performance of the Learning Algorithms

To evaluate the effectiveness of our learning algorithm we show results on synthetic data. The data is generated by an Influence Model with 3 chains in lockstep: one leader which evolves randomly (i.e., flat transition tables) and 2 followers who meticulously follow the leader (i.e., an influence of 1 from chain 2 and a self-influence of 0). We sampled this model to obtain a training sequence of 50 timesteps for each chain. These state sequences were then used to train another randomly initialized Influence Model. For this learned model, the P(S_t^i | S_{t-1}^j) were estimated by counting and the α_ij's by maximizing the likelihood with gradient ascent as described above. The resulting influence graph is shown along with a typical sample sequence in Figure 5. Note how the "following" behavior is learned exactly by this model: chains 1 and 3 follow chain 2 perfectly.
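The lockstep generator for this test can be sketched in a few lines; the number of states and the random seed are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative fixed seed

def sample_lockstep(T=50, n_states=4):
    """Sample the synthetic lockstep scenario (a sketch): chain 2 (the
    leader) evolves uniformly at random, while the two followers copy
    the leader's previous state exactly, i.e. they are influenced by
    the leader with weight 1 and have self-influence 0.
    """
    leader = rng.integers(0, n_states, size=T)
    followers = np.empty((2, T), dtype=int)
    followers[:, 0] = leader[0]        # arbitrary initial state
    followers[:, 1:] = leader[:-1]     # copy the leader's previous state
    return leader, followers
```

Training on sequences drawn this way is what produces the influence graph of Figure 5.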

[Figure 5 graphics: the generating model's DBN over x_i(t) and s_i(t) at times t and t+1, the sample training sequence for chains 1-3, the learned influence graph over nodes 1, 2, 3, and a sample sequence from the learned model.]

Figure 5 The evaluation pipeline for testing the Influence Model on the lockstep synthetic data: (a) the graph for the generating model at times t and t+1, (b) the training sequence, (c) the learned influences (α's), where the thickness of the lines corresponds to the magnitude of the influence (note that the strong influence of chain 2 on 1 and 3 was correctly learned), and (d) sample paths from the learned model. Note how chains 1 and 3 (the followers) follow chain 2 perfectly.

The alpha matrix captures the strength of chain 2's influence on chains 1 and 3 very well. The learned alpha matrix is:

    0.0020  0.9964  0.0017
    0.2329  0.4529  0.3143
    0.0020  0.9969  0.0011

We also evaluated the Generalized Coupled HMM (i.e., full state transition tables instead of mixtures of pairwise tables) on this data using EM, with the Junction Tree algorithm for inference. Again we sampled from the lockstep model and trained a randomly initialized model. In this case, the learned model performed reasonably well, but was unable to learn the "following" behavior perfectly due to the larger number of parameters it had to estimate (P(S_t^i | S_{t-1}^1, ..., S_{t-1}^N) vs. P(S_t^i | S_{t-1}^j)).

In order to measure the effect of training data size on the quality of the model, we test how well the learned model predicts the next state of the follower chains given the current state of all the chains, for training sets of different sizes. Table 1 shows the prediction results for both the Influence Model and the Generalized Coupled HMM (GCHMM). Clearly, the Influence Model requires far less training data to model the dynamics accurately.
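The prediction rule behind this comparison can be sketched as follows: the next state of a chain is taken to be the argmax of the α-weighted mixture of transition-table rows selected by the chains' current states. The `alpha` and `tables` values in the test are illustrative stand-ins, not the learned parameters:

```python
import numpy as np

def predict_next(alpha, tables, prev_states):
    """Predict chain i's next state under the Influence Model (a sketch).

    alpha[j]       : influence weight of chain j on chain i
    tables[j]      : counted table P(S_t^i | S_{t-1}^j), rows indexed by
                     chain j's state
    prev_states[j] : chain j's current state

    The prediction is the argmax of the alpha-weighted mixture of the
    selected table rows.
    """
    mix = sum(a * tables[j][prev_states[j]] for j, a in enumerate(alpha))
    return int(np.argmax(mix))
```

Prediction accuracy is then simply the fraction of timesteps where this argmax matches the follower's true next state.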

Training Data Size    Influence Model         GCHMM
                      Chain 1    Chain 3      Chain 1    Chain 3
10                    100%       99.5%        67%        50.5%
20                    99.5%      100%         66%        90.5%
50                    100%       100%         100%       100%
100                   100%       100%         100%       100%

Table 1 Prediction results for the follower chains in the synthetic dataset

    5 Experiments using the Sociometer

    5.1 The data collection method

In our experiment we want to capture the following: (i) how do members within a group interact? (ii) how do members across groups interact? and (iii) how does the physical distribution of groups affect interaction patterns? The formal structure and physical layout of an organization have been shown to impact the collaboration and communication patterns of its people, so we wanted our experimental design to be able to capture these phenomena.

Different research groups at the lab volunteered to wear the sociometer, the wearable sensor package that measures social interactions. The subjects were chosen to represent at least 80% of the members within a research group, and we had four separate research groups participate in the study. The groups were sampled such that they were distributed throughout the building: two groups from the third floor, one group from the fourth floor, and the other from the lower level. The subjects had the device on them for six hours a day (11 AM to 5 PM) while they were on the MIT campus. We performed the experiment in two stages: (i) the single-group stage, where 8 subjects from the same research group wore the sociometers for 10 days (two full work weeks, 60 hours of data per subject), and (ii) the multi-group stage, where 23 subjects from 4 different research groups wore the sociometers for 11 days (over two full work weeks, 66 hours of data per subject), for a total of 1518 hours of interaction data.

Over the course of the experiment, 25 different users wore the device, and most of them were very satisfied with its comfortable and aesthetic design. None of the subjects complained about any inconvenience or discomfort from wearing the sociometer for six hours every day. Despite the comfort and convenience of wearing a sociometer, we are aware that subjects' privacy is a concern for any study of human interactions. Most people are wary about how this information will be used. To protect the users' privacy we agree only to extract speech features, e.g. energy, pitch, and duration, from the stored audio, and never to process the content of the speech. But to obtain ground truth we need to label the data somehow. Our proposed solution is to use garbled audio instead of the real audio for labeling: garbling makes the audio content unintelligible but maintains the identity and pitch of the speaker [44]. In future versions of the sociometer we will store encrypted audio instead of raw audio, which can also prevent unauthorized access to the data.

Right now, we take the following measures to address privacy concerns: (i) the identities of the subjects are not revealed at any point during or after the study; (ii) only audio features such as energy, spectral characteristics, and speech duration are used, and no transcription or speech recognition is done on the data; (iii) each individual can withdraw from the study at any time, and the data collected from them will be destroyed; (iv) the data collected from the subjects will be used for this study only; (v) the subjects were given an explanation of the experimental procedures, and the consent form included details about the nature of the data being collected; and (vi) the data is stored on a secure computer accessible only by the principal and associate investigators. The project is approved by MIT's Committee on the Use of Humans as Experimental Subjects (COUHES), application #2889.


    Figure 6: Various subjects wearing the sociometers during the two-week data collection period.

    5.2 Initial analysis of the data

The first step in the data analysis process is to find out when people are in close proximity. We use the data from the IR receiver to detect the proximity of other IR transmitters. The receiver measurements are noisy: the transmitted ID numbers that the IR receivers pick up are not continuous, and are often bursty and sporadic. The reason for this bursty signal is that people move around quite a lot when they are talking, so one person's transmitter will not always be within the range of another person's receiver. Consequently the receiver will not receive the ID number continuously at 17 Hz. Also, each receiver will sometimes receive its own ID number. We pre-process the IR receiver data by filtering out detections of the self ID number, as well as propagating one IR receiver's information to other nearby receivers (if receiver #1 detects the presence of tag ID #2, receiver #2 should also detect tag ID #1). This pre-processing ensures that we maintain consistency between different information channels. However, we still need to be able to use the bursty receiver measurements to detect the contiguous time chunks (episodes) during which people are in proximity. Two episodes always have a contiguous time chunk in between where no ID is detected. A hidden Markov model (HMM) [45] is trained to learn the pattern of IR signals received over time. Typically an HMM takes noisy observation data (the IR receiver data) and learns the temporal dynamics of the underlying hidden node and its relationship to the observation data. The hidden node in our case has a binary state: 1 when the IDs received come from the same episode and 0 when they are from different episodes. We obtain ground truth for the hidden states by hand-labeling 6 hours of data. The HMM uses the observations and hidden node labels to learn its parameters. We can then use the trained HMM to assign the most likely hidden states for new observations. From the state labels we can estimate the frequency and duration with which two people are within face-to-face proximity. Figure 7 shows five days of one person's proximity information. Each color in the sub-image identifies a person to whom the wearer is in close proximity, and the width is the duration of contact. Note that we are also able to detect when multiple people are in close proximity at the same time.
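A minimal sketch of this episode-smoothing HMM is below. The transition and emission probabilities are illustrative placeholders, not the values learned from the 6 hours of hand-labeled data:

```python
import numpy as np

def smooth_detections(obs, p_stay=0.8, p_emit=0.9):
    """Group bursty IR detections into contiguous episodes with a
    2-state HMM (a sketch; p_stay and p_emit are illustrative, not the
    trained parameters).  obs[t] is 1 if the badge ID was detected in
    frame t.  Viterbi decoding returns the smoothed episode labels:
    state 1 = inside an episode, state 0 = outside.
    """
    A = np.log(np.array([[p_stay, 1 - p_stay],     # sticky transitions
                         [1 - p_stay, p_stay]]))
    E = np.log(np.array([[p_emit, 1 - p_emit],     # state 0 mostly emits 0
                         [1 - p_emit, p_emit]]))   # state 1 mostly emits 1
    T = len(obs)
    delta = np.zeros((T, 2))
    back = np.zeros((T, 2), dtype=int)
    delta[0] = E[:, obs[0]]                        # uniform prior on states
    for t in range(1, T):
        scores = delta[t - 1][:, None] + A         # scores[prev, next]
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + E[:, obs[t]]
    states = np.zeros(T, dtype=int)
    states[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):                 # backtrack
        states[t] = back[t + 1][states[t + 1]]
    return states
```

With sticky transitions, isolated dropped frames inside an episode are bridged, while long gaps are split into separate episodes.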


    Figure 7 - Proximity information for person # 1. Each sub-image shows one day’s information

Figure 8 - Zoomed view of the red shaded region from day two in Figure 7. Upper panel: bursty raw data from the IR receiver. Lower panel: output of the HMM, which groups the data into contiguous time chunks.

The IR tag can provide information about when people are in close face-to-face proximity. But it provides no information about whether two people are actually having a conversation; they may just have been sitting face-to-face during a meeting. In order to identify whether two people are actually having a conversation, we first need to segment out the speaker from all other ambient noise and other people speaking in the environment. Because of the close placement of the microphone with respect to the speaker's mouth, we can use a simple energy threshold to separate the wearer's speech from most of the other speech and ambient sounds. It has been shown that one can segment speech using voiced regions (speech regions that have pitch) alone [46]. In voiced regions, energy is biased towards the low-frequency range, and hence we threshold the low-frequency energy (2 kHz cutoff) instead of the total energy. The output of the low-frequency energy threshold is passed as the observation to another HMM, which segments speech regions from non-speech regions. The states of the hidden node are the speech labels (1 = a speech region and 0 = a non-speech region). We train this HMM on 10 minutes of speech where the hidden nodes are again hand-labeled.
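The first stage of this segmentation can be sketched as follows. The sample rate, frame length, and the crude midpoint threshold are all illustrative; the actual system feeds these energies to a trained HMM rather than thresholding them directly:

```python
import numpy as np

def low_freq_energy(frames, sr=8000, cutoff=2000):
    """Per-frame energy below `cutoff` Hz (a sketch; sr and cutoff are
    illustrative).  frames: array of shape (n_frames, frame_len)."""
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    return spec[:, freqs < cutoff].sum(axis=1)   # keep only the low band

def segment_speech(frames, sr=8000, thresh=None):
    """Binary speech/non-speech labels by thresholding the low-frequency
    energy; in the full system these labels feed the smoothing HMM."""
    e = low_freq_energy(frames, sr)
    if thresh is None:
        thresh = 0.5 * (e.min() + e.max())       # crude illustrative threshold
    return (e > thresh).astype(int)
```

A frame dominated by energy below 2 kHz (as voiced speech is) is labeled 1; high-frequency or silent frames are labeled 0.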

Figure 9 shows the segmentation results for a 35-second audio chunk. In this example two people wearing sociometers are talking to each other and are interrupted by a third person (between t=20s and t=30s). The output of the low-frequency energy threshold for each sociometer is fed into the speech HMM, which segments the speech of the wearer. The red and green lines overlaid on top of the speech signal show the segmentation boundaries for the two speakers. Also note that the third speaker's speech in the 20s-30s region is correctly rejected, as indicated by the grayed region in the figure. In Figure 10 we show the spectrograms from the two speakers' sociometers overlaid with the results from the HMM.

    Figure 9 - Speech segmentation for the two subjects wearing the sociometer.

Figure 10 - Spectrograms from person A's and person B's microphones respectively, with the HMM output overlaid in black (top) and blue (bottom).

One may argue that an energy-based approach to speaker segmentation is very susceptible to the noise level of the environment and to sounds from the user's regular activity. In order to overcome this problem we have incorporated robust speech features developed by Basu [47], which extract voiced regions very reliably even in noisy environments. However, the downside is that any speech, and not just the user's speech, is detected. So we use a second-stage model on the derived features, based on energy, to segment out only the user's speech and discard all the rest. This method has been very effective in our initial experiments.


Now we have information about when people are in close proximity and when they are talking. When two people are nearby and talking, it is highly likely that they are talking to each other, but we cannot say this with certainty. Recent results presented in the doctoral thesis of Sumit Basu [46] demonstrate that we can detect whether two people are in a conversation by relying on the fact that the speech of two people in a conversation is tightly synchronized. Basu reliably detects when two people are talking to each other by calculating the mutual information of the two voicing streams, which peaks sharply when they are in a conversation as opposed to talking to someone else. We use his technique to detect whether two people talking in close proximity are actually talking to each other. The conversation alignment measure is as follows:

a[k] = I(v_1[t], v_2[t-k]) = Σ_{i,j} p(v_1[t] = i, v_2[t-k] = j) log [ p(v_1[t] = i, v_2[t-k] = j) / ( p(v_1[t] = i) p(v_2[t-k] = j) ) ]

where v_1 and v_2 are the two voicing streams and i and j range over 0 and 1 for voiced and unvoiced frames. We can further improve on the energy-based segmentation of the speaker by using the non-initial maximum of the autocorrelation, the number of autocorrelation peaks, and the normalized spectral entropy as features for training an HMM that identifies voiced regions, as proposed by Basu [46].
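For binary voicing streams, the measure above reduces to counting joint frequencies; a sketch (assuming equal-length streams, a non-negative lag k, and reporting bits):

```python
import numpy as np

def voicing_mutual_info(v1, v2, k=0):
    """Mutual information (in bits) between two binary voicing streams
    at lag k >= 0, a sketch of the alignment measure a[k] above.
    Pairs v1[t] with v2[t-k]; v1, v2 are equal-length 0/1 sequences."""
    x = np.asarray(v1)[k:]               # v1[t]
    y = np.asarray(v2)[:len(v2) - k]     # v2[t-k]
    mi = 0.0
    for i in (0, 1):
        for j in (0, 1):
            p_xy = np.mean((x == i) & (y == j))   # joint frequency
            p_x = np.mean(x == i)                 # marginals
            p_y = np.mean(y == j)
            if p_xy > 0:
                mi += p_xy * np.log2(p_xy / (p_x * p_y))
    return mi
```

Sweeping k and taking the peak of a[k] gives the alignment score used to decide whether two nearby speakers are in the same conversation.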

Once we detect the pairwise conversation chunks we can estimate the duration of conversations. We can further break down the analysis and calculate how long each person talks during a conversation. We can measure the ratio of interaction, i.e., (duration of person A's speech):(duration of person B's speech). We can also calculate what fraction of our total interaction is with people in our community, i.e., inter- vs. intra-community interactions. This may tell us how embedded a person is within the community vs. how much the person communicates with other people. For example, someone who never talks to his work group but has many conversations in general is very different from someone who rarely talks to anyone.


A first-pass picture of the network structure can be obtained by measuring the duration that people are in close face-to-face proximity. Figure 11 shows the link structure of our network based on duration, i.e., the total length of time spent in close proximity. There is an arrow from person A to person B if the duration spent in close proximity to B accounts for more than 10% of A's total time spent with everyone in the network. The thickness of the arrow scales with increasing duration. Similarly, Figure 12 shows the link structure calculated based on frequency, i.e., the number of times two people were in close proximity. We are currently working on combining the audio and IR tag information and re-estimating the link structure. Then we will be able to look at network structure along various dimensions, i.e., based on the frequency and duration of actual conversations people have with each other. We can also analyze the structure based on the dynamics of interaction (e.g., the interaction ratio mentioned earlier in the section).
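The 10% rule behind Figure 11 can be computed directly from a proximity-duration matrix; a sketch (the same code applies to a frequency-count matrix for Figure 12):

```python
import numpy as np

def link_structure(durations, frac=0.10):
    """Directed link structure from a proximity matrix (a sketch).

    durations[a, b] is the total time a spent in close proximity to b.
    There is a link a -> b if that time exceeds `frac` of a's total
    time spent with everyone (the 10% rule described above).
    """
    d = np.asarray(durations, dtype=float)
    totals = d.sum(axis=1, keepdims=True)        # each person's total time
    with np.errstate(invalid="ignore", divide="ignore"):
        share = np.where(totals > 0, d / totals, 0.0)
    links = share > frac
    np.fill_diagonal(links, False)               # no self-links
    return links
```

Note that the rule is asymmetric: a long contact can exceed 10% of A's total time but not of B's, producing a one-way arrow.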

Figure 10 Each row shows the mutual information score for subject 1 with 6 other subjects for a given day. The first row is the spectrogram of subject 1.

There are a few interesting points to note about the differences between the duration-based and frequency-based structures. The two main differences are that in the frequency network there are links between ID #1 and ID #7, and there are extra links connecting ID #6 to many more nodes than in the duration network. The additional links to ID #6 were created because person #6 sat mostly in the common space through which everyone passed frequently.

    Consequently, ID# 6 was picked up by most other receivers quite often, but the duration of detection was very short.

    However, if we combined IR with the presence of audio these extra links would most likely disappear. But, the links

    between ID 1 an ID 7 are more interesting – although these two people never had long discussions they quite often

    talked for short periods of time. These links would probably remain even when combined with audio.

    Figure 11 - The link structure of the group based on proximity duration.

    Figure 12 - The link structure of the group based on proximity frequency.

    Figure 13 – Interaction distribution based on proximity duration (first column) and proximity frequency (second

    column). Each row shows results for a different person in the network


Figure 13 shows the fraction of time each individual spends with other members of the group, based on duration and frequency. Person 1 talks to all other members regularly and is the most connected person as well (see Figure 11 and Figure 12). Persons 2-6 have more skewed distributions in the amount of time they spend with other members, which means they interact mostly with a select sub-group of people. These are only a few examples of looking at different characteristics of the network. Analysis along different dimensions of interaction is going to be one of the main advantages of sensor-based modeling of human communication networks.

A similar analysis of the larger dataset shows clustering of groups and a reduction of interaction with increasing physical separation.

[Figure 14 graphics: connectivity matrix and corresponding bar graph.]

Figure 14 (a) The connectivity matrix of interaction duration. Each row is a different individual and each column depicts the fraction of his/her interaction with others. (b) The same connectivity shown as a bar graph.

These connectivity or network graphs can then be used to estimate centrality measures, as traditionally done in social network analysis. The eigenvector centrality measure for the larger group, based on the adjacency matrix, is shown below (Figure 15). These measures will be compared with the centrality measures estimated from the survey data on interaction collected daily during the experiment stage. The similarities and differences in these measures will provide an indication of how accurately centrality can be estimated from sensor data, and in what scenarios personal recall tends to be most unreliable.
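Eigenvector centrality itself is straightforward to compute from such a matrix by power iteration; a sketch (the diagonal shift is a standard trick to prevent oscillation on bipartite graphs and does not change the eigenvectors):

```python
import numpy as np

def eigenvector_centrality(adj, iters=200):
    """Eigenvector centrality by power iteration (a sketch).

    adj is a symmetric non-negative adjacency/interaction matrix; a
    node's score is proportional to the sum of its neighbours' scores,
    i.e. the leading eigenvector of adj.
    """
    A = np.asarray(adj, dtype=float)
    A = A + np.eye(A.shape[0])   # diagonal shift: same eigenvectors,
                                 # avoids oscillation on bipartite graphs
    x = np.ones(A.shape[0])
    for _ in range(iters):
        x = A @ x                # repeatedly apply the matrix
        x /= np.linalg.norm(x)   # renormalize to prevent overflow
    return x / x.sum()           # scores normalized to sum to 1
```

On a star graph the hub scores sqrt(N) times higher than each leaf, which matches the intuition that a person everyone interacts with is the most central.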


Figure 15 Eigenvector centrality of the 23 individuals in the larger study, estimated from proximity data

Figure 16 Eigenvector centrality of the 23 individuals, estimated from survey data

Once we have reliable ways of estimating how people interact within the community, and the duration and frequency of their interactions, we can begin to model how information propagates between groups and analyze the effects of new policies or new technologies on the dynamics of the group.

6 Evaluating thesis results

The primary contributions of this thesis are (i) a unique data set of sensor data collected from a group of individuals and their daily interactions and (ii) a data-analysis pipeline for extracting relevant interaction structure from this data. The analysis will have the following capabilities:


1. Transcription of the sensor data into descriptive labels (such as conversation duration and participants).

2. Characterization of the communication network, i.e., the network structure/map.

3. Temporal models of interaction and participation types.

4. Prediction of future interactions.

5. Calculation of standard network analysis measures, e.g., centrality scores (betweenness, eigenvector, etc.) and the network interaction matrix, from sensor data.

The transcription of sensor data will be evaluated against human-labeled ground truth using standard cross-validation procedures involving separate testing and training data sets. We will evaluate how successful we are in detecting conversations between people and the duration and frequency of the interactions. We will also evaluate the consistency between the output of our algorithms and hand-collected survey data. The survey was designed using the guidelines Prof. Thomas Allen followed in his studies on organizational behavior [10]. We use the features extracted from our sensors to estimate measures of centrality automatically. We will compare these centrality scores with those obtained from survey data. Centrality measures are used in social network analysis as a measure of the influence and embeddedness of a person in his/her community [6, 49, 50]. The betweenness centrality measure will be used to assess how well the sensor data can discover communities, and will be compared with the surveys.

The communication network of the group will be mapped from the transcription labels using Bayesian network methods and evaluated against the results obtained from surveys conducted during the data collection phase. We will identify leaders and communities of practice that arise, using measures typically used in social network analysis (e.g., degree centrality and betweenness centrality), and correlate these measurements with the estimated influence values. We will also analyze the matches and differences between the results obtained from sensor data and the questionnaires; e.g., survey data suffers from errors arising from failures in personal recall, especially for events that have a low frequency of occurrence. We will also calculate what fraction of our total interaction is with people in different groups, i.e., inter- vs. intra-group interactions. This can reveal the differences between the formal structure and the informal network that arises within an organization.

The hypothesis is that temporal models will encode the rules of typical interactions: turn-taking nature, participation ratio, group vs. pairwise interactions, etc. I will evaluate the temporal models' ability to correspond or label equivalent segments across different interactions. These models will also be evaluated by their prediction accuracy on separate test and training sets. It is also necessary to evaluate and compare the predictions across different days and times of day. Note that evaluation steps (3) and (4) are closely related uses of the temporal models: filtering vs. prediction.


7 Intellectual Contributions

The novel idea of this thesis is that one can use sensor-based, data-driven methods to automatically study connections between different groups in social or business institutions. This thesis presents a unique data set of sensor data collected from 23 individuals belonging to four work/research groups. The dataset captures 66 hours of natural interaction data per person (1518 hours total) over the course of two weeks.

There is no previous documented work on statistically estimating models of face-to-face interactions in groups from sensor-based data. Social psychologists and anthropologists have put enormous effort into understanding the nature of face-to-face interactions from extensive surveys, interviews, and user diaries. In the domain of computational perception, attempts have been made in limited domains, e.g., can we understand the dynamics in a meeting room? The reason is that there is always a trade-off between resolution and robustness. If we try to obtain a very detailed description of interactions, e.g., gaze tracking, facial expressions, as well as speech recognition, the fragility and context-dependence of the perceptual tools will limit the application to very restricted domains.

This thesis attempts to achieve robustness and context independence by limiting the perception to somewhat coarse but informative features, combined with state-of-the-art machine learning algorithms, so that it can be used in broad contexts and scenarios without imposing additional burden on the user. Although I advocate the use of generic features, the whole notion of using data-driven methods will allow us to capture various events that social network analysts haven't been able to capture so far. For example, do A and B mostly interact alone, or do they mostly interact in the presence of C, i.e., is this a pairwise interaction or a group interaction? By using 'always on' sensors we can encode such 'simultaneity' events, which are hard to capture otherwise.

The link topology or connectivity provides information about communities of practice and authorities or leaders within the group, and identifies key individuals or gatekeepers within an organization. There have been studies to discover these in web communities [22, 51], in corporate virtual networks [13, 52], and in face-to-face interactions through many studies conducted using surveys or self-report diaries. This thesis presents a way of automating the process for face-to-face communication, which can potentially pave the way for conducting experiments on physical interactions at scales of thousands or millions of people, as is done in online and virtual communities, and overcome the bottleneck in the number of people that can be surveyed using manual techniques.

Furthermore, this work provides tools for social scientists to actually go and test some of their theoretical models with first-hand data from real life. I also believe the results of this thesis can be usefully applied in context-aware computing: a context-aware agent needs to be able to ascertain not only the context of its user but also the user's personal network, in order to provide efficient and relevant assistance. For example, if I need to deliver an important message to person A who is away, can my agent, based on my communication network, suggest another person B who can deliver my message to A in the shortest possible time?


The main intellectual contributions of this thesis are listed below:

1. A means of extracting the features used in the social sciences, i.e. (i) who talks to whom, (ii) how often, and (iii) for how long, automatically from sensor data without tedious manual annotation. These features are extensively used in discovering communities of practice and analyzing the behavior of organizations [3]; however, our features are at a finer time-scale and suffer from no personal interpretation bias. We also go beyond these measures and analyze interaction style and dynamics.

2. Influence measures between people have been used in the social network community [48] to predict the steady-state behavior of groups. However, these measures of influence are static. The influence model proposed in this thesis goes beyond that and incorporates influence based not only on the connections between individuals but also on the dynamics of their interactions. For example, suppose person A has a strong influence on person B; the actual effect of this influence on the state of B will depend not only on the influence value but also on the state-transition dynamics of B given A.

3. Evaluation of the algorithms against human labeling of raw sensory data. For a subset of the data collected, we have the participants' permission to listen to the data for hand labeling. For that dataset, human labels of 'who talks to whom', 'for how long', and interaction patterns will be obtained, and the output of our algorithms will be compared against these human labels. This approach is similar to ethnographic methods, where a human observer labels the observations either from recorded data or via direct observation of the interactions. We will check for consistency in the sensor outputs across devices, to ensure that the labels and devices are in agreement and that any errors in device recording do not bias the labels obtained from human labeling.

4. Comparison of results with survey data collected during the experiment. Our survey was designed using the guidelines followed by Prof. Thomas Allen in his studies on organizational behavior [10]. We also estimate measures of centrality automatically from raw sensor data. Centrality measures are used in social network analysis as a measure of the influence and embeddedness of a person in his/her community [6, 49, 50]. Betweenness centrality measures are often used to discover the different communities that arise within a group of people; we use this measure on our sensor results as well. Many studies have shown that the topology of people's connectivity is the most important feature, and that the actual interaction content is not as crucial for understanding a person's role within the community [2, 13].

5. A complete system that enables a sensor-based approach to measuring interactions between groups of people. We address issues ranging from user-acceptable sensors and sensor packaging to robust feature extraction. Privacy-preserving methods of data analysis are adopted in the experiments done in this thesis. We will evaluate the effectiveness and acceptability of our approach from the exit surveys collected from all subjects participating in the experiment.
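To make the dynamic-influence idea in contribution 2 concrete, here is a simplified sketch of one update step of an influence model in the spirit of [38]; the two-person states, influence weights, and transition matrices below are illustrative assumptions, not estimates from data:

```python
def influence_step(p, alpha, E):
    """One influence-model update: chain i's next-state distribution is
    an influence-weighted mixture of Markov updates, where E[i][j] maps
    chain j's current state distribution to a distribution over chain
    i's next state, and alpha[i][j] is the (static) influence of j on i
    (each row of alpha sums to 1)."""
    n = len(p)
    out = []
    for i in range(n):
        m = len(E[i][i][0])          # number of states of chain i
        dist = [0.0] * m
        for j in range(n):
            for k in range(m):       # accumulate alpha[i][j] * (p[j] @ E[i][j])
                dist[k] += alpha[i][j] * sum(
                    p[j][s] * E[i][j][s][k] for s in range(len(p[j])))
        out.append(dist)
    return out

# Two people with states (silent, speaking); B is pulled toward A's state.
sticky = [[0.9, 0.1], [0.1, 0.9]]    # tend to stay in the current state
identity = [[1.0, 0.0], [0.0, 1.0]]
p = [[1.0, 0.0], [0.0, 1.0]]         # A is silent, B is speaking
alpha = [[1.0, 0.0],                 # A is driven only by itself
         [0.6, 0.4]]                 # B: 0.6 influence from A, 0.4 from self
E = [[sticky, identity], [sticky, sticky]]
p_next = influence_step(p, alpha, E)  # p_next[1] == [0.58, 0.42]
```

With a static influence measure alone, only alpha would matter; here the effect of A on B also depends on the pairwise state-transition dynamics in E, which is precisely the distinction drawn above.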


    8 Time-line

    April Develop dynamic models of interaction types

    May Evaluate the communication graphs and dynamic models

    June Evaluate results and finish all the analysis work

    July Write thesis

    August Thesis defense

9 Resources

The following items will be required in the future to finish the thesis work:

    Storage space:

    • Hard drive space for data storage and back-up – 1 terabyte

Data analysis:

    • Multiple high-end Pentium PCs for number crunching

    10 Short Biography

    Tanzeem Choudhury is a PhD candidate working with Prof. Alex (Sandy) Pentland at

    the MIT Media Laboratory. She received her master’s degree from the Media

    Laboratory and a bachelor’s degree in Electrical Engineering from the University of

    Rochester. Her research interests include machine perception, machine learning,

    ubiquitous computing and the analysis of social networks and organizational behavior.

11 Bibliography

1. Gladwell, M., The Tipping Point: How Little Things Can Make a Big Difference. 2000, New York: Little Brown.

2. Allen, T., Architecture and Communication Among Product Development Engineers, 1997, Sloan School of Management, MIT, WP Number 165-97

    3. Huberman, B. and Hogg, T., Communities of Practice: Performance and Evolution. Computational and Mathematical Organization Theory, 1995. 1: p. 73-95.

    4. Allen, T., Organizational Structure for Product Development, 2000, Sloan School of Management, MIT, WP Number 166-97

5. Watts, D., Six Degrees: The Science of a Connected Age. 2003: W. W. Norton & Company.

6. Wasserman, S. and Faust, K., Social Network Analysis: Methods and Applications. 1994: Cambridge University Press.


    7. Valente, T.W., Social network thresholds in the diffusion of innovations. Social Networks, 1996. 18: p. 69-89.

    8. Solomon, H., Mathematical Thinking in the Measurement of Behavior: Small Groups, Utility, Factor Analysis. A Study of the Behavior Models Project. 1960, Glencoe, IL: Free Press.

    9. Granovetter, M., The strength of weak ties. American Journal of Sociology, 1973. 78(6): p. 1360-1380.

    10. Allen, T., Managing the Flow of Technology. 1977, Cambridge MA: MIT Press.

    11. Wasserman, S. and Pattison, P., Logit models and logistic regressions for social networks: an introduction to Markov Graphs and p*. Psychometrika, 1996. 60: p. 401-426.

    12. Liljeros, F., Edling, C., Amaral, L., Stanley, H., and Aberg, Y., The web of human sexual contacts. Nature, 2001. 411: p. 907-908.

    13. Tyler, J., Wilkinson, D., and Huberman, B. Email as spectroscopy: Automated discovery of community structure within organizations. In International Conference on Communities and Technologies. 2003. Amsterdam, The Netherlands.

14. Erdos, P. and Renyi, A., On the evolution of random graphs. Publications of the Mathematical Institute of the Hungarian Academy of Sciences, 1960: p. 17-61.

    15. Watts, D. and Strogatz, S.H., Collective dynamics of ’small world’ networks. Nature, 1998. 393: p. 440-442.

    16. Barabasi, A. and Albert, R., Emergence of scaling in random networks. Science, 1999. 286: p. 509-512.

    17. Huberman, B. and Adamic, L.A., Evolutionary dynamics of the world wide web. Nature, 1999. 401: p. 131.

18. Adamic, L.A., Lukose, R., and Huberman, B., Local search in unstructured networks, in Handbook of Graphs and Networks: From the Genome to the Internet, S. Bornholdt and H. Schuster, Editors. 2002, Wiley.

    19. Kleinberg, J., The small-world phenomenon: An algorithmic perspective, 1999, Cornell Computer Science Technical Report 99-1776

    20. Watts, D., Dodds, P., and Newman, M., Identity and Search in Social Networks. Science, 2002. 296: p. 1302-1305.

    21. Hofmann, T. Learning Probabilistic Models of the Web. In Proceedings of the 23rd International Conference on Research and Development in Information Retrieval (ACM SIGIR’00). 2000.

    22. Kleinberg, J. Authoritative sources in a hyperlinked environment. In ACM-SIAM Symposium on Discrete Algorithms. 1998.

23. Wells, G. and Petty, R., The effects of overt head movements on persuasion. Basic and Applied Social Psychology, 1980. 1(3): p. 219-230.

    24. Morris, R.J. and Hogg, D., Statistical models of object interaction. International Journal of Computer Vision, 2000. 37(2): p. 209-215.

    25. Pentland, A., Smart Rooms. Scientific American, 1996. 274(4): p. 68-76.

    26. Pentland, A. and Choudhury, T., Face recognition for smart environments. IEEE Computer, 2000. 33(2): p. 50-55.

    27. Greene, T., Feds use biometrics against Super Bowl fans. 2001. http://www.theregister.co.uk/content/archive/16561.html.

    28. Bregler, C. Learning and recognizing human dynamics in video sequences. In IEEE Conference on Computer Vision and Pattern Recognition. 1997.

    29. Stauffer, C. and Grimson, W.E.L., Learning patterns of activity using real-time tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000. 22(8): p. 747-757.

    30. Addlesee, M.D., Curwen, R., Hodges, S., Newman, J., Steggles, P., Ward, A., and Hopper, A., Implementing a sentient computing system. IEEE Computer, 2001. 34(8): p. 50-56.


    31. Hofer, B., Blank, R., and Meier, R., Spotme: An Autonomous Short-Range Communication System. 2002. http://www.spotme.ch/.

    32. Basu, S., Choudhury, T., Clarkson, B., and Pentland, A. Towards Measuring Human Interactions in Conversational Settings. In In the proceedings of IEEE Int’l Workshop on Cues in Communication at the Computer Vision and Pattern Recognition (CVPR) Conference. 2001. Kauai, Hawaii.

33. Adar, E., Lukose, R., Sengupta, C., Tyler, J., and Good, N., Shock: Aggregating Information while Preserving Privacy, 2002, HP Laboratories: Information Dynamics Lab.

    34. Gibson, D., Kleinberg, J., and Raghavan, P. Inferring Web communities from link topology. In Proc. 9th ACM Conference on Hypertext and Hypermedia. 1998.

35. Clarkson, B., Wearable Design Projects. 2002. http://web.media.mit.edu/~clarkson/.

    36. Gerasimov, V., Selker, T., and Bender, W., Sensing and Effecting Environment with Extremity Computing Devices. Motorola Offspring, 2002. 1(1).

    37. DeVaul, R. and Weaver, J., MIT Wearable Computing Group. 2002. http://www.media.mit.edu/wearables/.

    38. Asavathiratham, C., The Influence Model: A Tractable Representation for the Dynamics of Networked Markov Chains, in Dept. of EECS. 2000, MIT: Cambridge. p. 188.

    39. Saul, L.K. and Jordan, M., Mixed memory Markov models. Machine Learning, 1999. 37: p. 75-85.

    40. Schafer, R.D., An Introduction to Nonassociative Algebras. 1996, New York: Dover.

41. Bilmes, J., A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models, 1997, University of California, Berkeley, ICSI-TR-97-021

    42. Lauritzen, S.L. and Spiegelhalter, D.J., Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, 1988. B50: p. 157-224.

    43. Bertsekas, D.P., Nonlinear Programming. 1995, Belmont: Athena Scientific.

    44. Marti, S., Sawhney, N., Jacknis, M., and Schmandt, C., Garble Phone: Auditory Lurking. 2001.

    45. Jordan, M. and Bishop, C., An Introduction to Graphical Models. 2001.

46. Basu, S., Conversation Scene Analysis, in Dept. of Electrical Engineering and Computer Science. Doctoral thesis. 2002, MIT. p. 1-109.

47. Basu, S. A two layer model for voicing and speech detection. In International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2003.

    48. Bonacich, P., Dynamic Models of Influence, in Introduction to Mathematical Sociology. In preparation.

    49. Bonacich, P., Power and centrality: a family of measures. American Journal of Sociology, 1987. 92: p. 1170-1182.

50. Bonacich, P., Eigenvector-like measures of centrality for asymmetric relations. Social Networks, 2001. 23: p. 191-201.

    51. Gibson, D., Kleinberg, J., and Raghavan, P. Inferring Web Communities from Link Topology. In 9th ACM Conference on Hypertext and Hypermedia. 1998.

    52. Lukose, R., Adar, E., Tyler, J., and Sengupta, C. SHOCK: Communicating with Computational Messages and Automatic Private Profiles. In Proceedings of the Twelfth International World Wide Web Conference. 2003.