the impact of peer pressure on the emergence of conventional norms in structured societies

Upload: matteocam

Post on 04-Jun-2018


  • 8/13/2019 The impact of peer pressure on the emergence of conventional norms in structured societies

    1/20

    The impact of peer pressure on the emergence of conventional norms in structured societies

    Matteo Campanelli

    November 22, 2013

    Abstract

    In this work I survey existing work on Emergence of Conventions that (a) tries to model social pressure by means of a reward metric based on the history of the interaction among agents and (b) focuses on spatial models to examine the properties of these dynamics.

    Contents

    1 Introduction

    2 Modelling social pressure for convention emergence
      2.1 Memory-based reward
        2.1.1 Motivation
        2.1.2 Formal description of MRM
      2.2 Adaptation mechanisms
      2.3 Spatial structure

    3 Results of experiments with memory and topology
      3.1 Effects of neighborhood size
      3.2 Effects of memory size
        3.2.1 Interpretation of the impact of larger memory windows
        3.2.2 Mono vs Multi Learning when varying memory size
        3.2.3 Different topologies
      3.3 Effects of Learning Approach
      3.4 Summary of results

    4 Convention emergence with populations including non-learning agents
      4.1 Fixed Strategy agents
      4.2 Experimental setting and results


    5 Discussion and Related Work
      5.1 Summary of the presented work
      5.2 Related Work
      5.3 Discussion

    1 Introduction

    In multi-agent systems, different agents aim at achieving different goals, yet they need to interact, deciding how to share limited resources, coordinating and cooperating. In these settings it is fundamental that agents agree on rules (also called norms) while still being allowed a certain degree of freedom. Conformity to rules reduces social friction, relieves cognitive load in humans (or computational load in artificial agents) and facilitates coordination [ST97].

    There are two different approaches to rules in agent societies. On the one hand, these rules can be seen as a design tool and imposed from above [ST97]. However, agents may not always agree on such rules offline, since features of the society at hand may be unknown or may change over time. Furthermore, the design of rules may be computationally hard [ST97]. In these cases it is important that a society is able to dynamically converge to norms. The latter approach, called interactionist [SC11], will be the focus of this paper: I will describe settings where agents in a society interact over time and gradually agree on a convention.

    But what, concretely, is a norm or convention? In a population of agents, interaction usually takes the form of games. Any time agents interact with each other they do it by choosing an action in a game. A norm is then a restriction on the set of available actions [ST97]. A restriction where the set of available actions is a singleton is called a convention.

    In this paper I will survey existing experimental work on emergence of conventions that employs models of peer pressure in spatially structured populations.

    Villatoro et al. [VSSM09] propose to model human-like dynamics for the emergence of conventions. They propose a reward metric determined by the memory of past actions, thus modelling peer pressure; they name this metric the Memory-based Reward Metric (MRM). They show that, using Q-learning to learn it, agents can converge to conventions in agent societies with different underlying spatial structures.

    Later work [GA12] has examined additional features of the spatial memory-based model in [VSSM09], in particular the effects of heterogeneous populations.

    The structure of this paper is as follows. Section 2 introduces the interaction framework assumed in this work: I describe the reward metric MRM and the overall simulation context. Section 3 summarizes the main results in [VSSM09]. Section 4 introduces the extension proposed in [GA12] and its main results. In Section 5 I give an overview of the vast field of emergence of conventions and highlight similarities and differences with the line of research described in Sections 2 and 4. In the same section, I conclude with a further discussion of the work in [VSSM09] and [GA12].


    2 Modelling social pressure for convention emergence

    This section describes one of the main works this paper builds upon, [VSSM09]. Its further extensions and applications in [GA12] are described in Section 4.

    The work in [VSSM09] proposes a model of convention emergence in agent societies. The interaction among agents is described by a game; at the end of each interaction they receive a reward inspired by notions of social pressure. Agents are rewarded on the basis of how much they conformed in the current action and in the past history of repeated interactions with their peers. The authors explore how parameters, such as the memory size of the history above, affect the emergence of conventions.

    Agents can only observe their own private actions and rewards, that is, they are not aware of the past history of other agents, their current actions or their rewards.

    The interactions among agents are also situated in a spatial environment. Specifically, each agent is represented by a node in a network and the neighbors of that node represent the peers the corresponding agent may interact with. Thus the assumed type of network affects interaction dynamics. The authors describe various experimental settings with different types of network topologies.

    At each time step, each agent is paired with one of its neighbors. Each of the pairs is presented with a game and the agents decide which action¹ to take. Then rewards are assigned with the criteria above (see Section 2.1.2 for further details) and a new iteration begins unless a convention has emerged. The society reaches a convention when all the agents decide to be in the same state and this situation is immutable.

    2.1 Memory-based reward

    2.1.1 Motivation

    In this section I introduce the motivation for the game agents play in this framework, which is also the main point of discussion of this paper.

    The literature on norm emergence usually models interaction among agents by means of games with a static reward². These games fail to consider the full dynamics of norm evolution in human societies. Often among humans, norms emerge as the result of peer pressure, which is applied by learning after repeated interactions. If agents employ machine learning algorithms to adapt their behavior, they may include past interactions as parameters for their adaptation.

    However, this is not enough to model the full context and the persistent nature of social pressure in human societies. The reward agents receive should then itself represent this context. Individuals often use past interactions with their peers to judge them. Thus, a reward modelling social pressure should be based on the history of past actions.

    ¹ The authors also describe these actions as states the agents decide to be in.
    ² By games with static reward the authors seem to mean stateless games, or games whose state does not depend on the history of past actions.

    Villatoro et al. propose the Memory-Based Reward Metric (MRM) as a game for agents where the history of past actions determines the reward agents get during their interaction.

    In the following paragraphs I formally describe MRM.

    2.1.2 Formal description of MRM

    The memory-based game proposed in [VSSM09] works as follows. The reward for an interaction depends on the current and previous choices, modelling the social pressure that arises from the history of interactions. Each agent x has a fixed-length FIFO memory $M_x$ recording the l most recent actions selected. At each time step two agents are selected and they choose one of the m available actions (m is assumed to be 2 in [VSSM09]).

    If an agent selects the majority action, as represented in the combination of the two memories, then its payoff is equal to the proportion of the majority actions that it was responsible for; otherwise it receives nothing (see the equation below). Specifically, when an agent x interacts with another agent y, the reward $r_x$ it receives for action $a_x$ is given by:

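    \[
    r_x =
    \begin{cases}
      \dfrac{M_x^{a^*}}{M_x^{a^*} + M_y^{a^*}} & \text{if } a_x = a^* \\
      0 & \text{otherwise}
    \end{cases}
    \]

    where $M_x^{a}$ is the number of times action $a$ appears in agent x's memory and $a^*$ is the majority action.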

    The following example describes how the reward metric works: assuming the only two possible actions are a and b, if two agents x and y have as memories respectively $M_x = (a, a, b, a)$ and $M_y = (a, b, a, b)$, the majority action is a with 5 occurrences in the joint memories. Let us assume the agents' actions are $a_x = b$ and $a_y = a$; then their rewards are respectively $r_x = 0$ (x did not choose the majority action a) and $r_y = 2/5$ (since y has a total of 2 occurrences of action a in its memory over a total of 5).
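    The reward computation in the example above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name mrm_rewards is hypothetical, the memories are taken as given (whether they already contain the current actions is not settled by the description), and ties for the majority action are broken by first occurrence, which is an assumption since the text does not say how ties are handled.

```python
from collections import Counter


def mrm_rewards(memory_x, memory_y, action_x, action_y):
    """Return (r_x, r_y) for one interaction under MRM.

    Hypothetical helper: memories are non-empty sequences of past actions.
    """
    counts_x = Counter(memory_x)
    counts_y = Counter(memory_y)
    joint = counts_x + counts_y

    # Majority action a* in the joint memories.
    # Assumption: on a tie, the action encountered first wins.
    majority, majority_total = joint.most_common(1)[0]

    def reward(own_counts, action):
        # Share of the majority occurrences this agent is responsible for,
        # or nothing if the agent did not play the majority action.
        if action != majority:
            return 0.0
        return own_counts[majority] / majority_total

    return reward(counts_x, action_x), reward(counts_y, action_y)


# Worked example from the text: M_x = (a,a,b,a), M_y = (a,b,a,b),
# a_x = b, a_y = a.
r_x, r_y = mrm_rewards("aaba", "abab", "b", "a")
print(r_x, r_y)  # 0.0 0.4
```

    Had x also chosen a, it would have received 3/5, since it holds 3 of the 5 occurrences of a in the joint memories.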

    The experiments in [VSSM09] and in [GA12] both study (among other parameters) how the memory size in MRM affects the emergence of conventions in this model. Another important factor studied in these works is how a spatially structured society of agents affects the emergence of conventions.

    2.2 Adaptation mechanisms

    In the previous subsection I described the game agents play at each time step. One question remains to be answered: how do agents decide which action to choose when they are facing an interaction?

    In [VSSM09] agents use a machine learning algorithm to estimate the worth of each action. The specific approach used is ε-greedy Q-learning [SB98]. The basic idea is the following. With a certain probability (ε) agents choose a random action (exploration). In all the other cases agents choose the action whose estimated payoff is the highest (exploitation). The expected reward of each action is calculated by means of the Q-update formula:

    \[
    Q_t(a) \leftarrow (1 - \alpha)\, Q_{t-1}(a) + \alpha \cdot \mathit{reward}
    \]

    where reward is the payoff received at the current interaction and α is a learning parameter.

    For the experiments in the next section the parameter ε will be fixed at 25%.
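    As an illustration of the adaptation mechanism, the following Python sketch implements the exploration/exploitation choice and the Q-update described in this subsection. The class name QAgent and its methods are hypothetical. The 25% figure is read here as the exploration probability, which is an assumption, and the learning rate of 0.5 is purely illustrative since its value is not given in the text.

```python
import random


class QAgent:
    """Stateless epsilon-greedy Q-learner (illustrative sketch)."""

    def __init__(self, actions, alpha=0.5, epsilon=0.25, rng=None):
        self.actions = list(actions)
        self.alpha = alpha        # learning parameter (assumed value)
        self.epsilon = epsilon    # exploration probability (assumed reading of 25%)
        self.rng = rng or random.Random()
        self.q = {a: 0.0 for a in self.actions}

    def choose(self):
        # Exploration: pick a random action with probability epsilon.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        # Exploitation: pick the action with the highest estimate.
        return max(self.actions, key=self.q.get)

    def update(self, action, reward):
        # Q_t(a) <- (1 - alpha) * Q_{t-1}(a) + alpha * reward
        self.q[action] = (1 - self.alpha) * self.q[action] + self.alpha * reward


# Exploration is switched off here so the outcome is deterministic.
agent = QAgent(["a", "b"], alpha=0.5, epsilon=0.0)
agent.update("a", 0.4)
print(agent.q["a"])    # 0.2
print(agent.choose())  # a
```

    Only the agent's own action and reward enter the update, which matches the assumption that agents cannot observe their peers' actions or rewards.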

    2.3 Spatial structure

    The work in [VSSM09] studies three different spatial structure models:

    - one-dimensional lattice, in which each agent is connected to a fixed number of other agents;

    - scale-free network, whose node degree distribution asymptotically follows a power law;

    - fully-connected stars network, where a relatively small number of hubs (or core nodes) forming a sub-clique are connected with a number of leaf nodes.

    Examples of these topologies can be found in Figure 1.

    Figure 1: Topologies in [VSSM09]

    Now that I have introduced the society where simulations will take place, I will describe the experiments and their results. Figure 2 shows a summary of the simulation process as described in this section.

    3 Results of experiments with memory and topology

    In this section I will describe the experimental setting in [VSSM09]. Many of the concepts described here will be part of the work in [GA12].


    Figure 2: Simulation process in [VSSM09]

    In order to measure the speed at which a society reaches a convention, Villatoro et al. control several parameters during the experiments:

    - recall that MRM is a reward based on past interactions, thus the authors vary the memory size agents have in interactions;

    - the performance of the emergence of conventions is observed in several spatial structures, such as the ones presented in Section 2.3;

    - the population size;

    - for a fixed population size, societal configurations may differ by the size of the neighborhood: the number of connections agents have among them. Later I will discuss how this property may vary only in certain types of networks;

    - the authors identify two different learning modalities two agents may employ during an interaction; they call them Mono and Multi Learning, respectively when only one of the agents is learning or both are.

    Varying the parameters above, the authors measure the convergence rate of the agents to a convention (in case they reach a convention).

    Some researchers consider a convention to have emerged if a certain percentage of agents employ the same action, for instance [Kit93, DPS03]. In the work I am describing, Villatoro et al. consider a convention as emerged iff all the agents are in the same state (viz. they choose the same action). The authors motivate this choice by observing that in cases with percentages lower than 100%, the conventions could fluctuate: for instance, even after 90% of a society had converged to a convention, it could still switch back to another.
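    The two emergence criteria contrasted above differ only in a threshold, as the following Python sketch shows. The function name convention_emerged is hypothetical, and the check only looks at a single snapshot of the agents' actions; it does not capture the further requirement that the situation be immutable.

```python
from collections import Counter


def convention_emerged(actions, threshold=1.0):
    """True if at least `threshold` of the agents play the same action.

    threshold=1.0 is the strict criterion (all agents agree);
    threshold=0.9 is the 90% criterion used in other work.
    """
    if not actions:
        return False
    _, top_count = Counter(actions).most_common(1)[0]
    return top_count / len(actions) >= threshold


snapshot = ["a"] * 9 + ["b"]
print(convention_emerged(snapshot, threshold=0.9))  # True
print(convention_emerged(snapshot))                 # False
```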

    Summarizing, the societal configurations in the experiments vary by the following parameters: memory size, population size, neighborhood size, network topology and learning modality. I shall now describe the experimental effects of varying these parameters on the performance of convention emergence. The results reported in the next sections have been averaged over 25 runs.

    3.1 Effects of neighborhood size

    The neighborhood size is the number of connections each agent has with its peers, i.e. the degree of each node of the network. It is expressed as the fraction of the (rest of the) population to which each agent is connected. Neighborhood size (NS) is a parameter that can be set in only one of the network topologies used in the experiments: the one-dimensional lattice. Here the case NS = 100% leads to a clique and the case $NS = \frac{2}{N-1}$ (where N is the population size) leads to a circular ring³. Notice that, unlike the one-dimensional lattice, in the scale-free network and the fully connected star topology the neighborhood size is a function of the population size and is thus predetermined.
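    To make the relation between neighborhood size and network shape concrete, here is a small Python sketch that builds a one-dimensional lattice and measures its diameter. It is an illustration under stated assumptions, not the construction used in [VSSM09]: the functions ring_lattice, neighborhood_size and diameter are hypothetical, and the lattice is assumed to be a ring in which each agent is linked to its h nearest agents on each side.

```python
from collections import deque


def ring_lattice(n, h):
    """Adjacency sets for n agents on a ring, each linked to its
    h nearest agents on each side (assumed construction)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, h + 1):
            j = (i + d) % n
            if j != i:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def neighborhood_size(adj):
    """NS as the fraction of the rest of the population an agent is linked to."""
    return len(adj[0]) / (len(adj) - 1)


def diameter(adj):
    """Longest shortest path, via BFS from every node (connected graph assumed)."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


n = 100
for h in (1, 5, 15, 50):
    g = ring_lattice(n, h)
    print(f"NS = {neighborhood_size(g):.2f}, diameter = {diameter(g)}")
```

    With h = 1 every agent has two neighbors, so NS = 2/(N-1) and the lattice is a circular ring; with h = N/2 every agent is linked to all others, so NS = 100% and the lattice is a clique.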

    When varying the neighborhood size, the other experimental parameters are:

    - population size of 100 or 200 agents;

    - memory size equal to 5.

    The results below refer mostly to the case where the multi-learning approach is used. The mono-learning approach shows similar results.

    The diameter of a network plays a role in the results presented in this section. The diameter is affected by the neighborhood size: it decreases geometrically when the neighborhood size increases, as shown in Figure 3.

    Figure 4 shows the convergence rates obtained with different neighborhood sizes and multi-learning. The authors state that results for mono-learning are very similar but they do not include them for reasons of space.

    The results in the figure show that increasing the neighborhood size decreases the convergence time, which stabilizes after a certain threshold value. After NS reaches 30%, the convergence speed is not affected significantly. The authors explain this by observing that when the neighborhood size increases, on average the diameter decreases, thus requiring fewer steps for agents in different parts of the network to communicate their decisions. Furthermore, as shown in Figure 3, the diameter does not decrease significantly once NS is above 30%.

    Dynamics of convergence are similar for both small and large populations (respectively 100 and 200 agents) once NS reaches about 20%. With a low neighborhood size, however, the convergence time in small populations is more than 40% higher than in large populations.

    3.2 Effects of memory size

    In MRM agents receive a reward that depends on their past actions.

    ³ The authors assume that the society is always a connected graph.


    Figure 3: Diameter relation with Neighborhood size in a One Dimensional Lattice with population = 100

    In this set of experiments the authors fix the population size to 100; for the one-dimensional lattice they choose a fully connected network.

    As Figure 5 shows, when memory size increases the convergence speed decreases.

    3.2.1 Interpretation of the impact of larger memory windows

    The results above can be explained by observing the inner mechanism of MRM. In MRM the reward each agent receives is given by a fraction whose denominator is the same for both players and is equal to the total number of actions in the joint memories of the agents, viz. twice the memory size. Thus the memory size determines how easily agents can be influenced in certain circumstances: the units of reinforcement are smaller with a larger memory size, which would produce similar reinforcements for both actions in the two agents and a slower convergence rate. Conversely, small memory windows would lead to larger differences in proportional rewards and consequently larger reinforcements.

    3.2.2 Mono vs Multi Learning when varying memory size

    Figure 5 shows that the Multi Learning approach takes less time than the Mono Learning one. The authors observe that in the former case both agents are receiving feedback from the interaction. This leads to an accelerated learning process.

    Figure 4: Results with different neighborhood sizes and multi-learning

    3.2.3 Different topologies

    From Figure 6 (where the Mono Learning approach is used) we observe that Scale Free and Fully Connected Star Networks have higher convergence times. In the next section I will describe how this is due to the formation of sub-conventions in the society and the time required to resolve them.

    3.3 Effects of Learning Approach

    In this section I describe some effects of the learning approach on the emergence of conventions. There are different observations for the case of One Dimensional Lattices on the one hand and Scale Free networks and Fully Connected Stars on the other.

    Figure 7 shows results for a population of 100 agents with different learning approaches. For small neighborhood sizes, the Multi Learning approach performs worse than the Mono Learning one. When NS is around 30% or higher, Multi Learning converges in fewer steps than Mono Learning; this is related to what was seen in Section 3.1, namely how the neighborhood size affects the diameter of the network. An explanation of these results follows.


    Figure 5: Results with different memory sizes for One Dimensional Lattice (Fully Connected Network)

    - If NS is low (high diameter). In this case agents will interact more often with their neighbors, thus producing subconventions in different areas of the network. In regions presenting overlapping conventions agents will interact with other agents presenting the same convention or a different one. If using Multi Learning, agents interacting with others with a different subconvention would require many interactions to change subconvention or to make other agents switch to theirs; this is because they will also be reinforced by agents with the same state as theirs. On the other hand, in the case of Mono Learning it is easier to break subconventions, as it is more likely that the flow of reinforcements goes towards only one subconvention in the overlapping region.

    - If NS is high (low diameter). In this case it is harder to sustain subconventions as interactions come more easily from any region of the network. Multi Learning allows better performance in this setting: agents will be learning from all the interactions they are involved in.

    3.4 Summary of results

    The last paragraphs have described the following results on the rate of convention emergence:

    - in general the type of network does affect the efficiency of convention emergence;

    - a higher neighborhood size improves the convergence rate by preventing the formation of local subconventions;

    - a larger memory size affects the rewards, requiring more iterations for convergence;

    - Multi Learning makes subconventions more likely to appear.

    Figure 6: Results with different memory sizes in different topologies with Mono Learning (y-axis is in log scale)

    4 Convention emergence with populations including non-learning agents

    In the previous section I described the work in [VSSM09], where a society of agents interacts by means of a memory-based game modelling social pressure. In that work, agents were all adapting using the same learning approach (Q-learning and identical learning modality). In this section I will describe experimental results in [GA12], which adopts heterogeneous populations in a variant of the model in [VSSM09]. In particular we will see populations with a fraction of non-learning agents that adopt a fixed strategy and where the number of possible actions m is higher than 2.

    4.1 Fixed Strategy agents

    The agent societies described in this section are different from those in [VSSM09]: not all agents adapt their behavior to that of their peers; in fact the populations present another type of agent next to the adaptive agents described before, called Fixed Strategy (FS) agents. When facing an interaction, these agents will always play a fixed specific action.

    In [GA12] the authors consider two different scenarios for FS agents:

    - all the FS agents use the same specific fixed strategy;

    - the strategies are uniformly distributed among the FS agents.

    These two approaches are motivated respectively by (i) improving convergence times and (ii) slowing convergence and maintaining diversity. In either case, when agents interact with FS agents, their best response will always be to play the fixed strategy in response. Notice that, as in the previous work, agents are anonymous, therefore interaction with FS agents cannot be recognized.

    4.2 Experimental setting and results

    The adaptive agents employ the same Q-learning algorithm and parameters seen in the previous section. However, the definition of convergence changes: in [GA12] the authors use Kittock's definition of convergence [Kit93], hence a convention is reached when 90% of the population uses the same action. The size of the population for which results are shown is N = 500 and the authors claim that similar results hold for N = 100 and N = 1000.

    The considered societies are spatially structured. There are 3 types of networks the authors consider: random graph, scale-free graph and small-world graph. Informally, in a random graph existing edges are chosen randomly, and in a small-world graph most nodes are not neighbors of one another, but most nodes can be reached from every other by a small number of hops or steps.

    In the experiments FS agents are placed according to different criteria. In the results shown in Figure 8 they were placed randomly. In the results shown in Figure 9 they were also placed according to degree and betweenness centrality. The latter is a measure of a node's centrality in a network and it is measured by the number of shortest paths from all vertices to all others that pass through that node.

    The results in Figure 8 show that the number of FS agents has a strong impact on the convergence rate in small-world networks. The figure shows that introducing FS agents also improves convergence speed in random and scale-free networks, though to a lesser degree.


    According to Figure 9, the strategy by which FS agents are placed on a network can have a significant impact. In random graphs, placing agents according to degree or betweenness centrality improved convergence speed roughly by a factor of 3 when the number of FS agents was higher than 10. Further experiments in [GA12] have shown that placing FS agents by betweenness centrality outperforms placing by degree when the number of FS agents is higher than 5, whereas in the included figure, which refers to scale-free networks, the difference in performance is negligible.
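    The two informed placement criteria can be illustrated with a short Python sketch. It is not the procedure of [GA12]: the helper names betweenness and place_fs_agents are hypothetical, and betweenness is computed with Brandes' algorithm in its usual form, where a pair of nodes joined by several shortest paths contributes only the fraction of those paths passing through the node.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality for an unweighted,
    undirected graph given as {node: iterable of neighbors}."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in bc.items()}


def place_fs_agents(adj, k, score):
    """Return the k nodes with the highest score (ties keep node order)."""
    return sorted(adj, key=lambda v: score[v], reverse=True)[:k]


# Toy network: a path 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
degree = {v: len(neighbors) for v, neighbors in path.items()}

print(place_fs_agents(path, 1, degree))             # [1]
print(place_fs_agents(path, 1, betweenness(path)))  # [2]
```

    On this toy path the three inner nodes share the same degree, so degree alone cannot single out the middle node, while betweenness does.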

    Two final results worth mentioning (not included in the figures) concern the impact of the number of actions m and the distribution of strategies among FS agents: increasing m consistently increased the convergence time across topologies; reaching a convention is also slower when actions are uniformly distributed among FS agents.

    5 Discussion and Related Work

    5.1 Summary of the presented work

    In the previous sections I have presented the work in [VSSM09] and [GA12]. Their research presents experimental studies of spatially structured agent societies where the authors model agents' interaction and adaptation in the following way:

    - agents interact by means of a game where each agent's current reward depends on its conformity to the history of actions of its peers;

    - agents adapt using Q-learning and they can only observe their own private moves and rewards.

    The experiments in [VSSM09] show how conventions can emerge under the assumptions above with different parameters such as memory size, network type, population and neighborhood size, and learning modality. Their results also show how these different parameters can affect the efficiency with which societies reach conventions.

    In [GA12], the authors propose to introduce fixed strategy agents in societies where interaction is defined by the Memory-based Reward Metric and study the effects of fixed agents, an arbitrary number of actions and network choice on the emergence of conventions.

    5.2 Related Work

    In this section I will discuss related work in the field of emergence of conventions and present a short, focused overview of research on spatial structure, contextualizing the role of the study on structured populations in [VSSM09] and [GA12].


    Emergence of conventions

    Shoham and Tennenholtz [ST97] were the first in multi-agent systems research to experiment with norm emergence. They viewed a norm as a social law which constrains actions or behaviours of the agents in the system. They used a mechanism called co-learning, which is a simple reinforcement learning mechanism based on a Highest Cumulative Reward (HCR) rule for updating an agent's strategy when playing a simple coordination game and a cooperation game (prisoner's dilemma). According to this rule, an agent chooses the strategy that has yielded the highest reward in the past m iterations. The history of the strategies chosen and the rewards for each strategy is stored in a memory of a certain size (which can be varied). They experimented with the rate at which the strategy is updated (after one iteration, two iterations, three iterations, etc.). When the frequency of update decreases, the efficiency of convention emergence decreases. They experimented with flushing the memory of the agent after a certain number of iterations and retaining only the strategy from the latest iteration. They found that when the interval between memory flushes decreases, the efficiency of the convention emergence decreases.
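    The HCR rule as summarized above can be sketched as follows in Python. This is only an illustration of the description given here, not the mechanism of [ST97] in full: the class name HCRAgent is hypothetical, ties are broken by the order in which strategies are listed (an assumption), and update delays and memory flushes are left out.

```python
from collections import deque


class HCRAgent:
    """Keeps the last m (strategy, reward) pairs and switches to the
    strategy with the highest cumulative reward among them."""

    def __init__(self, strategies, memory_size):
        self.strategies = list(strategies)
        self.history = deque(maxlen=memory_size)  # oldest entries drop out
        self.current = self.strategies[0]

    def record(self, strategy, reward):
        self.history.append((strategy, reward))

    def update_strategy(self):
        totals = {s: 0.0 for s in self.strategies}
        for strategy, reward in self.history:
            totals[strategy] += reward
        # Assumption: ties go to the strategy listed first.
        self.current = max(self.strategies, key=totals.get)
        return self.current


agent = HCRAgent(["left", "right"], memory_size=3)
for strategy, reward in [("left", 1), ("left", 1), ("right", 1)]:
    agent.record(strategy, reward)
print(agent.update_strategy())  # left

# Two more rewarded "right" plays push the "left" entries out of memory.
agent.record("right", 1)
agent.record("right", 1)
print(agent.update_strategy())  # right
```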

    Early research took into account the willingness to follow the crowd as a motivation for agents' adaptation. In [Eps01] Epstein adopts imitation mechanisms for studying norm emergence. Such a model is characterized by agents mimicking the behaviour of the majority of the agents in a given society. Epstein's main argument for an imitation mechanism is that individual thought (i.e. the amount of computing needed by an agent to infer what the norm is) is inversely related to the strength of a social norm. This implies that when a norm becomes entrenched the agent can follow it without much thought. Epstein has demonstrated this in the context of a driving scenario in which agents can observe each other's driving preference (left or right) based on a certain observation radius r. If the agent sees more agents driving on the right within the observation radius, it changes to the right. When a norm is established, the observation radius becomes one (i.e. the agent looks at one agent on its right and left to update its view about the norm).

    Sen and Airiau [SA07] proposed a mechanism for the emergence of norms in coordination problems and social dilemmas. Agents learn from repeated interactions with anonymous members of the society in a setting of incomplete but perfect information (they can see the other agents' moves but they are not aware of their payoffs). They experimented with different reinforcement learning algorithms, population sizes and numbers of actions, showing that agents learned norms based on private local interactions. They observed that when the population size is larger, the norm convergence is slower, and when the set of possible action states is larger, the convergence is slower. They also studied the influence of the dynamic addition of agents with a particular action state to a pool of existing agents, as well as norm emergence in isolated sub-populations. They show that their approach of individual learning is indeed a robust mechanism for evolving stable social norms even in problems such as social dilemmas.

    Other research has investigated norm emergence in structured societies. The agents in Delgado et al. [DPS03] interact with their neighbors in a scale-free graph. They play a coordination game in which the payoff is high if both agents chose the same action and low if they chose different actions. The authors formulate their action choice in terms of history. Each agent keeps a history of interactions and the corresponding rewards. The agents then utilize the history to select the action with the best payoff.

    Spatial structure and emergence of conventions

    The line of work presented here studies the role of memory (embedded in the reward metric) and spatial structure in convergence of conventions. Spatial structure is particularly important given that the role played by the topology of the underlying social network of a multiagent system in the emergence of norms and conventions has not been studied in depth. In the nineties, some researchers such as Kittock [Kit93], Shoham and Tennenholtz [ST97] and Cohen et al. [CRA99] pointed out that topology was a key factor for the efficiency of the emergence of conventions. However, it was not until some years later that the effect of topology attracted the interest of more researchers working on norms and conventions [AK01, DPS03]. The tipping point was the finding that components of real systems exhibit non-trivial interaction patterns, which could be modelled as a network. This field, known as complex networks [AB02, Ada99, WS98], introduced a new class of networks that had properties that were not present in the idealized networks (random, complete or regular networks) used to model the pattern of interaction between agents.

    5.3 Discussion

    In this subsection I conclude by identifying potential improvements to the work in [VSSM09].

    Impact of current actions

    One thing that is not clear from the description of the original model in [VSSM09] or the extension in [GA12] is whether the current reward an agent receives depends on the current action of the agent it was paired with. Informally, recall that the rewards are assigned as follows (see Section 2.1): the current action a_x of agent x is compared to the majority action a*. Only if a_x = a* does agent x get a positive reward. The majority action is defined as whichever action is played most by the two players combined [VSSM09] and it is obtained from the joint history of past actions of the two agents. Hence the reward for agent x depends on the comparison of a_x to a function (i.e. the majority action) of the joint histories. Notice that these joint histories may or may not include the other agent's current action (and a_x), depending on the answer to the following question: are the memories updated before or after the rewards are assigned? Unfortunately, the answer to this question seems unclear in both [VSSM09] and [GA12]. If the memories are updated after the rewards are assigned, then the peer's current action plays no role whatsoever in the agent's current reward: the reward depends on the comparison to the majority action, whose computation does not include the action of agent x's peer.

    Several models of interaction in MAS assume current actions play a direct role in the reward of each agent: static games are used in the literature for modeling interaction and they make such assumptions [ST97, You93]. If MRM had to be extended to encompass these situations, it may have to be tested whether the dynamics of convergence change depending on whether memories are updated before or after the assignment of rewards. Also, it is possible that updating memories after the rewards are assigned has some impact on the spreading of computation.
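    The ambiguity about the order of memory update and reward assignment can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the implementation of [VSSM09] or [GA12]: the payoffs (1 for matching the majority action, 0 otherwise), the tie-breaking rule, the memory size of 3 and the names majority_action, play_round and demo are all hypothetical.

```python
from collections import Counter, deque

def majority_action(memory_x, memory_y):
    """Most frequent action in the two memories combined (None if both are empty)."""
    counts = Counter(memory_x) + Counter(memory_y)
    if not counts:
        return None
    # Assumption: ties are resolved by Counter ordering; the source does not say how.
    return counts.most_common(1)[0][0]

def play_round(memory_x, memory_y, a_x, a_y, update_first):
    """Assign rewards for one pairing, updating memories before or after."""
    if update_first:
        memory_x.append(a_x)
        memory_y.append(a_y)
    a_star = majority_action(memory_x, memory_y)
    # Placeholder payoffs (assumption): 1 for matching the majority action, else 0.
    reward_x = 1 if a_x == a_star else 0
    reward_y = 1 if a_y == a_star else 0
    if not update_first:
        memory_x.append(a_x)
        memory_y.append(a_y)
    return reward_x, reward_y

def demo(update_first):
    # Both agents remember A, A, B (memory size 3) and now both play B.
    memory_x = deque(["A", "A", "B"], maxlen=3)
    memory_y = deque(["A", "A", "B"], maxlen=3)
    return play_round(memory_x, memory_y, "B", "B", update_first)

print(demo(update_first=False))  # (0, 0): majority is A, current actions ignored
print(demo(update_first=True))   # (1, 1): majority becomes B once memories include B
```

    With these toy values the two orderings assign different rewards to the same pair of actions, which is the reason the ordering would need to be fixed before comparing convergence dynamics.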

    The impact of memory and properties of the reward metric

    In the experiments in [VSSM09], larger memory sizes are shown to decrease the convergence rates (see Section 3.2). The authors' interpretation of such results (see subsubsection 3.2.1) is that this occurs because of the reward units appearing in the equation in 2.1.2. The work in [ST97] also experiments with different memory sizes. Their agents populate an unstructured society (the peers a specific agent may be paired with are selected uniformly) and interact by means of a cooperation game. Agents adapt by means of a simple reinforcement learning algorithm which uses the most rewarding action in the agent's history. In this work, Shoham and Tennenholtz find that the ratio of agents agreeing on a convention decreases when agents use larger history sizes. They provide a different rationale from that in [VSSM09], explaining that large histories make agents use unreliable old information: old memories may not provide a reliable portrait of the present. I propose that this interpretation may play a role in this setting too. To test such a hypothesis, one possibility would be to experiment with memory restarts: with a certain frequency, agents' memories would be reset (but not their Q-functions). This would change the reward functions while keeping the strategic attitudes agents have developed until that moment. If the relation between large memory and poor convergence rates still held, this may support the interpretation by Villatoro et al. and shed more light on specific features of MRM.
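    The memory-restart experiment proposed above could be sketched as follows. This is a hypothetical outline rather than code from any of the cited works: the Agent class, the stateless Q-value update with learning rate alpha, and the parameter restart_period are assumptions introduced only to show that memories are cleared while Q-values are left untouched.

```python
from collections import deque

class Agent:
    """Toy agent with a Q-value per action and a bounded memory of past actions."""

    def __init__(self, actions, memory_size):
        self.q = {a: 0.0 for a in actions}
        self.memory = deque(maxlen=memory_size)

    def learn(self, action, reward, alpha=0.1):
        # Assumption: a simple stateless Q-value update; the source only
        # states that agents keep Q-functions.
        self.q[action] += alpha * (reward - self.q[action])

    def remember(self, action):
        self.memory.append(action)

def maybe_restart_memories(agents, round_index, restart_period):
    """Clear every agent's memory each `restart_period` rounds; keep Q-values."""
    due = restart_period > 0 and round_index > 0 and round_index % restart_period == 0
    if due:
        for agent in agents:
            agent.memory.clear()
    return due

agent = Agent(["A", "B"], memory_size=3)
agent.remember("A")
agent.learn("A", reward=1.0)
maybe_restart_memories([agent], round_index=10, restart_period=10)
print(len(agent.memory), agent.q["A"])  # memory is empty, Q-value is preserved
```

    In an actual experiment, restart_period would be the independent variable, and convergence rates would be compared across memory sizes with and without restarts.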

    In general, the reward metric in the line of work presented here, MRM, has several novel features compared to the main literature in the field of emergence of conventions. MRM uses conformance to a peer's history to measure reward, which is a new paradigm for modelling interaction compared to those commonly used in the literature, where interaction among agents has been explored by stage games [ST97, You93, SA07], games with more than one stage including systems of rewards or punishments [Axe86, PC09] or simplified stereotypical MAS settings such as food hunting [WW95, CC95]. Thus some of MRM's features may deserve further investigation in themselves. At the moment it is not clear whether the properties of convergence depend on the specific learning algorithm chosen, or how to connect the results in these works to others in the literature. This investigation of MRM may be experimental, such as the one proposed above about memory size, or analytical, where one possibility is, for example, the formal modelling of the dynamics of the system in one of the simplest structural scenarios, such as one-dimensional lattices. Past work on the emergence of conventions has made use of analytical approaches [ST97, VHBR97].

    References

    [AB02] Reka Albert and Albert-Laszlo Barabasi. Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1):47, 2002.

    [Ada99] Lada A Adamic. The small world web. In Research and Advanced Technology for Digital Libraries, pages 443-452. Springer, 1999.

    [AK01] Guillermo Abramson and Marcelo Kuperman. Social games in a social network. Physical Review E, 63(3):030901, 2001.

    [Axe86] Robert Axelrod. An evolutionary approach to norms. American Political Science Review, 80(04):1095-1111, 1986.

    [CC95] Rosaria Conte and Cristiano Castelfranchi. Understanding the functions of norms in social groups through simulation. Artificial societies: The computer simulation of social life, 1995.

    [CRA99] Michael D Cohen, Rick L Riolo, and Robert Axelrod. The emergence of social organization in the prisoner's dilemma: How context-preservation and other factors promote cooperation. Technical report, 1999.

    [DPS03] Jordi Delgado, Josep M Pujol, and Ramon Sanguesa. Emergence of coordination in scale-free networks. Web Intelligence and Agent Systems, 1(2):131-138, 2003.

    [Eps01] Joshua M Epstein. Learning to be thoughtless: Social norms and individual computation. Computational Economics, 18(1):9-24, 2001.

    [GA12] Nathan Griffiths and Sarabjot Singh Anand. The impact of social placement of non-learning agents on convention emergence. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems-Volume 3. International Foundation for Autonomous Agents and Multiagent Systems, 2012.

    [Kit93] James E Kittock. Emergent conventions and the structure of multi-agent systems. L. Nadel and D. Stein, eds, 1993.

    [PC09] Michael J Prietula and Daniel Conway. The evolution of metanorms: quis custodiet ipsos custodes? Computational and Mathematical Organization Theory, 15(3):147-168, 2009.


    [SA07] S. Sen and S. Airiau. Emergence of norms through social learning. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, pages 1507-1512. Morgan Kaufmann Publishers Inc., 2007.

    [SB98] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction, volume 1. Cambridge Univ Press, 1998.

    [SC11] Bastin Tony Roy Savarimuthu and Stephen Cranefield. Norm creation, spreading and emergence: A survey of simulation models of norms in multi-agent systems. Multiagent and Grid Systems, 7(1):21-54, 2011.

    [ST97] Yoav Shoham and Moshe Tennenholtz. On the emergence of social conventions: modeling, analysis, and simulations. Artificial Intelligence, 94(1):139-166, 1997.

    [VHBR97] John B Van Huyck, Raymond C Battalio, and Frederick W Rankin. On the origin of conventions: evidence from coordination games. The Economic Journal, 107(442):576-596, 1997.

    [VSSM09] D. Villatoro, S. Sen, and J. Sabater-Mir. Topology and memory effect on convention emergence. In Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology-Volume 02, pages 233-240. IEEE Computer Society, 2009.

    [WS98] Duncan J Watts and Steven H Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440-442, 1998.

    [WW95] Adam Walker and Michael Wooldridge. Understanding the emergence of conventions in multi-agent systems. In ICMAS, volume 95, pages 384-389, 1995.

    [You93] H.P. Young. The evolution of conventions. Econometrica, 61(1):57-84, 1993.


    Figure 7: Different Learning Approaches in One Dimensional Lattices with Different Memory Sizes (y-axis is in log scale)

    Figure 8: Convergence rate with random placement of FS agents (m = 2)


    Figure 9: Convergence rate with different placements of FS agents (m = 2) in a random graph
