USING FREE ASSOCIATION NETWORKS TO EXTRACT CHARACTERISTIC PATTERNS OF AFFECT DYNAMICS

SUPPLEMENTARY MATERIALS

Yaniv Dover
Jerusalem Business School & The Federmann Center for the Study of Rationality,
Hebrew University, Jerusalem, Israel.

[email protected]

Zohar Moore
Jerusalem Business School, Hebrew University, Jerusalem, Israel.

[email protected]

January 8, 2020

S1 Permuted Association Network

S1.1 Permuting the network structure

We wish to test (1) whether the convergence of affect depends on the network structure, and (2) whether convergence patterns depend on the specific relationship between words and affect. We test both assertions using random permutations. First, we investigate convergence patterns using simulated random walks on a version of the association network whose links between words have been randomly rearranged. As in section 3(b), we use weighted random walk simulations to study the convergence of valence and arousal as a function of the affect value of a seed word and, as in the analysis of the non-permuted association network, we study convergence across 20 random walk steps. Here, however, in order to test the role of the network structure, the network links are rearranged at random for each seed word. Figure S1 shows the resulting convergence patterns for valence and Figure S2 shows those for arousal.

Both figures show a straightforward pattern: convergence towards the steady state value is instantaneous, i.e., it occurs within one step for both valence and arousal. This steady state value, perhaps not surprisingly, is equal to the global affect mean of all words in the network. The implication is that the affect convergence pattern observed in section 3(b) depends completely on the network structure.
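The permutation test above can be sketched in a few lines. This is only an illustration under stated assumptions: the toy network, the names `permute_links` and `mean_affect_trajectory`, and the choice to shuffle each word's outgoing weights over random targets are all hypothetical, and the sketch tracks the exact expected affect of the walk instead of simulating individual walks. It also permutes the network once, whereas the analysis above re-permutes it for each seed word.

```python
import numpy as np

def permute_links(P, rng):
    """Shuffle each row's targets: outgoing weights are kept, destinations are randomized."""
    P_perm = np.empty_like(P)
    for i, row in enumerate(P):
        P_perm[i] = rng.permutation(row)
    return P_perm

def mean_affect_trajectory(P, affect, seed, steps):
    """Expected affect at each step of a weighted random walk started at word `seed`."""
    dist = np.zeros(P.shape[0])
    dist[seed] = 1.0
    out = []
    for _ in range(steps):
        dist = dist @ P                  # distribution over words after one more step
        out.append(float(dist @ affect))
    return out

# Toy data (hypothetical): 200 words, sparse random links, valence on a 1-9 scale.
rng = np.random.default_rng(0)
n = 200
valence = rng.uniform(1, 9, size=n)
W = rng.random((n, n)) * (rng.random((n, n)) < 0.05)
np.fill_diagonal(W, 0.0)
W[np.arange(n), (np.arange(n) + 1) % n] += 1e-3   # guarantee one outgoing link per word
P = W / W.sum(axis=1, keepdims=True)              # row-stochastic transition matrix

P_perm = permute_links(P, rng)
traj = mean_affect_trajectory(P_perm, valence, seed=0, steps=20)
```

Comparing `traj` with `valence.mean()` gives a quick check of how fast a walk on the permuted network approaches the global mean.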

S1.2 Permuting word affect

Figures S3 and S4 show the convergence patterns of valence and arousal, respectively, for the case in which word affect was permuted. In this scenario, unlike the previous one (S1.1), we used the non-permuted association network but randomly permuted the affect values assigned to the words. Practically speaking, we randomly reassigned the valence and arousal values such that, with high probability, the affect values associated with each word were not the values attributed to that word in the original data.

The figures show, as in the previous scenario (S1.1), that without the correct and specific attribution of affect to words in the association network, there is no convergence pattern. We therefore conclude that both the specific association structure and the specific attribution of affect to words underlie the unique convergence patterns observed in section 3(b) of the paper.
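A minimal sketch of the affect permutation follows. The function name `permute_affect` and the toy values are hypothetical, and the sketch assumes that each word receives the (valence, arousal) pair of another randomly chosen word; whether the two dimensions were permuted jointly or independently is not stated above.

```python
import numpy as np

def permute_affect(valence, arousal, rng):
    """Reassign (valence, arousal) pairs to words at random; the network is left untouched."""
    idx = rng.permutation(len(valence))
    return valence[idx], arousal[idx]

# Toy data (hypothetical): 1000 words with affect on a 1-9 scale.
rng = np.random.default_rng(0)
valence = rng.uniform(1, 9, size=1000)
arousal = rng.uniform(1, 9, size=1000)
v_perm, a_perm = permute_affect(valence, arousal, rng)
```

The permuted arrays can then be fed to the same random walk analysis that is run on the original affect values.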

Figure S1: A plot of the characteristic trajectories of valence of weighted random walk processes of 20 steps over the randomly permuted association network as a function of their initial valence value. Seed words were divided into nine bins of valence (see legend) and then, per each step (x-axis), the mean valence was plotted. Each colored line represents one bin, i.e., processes beginning with words within the bin. The black dashed line denotes the steady state value (i.e., the value that all processes reached after 100 steps). The gray line denotes the mean valence value across all words.

S2 Testing the Association Network Approach using semantic data: The example of Moby Thesaurus II

S2.1 Introduction and data

In order to test the robustness and usefulness of the association network approach for investigating affect dynamics, we performed the main analyses of section 3 in the paper on a text-semantic data set. The purpose of this section is to compare the weighted random walk network analysis of semantic network data to that of the association network data described in the paper. After comparing the two data sets, we conclude with some conjectures that may explain the differences between them.

We chose a well-known, public-domain semantic network data set: Moby Thesaurus II, distributed by Project Gutenberg. The specific data set was downloaded from the web site: https://github.com/statico/dotfiles/tree/master/.vim/mthes10.

The version of the Moby Thesaurus II that we used includes 25,194 words that are connected to at least one other word, i.e., have at least one synonym in the data. In this data set, a total of 1,372,929 word-to-word links exist. Unlike the word association data, this data set contains no transition probability (FSG) information. This stems from the fact that the relation between words here is binary, i.e., a word either is a synonym of another word or is not. For the sake of simplicity, we assume uniform transition probabilities within each word, i.e., if a focal word has 10 connected words, then the probability of transition from that focal word to each of the connected words is 1/10.
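The uniform-transition assumption can be illustrated with a short sketch. The function name `uniform_transitions` and the toy synonym lists are hypothetical and are not taken from the thesaurus file; removing duplicates and self-links is also an assumption of this sketch.

```python
def uniform_transitions(synonyms):
    """Map each focal word to {linked word: 1/degree}, ignoring duplicates and self-links."""
    probs = {}
    for word, linked in synonyms.items():
        unique = sorted(set(linked) - {word})
        if unique:
            probs[word] = {w: 1.0 / len(unique) for w in unique}
    return probs

# Toy synonym lists (hypothetical).
toy = {
    "calm": ["placid", "serene", "quiet", "still"],
    "quiet": ["calm", "still"],
}
P = uniform_transitions(toy)
```

With four linked words, each transition out of "calm" receives probability 1/4, mirroring the 1/10 example in the text.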

S2.2 Affect changes in single links in the semantic network data

The analyses corresponding to Figures 1 and 2 in section 3(a) are shown in Figures S5 and S6, respectively. These figures illustrate the distribution of the probability of affect change within a single-link jump. The striking feature in both figures is the strong asymmetry. In Figure S5, there is a marked asymmetry towards a negative change of valence between words. Essentially, most word-to-word jumps are those which reduce valence positivity. While in the association network data (Figure 1), small jumps and high jumps are most likely for both negative and positive valence, here the situation is different. Negative jumps of all sizes are almost equally likely (an almost flat distribution line for negative valence changes). For positive jumps, the larger the jump, the less likely it is. The situation for arousal is the opposite: Figure S6 shows that positive arousal jumps are much more common than negative ones. Overall, there appears to be a marked convergence of valence towards relatively low values and a marked convergence of arousal towards higher arousal values. This is consistent with the findings of the convergence pattern study described in the next subsection.

Figure S2: A plot of the characteristic trajectories of arousal of weighted random walk processes of 20 steps over the randomly permuted association network as a function of their initial arousal value. Seed words were divided into nine bins of arousal (see legend) and then, per each step (x-axis), the mean arousal was plotted. Each colored line represents one bin, i.e., processes beginning with words within the bin. The black dashed line denotes the steady state value (i.e., the value that all processes reached after 100 steps). The gray line denotes the mean arousal value across all words.
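One way to tabulate single-link affect changes is sketched below. The estimator used for the figures is not specified in this supplement, so the binning rule, the bin width, and the name `affect_change_profile` are assumptions, and the three-word network is a hypothetical toy.

```python
import numpy as np

def affect_change_profile(edges, probs, affect, bin_width=0.5):
    """Mean transition probability per bin of affect change (target minus source)."""
    src, dst = edges[:, 0], edges[:, 1]
    delta = affect[dst] - affect[src]
    bins = np.floor(delta / bin_width).astype(int)
    profile = {}
    for b in np.unique(bins):
        mask = bins == b
        # Key each bin by its lower edge on the affect-change axis.
        profile[float(b * bin_width)] = float(probs[mask].mean())
    return delta, profile

# Toy data (hypothetical): three words, four directed links with transition probabilities.
affect = np.array([5.0, 3.0, 6.0])
edges = np.array([[0, 1], [0, 2], [1, 2], [2, 0]])
probs = np.array([0.5, 0.5, 1.0, 1.0])

delta, profile = affect_change_profile(edges, probs, affect)
share_negative = float((delta < 0).mean())   # a crude asymmetry indicator
```

A share of negative jumps well above one half would correspond to the kind of asymmetry described for valence.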

S2.3 Affect convergence patterns in the semantic network data

Figures S7 and S8 show that convergence towards the steady state values (marked by the dashed black lines) occurs quickly relative to the case of the association network.

Convergence to the steady state values of arousal and valence occurs at around steps 5 to 10 for the Moby Thesaurus II data, compared with around 10–20 steps for the association network data (section 3(b)). Correspondingly, the affect vector field (Figure S9) also appears to be a smooth pattern pointing towards a steady state point in the valence–arousal space. This pattern demonstrates quick convergence from any part of the valence–arousal space in the network to the steady state point. In what follows, we discuss the main differences between the convergence patterns of the thesaurus data and the association network data.
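The binned convergence trajectories can be approximated as follows. This sketch swaps simulated walks for their exact expectation (repeated multiplication by the transition matrix), which is an assumption about how to reproduce the curves, not a description of the original procedure. The name `binned_trajectories`, the equal-width binning, and the dense random toy network are hypothetical.

```python
import numpy as np

def binned_trajectories(P, affect, n_bins=9, steps=20):
    """Mean expected affect per step for seed words grouped by their initial affect."""
    edges = np.linspace(affect.min(), affect.max(), n_bins + 1)
    bin_of = np.clip(np.digitize(affect, edges) - 1, 0, n_bins - 1)
    expected = affect.copy()
    rows = [expected]
    for _ in range(steps):
        expected = P @ expected          # one more walk step, for every seed word at once
        rows.append(expected)
    rows = np.array(rows)                # shape: (steps + 1, n_words)
    return np.array([rows[:, bin_of == b].mean(axis=1) for b in range(n_bins)])

# Toy data (hypothetical): dense random network over 300 words.
rng = np.random.default_rng(1)
n = 300
valence = rng.uniform(1, 9, size=n)
W = rng.random((n, n))
P = W / W.sum(axis=1, keepdims=True)

traj = binned_trajectories(P, valence)   # shape: (9 bins, 21 time points)
# Steady state taken as the value reached after 100 steps, as in the figure captions.
steady = float((np.linalg.matrix_power(P, 100) @ valence).mean())
```

Plotting each row of `traj` against the step index, with `steady` as a horizontal line, gives a figure of the same type as Figures S7 and S8.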

S2.4 Discussion of the comparison between the association network and the semantic network

Given the above observations, the affect dynamics of the two data sets show major similarities but also some differences. In both cases there is a global convergence point, with generally similar values. While in the association data affect converges to a valence of 5.7 and an arousal of 4.3, in the semantic network (Moby Thesaurus) affect converges to a valence of 5.2 and an arousal of 4.3. The affect steady state point is therefore quite similar in both data sets. This, perhaps, is a clue that this neighborhood of valence-arousal may be the "neutral zone" towards which affect processes converge universally. Of course, a real claim of universality should be tested more extensively, using more data sets of various origins.

Figure S3: A plot of the characteristic trajectories of valence of weighted random walk processes of 20 steps over the association network as a function of their initial valence value, calculated for the randomly permuted affect-word relations. Seed words were divided into nine bins of valence (see legend) and then, per each step (x-axis), the mean valence was plotted. Each colored line represents one bin, i.e., processes beginning with words within the bin. The black dashed line denotes the steady state value (i.e., the value that all processes reached after 100 steps). The gray line denotes the mean valence value across all words.

The main difference between the convergence patterns is that the semantic data exhibit less affect dynamic structure around the steady state point. Essentially, we observe a basin of convergence towards that point in the thesaurus semantic data. In the association data, we observe a more intricate structure and more affect "memory" in how affect changes over time in that network. The reasons for this difference are out of scope for this paper. We may hypothesize, however, that a possible cause is that the association network data characterize thought processes more directly, as reported by a specific group of individuals. In contrast, the thesaurus data lack transition weights on the word-to-word associations (no transition probabilities), and they are also the result of processes of language construction and layering that span years and cultures. Therefore, the mechanisms behind the generation of the thesaurus semantic data may include several processes that average each other out, such that the more local network structure disappears. Still, we leave a more concrete explanation to future research.

S2.5 Word network environments for the semantic data set

Similar to the word network environment analysis in the paper (3(c.1)), we compare the top and bottom affect words to the top/bottom 10-step network environment words.

Table S2 lists the five words for which V_{i,e=10} and A_{i,e=10} are ranked as highest and lowest, and Table S1 lists the corresponding single-word values. Table S1 shows a pattern that is similar to Table 2 in the paper. The top valence words are words expressing a joyful experience, while the bottom valence words are mostly associated with extreme violence (and a major illness). For arousal, again similar to the results in Table 2 in the paper, the highest arousal is related to either sex or violence, while the lowest arousal is associated with calm concepts.

When it comes to 10-step word network environments in the semantic data (Table S2), the highest (network environment) valence is still associated with joyful experiences, but in this case mostly with musical instruments or satisfaction. The lowest valence is associated with disease conditions. Interestingly, most of these diseases are not necessarily fatal, but are associated with suffering and irritation which, in some cases, could be long term (vs. the generally shorter-term violent events depicted in Table S1). In terms of extreme arousal network environments, the most agitating concepts relate to surgery, specifically surgical removal of body parts. The least agitated network environment exhibits concepts related to travel by boat.

Figure S4: A plot of the characteristic trajectories of arousal of weighted random walk processes of 20 steps over the association network as a function of their initial arousal value, calculated for the randomly permuted affect-word relations. Seed words were divided into nine bins of arousal (see legend) and then, per each step (x-axis), the mean arousal was plotted. Each colored line represents one bin, i.e., processes beginning with words within the bin. The black dashed line denotes the steady state value (i.e., the value that all processes reached after 100 steps). The gray line denotes the mean arousal value across all words.

Figure S5: A plot of the mean probability of valence changes between linked words in the association network, as a function of the magnitude of valence change (black curve). For clarity of visualization, the black curve was smoothed by 0.6 valence points (x-axis). The red bars denote the standard error around the mean. The dashed gray line is the mirror image of the black curve, provided to visualize the horizontal asymmetry.

Figure S6: A plot of the mean probability of arousal changes between linked words in the association network, as a function of the magnitude of arousal change (black curve). For clarity of visualization, the black curve was smoothed by 0.6 valence points (x-axis). The red bars denote the standard error around the mean. The dashed gray line is the mirror image of the black curve, provided to visualize the horizontal asymmetry.

Figure S7: A plot of the characteristic trajectories of valence of weighted random walk processes of 20 steps over the association network as a function of their initial valence value. Seed words were divided into nine bins of valence (see legend) and then, per each step (x-axis), the mean valence was plotted. Each colored line represents one bin, i.e., processes beginning with words within the bin. The black dashed line denotes the steady state value (i.e., the value that all processes reached after 100 steps). The gray line denotes the mean valence value across all words.

Figure S8: A plot of the characteristic trajectories of arousal of weighted random walk processes of 20 steps over the association network as a function of their initial arousal value. Seed words were divided into nine bins of arousal (see legend) and then, per each step (x-axis), the mean arousal was plotted. Each colored line represents one bin, i.e., processes beginning with words within the bin. The black dashed line denotes the steady state value (i.e., the value that all processes reached after 100 steps). The gray line denotes the mean valence value across all words.

Figure S9: A representation of the two-dimensional (valence–arousal) vector field for the Moby Thesaurus II data set. The valence–arousal space was divided into square bins with sides of 0.25. Each arrow in the figure corresponds to the average change of valence and arousal in that bin. Per each bin, all the outgoing links of words with valence and arousal values corresponding to that bin were used to calculate the average. To improve the visualization, color codes the magnitude of the average change per bin: the closer the color to red, the higher the magnitude of the average jump within a bin.
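The construction described in the caption of Figure S9 can be sketched as below. The name `affect_vector_field` and the toy words are hypothetical, and the sketch assumes an unweighted average over outgoing links; any weighting by transition probability would be a further assumption.

```python
import numpy as np
from collections import defaultdict

def affect_vector_field(edges, valence, arousal, bin_size=0.25):
    """Average (valence change, arousal change) over outgoing links, per bin of the source word."""
    sums = defaultdict(lambda: np.zeros(2))
    counts = defaultdict(int)
    for src, dst in edges:
        # Bin index of the source word in the valence-arousal plane.
        key = (int(valence[src] // bin_size), int(arousal[src] // bin_size))
        sums[key] += (valence[dst] - valence[src], arousal[dst] - arousal[src])
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}

# Toy data (hypothetical): words 0 and 1 fall into the same 0.25 x 0.25 bin.
valence = np.array([5.0, 5.1, 2.0])
arousal = np.array([4.0, 4.1, 6.0])
edges = [(0, 2), (1, 2), (2, 0)]

field = affect_vector_field(edges, valence, arousal)
```

Each entry of `field` is one arrow of the vector field; its norm is the quantity that the color scale in the figure encodes.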

The analysis in this section shows, similar to the network environment analysis in the paper (section 3(c)), that network environment affect is an interesting construct. Looking at the network environment of a word reveals different aspects of the experience that may be associated with that word.

Table S1: Top and bottom five words of valence and arousal values for single words

Valence: top and bottom five-word list

Word (top)      Mean    Word (bottom)   Mean
Vacation        8.53    Racism          1.48
Happiness       8.48    Murder          1.48
Happy           8.47    Leukemia        1.47
Christmas       8.37    Torture         1.40
Enjoyment       8.37    Rapist          1.30

Arousal: top and bottom five-word list

Word (top)      Mean    Word (bottom)   Mean
Insanity        7.79    Soothing        1.91
Gun             7.74    Librarian       1.75
Sex             7.60    Dull            1.67
Rampage         7.57    Calm            1.67
Lover           7.45    Grain           1.60

Table S2: Top and bottom words of valence and arousal 10-step environment values

Valence 10-step environment V_{i,e=10}: top and bottom five-word list

Word (top)      Mean    Word (bottom)   Mean
Ukulele         6.17    Appendicitis    3.23
Samisen         6.12    Cholera         3.25
Enjoyable       6.12    Dermatitis      3.28
Sitar           6.11    Hookworm        3.29
Satisfying      6.10    Gout            3.30

Arousal 10-step environment A_{i,e=10}: top and bottom five-word list

Word (top)      Mean    Word (bottom)   Mean
Hysterectomy    5.73    Ferryman        3.52
Mastectomy      5.72    Gondolier       3.56
Vasectomy       5.72    Sculler         3.58
Appendectomy    5.72    Boatman         3.60
Tonsillectomy   5.71    Yachtsman       3.62

S3 Test runs of the mood-dependent random walk model

To get a better understanding of the η and λ parameters in the mood-dependent random walk model, we perform a set of runs over the following values: λ = 0.1, 1, 3, 5 and η = 0.1, 0.5, 1. The values were chosen such that the effect of changing them on model outcomes is noticeable. The value range of η was chosen as a contrast to the constant value chosen for δ, which was 1. In general, this model deserves a more detailed study, which is out of scope here and will be discussed in future work.

In the mood-dependent random walk model, η is interpreted as the extent to which exposure to the affect of a certain word changes the affect state of the individual (Equation 3.3). For example, a scenario in which η = 0 is one in which the valence state (S_v(t)) does not change over time. The term which η multiplies, (S_v(t − 1) − v_k), is the difference between the previous valence state and the valence of the word to which the individual is exposed. In other words, the higher the η, the stronger the effect of that word's valence on the individual's valence state.
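Equation 3.3 is not reproduced in this supplement, so the following is only a sketch of one update rule that is consistent with the description above: the state moves toward the word's valence by a fraction η of their difference. The functional form, the omission of δ, and the name `update_valence_state` are all assumptions.

```python
def update_valence_state(s_prev, v_word, eta):
    """Assumed form: shift the previous state toward the word's valence by a fraction eta."""
    return s_prev - eta * (s_prev - v_word)

# Toy values (hypothetical): a low-valence state exposed to a high-valence word.
unchanged = update_valence_state(2.0, 8.0, eta=0.0)   # eta = 0: the state does not change
halfway = update_valence_state(2.0, 8.0, eta=0.5)
replaced = update_valence_state(2.0, 8.0, eta=1.0)    # eta = 1: the state adopts the word's valence
```

Under this assumed form, η = 0 reproduces the no-change scenario mentioned in the text, and larger η gives the word's valence more weight.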

The interpretation of λ (Equation 3.4) is related to the actual transitions between words in the association network. For λ = 0, the transitions between associated words are the original transitions reported in the data, i.e., the pairwise transition probabilities measured for individuals. As λ increases, transitions to words whose valence is far from the valence state (S_v(t)) become less probable. For example, if S_v(t) = 2, i.e., a person is in a low-valence state, the transition to any word with a valence above or below 2 will be less likely, because the factor e^{−λ·(S_v(t) − v_j)^2} decreases rapidly the farther the valence v_j of word j is from 2.
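The effect of λ can be illustrated numerically. The exponential factor is taken from the text; multiplying it into the original transition probabilities and renormalizing so that they sum to one is an assumption of this sketch, as are the name `mood_weighted_probs` and the toy values.

```python
import numpy as np

def mood_weighted_probs(base_probs, target_valence, state_valence, lam):
    """Reweight transition probabilities by exp(-lam * (S_v - v_j)^2), then renormalize (assumed)."""
    weights = base_probs * np.exp(-lam * (state_valence - target_valence) ** 2)
    return weights / weights.sum()

# Toy data (hypothetical): three candidate words reachable from the current word.
base = np.array([0.2, 0.3, 0.5])
v_targets = np.array([2.0, 5.0, 8.0])

p_original = mood_weighted_probs(base, v_targets, state_valence=2.0, lam=0.0)
p_mood = mood_weighted_probs(base, v_targets, state_valence=2.0, lam=5.0)
```

With λ = 0 the original probabilities are recovered, while with λ = 5 and a state of 2 almost all of the probability mass moves to the word whose valence equals 2.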

To show how varying both parameters affects affect convergence in the model, Figure S10 illustrates the 12 runs for the different combinations of the parameters. As in all the valence–arousal vector field illustrations, the longer and redder the arrows, the stronger the valence–arousal flows. The values of η and λ corresponding to each cell in the figure are denoted on the boundaries of the figure's table. The figure shows clearly that the higher the λ, the weaker the flow in general. Unlike the bottom left cell (λ = 0.1, η = 0.1), which exhibits strong flows towards the steady state point, the top left cell (λ = 5, η = 0.1) exhibits weak, fragmented flows. Increasing η also affects the flows: the higher the η (from left to right, e.g., in the top row), the stronger the general flows and the lower the fragmentation. This effect of η is more subtle (it can be noticed in the slight increase of redder colors as η increases). Note also that the additional steady state equilibrium point (at higher arousal but low valence) appears to diminish for low values of λ (e.g., the bottom-row panels). In other words, it seems that the emergence of this second equilibrium point stems from the mood-dependence aspect of the random walk.

Figure S10: A tabular figure of the valence–arousal vector field of the association network for 12 combinations of η and λ. The values of the parameters used for each cell are those noted in the labels of the corresponding row and column. The normalization of the vectors illustrated in the field is the same across all panels to enable comparison between them.

We note that these runs are just a "peek" into the complexities of the mood-dependent random walk model. Future research should investigate more deeply the specific role of each parameter and the general traits of the model.
