[ieee 2013 international conference on social computing (socialcom) - alexandria, va, usa...

6
Partitioning and Scaling Signed Bipartite Graphs for Polarized Political Blogosphere Sedat Gokalp Computer Science Arizona State University [email protected] M’hamed Temkit Mathematical Statistics Arizona State University [email protected] Hasan Davulcu Computer Science Arizona State University [email protected] I. Hakki Toroslu Computer Engineering Middle East Technical University [email protected] Abstract—Blogosphere plays an increasingly important role as a forum for public debate. In this paper, given a mixed set of blogs debating a set of political issues from opposing camps, we use signed bipartite graphs for modeling debates, and we propose an algorithm for partitioning both the blogs, and the issues (i.e. topics, leaders, etc.) comprising the debate into binary opposing camps. Simultaneously, our algorithm scales both the blogs and the underlying issues on a univariate scale. Using this scale, a researcher can identify moderate and extreme blogs within each camp, and polarizing vs. unifying issues. Through performance evaluations we show that our proposed algorithm provides an effective solution to the problem, and performs much better than existing baseline algorithms adapted to solve this new problem. In our experiments, we used both real data from political blogosphere and US Congress records, as well as synthetic data which were obtained by varying polarization and degree distribution of the vertices of the graph to show the robustness of our algorithm. I. I NTRODUCTION Blogosphere plays an increasingly important role [1] as a forum of public debate, with knock-on consequences for the media, politics, and policy. Hotly debated issues span all spheres of human activity; from liberal vs. conservative politics, to extremist vs. counter-extremist religious debate, to climate change debate in scientific community, to globalization debate in economics, and to nuclear disarmament debate in security. There are many applications [2], [3], [4], [5], [6] for recognizing politically-oriented sentiment in texts. Previous work [7] studied linking patterns and discussion topics of political bloggers by measuring the degree of interaction between liberal and conservative blogs, and to uncover their differences. In this paper, given a mixed set of blogs debating a set of related issues from two opposing camps, we propose an algorithm to determine (i) which blog lies in which camp, (ii) what are the contested issues, and, (iii) who are mentioned as the key individuals within each camp. Bipartite graphs [8], [9], [10] have been widely used to represent relationships between two sets of entities. We use bipartite graphs to model the relationships between blogs and issues (i.e. topics, individuals, etc.) mentioned within blogs. We use signed weighted edges to represent opinion strengths, where positive edges denote support, and negative edges denote opposition between a blog and an issue. We develop algorithms to solve the following problems on signed bipartite graphs modeling blog debates: 1) Partitioning of both the blogs, and the underlying issues mentioned in blogs, into two opposing camps; 2) Scaling of both the blogs and the underlying issues on a univariate scale such that the position of a vertex is closer to the positions of the vertices it is connected with positive edges, and further away from the positions of the vertices it is connected with negative edges. Using this scale, a researcher can identify both the mod- erate and extreme blogs within each camp, and the polarizing vs. unifying issues. Partitioning and scaling help a researcher to better understand the structure of a social, political or economic debate, or even the details of an emerging geopo- litical conflict in the world. While extremist ends of a scale, may represent blogs with irreconcilable viewpoints, in some cases, moderate blogs may represent viewpoints that are more amenable to engage in a constructive dialog through a set of unifying issues. Moderates may sympathize with some of the claims and grievances of the other side. Longitudinal analysis using our proposed algorithms could reveal interesting dynamics, such as, moderates from opposing camps could be in the process of forming a coalition by making the necessary compromises to reach a consensus. All the while, moderates may be alienating extremists in their own camps who may choose to focus on polarizing issues only, and lash out violent or demonizing rhetoric on everyone else who do not share their exclusivist viewpoints. To the best of our knowledge, simultaneous scaling and partitioning on signed weighted bipartite graphs has not been studied in the literature, and this paper is the first attempt to introduce the problem and provide an effective solution and evaluation strategies. Major contributions of this paper are: an iterative algorithm, named Alternatingly Normalized CO-HITS (ANCO-HITS), to propagate the scores on a signed bipartite graph to solve the partitioning and scaling problems described above; and performance evaluations of blog/issue partitioning and scaling algorithms using synthetic data sets which were obtained by varying polarization and degree distribution of signed bipartite graphs, and analysis with two real data sets: (i) partitioning and scaling of Republicans/Democrats and their roll call votes 1 based on the 111 th US Congress voting records, and (iii) partitioning and scaling of the top 22 liberal and conservative blogs, and the most influential individuals mentioned in these 1 http://thomas.loc.gov/home/rollcallvotes.html SocialCom/PASSAT/BigData/EconCom/BioMedCom 2013 978-0-7695-5137-1/13 $26.00 © 2013 IEEE DOI 10.1109/SocialCom.2013.32 168

Upload: i-hakki

Post on 12-Mar-2017

213 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: [IEEE 2013 International Conference on Social Computing (SocialCom) - Alexandria, VA, USA (2013.09.8-2013.09.14)] 2013 International Conference on Social Computing - Partitioning and

Partitioning and Scaling Signed Bipartite Graphsfor Polarized Political Blogosphere

Sedat GokalpComputer Science

Arizona State University

[email protected]

M’hamed TemkitMathematical Statistics

Arizona State University

[email protected]

Hasan DavulcuComputer Science

Arizona State University

[email protected]

I. Hakki TorosluComputer Engineering

Middle East Technical University

[email protected]

Abstract—Blogosphere plays an increasingly important roleas a forum for public debate. In this paper, given a mixed setof blogs debating a set of political issues from opposing camps,we use signed bipartite graphs for modeling debates, and wepropose an algorithm for partitioning both the blogs, and theissues (i.e. topics, leaders, etc.) comprising the debate into binaryopposing camps. Simultaneously, our algorithm scales both theblogs and the underlying issues on a univariate scale. Usingthis scale, a researcher can identify moderate and extreme blogswithin each camp, and polarizing vs. unifying issues. Throughperformance evaluations we show that our proposed algorithmprovides an effective solution to the problem, and performsmuch better than existing baseline algorithms adapted to solvethis new problem. In our experiments, we used both real datafrom political blogosphere and US Congress records, as wellas synthetic data which were obtained by varying polarizationand degree distribution of the vertices of the graph to show therobustness of our algorithm.

I. INTRODUCTION

Blogosphere plays an increasingly important role [1] asa forum of public debate, with knock-on consequences forthe media, politics, and policy. Hotly debated issues spanall spheres of human activity; from liberal vs. conservativepolitics, to extremist vs. counter-extremist religious debate, toclimate change debate in scientific community, to globalizationdebate in economics, and to nuclear disarmament debate insecurity. There are many applications [2], [3], [4], [5], [6]for recognizing politically-oriented sentiment in texts. Previouswork [7] studied linking patterns and discussion topics ofpolitical bloggers by measuring the degree of interactionbetween liberal and conservative blogs, and to uncover theirdifferences. In this paper, given a mixed set of blogs debatinga set of related issues from two opposing camps, we proposean algorithm to determine (i) which blog lies in which camp,(ii) what are the contested issues, and, (iii) who are mentionedas the key individuals within each camp.

Bipartite graphs [8], [9], [10] have been widely used torepresent relationships between two sets of entities. We usebipartite graphs to model the relationships between blogsand issues (i.e. topics, individuals, etc.) mentioned withinblogs. We use signed weighted edges to represent opinionstrengths, where positive edges denote support, and negativeedges denote opposition between a blog and an issue.

We develop algorithms to solve the following problems onsigned bipartite graphs modeling blog debates:

1) Partitioning of both the blogs, and the underlyingissues mentioned in blogs, into two opposing camps;

2) Scaling of both the blogs and the underlying issueson a univariate scale such that the position of avertex is closer to the positions of the vertices itis connected with positive edges, and further awayfrom the positions of the vertices it is connected withnegative edges.

Using this scale, a researcher can identify both the mod-erate and extreme blogs within each camp, and the polarizingvs. unifying issues. Partitioning and scaling help a researcherto better understand the structure of a social, political oreconomic debate, or even the details of an emerging geopo-litical conflict in the world. While extremist ends of a scale,may represent blogs with irreconcilable viewpoints, in somecases, moderate blogs may represent viewpoints that are moreamenable to engage in a constructive dialog through a setof unifying issues. Moderates may sympathize with someof the claims and grievances of the other side. Longitudinalanalysis using our proposed algorithms could reveal interestingdynamics, such as, moderates from opposing camps could bein the process of forming a coalition by making the necessarycompromises to reach a consensus. All the while, moderatesmay be alienating extremists in their own camps who maychoose to focus on polarizing issues only, and lash out violentor demonizing rhetoric on everyone else who do not share theirexclusivist viewpoints.

To the best of our knowledge, simultaneous scaling andpartitioning on signed weighted bipartite graphs has not beenstudied in the literature, and this paper is the first attempt tointroduce the problem and provide an effective solution andevaluation strategies.

Major contributions of this paper are: an iterative algorithm,named Alternatingly Normalized CO-HITS (ANCO-HITS), topropagate the scores on a signed bipartite graph to solvethe partitioning and scaling problems described above; andperformance evaluations of blog/issue partitioning and scalingalgorithms using synthetic data sets which were obtained byvarying polarization and degree distribution of signed bipartitegraphs, and analysis with two real data sets: (i) partitioningand scaling of Republicans/Democrats and their roll call votes1

based on the 111th US Congress voting records, and (iii)partitioning and scaling of the top 22 liberal and conservativeblogs, and the most influential individuals mentioned in these

1http://thomas.loc.gov/home/rollcallvotes.html

SocialCom/PASSAT/BigData/EconCom/BioMedCom 2013

978-0-7695-5137-1/13 $26.00 © 2013 IEEE

DOI 10.1109/SocialCom.2013.32

168

Page 2: [IEEE 2013 International Conference on Social Computing (SocialCom) - Alexandria, VA, USA (2013.09.8-2013.09.14)] 2013 International Conference on Social Computing - Partitioning and

blogs. In our experiments, variance in polarization relates to thedistributions of the ratio of vertices corresponding to extremesvs. moderates.

Alongside our proposed ANCO-HITS, we also evaluatedtwo baseline algorithms, namely CO-HITS [8] and spectralclustering [11]. Although Co-HITS was designed for scalingunsigned bipartite graphs, it can be directly applied for scal-ing signed bipartite graphs, and partitioning by consideringthe signs of vertex values. Spectral clustering algorithm wasdesigned for partitioning of graphs, and it can also producea scale by using the component values of the eigenvectorassociated with the second smallest positive eigenvalue of thegraph Laplacian [12], [13]. Our experiments showed that theANCO-HITS algorithm is the only robust algorithm in thepresence of variance in polarization and vertex degrees.

The rest of this paper is organized as follows. We reviewrelated work in Section 2. Section 3 presents the problemformulation. In Sections 4 and 5, we present the spectral clus-tering and CO-HITS algorithms as the baselines. In Section 6,we present our proposed ANCO-HITS algorithm. We describeand report experimental evaluations in Section 7. Finally, wepresent conclusions and future work in Section 8.

II. RELATED WORK

Scaling vertices of a graph based on the network structurerather than individual properties has been of great interest formore than a decade. Two most well-known algorithms arethe PageRank [14] and the HITS [15] algorithms. They weredesigned to rank the vertices of graphs with positive weightededges. Spectral analysis show that both PageRank and HITSalgorithms converge. An important distinction between the twoalgorithms is that; the HITS algorithm provides two differ-ent types of rankings corresponding to hubs and authorities,whereas PageRank provides only a single ranking.

Many data types from data mining applications can bemodeled as bipartite graphs, examples include terms anddocuments in a text corpus, customers and items purchasedin market basket analysis and bloggers writing about currentissues.

Based on variations of HITS and PageRank, many re-searchers have proposed algorithms. In [8], the authors proposea modification of the HITS algorithm to work on bipartitegraphs called CO-HITS. The main difference between HITSand CO-HITS is that; HITS provides two scores for eachvertex, whereas CO-HITS provides one score for each typeof vertex. In this paper, we use CO-HITS as one of thebaseline algorithms, and in order to overcome its deficiencies,we extend it with normalization steps.

The clustering coefficient was first introduced in [16] tomeasure how much multiplicative transitivity property thegraph exhibits, which reflects the tendency of the vertices toform small groups. In [17], authors define a new coefficient us-ing the multiplicative transitivity for signed graphs to measurestructural equilibrium.

Data mining methods such as clustering have been usedquite extensively for exploratory data mining applications [18],[19]. Clustering analysis [20] provides a partitioning of thedata into subsets, called clusters such that the objects in a

cluster are more similar than those in distinct clusters. Spectralclustering [21], [11], [12] is a powerful clustering method thatis able to outperform K-means clustering [22], which has amajor drawback of not being able to separate clusters thatare non-linearly separable in input space [21]. The method isbased on computing the eigenvalues of the normalized versionof the graph Laplacian, and has theoretical connections withthe normalized cut of the graph. In particular when clustering abipartite graph into two balanced clusters, the second smallestpositive eigenvalue [12] is the solution to the normalized cut ofthe graph. In recent years, several authors have used spectralclustering to analyze bipartite graphs [10]. Furthermore, somework has been done to take into account a signed adjacencymatrix by using an augmented adjacency matrix [23]. In thispaper, spectral clustering was also used as one of our baselinemethods for partitioning and scaling signed bipartite graphs.

III. PROBLEM FORMULATION

A. CoScaling for signed bipartite graphs

Given

• G = (U ∪ V,A) is a bipartite graph consisting oftwo disjoint sets of vertices U and V , and a signedadjacency matrix A

• U = {u1, u2, . . . , um}, a set of m vertices

• V = {v1, v2, . . . , vn}, a set of n vertices

• A ∈ Rm×n, where aij represents the signed edge

between ui and vj

Find

• X = (x1, x2, . . . , xm), where xi ∈ R is the assignedvalue of the vertex ui

• Y = (y1, y2, . . . , yn), where yi ∈ R is the assignedvalue of the vertex vi

such that

• sgn(xi) and sgn(yi) shall determine the polarity ofthe vertices i.e. −1 and +1 as the opposing polarities

• xi value for a vertex ui should be closer to theyj values of the vertices that it supports (connectspositively), and further away from the yk values ofthe vertices that it opposes (connects negatively). Themagnitudes of xi and yj denote the extremity of thenodes ui and vj . i.e. magnitudes closer to 0 meaningmore moderate and larger magnitudes meaning moreextreme.

Figure 1 depicts a perfectly polarized bipartite graph. Thetwo axes X and Y represent the univariate scale for the nodesin U and V . The vertices to the right of zero have positivevalues, and the vertices to the left have negative values onthe scale. A green solid line between the nodes ui and vjrepresents support, and a red dashed line represents opposition.

Figure 2 shows an example of two vertices; u1 beingextreme and u2 being more moderate. u1 supports the verticesof same polarity, and opposes the vertices of the oppositepolarity. However, u2 has mixed support and opposition.

169

Page 3: [IEEE 2013 International Conference on Social Computing (SocialCom) - Alexandria, VA, USA (2013.09.8-2013.09.14)] 2013 International Conference on Social Computing - Partitioning and

u1 u2 u3 um-2 um-1 um

v1 v2 v3 vn-2 vn-1 vn

X

Y

0

Fig. 1. Perfectly polarized bipartite graph

u1 X

Y

0

u2 X

Y

0

Fig. 2. Extreme vs. Moderate vertices

Although partitioning algorithms can be utilized to detectthe polarity of vertices, it is not possible to distinguish ex-tremes from moderates. Scaling overcomes this problem andmakes it possible to compare two vertices of same polarity.In this paper, we are not only able to compare pairs ofvertices, but also provide the exact locations on the scale,therefore providing valuable information about the shape ofthe distribution as well.

To solve this co-scaling problem, we present two baselinemethods. The first one is a common modification [24], [10],[25] of the well-known Spectral Clustering approach to workon graphs with signed edges. The second one is the CO-HITS [8] algorithm, that is a modification of the well-knownHITS algorithm, designed for bipartite graphs.

Finally, we compare these baseline methods with a novelalgorithm we developed for co-scaling problem, named Alter-natingly Normalized CO-HITS (ANCO-HITS).

IV. SPECTRAL CLUSTERING

Spectral clustering [12] uses spectral graph theory, whichis the study of graphs using linear algebra methods. In thiscontext, for a given graph, its edge set is represented byan adjacency matrix. The eigenvectors of the normalizedLaplacian of the adjacency matrix are used to partition thegraph into clusters, where objects in a cluster are more similarthan those in distinct clusters. Spectral clustering incorporatesthe properties of a graph via the adjacency matrix and is able tooutperform K-means clustering in many situations, especiallyin the presence of non-convex groups of data. This methodhas close connections with the normalized cut [11] of thegraph. In particular when clustering a bipartite graph into twobalanced clusters, the second smallest positive eigenvalue ofthe Laplacian matrix [21] is the solution to the problem ofminimizing the normalized cut of the graph.

Spectral clustering embeds the data into the subspace ofthe eigenvectors of the Laplacian. In this paper, we will usespectral clustering to partition both types of vertices of a signedbipartite graph into two polarities and provide a scale for thevertices. We will do this by using the sign and the value of

u1 X

Y

0

u2 X

Y

0

Fig. 3. Extremity vs Degree

the components of the eigenvector associated with the smallestpositive eigenvalue of the Laplacian.

Spectral clustering uses an adjacency matrix with all posi-tive entries. However, our problem assumes a signed adjacencymatrix. One of the common techniques to circumvent thisproblem is to augment the matrix into a bigger matrix [10],[25], [24], such that all entries are positive. The first half ofthe augmented matrix is reserved for the entries with positivevalues, and the second half is reserved for the entries withnegative values.

V. CO-HITS

In [8], the authors modify the well-known HITS [15]algorithm and propose the CO-HITS algorithm which is usedto rank vertices of a bipartite graph. Even though the adjacencymatrix has only positive values in the original HITS paper, thetheory still holds for adjacency matrices with signed entries.

Algorithm 1 describes the steps of the CO-HITS algorithmfor the co-scaling problem.

Data: Adjacency matrix AResult: Scale vectors X and YInitiate X<0> = (1, 1, . . . , 1) ;Initiate Y <0> = (1, 1, . . . , 1) ;repeat

Update X;Update Y ;

until X vector converges;Algorithm 1: Iterative update procedure for CO-HITS

The update functions for X and Y are defined as follows:

x<k>i =

n∑

j=1

aijy<k−1>j y<k>

j =m∑

i=1

aijx<k>i

and convergence is achieved when∥∥X<k> −X<k−1>

∥∥2< ε

with ε a small positive value.

The drawback of this method is its sensitivity to the vertexdegrees. Two vertices with same polarities but different degreeswill result with the higher degree vertex having a higher scoreon the scale.

For example, let us consider two vertices u1 and u2 withu1 having a smaller degree than u2. However, let u1 be morepolarized than u2 as shown in Figure 3. In this scenario,the corresponding scale values for u1 and u2 should satisfy|x1| > |x2|. But, this will not be the case with the CO-HITS.This suggests a better algorithm that accounts for the negativeimpact of degree variation through some normalization mech-anism.

170

Page 4: [IEEE 2013 International Conference on Social Computing (SocialCom) - Alexandria, VA, USA (2013.09.8-2013.09.14)] 2013 International Conference on Social Computing - Partitioning and

VI. ALTERNATINGLY NORMALIZED

CO-HITS (ANCO-HITS)

According to our problem formulation, the values of thevertices on the scale shall not be sensitive to their degrees, butrather be sensitive to what kind of relations they have with theother set of vertices.

For this purpose, we propose ANCO-HITS algorithm,which introduces a normalization mechanism to address theissue of degree sensitivity of CO-HITS. The proposed methoduses the same iteration procedure described in Algorithm 1.The update functions for X and Y are modified such that theyare normalized as follows:

x<k>i =

n∑j=1

aijy<k−1>j

n∑j=1

|aij |y<k>j =

m∑i=1

aijx<k>i

m∑i=1

|aij |

VII. EXPERIMENTS & EVALUATIONS

To validate our algorithm, we have used two differentdatasets that are US Congress and political blogosphere. Inaddition to real data, we introduced a model to generatesynthetic data to analyze the performance of the algorithmsfor various parameters.

A. US Congress

The US Congress has been collecting data since the veryfirst congress of the US history. This data has been encodedas XML files and publicly shared through the govtrack.usproject2.From various types of data available at the projectsite, we collected the roll call votes for the 111th US Congresswhich includes The Senate and The House of Representativesand covers the years 2009-2010. According to The Library ofCongress3,

A roll call vote guarantees that every Member’s voteis recorded, but only a minority of bills receive a rollcall vote.

The 111th Senate has 1084 senators and the data containstheir votes on 696 bills, and The 111th House has 451 repre-sentatives and the data contains their votes on 1655 bills.

We extracted the adjacency matrix A ∈ {−1, 0, 1}|U |×|V |,with U vertices representing the congressmen, and the Vvertices representing the bills. The values aij are 1 if the con-gressman ui votes ‘Yea’ for the bill vj , -1 if the congressmanvotes ‘Nay’, and 0 if he did not attend the session.

The aforementioned scaling algorithms will scale both thecongressmen and the bills. In presence of partisanship5 in theCongress, the sign of the scale values for the congressmenshould correspond to the Democrat and Republican parties,and the magnitude of the scale values should represent the

2http://www.govtrack.us/data3http://thomas.loc.gov/home/rollcallvotes.html4Normally, each congress has 100 senators (2 from each state), however in

many of the congresses, there are unexpected changes on the seats caused bydisplacements or deaths.

5Partisanship can be defined as being devoted to or biased in support of aparty.

111th US Senate

Fig. 4. Vote matrix of the 111th US Senate after scaling with ANCO-HITS

Fig. 5. Bipartite graph of the 111th US Senate after scaling with ANCO-HITS

amount of partisanship. The first two columns of Table Iprovide information about this data as well as the partitioningaccuracies of the algorithms.

We analyzed the congressmen that have been assigned tobe moderate by each algorithm. We observed that the baselinealgorithms tend to have the congressmen with less number ofvotes (i.e. lesser degree) to be moderate regardless of theirpartisanship. On the other hand, when we queried the namesassigned to be most moderates by the ANCO-HITS, for bothDemocrats and Republicans, we were able to identify a numberof supporting articles matching the ANCO-HITS scaling [26],[27], [28], [29].

Figure 4 depicts the vote matrices of the 111th USCongress, with rows representing the congressmen and thecolumns representing the bills. The light green color represents’Yea’ votes, and dark red represents ’Nay’ votes. Scaling thesegraphs leads to a re-ordering of the rows and columns wherecongressmen and bills are co-clustered together.

Figure 5 represents the bipartite graph of the 111th USCongress data after scaling both the congressmen and the billswith ANCO-HITS. The light green colored edges represent’Yea’ votes, and dark red represents ’Nay’ votes. Similar to ourmotivating Figure 1, this figure also shows partisan behaviorin the 111th US Congress.

B. Political Blogosphere

As Web 2.0 platforms gained popularity, it became easyfor web users to be a part of the web and express theiropinions, mostly through blogs. Most blogs are maintainedby individuals, whereas there are also professional blogs witha group of authors. In this study, we focus on a set of popularpolitical liberal and conservative blogs that have clearly de-clared positions. These blogs contain discussions about social,political, economic issues and related key individuals. Theyexpress positive sentiment towards individuals whom theyshare ideologies with, and negative sentiment towards others.In these blogs, it is common to see criticism of people withinthe same camp, or support for people from the other camp.

171

Page 5: [IEEE 2013 International Conference on Social Computing (SocialCom) - Alexandria, VA, USA (2013.09.8-2013.09.14)] 2013 International Conference on Social Computing - Partitioning and

In this experiment, we collected a list of 22 most popularliberal and conservative blogs from the Technorati6 rankings.For each blog, we fetched the posts for the period of 6 monthsbefore the 2008 US presidential elections (May - October,2008) due to the intensity of the debates and discussions.

We use AlchemyAPI7 to run a named entity tagger toextract people names from the posts, and entity-level sentimentanalysis which provided us with weighted sentiment (positivevalues indicating support, and negative indicating opposition)for each person. This information was used to synthesize asigned bipartite graph, where the blogs and people correspondto the two sets of vertices U and V . The aij values of theadjacency matrix A are the cumulative sum of sentiment valuesfor each mention of the person vj by the blog ui.

To get a gold standard list of the most influential liberal andconservative people, we used The Telegraph List8 for 2007.The third column of Table I provides information about thisdata as well as the partitioning accuracies of the algorithms.

C. Synthetic Data

The actual partitioning information for the real datasetswere available, which made it possible to check the partitioningaccuracy of the algorithms. However, to thoroughly check thescaling accuracy of the algorithms, we developed a method togenerate random bipartite graphs with the following properties:

• The degrees and the scores for the vertices in Uand V follow independent probability distribution withvarying parameters and shapes.

Following procedure describes the method to generaterandom graphs.

RandomGraph(m, n, Ddegree, Dscale)Let A, U , V , X and Y be defined as in Definition 1.

1) Create m vertices for U , and n vertices for V2) Independently assign degrees 1 ≤ d(ui) ≤ m and

1 ≤ d(vj) ≤ n with probability distribution Ddegree

3) Independently assign −1 ≤ xi, yj ≤ +1 values withprobability distribution Dscale.

4) Generate an adjacency list of pairs (ui, vj), whereeach node ui and vj occur in the list d(ui) and d(vj)times respectively.

5) For each adjacency pair (ui, vj), assign the entry inthe adjacency matrix aij with value

a) sgn(xi)×sgn(yj), with probability 1− (1−|xi|)(1− |yj |)

b) −sgn(xi) × sgn(yj), with probability (1 −|xi|)(1− |yj |)

In our experiments, the number of vertices of the graph ism = n = 100. We used four different distributions for varyingpolarization. These were perfectly polarized, Beta, bimodaland uniform distributions [30]. Perfectly polarized distributionwas obtained by mapping all vertices to the extremes of bothsides with equal probability. We used three different normal

6http://technorati.com/7http://www.alchemyapi.com/8http://www.telegraph.co.uk/news/uknews/1435447/The-top-US-

conservatives-and-liberals.html

distributions for varying the degree distributions of vertices.Degree distributions were obtained by N (μ = 30, σ = 2),N (μ = 30, σ = 5) and N (μ = 30, σ = 10) in order toevaluate the effect of degree variance on the performance ofthe algorithms. We also experimented with different μ valuesof 10, 30, and 50 in order to measure the effect of densityvariations of the graph on the performance of the algorithms,which did not show any significant impact.

For each polarization and degree distribution we testedthe performance of two baseline algorithms and our proposedalgorithm, and made the following observations:

• Across all polarizations, as the vertex degree varianceincreases, overall errors for baseline algorithms in-crease due to their sensitivity to vertex degrees.

• Between the baseline algorithms, spectral clusteringconsistently outperforms CO-HITS.

• Even though spectral clustering performs almost asgood as our proposed ANCO-HITS for bimodal anduniform polarization distributions, when the polariza-tion is high, as in the other two distributions, itsperformance degrades.

• As polarization increases, ANCO-HITS performancealso increases. In case of perfect polarization, ANCO-HITS has almost no error.

Overall, in every single case our proposed ANCO-HITSalgorithm outperforms the baselines.

VIII. CONCLUSIONS & FUTURE WORK

In this paper, we introduced a new problem for scaling andpartitioning signed weighted bipartite graphs. We adapted twoexisting algorithms, and proposed a new algorithm to solve thisproblem. We used both real data from political blogosphereand US Congress records, as well as synthetic data to evaluatethese algorithms. Our experiments showed that our proposedalgorithm is very effective and outperforms the two otherbaselines. The algorithms in source code and the test data isavailable online at http://www.PartisanScale.com/paperdata

In real world graphs, it is rarely the case that all the nodeshave the same degree. Some bloggers have more extensivecoverage than others. Similarly, some senators miss or abstainon more votes than others. Partisanship shall not be affectedby the variance in node degrees. A marginal example is thecase when a senator supersedes a deceased one through theend of the term. All the baseline algorithms assign this lowdegree senator to a moderate location, even though he or shecan be extremely partisan. ANCO-HITS manages the degreebias by a normalizing scheme, and places this senator to anappropriate location.

In order to analyze longitudinal voting patterns for theUS Congresses, we downloaded all voting records since the1st US Congress, executed ANCO-HITS, and produced aninteractive visualization system as described in [31]. Thissystem is accessible online at www.PartisanScale.com

Our future work involves developing techniques for detect-ing and presenting both friendly and unfriendly neighborhoodsof a blog or an issue, and their agreements and disagreements.

172

Page 6: [IEEE 2013 International Conference on Social Computing (SocialCom) - Alexandria, VA, USA (2013.09.8-2013.09.14)] 2013 International Conference on Social Computing - Partitioning and

TABLE I. DESCRIPTIVE SUMMARIES OF THE GRAPHS FOR EACH DATASET WITH THE PARTITIONING ACCURACIES FOR EACH ALGORITHM

111th US Senate 111th US House Political Blogosphere

Vertices in U64 Democrat + 42 Republican 268 Democrat + 183 Republican 13 Liberal + 9 Conservative

Senators Representatives Blogs

Vertices in V696 1655 20 Liberal + 14 ConservativeBills Bills People

Graph Density 88.36% 91.23% 39.04%

Spectral Clustering 100.00% 99.11% 75.39%

CO-HITS 100.00% 99.56% 98.21%

ANCO-HITS 100.00% 99.56% 98.21%

ACKNOWLEDGMENTS

We would like to thank Dananjayan Thirumalai for helpingus collect the political blogosphere data. This research wassupported in part by US DOD Minerva Research Initiativegrant N00014-09-1-0815.

REFERENCES

[1] D. Drezner and H. Farrell, “The power and politics of blogs,” PublicChoice, vol. 134, pp. 15–30, 2008.

[2] T. Mullen and R. Malouf, “A preliminary investigation into sentimentanalysis of informal political discourse,” in AAAI symposium on com-putational approaches to analysing weblogs (AAAI-CAAW), 2006, pp.159–162.

[3] R. Malouf and T. Mullen, Graph-based user classification for informalonline political discourse, 2007.

[4] M. Thomas, B. Pang, and L. Lee, “Get out the vote: Determiningsupport or opposition from congressional floor-debate transcripts,” inIn Proceedings of EMNLP, 2006, pp. 327–335.

[5] M. Bansal, C. Cardie, and L. Lee, “The power of negative thinking:Exploiting label disagreement in the min-cut classification framework,”Proceedings of COLING: Companion volume: Posters, pp. 13–16, 2008.

[6] W. Lin and A. Hauptmann, “Are these documents written from differentperspectives?: a test of different perspectives based on statistical distri-bution divergence,” in Proceedings of the 21st International Conferenceon Computational Linguistics and the 44th annual meeting of the Asso-ciation for Computational Linguistics. Association for ComputationalLinguistics, 2006, pp. 1057–1064.

[7] L. A. Adamic and N. Glance, “The political blogosphere and the 2004u.s. election: divided they blog,” in Proceedings of the 3rd internationalworkshop on Link discovery, ser. LinkKDD ’05. New York, NY, USA:ACM, 2005, pp. 36–43.

[8] H. Deng, M. Lyu, and I. King, “A generalized co-hits algorithm andits application to bipartite graphs,” in Proceedings of the 15th ACMSIGKDD international conference on Knowledge discovery and datamining. ACM, 2009, pp. 239–248.

[9] M. Rege, M. Dong, and F. Fotouhi, “Co-clustering documents and wordsusing bipartite isoperimetric graph partitioning,” in Data Mining, 2006.ICDM’06. Sixth International Conference on. IEEE, 2006, pp. 532–541.

[10] H. Zha, X. He, C. Ding, H. Simon, and M. Gu, “Bipartite graph par-titioning and data clustering,” in Proceedings of the tenth internationalconference on Information and knowledge management. ACM, 2001,pp. 25–32.

[11] U. V. Luxburg, “A tutorial on spectral clustering,” 2007.

[12] A. Ng, M. Jordan, and Y. Weiss, “On spectral clustering: Analysis andan algorithm,” in Advances in Neural Information Processing Systems14: Proceeding of the 2001 Conference, 2001, pp. 849–856.

[13] J. Shi and J. Malik, “Normalized cuts and image segmentation,” PatternAnalysis and Machine Intelligence, IEEE Transactions on, vol. 22, no. 8,pp. 888 –905, aug 2000.

[14] L. Page, S. Brin, R. Motwani, and T. Winograd, “The pagerank citationranking: Bringing order to the web.” 1999.

[15] J. Kleinberg, “Authoritative sources in a hyperlinked environment,”Journal of the ACM (JACM), vol. 46, no. 5, pp. 604–632, 1999.

[16] D. Watts and S. Strogatz, “Collective dynamics of small-world net-works,” Nature, vol. 393, no. 6684, pp. 440–442, 1998.

[17] J. Kunegis, A. Lommatzsch, and C. Bauckhage, “The slashdot zoo:mining a social network with negative edges,” in Proceedings of the18th international conference on World wide web. ACM, 2009, pp.741–750.

[18] I. Dhillon, J. Fan, and Y. Guan, “Efficient clustering of very largedocument collections,” Data mining for scientific and engineeringapplications, pp. 357–381, 2001.

[19] N. Slonim and N. Tishby, “Document clustering using word clusters viathe information bottleneck method,” in Proceedings of the 23rd annualinternational ACM SIGIR conference on Research and development ininformation retrieval. ACM, 2000, pp. 208–215.

[20] P. Berkhin, “Survey of clustering data mining techniques,” GroupingMultidimensional Data: Recent Advances in Clustering, pp. 25–71,2006.

[21] I. Dhillon, Y. Guan, and B. Kulis, “Kernel k-means: spectral clusteringand normalized cuts,” in Proceedings of the tenth ACM SIGKDDinternational conference on Knowledge discovery and data mining.ACM, 2004, pp. 551–556.

[22] J. Hartigan and M. Wong, “Algorithm as 136: A k-means clusteringalgorithm,” Journal of the Royal Statistical Society. Series C (AppliedStatistics), vol. 28, no. 1, pp. 100–108, 1979.

[23] J. Kunegis, S. Schmidt, A. Lommatzsch, J. Lerner, E. De Luca,and S. Albayrak, “Spectral analysis of signed graphs for clustering,prediction and visualization,” in Proc SDM. Citeseer, 2010.

[24] I. Dhillon, “Co-clustering documents and words using bipartite spectralgraph partitioning,” in Proceedings of the seventh ACM SIGKDDinternational conference on Knowledge discovery and data mining.ACM, 2001, pp. 269–274.

[25] X. Fern and C. Brodley, “Solving cluster ensemble problems by bipartitegraph partitioning,” in Proceedings of the twenty-first internationalconference on Machine learning. ACM, 2004, p. 36.

[26] S. T. Dennis, “Senate moderates look for more influence,”http://www.rollcall.com/issues/56 90/-203808-1.html.

[27] J. Newton-Small, “Can ben nelson get a bipartisan stimulus win,”http://www.time.com/time/politics/article/0,8599,1877535,00.html.

[28] “Factions in the republican party (united states),”

http://en.wikipedia.org/wiki/Factions in the Republican Party (United States).

[29] “Blue dog coalition,” http://ross.house.gov/BlueDog/Members/.

[30] V. Rohatgi and A. Saleh, An introduction to probability and statistics.Wiley-India, 2008.

[31] S. Gokalp and H. Davulcu, “Partisan scale,” in Proceedings of the 21stinternational conference companion on World Wide Web. ACM, 2012,pp. 349–352.

173