social network clustering - math.ucla.edubertozzi/workforce/summer2011/gang-networks-2011/... ·...

39
Preliminaries Attempted Solutions and Results Recommended Solution and Results Artificial Data Future Work Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California Los Angeles August 2, 2011 Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Upload: vantruc

Post on 21-Aug-2019

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Social Network Clustering

Kyle Luh, Peter Elliott, and Raymond Ahn

University of California Los Angeles

August 2, 2011

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 2: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Outline

1 Preliminaries

2 Attempted Solutions and Results

3 Recommended Solution and Results

4 Artificial Data

5 Future Work

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 3: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Hollenbeck Gang Activity

Hollenbeck has an area ofapproximately 15.2 miles.

In this area, 31 violent gangsreside.

Hollenbeck is one of the topthree most violent LApolicing regions.

Gang violence in this regionhas existed since beforeWWII.

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 4: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Hollenbeck Gang Activity

The LAPD has provided an Excel database of non-criminal stopsthey have made in the Hollenbeck area. The data includes:

time of stop

location (gang territory andcoordinates)

gang affiliation

sex

age

ethnicity

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 5: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Goals

Use clustering techniques to predict unknown gang affiliations.

Detect other social structures that may not be captured bygang affiliation.

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 6: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Graph Models

Convert individuals into nodes.

Edge weights indicate similarity.

Unfortunately, data is sparse.

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 7: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Choosing a Measure of Similarity

A function of Euclidean distance

Dot product of feature vector

Gang territoryIndividuals and their gang associationsIndividual to individual interactions

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 8: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Results

Actual Gang Clusters

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 9: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Outline

1 Preliminaries

2 Attempted Solutions and Results

3 Recommended Solution and Results

4 Artificial Data

5 Future Work

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 10: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

K-means algorithm

Algorithm

Choose number of partitions.

Compute centroids.

Shift centers to the centroid of their affiliated points.

Repeat until equilibrium is achieved.

Cons

The K-means algorithm only accounts for location. We hope toutilize more of the data.

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 11: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Results: K-means Approach

6.47 6.48 6.49 6.5 6.51 6.52 6.53

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 12: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

A Metric for Cluster Evaluation

We define purity to be

purity(Ω,C) =1

N

∑k

maxj |ωk ∩ cj |

where Ω = ω1, · · · , ωK are the clusters andC = c1, · · · , cj are the actual classes.

Another measure we used was Adjusted MutualInformation which may be more appropriate since our gangsvary significantly in size.

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 13: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Results: K-means Approach

Purity ≈ 0.4 and AMI ≈ 0.4

6.47 6.48 6.49 6.5 6.51 6.52 6.53

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 14: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Modularity Maximization

Modularity compares the number of edges within a cluster tothe number expected

Maximize modularity.

We can calculate the change in modularity at each step andstop when the change is not positive

[M.J. Newman, 2006]

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 15: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Results: Modularity Maximization

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 16: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Convergence of Iterated Correlations (CONCOR)

Compute correlations of entries to the mean of rows/columns

Continue to calculate the correlations of the correlation matrixuntil we are left with +1 and −1.

The method is repeated on each cluster to achieve a finerpartition.

[Wasserman, 1994]

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 17: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Results: CONCOR

Purity ≈ 0.5 and AMI ≈ 0.46.

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 18: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Outline

1 Preliminaries

2 Attempted Solutions and Results

3 Recommended Solution and Results

4 Artificial Data

5 Future Work

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 19: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Spectral Clustering

Create a matrix of eigenvectors of the Adjacency matrix.The eigenvectors capture the axes which contain the mostvariation in the data.Run k-means algorithm on new space.

[Ng et al, 2001]Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 20: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Eigenvalues

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 21: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Eigenvector Plots: Distance Only

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 22: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Eigenvector Plots: Social Information Only

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 23: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Where to go from here?

The geographic data provides no insights.

The social data is so sparse that its eigenvectors arecompletely useless alone.

We decided to combine the two adjacency matrices,αA + (1− α)B, where α is a weighting parameter.

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 24: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Eigenvector Plots: Combined

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 25: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Clustering Results

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 26: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Clustering Results

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 27: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Clustering Results

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 28: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Results: Spectral Approach

Purity ≈ .7 and AMI ≈ .65

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 29: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Outline

1 Preliminaries

2 Attempted Solutions and Results

3 Recommended Solution and Results

4 Artificial Data

5 Future Work

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 30: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Artificial Data

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 31: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Artificial Data

Inputs:

Number of people

Number of communities

Gang multiplier

Threshold

G (x , y) =(ηx+ηy )σdist(x ,y) (1 + Mδij)

[N Masuda, 2005]

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 32: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Artificial Data

Inputs:

Number of people

Number of gangs

Probability within gangs

Probability outside gangs

[A. Lancichinetti, 2008]Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 33: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Artificial Data

Inputs:

Number of people

Number of gangs

Average degree

Mixing parameter

[E.N. Gilbert, 1959]

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 34: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Artificial Data

Inputs:

Number of peopleNumber of gangsAverage degreeMixing parameter

[E.N. Gilbert, 1959]

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 35: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Artificial Data

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 36: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Outline

1 Preliminaries

2 Attempted Solutions and Results

3 Recommended Solution and Results

4 Artificial Data

5 Future Work

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 37: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Future Work

Matrix Completion and Link Prediction

Robustness of Algorithms

Artificial Testing Data

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 38: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

Acknowledgements

We would like to thank Professor Yves van Gennip for advising usthroughout the project. We are also grateful to Professors BlakeHunter and Allon Percus for their helpful advice.

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering

Page 39: Social Network Clustering - math.ucla.edubertozzi/WORKFORCE/Summer2011/Gang-Networks-2011/... · Social Network Clustering Kyle Luh, Peter Elliott, and Raymond Ahn University of California

PreliminariesAttempted Solutions and Results

Recommended Solution and ResultsArtificial Data

Future Work

References

Ng et al. (2001)

On Spectral Clustering: Analysis and an Algorithm

Advances in Neural Information Processing Systems 14(2), 849 – 857.

Wasserman (1994)

Social Network Analysis

Cambridge University Press

M.E.J. Newman (2006)

Modularity and community structure in networks

Proceedings of the National Academy of Sciences in the United States ofAmerica 103 (23) 8577-8582

Gilbert (1959)

Random Graphs

The Annals of Mathematical Statistics

Lancichinetti et al. (2008)

Benchmark Graphs for Testing Community Detection Algorithms

Physical Review E

Masuda et al. (2005)

Geographical Threshold Graphs with Small World and Scale FreeProperties

Physical Review E

Kyle Luh, Peter Elliott, and Raymond Ahn Social Clustering