Link Prediction: Concepts and Applications
Kyunghoon Kim
UNIST Mathematical Sciences
2015. 9. 3.
Kyunghoon Kim (UNIST) Network Link Prediction 2015. 9. 3. 1 / 86
About me
Speaker
Kyunghoon Kim (Graduate Student)
UNIST (Ulsan National Institute of Science and Technology)
Mathematical Sciences, School of Natural Sciences
Lab
Adviser : Bongsoo Jang
Homepage : http://amath.unist.ac.kr
“Be the light that shines the world with science and technology.”
Contents
1 Social Network
2 Link Prediction
Research Trend
Definition
Framework
Example
Theory
3 Link Prediction with Python
4 Demo
Social Network
A social network is a social structure made up of
a set of social actors (such as individuals or organizations)
and a set of the dyadic ties (or interactions, relations) between these actors.
Social Network : Internet
Ref: http://supraliminalsolutions.com/
Social Network : Information exchange
Ref: https://niftynotcool.files.wordpress.com/2013/12/internet-wallpaper-hd.jpg
Social Network : Betweenness Centrality
Social Network : IoT (Internet of Things)
Ref: http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=XB&infotype=PM&appname=GBSE_GB_TI_USEN&htmlfid=GBE03620USEN&attachment=GBE03620USEN.PDF
Social Network : Problem
Non-trivial task
Incomplete
Dynamic
Research Trend of Link Prediction
Keyword “link prediction social network”
Wang, Peng, et al. "Link prediction in social networks: the state-of-the-art." Science China Information Sciences 58.1 (2015): 1-38.
Application of Link Prediction
1 Recommender systems (links)
Friend recommendation ('12)
Co-author recommendation ('07)
Product recommendation in online shopping malls ('11)
Patent recommendation ('13)
Collaborator recommendation across fields ('12)
Contact recommendation ('11)
Application of Link Prediction
2 Complex systems research (links)
Network evolution research ('02)
Website link prediction ('02)
Application of Link Prediction
3 Applications in various fields (links)
Healthcare ('12)
Protein networks ('12)
Detection of anomalous communication ('09)
Research Trend of Link Prediction
Wang, Peng, et al. ”Link prediction in social networks: the state-of-the-art.” Science China Information Sciences 58.1 (2015)
Definition of Link Prediction
Link prediction in social networks means
predicting links that are missing from the current network, or
predicting links that will newly appear or disappear in the future network.
Definition of Link Prediction
Given a social network
G(V, E) at time t,
link prediction is the task of finding
links that will appear or disappear (t′ > t), and
missing or unobserved links (at t).
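The "missing link at time t" case of the definition can be illustrated with a minimal sketch. It assumes a made-up five-node network in which one true edge is hidden from the observer, and it scores every unobserved pair by its number of shared neighbors. The node names, the edge list, and the scoring rule are illustrative assumptions, not data or methods from the talk.

```python
from itertools import combinations

# Hypothetical toy network: the true edge set, with one edge unobserved at time t.
nodes = ["a", "b", "c", "d", "e"]
true_edges = {("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"),
              ("b", "d"), ("c", "d"), ("d", "e")}
hidden = ("a", "d")
observed = true_edges - {hidden}

def neighbors(edges, x):
    """Return the set of nodes adjacent to x in an undirected edge set."""
    return {v for u, v in edges if u == x} | {u for u, v in edges if v == x}

# Candidate links: every node pair without an observed edge.
candidates = [p for p in combinations(nodes, 2) if p not in observed]

# Score each candidate by how many neighbors its endpoints share.
scores = {
    (u, v): len(neighbors(observed, u) & neighbors(observed, v))
    for u, v in candidates
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0], scores[ranked[0]])
```

In this toy case the hidden edge receives the highest score, which is the behavior a link predictor is evaluated on. Real networks rarely separate this cleanly.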
Framework of Link Prediction
Wang, Peng, et al. ”Link prediction in social networks: the state-of-the-art.”
Science China Information Sciences 58.1 (2015): 1-38.
Link Prediction Example : Terrorist Networks
Link Prediction Example : Terrorist Networks
Problems of criminal network analysis
1 Incompleteness - the inevitability of missing nodes and links that the
investigators will not uncover.
2 Fuzzy boundaries - the difficulty in deciding who to include and who
not to include.
3 Dynamic - these networks are not static, they are always changing.
http://pear.accc.uic.edu/ojs/index.php/fm/article/view/941/863
Link Prediction Example : Terrorist Networks
Several summaries of data about the hijackers appeared in major newspapers:
Sydney Morning Herald, 2001
Washington Post, 2001
From 2 to 6 weeks after the event, it appeared that a new relationship
or node was added to the network on a daily basis.
Theory of Link Prediction
https://www.cs.umd.edu/class/spring2008/cmsc828g/Slides/link-prediction.pdf
Liben‐Nowell, David, and Jon Kleinberg. “The link‐prediction problem
for social networks.” Journal of the American society for information
science and technology 58.7 (2007): 1019-1031.
Taxonomy of Link Prediction
Wang, Peng, et al. "Link prediction in social networks: the state-of-the-art."
Link Prediction with Python
Contents
Scikit-learn
Large-scale Matrix
Books
NumPy & Pandas
Morpheme Analyzer
NetworkX
IPython & D3.js
K-means Algorithm
from sklearn import cluster

k = 2  # number of clusters
kmeans = cluster.KMeans(n_clusters=k)
kmeans.fit(data)  # data: array-like of shape (n_samples, n_features), defined elsewhere
K-means Algorithm
http://cjauvin.blogspot.kr/2014/03/k-means-vs-louvain.html
How large a matrix can we handle?
NetworkX uses a "dictionary of dictionaries of dictionaries" as its basic network data structure.
Advantages of the dict-of-dicts-of-dicts data structure:
Find edges and remove edges with two dictionary look-ups.
Preferred over "lists" because of fast lookup with sparse storage.
Preferred over "sets" since data can be attached to an edge.
G[u][v] returns the edge attribute dictionary.
n in G tests if node n is in graph G.
for n in G: iterates through the graph.
for nbr in G[n]: iterates through neighbors.
https://networkx.github.io/documentation/latest/reference/introduction.html
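To make the layout concrete, here is a minimal plain-Python sketch of a dict-of-dicts-of-dicts adjacency structure. It only mimics the idea and is not NetworkX's actual implementation; the node names, the `weight` attribute, and the `remove_edge` helper are made up for illustration.

```python
# Edge-attribute dicts, shared by both directions of an undirected edge.
uv = {"weight": 0.7}
uw = {}

# Outer dict: node -> neighbor dict; middle dict: neighbor -> edge-attribute dict.
adj = {
    "u": {"v": uv, "w": uw},
    "v": {"u": uv},
    "w": {"u": uw},
}

edge_data = adj["u"]["v"]         # two look-ups reach the edge attributes
has_node = "u" in adj             # node membership test
nodes = list(adj)                 # iterating the outer dict yields the nodes
neighbors_of_u = list(adj["u"])   # iterating a middle dict yields the neighbors

def remove_edge(adj, a, b):
    """Remove the undirected edge a-b: one deletion per direction."""
    del adj[a][b]
    del adj[b][a]

remove_edge(adj, "u", "w")
print(edge_data, has_node, nodes, neighbors_of_u)
```

Because only existing edges are stored, memory grows with the number of edges rather than with the square of the number of nodes, which is what makes this layout suitable for sparse graphs.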
How large a matrix can we handle?
Million-scale Graphs Analytic Frameworks
SNAP : http://snap.stanford.edu/snappy/index.html
Billion-scale Graphs Analytic Frameworks
Apache Hama : https://hama.apache.org/ (introductory article)
Pegasus : http://www.cs.cmu.edu/~pegasus/
s2graph : https://github.com/daumkakao/s2graph (slides)
Graph Database
Neo4j : http://neo4j.com/
OrientDB : http://orientdb.com/
Basic books for studying networks
1 Networks: An Introduction by Mark Newman
2 Linked: The New Science of Networks (Korean edition: 링크: 21세기를 지배하는 네트워크 과학)
Warming up for link prediction
1 NumPy : a module optimized for computation speed
2 Pandas : data structures
NumPy: Numerical Python
Multidimensional arrays
1 Stored in contiguous memory and implemented in C
2 A single data type for all elements
3 Operations are applied to every element of the array at once
http://www.numpy.org/
NumPy: Numerical Python
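import timeit
import numpy as np

b = list(np.arange(1e7))  # plain Python list of 10 million floats
tic = timeit.default_timer()
for index, value in enumerate(b):
    b[index] = value * 1.1  # element-by-element update in a Python loop
toc = timeit.default_timer()
print(toc - tic)
# 1.82178592682 (seconds, as reported on the slide)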
NumPy: Numerical Python
import numpy as np
import timeit

a = np.arange(1e7)
b = list(a)
tic = timeit.default_timer()
a = a * 1.1  # vectorized: applied to every element at once
toc = timeit.default_timer()
print(toc - tic)
# 0.029629945755 (seconds, as reported on the slide)
Depending on how it is used, arithmetic on an ndarray is much faster than on a list(): here about 60 times faster (0.03 s versus 1.82 s).
Pandas: Python Data Analysis Library
Pandas / get data yahoo
%pylab inline
import pandas as pd
import pandas.io.data  # as in 2015; this module was later moved to the separate pandas-datareader package
import datetime

start = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2015, 8, 26)
text = """A, AAPL, AMCC, AMD, AMGN, AMKR, AMNT.OB, AMZN, APC, ASOG.PK, AULO.OB, BAC, BBD-A.TO, BBD-B.TO, BEEI.OB, BKSD.OB, BP.BA, BPMI.PK, C, CAJT.PK, CAT, CGFI.OB, CHINA, CHKP, CIEN, CL, CLEC, CLNE, CNLG, COKE, CPAH.OB, CPHD, CPRT, CRDN, CRGN, CSCO, CSRVE.OB, CTS, CTXS, CVM, CVX, DE, DELL, DLTR, DO, DOG, DSCM, DVNNF.OB, DYN, EGN, ELNK, ELX, EP, ERJ, ETFC, EVEH.OB, FARO, FDO, FILE, FLIR, FNLY.OB, FNPR.OB, FORC.OB, FPP, GAGO.PK, GBVS.OB, GCAP.PK, GDKI.PK, GDTI.OB, GE, GEPT.PK, GERN, GFCI.PK, GILD, GLW, GOOG, GRDB.OB, GRMN, GS, GTXO.OB, GWGI.PK, HAL, HAST, HCKT, HD, HK, HPQ, HRAL.PK, IBM, IMDS.OB, IMGM.OB, INFY, INTC, IOC, IRF, JAVA, JCP, JDSU, JNJ, JNPR, JYHW.OB, K, KDSM.OB, KKD, KLDO.OB, KO, LEG, LOW, LRCX, LU, medx, MINI, MKC, MLNK, MNLU.OB, MO, MON, MOT, MRK, MSFT, MVBY.PK, MYL, NGEN, NGLS, NOIZ, NOK, NOVL, NOVOE.OB, NTGR, NVDA, NVS, NXRA.PK, OMI, ORCL, OSTK, PDLI, PEGA, PEP, PFE, PG, PHLI.OB, PNM, PPBV.OB, PWER, PYTO.OB, PZE, PZZA, Q, QCOM, QPCIE.OB, RIG, RIO, S, SBR, SCMR, SCON, SGAL.OB, SGP, SHRPQ.PK, SIFY, SIRI, SLB, smd, SUN, SVA, T, TAMO.OB, TASR, TEF, TNEN.OB, TRXAQ.PK, TWX, TXN, UTVG.OB, VOD, VRTL.PK, VSCI, VTSS.PK, VZ, WAG, WCYO.OB, WFC, WLL, WMT"""
# Keep only plain tickers: drop symbols with an exchange suffix such as ".OB" or ".PK"
corps = [t for t in text.replace(' ', '').split(',') if '.' not in t]
Code : https://goo.gl/8ddrnS
Pandas / get data yahoo
# Download daily prices for the selected tickers from Yahoo Finance
df = pd.io.data.get_data_yahoo(corps, start=start, end=end)
df['Adj Close'].head()
Pandas / Return Value
returns = df['Adj Close'].pct_change()  # daily percentage change of the adjusted close
corr = returns.corr()  # pairwise correlation between tickers
corr
Pandas / Correlation
bm = corr > 0.5  # boolean matrix: True where the correlation exceeds 0.5
bm.astype(int)
Pandas / Convert to array
mat = bm.astype(int).values  # 0/1 adjacency matrix as a NumPy array
mat
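The returns, correlation, and threshold steps above can be traced by hand on a tiny example. The sketch below uses plain Python instead of pandas, and the three tickers and their return series are made-up numbers rather than market data. Unlike the slide code, it also zeroes the diagonal, because every series correlates perfectly with itself and would otherwise become a self-loop in the graph.

```python
from math import sqrt

# Made-up daily returns for three hypothetical tickers (not real market data).
returns = {
    "AAA": [0.01, -0.02, 0.03, 0.00, 0.02],
    "BBB": [0.02, -0.01, 0.02, 0.01, 0.03],
    "CCC": [-0.01, 0.02, -0.02, 0.01, -0.03],
}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

names = list(returns)
threshold = 0.5

# Adjacency matrix: 1 where the correlation exceeds the threshold, 0 elsewhere.
# The diagonal is forced to 0 to avoid self-loops.
mat = [
    [int(i != j and pearson(returns[a], returns[b]) > threshold)
     for j, b in enumerate(names)]
    for i, a in enumerate(names)
]

# Undirected edge list read off the upper triangle of the matrix.
edges = [(names[i], names[j])
         for i in range(len(names))
         for j in range(i + 1, len(names)) if mat[i][j]]
print(mat, edges)
```

Here AAA and BBB move together and become linked, while CCC moves against both and stays isolated. The 0.5 cutoff is the one used on the slide; a different threshold gives a denser or sparser network.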
NetworkX / from numpy matrix
import networkx as nx

# The original slide calls nx.from_numpy_matrix, which NetworkX 3.0 removed in favor of from_numpy_array
graph = nx.from_numpy_array(mat)
graph = nx.relabel_nodes(graph, dict(enumerate(bm.columns)))  # node index -> ticker symbol
nx.draw(graph, with_labels=True)
NetworkX / figsize
plt.figure(figsize=(20, 20))  # plt is available after %pylab inline
nx.draw_spring(graph, with_labels=True)
NetworkX / largest connected component
# Keep only the largest connected component
first = max(nx.connected_components(graph), key=len)
G = graph.subgraph(first)
nx.draw(G, with_labels=True)
NetworkX / Finish the work in Gephi after all?
nx.write_gexf(G, 'graph.gexf')
Open the gexf file in Gephi
mecab-ko
Eunjeon Hannip (은전한닢) project ( http://eunjeon.blogspot.kr/ )
"Let's build an open-source Korean morphological analyzer that is good enough for search!" by 이용운 (Yongwoon Lee) and 유영호 (Youngho Yoo)
$ sudo docker pull koorukuroo/mecab-ko
$ sudo docker run -i -t koorukuroo/mecab-ko:0.1
안녕하세요
안녕 NNG,*,T,안녕,*,*,*,*
하 XSV,*,F,하,*,*,*,*
세요 EP+EF,*,F,세요,Inflect,EP,EF,시/EP/*+어요/EF/*
EOS
https://github.com/koorukuroo/mecab-ko
mecab-ko-web
$ sudo docker pull koorukuroo/mecab-ko-web
$ sudo docker run -i -t koorukuroo/mecab-ko-web:0.1
172.17.0.43 (Docker Container IP)
127.0.0.1
* Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
>>> from urllib.parse import quote
>>> from urllib.request import urlopen
>>> response = urlopen('http://172.17.0.43:5000/?text=' + quote('안녕'))
>>> text = response.read().decode('utf-8')  # assuming the server answers in UTF-8
>>> print(text)
안녕 NNG,*,T,안녕,*,*,*,*
EOS
https://github.com/koorukuroo/mecab-ko-web
mecab api
http://information.center/api/korean?sc=APIKEY&s=안녕하세요
http://information.center/korean
mecab api
import Umorpheme.morpheme as um
from collections import OrderedDict

s = '유니스트는 울산에 있습니다'  # "UNIST is in Ulsan"
server = 'http://information.center/api/korean'
apikey = ''  # Register at http://information.center/korean
data = um.analyzer(s, server, apikey, '유니스트,UNIST', 1)

# Convert the keys to int so that the morphemes sort in sentence order
temp = {}
for key, value in data.items():
    temp[int(key)] = value
data = OrderedDict(sorted(temp.items()))

for i, j in data.items():
    print(i, j['data'], j['feature'])
0 유니스트 CUSTOM
1 는 JX
2 울산 NNP
3 에 JKB
4 있 VV
5 습니다 EC
Basic definitions for link prediction
Γ(x) : the set of neighbors of node x
|Γ(x)| : the number of neighbors of node x
Common Neighbors
CN(u, v) = |Γ(u) ∩ Γ(v)|
Please note that this graph depicts a hypothetical situation, not a real one.
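As a small illustration of Γ(x) and the common-neighbors score, the sketch below stores a hypothetical four-node graph as a mapping from each node to its neighbor set. The graph and the helper name `common_neighbors` are assumptions made for this example only.

```python
# Hypothetical undirected toy graph, stored as node -> set of neighbors (Γ).
gamma = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
}

def common_neighbors(gamma, u, v):
    """CN(u, v) = |Γ(u) ∩ Γ(v)|."""
    return len(gamma[u] & gamma[v])

degree_a = len(gamma["a"])                   # |Γ(a)|
cn_bd = common_neighbors(gamma, "b", "d")    # b and d share the neighbors a and c
print(degree_a, cn_bd)
```

Nodes b and d are not connected, yet they share two neighbors, so under this score they are a plausible candidate for a missing or future link.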
Resource Allocation Index
RA(u, v) = ∑_{w ∈ Γ(u) ∩ Γ(v)} 1 / |Γ(w)|
Resource Allocation Index
preds = nx.resource_allocation_index(G)  # scores every non-adjacent node pair by default
for u, v, p in preds:
    print('(%s, %s) -> %.8f' % (u, v, p))
Resource Allocation Index
(수지, 혜리) -> 0.33333333
(수지, 경훈) -> 0.83333333
(아이유, 민호) -> 1.00000000
(혜리, 민호) -> 0.00000000
(혜리, 경훈) -> 0.33333333
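The resource allocation formula can be checked by hand on a small graph. The following sketch is a plain-Python reading of the formula, not the NetworkX implementation, and the graph (nodes u, v, w1, w2, x, y) is invented for the example.

```python
# Hypothetical toy graph as node -> set of neighbors (Γ).
gamma = {
    "u": {"w1", "w2"},
    "v": {"w1", "w2"},
    "w1": {"u", "v"},
    "w2": {"u", "v", "x", "y"},
    "x": {"w2"},
    "y": {"w2"},
}

def resource_allocation(gamma, u, v):
    """RA(u, v): sum of 1 / |Γ(w)| over the common neighbors w of u and v."""
    return sum(1.0 / len(gamma[w]) for w in gamma[u] & gamma[v])

ra_uv = resource_allocation(gamma, "u", "v")   # 1/2 from w1 plus 1/4 from w2
ra_xy = resource_allocation(gamma, "x", "y")   # 1/4, only w2 is shared
print(ra_uv, ra_xy)
```

A common neighbor with few links (w1) contributes more than a well-connected one (w2), which is the intuition behind the index: a shared hub is weaker evidence of a link than a shared low-degree acquaintance.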
Displaying Korean text
pip install --upgrade git+https://github.com/koorukuroo/networkx_for_unicode
import matplotlib.font_manager as fm
import networkx as nx

fp1 = fm.FontProperties(fname="./NotoSansKR-Regular.otf")  # font file with Korean glyphs
nx.set_fontproperties(fp1)  # not in stock NetworkX; presumably added by the networkx_for_unicode package installed above
G = nx.Graph()
G.add_edge(u'한국어', u'영어')  # "Korean", "English"
nx.draw(G, with_labels=True)
Preferential Attachment
PA(u, v) = |Γ(u)| · |Γ(v)|
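A short sketch may help show how the preferential attachment score behaves. It assumes a made-up graph of two separate star-shaped groups (centers h1 and h2) and ranks all non-adjacent pairs by the product of their degrees; the graph and helper name are illustrative only.

```python
from itertools import combinations

# Hypothetical toy graph: two hubs, each with its own leaves.
gamma = {
    "h1": {"a", "b", "c"},
    "h2": {"d", "e"},
    "a": {"h1"}, "b": {"h1"}, "c": {"h1"},
    "d": {"h2"}, "e": {"h2"},
}

def preferential_attachment(gamma, u, v):
    """PA(u, v) = |Γ(u)| * |Γ(v)|."""
    return len(gamma[u]) * len(gamma[v])

# All node pairs that are not yet connected.
pairs = [(u, v) for u, v in combinations(sorted(gamma), 2) if v not in gamma[u]]

# The highest-scoring candidate link.
best = max(pairs, key=lambda p: preferential_attachment(gamma, *p))
best_score = preferential_attachment(gamma, *best)
print(best, best_score)
```

The two hubs get the top score even though they share no neighbors at all, which shows that this index rewards high-degree nodes regardless of how close they are in the network.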
Preferential Attachment
import heapq

# pos is assumed to hold node positions computed earlier by a layout function
nx.draw_networkx_nodes(G, pos, node_size=500, node_color='yellow')
nx.draw_networkx_edges(G, pos, alpha=0.2)
nx.draw_networkx_labels(G, pos, font_size=20)

# For each node, keep its 5 highest-scoring non-neighbors
selected_lines = []
for u in G.nodes():
    preds = nx.preferential_attachment(G, [(u, v) for v in nx.non_neighbors(G, u)])
    largest = heapq.nlargest(5, preds, key=lambda x: x[2])
    selected_lines.extend(largest)

# Draw the predicted links with a score above 1 in red
subG = nx.Graph()
for u, v, score in selected_lines:
    print(u, v, score)
    if score > 1:
        subG.add_edge(u, v)
pos_subG = {s: pos[s] for s in subG.nodes()}
nx.draw_networkx_edges(subG, pos_subG, edge_color='red')
Preferential Attachment
import heapq
import numpy as np

degree = nx.degree_centrality(G)

# Node size proportional to degree centrality
nx.draw_networkx_nodes(G, pos, node_color='yellow', nodelist=list(degree.keys()),
                       node_size=np.array(list(degree.values())) * 10000)
nx.draw_networkx_edges(G, pos, alpha=0.2)
nx.draw_networkx_labels(G, pos, font_size=20)

# For each node, keep its 5 highest-scoring non-neighbors
selected_lines = []
for u in G.nodes():
    preds = nx.preferential_attachment(G, [(u, v) for v in nx.non_neighbors(G, u)])
    largest = heapq.nlargest(5, preds, key=lambda x: x[2])
    selected_lines.extend(largest)

# Draw the predicted links with a score above 1 in red
subG = nx.Graph()
for u, v, score in selected_lines:
    print(u, v, score)
    if score > 1:
        subG.add_edge(u, v)
pos_subG = {s: pos[s] for s in subG.nodes()}
nx.draw_networkx_edges(subG, pos_subG, edge_color='red')
Link prediction functions in NetworkX
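The slide's list of functions did not survive in this transcript. Besides `resource_allocation_index` and `preferential_attachment` used above, NetworkX also provides `jaccard_coefficient` and `adamic_adar_index`. The sketch below is a plain-Python reading of the standard formulas behind those two scores on a made-up graph; it is not NetworkX's code, and the helper names are chosen for this example.

```python
from math import log

# Hypothetical undirected toy graph as node -> set of neighbors (Γ).
gamma = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
}

def jaccard(gamma, u, v):
    """Jaccard coefficient: |Γ(u) ∩ Γ(v)| / |Γ(u) ∪ Γ(v)|."""
    union = gamma[u] | gamma[v]
    return len(gamma[u] & gamma[v]) / len(union) if union else 0.0

def adamic_adar(gamma, u, v):
    """Adamic-Adar index: sum of 1 / log|Γ(w)| over common neighbors w."""
    return sum(1.0 / log(len(gamma[w])) for w in gamma[u] & gamma[v])

j_bd = jaccard(gamma, "b", "d")        # identical neighbor sets {a, c}
aa_bd = adamic_adar(gamma, "b", "d")   # a and c each have 3 neighbors
print(j_bd, aa_bd)
```

Both scores are built from the same neighbor sets as the common-neighbors count. Jaccard normalizes by the size of the union, and Adamic-Adar down-weights high-degree common neighbors more gently than the resource allocation index does.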
LPmade
https://github.com/rlichtenwalter/LPmade
IPython and d3.js
from IPython.display import display, HTML
d3.js (Data-Driven Documents)
Running d3.js in IPython
Code: https://goo.gl/LpxsKc
IPython and d3.js
# d3_graph and make_html_graph are the talk's own helper functions, not part of IPython or NetworkX
edges = d3_graph(G)
make_html_graph(edges, 1000, 500)  # or: make_html_graph(edges)

(next notebook cell)
%%HTML
<iframe src="d3.html" width=100% height=500 frameborder=0></iframe>
Demo screen: http://i.imgur.com/FeQ9kii.gif