Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Issei Sato, Shu Tanaka, Kenichi Kurihara, Seiji Miyashita, and Hiroshi Nakagawa

Neurocomputing 121, 523 (2013)

Upload: shu-tanaka

Post on 29-Nov-2014


DESCRIPTION

Our paper entitled "Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering" was published in Neurocomputing. This work was done in collaboration with Dr. Issei Sato (Univ. of Tokyo), Dr. Kenichi Kurihara (Google), Prof. Seiji Miyashita (Univ. of Tokyo), and Prof. Hiroshi Nakagawa (Univ. of Tokyo). http://www.sciencedirect.com/science/article/pii/S0925231213005535 The preprint version is available at http://arxiv.org/abs/1305.4325

TRANSCRIPT

Page 1: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Issei Sato, Shu Tanaka, Kenichi Kurihara, Seiji Miyashita, and Hiroshi Nakagawa

Neurocomputing 121, 523 (2013)

Page 2: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Main Results

We considered the efficiency of the quantum annealing method for Dirichlet process mixture models. In this study, Monte Carlo simulations were performed.

- We constructed a method to apply quantum annealing to network clustering.

- Quantum annealing succeeded in obtaining a better solution than conventional methods.

- The number of classes can be changed (cf. K. Kurihara et al. and I. Sato et al., Proceedings of UAI 2009).

[Plot: difference of log-likelihood (y-axis) vs. $K_0$ (x-axis) for the Wikivote dataset; larger is better.]

Page 3: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Background

Optimization problem: to find the state (best solution) where a real-valued cost function is minimized.

If the size of the problem is small, we can easily obtain the best solution by brute-force calculation.

However, if the size of the problem is large, we cannot obtain the best solution by brute-force calculation in practice.

We should develop methods to obtain the best solution (or at least a better solution) efficiently.

Page 4: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Background

The cost function of most optimization problems can be represented by the Hamiltonian of a classical discrete spin system.

We can use the knowledge of statistical physics.

Finding the state where the cost function is minimized is equivalent to finding the ground state of the Hamiltonian.

Simulated annealing (SA): by decreasing the temperature (thermal fluctuation) gradually, the ground state of the Hamiltonian is obtained.

S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, Science 220, 671 (1983).

SA can be applied both to stochastic methods, such as the Monte Carlo method, and to deterministic methods.
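As an illustration of the idea only (not the paper's implementation), here is a minimal simulated-annealing sketch in Python; the toy Ising-type cost function, the logarithmic schedule, and the single-spin-flip proposal are arbitrary choices for illustration.

```python
import math
import random

def simulated_annealing(cost, n_spins, n_steps, beta0=0.1):
    """Minimize `cost` over {-1,+1}^n_spins with Metropolis updates
    while the inverse temperature beta is increased gradually."""
    state = [random.choice([-1, 1]) for _ in range(n_spins)]
    best, best_cost = list(state), cost(state)
    for t in range(1, n_steps + 1):
        beta = beta0 * math.log(1 + t)   # slowly growing inverse temperature
        i = random.randrange(n_spins)    # propose a single spin flip
        old_c = cost(state)
        state[i] = -state[i]
        new_c = cost(state)
        if new_c > old_c and random.random() >= math.exp(-beta * (new_c - old_c)):
            state[i] = -state[i]         # reject the uphill move: undo the flip
        elif new_c < best_cost:
            best, best_cost = list(state), new_c
    return best, best_cost

# toy antiferromagnetic chain: neighbouring spins prefer to disagree
toy_cost = lambda s: sum(s[i] * s[i + 1] for i in range(len(s) - 1))
print(simulated_annealing(toy_cost, n_spins=20, n_steps=5000))
```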

Page 5: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Background

Quantum annealing (QA): by decreasing the quantum fluctuation gradually, the ground state of the Hamiltonian is obtained.

T. Kadowaki and H. Nishimori, Phys. Rev. E 58, 5355 (1998).
E. Farhi, J. Goldstone, S. Gutmann, J. Lapan, A. Lundgren, and D. Preda, Science 292, 472 (2001).

G. E. Santoro, R. Martonak, E. Tosatti, and R. Car, Science, 295, 2427 (2002).

Review articles:
G. E. Santoro and E. Tosatti, J. Phys. A: Math. Gen. 39, R393 (2006).

A. Das and B. K. Chakrabarti, Rev. Mod. Phys. 80, 1061 (2008).
S. Tanaka and R. Tamura, Kinki University Series on Quantum Computing, "Lectures on Quantum Computing, Thermodynamics and Statistical Physics" (2012).

Is QA better than SA?

Page 6: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

What is CRP?

Page 7: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Chinese Restaurant Process (CRP)

[Figure: five customers (1-5) seated at three tables.]

Restaurant (entire data set); Customer (data point); Table (data class).

The Chinese Restaurant Process (CRP) assigns a probability to the seating arrangement of the customers.

Page 8: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Chinese Restaurant Process (CRP)

Seating arrangement of the customers: $Z = \{z_i\}_{i=1}^{N}$, where $z_i = k$ means that customer $i$ sits at the $k$-th table and $N$ is the number of customers.

When customer $i$ enters a restaurant with $K$ occupied tables at which other customers are already seated, customer $i$ sits at a table with the following probability:

$p(z_i = k \mid Z \backslash z_i; \alpha) \propto \begin{cases} \dfrac{N_k}{\alpha + N - 1} & (k\text{-th occupied table}) \\[2mm] \dfrac{\alpha}{\alpha + N - 1} & (\text{new unoccupied table}) \end{cases}$

$N_k$: the number of customers sitting at the $k$-th table; $\alpha$: hyperparameter of the CRP.

The probability of $Z$ is given by

$p(Z) = \dfrac{\alpha^{K(Z)}}{\prod_{n=1}^{N}(n - 1 + \alpha)} \prod_{k=1}^{K(Z)} (N_k - 1)!$
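To make the generative process concrete, here is a minimal Python sketch (an illustration, not the paper's code) that samples a seating arrangement $Z$ from the CRP prior with hyperparameter alpha and evaluates $\log p(Z)$ with the formula above.

```python
import math
import random

def sample_crp(n_customers, alpha, rng=random.Random(0)):
    """Sample a CRP seating arrangement z (z[i] = table index of customer i)."""
    z, counts = [], []                       # counts[k] = customers at table k
    for i in range(n_customers):
        weights = counts + [alpha]           # N_k for occupied tables, alpha for a new one
        r = rng.random() * sum(weights)
        k, acc = 0, 0.0
        while k < len(weights) - 1 and acc + weights[k] < r:
            acc += weights[k]
            k += 1
        if k == len(counts):
            counts.append(0)                 # open a new table
        counts[k] += 1
        z.append(k)
    return z, counts

def log_prob_crp(counts, alpha):
    """log p(Z) = K log(alpha) + sum_k log((N_k-1)!) - sum_n log(n-1+alpha)."""
    n = sum(counts)
    return (len(counts) * math.log(alpha)
            + sum(math.lgamma(c) for c in counts)          # log((N_k-1)!) = lgamma(N_k)
            - sum(math.log(m - 1 + alpha) for m in range(1, n + 1)))

z, counts = sample_crp(10, alpha=1.0)
print(z, counts, log_prob_crp(counts, alpha=1.0))
```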

Page 9: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

What is QACRP?

Page 10: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Quantum annealing for CRP (QACRP)

QACRP uses multiple restaurants ($m$ restaurants). Seating arrangement of the customers in the $j$-th restaurant: $Z_j = \{z_{j,i}\}$, where $z_{j,i} = k$ means that customer $i$ sits at the $k$-th table in the $j$-th restaurant.

In the $j$-th restaurant, when customer $i$ enters a restaurant with $K$ occupied tables at which other customers are already seated, customer $i$ sits at a table with the following probability:

$p_{\mathrm{QA}}(z_{j,i} = k \mid \{Z_d\}_{d=1}^{m} \backslash \{z_{j,i}\}; \beta, \Gamma) \propto \begin{cases} \left(\dfrac{N_{j,k}}{\alpha + N - 1}\right)^{\beta/m} e^{(c^{-}_{j,k}(i) + c^{+}_{j,k}(i)) f(\beta, \Gamma)} & (k\text{-th occupied table}) \\[2mm] \left(\dfrac{\alpha}{\alpha + N - 1}\right)^{\beta/m} & (\text{new unoccupied table}) \end{cases}$

$\beta$: inverse temperature (thermal fluctuation); $\Gamma$: quantum fluctuation.

Page 11: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Quantum annealing for CRP (QACRP)

$p_{\mathrm{QA}}(z_{j,i} = k \mid \{Z_d\}_{d=1}^{m} \backslash \{z_{j,i}\}; \beta, \Gamma) \propto \begin{cases} \left(\dfrac{N_{j,k}}{\alpha + N - 1}\right)^{\beta/m} e^{(c^{-}_{j,k}(i) + c^{+}_{j,k}(i)) f(\beta, \Gamma)} & (k\text{-th occupied table}) \\[2mm] \left(\dfrac{\alpha}{\alpha + N - 1}\right)^{\beta/m} & (\text{new unoccupied table}) \end{cases}$

$c^{\pm}_{j,k}(i)$: the number of customers who sit at the $k$-th table in the $j$-th restaurant and share tables with customer $i$ in the $(j \pm 1)$-th restaurant.
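For concreteness, here is a sketch (my own naming, not the paper's code) of how the replica-coupled table weights for one customer $i$ in replica $j$ could be computed; `f_val` stands for the coupling $f(\beta, \Gamma)$ defined on the Suzuki-Trotter slide below, and `seatings` is assumed to be a list of $m$ seating arrangements with customer $i$'s entry in replica $j$ temporarily set to None.

```python
import math

def qa_table_weights(seatings, j, i, alpha, beta_over_m, f_val):
    """Unnormalized QACRP probabilities for customer i in replica j.

    seatings : list of m lists; seatings[j][i] is the table of customer i
               in replica j (None while customer i is being resampled).
    Returns one weight per occupied table plus one for a new table.
    """
    m = len(seatings)
    z_j = seatings[j]
    n = len(z_j)
    tables = sorted({t for t in z_j if t is not None})
    weights = []
    for k in tables:
        n_jk = sum(1 for t in z_j if t == k)            # N_{j,k}
        # c^- + c^+: customers at table k in replica j that share a table
        # with customer i in the neighbouring replicas j-1 and j+1
        c = 0
        for d in ((j - 1) % m, (j + 1) % m):
            z_d = seatings[d]
            c += sum(1 for cust, t in enumerate(z_j)
                     if t == k and cust != i and z_d[cust] == z_d[i])
        weights.append((n_jk / (alpha + n - 1)) ** beta_over_m
                       * math.exp(c * f_val))
    weights.append((alpha / (alpha + n - 1)) ** beta_over_m)  # new table
    return weights
```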

[Figure: three neighbouring replicas, the $(j-1)$-th, $j$-th, and $(j+1)$-th CRPs, each with its own seating arrangement of customers 1-5 over three tables.]

This conditional probability is derived in the following slides.

Page 12: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Quantum annealing for CRP (QACRP)

Bit matrix representation for the CRP. A bit matrix $B$: the adjacency matrix of the customers, where $B_{i,n} = 1$ if customers $i$ and $n$ share a table.

[Figure: an example seating of five customers and the corresponding 5x5 bit matrix $B$.]

The bit matrix is represented as $\sigma = \bigotimes_{i=1}^{N} \bigotimes_{n=1}^{N} \tilde{\sigma}_{i,n}$.

The seating arrangement can be represented by an Ising-type model with constraints (the seating conditions $\tilde{\sigma}$):

$B_{i,n} = B_{n,i}, \qquad B_{i,i} = 1 \ (i = 1, 2, \cdots, N), \qquad \forall i, \ell: \ \dfrac{B_i}{|B_i|} \cdot \dfrac{B_\ell}{|B_\ell|} = 1 \text{ or } 0$
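A small Python sketch (illustrative naming, not from the paper) that builds the bit matrix $B$ from a seating arrangement $z$ and checks the three seating conditions listed above.

```python
def bit_matrix(z):
    """B[i][j] = 1 iff customers i and j sit at the same table (so B[i][i] = 1)."""
    n = len(z)
    return [[1 if z[i] == z[j] else 0 for j in range(n)] for i in range(n)]

def satisfies_seating_conditions(B):
    """Check B = B^T, B_ii = 1, and that any two rows are identical or disjoint."""
    n = len(B)
    if any(B[i][i] != 1 for i in range(n)):
        return False
    if any(B[i][j] != B[j][i] for i in range(n) for j in range(n)):
        return False
    for i in range(n):
        for l in range(n):
            dot = sum(B[i][k] * B[l][k] for k in range(n))
            norm = (sum(B[i]) * sum(B[l])) ** 0.5
            # normalized overlap must be 1 (same table) or 0 (different tables)
            if round(dot / norm, 6) not in (0.0, 1.0):
                return False
    return True

B = bit_matrix([0, 0, 1, 0, 2])   # customers 1, 2, 4 together; 3 alone; 5 alone
print(B, satisfies_seating_conditions(B))
```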

Page 13: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Quantum annealing for CRP (QACRP)

Bit matrix representation for the CRP.

[Figure: an example 5x5 bit matrix; one row marks the customers who share a table with customer 2, and the set of states that customer 2 can take under the seating conditions is enumerated.]

Page 14: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Quantum annealing for CRP (QACRP)

Density matrix representation for the "classical" CRP:

$H_c = \mathrm{diag}\!\left[E(\sigma^{(1)}), E(\sigma^{(2)}), \cdots, E(\sigma^{(2^{N^2})})\right]$

$E(\sigma^{(\ell)}) = \begin{cases} -\ln p(\sigma^{(\ell)}) & \sigma^{(\ell)} \in \tilde{\sigma} \\ +\infty & \sigma^{(\ell)} \in \sigma \backslash \tilde{\sigma} \end{cases}$

$p(\sigma) = \dfrac{\sigma^{\mathsf T} e^{-\beta H_c} \sigma}{\sum_{\sigma} \sigma^{\mathsf T} e^{-\beta H_c} \sigma} =: \dfrac{\sigma^{\mathsf T} e^{-\beta H_c} \sigma}{Z_c}$

The seating arrangement can be represented by an Ising-type model with constraints.
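A toy numpy sketch (my own illustration, with a made-up energy table) of why the diagonal density-matrix form works: for a basis state $\sigma$ built as a Kronecker product of $(1,0)^{\top}$/$(0,1)^{\top}$ bit vectors, $\sigma^{\mathsf T} e^{-\beta H_c} \sigma$ simply picks out $e^{-\beta E(\sigma)}$.

```python
import numpy as np

# two bits only, so H_c is a 4x4 diagonal matrix (2^2 states)
e0, e1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # bit value 1 and bit value 0
energies = np.array([0.3, 1.2, np.inf, 0.7])          # E(sigma^(l)); inf = forbidden state
beta = 2.0
expHc = np.diag(np.exp(-beta * energies))             # e^{-beta H_c} for a diagonal H_c

sigma = np.kron(e0, e1)          # product basis state with index 1 (bits "1, 0")
print(sigma @ expHc @ sigma)     # equals exp(-beta * energies[1])
print(np.exp(-beta * energies[1]))
```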

Page 15: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Quantum annealing for CRP (QACRP)

Formulation for the quantum CRP:

$H = H_c + H_q$, where $H_c$: classical CRP and $H_q$: quantum fluctuation.

$p_{\mathrm{QA}}(\sigma; \beta, \Gamma) = \dfrac{\sigma^{\mathsf T} e^{-\beta (H_c + H_q)} \sigma}{\sum_{\sigma} \sigma^{\mathsf T} e^{-\beta (H_c + H_q)} \sigma}$

Classical CRP:

$p(\tilde{\sigma}_i \mid \sigma \backslash \tilde{\sigma}_i) = \dfrac{\sigma^{\mathsf T} e^{-\beta H_c} \sigma}{\sum_{\tilde{\sigma}_i} \sigma^{\mathsf T} e^{-\beta H_c} \sigma}$

$p(z_i = k \mid Z \backslash z_i; \alpha) \propto \begin{cases} \dfrac{N_k}{\alpha + N - 1} & (k\text{-th occupied table}) \\[2mm] \dfrac{\alpha}{\alpha + N - 1} & (\text{new unoccupied table}) \end{cases}$

Page 16: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Quantum annealing for CRP (QACRP)

Formulation for the quantum CRP:

$H = H_c + H_q$, where $H_c$: classical CRP and $H_q$: quantum fluctuation.

$p_{\mathrm{QA}}(\sigma; \beta, \Gamma) = \dfrac{\sigma^{\mathsf T} e^{-\beta (H_c + H_q)} \sigma}{\sum_{\sigma} \sigma^{\mathsf T} e^{-\beta (H_c + H_q)} \sigma}$

Quantum CRP:

$p_{\mathrm{QA}}(\tilde{\sigma}_i \mid \sigma \backslash \tilde{\sigma}_i; \beta, \Gamma) = \dfrac{\sigma^{\mathsf T} e^{-\beta (H_c + H_q)} \sigma}{\sum_{\tilde{\sigma}_i} \sigma^{\mathsf T} e^{-\beta (H_c + H_q)} \sigma}$

$H_q = -\Gamma \sum_{i=1}^{N} \sum_{n=1}^{N} \sigma^{x}_{i,n}, \qquad E = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad \sigma^{x} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$

Transverse field as a quantum fluctuation.
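A short numpy sketch (my own toy construction, not the paper's code) of the transverse-field term: $\sigma^x_{i,n}$ acts as the Pauli-x matrix on the $(i,n)$-th bit and as the identity $E$ on all other bits, so $H_q$ flips individual bits of the bit matrix.

```python
import numpy as np
from functools import reduce

E = np.eye(2)                         # identity on one bit
sx = np.array([[0., 1.], [1., 0.]])   # Pauli-x: flips a bit

def transverse_field(n_bits, gamma):
    """H_q = -gamma * sum_b sigma^x_b acting on n_bits bits (2^n_bits dimensional)."""
    dim = 2 ** n_bits
    Hq = np.zeros((dim, dim))
    for b in range(n_bits):
        ops = [sx if a == b else E for a in range(n_bits)]
        Hq -= gamma * reduce(np.kron, ops)
    return Hq

# toy example: a 2x2 bit matrix has 4 bits, so H_q is 16x16
print(transverse_field(n_bits=4, gamma=0.5).shape)
```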

Page 17: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Quantum annealing for CRP (QACRP)

Approximate inference for QACRP:

$p_{\mathrm{QA}}(\sigma; \beta, \Gamma) = \dfrac{\sigma^{\mathsf T} e^{-\beta (H_c + H_q)} \sigma}{\sum_{\sigma} \sigma^{\mathsf T} e^{-\beta (H_c + H_q)} \sigma} = \sum_{\sigma_j (j \geq 2)} p_{\mathrm{QA\text{-}ST}}(\sigma, \sigma_2, \cdots, \sigma_m; \beta, \Gamma) + O\!\left(\dfrac{\beta^2}{m}\right)$

$p_{\mathrm{QA\text{-}ST}}(\sigma_1, \sigma_2, \cdots, \sigma_m; \beta, \Gamma) = \prod_{j=1}^{m} \dfrac{e^{-\beta E(\sigma_j)/m} \, e^{f(\beta, \Gamma) s(\sigma_j, \sigma_{j+1})}}{Z(\beta, \Gamma)}$

$f(\beta, \Gamma) = 2 \ln \coth\!\left(\dfrac{\beta \Gamma}{m}\right)$

$s(\sigma_j, \sigma_{j+1}) = \sum_{i=1}^{N} \sum_{n=1}^{N} \delta(\tilde{\sigma}_{j,i,n}, \tilde{\sigma}_{j+1,i,n})$

$Z(\beta, \Gamma) = \left[\sinh\!\left(\dfrac{\beta \Gamma}{m}\right)\right]^{2N} \sum_{\sigma} e^{-\beta E(\sigma)/m}$

By the Suzuki-Trotter decomposition, $p_{\mathrm{QA}}$ can be approximately expressed in terms of classical CRPs.
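A minimal Python sketch (illustration only) of the two quantities that couple neighbouring replicas in the Suzuki-Trotter form: the coupling strength $f(\beta, \Gamma)$ and the bit-matrix similarity $s(\sigma_j, \sigma_{j+1})$.

```python
import math

def f_coupling(beta, gamma, m):
    """f(beta, Gamma) = 2 ln coth(beta*Gamma/m); grows as the quantum
    fluctuation Gamma is decreased, forcing the replicas to agree."""
    x = beta * gamma / m
    return 2.0 * math.log(math.cosh(x) / math.sinh(x))

def similarity(B_j, B_j1):
    """s(sigma_j, sigma_{j+1}) = number of matching entries of the two
    N x N bit matrices; equals N^2 when the matrices are identical."""
    return sum(int(a == b) for row_a, row_b in zip(B_j, B_j1)
               for a, b in zip(row_a, row_b))

print(f_coupling(beta=16.0, gamma=0.1, m=16))
print(similarity([[1, 0], [0, 1]], [[1, 1], [0, 1]]))   # -> 3
```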

Page 18: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Experiments: Network model & dataset

We consider multiple running CRPs in which $\sigma_j$ ($j = 1, \ldots, m$) indicates the seating arrangement of the $j$-th CRP and represents the $j$-th bit matrix $B_j$. We correspond $B_{j,i,n} = 1$ to $\tilde{\sigma}_{j,i,n} = (1, 0)^{\top}$ and $B_{j,i,n} = 0$ to $\tilde{\sigma}_{j,i,n} = (0, 1)^{\top}$, which means that we can represent $B_j$ as $\sigma_j$ by using Eq. (5). We derive the following theorem:

Theorem 3.1. $p_{\mathrm{QA}}(\sigma; \beta, \Gamma)$ in Eq. (10) is approximated by the Suzuki-Trotter expansion as follows:

$p_{\mathrm{QA}}(\sigma; \beta, \Gamma) = \frac{1}{Z} \sigma^{\top} e^{-\beta (H_c + H_q)} \sigma = \sum_{\sigma_j (j \geq 2)} p_{\mathrm{QA\text{-}ST}}(\sigma, \sigma_2, \ldots, \sigma_m; \beta, \Gamma) + O\!\left(\frac{\beta^2}{m}\right), \quad (15)$

where we rewrite $\sigma$ as $\sigma_1$, and

$p_{\mathrm{QA\text{-}ST}}(\sigma_1, \sigma_2, \ldots, \sigma_m; \beta, \Gamma) = \prod_{j=1}^{m} \frac{1}{Z(\beta, \Gamma)} e^{-(\beta/m) E(\sigma_j)} e^{f(\beta, \Gamma) s(\sigma_j, \sigma_{j+1})}, \quad (16)$

$f(\beta, \Gamma) = 2 \log \coth\!\left(\frac{\beta \Gamma}{m}\right), \quad (17)$

$s(\sigma_j, \sigma_{j+1}) = \sum_{i=1}^{N} \sum_{n=1}^{N} \delta(\tilde{\sigma}_{j,i,n}, \tilde{\sigma}_{j+1,i,n}), \quad (18)$

$Z(\beta, \Gamma) = \left[\sinh\!\left(\frac{\beta \Gamma}{m}\right)\right]^{2N} \sum_{\sigma} e^{-(\beta/m) E(\sigma)}. \quad (19)$

Note that $\sigma_{m+1} = \sigma_1$. The proof is given in Appendix A. Note that our derived $f$ in Eq. (17) does not include the number of classes, $K$, whereas the $f$ in existing work [12,20] is formulated by using a fixed $K$.

Eq. (15) is interpreted as follows. $p_{\mathrm{QA}}(\sigma; \beta, \Gamma)$ is approximated by marginalizing out the other states $\{\sigma_j\}_{j \geq 2}$ of $p_{\mathrm{QA\text{-}ST}}(\sigma_1, \sigma_2, \ldots, \sigma_m; \beta, \Gamma)$. As shown in Eq. (16), $p_{\mathrm{QA\text{-}ST}}(\sigma_1, \sigma_2, \ldots, \sigma_m; \beta, \Gamma)$ looks like the joint probability of the states of $m$ dependent CRPs. In Eq. (16), $e^{-(\beta/m) E(\sigma_j)}$ corresponds to the classical CRP with inverse temperature $\beta/m$, and $e^{f(\beta, \Gamma) s(\sigma_j, \sigma_{j+1})}$ indicates the quantum effect part. If $f(\beta, \Gamma) = 0$, which means the CRPs are independent, $p_{\mathrm{QA\text{-}ST}}(\sigma_1, \sigma_2, \ldots, \sigma_m; \beta, \Gamma)$ is equal to the product of the probabilities of $m$ classical CRPs. $s(\sigma_j, \sigma_{j+1})$ $(\geq 0)$ is regarded as a similarity function between the $j$-th and $(j+1)$-th bit matrices. If they are the same matrices, then $s(\sigma_j, \sigma_{j+1}) = N^2$. In Eq. (2), $\log p_{\mathrm{SA}}(\sigma_j)$ corresponds to $\log e^{-(\beta/m) E(\sigma_j)}/Z$, and the regularizer term $f \cdot R(\sigma_1, \ldots, \sigma_m)$ is $\log \prod_{j=1}^{m} e^{f(\beta, \Gamma) s(\sigma_j, \sigma_{j+1})} = f(\beta, \Gamma) \sum_{j=1}^{m} s(\sigma_j, \sigma_{j+1})$.

Note that we aim at deriving the approximate inference for $p_{\mathrm{QA}}(\tilde{\sigma}_i \mid \sigma \backslash \tilde{\sigma}_i; \beta, \Gamma)$ in Eq. (13). Using Theorem 3.1, we can derive Eq. (4) as the approximate inference. The details of the derivation are provided in Appendix B.

4. Experiments

We evaluated QA in a real application. We applied QA to a DPM model for clustering vertices in a network, where a seating arrangement of the CRP indicates a network partition.

4.1. Network model

We used the Newman model [17] for network modeling in this work. The Newman model is a probabilistic generative network model. This model is flexible, which enables researchers to analyze observed graph data without specifying the network structure (disassortative or assortative) in advance.

In an assortative network, such as a social network, the members (vertices) of each class are mostly connected to the other members of the same class. The communication between members of three social groups is illustrated in Fig. 5a, where one sees that the members generally communicate more with others in the same group than they do with those outside the group. In a disassortative network, the members (vertices) have most of their connections outside their class. An election network of supporters and candidates is illustrated in Fig. 5b, where a link indicates support for a candidate. The Newman model can model not only these two kinds of networks but also a mixture of them, such as a citation network (see Fig. 5c); however, the user must decide the number of classes in advance. We therefore used the DPM extension of the Newman model as described in Appendix C.

Fig. 5. Examples of network structures. (a) Social network (assortative network), (b) election network (disassortative network), and (c) citation network (mixture of assortative and disassortative network).
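Putting the pieces together, here is a sketch (my own illustrative loop, not the authors' code) of one annealing run under the Suzuki-Trotter approximation: the $m$ replica CRPs are updated by Gibbs sampling with the replica-coupled conditional, while $\beta$ follows an inverse-temperature schedule and the quantum fluctuation is gradually decreased. `qa_table_weights` and `f_coupling` refer to the hypothetical helpers sketched on the earlier slides; the specific schedules here are placeholder choices.

```python
import math
import random

def qa_annealing_run(n_customers, m, n_iters, alpha, beta0, gamma0,
                     qa_table_weights, f_coupling, rng=random.Random(0)):
    """One QA-ST run: Gibbs-resample every customer in every replica per iteration."""
    # start every replica with all customers at one table
    seatings = [[0] * n_customers for _ in range(m)]
    for t in range(1, n_iters + 1):
        beta = beta0 * math.sqrt(t)           # e.g. beta_0 * sqrt(t) inverse-temperature schedule
        gamma = gamma0 * (n_iters / t)        # quantum fluctuation, decreased over the run
        f_val = f_coupling(beta, gamma, m)
        for j in range(m):
            for i in range(n_customers):
                seatings[j][i] = None         # remove customer i from replica j
                w = qa_table_weights(seatings, j, i, alpha, beta / m, f_val)
                tables = sorted({z for z in seatings[j] if z is not None})
                tables.append(max(tables, default=-1) + 1)   # label for a new table
                r = rng.random() * sum(w)
                k, acc = 0, 0.0
                while k < len(w) - 1 and acc + w[k] < r:
                    acc += w[k]
                    k += 1
                seatings[j][i] = tables[k]
    return seatings
```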


Citeseer: a citation network dataset of 2110 papers.

Netscience: a coauthorship network of scientists working on network science, containing 1589 scientists.

Wikivote: a bipartite network constructed from Wikipedia administrator elections, with 7115 Wikipedia users.

Page 19: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Experiments: Annealing schedule

$m$: Trotter number (the number of replicas); $m = 16$.

We tested several schedules of the inverse temperature:

$\beta = \beta_0 \ln(1 + t), \qquad \beta = \beta_0 \sqrt{t}, \qquad \beta = \beta_0 t, \qquad \beta_0 = 0.2m,\ 0.4m,\ 0.6m,$

where $t$ is the $t$-th iteration. $\beta = 0.4m\sqrt{t}$ is a better schedule in SA (MAP estimation).

$\dfrac{\beta \Gamma}{m} = \dfrac{\beta_0 T}{t}$ is the schedule of the quantum fluctuation, where $T$ is the total number of iterations.
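A small sketch (assuming the schedules as read off this slide; the exact forms in the paper may differ) of the inverse-temperature schedules and the quantum-fluctuation schedule as Python functions of the iteration index t.

```python
import math

m = 16            # Trotter number (number of replicas)
T = 30            # total number of iterations
beta0 = 0.4 * m

def beta_log(t):    return beta0 * math.log(1 + t)
def beta_sqrt(t):   return beta0 * math.sqrt(t)      # reported as the better SA schedule
def beta_linear(t): return beta0 * t

def quantum_coupling(t):
    """beta*Gamma/m schedule: beta_0 * T / t (large early, smaller later)."""
    return beta0 * T / t

for t in (1, 10, 30):
    print(t, beta_sqrt(t), quantum_coupling(t))
```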

Page 20: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Results

[Plot: difference of log-likelihood (y-axis) vs. $K_0$ (x-axis) for the Wikivote dataset; larger is better.]

$L^{\max}_{\mathrm{Beam}}$: the maximum log-likelihood of the beam search.

$L^{\max}_{\mathrm{16SAs}}$: the maximum log-likelihood of 16 CRPs in SA.

Page 21: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Results

[Plots: difference of log-likelihood vs. $K_0$ for the Wikivote, Netscience, and Citeseer datasets.]


A better solution can be obtained by QA.

Page 22: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Results

[Plots: difference of log-likelihood vs. $K_0$ for the Wikivote, Netscience, and Citeseer datasets, each with a timing and class-count comparison.]

calc. time: SA (T=30, m=1): 13 sec; QA (T=30, m=16): 15 sec.
# classes: 16 SAs: 35; 1600 SAs: 30; beam search: 57; QA (m=16): 37.

calc. time: SA (T=30, m=1): 22 sec; QA (T=30, m=16): 25 sec.
# classes: 16 SAs: 22; 1600 SAs: 65; beam search: 61; QA (m=16): 26.

calc. time: SA (T=30, m=1): 76 sec; QA (T=30, m=16): 79 sec.
# classes: 16 SAs: 8; 1600 SAs: 8; beam search: 27; QA (m=16): 8.

Page 23: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Main Results

We considered the efficiency of the quantum annealing method for Dirichlet process mixture models. In this study, Monte Carlo simulations were performed.

- We constructed a method to apply quantum annealing to network clustering.

- Quantum annealing succeeded in obtaining a better solution than conventional methods.

- The number of classes can be changed (cf. K. Kurihara et al. and I. Sato et al., Proceedings of UAI 2009).

[Plot: difference of log-likelihood (y-axis) vs. $K_0$ (x-axis) for the Wikivote dataset; larger is better.]

Page 24: Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Thank you!

Issei Sato, Shu Tanaka, Kenichi Kurihara, Seiji Miyashita, and Hiroshi Nakagawa

Neurocomputing 121, 523 (2013)