

Exploiting Group-level Behavior Pattern for Session-based Recommendation

ZIYANG WANG, Cognitive Computing and Intelligent Information Processing (CCIIP) Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, China
WEI WEI†, Cognitive Computing and Intelligent Information Processing (CCIIP) Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, China
SHANSHAN FENG, Harbin Institute of Technology (Shenzhen), China, UAE
XIAO-LI LI, Institute for Infocomm Research, Singapore
XIAN-LING MAO, School of Computer Science and Technology, Beijing Institute of Technology, China
MINGHUI QIU, Alibaba Group, China

Session-based recommendation (SBR) is a challenging task that aims to predict users’ future interests based on anonymous behavior sequences. Existing methods leverage powerful representation learning approaches to encode sessions into a low-dimensional space. However, despite such achievements, existing studies focus on instance-level session learning while neglecting group-level user preferences, which are significant for modeling user behavior. To this end, we propose a novel Repeat-aware Neural Mechanism for Session-based Recommendation (RNMSR). In RNMSR, we propose to learn user preferences at both the instance level and the group level: (i) the instance level, which employs GNNs on a similarity-based item-pairwise session graph to capture user preferences at the instance level; (ii) the group level, which converts sessions into group-level behavior patterns to model group-level user preferences. In RNMSR, we combine instance-level and group-level user preferences to model the repeat consumption of users, i.e., whether users engage in repeated consumption and which items they prefer. Extensive experiments are conducted on three real-world datasets, i.e., Diginetica, Yoochoose, and Nowplaying, demonstrating that the proposed method consistently achieves state-of-the-art performance in all the tests.

CCS Concepts: • Information systems→ Recommender systems.

Additional Key Words and Phrases: Session-based Recommendation; Graph Neural Network; Representation Learning

ACM Reference Format:
Ziyang Wang, Wei Wei†, Shanshan Feng, Xiao-Li Li, Xian-Ling Mao, and Minghui Qiu. 2018. Exploiting Group-level Behavior Pattern for Session-based Recommendation. J. ACM 37, 4, Article 111 (August 2018), 23 pages. https://doi.org/10.1145/1122445.1122456

†Corresponding author: [email protected]

Authors’ addresses: Ziyang Wang, Cognitive Computing and Intelligent Information Processing (CCIIP) Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China; Wei Wei†, Cognitive Computing and Intelligent Information Processing (CCIIP) Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China; Shanshan Feng, Harbin Institute of Technology (Shenzhen), China, Abu Dhabi, UAE; Xiao-Li Li, Institute for Infocomm Research, Singapore, Singapore; Xian-Ling Mao, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; Minghui Qiu, Alibaba Group, Hangzhou, China.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
© 2018 Association for Computing Machinery.
0004-5411/2018/8-ART111 $15.00
https://doi.org/10.1145/1122445.1122456

J. ACM, Vol. 37, No. 4, Article 111. Publication date: August 2018.

arXiv:2012.05422v2 [cs.IR] 1 Jun 2021

111:2 Trovato and Tobin, et al.

1 INTRODUCTION
Recently, session-based recommendation (SBR) has attracted extensive attention due to its success in addressing the inaccessibility of user identification (e.g., unlogged-in users) in a wide variety of realistic recommendation scenarios, such as shopping platforms (e.g., Amazon and Tmall), takeaway platforms (e.g., Seamless and Eat24), and music listening services (e.g., Apple Music). Different from conventional recommendation methods [14, 17] that commonly rely on explicit user profiles and long-term historical interactions, the SBR task is to predict the next actions of anonymous users based on their limited temporally-ordered behavior sequences within a given time frame, ranging from several hours to several weeks, or even months [33, 34].

Most prior studies mainly focus on exploiting the sequential characteristics of anonymous user interactions for SBR. For example, Markov-Chain (MC) methods [36, 38] infer possible sequences of user choices over all items using a Markov-Chain model and then predict a user’s next action based on the last one. Recently, there have been numerous attempts at modeling the chronology of the session sequence for SBR. They usually make use of recurrent neural networks (e.g., GRU4REC [13], NARM [20]) or memory networks (e.g., STAMP [25], CSRM [44]) to extract the session’s pairwise item-transition patterns for user preference modeling. Then, graph neural

[Figure 1 omitted: panels (a) and (b) show example music-listening and take-out sessions, each mapped to an item-independent alphabet sequence such as “A→B→C→A→D→C” (repeat-heavy) or “A→B→C→D→E→F” (explore-heavy).]

Fig. 1. Example of sessions and corresponding group-level behavior patterns.

networks (GNN) methods [22, 48, 49] with self-attention mechanisms (e.g., SR-GNN [48], GCE-GNN [47]) have been proposed for SBR due to their powerful capability in learning embeddings of non-Euclidean data; they learn the representation of the entire session by computing, from the session’s pairwise item transitions, the relative importance between each item and the last one. However, the user’s preference is much more complicated than a solely consecutive time pattern



in the transition of item choices, and thus the above methods easily fall into the following two situations: i) for users who prefer to discover new items, these methods recommend items that they have already consumed; ii) they recommend new items to users who tend to re-consume items in the current session. This is because they calculate the recommendation scores relying only on the relevance between the historical items within the session and the next item to be clicked, and fail to explicitly capture repeat consumption behavior patterns (e.g., re-clicking an item repeatedly) for SBR. Indeed, repeat consumption is a critical factor for improving the performance of sequential/session-based recommendation, as it usually accounts for a large proportion of user-item interactions in many recommendation scenarios [34, 43, 55]. Thus, a significant challenge is how to effectively leverage repeat consumption behaviors to improve the performance of SBR.

Repeat consumption has not been extensively studied in recommender systems. RepeatNet [34] is the only work to explicitly model repeat consumption for SBR, and it only partially addresses this challenge, since it solely learns item embeddings at the instance level (i.e., item-dependent sessions) while ignoring behavior information across sessions at the group level (i.e., item-independent behavior sequences), which we call the group-level behavior pattern (GBP, cf. Definition 1) in this paper. We illustrate the GBP with an example in Fig. 1. Without loss of generality, suppose we have several sessions; we can simply map them into item-independent alphabet sequences according to Definition 1, which can be regarded as the GBPs for these sessions. For music listening in Fig. 1a, it can be observed that users with GBP “𝐴→𝐵→𝐶→𝐴→𝐷→𝐶” (e.g., session 1) clearly show the habit of listening to songs repeatedly, among which the user with session 1 may prefer to listen to items “A” (“For Alice”) and “C” (“Canon”) repeatedly. By contrast, users with GBP “𝐴→𝐵→𝐶→𝐷→𝐸→𝐹” (e.g., session 2) show a distinct habit and are more inclined to explore new songs. For take-out ordering in Fig. 1b, we obtain similar observations: the user with session 3 may prefer to order take-out repeatedly (e.g., “PIZZA HUT” and “STARBUCKS”), while the user with session 4 inclines to discover new restaurants. However, current methods may not work well, as they maintain the same recommendation strategy when facing different behavior sequences: i) it is hard for these methods to capture which sessions should be recommended more already-consumed items (e.g., sessions 1 and 3) and which sessions ought to be recommended more new items (e.g., sessions 2 and 4); ii) it is difficult for these methods to learn which items are more likely to be repurchased. When recommending items, these methods mainly focus on the last few items (e.g., “BURGER KING” and “STARBUCKS” in session 3) within the sessions. However, compared to items “D” and “C”, item “A” (e.g., “PIZZA HUT” in session 3) may be more likely to be repurchased in sessions with pattern “𝐴→𝐵→𝐶→𝐴→𝐷→𝐶”, according to the statistics of Yoochoose. Thus, introducing GBPs into SBR may help to solve the above two issues. Further, considering that different sessions may correspond to the same GBP, we treat all sessions corresponding to the same GBP as a group, and it is worth exploring the common properties of each group for SBR.

For ease of understanding, we conduct an empirical data analysis on Yoochoose (from RecSys Challenge 2015). The probability distribution of group-level behavior patterns is summarized in Table 1. Taking the first row as an example, “𝐴→𝐵→𝐵→𝐶→𝐶→𝐵” is a group-level behavior pattern (i.e., “Pattern 1”); (“A”, 5%), (“B”, 83%), and (New Item, 7%) denote the probability of the next item being “A”, being “B”, and being a new item (i.e., one that has not appeared in “Pattern 1”), respectively. From Table 1, we observe that: (1) GBPs exhibit different inclinations toward repeated items. For instance, “Pattern 1” (Sum, 93%) is more likely to be in repeat mode, which means that the next item is more likely to be an already-appeared item (with probability 93%). In contrast, “Pattern 7” (Sum, 22%) is more likely to be in explore mode, which indicates that a new item will occur next (with probability 78%). (2) Under the repeat mode, GBPs have different probability distributions over the repeated items. For example, in “Pattern 1”, compared with



Table 1. The probability distribution of group-level behavior patterns on Yoochoose.

                                      Repeat Mode                      Explore Mode
Group-level Behavior Pattern      A     B     C     D     E     F    Sum    New Item
1: A→B→B→C→C→B                    5%   83%    5%    /     /     /    93%      7%
2: A→B→A→C→C→B                   39%   20%   24%    /     /     /    83%     17%
3: A→B→C→A→D→C                   28%   14%   20%   14%    /     /    76%     24%
4: A→B→C→D→B→D                    8%   28%   16%   21%    /     /    73%     27%
5: A→B→C→B→D→E                    4%    8%    5%   10%    8%    /    35%     65%
6: A→B→B→C→D→E                    4%    7%    3%    6%   12%    /    32%     68%
7: A→B→C→D→E→F                    3%    2%    2%    3%    5%    7%   22%     78%

item “A” (5%), item “B” (83%) is more likely to be re-clicked, while in “Pattern 2” the item with the highest repeat probability is “A” (39%). Based on these observations, two key problems arise, namely:

• How to learn the switch probabilities between the repeat mode (i.e., re-clicking an appeared item) and the explore mode (i.e., clicking a new item);
• How to learn the inherent importance of each letter (i.e., position) in a group-level behavior pattern, i.e., which item within a GBP is more likely to be re-clicked.
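Statistics like those in Table 1 can be gathered by grouping training sessions by their GBP prefix and counting which position (letter) the next item maps to. The sketch below is ours, not the paper's code; the helper names and the toy sessions are illustrative assumptions.

```python
from collections import Counter, defaultdict

def pattern_prefix(session):
    # Relabel items as 'A', 'B', 'C', ... in order of first appearance,
    # then split off the last interaction as the "next item" to predict.
    mapping, keys = {}, []
    for item in session:
        if item not in mapping:
            mapping[item] = chr(ord('A') + len(mapping))
        keys.append(mapping[item])
    return ''.join(keys[:-1]), keys[-1]

def gbp_next_item_stats(sessions):
    """For each GBP prefix, estimate Pr(next item = letter X) for repeated
    letters and Pr(next item is new), in the spirit of Table 1."""
    counts = defaultdict(Counter)
    for s in sessions:
        prefix, next_key = pattern_prefix(s)
        bucket = next_key if next_key in set(prefix) else 'NEW'
        counts[prefix][bucket] += 1
    return {p: {k: v / sum(c.values()) for k, v in c.items()}
            for p, c in counts.items()}

stats = gbp_next_item_stats([
    ['v1', 'v2', 'v1'],   # pattern AB -> repeats item A
    ['v5', 'v6', 'v5'],   # pattern AB -> repeats item A
    ['v7', 'v8', 'v9'],   # pattern AB -> clicks a new item
])
# stats['AB'] gives A: 2/3 (repeat mode), NEW: 1/3 (explore mode)
```

Run over the full Yoochoose training set, this kind of pass is sufficient to reproduce per-pattern repeat/explore proportions of the Table 1 form.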

To address the above issues, in this paper we propose a novel unified SBR model, named RNMSR, which explicitly models SBR in a repeat–explore manner and leverages group-level behavior patterns to learn the switch probabilities and the inherent importance of each item. Specifically, it first learns session features at two levels: i) the instance level, which learns item representations with a GNN layer on a similarity-based item-pairwise session graph; ii) the group level, which employs a mapping table to convert each session to its GBP and encodes GBPs into a low-dimensional space. Then, in the repeat module, the inherent importance of each item is learned based on the attention of group-level behavior patterns over the different positions. In the explore module, a session-level representation is learned to predict the probability of each item being clicked. The final prediction is decided by a discriminate module, which employs group-level behavior patterns to compute the mode switch probabilities and combines the recommendation probabilities of each item under the two modes (i.e., repeat and explore) in a probabilistic way.

The major contributions of this work are summarized as follows:

• To the best of our knowledge, this is the first work to incorporate group-level behavior patterns to explicitly model repeat consumption behaviors for SBR. We also propose a novel SBR method to simultaneously capture the instance-level and group-level user preferences of a given session.
• For instance-level item representation learning, we propose a novel similarity-based item-pairwise session graph to capture the dependencies within the session, which better learns the item representations.
• Extensive experiments are conducted on three benchmark datasets, demonstrating the effectiveness of our proposed model, especially the improvement in top-n recommendation.

2 RELATED WORK
This study relates to two research areas: session-based recommendation and repeat consumption. Next, we present an overview of the most related work in each area.



2.1 Session-based Recommendation

Markov Chain-based SBR. Several Markov-Chain (MC) based methods (e.g., Markov Decision Processes, MDPs [38]) have been proposed for SBR. They usually incorporate sequentiality into SBR by learning the dependence of sequential items to infer the next action via a Markov Chain. For example, Rendle et al. [36] propose a model combining matrix factorization and a first-order Markov Chain to capture the transitions over user-item interactions. Shani et al. [38] view recommendation as a sequential optimization problem and employ Markov Decision Processes (MDPs) for SBR; the simplest MDP boils down to a first-order Markov Chain, where predictions are computed from the transition probabilities between adjacent items [20].

Although MC-based methods can be applied to SBR without user information, they still have limitations, as they rely heavily on short-term dependency, i.e., the sequential transitions of adjacent item pairs. In fact, users’ preferences are much more sophisticated than simple sequential item patterns in the transition of item clicks. Therefore, in this paper we study item-transition patterns by constructing a new representation of the session graph and put forward a similarity-metric function to measure the similarity of accurate short/long-term item-pairwise transition patterns (rather than sequential item transitions) via a simple permutation-invariant operation, i.e., mean pooling.

Deep learning-based SBR. Recently, deep neural networks have shown significant improvements in recommendation systems [21, 35, 51, 54] and also dominate the SBR task. In the literature, there are two major classes of deep learning-based approaches, namely RNN-based and GNN-based. First, RNN-based methods [6, 10, 20] usually take the session sequence information as the input of an RNN. Hidasi et al. [13] apply an RNN with Gated Recurrent Units (GRU) to SBR. Li et al. [20] take the user’s main purpose into account and propose NARM, a hybrid GRU encoder with an attention mechanism to model the user’s sequential behavior. To emphasize the importance of the last click in the session, Liu et al. [25] propose an attention-based short-term memory network (named STAMP) to capture the user’s current interests. With the boom of graph neural networks (GNN), GNN-based methods have attracted increasing attention in SBR. Several recent proposals [29, 32, 50] employ GNN-based models on graphs built from the current session to learn item embeddings. Wu et al. [48] employ a gated GNN to learn item embeddings from the session graph and use attention to integrate the learnt item embeddings. Qiu et al. [31] propose FGNN, which applies multi-head attention to learn each item representation, and later extend it [30] by exploiting cross-session information. Meng et al. [27] propose MKM-SR, which captures the transitions between successive items by introducing micro-behaviors and an external knowledge graph. Chen et al. [5] consider two types of information loss in the session graph. Pan et al. [28] model the transitions within the session by adding an overall node to the graph and employ a highway network to avoid overfitting. Wang et al. [47] propose GCE-GNN, which introduces global context information into SBR to capture global-level item transitions.

However, these methods, except RepeatNet [34], ignore modeling the repeat consumption pattern for SBR. Specifically, Ren et al. propose an encoder-decoder structure to model the regular habits of users, which treats sessions as the minimum granularity for learning item-dependent (i.e., instance-level) session representations; it then explicitly models user repeat consumption in a repeat–explore manner and learns the switch probabilities of re-clicking an old item or clicking a new item for recommendation. However, RepeatNet only models instance-level session learning while neglecting group-level behavior pattern learning. In contrast, in this paper we fully exploit group-level behavior patterns to learn the mode switch probabilities and the recommendation probability of each item under the two modes for SBR in a probabilistic way.



Collaborative filtering-based SBR. Although deep learning-based methods have achieved excellent performance, collaborative filtering (CF) based methods can still provide competitive results. Item-KNN [37] can be extended to SBR by recommending the items most similar to the last item of the current session. Based on KNN approaches, KNN-RNN [16] incorporates a co-occurrence-based KNN model into GRU4REC [13] to extract sequential patterns, and STAN [9] takes the position and recency information of the current and neighbor sessions into account. Wang et al. [44] propose CSRM to enrich the representation of the current session by exploring neighborhood sessions. CoSAN [26] incorporates neighborhood session embeddings into the items of the current session and employs multi-head self-attention to capture the dependencies between items. However, these methods do not fully explore the significant repeat consumption of users. Besides, the collaborative information these methods use is still item-dependent, which makes them susceptible to data sparsity and noise. In contrast, the group-level behavior pattern proposed in RNMSR is item-independent, which is effective for learning group-level user preferences and modeling the repeat consumption of users.

2.2 Repeat Consumption
Repeat consumption is a common phenomenon in many real-world scenarios, and its importance has been highlighted in various domains, including web revisitation [1, 24, 53], repeated query search [40, 41], music listening [18], information re-finding [7, 8], and consumption prediction [2, 3, 23, 43].

Early studies of repeat consumption in traditional recommendation problems can be divided into two categories: the former [2, 3] aims to predict which item the user is most likely to consume again; for example, Bhagat et al. [3] present various repeat purchase recommendation models on the Amazon.com website, which lead to a 7% increase in product click-through rate. The latter [4] focuses on whether the user will repeat consumption in the next action. Recently, Wang et al. [43] propose SLRC, which incorporates two temporal dynamics of repeat consumption (i.e., a short-term effect and a life-time effect) into kernel functions to capture users’ intrinsic preferences. Zhou et al. [55] develop an additive model of recency to exploit the temporal dynamics of behavior for repeat consumption. Hu et al. [15] emphasize the importance of frequency information in user-item interactions and explain the weakness of LSTMs in modeling repeated consumption. Different from these approaches, which rely on user profiles and long-term historical interactions, we focus on repeat consumption within anonymous sessions. Besides, these methods still focus on instance-level learning, which makes it hard to predict users’ repeat consumption behavior. In contrast, the proposed method focuses on group-level behavior patterns to learn group-level user preferences and can learn user preferences at both the instance level and the group level, which leads to better performance.

3 PRELIMINARIES
In this section, we first present the problem statement and then introduce the related concepts, i.e., the group-level behavior pattern and the mapping table, which are useful signals for modeling the repeat consumption of users across sessions.

3.1 Problem Definition
Let S = {S𝑖}^|S| be the set of all sessions and V = {𝑣𝑖}^|V| the set of all items over the sessions. Each session is denoted by S𝑖 = {𝑣𝑖1, 𝑣𝑖2, ..., 𝑣𝑖𝑛}, consisting of a sequence of 𝑛 actions (e.g., an item clicked by a user) in chronological order, where 𝑣𝑖𝑗 ∈ V is the 𝑗-th interaction of session S𝑖.

J. ACM, Vol. 37, No. 4, Article 111. Publication date: August 2018.

Exploiting Group-level Behavior Pattern for Session-based Recommendation 111:7

Given a session S, the problem of session-based recommendation is to recommend the top-𝑁 items that are most likely to be clicked by the user in the next action (i.e., 𝑣𝑛+1).

3.2 Group-level Behavior Pattern
Repeat consumption commonly appears in many recommendation scenarios (e.g., e-commerce, music, and movies) [2, 15, 43] and has also been proven effective for session-based recommendation [34]. However, existing methods mainly focus on instance-level behavior patterns, i.e., which items are more likely to be re-clicked, by learning item-dependent representations with an attention-based mechanism [34]. They do not fully explore the group-level behavior pattern, which focuses on learning item-independent representations for each item in the corresponding group and is formally defined as follows.

DEFINITION 1. (Group-level Behavior Pattern (GBP) (P𝐺)). Given any session S = {𝑣1, 𝑣2, · · · , 𝑣𝑚}, the group-level behavior pattern is an anonymous sequence that projects the items in S into a sequence (i.e., a list of length 𝑚) of keys according to a mapping table¹, i.e., 𝑓P𝐺 : S → P𝐺(S),

P𝐺(S) = (M(𝑣1), M(𝑣2), · · · , M(𝑣𝑚)).   (1)

Here, M(·) denotes a mapping table that records the number of distinct items in S when each item 𝑣𝑖 first appears in S, defined as follows.

DEFINITION 2. (Mapping Table (M)). For any session S = {𝑣1, 𝑣2, · · · , 𝑣𝑚}, M is a mapping function that assigns a key to each element 𝑣𝑖 of session S according to its order of appearance, i.e., 𝑓M : 𝑣𝑖 → M(𝑣𝑖),

M(𝑣𝑖) = |{𝑣1, 𝑣2, · · · , 𝑣𝑝}|,  𝑝 = min{ 𝑗 : 𝑣𝑗 = 𝑣𝑖 }.   (2)

Remark. The key difference between a group-level behavior pattern and the original session is that the former is an item-independent sequence. For ease of understanding, consider two sessions S𝑖 = {𝑣4, 𝑣5, 𝑣6, 𝑣4, 𝑣6} (M(𝑣4) = 𝐴, M(𝑣5) = 𝐵, M(𝑣6) = 𝐶) and S𝑗 = {𝑣7, 𝑣8, 𝑣9, 𝑣7, 𝑣9} (M(𝑣7) = 𝐴, M(𝑣8) = 𝐵, M(𝑣9) = 𝐶), which correspond to the same behavior pattern P𝐺(S𝑖) = P𝐺(S𝑗) = (𝐴, 𝐵, 𝐶, 𝐴, 𝐶).
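Definitions 1 and 2 amount to a single relabeling pass over the session: assign each item a letter the first time it appears, then replace every occurrence with its letter. A minimal sketch (the function name is ours, not from the paper):

```python
def group_level_pattern(session):
    """Map a session (list of item ids) to its item-independent
    group-level behavior pattern (Definition 1), relabeling items as
    'A', 'B', 'C', ... in order of first appearance (Definition 2)."""
    mapping = {}   # the Mapping Table M: item id -> key
    pattern = []
    for item in session:
        if item not in mapping:
            # key = number of distinct items seen once `item` first appears
            mapping[item] = chr(ord('A') + len(mapping))
        pattern.append(mapping[item])
    return ''.join(pattern)

# The two sessions from the Remark share one pattern:
assert group_level_pattern(['v4', 'v5', 'v6', 'v4', 'v6']) == 'ABCAC'
assert group_level_pattern(['v7', 'v8', 'v9', 'v7', 'v9']) == 'ABCAC'
```

Because the output depends only on the positions of repeats, any two sessions with the same repetition structure collapse into the same group, which is exactly what makes GBPs item-independent.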

4 THE PROPOSED METHOD
In this section, we first present an overview of our proposed GNN-based method, RNMSR, which is shown in Figure 2. Then we introduce each component of RNMSR in detail.

4.1 Overview
The goal of SBR is to learn a prediction function parameterized by 𝜃 that recommends the next item based on a sequence of items within a given session. To explicitly take the group-level behavior pattern into consideration, we follow the principle of RepeatNet [34] and define our SBR model in a repeat–explore manner. The parameters are optimized by maximizing the posterior probability of predicting the most likely action 𝑣𝑛+1 for a given session S = {𝑣1, 𝑣2, · · · , 𝑣𝑛},

𝜃∗ ← argmax_𝜃 Pr(𝑣𝑛+1 | S; P𝐺(S); 𝜃)
  = argmax_𝜃 ( Pr(R | S; P𝐺(S); 𝜃) Pr(𝑣𝑛+1 ∈ S | S; P𝐺(S); R; 𝜃)
  + Pr(E | S; P𝐺(S); 𝜃) Pr(𝑣𝑛+1 ∈ V − S | S; E; 𝜃) ),   (3)

where R and E denote the repeat mode and the explore mode, respectively; Pr(R | S; P𝐺(S); 𝜃) and Pr(E | S; P𝐺(S); 𝜃) indicate the probabilities of selecting the repeat mode and the explore mode

¹To present the group-level behavior patterns more clearly, in this paper we map the keys of the Mapping Table (M) from {1, 2, 3, · · · } to {𝐴, 𝐵, 𝐶, · · · }.


[Figure 2 omitted.]

Fig. 2. An overview of the proposed RNMSR framework. We first construct a similarity-based item-pairwise session graph and learn the instance-level item representations via GNNs. Then, we obtain the group-level behavior pattern representation and learn its impact on reversed position vectors to compute the scores for repeated items in the Repeat Module. The Explore Module computes scores for new items based on attention networks. Finally, the Discriminate Module leverages group-level behavior patterns to assign weights to the two kinds of scores and obtains the combined scores for all items.

over the clicked items 𝑣1:𝑛 and the corresponding group-level behavior patterns, respectively. Pr(𝑣𝑛+1 ∈ S | S; P𝐺(S); R; θ) indicates the probability of recommending a historical item (i.e., 𝑣𝑛+1 ∈ S) in the repeat mode by considering the group-level behavior patterns. Pr(𝑣𝑛+1 ∈ V − S | S; E; θ) denotes the probability of recommending a new item (i.e., 𝑣𝑛+1 ∈ V − S) under the explore mode. We detail them in the following sections.
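For intuition, the decomposition above can be sketched numerically at inference time: the final distribution is a mixture of a repeat distribution (supported only on clicked items) and an explore distribution (supported only on new items), weighted by the two mode probabilities. The following is a minimal NumPy sketch; the toy item catalogue and probability values are illustrative, not taken from the paper.

```python
import numpy as np

def combine_modes(p_mode, p_repeat, p_explore):
    """Mixture from Eq. (3): p_mode = [P(repeat), P(explore)];
    p_repeat is zero outside the session, p_explore is zero inside it."""
    return p_mode[0] * np.asarray(p_repeat) + p_mode[1] * np.asarray(p_explore)

# Toy catalogue of 5 items; the session contains items 0 and 2.
p_mode = np.array([0.7, 0.3])
p_repeat = np.array([0.6, 0.0, 0.4, 0.0, 0.0])   # mass only on clicked items
p_explore = np.array([0.0, 0.5, 0.0, 0.3, 0.2])  # mass only on new items
scores = combine_modes(p_mode, p_repeat, p_explore)
print(scores)  # a valid distribution over all 5 items
```

Because both component distributions sum to one and the mode weights sum to one, the mixture is itself a valid distribution over the whole item vocabulary.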

4.2 Instance-level Item Representation Learning

To learn the instance-level item representations, we design a novel GNN-based model, where the topology of the graph is determined by the similarity between items in the latent space.

Similarity-based Item-Pairwise Session Graph. Traditional GNN-based approaches learn node embeddings via local permutation-invariant neighborhood aggregation based on the skeleton of the session graph, which is generated according to the natural order of the session. They generally rely on the assumption that consecutive items within a session are correlated, whereas this assumption might not hold, since a session may contain irrelevant short-term item transitions and relevant long-term item transitions. Therefore, in this section, we present a new representation of the session graph.

For any session S = {𝑣𝑗}|S|, let G𝑠 = (V𝑠, E𝑠) be the corresponding directed session graph, where V𝑠 = {𝑣𝑠𝑖}|V𝑠| and E𝑠 = {𝑒𝑠𝑖𝑗}|E𝑠| denote the node set and the edge set, respectively. The 𝑘-th iteration of aggregating information from neighboring nodes can be formed as,

h_i^{N,(k)} = Agg({h_j^{(k)} | v_j ∈ N(v_i)}),    (4)

where h_j^{(k)} denotes the hidden embedding of node v_j at the k-th iteration. Agg(·) serves as the aggregation function that collects information from neighboring nodes, which can be any permutation-invariant


operation. Here, we use the mean function as the aggregation function, which empirically obtains good performance. In particular, N(𝑣𝑖) is the neighboring node set of node 𝑣𝑖, which is defined as follows,

N(v_i) = {v_j | v_i, v_j ∈ S; Sim(h_i, h_j) ≥ η},    (5)

where η is a hyper-parameter that controls the size of the neighborhoods. Sim(h_i, h_j) is a similarity-metric function for measuring the similarity of the item pair (v_i, v_j). The cosine similarity function has been proven effective [46] in modeling the relevance of two items (v_i, v_j) within a session, thus we adopt it as our item-pairwise similarity metric. In particular, the items in the original sequence are in chronological order, and thus we treat the items on the left-hand side of item v_i in the original sequence as the in-link nodes, and the items on the right-hand side of item v_i as the out-link nodes, which are denoted by →N(v_i) and ←N(v_i), respectively.
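As a concrete sketch of the neighborhood definition in Eq. (5), the neighbors of an item can be computed from pairwise cosine similarities over latent embeddings; the toy 2-d embeddings below are purely illustrative.

```python
import numpy as np

def neighbors(h, i, eta):
    """Indices j (j != i) whose cosine similarity with item i reaches the
    threshold eta, as in Eq. (5)."""
    h = np.asarray(h, dtype=float)
    unit = h / np.linalg.norm(h, axis=1, keepdims=True)  # row-normalize
    sim = unit @ unit.T                                  # pairwise cosine similarities
    return [j for j in range(len(h)) if j != i and sim[i, j] >= eta]

# Toy 2-d embeddings for a 4-item session.
h = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
print(neighbors(h, 0, eta=0.5))  # → [1, 3]
```

Raising η shrinks the neighborhood: with eta=0.99, only item 1 remains a neighbor of item 0 in this toy example.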

Item Representation Learning. Based on the built similarity-based item-pairwise session graph and aggregation function, the representations of item v_i's neighboring nodes can be obtained, i.e., h_i^{→N,(k)} and h_i^{←N,(k)} for in-link neighbor nodes and out-link neighbor nodes, respectively. Then we use a fully connected layer to transform self information and neighbor information into a new latent feature,

o_i^{(k)} = tanh(W_s h_i^{(k−1)} + W_N [h_i^{→N,(k−1)} ∥ h_i^{←N,(k−1)}] + b_N),    (6)

where W_N ∈ R^{d×2d}, W_s ∈ R^{d×d}, and b_N ∈ R^d are trainable parameters. To reduce the transmission loss, we additionally use a residual connection [11],

h_i^{(k)} = o_i^{(k)} + h_i^{all,(k−1)} + h_i^{(k−1)},    (7)

where h_i^{all,(k−1)} denotes an overall representation that further increases the influence of neighbor features during propagation, with h_i^{all,(k−1)} = Mean({h_j^{(k−1)} | v_j ∈ →N(v_i) ∪ ←N(v_i) ∪ {v_i}}).

4.3 Group-level Behavior Pattern Learning

Given a session S, we now present how to obtain its corresponding group-level behavior pattern. First, a unique mapping table M is constructed according to Definition 2, where each distinct item v_i within the session S is assigned a unique key M(v_i):

M(v_i) = f_M(v_i).    (8)

Based on the mapping table M, the session S is projected into an anonymous sequence (i.e., the group-level behavior pattern P𝐺(S)) by looking up the key of each item in the mapping table M according to Definition 1:

P𝐺(S) = f_{P𝐺}(S),    (9)

where the original session sequence S = (v_1^s, v_2^s, · · · , v_m^s) is converted into the group-level behavior pattern P𝐺(S) = (M(v_1^s), M(v_2^s), · · · , M(v_m^s)). Each group-level behavior pattern corresponds to a group of sessions, which is significant for learning the group-level preference, i.e., predicting whether users would like to conduct repeated consumption and which items within the session are preferred. However, directly representing them as sequences is insufficient to obtain the latent factors from P𝐺(S); thus it is necessary to leverage a representation approach to encode P𝐺(S) into vectors, from which we can capture the features of varying group-level behavior patterns.
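The mapping of Eqs. (8)-(9), using the A, B, C, · · · key convention of footnote 1, can be sketched as:

```python
def group_level_pattern(session):
    """Map a session to its group-level behavior pattern: each distinct item
    is assigned the next unused letter in order of first appearance (Eqs. 8-9)."""
    table = {}  # the mapping table M
    for v in session:
        if v not in table:
            table[v] = chr(ord("A") + len(table))
    return "".join(table[v] for v in session)

print(group_level_pattern(["v1", "v3", "v5", "v3", "v9"]))        # → ABCBD
print(group_level_pattern(["v6", "v8", "v8", "v2", "v2", "v8"]))  # → ABBCCB
```

Any two sessions with the same repetition structure collapse to the same pattern, which is exactly what groups them.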

Note that P𝐺(S) is an item-independent sequence, where each element of P𝐺(S) does not contain the item's features. Moreover, the order of the elements is important; for example, "𝐴→𝐵→𝐶→𝐶→𝐴" and "𝐴→𝐴→𝐵→𝐶→𝐶" represent two distinct behavior patterns. Hence we mainly focus on the representation learning of the entire group-level behavior pattern, rather than each element


within the sequence. Specifically, the whole P𝐺(S) sequence is treated as an element, which is encoded into a unique one-hot encoding 𝑢P𝐺(S). Then each 𝑢P𝐺(S) is projected into a unified low-dimensional vector using an embedding layer,

uP𝐺(S) = 𝐸𝑚𝑏𝑒𝑑𝑝𝑎𝑡𝑡𝑒𝑟𝑛(𝑢P𝐺(S)),    (10)

where 𝐸𝑚𝑏𝑒𝑑𝑝𝑎𝑡𝑡𝑒𝑟𝑛 denotes the embedding layer for encoding repeated behavior patterns, and uP𝐺(S) ∈ R^d is the representation of session S's group-level behavior pattern, from which we can capture the features of the group-level behavior pattern P𝐺(S).

4.4 Repeat-Explore Mechanism

After feeding the session into the Instance-level Item Representation Learning layer and the Group-level Behavior Pattern Learning layer, we obtain the new representation h′ for each item and the group-level behavior pattern representation uP𝐺(S) for the current session. Now we focus on how to predict the probability of each item based on h′ and uP𝐺(S).

4.4.1 Discriminate Module. The discriminate module computes the probability of executing the repeat module and the explore module. As shown in Table 1, groups with different group-level behavior patterns exhibit distinct tendencies toward repeat consumption in the next action. For example, users with pattern "𝐴→𝐵→𝐶→𝐴→𝐷→𝐶" are more likely to repeat consumption in the next action, while users with pattern "𝐴→𝐵→𝐶→𝐷→𝐸→𝐹" are inclined to explore new items. To model the user preference, we incorporate the group-level behavior pattern to enable our model to learn the group-level users' preference,

z = [uP𝐺(S) ∥ s𝑑],    (11)

where ∥ indicates concatenation and s𝑑 ∈ R^d is a fixed-length representation, which is obtained by a self-attention [42] over all items within the session,

β_i = softmax(q_d^T tanh(W_d h′_i + b_d)),
s_d = Σ_i β_i h′_i,    (12)

where W_d ∈ R^{d×d} and q_d, b_d ∈ R^d are trainable parameters. Then we use an L-layer Multi-Layer Perceptron (MLP) to extract latent condensed features from the group-level behavior patterns and item features,

z_L = FC(FC(· · · FC(z))) = FC^L(z),    (13)

where FC(z) = σ(Wz + b) is a single fully-connected layer and z_L ∈ R^d is the output of the L-layer MLP. The probability distribution is obtained through the softmax function,

[Pr(R | S; P𝐺(S); θ), Pr(E | S; P𝐺(S); θ)] = softmax(W_p z_L),    (14)

where W_p ∈ R^{2×d} is a learnable transform weight, and Pr(R | S; P𝐺(S); θ) and Pr(E | S; P𝐺(S); θ) are two scalars (i.e., 𝑤𝑒𝑖𝑔ℎ𝑡𝑟𝑒𝑝𝑒𝑎𝑡 and 𝑤𝑒𝑖𝑔ℎ𝑡𝑒𝑥𝑝𝑙𝑜𝑟𝑒 in Figure 2), representing the probability of executing the repeat module and the explore module, respectively.
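Eqs. (11)-(14) can be sketched end-to-end as below; the layer sizes and random inputs are illustrative, and σ is taken here to be the logistic sigmoid.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def discriminate(u_pattern, s_d, mlp, Wp):
    """Concatenate the pattern embedding and session feature (Eq. 11),
    pass through an L-layer MLP (Eq. 13), then softmax into mode weights (Eq. 14)."""
    z = np.concatenate([u_pattern, s_d])
    for W, b in mlp:  # FC(z) = sigma(Wz + b)
        z = 1.0 / (1.0 + np.exp(-(W @ z + b)))
    return softmax(Wp @ z)  # [weight_repeat, weight_explore]

d = 4
rng = np.random.default_rng(0)
mlp = [(rng.normal(size=(d, 2 * d)), np.zeros(d)),  # first layer maps 2d -> d
       (rng.normal(size=(d, d)), np.zeros(d))]
w = discriminate(rng.normal(size=d), rng.normal(size=d), mlp, rng.normal(size=(2, d)))
```

The softmax guarantees that the two mode weights are positive and sum to one, so they can directly weight the repeat and explore distributions in Eq. (3).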

4.4.2 Repeat Module. Here, the repeat module aims to predict the probability of items in the session being re-clicked. As mentioned in Table 1, sessions with different group-level behavior patterns (GBPs) have different probability distributions over the items that have appeared when re-consumption occurs. Hence, we learn the inherent importance of each item (i.e., position) in GBPs by incorporating GBPs into the repeat module. We now present how we leverage the group-level behavior pattern P𝐺(S) and position information in the repeat module. First, a trainable reversed position matrix P = {p_1, p_2, · · · , p_n} is used to encode the position of each item of session S into a vector according to [47], where p_1 is the vector of the first


position and corresponds to the last item v_n^s in the session sequence. Then we learn the impact of P𝐺(S) on different positions by a fully connected layer:

m_i = tanh(W_m [p_i ∥ uP𝐺(S)] + b_m),    (15)

where W_m ∈ R^{d×2d} and b_m ∈ R^d are trainable parameters. Then the learned impact vector is integrated with the item features, and the score of each item to be re-clicked is computed by an attention mechanism as follows,

score_i^r = q_r^T tanh(W_r h_i + U_r m_{n−i+1} + b_r),    (16)

where W_r, U_r ∈ R^{d×d} and q_r, b_r ∈ R^d are trainable parameters. Finally, the probability of each item is obtained by the softmax function:

Pr(v_i ∈ S | S; P𝐺(S); R; θ) = exp(score_i^r) / Σ_{v_k ∈ S} exp(score_k^r).    (17)

4.4.3 Explore Module. The explore module predicts the probability of new items being clicked. Due to the huge diversity of non-repeat sessions, P𝐺(S) has limited effect on the explore module, which means Pr(v_i ∈ V − S | S; P𝐺(S); E; θ) = Pr(v_i ∈ V − S | S; E; θ). In the explore module, a session-level representation is learned to capture the main preference of users. As each item in the sequence has a different importance to the current session, we utilize an attention mechanism to learn the importance weight of each item based on the reversed position vectors,

α_i = q_e^T tanh(W_e h_i + U_e p_{n−i+1} + b_e),    (18)

where W_e, U_e ∈ R^{d×d} and q_e, b_e ∈ R^d are trainable parameters. Then we adopt softmax to normalize the importance weights, and the session representation is computed by a weighted sum of each item's features,

s_e = Σ_i softmax(α_i) h_i,
s′_e = tanh(W_s s_e + b_s) + s_e,    (19)

where W_s ∈ R^{d×d} is a trainable parameter and s′_e ∈ R^d is the learned session representation. The final score of each item is computed by the inner product between s′_e and its own features,

score_i^e = −∞ if v_i ∈ S,  and  score_i^e = s′_e^T v_i if v_i ∈ V − S,

Pr(v_i ∈ V − S | S; E; θ) = exp(score_i^e) / Σ_{k=1}^{m} exp(score_k^e),    (20)

where −∞ denotes negative infinity.
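The masking in Eq. (20) can be sketched with NumPy: clicked items receive a score of −∞, so after the softmax they contribute exactly zero probability mass and the explore distribution covers only new items. The toy embeddings are illustrative.

```python
import numpy as np

def explore_probs(s_e, item_emb, session_items):
    """Explore-module distribution (Eq. 20): inner-product scores with
    in-session items masked to -inf before the softmax."""
    scores = item_emb @ s_e
    scores[list(session_items)] = -np.inf      # repeated items get zero probability
    e = np.exp(scores - scores[np.isfinite(scores)].max())
    return e / e.sum()

emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
p = explore_probs(np.array([1.0, 1.0]), emb, {0, 2})
print(p)  # mass only on items 1 and 3
```

Subtracting the maximum finite score before exponentiating keeps the softmax numerically stable while leaving the masked entries at exactly zero.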

4.5 Optimization

The output prediction probability of each item can be computed according to Equation 3. Our goal is to maximize the prediction probability of the ground-truth item; the loss function is defined as the cross-entropy of the prediction results:

L = − Σ_{i=1}^{|V|} y_i log(Pr(v_i | S; θ)),    (21)

where y denotes the one-hot encoding vector of the ground-truth item.


Table 2. Statistics of the used datasets.

Dataset            Yoochoose 1/64   Yoochoose 1/4   Diginetica   Nowplaying
# clicks           557,248          8,326,407       982,961      1,587,776
# items            16,766           29,618          43,097       60,417
# train sessions   369,859          5,917,745       719,470      825,304
# test sessions    55,898           55,898          60,858       89,824
avg. len.          6.16             5.71            5.12         7.42

5 EXPERIMENTS

In this section, we first present the experimental settings. Then we compare the proposed RNMSR with other comparative methods and make a detailed analysis of the experimental results.

5.1 Experimental Settings

Datasets. To evaluate the performance of our method, three representative benchmark datasets are employed, namely Yoochoose, Diginetica, and Nowplaying:

• Yoochoose²: The Yoochoose dataset is obtained from the RecSys Challenge 2015 and consists of six months of click-streams from an e-commerce website.
• Diginetica³: The Diginetica dataset comes from CIKM Cup 2016, containing anonymous transaction data within five months from an e-commerce platform, which is suitable for session-based recommendation.
• Nowplaying⁴: The Nowplaying dataset comes from music-related tweets [52] and describes the music-listening behavior sequences of users.

Following [47–49], we conduct preprocessing steps over the three datasets. First, sessions of length 1 and items appearing fewer than 5 times are filtered out on all datasets. We set the sessions of the last day (latest data) as the test data for Yoochoose, the sessions of the last week as the test data for Diginetica, and the sessions of the last two months as the test data for Nowplaying; the remaining historical data is used for training. Further, for a session s = [v_1^s, v_2^s, ..., v_n^s], we use a sequence splitting preprocessing [25, 48] to generate sequences and corresponding labels, i.e., ([v_1^s], v_2^s), ([v_1^s, v_2^s], v_3^s), ..., ([v_1^s, v_2^s, ..., v_{n−1}^s], v_n^s), for both training and testing. Since the training set of Yoochoose is extremely large, following [48], we use the most recent 1/64 and 1/4 portions of the training sequences, denoted as the "Yoochoose 1/64" and "Yoochoose 1/4" datasets, respectively. The statistics of the preprocessed datasets are summarized in Table 2.
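The sequence-splitting step above can be sketched as: every proper prefix of a session becomes an input, with the item that follows it as the label.

```python
def split_session(session):
    """Sequence splitting: ([v1], v2), ([v1, v2], v3), ..., ([v1..vn-1], vn)."""
    return [(session[:k], session[k]) for k in range(1, len(session))]

print(split_session(["v1", "v2", "v3"]))
# → [(['v1'], 'v2'), (['v1', 'v2'], 'v3')]
```

A session of length n thus contributes n − 1 training (or test) examples.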

Evaluated State-of-the-art Methods. To evaluate the performance for session-based recommendation, we compare our proposed method with nine baselines, including several state-of-the-art models:

• POP: A simple method which directly recommends the most popular items in the training set.
• Item-KNN [37]: An item-based collaborative filtering algorithm that recommends items similar to the historical items.
• FPMC [36]: A personalized Markov chain model that utilizes matrix factorization for session-based recommendation.

²https://competitions.codalab.org/competitions/11161
³http://cikm2016.cs.iupui.edu/cikm-cup/
⁴http://dbis-nowplaying.uibk.ac.at/#nowplaying

J. ACM, Vol. 37, No. 4, Article 111. Publication date: August 2018.

Exploiting Group-level Behavior Pattern for Session-based Recommendation 111:13

• GRU4Rec [13]: An RNN-based neural network which uses Gated Recurrent Units to model the sequential behavior of users.
• NARM [20]: A hybrid model which improves GRU4Rec by incorporating an attention mechanism into the RNN.
• STAMP [25]: An attention-based deep learning model which mainly uses the last item to capture the short-term interest of the user.
• RepeatNet [34]: A state-of-the-art GRU-based method which exploits a repeat-explore mechanism to model the repeat consumption of users.
• SR-GNN [48]: It employs a gated GNN layer to learn item embeddings, followed by a self-attention over the last item to obtain the session-level representation.
• GCE-GNN [47]: A state-of-the-art GNN-based model that additionally employs global context information and reversed position vectors.

Evaluation Metrics. We adopt two widely used ranking-based metrics for SBR, P@N and MRR@N, following previous work [47, 48]. The P@N score indicates the precision of the top-N recommended items. The MRR@N score is the average reciprocal rank of the correctly-recommended items in the top-N recommendation list. The MRR score is set to 0 when the rank of the ground-truth item exceeds N. In this paper, we set N = 20 for both P@N and MRR@N. In addition, we also adopt NDCG@N as a metric to fully evaluate the ranking quality of our proposed method.
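For a single test session, the two main metrics reduce to simple functions of the target's rank in the recommendation list; the dataset-level scores are their averages over all test sessions. A minimal sketch:

```python
def p_at_n(ranked, target, n=20):
    """P@N for one session: 1 if the ground-truth item is in the top-N, else 0."""
    return int(target in ranked[:n])

def mrr_at_n(ranked, target, n=20):
    """MRR@N for one session: reciprocal rank of the target, 0 if below rank N."""
    return 1.0 / (ranked.index(target) + 1) if target in ranked[:n] else 0.0

ranked = ["v9", "v4", "v7"]
print(p_at_n(ranked, "v4"), mrr_at_n(ranked, "v4"))  # → 1 0.5
```

MRR rewards placing the correct item near the top of the list, which is why it is the more sensitive of the two metrics to ranking quality.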

Implementation Details. Following previous methods [25, 48], the dimension of the latent vectors is fixed to 100, and the mini-batch size is set to 100. We keep the hyper-parameters of all evaluated methods consistent for a fair comparison. For our model, all parameters are initialized with a Gaussian distribution with a mean of 0 and a standard deviation of 0.1. We use the Adam optimizer [19] with an initial learning rate of 0.001, which decays by 0.1 after every 3 epochs, and the L2 penalty is set to 10⁻⁵. To avoid overfitting, we adopt a dropout layer [39] after the item embedding layer. The dropout ratio is searched in {0, 0.25, 0.5} and the threshold η is searched in {0, 0.1, 0.2, · · · , 0.9} on a validation set, which is a random 10% subset of the training set. Moreover, the maximum length of GBPs is set to 6; for sessions longer than 6, we obtain the GBP from the last 6 items.

5.2 Experimental Results

The experimental results of the nine baselines and our proposed method are reported in Table 3, where the best result of each column is highlighted in boldface.

Overall Comparison. From Table 3, we observe that RNMSR consistently outperforms both the traditional methods and the neural-network methods, which demonstrates the effectiveness of our proposed method. To better understand the performance of the different models, we provide thorough discussions as follows.

Among the traditional methods, the performance of POP is relatively poor, as it ignores the preference of users and simply recommends the top-N popular items. FPMC performs better than POP over the four datasets, which shows the effectiveness of using a first-order Markov chain to model session sequences, and it also indicates that the next action of a user is closely related to the last item. Compared with POP and FPMC, Item-KNN achieves better performance by computing the similarity between items, which indicates the importance of co-occurrence information in SBR. However, it fails to capture the sequential transitions between the items as it neglects the chronological order in the sessions.

Different from the traditional methods, the deep learning-based baselines obtain better performance over all datasets. GRU4Rec is the first RNN-based method for SBR, which is able to achieve similar or


Table 3. The performance of evaluated methods on four datasets.

Method      Yoochoose 1/64     Yoochoose 1/4      Diginetica         Nowplaying
            P@20    MRR@20     P@20    MRR@20     P@20    MRR@20     P@20    MRR@20
POP         7.31    1.69       1.37    0.31       1.18    0.28       2.28    0.86
Item-KNN    51.60   21.81      52.31   21.70      35.75   11.57      15.94   4.91
FPMC        45.62   15.01      51.86   17.50      22.14   6.66       7.36    2.82
GRU4Rec     60.64   22.89      59.53   22.60      30.79   8.22       7.92    4.48
NARM        68.32   28.63      69.73   29.23      48.32   16.00      18.59   6.93
STAMP       68.74   29.67      70.44   30.00      46.62   15.13      17.66   6.88
RepeatNet   70.06   30.55      70.71   31.03      48.49   17.13      18.84   8.23
SR-GNN      70.57   30.94      71.36   31.89      50.73   17.59      18.87   7.47
GCE-GNN     70.90   31.26      71.40   31.49      54.22   19.04      22.37   8.40
RNMSR       72.11*  33.01*     72.22*  33.43*     54.66*  20.00*     22.84*  10.26*
Improv.     1.7%    5.6%       1.1%    4.8%       0.8%    5.0%       2.1%    22.1%
p-value     <0.01   <0.001     <0.01   <0.001     <0.01   <0.001     <0.001  <0.001

better results than the traditional methods. The result shows the strength of RNNs in modeling sequential data. However, GRU4Rec is incapable of capturing the user's preference as it merely regards SBR as a sequence modeling task. The subsequent methods, NARM and STAMP, significantly outperform GRU4Rec over the four datasets. NARM explicitly captures the main preferences of users by incorporating an attention mechanism into the RNN, and STAMP utilizes the self-attention of the last item to consider the user's short-term preference, which makes them perform better. By considering the repeat behavior of users, RepeatNet outperforms the other RNN-based methods, which shows the importance of users' repeat consumption in SBR. However, the improvement of RepeatNet is marginal compared with other baselines, which is caused by two reasons: (i) RepeatNet captures the repeat consumption of users at the instance level and only considers item-dependent features, thus it is hard for it to accurately model the repeat consumption and user preference in SBR. (ii) RepeatNet is unable to capture the collective dependencies [48] due to its RNN architecture.

By converting every session sequence into a subgraph and encoding items within the session via GNNs, SR-GNN and GCE-GNN achieve better results than the RNN models. Specifically, SR-GNN employs a gated GNN layer to learn the collective dependencies within the session and uses self-attention to obtain the session representation. GCE-GNN explores the global context of each item from the transitions in all sessions and leverages reversed position information to learn the importance of each item. The results demonstrate the strength of GNNs in modeling the dependencies between items within a session. However, all these methods focus on instance-level session learning, while neglecting the group-level users' preference learning. Besides, the GNN layers used in SR-GNN and GCE-GNN struggle to capture long-term dependencies and are easily influenced by transmission loss.

The proposed RNMSR model outperforms all the baselines. Specifically, RNMSR outperforms the best baseline result by 0.8% - 2.1% in terms of P@20 and 5.0% - 22.1% in terms of MRR@20 on the four datasets. This is because RNMSR employs group-level behavior patterns, which split sessions into different groups to capture the preference of users at the group level. Moreover, RNMSR constructs a similarity-based item-pairwise session graph in the instance-level item representation learning layer,


Table 4. Performance of RNMSR on Yoochoose 1/64, Diginetica and Nowplaying.

(a) Performance in terms of MRR@N when N = 1, 3, 5 and 10.

Method      Yoochoose 1/64               Diginetica                   Nowplaying
            @1     @3     @5     @10     @1     @3     @5     @10     @1      @3      @5     @10
NARM        15.71  23.95  26.28  28.12   7.46   11.73  13.96  15.37   3.62    5.03    5.59   6.13
RepeatNet   17.15  25.25  27.79  29.53   9.23   13.25  14.87  16.28   5.19    6.70    7.16   7.61
SR-GNN      17.51  26.05  28.33  30.05   9.02   13.56  15.07  16.53   5.06    6.53    7.00   7.47
GCE-GNN     17.33  26.12  28.72  30.46   9.38   14.54  16.50  18.15   4.81    6.74    7.43   8.04
RNMSR       20.57  28.31  30.52  32.27   10.75  15.90  17.60  19.15   6.90    8.64    9.20   9.74
Improv.     16.9%  8.3%   6.2%   5.9%    14.6%  8.6%   6.6%   5.5%    32.94%  28.19%  23.8%  21.1%

(b) Performance in terms of NDCG@N when N = 1, 3, 5 and 10.

Method      Yoochoose 1/64               Diginetica                   Nowplaying
            @1     @3     @5     @10     @1     @3     @5     @10     @1      @3     @5     @10
NARM        15.71  26.96  31.12  35.51   7.46   13.21  16.10  19.86   3.62    6.52   7.83   9.12
RepeatNet   17.15  28.00  32.04  36.31   9.23   15.21  17.66  20.79   5.19    7.21   8.04   9.12
SR-GNN      17.51  28.98  33.14  37.36   9.02   15.07  18.01  21.59   5.06    7.03   7.87   9.00
GCE-GNN     17.33  29.27  33.53  37.73   9.38   16.49  19.55  23.39   4.81    7.40   8.65   10.14
RNMSR       20.57  30.91  34.91  39.12   10.75  17.67  20.74  24.39   6.90    9.15   10.14  11.48
Improv.     16.9%  5.6%   4.1%   3.6%    14.6%  7.1%   6.0%   4.2%    32.94%  23.6%  17.2%  13.2%

which enables the RNMSR to capture the long-term dependencies in the sessions and obtain betteritem representations.

Evaluation of Recommendation Quality. The quality of the recommended list is important for a recommendation system, as users usually only focus on the first few items in the recommended list. Thus we compare our approach with representative baseline methods using MRR@N and NDCG@N with N = 1, 3, 5 and 10 on Yoochoose 1/64, Diginetica, and Nowplaying.⁵

From Table 4, we observe that RNMSR significantly outperforms other baselines in terms ofMRR@N and NDCG@N, which indicates the high recommendation quality of RNMSR. Thepromising performance of RNMSR can be attributed to two aspects:

• Firstly, we can observe that the improvements of RNMSR in terms of P@1 and MRR@1 are substantial. This is because RNMSR uses group-level behavior patterns to determine the user's tendency between the repeat mode and the explore mode, which allows the model to effectively capture the user's preferences at the group level and generate a higher-quality recommendation list.
• Secondly, we leverage the group-level behavior patterns with position vectors to identify the items that are more likely to be repeatedly adopted by users. By incorporating GBPs into the repeat module, we can learn the inherent importance of each item and improve the quality of the recommendation list.

⁵In Table 4, we omit the results of RNMSR on Yoochoose 1/4, because the trends of RNMSR's performance on Yoochoose 1/4 and Yoochoose 1/64 are consistent.


Table 5. The ablation study. IIRL is the instance-level item representation learning layer, SSG denotes the similarity-based session graph, and GBP refers to the group-level behavior pattern.

Method          Yoochoose 1/64     Yoochoose 1/4      Diginetica         Nowplaying
                P@20    MRR@20     P@20    MRR@20     P@20    MRR@20     P@20    MRR@20
(1) w/o IIRL    71.08   32.37      71.52   32.59      53.06   19.82      21.24   9.30
(2) w/o SSG     71.78   32.73      71.94   33.09      54.26   19.94      22.48   9.97
(3) w/o GBP-r   71.94   32.48      71.89   32.81      54.34   19.40      22.70   10.01
(4) w/o GBP-d   71.91   32.36      71.81   32.70      54.18   18.86      22.19   9.48
(5) w/o GBP     71.90   31.72      71.88   32.26      54.14   18.38      22.01   9.26
(6) RNMSR       72.11   33.01      72.22   33.43      54.66   20.00      22.84   10.26

5.3 Ablation and Effectiveness Analyses

To investigate the effectiveness of the proposed group-level behavior patterns and similarity-based session graphs, we conduct the ablation studies reported in Table 5. Specifically, we investigate five variants of RNMSR to compare with the original RNMSR.

• In (1), we remove the instance-level item representation learning layer from RNMSR and use the original item features.
• In (2), we construct the session graph as in previous studies [48, 49].
• In (3), we remove the group-level behavior pattern from the repeat module.
• In (4), we remove the group-level behavior pattern from the discriminate module.
• In (5), we remove the group-level behavior pattern learning layer from RNMSR.
• In (6), the overall RNMSR model is presented.

From the results presented in Table 5, we have the following observations. First, we can observe an obvious improvement from (1) to (6); the results validate the effectiveness of the proposed instance-level item representation learning layer, where the mean-pooling-based GNN is powerful for extracting features from sessions and improves the performance of the model. Second, the comparison between (2) and (6) indicates that using our similarity-based item-pairwise session graphs can enhance the model performance, which shows that the natural order in the session is not essential for the session graph. Our proposed similarity-based item-pairwise session graph thus presents a new perspective for obtaining the topological structure of the session graph in the latent space. Third, by comparing (3) and (6), we can observe that incorporating the group-level behavior pattern into the repeat module improves the performance of RNMSR. It shows that GBPs can lead the model to learn the importance of each item in the session, which demonstrates that GBPs are important when predicting the repeat consumption behavior of users. Fourth, by comparing (4) and (6), we can observe that incorporating group-level behavior patterns into the discriminate module can greatly improve the performance, especially in terms of MRR@20. It confirms the strength of GBPs in learning the switch probabilities between the repeat mode and the explore mode. Lastly, from (5) and (6), the results show that using group-level behavior patterns in both the repeat module and the discriminate module can further improve the performance of RNMSR, which also indicates the importance of repeat consumption and the effectiveness of GBPs in SBR.

5.4 Impact of the Length of GBPs

In RNMSR, we set the maximum length of GBPs to 6 for two reasons: (i) it can be observed from Table 2 that the average session length ranges from 5.12 to 7.42 over the four datasets; (ii) the number of



Fig. 3. The impact of the length of GBPs in terms of MRR@20.

distinct GBPs grows factorially with the length of the session. As the GBP plays an important role in RNMSR, it is necessary to examine the impact of the length of GBPs. In this section, we evaluate the performance of RNMSR when the maximum length of GBPs is set from 1 to 8 on the four datasets. The experimental results are shown in Fig. 3, from which we have the following observations:

• When the maximum length of GBPs is set to 1, it is equivalent to removing the GBP from RNMSR, as there is only one pattern "A" left, and the performance of the model is not satisfactory. By increasing the length of GBPs to 2 or 3, the performance of RNMSR on the four datasets is significantly improved, which demonstrates that even short GBPs (e.g., "𝐴→𝐵→𝐴" and "𝐴→𝐵→𝐶") can have an obvious effect on RNMSR. This shows that GBPs make RNMSR more robust in handling sessions of different lengths.
• RNMSR obtains the best performance when the maximum length of GBPs is set to 6 on Yoochoose 1/64 and to 7 on Diginetica. However, we can observe that the performance of RNMSR deteriorates when the length is further increased. This may be because, as the maximum length of GBPs grows, the number of GBPs increases while the average number of sessions per GBP decreases, which makes it difficult for the model to capture the features of different GBPs. Thus, setting the maximum GBP length to 6-7 is a suitable choice for most datasets in the SBR task.
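The rapid growth of the pattern space can be checked by brute force. Under the canonical relabelling of footnote 1, the distinct GBPs of a given length are exactly the canonical forms of all possible item sequences; enumerating them (a sketch, assuming this canonicalization) yields the sequence 1, 2, 5, 15, 52, ..., confirming how quickly the pattern space explodes with session length.

```python
from itertools import product

def count_patterns(length):
    """Count distinct group-level behavior patterns of a given length by
    enumerating all sequences over `length` items and collecting their
    canonical relabellings (first-appearance order, as in footnote 1)."""
    def canon(seq):
        table = {}
        for v in seq:
            table.setdefault(v, len(table))
        return tuple(table[v] for v in seq)
    return len({canon(seq) for seq in product(range(length), repeat=length)})

print([count_patterns(n) for n in range(1, 6)])  # → [1, 2, 5, 15, 52]
```

This explosion is exactly why longer maximum GBP lengths leave fewer sessions per pattern, making each pattern's preference harder to learn.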



Fig. 4. The impact of threshold [ in terms of Recall@20.

5.5 Impact of Threshold η

η is an important hyper-parameter for controlling the topological structure of the similarity-based item-pairwise session graph. The number of neighbors of each node in the graph increases when η is set smaller, which means each node can obtain more effective context information while introducing more noise during the GNN process. A larger η means that each node tends to incorporate less neighbor information and to preserve more of its own features during the propagation of the GNN. To better evaluate the impact of η on the proposed method, extra experiments are conducted on Yoochoose, Diginetica, and Nowplaying.⁶

The results are shown in Figure 4. It can be observed that when η is close to 1, the performance of RNMSR becomes worse on all datasets, as each item has only a few neighbors and loses the contextual information within the session. Moreover, the model does not perform well when η is set close to 0 on the Diginetica dataset, because too much connection noise is introduced. The model achieves satisfactory performance when η is set from 0 to 0.2 on Yoochoose 1/64 and Yoochoose 1/4, and from 0.3 to 0.5 on Diginetica and Nowplaying, respectively.

6 The range of the similarity weight 𝑠𝑖𝑚(𝑖, 𝑗) is [−1, 1] as it is calculated by the cosine similarity function, and we evaluate the impact of η from 0 to 1.0.
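As a rough sketch of how η shapes the graph topology discussed above, one can threshold the pairwise cosine similarities of item embeddings within a session; the function below is an illustrative assumption about this construction, not the paper's exact implementation:

```python
import numpy as np

def build_session_graph(item_emb, eta=0.3):
    """Sketch of a similarity-based item-pairwise session graph:
    connect two items of a session when the cosine similarity of
    their embeddings exceeds the threshold eta (range [-1, 1]).
    item_emb: (n_items, d) array of embeddings for one session.
    Returns a binary adjacency matrix with self-loops kept, so
    each node also retains its own features during propagation."""
    norm = item_emb / np.linalg.norm(item_emb, axis=1, keepdims=True)
    sim = norm @ norm.T                   # pairwise cosine similarities
    adj = (sim > eta).astype(float)       # smaller eta -> denser graph
    np.fill_diagonal(adj, 1.0)            # always keep self-connections
    return adj
```

With a small (or negative) η almost every item pair is connected, giving rich context but more noise; raising η prunes edges until each node mostly keeps its own features.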


[Figure 5: three example behavior patterns (𝐀 → 𝐁 → 𝐀 → 𝐂 → 𝐂 → 𝐁, 𝐀 → 𝐁 → 𝐂 → 𝐀 → 𝐃 → 𝐂, and 𝐀 → 𝐁 → 𝐁 → 𝐂 → 𝐃 → 𝐄) shown with their attention weights in the repeat module and their 𝑤𝑒𝑖𝑔ℎ𝑡𝑟𝑒𝑝𝑒𝑎𝑡 / 𝑤𝑒𝑖𝑔ℎ𝑡𝑛𝑒𝑤 attention weights in the discriminate module.]

Fig. 5. Visual results of RNMSR on Yoochoose 1/4.

5.6 Visualization of RNMSR

As we only use the target items as labels in the loss function to update the parameters θ, and there is no explicit constraint on the learned weights (e.g., 𝑤𝑒𝑖𝑔ℎ𝑡𝑟𝑒𝑝𝑒𝑎𝑡 and 𝑤𝑒𝑖𝑔ℎ𝑡𝑛𝑒𝑤) or the learned embeddings (e.g., u_{P_G(S)}), it is worth exploring whether RNMSR has captured the hidden features of group-level behavior and learned the inherent importance of each item within the session. Thus, in this section, we provide visual results to analyze the effect of GBP in detail. Here we choose three example patterns from Table 1 (i.e., pattern 2, pattern 3, and pattern 6) on Yoochoose 1/4. To avoid the impact of item features and better isolate the effect of GBP, the session representations (i.e., s𝑑 in Eq. 11) and item representations (i.e., h in Eq. 16) are set to zero vectors during the visualization experiments. The visual results are presented in Figure 5, from which we have the following observations:

• Firstly, from the attention weights in the repeat module we can observe that different behavior patterns have different probability distributions over the items within the session, and the attention weights RNMSR learns are consistent with the statistical results in Table 1. For example, the item with the highest repeating probability in pattern "𝐴 → 𝐵 → 𝐴 → 𝐶 → 𝐶 → 𝐵" is "A" (39%) in Table 1, and in the repeat module of RNMSR, item "A" obtains the highest attention weight (0.41) compared to item "B" (0.25) and item "C" (0.34). This shows that RNMSR can learn the inherent importance of each item in the repeat module by incorporating the group-level behavior pattern representations and position vectors.

• Secondly, from the attention weights in the discriminate module it can be observed that different behavior patterns show different attention weights on the repeat mode and the explore mode. Specifically, for pattern "𝐴 → 𝐵 → 𝐴 → 𝐶 → 𝐶 → 𝐵", RNMSR pays more attention to 𝑤𝑒𝑖𝑔ℎ𝑡𝑟𝑒𝑝𝑒𝑎𝑡 (0.72), while it assigns a higher weight to 𝑤𝑒𝑖𝑔ℎ𝑡𝑛𝑒𝑤 (0.84) for pattern "𝐴 → 𝐵 → 𝐵 → 𝐶 → 𝐷 → 𝐸". Correspondingly, in Table 1, pattern "𝐴 → 𝐵 → 𝐴 → 𝐶 → 𝐶 → 𝐵" has a high probability of being in repeat mode (83%), while pattern "𝐴 → 𝐵 → 𝐵 → 𝐶 → 𝐷 → 𝐸" is more likely to be in explore mode (68%). This demonstrates that RNMSR has captured the hidden features of different group-level behavior patterns, which is effective for predicting the switch probabilities between the repeat mode and the explore mode.


Table 6. Performance of RNMSR with different dropout rates on four datasets.

Dropout rate | Yoochoose 1/64  | Yoochoose 1/4   | Diginetica      | Nowplaying
             | P@20   MRR@20   | P@20   MRR@20   | P@20   MRR@20   | P@20   MRR@20
p = 0        | 71.90  32.89    | 72.22  33.43    | 54.03  19.67    | 22.32  10.09
p = 0.25     | 72.11  33.01    | 71.93  32.97    | 54.66  20.00    | 22.84  10.26
p = 0.5      | 71.94  32.73    | 71.30  32.22    | 54.33  20.06    | 22.54   9.85
p = 0.75     | 70.61  31.69    | 68.99  29.98    | 51.83  19.73    | 20.49   8.87

5.7 Impact of dropout rate

Dropout is an effective technique to prevent overfitting [12, 45]. Its core idea is to randomly remove neurons from the network with a certain probability 𝑝 during training, while employing all neurons for testing.
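A minimal NumPy sketch of this train/test behavior (shown here as inverted dropout, which rescales the surviving neurons by 1/(1−p); the paper does not specify its exact variant):

```python
import numpy as np

def dropout(x, p=0.25, training=True, rng=None):
    """Inverted dropout: during training each neuron is zeroed with
    probability p and survivors are rescaled by 1/(1-p), so the
    expected activation is unchanged; at test time all neurons are used."""
    if not training or p == 0.0:
        return x                              # identity at inference time
    rng = rng if rng is not None else np.random.default_rng(0)
    mask = rng.random(x.shape) >= p           # keep each unit with prob 1-p
    return x * mask / (1.0 - p)
```

Because of the rescaling, no weight adjustment is needed when switching between training and testing.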

Table 6 shows the impact of the dropout rate on RNMSR over four datasets. It can be observed that RNMSR obtains the best performance when the dropout rate is set to 0 on Yoochoose 1/4, while for the other three datasets RNMSR performs better when the dropout rate is set to 0.25. This is because the training sets of Yoochoose 1/64, Diginetica, and Nowplaying are smaller than that of Yoochoose 1/4, which means RNMSR is more prone to overfitting on these datasets. By setting 𝑝 to 0.25 or 0.5, dropout can prevent RNMSR from overfitting and obtain better performance. The performance of RNMSR starts to deteriorate when the dropout rate is set to 0.75 on all four datasets, as it is hard to learn with limited available neurons. Thus it is suitable to set the dropout rate to 0.25 when the dataset is small, and to 0 when the dataset is large enough.

6 CONCLUSION

In this paper, we study the problem of SBR, which is a practical yet challenging task. By introducing group-level behavior patterns into SBR, we present a new perspective for learning the group-level preference from sessions. Specifically, we incorporate group-level behavior patterns into the discriminate module to learn the switch probabilities between the repeat mode and the explore mode, and the inherent importance of each item is learned by combining group-level behavior patterns and position vectors. Moreover, we propose an instance-level item representation learning layer, where a similarity-based item-pairwise session graph is built to capture the dependencies within the session. Based on the instance-level item representation learning layer and the group-level behavior patterns, RNMSR is able to model the behavior of users at both the instance level and the group level.

We conduct extensive experiments on the Yoochoose, Diginetica, and Nowplaying datasets to validate the effectiveness of the proposed RNMSR. First, the proposed model outperforms state-of-the-art baselines (e.g., RepeatNet, SR-GNN, and GCE-GNN) in terms of Recall@N, MRR@N, and NDCG@N. Second, we validate the impact of each component through an ablation study, where the experimental results show that each component contributes to the improvement of RNMSR. As GBP is an important component of RNMSR, we also conduct an experiment on the impact of the length of GBPs, where we find that even short GBPs are effective for SBR. Furthermore, a visualization experiment is conducted to explore whether RNMSR has captured the latent features of various GBPs.

In this work, we mainly focus on the effect of GBPs on repeat consumption behavior. In future work, we will explore the impact of GBPs on the explore module by further mining the common preferences of user groups on new items. Besides, in this paper we treat each group-level behavior pattern as a separate individual, and a promising direction is to model the relevance of different


GBPs to better learn the features of various GBPs. We also plan to introduce more graph neuralnetwork techniques into SBR to better obtain the item representations.

ACKNOWLEDGMENTS

This work was supported in part by the National Natural Science Foundation of China under Grant No. 61602197 and Grant No. 61772076.

REFERENCES

[1] Eytan Adar, Jaime Teevan, and Susan T. Dumais. 2008. Large scale analysis of web revisitation patterns. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 1197–1206.
[2] Ashton Anderson, Ravi Kumar, Andrew Tomkins, and Sergei Vassilvitskii. 2014. The dynamics of repeat consumption. In Proceedings of the 23rd International Conference on World Wide Web. 419–430.
[3] Rahul Bhagat, Srevatsan Muralidharan, Alex Lobzhanidze, and Shankar Vishwanath. 2018. Buy it again: Modeling repeat purchase recommendations. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 62–70.
[4] Jun Chen, Chaokun Wang, and Jianmin Wang. 2015. Will you "reconsume" the near past? Fast prediction on short-term reconsumption behaviors. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 29.
[5] Tianwen Chen and Raymond Chi-Wing Wong. 2020. Handling information loss of graph neural networks for session-based recommendation. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1172–1180.
[6] Wanyu Chen, Fei Cai, Honghui Chen, and Maarten de Rijke. 2019. A dynamic co-attention network for session-based recommendation. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 1461–1470.
[7] David Elsweiler, Mark Baillie, and Ian Ruthven. 2011. What makes re-finding information difficult? A study of email re-finding. In European Conference on Information Retrieval. Springer, 568–579.
[8] David Elsweiler, Morgan Harvey, and Martin Hacker. 2011. Understanding re-finding behavior in naturalistic email interaction logs. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. 35–44.
[9] Diksha Garg, Priyanka Gupta, Pankaj Malhotra, Lovekesh Vig, and Gautam Shroff. 2019. Sequence and time aware neighborhood for session-based recommendations: STAN. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1069–1072.
[10] Cheng Guo, Mengfei Zhang, Jinyun Fang, Jiaqi Jin, and Mao Pan. 2020. Session-based recommendation with hierarchical leaping networks. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1705–1708.
[11] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.
[12] Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 639–648.
[13] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2016. Session-based recommendations with recurrent neural networks. In ICLR.
[14] Cheng-Kang Hsieh, Longqi Yang, Yin Cui, Tsung-Yi Lin, Serge Belongie, and Deborah Estrin. 2017. Collaborative metric learning. In Proceedings of the 26th International Conference on World Wide Web. 193–201.
[15] Haoji Hu, Xiangnan He, Jinyang Gao, and Zhi-Li Zhang. 2020. Modeling personalized item frequency information for next-basket recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1071–1080.
[16] Dietmar Jannach and Malte Ludewig. 2017. When recurrent neural networks meet the neighborhood for session-based recommendation. In Proceedings of the Eleventh ACM Conference on Recommender Systems. 306–310.
[17] Santosh Kabbur, Xia Ning, and George Karypis. 2013. FISM: Factored item similarity models for top-N recommender systems. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 659–667.
[18] Komal Kapoor, Karthik Subbian, Jaideep Srivastava, and Paul Schrater. 2015. Just in time recommendations: Modeling the dynamics of boredom in activity streams. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining. 233–242.
[19] Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).


[20] Jing Li, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Tao Lian, and Jun Ma. 2017. Neural attentive session-based recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. 1419–1428.
[21] Xingchen Li, Xiang Wang, Xiangnan He, Long Chen, Jun Xiao, and Tat-Seng Chua. 2020. Hierarchical fashion graph network for personalized outfit recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 159–168.
[22] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2016. Gated graph sequence neural networks. In ICLR.
[23] Moshe Lichman and Padhraic Smyth. 2018. Prediction of sparse user-item consumption rates with zero-inflated Poisson regression. In Proceedings of the 2018 World Wide Web Conference. 719–728.
[24] Jie Liu, Chun Yu, Wenchang Xu, and Yuanchun Shi. 2012. Clustering web pages to facilitate revisitation on mobile devices. In Proceedings of the 2012 ACM International Conference on Intelligent User Interfaces. 249–252.
[25] Qiao Liu, Yifu Zeng, Refuoe Mokhosi, and Haibin Zhang. 2018. STAMP: Short-term attention/memory priority model for session-based recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1831–1839.
[26] Anjing Luo, Pengpeng Zhao, Yanchi Liu, Fuzhen Zhuang, Deqing Wang, Jiajie Xu, Junhua Fang, and Victor S. Sheng. 2020. Collaborative self-attention network for session-based recommendation. In IJCAI.
[27] Wenjing Meng, Deqing Yang, and Yanghua Xiao. 2020. Incorporating user micro-behaviors and item knowledge into multi-task learning for session-based recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1091–1100.
[28] Zhiqiang Pan, Fei Cai, Wanyu Chen, Honghui Chen, and Maarten de Rijke. 2020. Star graph neural networks for session-based recommendation. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 1195–1204.
[29] Zhiqiang Pan, Fei Cai, Yanxiang Ling, and Maarten de Rijke. 2020. Rethinking item importance in session-based recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1837–1840.
[30] Ruihong Qiu, Zi Huang, Jingjing Li, and Hongzhi Yin. 2020. Exploiting cross-session information for session-based recommendation with graph neural networks. ACM Transactions on Information Systems (TOIS) 38, 3 (2020), 1–23.
[31] Ruihong Qiu, Jingjing Li, Zi Huang, and Hongzhi Yin. 2019. Rethinking the item order in session-based recommendation with graph neural networks. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 579–588.
[32] Ruihong Qiu, Hongzhi Yin, Zi Huang, and Tong Chen. 2020. GAG: Global attributed graph neural network for streaming session-based recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 669–678.
[33] Massimo Quadrana, Alexandros Karatzoglou, Balázs Hidasi, and Paolo Cremonesi. 2017. Personalizing session-based recommendations with hierarchical recurrent neural networks. In Proceedings of the Eleventh ACM Conference on Recommender Systems. 130–137.
[34] Pengjie Ren, Zhumin Chen, Jing Li, Zhaochun Ren, Jun Ma, and Maarten de Rijke. 2019. RepeatNet: A repeat aware neural recommendation machine for session-based recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 4806–4813.
[35] Ruiyang Ren, Zhaoyang Liu, Yaliang Li, Wayne Xin Zhao, Hui Wang, Bolin Ding, and Ji-Rong Wen. 2020. Sequential recommendation with self-attentive multi-adversarial network. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 89–98.
[36] Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2010. Factorizing personalized Markov chains for next-basket recommendation. In Proceedings of the 19th International Conference on World Wide Web. 811–820.
[37] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web. 285–295.
[38] Guy Shani, David Heckerman, Ronen I. Brafman, and Craig Boutilier. 2005. An MDP-based recommender system. Journal of Machine Learning Research 6, 9 (2005).
[39] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15, 1 (2014), 1929–1958.
[40] Jaime Teevan, Eytan Adar, Rosie Jones, and Michael Potts. 2006. History repeats itself: Repeat queries in Yahoo's logs. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 703–704.
[41] Sarah K. Tyler and Jaime Teevan. 2010. Large scale query log analysis of re-finding. In Proceedings of the Third ACM International Conference on Web Search and Data Mining. 191–200.


[42] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NIPS.
[43] Chenyang Wang, Min Zhang, Weizhi Ma, Yiqun Liu, and Shaoping Ma. 2019. Modeling item-specific temporal dynamics of repeat consumption for recommender systems. In The World Wide Web Conference. 1977–1987.
[44] Meirui Wang, Pengjie Ren, Lei Mei, Zhumin Chen, Jun Ma, and Maarten de Rijke. 2019. A collaborative session-based recommendation approach with parallel memory modules. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 345–354.
[45] Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019. Neural graph collaborative filtering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 165–174.
[46] Xiao Wang, Meiqi Zhu, Deyu Bo, Peng Cui, Chuan Shi, and Jian Pei. 2020. AM-GCN: Adaptive multi-channel graph convolutional networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1243–1253.
[47] Ziyang Wang, Wei Wei, Gao Cong, Xiao-Li Li, Xian-Ling Mao, and Minghui Qiu. 2020. Global context enhanced graph neural networks for session-based recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 169–178.
[48] Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. 2019. Session-based recommendation with graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 346–353.
[49] Chengfeng Xu, Pengpeng Zhao, Yanchi Liu, Victor S. Sheng, Jiajie Xu, Fuzhen Zhuang, Junhua Fang, and Xiaofang Zhou. 2019. Graph contextualized self-attention network for session-based recommendation. In IJCAI.
[50] Feng Yu, Yanqiao Zhu, Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. 2020. TAGNN: Target attentive graph neural networks for session-based recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1921–1924.
[51] Fajie Yuan, Xiangnan He, Haochuan Jiang, Guibing Guo, Jian Xiong, Zhezhao Xu, and Yilin Xiong. 2020. Future data helps training: Modeling future contexts for session-based recommendation. In Proceedings of The Web Conference 2020. 303–313.
[52] Eva Zangerle, Martin Pichl, Wolfgang Gassler, and Günther Specht. 2014. #nowplaying music dataset: Extracting listening behavior from Twitter. In Proceedings of the First International Workshop on Internet-Scale Multimedia Management.
[53] Haimo Zhang and Shengdong Zhao. 2011. Measuring web page revisitation in tabbed browsing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 1831–1834.
[54] Sijin Zhou, Xinyi Dai, Haokun Chen, Weinan Zhang, Kan Ren, Ruiming Tang, Xiuqiang He, and Yong Yu. 2020. Interactive recommender system via knowledge graph-enhanced reinforcement learning. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 179–188.
[55] Xin Zhou, Zhu Sun, Guibing Guo, and Yuan Liu. 2020. Modelling temporal dynamics and repeated behaviors for recommendation. In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 181–193.
