A Pattern Mining Approach to Study Strategy Balance in RTS Games The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation Bosc, Guillaume; Tan, Philip; Boulicaut, Jean-Francois; Raissi, Chedy and Kaytoue, Mehdi. “A Pattern Mining Approach to Study Strategy Balance in RTS Games.” IEEE Transactions on Computational Intelligence and AI in Games 9, no. 2 (June 2017): 123–132 © 2017 Institute of Electrical and Electronics Engineers (IEEE) As Published https://doi.org/10.1109/TCIAIG.2015.2511819 Publisher Institute of Electrical and Electronics Engineers (IEEE) Version Author's final manuscript Citable link http://hdl.handle.net/1721.1/110366 Terms of Use Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.



IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, VOL. 9, NO. 2, JUNE 2017 123

A Pattern Mining Approach to Study Strategy Balance in RTS Games

Guillaume Bosc, Philip Tan, Jean-François Boulicaut, Chedy Raïssi, and Mehdi Kaytoue

Abstract—Whereas the purest strategy games, such as Go and Chess, seem timeless, the lifetime of a video game is short, influenced by popular culture, trends, boredom, and technological innovations. Even the large budgets and development efforts committed by publishers cannot guarantee a timeless success. Instead, novelties and corrections are proposed to extend an inevitably bounded lifetime. Novelties can unexpectedly break the balance of a game, as players can discover unbalanced strategies that developers did not take into account. In the new context of electronic sports, an important challenge is to be able to detect game balance issues. In this paper, we consider real-time strategy (RTS) games and present an efficient pattern mining algorithm as a basic tool for game balance designers that enables one to search for unbalanced strategies in historical data through a knowledge discovery in databases (KDD) process. We experiment with our algorithm on historical data from StarCraft II, a game played professionally as an electronic sport.

Index Terms—Mining methods and analysis, video game.

I. INTRODUCTION

THE recent and fast development of the video game industry has been catalyzed by technological innovations, a democratized access to connected electronic devices, new economic models (free games where users may pay for extra content), and, more recently, competitive gaming (esports) and video game live streaming platforms [1]. People not only enjoy playing, but also enjoy learning from watching others perform, as a daily leisure activity [2], [3].

Producing a video game is an expensive process; thus, keys to a massive, immediate, and durable success are sought. Pragmatically, however, one attempts to extend the game lifetime after the release by correcting bugs, introducing new features, and considering user feedback. Whereas it could easily be argued that bugs are not acceptable after a release, it is hard to predict the results of human creativity in the rich environments that video games are.

Fortunately, companies have realized, in the current big data atmosphere, that the tremendous amounts of game behavioral data they store are valuable for facing many new challenges, such as detecting unexpected situations and bugs [4], detecting cheaters [5], designing artificial agents [6], improving matchmaking systems, and adjusting game difficulty [7]. Analyzing these massive sets of historical data by means of visualization, machine learning, and data mining techniques is at the heart of video game analytics for enhancing user experience and extending game lifetime [4].

Manuscript received February 07, 2015; revised December 16, 2015; accepted December 21, 2015. Date of publication December 23, 2015; date of current version June 14, 2017. This work was supported in part by the MI CNRS Mastodons program.

G. Bosc, J.-F. Boulicaut, and M. Kaytoue are with the Université de Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205, Villeurbanne F-69621, France (e-mail: [email protected]).

P. Tan is with the Game Lab, Massachusetts Institute of Technology (MIT), Cambridge, MA 02139 USA.

C. Raïssi is with LORIA, CNRS, Inria NGE, Université de Lorraine, Vandoeuvre-lès-Nancy F-54506, France.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCIAIG.2015.2511819

This context motivates our work: behavioral data can help to study the balance of a game, that is, to adjust the rules over time while still enabling novel rules to counter boredom. This is especially important for games played as an electronic sport, and also for game lifetime extension in general. We focus on balance, a core concept in competitive game design: it consists of defining and tuning the basic rules that prevent extreme situations, thus balancing fairness and competitive aspects.

In this paper, we define and mine patterns in historical game data for a better understanding of balance issues in real-time strategy (RTS) games. The rationale behind the balanced sequential pattern discovery problem is the following. Consider a set of games, each of them represented by a sequence of actions of two players (thus entailing the players' strategies). The problem is to find patterns, as sequence generalizations, that frequently occur in the historical data and whose balance is given by the proportions of their wins and losses. Our goal is twofold: 1) we give the basic algorithmic tools that enable efficient pattern mining; and 2) we show that the extracted patterns reveal interesting knowledge.

1) We revisit the problem of strategy elicitation from two-player RTS games by differentiating two cases: when both players have access to a) different game actions and b) the same game actions. In the first case, we show how existing pattern mining methods can, with slight modifications, discover frequent strategies and compute a balance measure. In the second case, the most general one, existing approaches fail: we propose an original algorithm, BALANCESPAN.

2) We show through experiments that BALANCESPAN is scalable and able to discover patterns of interest in a large StarCraft II data set that can help to detect balance issues. For that matter, we anchor our algorithm in a knowledge discovery in databases (KDD) process [8]. Pattern mining is one of the many steps of this interactive and iterative process guided by an expert of the data domain who selects and interprets the patterns [9], [10].

The paper is organized as follows. Section II recalls the basics of sequential pattern mining before the introduction of our mining problem in Section III. Our method is developed in Sections IV and V. Algorithms are designed (Section VI) and experimented with on esports data (Section VII) before overviewing related work and concluding.

1943-068X © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

TABLE I
SEQUENCE DATABASE D

II. PRELIMINARIES

We recall the basic definitions of frequent sequential patterns [11] and emerging patterns [12] that are used in the sequel. Let $\mathcal{I}$ be a finite set of items. Any nonempty subset $X \subseteq \mathcal{I}$ is called an itemset. A sequence $s = \langle X_1, \ldots, X_l \rangle$ is an ordered list of $l > 0$ itemsets; $l$ is the length of the sequence, whereas $\sum_{i=1}^{l} |X_i|$ is its size. Considering $\mathcal{I}$ as a set of events (or actions), an itemset denotes simultaneous events, while the order between two itemsets indicates a strict precedence relation. A sequence database $\mathcal{D}$ is a set of $|\mathcal{D}|$ sequences over $\mathcal{I}$. Sequences may have different lengths and sizes and are uniquely identified; see Table I (omitting the third column).

Definition (Subsequence): A sequence $s = \langle X_1, \ldots, X_{l_s} \rangle$ is a subsequence of a sequence $s' = \langle X'_1, \ldots, X'_{l_{s'}} \rangle$, denoted $s \sqsubseteq s'$, if there exist indices $1 \le j_1 < j_2 < \cdots < j_{l_s} \le l_{s'}$ such that $X_1 \subseteq X'_{j_1}, X_2 \subseteq X'_{j_2}, \ldots, X_{l_s} \subseteq X'_{j_{l_s}}$.

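Definition (Support and Frequency): The support of a sequence $s$ in a database $\mathcal{D}$ is $\mathit{sup}(s, \mathcal{D}) = \{s' \in \mathcal{D} \mid s \sqsubseteq s'\}$. Its frequency is $\mathit{freq}(s, \mathcal{D}) = |\mathit{sup}(s, \mathcal{D})| / |\mathcal{D}|$.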

Problem (Frequent Sequential Pattern Mining): Given a minimal frequency threshold $0 < \sigma \le 1$, the problem is to find all sequences $s$ such that $\mathit{freq}(s, \mathcal{D}) \ge \sigma$.
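The definitions above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: sequences are modeled as lists of Python sets, the function names are ours, and the database `DB` is an assumed reconstruction of Table I, inferred from the ⟨a⟩-projected database quoted later in the PREFIXSPAN example.

```python
# Assumed reconstruction of Table I (the table body is not reproduced in this text).
DB = [
    [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],  # s1
    [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],              # s2
    [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],       # s3
    [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],          # s4
]

def is_subsequence(s, t):
    """True if the itemsets of s are included, in order, in distinct itemsets of t."""
    pos = 0
    for itemset in s:
        # Greedily take the earliest itemset of t that contains the current one.
        while pos < len(t) and not itemset <= t[pos]:
            pos += 1
        if pos == len(t):
            return False
        pos += 1
    return True

def support(s, db):
    """Indices of the sequences of db that contain s."""
    return {i for i, t in enumerate(db) if is_subsequence(s, t)}

def frequency(s, db):
    return len(support(s, db)) / len(db)

abc = [{"a"}, {"b"}, {"c"}]
acc = [{"a"}, {"c"}, {"c"}]
print(sorted(support(abc, DB)))  # [0, 3], i.e., s1 and s4
print(frequency(acc, DB))        # 0.75
```

Under this assumed database, the values agree with the example given below for ⟨abc⟩ and ⟨acc⟩. The greedy left-to-right matching is sufficient here because choosing the earliest matching itemset never prevents a later match.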

In some cases, each sequence of $\mathcal{D}$ is associated with a class label. Let $\mathit{class} : \mathcal{D} \to \{+,-\}$ be a mapping that associates with each sequence a positive or negative label (hence two classes). $\mathcal{D}$ is accordingly partitioned into two databases, with the positive (respectively, negative) sequences $\mathcal{D}^+$ (respectively, $\mathcal{D}^-$), such that $\mathcal{D} = \mathcal{D}^+ \cup \mathcal{D}^-$ and $\mathcal{D}^+ \cap \mathcal{D}^- = \emptyset$. The growth rate characterizes the discriminating power of a pattern for one class [12], [13].

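Definition (Growth Rate): Given a sequence database $\mathcal{D} = \mathcal{D}^+ \cup \mathcal{D}^-$, the growth rate of a sequential pattern $s$ from $\mathcal{D}^x$ to $\mathcal{D}^y$ ($x \ne y$ and $x, y \in \{+,-\}$) is given by

$$\mathit{growth\_rate}(s, \mathcal{D}^x, \mathcal{D}^y) = \frac{|\mathit{sup}(s, \mathcal{D}^x)|}{|\mathcal{D}^x|} \times \frac{|\mathcal{D}^y|}{|\mathit{sup}(s, \mathcal{D}^y)|}.$$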

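Example: Let $\mathcal{D} = \{s_1, s_2, s_3, s_4\}$ with $\mathcal{I} = \{a, b, c, d, e, f, g\}$ be the sequence database given in Table I. For brevity, we drop commas and braces for singletons. For a given sequence $s = \langle abc \rangle$, we have $s \sqsubseteq s_1$, $s \sqsubseteq s_4$, $s \not\sqsubseteq s_2$, and $s \not\sqsubseteq s_3$. With $\sigma = 3/4$, $\langle acc \rangle$ is frequent, and $\langle a\{bc\}a \rangle$ is not. We have $\mathit{growth\_rate}(\langle cb \rangle, \mathcal{D}^-, \mathcal{D}^+) = (2/2) \times (2/1) = 2$, i.e., $\langle cb \rangle$ is twice as frequent in class $-$ as in class $+$.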
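The growth-rate computation of this example can be reproduced with the short sketch below. Two things are assumptions rather than facts from the paper: the database is the same reconstruction of Table I as inferred from the PREFIXSPAN example, and the class labels in `LABELS` are hypothetical, chosen only so that they are consistent with the counts stated in the example (the third column of Table I is not reproduced here). Returning infinity when the pattern is absent from the second class is a common convention for emerging patterns, not something the text specifies.

```python
# Assumed reconstruction of Table I; LABELS is a hypothetical class column.
DB = [
    [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],  # s1
    [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],              # s2
    [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],       # s3
    [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],          # s4
]
LABELS = ["+", "+", "-", "-"]

def is_subsequence(s, t):
    pos = 0
    for itemset in s:
        while pos < len(t) and not itemset <= t[pos]:
            pos += 1
        if pos == len(t):
            return False
        pos += 1
    return True

def growth_rate(s, db, labels, x, y):
    """Growth rate of pattern s from class x to class y."""
    dx = [t for t, c in zip(db, labels) if c == x]
    dy = [t for t, c in zip(db, labels) if c == y]
    sup_x = sum(is_subsequence(s, t) for t in dx)
    sup_y = sum(is_subsequence(s, t) for t in dy)
    if sup_y == 0:
        # Convention (assumption): infinite if s occurs only in class x.
        return float("inf") if sup_x > 0 else 0.0
    return (sup_x / len(dx)) * (len(dy) / sup_y)

cb = [{"c"}, {"b"}]
print(growth_rate(cb, DB, LABELS, "-", "+"))  # 2.0
```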

TABLE II
INTERACTION SEQUENCE DATABASE R

III. THE PROBLEM OF STRATEGY ELICITATION

A zero-sum game, or competitive interaction, can be modeled as a sequence of actions performed by two players where exactly one player wins (no ties). Such a sequential game can be represented as a sequence of sets of actions, each action being performed by one of the two players. When both players play in real time, one can describe these interactions as sequences of itemsets. An itemset is then a set of actions that are simultaneous or that occur within an insignificant interval of time.

Definition [Interaction (Sequence) Database]: Given a set of players $\mathit{Players}$ and a set of actions $\mathit{Actions}$, a sequence database $\mathcal{R}$ is called an interaction database. Each sequence denotes one single game, i.e., an interaction between two players, and is defined over the set of items $\mathcal{I} = \mathit{Actions} \times \mathit{Players}$. A mapping $\mathit{class} : \mathcal{R} \to \mathit{Players}$ gives the winner of each interaction.

Example: In Table II, $s_1$ can be interpreted as: “Player $p_1$ did action $a$, then he did $b$ and $c$ while player $p_2$ did $c$, and finally $p_1$ did $d$ while $p_2$ did $a$. At the end, the player $p_1$ wins.”

Given an interaction database, the problem is to find sequences of actions of both interacting players (supposing that those actions are mutually dependent) as generalizations that appear frequently, and to characterize their discriminating ability for a win or a loss through a so-called balance measure. In the current sequential pattern mining settings, the goal is to find frequent subsequences of actions (i.e., strategies) and their balance (a growth-rate-like measure). However, the notion of class has to be revisited to be able to handle winner and loser class labels, instead of the winning player. Indeed, intuitively, mining emerging patterns from an interaction database with the winning players as classes (as given in Table II) does not fulfill our objectives: we wish to discriminate victories and not the victorious players themselves. As such, existing emerging sequential pattern mining methods and algorithms cannot be used to answer our problem.

We propose to differentiate two cases of interaction databases: 1) nonmirror interaction databases, where both players have different (nonintersecting) sets of available actions; and 2) mirror interaction databases, where both players can perform the same actions. We show that in the first case, emerging patterns as introduced in the literature (Section II) are able to answer the problem by slightly modifying the interaction database representation. In the second case, the most general one, we need new settings, and we propose to embed the class (positive or negative) in the definition of the items of a sequence; see Table IV. This is formalized in the next two sections, and it enables the design of efficient pattern mining algorithms in Section VI.


TABLE III
NONMIRROR INTERACTION DATABASE

TABLE IV
SIGNED INTERACTION DATABASE

IV. BALANCED PATTERNS IN NONMIRROR DATABASES

In this section, we consider an interaction database, called a nonmirror database, where the set of actions is different for each of the players in a single interaction. It means that we only have two types of players in each sequence and in the whole database (e.g., Protoss and Zerg factions in the RTS game StarCraft II), and these types are determined by the actions they can do. As such, the type can also be used as a winning class label. To characterize balanced patterns in such databases, we consider a simple transformation of the original interaction database $\mathcal{R}$: we drop the player associated with each action and label each sequence by the type of the winner. This makes it possible to express a balance measure as a growth-rate measure in this new data representation. The transformed database is then formally defined as follows.

Definition (Transformed Interaction Database): A sequence database $\mathcal{T}$ defined over the set of items (actions) $I_1 \cup I_2$ such that $I_1 \cap I_2 = \emptyset$, together with a mapping $\mathit{class} : \mathcal{T} \to \{I_1, I_2\}$, is called a transformed interaction database.

Consider an interaction $s \in \mathcal{T}$ where the winner is characterized by the actions $I_1$: we have $\mathit{class}(s) = I_1$, which gives the winner of the interaction. This reduces the problem of finding frequent balanced interaction patterns to the well-known emerging pattern settings. Indeed, consider an arbitrary pattern $s$ over $I_1 \cup I_2$: its support in the whole database, $\mathit{sup}(s, \mathcal{T})$, tells us its frequency, while $\mathit{sup}(s, \mathcal{T}^{I_1})$ and $\mathit{sup}(s, \mathcal{T}^{I_2})$ make it possible to define a balance measure as a growth rate.

Problem (Mining Balanced Patterns From Nonmirror Interactions): Let $\mathcal{T}$ be a transformed database obtained from a nonmirror interaction database $\mathcal{R}$. $\mathcal{T}$ is defined over $I_1 \cup I_2$, where $I_k$ represents the type $k$ of player ($k \in \{1, 2\}$), and $\mathit{class} : \mathcal{T} \to \{I_1, I_2\}$ assigns to any sequence its winner type. Let $\sigma$ be a minimum frequency threshold. The problem is to extract the set of so-called frequent balanced patterns $F_t$ such that, for any $s \in F_t$, $\mathit{freq}(s, \mathcal{T}^{I_1}) \ge \sigma$ and $\mathit{freq}(s, \mathcal{T}^{I_2}) \ge \sigma$ (implying $\mathit{freq}(s, \mathcal{T}) \ge \sigma$), and to compute the balance measure given by

$$\mathit{balance}(s, \mathcal{T}^{I_k}) = \frac{|\mathit{sup}(s, \mathcal{T}^{I_k})|}{|\mathit{sup}(s, \mathcal{T}^{I_1})| + |\mathit{sup}(s, \mathcal{T}^{I_2})|}.$$

Remark: The balance measure is a normalized version of the growth rate given in the previous section, such that $\mathit{balance}(\cdot) \in \,]0, 1]$ and $\mathit{balance}(s, \mathcal{T}^{I_1}) + \mathit{balance}(s, \mathcal{T}^{I_2}) = 1$, which entails a zero-sum game property.

Example: Table III gives a transformed interaction database $\mathcal{T}$, obtained from a nonmirror interaction database $\mathcal{R}$, with $I_1 = \{a, b\}$ and $I_2 = \{c\}$ being the sets of actions of each player type. With $\sigma = 0.2$, $s = \langle \{ab\}\{c\} \rangle$ is a frequent balanced pattern since $\mathit{freq}(s, \mathcal{T}^{I_1}) = 2/3$ and $\mathit{freq}(s, \mathcal{T}^{I_2}) = 1/2$. Moreover, $\mathit{balance}(s, \mathcal{T}^{I_1}) = 2/3$ and $\mathit{balance}(s, \mathcal{T}^{I_2}) = 1/3$. In other words, $s$ leads to a win twice as often for the type 1 player as for the type 2 player.
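As a quick check of this example, the two balance values can be recomputed from the support counts. We assume here that the stated frequencies 2/3 and 1/2 correspond to 2 supporting sequences out of 3 games won by type 1 and 1 supporting sequence out of 2 games won by type 2, which is the smallest database consistent with the numbers in the text.

```latex
\begin{align*}
\mathit{balance}(s, \mathcal{T}^{I_1}) &= \frac{|\mathit{sup}(s, \mathcal{T}^{I_1})|}{|\mathit{sup}(s, \mathcal{T}^{I_1})| + |\mathit{sup}(s, \mathcal{T}^{I_2})|} = \frac{2}{2 + 1} = \frac{2}{3}, \\
\mathit{balance}(s, \mathcal{T}^{I_2}) &= \frac{1}{2 + 1} = \frac{1}{3}, \\
\mathit{balance}(s, \mathcal{T}^{I_1}) + \mathit{balance}(s, \mathcal{T}^{I_2}) &= 1.
\end{align*}
```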

V. BALANCED PATTERNS IN MIRROR DATABASES

In this section, we consider interaction sequence databases where the players have access to the same set of actions. Consequently, the latter cannot be partitioned into two sets and the previous approach does not apply. We propose a new interaction database representation: signed interaction databases. It makes it possible to define frequent balanced patterns from an arbitrary interaction database (either mirror or nonmirror).

Definition (Signed Interaction Database): Recall that $\mathit{Actions}$ is a finite set of actions shared by both players. We introduce $\mathcal{I}_s = \mathit{Actions} \times \{+,-\}$, denoting actions associated either with a positive class or with a negative class. A signed database $\mathcal{S}$ is built from an interaction database $\mathcal{R}$ as follows: each action of an interaction sequence is signed $+$ if it is performed by the winner and signed $-$ if it is performed by the loser (both players and class labels are dropped).

Definition (Dual of an Item, an Itemset, and a Sequence): Let $\mathcal{I}_s = \mathit{Actions} \times \{+,-\}$ be the set of signed items, or actions. For any $(a, c) \in \mathcal{I}_s$, also written $a^c$, we define its dual as

$$\mathit{dual}(a, c) = \mathit{dual}(a^c) = (a, \{+,-\} \setminus c) = a^{\{+,-\} \setminus c}.$$

Informally, it means that the dual of a signed action is the same action where the class $c$ has changed. This definition is simply propagated to itemsets and sequences of itemsets: for any $X \subseteq \mathcal{I}_s$ and any sequence $s = \langle X_1, X_2, \ldots \rangle$ over $\mathcal{I}_s$,

$$\mathit{dual}(X) = \{\mathit{dual}(x), \forall x \in X\}$$
$$\mathit{dual}(s) = \langle \mathit{dual}(X_1), \mathit{dual}(X_2), \ldots \rangle.$$

Example: In Table IV, we have $\mathcal{I}_s = \{a, b, c, d\} \times \{+,-\}$, $\mathit{dual}(a^+) = a^-$, and $\mathit{dual}(s_1) = \langle a^-\{b^-c^+\} \rangle$.
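The signing step and the dual operator can be sketched as follows. This is a minimal illustration with names of our own choosing (`sign`, `dual_item`, `dual`). The interaction used as input is the sequence $s_1$ as it is described in words in the Table II example; the second check uses $s_1$ of Table IV, which we infer from the dual quoted above.

```python
# s1 as described in the Table II example: itemsets of (action, player) pairs.
s1 = [
    {("a", "p1")},
    {("b", "p1"), ("c", "p1"), ("c", "p2")},
    {("d", "p1"), ("a", "p2")},
]
winner = "p1"

def sign(interaction, winner):
    """Replace each player by '+' (winner) or '-' (loser)."""
    return [
        frozenset((action, "+" if player == winner else "-")
                  for action, player in itemset)
        for itemset in interaction
    ]

def dual_item(item):
    action, cls = item
    return (action, "-" if cls == "+" else "+")

def dual(sequence):
    """Flip the sign of every item of every itemset."""
    return [frozenset(dual_item(x) for x in itemset) for itemset in sequence]

signed = sign(s1, winner)
print(signed)
print(dual(signed))
```

Applying `dual` twice returns the original signed sequence, which is why it is enough to store only one pattern of each pair.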

These definitions now allow us to naturally introduce a balance measure that would, for a sequential pattern $s$, give the proportion of its support among the support of both itself and its dual.

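Definition (Balance Measure): Let $s$ be a frequent sequential pattern in a database $\mathcal{S}$. The balance measure of $s$ is

$$\mathit{balance}(s) = \frac{|\mathit{sup}(s, \mathcal{S})|}{|\mathit{sup}(s, \mathcal{S})| + |\mathit{sup}(\mathit{dual}(s), \mathcal{S})|}. \qquad (1)$$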


This intuitive definition, however, does not hold. Since actions are shared by the two players, both a subsequence and its dual may occur in a single sequence $s \in \mathcal{S}$. Consider the following example: $\mathcal{S} = \{\langle \{a^+b^-\}\{a^-b^+\} \rangle, \langle \{a^-b^+\} \rangle\}$ with $\sigma = 1/2$. The sequence $s = \langle \{a^+b^-\} \rangle$ is a frequent sequential pattern, and $|\mathit{sup}(s, \mathcal{S})| = 1$. We also have $|\mathit{sup}(\mathit{dual}(s), \mathcal{S})| = |\mathit{sup}(\langle \{a^-b^+\} \rangle, \mathcal{S})| = 2$. Hence, $\mathit{balance}(s) = 1/(1 + 2) = 1/3$. However, since $s$ and $\mathit{dual}(s)$ both appear in the first sequence, this sequence should not be counted twice. This leads us to the definition of the balance measure in the general case, in which we ignore the sequences where both a pattern and its dual appear.

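Definition (Generalized Balance Measure): For a sequential pattern $s$, the generalized balance measure is given by

$$\mathit{balance}_{gen}(s) = \frac{|\mathit{sup}(s, \mathcal{S}) \setminus \mathit{sup}(\mathit{dual}(s), \mathcal{S})|}{|\mathit{sup}(s, \mathcal{S}) \mathbin{\triangle} \mathit{sup}(\mathit{dual}(s), \mathcal{S})|} \qquad (2)$$

where $\triangle$ denotes the exclusive union $A \mathbin{\triangle} B = (A \cup B) \setminus (A \cap B)$. In the following, balance will always refer to the general version. We have $\mathit{balance}(s) \in [0, 1]$ and $\mathit{balance}(s) + \mathit{balance}(\mathit{dual}(s)) = 1$, which expresses a zero-sum game property.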
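To make the difference between (1) and (2) concrete, the sketch below evaluates both measures on the two-sequence counterexample given above. It is an illustration under our own naming (`naive_balance`, `generalized_balance`), and returning `None` when the exclusive union is empty is our choice: the text does not say how that case is handled.

```python
def is_subsequence(s, t):
    pos = 0
    for itemset in s:
        while pos < len(t) and not itemset <= t[pos]:
            pos += 1
        if pos == len(t):
            return False
        pos += 1
    return True

def dual(s):
    flip = {"+": "-", "-": "+"}
    return [frozenset((a, flip[c]) for a, c in itemset) for itemset in s]

def support(s, db):
    return {i for i, t in enumerate(db) if is_subsequence(s, t)}

def naive_balance(s, db):
    """Measure (1): counts a sequence twice if it holds both s and dual(s)."""
    sup_s, sup_d = support(s, db), support(dual(s), db)
    return len(sup_s) / (len(sup_s) + len(sup_d))

def generalized_balance(s, db):
    """Measure (2): ignores sequences containing both s and dual(s)."""
    sup_s, sup_d = support(s, db), support(dual(s), db)
    denom = len(sup_s ^ sup_d)  # exclusive union (symmetric difference)
    return len(sup_s - sup_d) / denom if denom else None

# The counterexample of the text: items are (action, sign) pairs.
S = [
    [{("a", "+"), ("b", "-")}, {("a", "-"), ("b", "+")}],
    [{("a", "-"), ("b", "+")}],
]
s = [frozenset({("a", "+"), ("b", "-")})]

print(naive_balance(s, S))              # 1/3, as computed in the text
print(generalized_balance(s, S))        # 0.0
print(generalized_balance(dual(s), S))  # 1.0
```

On this example the first sequence contains both the pattern and its dual, so it is discarded by (2); only the second sequence remains, and it supports the dual alone.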

Problem (Mining Balanced Patterns From Signed Interactions): Let $\mathcal{S}$ be a signed interaction database defined over $\mathcal{I}_s$, generated from an interaction database $\mathcal{R}$, and let $\sigma$ be a minimum frequency threshold. The problem is to extract the set of so-called frequent balanced patterns $F_s$ such that, for any $s \in F_s$, $\mathit{freq}(s, \mathcal{S}) \ge \sigma$ and $\mathit{freq}(\mathit{dual}(s), \mathcal{S}) \ge \sigma$, and to compute the balance measure given by (2). Furthermore, the fact that both $s$ and $\mathit{dual}(s)$ have to be frequent leads to redundant information: it is enough to keep $s$ along with its support, its balance measure, and the size of the support intersection $\mathit{common}(s) = |\mathit{sup}(s, \mathcal{S}) \cap \mathit{sup}(\mathit{dual}(s), \mathcal{S})|$ to know the measures of its dual. As such, the problem is also to compute a nonredundant collection of patterns $F_s$ where, if $s \in F_s$, then $\mathit{dual}(s) \notin F_s$.

Example: Table IV gives a signed interaction database $\mathcal{S}$ obtained from an arbitrary $\mathcal{R}$. With $\sigma = 1/4$, $s = \langle a^+c^- \rangle$ appears twice and its dual appears only once; hence, $\mathit{balance}(s) = 2/3$.

Remark: Any interaction database, mirror or nonmirror, can be represented as a signed interaction database. For the nonmirror case, one can easily prove that for any balanced pattern $s$, we have $\mathit{common}(s) = 0$, and thus (1) holds.
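The last claim of this remark can be spelled out in a few lines. Writing $A = \mathit{sup}(s, \mathcal{S})$ and $B = \mathit{sup}(\mathit{dual}(s), \mathcal{S})$, and $\triangle$ for the exclusive union, the generalized measure (2) reduces to (1) as soon as the two supports are disjoint. The shorthand $A$, $B$ is ours and is only used for this derivation.

```latex
\begin{align*}
A \cap B = \emptyset
  \;&\Longrightarrow\; A \setminus B = A
  \quad\text{and}\quad A \mathbin{\triangle} B = (A \cup B) \setminus \emptyset = A \cup B, \\
  \;&\Longrightarrow\; |A \mathbin{\triangle} B| = |A| + |B|, \\
  \;&\Longrightarrow\; \mathit{balance}_{gen}(s) = \frac{|A \setminus B|}{|A \mathbin{\triangle} B|}
      = \frac{|A|}{|A| + |B|} = \mathit{balance}(s).
\end{align*}
```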

VI. ALGORITHMS

We present several algorithms to extract frequent balanced patterns from interaction databases. We first introduce a well-known framework for mining frequent sequential patterns, called PATTERN GROWTH, and its associated algorithm PREFIXSPAN [11].

A. The PREFIXSPAN Algorithm

Given a sequence database $\mathcal{D}$ over items $\mathcal{I}$ and a minimum frequency threshold $\sigma$, PREFIXSPAN outputs all frequent sequential patterns and only them [11]. First, the database $\mathcal{D}$ is scanned once to find all the frequent items from $\mathcal{I}$, called size-1 sequential patterns. Second, each of these prefixes is used to divide the search space: for one prefix, say $\langle a \rangle$ (with $a \in \mathcal{I}$), one retains only the sequences of $\mathcal{D}$ containing $a$ and keeps, for each of them, only the longest suffix starting with $a$. The set of these suffixes is called the projected database with respect to prefix $\langle a \rangle$, written $\mathcal{D}|_{\langle a \rangle}$. Third, this projected database is scanned to generate the size-2 sequential patterns having $\langle a \rangle$ as prefix. The process is applied recursively, leading to a tree structure in which each node represents a frequent sequential pattern (associated with a projected database of at least $\lceil \sigma \times |\mathcal{D}| \rceil$ sequences) and each edge represents an extension: the item added to a size-$k$ sequential pattern to generate a size-$(k + 1)$ sequential pattern. For a prefix $s$ and an item $a$, two kinds of extensions are considered: appending $a$ as a new suffix itemset of $s$, written $s \circ_s a$, and appending $a$ within the last itemset of $s$, written $s \circ_i a$ ($\circ$ denotes an extension in general). At the end, the pattern tree structure is explored and each node outputs a pattern.

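Example: We briefly illustrate PREFIXSPAN on the toy example of Table I with $\sigma = 0.5$. A larger example is available in the original publication of PREFIXSPAN [11]. The first step of PREFIXSPAN consists of finding the frequent items of $\mathcal{D}$: $\langle a \rangle$ ($|\mathit{sup}(\langle a \rangle, \mathcal{D})| = 4$), $\langle b \rangle$ ($|\mathit{sup}(\langle b \rangle, \mathcal{D})| = 4$), $\langle c \rangle$ ($|\mathit{sup}(\langle c \rangle, \mathcal{D})| = 4$), $\langle d \rangle$ ($|\mathit{sup}(\langle d \rangle, \mathcal{D})| = 3$), $\langle e \rangle$ ($|\mathit{sup}(\langle e \rangle, \mathcal{D})| = 3$), and $\langle f \rangle$ ($|\mathit{sup}(\langle f \rangle, \mathcal{D})| = 3$). For each of these frequent sequential patterns $s$ of size 1, PREFIXSPAN projects $\mathcal{D}$ into a projected database with prefix $s$. Thus, the $\langle a \rangle$-projected database $\mathcal{D}|_{\langle a \rangle}$ is composed of four sequences: $\langle \{abc\}\{ac\}d\{cf\} \rangle$, $\langle \{\_d\}c\{bc\}\{ae\} \rangle$, $\langle \{\_b\}\{df\}cb \rangle$, and $\langle \{\_f\}cbc \rangle$. Then, PREFIXSPAN searches for frequent sequential patterns of size 2: $\langle aa \rangle$ ($|\mathit{sup}(\langle aa \rangle, \mathcal{D}|_{\langle a \rangle})| = 2$), $\langle ab \rangle$ ($|\mathit{sup}(\langle ab \rangle, \mathcal{D}|_{\langle a \rangle})| = 4$), $\langle \{ab\} \rangle$ ($|\mathit{sup}(\langle \{ab\} \rangle, \mathcal{D}|_{\langle a \rangle})| = 2$), $\langle ac \rangle$ ($|\mathit{sup}(\langle ac \rangle, \mathcal{D}|_{\langle a \rangle})| = 4$), $\langle ad \rangle$ ($|\mathit{sup}(\langle ad \rangle, \mathcal{D}|_{\langle a \rangle})| = 2$), and $\langle af \rangle$ ($|\mathit{sup}(\langle af \rangle, \mathcal{D}|_{\langle a \rangle})| = 2$). Then, for each of these sequential patterns of size 2, PREFIXSPAN creates the projected databases and extracts frequent patterns of size 3, and so on.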
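The first two steps of this example (frequent-item scan, then projection on ⟨a⟩) can be sketched as below. This is not a full PREFIXSPAN: it only covers the initial scan and a single projection, the function names are ours, and `DB` is again the assumed reconstruction of Table I inferred from the projected database quoted in the example. The `rest` component plays the role of the "_" itemset and assumes that items inside an itemset are ordered alphabetically.

```python
from collections import Counter
from math import ceil

# Assumed reconstruction of Table I.
DB = [
    [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],  # s1
    [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],              # s2
    [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],       # s3
    [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],          # s4
]

def item_supports(db):
    """Number of sequences in which each item occurs at least once."""
    counts = Counter()
    for seq in db:
        counts.update(set().union(*seq))
    return counts

def project(db, item):
    """Suffix of each sequence after the first occurrence of `item`.

    Each suffix is a pair (rest, tail): `rest` holds the items that follow
    `item` inside its own itemset and `tail` holds the later itemsets.
    """
    projected = []
    for seq in db:
        for pos, itemset in enumerate(seq):
            if item in itemset:
                rest = {x for x in itemset if x > item}
                projected.append((rest, seq[pos + 1:]))
                break
    return projected

sigma = 0.5
min_count = ceil(sigma * len(DB))
counts = item_supports(DB)
frequent = sorted(x for x, n in counts.items() if n >= min_count)
print(frequent)  # ['a', 'b', 'c', 'd', 'e', 'f']
for rest, tail in project(DB, "a"):
    print(rest, tail)
```

Under these assumptions, the four projected suffixes match the ones listed in the example, with `rest` equal to the empty set, {d}, {b}, and {f}, respectively.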

B. Mining Balanced Patterns With EMERGSPAN

Given a sequence database over an arbitrary $\mathcal{I}$, where each sequence is labeled by a class (among two classes), and a minimum frequency threshold, the general problem here is to find all frequent patterns, each one provided with its growth rate. In our settings, this involves mining a nonmirror interaction database $\mathcal{R}$ as a transformed database $\mathcal{T}$ where sequences are composed of actions of both players and the class label gives the winner's type. The balance measure is then exactly the growth rate as defined in the general settings and can be computed in negligible time as postprocessing [14]. Based on PREFIXSPAN, EMERGSPAN works as follows. For each sequence of the database $\mathcal{T}$, its class is appended as a new itemset at its end. This ensures that frequent patterns containing a class label are leaves of the pattern tree, and that two sequential patterns that differ only by their classes have the same direct parent, allowing a direct computation of the balance measure.

Algorithm. PREFIXSPAN

Require: A sequence s, the s-projected DB D|_s, and a threshold σ
Ensure: The frequent sequential pattern set F
 1: F ← {s};
 2: scan D|_s once, find every frequent item a such that:
      a) s can be extended to (s ◦_i a); or
      b) s can be extended to (s ◦_s a)
 3: if no valid a available then
 4:   return F;
 5: end if
 6: for all valid a do
 7:   a) F ← F ∪ PREFIXSPAN(s ◦_i a, D|_{s ◦_i a}, σ); or
 8:   b) F ← F ∪ PREFIXSPAN(s ◦_s a, D|_{s ◦_s a}, σ)
 9: end for
10: return F;

Example: To illustrate how EMERGSPAN works, let us consider the previous example on the transformed interaction database $\mathcal{T}$ of Table III. Recall that this database is obtained from a nonmirror interaction database $\mathcal{R}$, with $I_1 = \{a, b\}$ and $I_2 = \{c\}$ being the sets of actions of each player type. We transform each of these sequences by appending the item $I_1$ (respectively, $I_2$) at the end if the player with actions in $I_1$ (respectively, $I_2$) won. For example, $s_2$ becomes $\langle \{ab\}acI_1 \rangle$. The exploration of EMERGSPAN works the same way as that of PREFIXSPAN, except that it only keeps patterns that end with either $I_1$ or $I_2$. Thus, computing the balance of a pattern $\langle X_1 \ldots X_l I_j \rangle$ only requires backtracking to its parent (i.e., $\langle X_1 \ldots X_l \rangle$) to get its support. So, when EMERGSPAN deals with the sequential pattern $s = \langle \{ab\}cI_1 \rangle$, since the last item of $s$ is $I_1$, it outputs this pattern and, to compute its balance measure, uses the support of its parent node $s' = \langle \{ab\}c \rangle$: $\mathit{balance}(s) = |\mathit{sup}(s, \mathcal{T})| / |\mathit{sup}(s', \mathcal{T})| = 0.67$.
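The class-appending trick can be illustrated with a small sketch. Since Table III is not reproduced in this text, the database `T` below is hypothetical: only its second sequence (⟨{ab}ac⟩, won by type $I_1$) is taken from the example, and the other four sequences are invented so that the frequencies 2/3 and 1/2 stated in Section IV hold. The helper names are ours, and supports are obtained by plain containment tests rather than by a pattern tree.

```python
def is_subsequence(s, t):
    pos = 0
    for itemset in s:
        while pos < len(t) and not itemset <= t[pos]:
            pos += 1
        if pos == len(t):
            return False
        pos += 1
    return True

# Hypothetical stand-in for Table III: (sequence, winner type).
T = [
    ([{"a"}, {"c"}, {"b"}], "I1"),
    ([{"a", "b"}, {"a"}, {"c"}], "I1"),  # s2 of the text
    ([{"a", "b"}, {"c"}], "I1"),
    ([{"a", "b"}, {"c"}, {"c"}], "I2"),
    ([{"c"}, {"a"}], "I2"),
]

def append_class(db):
    """Append the winner type as a final singleton itemset."""
    return [seq + [{label}] for seq, label in db]

def count(s, db):
    return sum(is_subsequence(s, t) for t in db)

labelled = append_class(T)
parent = [{"a", "b"}, {"c"}]       # the pattern <{ab}c>
child = parent + [{"I1"}]          # the pattern <{ab}c I1>
balance_I1 = count(child, labelled) / count(parent, labelled)
print(round(balance_I1, 2))        # 0.67
```

Because every sequence carries exactly one class itemset at its end, the support of the parent is the sum of the supports of its two class-labeled children, which is why a single division suffices.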

C. Mining Balanced Patterns With PREFIXSPANNAIVE

Now we consider a signed interaction database $\mathcal{S}$ over $\mathcal{I}_s$ and a minimal frequency threshold $\sigma$. The problem is to extract the set of frequent sequential patterns and compute their balance measure. We first use the original PREFIXSPAN algorithm to build the frequent pattern tree structure. For each pattern, we need to 1) compute its balance; and 2) ensure that, for a pattern and its dual, only one of them is output. We thus propose to store, for each node (equivalently, each sequential pattern), a pointer to its dual pattern. At the first level of the tree, a node represents a single frequent item, and there is a pointer toward its dual (if both are frequent). Recursively, when an item is used to expand a sequential pattern $p$ to obtain a pattern $q$, we compute the pointer toward $\mathit{dual}(q)$ by searching among the children of $\mathit{dual}(p)$. If $\mathit{dual}(q)$ exists, the pattern $q$ is output, $\mathit{dual}(q)$ is flagged as already output (redundant pattern), and the process recursively continues. Otherwise, the algorithm backtracks. In this way, $q$ and $\mathit{dual}(q)$ are never output together. Computing the balance for nonmirror databases is straightforward since for each node/pattern we have access to the support of its dual. For mirror databases, however, we also need to know $\mathit{common}(q)$, which is stored for each node. The proof of the completeness and correctness of PREFIXSPANNAIVE for extracting all balanced patterns without redundancy is direct: first, PREFIXSPAN extracts all frequent patterns, thus any pattern $s$ and its dual $\mathit{dual}(s)$ are nodes of the pattern tree and none can be missed; second, as the tree traversal visits both a pattern $s$ and its dual $\mathit{dual}(s)$ (if frequent), we can ensure that no redundant pattern is output.

Algorithm. EMERGSPAN

Require: A sequence s, the s-projected transformed DB T|_s, and a minimum threshold σ
Ensure: The frequent sequential pattern set F
 1: F ← ∅;
 2: if s ends with the class then
 3:   Compute balance(s) thanks to sup(parent(s))
 4:   F ← {s}
 5:   return F;
 6: end if
 7: scan T|_s once, find every frequent item a such that
      a) s can be extended to (s ◦_i a), or
      b) s can be extended to (s ◦_s a)
 8: if no valid a available then
 9:   return F;
10: end if
11: for all valid a do
12:   a) F ← F ∪ EMERGSPAN(s ◦_i a, T|_{s ◦_i a}, σ) or
13:   b) F ← F ∪ EMERGSPAN(s ◦_s a, T|_{s ◦_s a}, σ)
14: end for
15: return F;

Example: PREFIXSPANNAIVE requires a signed interaction database. Let us consider that of Table IV with $\sigma = 1/4$ (i.e., an item is frequent as soon as it appears in at least one sequence). PREFIXSPANNAIVE starts with an empty sequence $s = \langle\rangle$ ($\mathit{dual}(\langle\rangle) = \langle\rangle$) and the entire database. First, it searches for frequent items: e.g., both $a^+$ and $a^-$ are frequent. Thus, PREFIXSPANNAIVE flags that $\langle a^- \rangle$ is the dual of $\langle a^+ \rangle$, and calls PREFIXSPANNAIVE($\langle a^+ \rangle, \mathcal{S}|_{\langle a^+ \rangle}, \sigma$). Then, PREFIXSPANNAIVE searches for frequent items in $\mathcal{S}|_{\langle a^+ \rangle}$: e.g., $c^-$ is frequent. So, it searches among the children of $\langle a^- \rangle$ for $c^+$, but at this step, the node of $\langle a^- \rangle$ has not been explored yet. Later, when $\langle a^- \rangle$ is explored, PREFIXSPANNAIVE extracts the frequent items in $\mathcal{S}|_{\langle a^- \rangle}$: e.g., $c^+$ is frequent. So it searches among the children of $\mathit{dual}(\langle a^- \rangle)$ for $\mathit{dual}(c^+)$: $\langle a^-c^+ \rangle$ is a frequent sequential pattern that could be output. Finally, PREFIXSPANNAIVE proceeds iteratively until none of the sequential patterns can be extended.

VII. MINING BALANCED PATTERNS WITH BALANCESPAN

The problem of PREFIXSPANNAIVE is that it generates both a pattern and its dual as different nodes in the pattern tree. Furthermore, it also generates nodes for patterns that are frequent but whose dual is not frequent. Consequently, and this is shown in the experiments (Section VII), a large number of nodes are useless. To solve that problem, and to be sure that only nodes corresponding to balanced patterns are generated (and only them, i.e., correct and complete), we propose the BALANCESPAN approach. The general idea is the following: instead of considering each item $a \in \mathcal{I}_s$ as an extension of a sequence $s$ leading to a new projected database $\mathcal{S}|_{s \circ a}$, and consequently to a new node in the pattern tree, we consider simultaneously an item and its dual; hence, two projected databases $\mathcal{S}|_{s \circ a}$ and $\mathcal{S}|_{\mathit{dual}(s) \circ \mathit{dual}(a)}$ are related to a single node (this is done for both kinds of extensions, $\circ_i$ and $\circ_s$). This ensures that no redundant patterns are generated, since both a sequence $s$ and $\mathit{dual}(s)$ are generated at the same node, and it allows $\mathit{balance}(s)$ [or $\mathit{balance}(\mathit{dual}(s))$] to be computed directly if and only if both $s$ and $\mathit{dual}(s)$ are frequent. It follows that BALANCESPAN produces a correct and complete collection of frequent balanced patterns.

Algorithm. PREFIXSPANNAIVE

Require: A sequence s, the s-projected signed DB S|_s, and a minimum threshold σ
Ensure: The frequent sequential pattern set F
 1: Compute balance(s) thanks to dual(s)
 2: F ← {s};
 3: scan S|_s once, find every frequent item a such that
      a) s can be extended to (s ◦_i a), or
      b) s can be extended to (s ◦_s a)
 4: if no valid a available then
 5:   return F;
 6: end if
 7: for all valid a do
 8:   a) Search for dual(s ◦_i a) among children of dual(s)
 9:   if dual(s ◦_i a) exists then
10:     Link dual(s ◦_i a) to s ◦_i a
11:     F ← F ∪ PREFIXSPANNAIVE(s ◦_i a, S|_{s ◦_i a}, σ)
12:   end if
      or,
13:   b) Search for dual(s ◦_s a) among children of dual(s)
14:   if dual(s ◦_s a) exists then
15:     Link dual(s ◦_s a) to s ◦_s a
16:     F ← F ∪ PREFIXSPANNAIVE(s ◦_s a, S|_{s ◦_s a}, σ)
17:   end if
18: end for
19: return F;

Example: BALANCESPAN also requires a signed interactiondatabase, but contrary to PREFIXSPANNAIVE, it proceeds to adouble projection at a time. Let us still consider the toy dataset of Table IV with σ = 1/4. Starting with the empty sequenceand the entire data set, the first step consists of finding frequentitems: e.g., a+ is frequent. Contrary to PREFIXSPANNAIVE,BALANCESPAN directly generates the dual of ⟨a+⟩ to pro-ceed to the double projection. Thus, it checks if a− is frequentand then projects on both ⟨a+⟩ and ⟨a−⟩ at a time: i.e., itcreates a new node in the pattern tree that is related to the cou-ple (⟨a+⟩, ⟨a−⟩). The next step calls BALANCESPAN on thisnew node. So it searches for frequent items in the projected

Algorithm. BALANCESPAN

Require: A sequence s, its dual sequence s′ = dual(s), the s-projected signed DB S|s, the s′-projected signed DB S|s′, and a minimum threshold σ
Ensure: The frequent balanced pattern set F

1: Compute balance(s)
2: F ← {(s, s′)};
3: scan S|s once, find every frequent item a such that
     a) s can be extended to (s ◦i a), or
     b) s can be extended to (s ◦s a)
4: scan S|s′ once, find every frequent item b such that
     a) s′ can be extended to (s′ ◦i b), or
     b) s′ can be extended to (s′ ◦s b)
5: if no valid a or b available then
6:   return F;
7: end if
8: for all valid a and b such that a = dual(b) do
9:   (a) F ← F ∪ BALANCESPAN(s ◦i a, s′ ◦i b, S|s◦ia, S|s′◦ib, σ)
10:  (b) F ← F ∪ BALANCESPAN(s ◦s a, s′ ◦s b, S|s◦sa, S|s′◦sb, σ)
11: end for
12: return F;

database S|⟨a+⟩: e.g., c−. Thus, it checks whether dual(c−) is frequent in S|⟨a−⟩: the node containing the couple (⟨a+c−⟩, ⟨a−c+⟩) is created and explored in the next step.
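The pairing of a sequence with its dual can be illustrated with a small sketch. It is not the BALANCESPAN implementation: supports are counted by scanning the whole database rather than by double projection, the threshold is an absolute count (`min_count`) rather than the relative σ, and the encoding of an item as a (name, sign) pair is hypothetical. The balance formula used here, support(s) / (support(s) + support(dual(s))), is an assumption made for illustration; the definition given earlier in the paper is the authoritative one.

```python
from typing import FrozenSet, List, Optional, Tuple

Item = Tuple[str, str]            # hypothetical encoding: (name, sign), sign is "+" or "-"
Sequence = List[FrozenSet[Item]]  # an ordered list of itemsets


def dual(seq: Sequence) -> Sequence:
    """Flip the sign of every item: <a+ c-> becomes <a- c+>."""
    flip = {"+": "-", "-": "+"}
    return [frozenset((name, flip[sign]) for name, sign in itemset) for itemset in seq]


def contains(seq: Sequence, pattern: Sequence) -> bool:
    """True if pattern is a subsequence of seq (itemset inclusion, order preserved)."""
    pos = 0
    for itemset in seq:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)


def support(db: List[Sequence], pattern: Sequence) -> int:
    """Number of database sequences that contain the pattern."""
    return sum(contains(seq, pattern) for seq in db)


def paired_balance(db: List[Sequence], pattern: Sequence, min_count: int) -> Optional[float]:
    """Balance of the pattern, or None unless both the pattern and its dual are frequent."""
    s = support(db, pattern)
    d = support(db, dual(pattern))
    if s < min_count or d < min_count:
        return None
    return s / (s + d)  # assumed formula, see the note above


def seq_of(*names_and_signs: Item) -> Sequence:
    """Helper for the toy data: one single-item itemset per argument."""
    return [frozenset({item}) for item in names_and_signs]


toy_db = [
    seq_of(("a", "+"), ("c", "-")),
    seq_of(("a", "-"), ("c", "+")),
    seq_of(("a", "+"), ("c", "-")),
    seq_of(("a", "+")),
]
pattern = seq_of(("a", "+"), ("c", "-"))
```

In this toy database the pattern is contained in two sequences and its dual in one, so both members of the couple are handled together, which mirrors the role of a single node in the pattern tree.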

VII. EXPERIMENTS

A. StarCraft II in a Nutshell

We study one of the most competitive RTS games, StarCraft II (Blizzard Entertainment, 2010), successor of StarCraft: Brood War, a test bed for much AI research [6]. A game involves two players, each choosing a faction among Zerg (Z), Protoss (P), and Terran (T): there are six possible matchups, each with different game strategies. During a game, two players battle on a map (aerial view), controlling buildings and units to gather supply and build an army, with the final goal of winning by destroying the opponent's forces. Such actions (training, building, moving, attacking) are done in real time. Each faction (Z, P, T) allows different units and buildings with distinctive weaknesses and strengths following a rock-paper-scissors principle. As such, there are mirror matchups (TvsT, PvsP, ZvsZ) and nonmirror matchups (TvsP, TvsZ, PvsZ). A strategy is hidden in the long sequences of actions generated by players, which are called replays.

Played as an electronic sport, StarCraft II is regularly patched: basic rules of the game are adjusted (properties of units, building times, etc.), and new rules are introduced through expansion sets (“Heart of the Swarm” and “Legacy of the Void”). The balance design team of StarCraft II often needs to study historical data, take into account player feedback on Web forums, and finally justify their choices. After quantitative experiments on our algorithms, we will discuss the usefulness of our approach to help study balance issues in RTS games.


TABLE V
DATA SETS: SEQUENCE AND ITEM COUNTS; MAX. AND AVG. SEQUENCE SIZES (smax, savg); MAX. AND AVG. ITEMSET SIZES (imax, iavg)

B. Data Sets

StarCraft II replays are files that store every action performed by all players during a game, and are easily accessible on the Web.¹ We retained 91 503 games with a total of 3.19 years of game time. The average length of a game is about 20 min. A game is selected if it involves high-level players (in the highest leagues and playing at least 200 actions per minute), since casual (as opposed to professional) players are not able to follow specific strategies. We divided the 91 503 replays into six different sequence data sets, one for every matchup. Buildings are one of the key elements of a strategy, since they enable the production of different kinds of units: from each replay, we derive a sequence where the items represent the buildings the players chose to produce in real time, and itemsets denote time windows of 30 s. We consider only the first ten minutes of each game (the strategic impact of a building is less important after 10 min). Table V summarizes the characteristics of the data sets.
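The derivation of a sequence from a replay can be sketched as follows. The 30 s window and the 10 min horizon come from the text; the input format (a list of `(time_in_seconds, building_name)` pairs), the function name, and the choice to skip empty windows are assumptions made for illustration, and the signs attached to items in the signed databases are left out for brevity.

```python
from collections import defaultdict

WINDOW_SECONDS = 30    # itemsets denote time windows of 30 s
HORIZON_SECONDS = 600  # only the first ten minutes of each game are kept


def replay_to_sequence(events):
    """Turn (time_in_seconds, building_name) events into an ordered list of itemsets.

    Buildings started in the same 30 s window end up in the same itemset.
    Windows without any building are skipped (an assumption of this sketch).
    """
    windows = defaultdict(set)
    for t, building in events:
        if 0 <= t < HORIZON_SECONDS:
            windows[int(t // WINDOW_SECONDS)].add(building)
    return [frozenset(windows[w]) for w in sorted(windows)]


# Hypothetical events; the last one falls after the 10 min horizon and is dropped.
example_events = [
    (12, "SpawningPool"),
    (25, "Extractor"),
    (95, "RoachWarren"),
    (700, "Spire"),
]
example_sequence = replay_to_sequence(example_events)
```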

C. Runtime Analysis and Memory Usage

We implemented our algorithms on top of the original C++ version of PREFIXSPAN [11], and ran the experiments on a machine with a 1.8-GHz Intel Core i5 and 8 GB of memory. Note that we released both the source code and the data sets used in the following experiments [15].² We discuss the running times of the proposed algorithms. First, we consider the nonmirror databases given by transformed interaction databases T and signed databases S (since signed databases correspond to the general case whereas transformed databases are specific to nonmirror databases). For different minimum frequency thresholds σ, we present the runtime of PREFIXSPAN on S as a rough baseline (since it does not compute the balance of a pattern), and the runtimes of the three other algorithms on their respective data representations. It follows that our general algorithm BALANCESPAN is the only one that can be run with the lowest values of σ [Fig. 1(d)–(f)]. We report the same results for the general case with mirror data sets (i.e., S only for PvP, TvT, and ZvZ) in Fig. 1(a)–(c). BALANCESPAN clearly

¹ http://wiki.teamliquid.net/starcraft2/Replay_Websites
² Data sets, source code, and scripts are available at https://github.com/guillaume-bosc/BalanceSpan

outperforms PREFIXSPANNAIVE, its only competitor (recall that PREFIXSPAN is given as a baseline since it does not compute the balances, and EMERGSPAN does not apply to mirror databases). Indeed, even if PREFIXSPANNAIVE sometimes seems to have the same runtime as BALANCESPAN for high values of σ, it cannot reach the lowest frequency thresholds σ. Note that in the figures, missing points correspond to runs that did not terminate because the available memory was insufficient. Fig. 3 shows the distribution of the length of the extracted patterns for the PvT and ZvZ data sets.

The quantity of memory used is a very important aspect of our algorithms. Indeed, since the number of output patterns grows exponentially (see Fig. 1), memory usage becomes more and more critical. Thus, the quantity of memory needed by our algorithms should be as low as possible. In fact, the percentage of useful nodes among those created in the tree structure differs noticeably between the algorithms. Each of the proposed algorithms builds a pattern tree in which each node represents a frequent sequential pattern, but not necessarily a frequent balanced pattern from Ft (or Fs). BALANCESPAN is the only algorithm that creates a node, and only one, for each pattern to be output (see Fig. 2): in the best case, only half of the nodes created by PREFIXSPANNAIVE are useful (by definition). This number drops to 10% of useful nodes for very low supports on some data sets. For EMERGSPAN, it is worse, as only the direct predecessors of the leaves of the prefix tree are balanced patterns by definition.

D. Exploration of the Extracted Patterns

It is interesting to visualize the distribution of both the support and the balance of the patterns. Fig. 4 gives this distribution for the ZvZ data set, which enables very fast computations with low σ (less than 5 s for σ = 0.001). There, both a pattern and its dual are presented, which shows that the line y = 0.5 (where y is the vertical axis) is almost an axis of symmetry. The symmetry is not exact because a pattern and its dual do not necessarily have the same support. One can notice that, empirically, a pattern with high frequency is likely to have a fair balance around 0.5. This behavior also holds for the other data sets and is what we could expect, given the definition of the balance measure.

It is possible to query the set of extracted patterns in various ways. Indeed, the pattern mining task is related to the KDD process that aims at extracting knowledge from data [8]. Data mining approaches, and more precisely pattern mining approaches, are a step of the KDD process that produces patterns from transformed data [9]. Our work is a pattern mining approach to study strategy balance that is applied to RTS games. Thus, one gets a set of patterns that generalize the local strategies within the data. The patterns output by BALANCESPAN still need to be analyzed.

Exploring a large collection of patterns can be done in many ways. First, as illustrated hereafter, the expert can filter the collection with specific constraints such as a minimum number of itemsets, or specific items using regular expressions, etc. Second, the expert can introduce preferences as measures on the patterns (size, length, support, balance, etc.) that he wishes


Fig. 1. Runtime and number of patterns. (a) PvP. (b) TvT. (c) ZvZ. (d) PvT. (e) PvZ. (f) TvZ.

Fig. 2. Percentage of used nodes in the PvZ data set.

Fig. 3. Cumulative distribution of the length of the patterns (number of itemsets) for the PvT (left) and ZvZ (right) data sets.

to minimize or maximize given his goals (the so-called skypatterns [16]). Indeed, one expert may favor highly balanced patterns with high support (probably the standard strategies), while another could be interested in maximizing the support while favoring patterns whose balance is closest to 0.5 (giving hints to possible game design problems). Finally, the discovery can be done interactively, through an interactive algorithm (not

Fig. 4. Pattern support and balance for the ZvZ data set.

only the full KDD process [17]). The basic assumption is that the expert does not really know what he is looking for in the data, and guides the pattern discovery at each step of the algorithm (such as sequence expansion in our case). In all cases, the pattern language must be clearly defined and efficient algorithms must be proposed to compute the measure of interest, which is our main contribution.
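Preference-based selection of this kind can be sketched as a Pareto (skyline) filter over user-chosen measures. The sketch below is a generic post-processing step, not the skypattern mining algorithm of [16]; the pattern records (dictionaries with `support` and `balance` keys) and all values are hypothetical. The two measures encode one of the preferences mentioned above: high support and a balance as close as possible to 0.5.

```python
def dominates(u, v):
    """u dominates v if it is at least as good on every measure and strictly better on one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def skyline(patterns, measures):
    """Keep the patterns whose measure vector is not dominated by any other pattern.

    Every measure is a function of a pattern and is to be maximized.
    """
    scored = [(p, tuple(m(p) for m in measures)) for p in patterns]
    return [p for p, sp in scored if not any(dominates(sq, sp) for _, sq in scored)]


# Hypothetical pattern records.
candidates = [
    {"id": "p1", "support": 100, "balance": 0.50},
    {"id": "p2", "support": 80, "balance": 0.50},
    {"id": "p3", "support": 120, "balance": 0.60},
    {"id": "p4", "support": 90, "balance": 0.70},
]

measures = [
    lambda p: p["support"],                # maximize support
    lambda p: -abs(p["balance"] - 0.5),    # maximize closeness of balance to 0.5
]

kept = [p["id"] for p in skyline(candidates, measures)]
```

Here p2 and p4 are dominated by p1, while p1 and p3 are incomparable (p3 has more support, p1 is closer to 0.5), so both remain. The quadratic comparison is only meant for small, already extracted collections.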

We now provide a basic example of exploration by querying the pattern set. Out of the 43 610 patterns for ZvZ with σ = 0.001, we can keep the patterns involving two players (containing both + and −), which returns 40 674 patterns. Then, we restrict the set of patterns to those involving two specific items (RoachWarren and Spire) to get only 290 patterns. For example, ⟨SpawnPool+, SpawnPool−, SpiCrawler+, RoachWarren+, Spire−⟩ denotes games where one of the players goes for air units and the other for ground units, two known openings, with balance 0.47 and support 68.
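A query of this kind amounts to two successive filters over the pattern collection. The sketch below assumes a hypothetical in-memory representation in which a pattern is a list of itemsets of (name, sign) pairs; the three toy patterns are invented and are not taken from the ZvZ results.

```python
def involves_both_players(pattern):
    """True if the pattern contains at least one '+' item and one '-' item."""
    signs = {sign for itemset in pattern for _, sign in itemset}
    return signs == {"+", "-"}


def mentions_all(pattern, names):
    """True if every building name in `names` occurs in the pattern, whatever its sign."""
    present = {name for itemset in pattern for name, _ in itemset}
    return set(names) <= present


def query(patterns, names):
    """Keep two-player patterns that involve all the requested buildings."""
    return [p for p in patterns if involves_both_players(p) and mentions_all(p, names)]


# Invented toy patterns (one single-item itemset per step).
toy_patterns = [
    [{("SpawnPool", "+")}, {("RoachWarren", "+")}, {("Spire", "-")}],
    [{("SpawnPool", "+")}, {("RoachWarren", "+")}, {("Spire", "+")}],
    [{("SpawnPool", "+")}, {("SpawnPool", "-")}],
]
selected = query(toy_patterns, ["RoachWarren", "Spire"])
```

Only the first toy pattern passes both filters: the second involves a single player and the third mentions neither building.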

The pattern mining task can be adapted in various ways, depending on which basic actions are encoded in the sequence and how they are encoded. Let us now sketch different scenarios.


Fig. 5. Game opening support and balance for the ZvZ data set.

1) Discovering Strategy Openings: Openings are the best-known strategies and are executed during the first 5–10 min of a StarCraft II game. It is expected that openings are balanced to make the game enjoyable for the casual player, competitive for the professional, but also interesting to watch for the spectators [2]. We build our sequence databases with a set of items composed of tuples (building, sign, ith window), with fixed windows of 30 s by default, i.e., the ith window contains the items performed between the ((i − 1) × 30)th second and the ((i × 30) − 1)th second. We expect openings to appear as the most frequent patterns and to be balanced as well: Fig. 5 shows the complete set of patterns for the ZvZ data set, which differs from Fig. 4 by its skewness. We explored the resulting patterns with a game expert. When considering another data set (PvZ), we obtain only 591 patterns with σ = 0.05. The most frequent patterns all represent well-known strategies: s = ⟨{(Nexus,+, 5)}{(Gateway,+, 6)(PhotonCannon,+, 6)}⟩ represents a popular Protoss strategy, whatever the strategy of the opponent is. It is balanced [balance(s) = 0.52].

2) Discovering Possible Balance Issues (Hypotheses Elicitation): The rules of the game are set by the publishers and developers. However, such rules are not always fair and balanced, and such weaknesses can only be discovered after weeks. We asked an expert to highlight a well-known imbalanced strategy. The so-called “bunker rush” was used a lot by Terran players against their Zerg opponents. It consists of building a bunker near the opponent's base in the early stage of the game to hamper the opponent's economy and development. After several complaints from the StarCraft II community, the rules changed on May 10, 2012: a Zerg counter unit (the queen) was slightly improved. Since then, this strategy stopped being used for some time. Our approach should be able to reflect/discover that fact: we proceed as follows. We split data set TvZ into two parts: the first one, called Sbefore, contains replays that happened strictly before May 10, 2012 (17 171 replays), and the second one, called Safter, contains the replays that happened strictly after this change (6698 replays). The mining of the data set Sbefore (respectively, Safter) with a low σ = 0.05 returns 8138 patterns (respectively, 7735). According to the experts, the bunker should be built during the sixth window of time (between 2’30” and 3’ of the game). There are 20 (respectively, 12) patterns that involve the item (Bunker, c, 6) with c ∈ {+,−}. With Sbefore (respectively, Safter), the average value of the balance is 0.58 (respectively, 0.51) with a standard deviation equal to 0.5 (respectively, 1.6). It is clear that since the patch was released, this strategy has become balanced. Moreover, we can remark that this strategy is much less used by players: in fact, the number of extracted patterns related to this strategy decreases by 40% whereas the total number of extracted patterns only decreases by 5% from Sbefore to Safter. Thus, BALANCESPAN makes it possible to see the impact of the release of a patch, by analyzing the periods before and after this key date.
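The two relative decreases quoted above follow directly from the pattern counts reported in the text (20 versus 12 bunker-related patterns, 8138 versus 7735 patterns overall). A short check, with a hypothetical helper name:

```python
def relative_decrease(before, after):
    """Fraction by which a count shrinks from `before` to `after`."""
    return (before - after) / before


# Counts taken from the text.
bunker_drop = relative_decrease(20, 12)        # patterns involving (Bunker, c, 6)
overall_drop = relative_decrease(8138, 7735)   # all extracted patterns

print(f"bunker-related patterns: {bunker_drop:.1%}, all patterns: {overall_drop:.1%}")
# prints "bunker-related patterns: 40.0%, all patterns: 5.0%"
```

The bunker-related patterns therefore shrink about eight times faster than the collection as a whole, which is the contrast the argument relies on.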

3) On the Diversity in Mirror Matchups: It is more delicate to speak about balance when both players belong to the same faction (mirror matchups): both players have access to the same strategies (same buildings, hence a symmetrical game). Let us observe, for example, the data set PvP with a new vocabulary: items are tuples (building, sign, ith window, jth building) where the last element records how many of these buildings have been made at the moment of the action (cumulative). Setting a minimal support to σ = 0.05, we obtain 3418 patterns. We can find here the so-called “4 Gates” strategy through the pattern

s = ⟨{(Gateway,+, 3, 1)(Assimilator,+, 3, 1)}, {(Cyb.Core,+, 4, 1)}, {(Gateway,+, 7, 2)(Gateway,+, 7, 3)(Gateway,+, 7, 4)}⟩

with balance(s) = 0.59. Such a high value may be surprising, but it reflects the effectiveness of this strategy, and consequently the poor diversity of the strategies used in the PvP matchup. It is an easy and nonrisky strategy to apply: according to the expert, a player has better chances to win with this strategy against riskier strategies. Despite recurrent complaints, nothing changed in the game until a new major update of the game (with new units and buildings). Since then, the strategies used in the PvP matchup have been more diversified and the “4 Gates” strategy is rarely used.
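The extended vocabulary can be sketched as follows. The window index follows the 30 s windows used throughout (window i covers seconds (i − 1) × 30 to i × 30 − 1), and the last component is the running count of that building, current one included, as in the pattern above where the first Gateway carries the value 1. The event format, the function name, and the timestamps are hypothetical.

```python
from collections import Counter


def encode_with_counts(events, sign, window_seconds=30):
    """Encode time-ordered (time_in_seconds, building) events as
    (building, sign, window, j) tuples, j being the cumulative count of that building."""
    built = Counter()
    items = []
    for t, building in events:
        built[building] += 1
        window = int(t // window_seconds) + 1  # 1-based window index
        items.append((building, sign, window, built[building]))
    return items


# Hypothetical timestamps chosen to reproduce the items of the "4 Gates" pattern.
four_gates_events = [
    (70, "Gateway"),
    (75, "Assimilator"),
    (100, "Cyb.Core"),
    (185, "Gateway"),
    (190, "Gateway"),
    (200, "Gateway"),
]
encoded = encode_with_counts(four_gates_events, "+")
```

Recording the count makes the second, third, and fourth Gateway distinct items, which is what lets the pattern express that several gateways are built within the same window.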

VIII. RELATED WORK

Discovering patterns that strongly distinguish one data set from others (e.g., “win” labeled objects versus “lose” labeled objects) is an important task in machine learning and data mining [13]. One of the main reasons is that such patterns enable the building of comprehensible classifiers [18]. In the general setting, we are given a set of objects of different classes that take descriptions, generally from a partially ordered set (itemsets, graphs, intervals, etc.) [19]. The goal is to find good description generalizations that mostly appear in one class of objects and not in the others. In different fields of AI and applied mathematics, such descriptions have different names [13] such as version spaces [20], contrast sets [21], and subgroup discovery [22] in machine learning; emerging patterns [12] in data mining; or no-counterexample hypotheses in formal concept analysis [19]. Our contribution in this field is to consider and efficiently compute a balance measure that existing methods can compute only partially or inefficiently.

StarCraft II and other RTS games, in general, pose several research challenges in artificial intelligence, as thoroughly discussed in a recent survey [6]. Our work is related to the challenge that the authors of the survey qualify as “prior learning,” that is, techniques that can “exploit available data such as existing replays [. . .] for learning appropriate strategies beforehand.” Strategies in an RTS game are complex and divided into several tasks, each bringing difficult problems.


Several case-based reasoning approaches have been proposed, mainly to retrieve and adapt strategies (especially build orders) to be used then by an automated agent [23], [24]. Other kinds of approaches are also used for several prediction tasks. Predicting the opponent’s production was studied with answer set programming [25], while learning transition probabilities within build orders was achieved with hidden Markov models [26]. Weber and Mateas described each past game by a vector of building and upgrade timings: such features allow an accurate strategy prediction [27]. This comes with a limit: game logs are a priori labeled with a strategy using rules based on “manual” expert analysis. The same applies to opening prediction [28].

To discover strategies in large volumes of replays without manually labeling game logs, KDD methods are required, and especially pattern mining techniques. This was highlighted in the open problems category “domain knowledge” of the recent survey mentioned before [6]: “Is it possible to devise techniques that can automatically mine existing collections [. . .] and incorporate it into the bot?” We did not study the second step (incorporation) in this paper, but presented a way to extract such patterns efficiently and focused on balance issues to help game designers. A long road remains before the right patterns can be selected for use by artificial agents, as discussed recently in [6] and [29].

IX. CONCLUSION

We tackled the problem of mining frequent sequential patterns in RTS games whose balance measures provide meaningful insights into the strategies played and whether or not they are in equilibrium. For that matter, we revisited the well-known notions of discriminant pattern mining to provide efficient algorithms for the elicitation of balance hypotheses from the data.

From that, we presented several algorithms that can deal (partially or fully) with interaction databases, and we showed that only BALANCESPAN can deal with all data sets efficiently.

We empirically validated that the balance measure is able to distinguish balanced and imbalanced strategies. We believe that our approach can become a basic tool for balance designers when analyzing a subset of historical data of a game in beta phase, or even after its release, through an exploratory process (KDD and interactive mining). A major difficulty remains to select and construct features of interest from the game logs.

REFERENCES

[1] T. L. Taylor, Raising the Stakes: E-Sports and the Professionalization of Computer Gaming, Cambridge, MA, USA, MIT Press, 2012.

[2] G. Cheung and J. Huang, “Starcraft from the stands: Understanding the game spectator,” in Proc. Int. Conf. Human Factors Comput. Syst., 2011, pp. 763–772.

[3] M. Kaytoue, A. Silva, L. Cerf, W. M. Jr., and C. Raïssi, “Watch me playing, I am a professional: A first study on video game live streaming,” in Proc. 21st World Wide Web Conf., 2012, pp. 1181–1188.

[4] A. Von Eschen, “Machine learning and data mining in Call of Duty,” European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), ser. Lecture Notes in Computer Science, Berlin, Germany: Springer-Verlag, 2014, vol. 8724.

[5] M. A. Ahmad, B. Keegan, J. Srivastava, D. Williams, and N. S. Contractor, “Mining for gold farmers: Automatic detection of deviant players in MMOGs,” in Proc. 12th IEEE Int. Conf. Comput. Sci. Eng., 2009, pp. 340–345.

[6] S. Ontañón et al., “A survey of real-time strategy game AI research and competition in Starcraft,” IEEE Trans. Comput. Intell. AI Games, vol. 5, no. 4, pp. 293–311, 2013.

[7] O. Missura and T. Gärtner, “Predicting dynamic difficulty,” 25th Annual Conference on Neural Information Processing Systems 2011, ser. Advances in Neural Information Processing Systems 24, Cambridge, MA: MIT Press, 2011, pp. 2007–2015.

[8] U. M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “The KDD process for extracting useful knowledge from volumes of data,” Commun. ACM, vol. 39, no. 11, pp. 27–34, 1996.

[9] C. C. Aggarwal, Data Mining: The Textbook, New York, NY, USA: Springer-Verlag, 2015.

[10] A. Giacometti, D. H. Li, P. Marcel, and A. Soulet, “20 years of pattern mining: A bibliometric survey,” SIGKDD Explor., vol. 15, no. 1, pp. 41–50, 2013.

[11] J. Pei et al., “Prefixspan: Mining sequential patterns by prefix-projected growth,” in Proc. 17th Int. Conf. Data Eng., 2001, pp. 215–224.

[12] G. Dong and J. Li, “Efficient mining of emerging patterns: Discovering trends and differences,” in Proc. 5th ACM SIGKDD Int. Conf. Knowl. Disc. Data Mining, 1999, pp. 43–52.

[13] P. K. Novak, N. Lavrac, and G. I. Webb, “Supervised descriptive rule discovery: A unifying survey of contrast set, emerging pattern and subgroup mining,” J. Mach. Learn. Res., vol. 10, pp. 377–403, 2009.

[14] M. Plantevit and B. Crémilleux, “Condensed representation of sequential patterns according to frequency-based measures,” 8th International Symposium on Intelligent Data Analysis, IDA 2009, ser. Advances in Intelligent Data Analysis VIII, Cambridge, MA, USA: MIT Press, 2009, pp. 155–166.

[15] G. Bosc, M. Kaytoue, C. Raïssi, J. Boulicaut, and P. Tan, “Balancespan,” 2015, [Online]. Available: https://github.com/guillaume-bosc/BalanceSpan.

[16] A. Soulet, C. Raïssi, M. Plantevit, and B. Crémilleux, “Mining dominant patterns in the sky,” in Proc. 11th IEEE Int. Conf. Data Mining, Vancouver, BC, Canada, Dec. 11–14, 2011, pp. 655–664.

[17] M. van Leeuwen, “Interactive data exploration using pattern mining,” in Proc. Interactive Knowl. Disc. Data Mining Biomed. Inf., State-of-the-Art Future Challenges, 2014, pp. 169–182.

[18] M. García-Borroto, J. F. M. Trinidad, and J. A. Carrasco-Ochoa, “A survey of emerging patterns for supervised classification,” Artif. Intell. Rev., vol. 42, no. 4, pp. 705–721, 2014.

[19] S. O. Kuznetsov, “Galois connections in data analysis: Contributions from the Soviet era and modern Russian research,” in Proc. Formal Concept Anal. Found. Appl., 2005, pp. 196–225.

[20] T. M. Mitchell, Machine Learning, ser. Computer Science, New York, NY, USA: McGraw-Hill, 1997.

[21] S. D. Bay and M. J. Pazzani, “Detecting group differences: Mining contrast sets,” Data Mining Knowl. Disc., vol. 5, no. 3, pp. 213–246, 2001.

[22] F. Herrera, C. J. Carmona, P. González, and M. J. del Jesús, “An overview on subgroup discovery: Foundations and applications,” Knowl. Inf. Syst., vol. 29, no. 3, pp. 495–525, 2011.

[23] D. W. Aha, M. Molineaux, and M. J. V. Ponsen, “Learning to win: Case-based plan selection in a real-time strategy game,” in Proc. 6th Int. Conf. Case-Based Reason. Res. Develop., 2005, pp. 5–20.

[24] B. G. Weber and M. Mateas, “Case-based reasoning for build order in real-time strategy games,” in Proc. 5th AAAI Conf. Artif. Intell. Interactive Digit. Entertain., 2009, pp. 106–111.

[25] M. Stanescu and M. Certicky, “Predicting opponent’s production in real-time strategy games with answer set programming,” IEEE Trans. Comput. Intell. AI Games, vol. 8, no. 1, pp. 89–94, 2014, DOI: 10.1109/TCIAIG.2014.2365414.

[26] E. W. Dereszynski et al., “Learning probabilistic behavior models in real-time strategy games,” in Proc. 7th AAAI Conf. Artif. Intell. Interactive Digit. Entertain., 2011, pp. 20–25.

[27] B. G. Weber and M. Mateas, “A data mining approach to strategy prediction,” in Proc. IEEE Symp. Comput. Intell. Games, 2009, pp. 140–147.

[28] G. Synnaeve and P. Bessière, “A Bayesian model for opening prediction in RTS games with application to Starcraft,” in Proc. IEEE Conf. Comput. Intell. Games, 2011, pp. 281–288.

[29] M. Leece and A. Jhala, “Sequential pattern mining in Starcraft: Brood War for short and long-term goals,” in Proc. 10th AAAI Conf. Artif. Intell. Interactive Digit. Entertain., 2014, pp. 281–288.