constraint solving meets data mining and machine learningdata mining and machine learning tias guns...

100
Constraint Solving meets Data Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium Guest Lecture @ AI lab, VUB, Belgium 22 Feb. 2013

Upload: others

Post on 15-Apr-2020

15 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Constraint Solvingmeets

Data Mining and Machine Learning

Tias GunsDTAI, KU Leuven, Belgium

Guest Lecture @ AI lab, VUB, Belgium 22 Feb. 2013

Page 2: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

One of the success stories of A.I.

model: declarative specification of constraints

+

search: generic handling of variables and constraints & efficient propagation of individual constraints

Used in scheduling, planning, bio-informatics, game playing, verification, logical reasoning, ...

Constraint Solving

Page 3: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Data mining & Machine Learning“Using historical data to improve decisions”

● Data Mining: discovering knowledge in datafor example purchasing behaviour, biological data, ...

● Software we can't program by handfor example self-driving cars, speech recognition, …

● Self-customizing programsfor example spam filters, recommender systems, …

Also hyped as data analytics and big data[T. Mitchell, Machine Learning, 1997]

Page 4: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

This lecture

Data Mining &Machine Learning

ConstraintSolving

How can the two fields benefit from each-other?

● Part A: Introduction● Part B: Solving in ML & DM● Part C: Learning in constraint solvers

Page 5: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Part A: introduction

Page 6: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Declarative vs ImperativeTwo approaches to problem solving.

Page 7: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

I want a carpet that is:

Declarative

● 5m x 2m● blue● with fish patterns● and little ropes on the side

Page 8: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Imperative

Now,

● put a blue wire,● tighten it,● add two layers of white,● make a nod,● ...

Page 9: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Example: graph coloring

Page 10: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Graph coloring, imperativeNaieve:

● depth-first search over countries,do not assign neighbors the same color

● O(kn) k=colors, n=nodes/countries

Smart:● based on the principle of inclusion-exclusion and the

zeta transform [Björklund, A.; Husfeldt, T.; Koivisto, M. (2009), "Set partitioning via inclusion–exclusion", SIAM Journal on Computing 39 (2): 546–563]

● O(n*2n)

Development time vs execution time?

Page 11: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Graph coloring, declarative

Neighboring countries shouldhave different colors

Page 12: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Graph coloring, declarativeConstraint programming● Variables

ex. countries

● Domains ex. colors

● Constraints ex. neighbor colors

Page 13: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Imperative vs DeclarativeImperative:

● Low-level control● Typically very fast and efficient● Very specific for one problem

Declarative:● high-level modeling● less fast and scalable than hand-made algorithms● general and reusable

Page 14: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

From imperative to declarativeExample, evolution in databases:

● Data in files, access with IO calls (syscall level)● Data in records, access by following pointers

(data definition language + data manipulation language)● Data in tables/relations (schema + query language)

Abstraction, reuse, query optimisation, indexing, continued progress and improvements

Warning: generality/efficiency trade-off remains● NoSQL (key/value) for specialised data (images, graphs)

or specific settings (high-speed low-consistency)

Page 15: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

AI Research in 2012Recent conferences and journals have:

● Search and Planning● SAT and Constraints● Probabilistic Planning● Probabilistic Reasoning● Inference in First-Order Logic● Machine Learning & Data Mining● Natural Language● Vision and Robotics● Multi-agent systems

[H. Geffner, AI: From Programs to Solvers, Turing Session, ECAI-2012]

Trend from imperativeto declarative:

● SAT solvers● CP solvers● BDDs● MIP & QP● Bayesian Networks● Markov Logic● POMDPs● ...

Page 16: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Constraint Solving

An incomplete categorisation

Symbolic Numeric

SAT

SMT

ASP

FO(.)

CP Constr. basedLocal Search

LNS

Know.Comp.

BDD

ADD

Math.Progr.

LP

MIP

Convex opt.

weightedCP

pseudo-boolean

Numeric Other

Page 17: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Declarative Constraint Solving

Mantra:

Constraint Solving = Model + Search

by the user

by a solver

Page 18: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

SAT solving● Propositional Satisfiability● Example: the 'frietkot' problem

Page 19: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

SAT solving● Propositional Satisfiability● First proven NP-complete problem (Cook, 1971)● Input: clauses

● X \/ Y \/ -Z

● -X \/ Z

● Output: UNSATISFIABLE, or, assignment to variables that satisfies all clauses

Page 20: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

SAT solvingAdvantages/disadvantages:✔ Extremely optimised solvers

● standard input format (making comparison easy)● yearly competitions● sustained scientific progress

Page 21: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

SAT solver research

Time

(max 1200 sec)

Number of problems solved

20022003

20042005

20062007

2008

2009

2010

2011

Page 22: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

SAT solvingAdvantages/disadvantages:✔ Extremely optimised solvers

● yearly competitions● standard input format (making comparison easy)● sustained yearly progress

✗ No support for modelling high-level problems

Page 23: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Constraint Programming, example● variables

[E11

... E99

]

● domains

Exy

= {1 ... 9}

● constraints

all_different([E1x

]), ...

all_different([Ex1

]), ...

all_different([E11

...E33

]), ...

all_diff(

all_diff(

all_diff(

all_diff(

...

...

) )

)

)

all_diff()

Page 24: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Constraint Programming● CSP or COP (satisfaction / optimisation)● Solving combinatorial problems (typically in NP)

scheduling, routing, planning, ...

● Input: Variables, Domains, Constraints

● High level modeling languages (Zinc, Essence, OPL)

● 'global constraints'

Page 25: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

CP SearchTwo key principles:

● Propagation of constraintseg. alldiff(X,Y,Z) X={1},Y={1,2},Z={1,2,3,4} → Y={2},Z={3,4}

Every constraint is implemented by a propagator.

● Branch over values of variableseg. Propagation at fixpoint → branch over Z={3}

Search is recursive and complete

Page 26: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Declarative constraint solvingAdvantages

● general approach● reuse of solvers, modeling primitives

Disadvantage● need expertise (good model/bad model)● search heuristics huge impact on performance

Use historical data to improve decisions?→ Machine Learning

Page 27: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

What DM & ML offers to CS

The use of historical data:● Learning/improving models (constraints)● Learning/improving search strategies, solver selection,

heuristics, etc

Data Mining &Machine Learning

ConstraintSolving

Page 28: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

What CS offers to DM & ML

● Declarative: model + search● Decomposability and reuse● General: many tasks, variations● Rapid prototyping, iterative process

Data Mining &Machine Learning

ConstraintSolving

Page 29: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

This lecture

Data Mining &Machine Learning

ConstraintSolving

How can the two fields benefit from each-other?

● Part A: Introduction● Part B: Solving in ML & DM● Part C: Learning in constraint solvers

Page 30: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Part B: Solving in ML & DM

Page 31: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Data Mining and Machine LearningSymbolic

● Rule learning● Decision trees● Clustering● Pattern Mining● ...

Numeric● Regression● SVMs● Matrix factorisation● ...

Mostly using numeric optimisation:least squares, gradient decent,convex optimisation, ...

Mostly using hand-craft algorithms:Ripper, C4.5, k-means, Apriori, ...

Page 32: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

[Slide by E. Ricci, Analysis of Patterns, 2009]

Page 33: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

[Slide by E. Ricci, Analysis of Patterns, 2009]

Page 34: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

[Slide by E. Ricci, Analysis of Patterns, 2009]

Page 35: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

[Slide by E. Ricci, Analysis of Patterns, 2009]

Page 36: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

[Slide by E. Ricci, Analysis of Patterns, 2009]

Page 37: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

[Slide by E. Ricci, Analysis of Patterns, 2009][Slide by E. Ricci, Analysis of Patterns, 2009]

Page 38: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

[Slide by E. Ricci, Analysis of Patterns, 2009]

Page 39: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

[Slide by E. Ricci, Analysis of Patterns, 2009]

Page 40: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

[Slide by E. Ricci, Analysis of Patterns, 2009]

Page 41: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Declarative methods in MLOften problem expressible as linear/convex opt. problem → can solve with standard ILP/QP solvers

In practice, specialised solvers are often used:● faster, lighter implementations● special-purpose decompositions● dealing with large and sparse data

Page 42: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Data Mining and Machine LearningSymbolic

● Rule learning● Decision trees● Clustering● Pattern Mining● ...

Numeric● Regression● SVMs● Matrix factorisation● ...

Mostly using numeric optimisation:least squares, gradient decent,convex optimisation, ...

Mostly using hand-craft algorithms:Ripper, C4.5, k-means, Apriori, ...

Can declarative methodsbe used here too?

Page 43: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Basic mining taskAnalysing a datasetto find patterns of interest

For example:

Analysing purchases (e.g. books)

Here, patterns are sets of 'items'

(e.g. + + )

Page 44: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Patterns of interest:

● which patterns are frequent?

● which patterns have a high average price?

● which patterns are frequent on one datasetand infrequent on the other?

● which patterns are significant w.r.t a background model?

● ...

→ specified by constraints

Page 45: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Constraint-based Pattern Mining● Numerous constraints proposed● Numerous algorithms developed

Yet,● new constraints mostly require new implementations● very hard to combine different constraints

Surprisingly, CP had not beenapplied to Pattern Mining

Page 46: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Pattern MiningDepends on type of data

Itemset MiningText Mining Graph Mining

Well, there's egg and spam; egg sausage and spam; spam and bacon; egg bacon and spam; egg bacon sausage and spam; spam bacon sausage and spam; spam egg spam spam bacon and spam; spam sausage spam spam bacon spam tomato and spam; spam spam spam egg and spam; spam spam spam spam baked beans spam spam spam spam; or Lobster Thermidor, a Crevette with a mornay sauce served in a Provencale manner with shallots and aubergines garnished with truffle pate, brandy and spam.

Page 47: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Pattern MiningDepends on type of data

Itemset MiningText Mining Graph Mining

Well, there's egg and spam; egg sausage and spam; spam and bacon; egg bacon and spam; egg bacon sausage and spam; spam bacon sausage and spam; spam egg spam spam bacon and spam; spam sausage spam spam bacon spam tomato and spam; spam spam spam egg and spam; spam spam spam spam baked beans spam spam spam spam; or Lobster Thermidor, a Crevette with a mornay sauce served in a Provencale manner with shallots and aubergines garnished with truffle pate, brandy and spam.

Mostfundamental

setting

Page 48: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

sets of items

Page 49: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Find: set of items appearing frequently

Example:

Itemset Mining

{ , }: frequency = 2

Page 50: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Declarative approach

coverage( , ) = { , }

frequency( , ) = 2

Page 51: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

CP for Itemset Mining

coverage:

frequency: ∑tT t≥Freq

∀T t : T t=1⇔({I1, ... In}⊆rowt)

Page 52: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

CP for Itemset Mining

coverage:

frequency: ∀ I i : I i=1⇒∑t ,Dti=1T t≥Freq

∀T t : T t=1⇔∑iI i(1−Dti)=0

Page 53: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

CP4IM, basic model

Page 54: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Traditional search: proj. database

Page 55: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

CP for Itemset Miningcoverage:

freq >= 2: ∀ I i : I i=1⇒∑t ,Dti=1T t≥Freq

∀T t : T t=1⇔ ∧i , Dti=0 ¬I i

Page 56: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

CP for Itemset Miningcoverage:

freq >= 2:

● propagate i2Intuition: infrequent i2 can never be part of freq. superset

∀ I i : I i=1⇒∑t ,Dti=1T t≥Freq

∀T t : T t=1⇔ ∧i , Dti=0 ¬I i

Page 57: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

CP for Itemset Miningcoverage:

freq >= 2:

● propagate i2● propagate t1

Intuition: unavoidable t1 will always be covered

∀ I i : I i=1⇒∑t ,Dti=1T t≥Freq

∀T t : T t=1⇔ ∧i , Dti=0 ¬I i

Page 58: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

CP for Itemset Miningcoverage:

freq >= 2:

● propagate i2● propagate t1● branch i1=1

∀ I i : I i=1⇒∑t ,Dti=1T t≥Freq

∀T t : T t=1⇔ ∧i , Dti=0 ¬I i

Page 59: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

CP for Itemset Miningcoverage:

freq >= 2:

● propagate i2● propagate t1● branch i1=1● ...

∀ I i : I i=1⇒∑t ,Dti=1T t≥Freq

∀T t : T t=1⇔ ∧i , Dti=0 ¬I i

Page 60: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

More constraints● Coverage (required)

● Frequent

● Maximal

● Closed

● Delta-closed

● ...

I i=1⇒∑tT tDti≥Freq

T t=1⇔∑iI i1−Dti=0

I i=1⇔∑tT tDti≥Freq

I i=1⇔∑tT t 1−Dti=0

I i=1⇔∑tT t 1−−Dti=0

+ combinations

Page 61: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Generality

Page 62: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

35% 30% 25% 20% 15% 10% 5% 1%0.05

0.5

5

50

500

FIMCPPATTERNISTLCM5.3LCM2.5MAFIAB_ECLATB_FPGROWTHB_APRIORIDMCP

Minimum support

Ru

ntim

e (

s)

Simple Itemset MiningCP (Gecode)

Specialisedsystems

}

coverage+frequency

Page 63: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Constraint-based mining

5 15 25 350.1

1

10

100

1000

FIM_CP_1%FIM_CP_5%FIM_CP_10%PATTER_1%PATTER_5%PATTER_10%LCM_10%

MaxAvgCost

Ru

ntim

e(s

)

CP (Gecode)

Specialisedsystems

Page 64: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Correlated itemset miningAlso known as: discriminative itemset mining, contrast set mining, emerging itemsets, subgroup discovery, ...

● Given: labelled transactions

● Find: the itemset that best correlates with the class label : { , } , : { , }

Page 65: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Correlation constraint

● Existing pruning technique:

only uses upper-bound of

● Our CP-based propagator:

uses upper- and lower-bound of and look-ahead formulation

f ∑t∈PT t ,∑t∈N

T t ≥Bound

∑T

∑TI i=1⇒ ...

much stronger propagation !

Page 66: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Correlated itemset mining

Runtime in seconds

Page 67: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Decreasing the gap

An integrated CP solver would:

● use principles of both IM and CP

● focus on constraints for itemset mining

35% 30% 25% 20% 15% 10% 5% 1%0.05

0.5

5

50

500

Hypothesis: unnecessary overhead in CP solver

Page 68: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Itemset Mining principles

● Search strategy Level-wise, BFS, DFS

● Representation of data

● Representation of sets

1 0 1 11 1 0 10 0 1 1

110

010

101

111

1 0 1 11 1 0 10 0 1 1

1 0 1 1 0 1 0 0{1, 3, 4} {2}

Page 69: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

EclatMiner

GecodeCP Solver

DMCP CP Solver

Search strategy

DFS DFS (binary) DFS (binary)

Repres. of data

Shared,vertical

In constraints(up to 4 copies)

Shared matrix (default: vertical)

● Data shared (read-only) by constraints● Horizontal, positive and negative views available

Integration 1/3

Page 70: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

EclatMiner

GecodeCP Solver

Our DMCP CP Solver

Repres. of sets

Sparse or Dense

Sparse Sparse or Dense

Types of vars. Boolean vector (set)

Bool, Int, Set, ... Boolean vector (set)

● Represented by lower and upper bound:

0, 0/1, 0/1, 1

Min: {0, 0, 0, 1}

Max: {0, 1, 1, 1}

Integration 2/3

Page 71: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

EclatMiner

GecodeCP Solver

Our DMCP CP Solver

Constraints Few,hard to

combine

Many,easy to

add/combine

Some,easy add/combine

Constraint activ.

Strict order(in algorithm)

On domain change

Change of lower/upper bound

● General matrix constraint:

Integration 3/3

Data representation (matrix)

Boolean vectors

Page 72: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Frequent Itemset Mining

35% 30% 25% 20% 15% 10% 5% 1%0.05

0.5

5

50

500

Mushroom (Frequent)

FIMCPPATTERNISTLCM5.3LCM2.5MAFIAB_ECLATB_FPGROWTHB_APRIORIDMCP

Minimum support

Ru

ntim

e (

s)

Old, Gecode

New, DMCP

Page 73: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

50% 10% 5% 1% 0.5% 0.1% 0.05% 0.01%0.05

0.5

5

50

500

T10I4D100K (Frequent)

FIMCPPATTERNISTLCM5LCM2MAFIAB_ECLATB_FPGROWTHB_APRIORIDMCP

Minimum support

Run

time

(s)

Frequent Itemset Mining, scaling

Old, Gecode

New, DMCP

Page 74: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

35% 30% 25% 20% 15% 10% 5% 1%0.05

0.5

5

50

500

Splice (Closed)

FIMCPPATTERNISTLCM5.3LCM2.5MAFIAB_ECLATB_FPGROWTHB_APRIORIDMCP

Minimum support

Ru

ntim

e (

s)

Closed Itemset Mining

Old, Gecode

New, DMCP

Page 75: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

CP for Itemset MiningAdvantages of CP modelling:

● Easily add new constraints● Freely combine constraints

Advantage of IM/CP solver integration:● Theoretical: polynomial delay analysis● Practical: remove efficiency/scalability gap

Model

Search

Page 76: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Data Mining and Machine LearningSymbolic

● Rule learning● Decision trees● Clustering● Pattern Mining● ...

Mostly using hand-craft algorithms:Ripper, C4.5, k-means, Apriori, ...

Can declarative methodsbe used here too?

Correlated Itemset Mining [Nijssen et al. KDD 09]

using CP [Bessiere et al. CP 09]

SAT & clustering [Davidson et al. SDM 10]

Itemset Mining: [Guns et al. KDD08, AIJ 12] [many more]

Sequence mining: [Coquery et al. ECAI 12], ...

Page 77: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Open Questions● More tasks, different types of problems● Structured data (sequences, trees, graphs)● Efficiency/generality trade-off● Scalability and specialised solvers● High-level modeling language for DM?

Page 78: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

This lecture

Data Mining &Machine Learning

ConstraintSolving

How can the two fields benefit from each-other?

● Part A: Introduction● Part B: Solving in ML & DM● Part C: Learning in constraint solvers

Page 79: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Part C: Learning in constraint solvers

Page 80: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Declarative Constraint Solving

Mantra:

Constraint Solving = Model + Search

by the user

by a solver

Page 81: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Declarative Constraint SolvingAdvantages

● general approach● reuse of solvers, modeling primitives

Disadvantage● need expertise (good model/bad model)● search heuristics huge impact on performance

Use historical data to improve decisions?→ Machine Learning

Page 82: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

No Free Lunch theoremNo single algorithm is best on all problem instances

→ Can we characterize/learn which algorithm is best on which problem instance?

In many competitions (SAT, CP, rostering, …),big gap between 'single best solver' and 'oracle solver'

Empirical hardness modelsHardness of a problem vs design choices in an algo.

Page 83: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

● given a number of solvers,which solver to choose?

→ algorithm selection (also known as: portfolio's, meta-learning, ...)

● given an algorithm with a number of parameters,which parameter values to set?

→ algorithm configuration (= alg. selection when small parameter space)

Page 84: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Algorithm selectionGeneral approach:

1. Collect solvers

2. Collect problem instances/data

3. Calculate features on instance/data

4. Build predictive model

[Kotthof, Survey, 2012]

Page 85: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Algorithm selection 1. Collect solvers● Availability of solvers?● Too many solvers?● Diversity of solvers?

Page 86: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Algorithm selection 2. Collect problem instances/data● Similar problems? (SAT vs CP)● Representable set of data?● Diversity of instances/data?

Page 87: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Algorithm selection 3. Calculate features on instance/data● What features to use?

→ Domain specific!● Number of features to use?● Normalisation? Features selection? Stacking?

Features are arguably the MOST IMPORTANT choice(in ML in general)

Page 88: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Algorithm selection 4. Build predictive model● What model?

● per solver, e.g. regression?● per pair-of-solver, classification?● per portfolio, classification or ranking?

● Sensitivity to features? (e.g. noise, redundancy)● Clustering / hierarchical models?

Page 89: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

SATZilla 2007: winning the SAT competition 1. Solvers:

● From previous competitions (~20)● Subset selection to select ~10 diverse ones

(as measured by repeatedly building portfolio's)● Two pre-solvers (limited amount of time)● One backup solver (if all else fails)

2. Problem instances/data● From previous competitions

Page 90: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

SATZilla 2007: winning the SAT competition 3. Calculate features on instance/data

● 64 SAT-specific features● feature selection● speed to compute vs gain in prediction performance

4. Build predictive model● logistic regression of log(runtime)● censored data/timeouts● hierarchical: predict sat/unsat

Page 91: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

SATZilla 2007: winning the SAT competition

HANDMADE

Pre-solving

Feat. Comp.

Oracle

SATzilla07

Page 92: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

SATzilla 2011: more success● More features (138)● Learn best pre-solvers● Feature computation prediction (2 levels of features)● Backup solver: not best overall, but best on feature-

timeout instances

Learning method:● Replace regression by pairwise classification,● Take cost of mis-classification (runtime) into

account

Random Oracle Handmade Industrial

Page 93: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

General lessons● Good classifier alone not enough to win a competition

(pre-solvers, backup solver, time of feat. calculation)

● Need good features (and feature selection)

● Hierarchical models: clustering problem instances and having separate portfolio's offers gain

● The best classifiers take the actual runtime into account (new: and cost of misclassification?)

Page 94: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Why does it work?Machine Learning perspective:● Similar to ensembles:

combining predictions = minimising the variance

● Many more interesting (underexplored?) connections to boosting, bagging, and ensembles.

In SAT community: even single solver has large variance on runtime for a single input file (heuristic choices)

Page 95: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Open Questions● Bounds on improvement? Learning theory?● Limited use of probabilistic techniques?

In wider context of improving constraint solving:● Learning parameters [iRace] or entire search

strategies [grammar approach IRIDIA]?● Learning model reformulations? [ModRef]● Learning constraints/models? [ConAcq, ModelSeeker]

Page 96: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

This lecture

Data Mining &Machine Learning

ConstraintSolving

How can the two fields benefit from each-other?

● Part A: Introduction● Part B: Solving in ML & DM● Part C: Learning in constraint solvers

Page 97: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

What CS offers to DM & ML

● Declarative: model + search● Decomposability and reuse● General: many tasks, variations● Rapid prototyping, iterative process

Data Mining &Machine Learning

ConstraintSolving

Page 98: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

What DM & ML offers to CS

The use of historical data:● Learning/improving models (constraints)● Learning/improving search strategies, solver selection,

heuristics, parameters, etc

Data Mining &Machine Learning

ConstraintSolving

Page 99: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Thank you for listening

Data Mining &Machine Learning

ConstraintSolving

Questions?

Page 100: Constraint Solving meets Data Mining and Machine LearningData Mining and Machine Learning Tias Guns DTAI, KU Leuven, Belgium ... for example spam filters, recommender systems, …

Possible task:SatZilla data:

http://www.cs.ubc.ca/labs/beta/Projects/SATzilla/

Do solver selection with ML techniques:● Features matter!● Imperative or declarative learning methods?● Ease of modification/improvement of learning

technique?● Additional improvements tuning with, e.g., iRace?