ILP Slides

TRANSCRIPT

  • Slide 1/44

    Inductive Logic Programming (for Dummies)

    Anoop & Hector

  • Slide 2/44

    Knowledge Discovery in Databases (KDD): automatic extraction of novel, useful, and valid knowledge from large sets of data

    Different kinds of knowledge:

    Rules

    Decision trees

    Cluster hierarchies

    Association rules

    Statistically unusual subgroups

    Different kinds of data

  • Slide 3/44

    Relational Analysis

    Would it not be nice to have analysis methods and data mining systems capable of directly working with multiple relations, as they are available in relational database systems?

  • Slide 4/44

    Single Table vs Relational DM

    The same problems still remain. But the solutions?

    More problems:

    Extending the key notions

    Efficiency concerns

  • Slide 5/44

    Relational Data Mining

    Data Mining : ML :: Relational Data Mining : ILP

    Initially: binary classification

    Now: classification, regression, clustering, association analysis

  • Slide 6/44

    ILP?

    India Literacy Project?

    International Language Program?

    Individualized Learning Program?

    Instruction Level Parallelism?

    International Lithosphere Program?

  • Slide 7/44

  • Slide 8/44

    Deductive vs Inductive Reasoning

    T ∧ B ⊨ E (deduce):

    parent(X,Y) :- mother(X,Y).
    parent(X,Y) :- father(X,Y).
    mother(mary,vinni).
    mother(mary,andre).
    father(carrey,vinni).
    father(carrey,andre).

    parent(mary,vinni).
    parent(mary,andre).
    parent(carrey,vinni).
    parent(carrey,andre).

    E ∧ B ⊨ T (induce):

    parent(mary,vinni).
    parent(mary,andre).
    parent(carrey,vinni).
    parent(carrey,andre).
    mother(mary,vinni).
    mother(mary,andre).
    father(carrey,vinni).
    father(carrey,andre).

    parent(X,Y) :- mother(X,Y).
    parent(X,Y) :- father(X,Y).

  • Slide 9/44

    ILP: Objective

    Given a dataset:

    Positive examples (E+) and optionally negative examples (E-)

    Additional knowledge about the problem/application domain (Background Knowledge B)

    A set of constraints to make the learning process more efficient (C)

    the goal of an ILP system is to find a set of hypotheses that:

    Explains (covers) the positive examples - Completeness

    Is consistent with the negative examples - Consistency

    ∀p ∈ P: covers(H, p) ∧ ∀n ∈ N: ¬covers(H, n)
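    The coverage conditions can be made concrete in a few lines. The sketch below is only an illustration: it assumes a caller-supplied covers(h, e, B) test (a hypothetical helper, not defined on the slides) and reads the hypothesis set disjunctively, i.e. each positive example must be covered by some clause and no clause may cover a negative example.

    # Hedged sketch: completeness and consistency checks for a hypothesis set H.
    # `covers` is a placeholder callable deciding whether clause h (with background B)
    # entails example e.

    def complete(H, E_pos, B, covers):
        """Every positive example is covered by at least one hypothesis in H."""
        return all(any(covers(h, e, B) for h in H) for e in E_pos)

    def consistent(H, E_neg, B, covers):
        """No negative example is covered by any hypothesis in H."""
        return not any(covers(h, e, B) for h in H for e in E_neg)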

  • Slide 10/44

    DB vs. Logic Programming

    DB terminology → LP terminology:

    Relation name p → Predicate symbol p

    Attribute of relation p → Argument of predicate p

    Tuple → Ground fact p(a1,...,an)

    Relation p, a set of tuples → Predicate p, defined extensionally by a set of ground facts

    Relation q defined as a view → Predicate q, defined intensionally by a set of rules (clauses)

  • Slide 11/44

    Relational Pattern

    IF Customer(C1,Age1,Income1,TotSpent1,BigSpender1)
    AND MarriedTo(C1,C2)
    AND Customer(C2,Age2,Income2,TotSpent2,BigSpender2)
    AND Income2 ≥ 10000
    THEN BigSpender1 = Yes

    As a clause:

    big_spender(C1,Age1,Income1,TotSpent1) ←
        married_to(C1,C2) ∧
        customer(C2,Age2,Income2,TotSpent2,BigSpender2) ∧
        Income2 ≥ 10000

  • Slide 12/44

    A Generic ILP Algorithm

    procedure ILP (Examples)
      INITIALIZE (Theories, Examples)
      repeat
        T = SELECT (Theories, Examples)
        {T_i}_{i=1..n} = REFINE (T, Examples)
        Theories = REDUCE (Theories ∪ {T_i}_{i=1..n}, Examples)
      until STOPPINGCRITERION (Theories, Examples)
      return (Theories)
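    The same loop, written out as a minimal Python sketch; the four procedures are supplied by the caller, and the names used here (initialize, select, refine, reduce_, stopping_criterion) are placeholders rather than anything defined on the slides.

    # Hedged sketch of the generic ILP loop. The concrete choice of
    # select/refine/reduce/stopping_criterion defines the system
    # (e.g. hill-climbing: reduce_ keeps only the single best theory).

    def ilp(examples, initialize, select, refine, reduce_, stopping_criterion):
        theories = initialize(examples)          # e.g. {true} or the examples themselves
        while not stopping_criterion(theories, examples):
            t = select(theories, examples)       # most promising candidate theory
            refinements = refine(t, examples)    # specializations / generalizations of t
            theories = reduce_(set(theories) | set(refinements), examples)
        return theories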

  • Slide 13/44

    Procedures for a Generic ILP Algorithm

    INITIALIZE: initialize a set of theories (e.g. Theories = {true} or Theories = Examples)

    SELECT: select the most promising candidate theory

    REFINE: apply refinement operators that guarantee new theories (specialization, generalization, ...)

    REDUCE: discard unpromising theories

    STOPPINGCRITERION: determine whether the current set of theories is already good enough (e.g. when it contains a complete and consistent theory)

    SELECT and REDUCE together implement the search strategy.
    (e.g. hill-climbing: REDUCE = only keep the best theory.)

  • Slide 14/44

    Search Algorithms

    Search methods:

    Systematic search: depth-first search, breadth-first search, best-first search

    Heuristic search: hill-climbing, beam search

    Search direction:

    Top-down search: general to specific

    Bottom-up search: specific to general

    Bi-directional search

  • Slide 15/44

    Example

  • Slide 16/44

    Search Space

  • Slide 17/44

    Search Space

  • Slide 18/44

    Search Space as Lattice

    The search space is a lattice under θ-subsumption

    There exists a lub and a glb for every pair of clauses

    The lub is the least general generalization (lgg)

    Bottom-up approaches find the lgg of the positive examples
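    As a rough illustration of the lgg idea (this example is not from the slides), two function-free atoms can be anti-unified by keeping the positions where they agree and introducing one shared variable per distinct pair of disagreeing terms. A minimal Python sketch:

    # Hedged sketch: lgg of two function-free atoms with the same predicate symbol.
    # Each atom is (predicate, args); disagreeing argument pairs share one variable.

    def lgg_atoms(atom1, atom2):
        (pred1, args1), (pred2, args2) = atom1, atom2
        if pred1 != pred2 or len(args1) != len(args2):
            return None                      # no lgg at the atom level
        var_for_pair = {}                    # (term1, term2) -> variable name
        out = []
        for t1, t2 in zip(args1, args2):
            if t1 == t2:
                out.append(t1)
            else:
                out.append(var_for_pair.setdefault((t1, t2), f"X{len(var_for_pair)}"))
        return (pred1, tuple(out))

    # lgg of parent(mary,vinni) and parent(mary,andre) is parent(mary,X0).
    print(lgg_atoms(("parent", ("mary", "vinni")), ("parent", ("mary", "andre"))))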

  • Slide 19/44

    Basics of ILP (contd.)

    The bottom-up approach of finding clauses leads to long clauses through lgg.

    Thus, prefer the top-down approach, since shorter and more general clauses are learned.

    Two ways of doing top-down search:

    FOIL: greedy search, using information gain to score

    PROGOL: branch-and-bound, using P - N - l to score; uses saturation to restrict the search space

    Usually, the refinement operator is to:

    Apply a substitution

    Add a literal to the body of a clause

  • Slide 20/44

    FOIL

    Greedy search; score each clause using information gain:

    Let c1 be a specialization of c2. Then WIG(c1, c2) (weighted information gain) is

    WIG(c1, c2) = p⁺ · ( I(c1) − I(c2) ),  where  I(c) = log2( p / (p + n) )

    where p⁺ is the number of possible bindings that make the clause cover positive examples, p is the number of positive examples covered, and n is the number of negative examples covered.

    Background knowledge (B) is limited to ground facts.
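    A small sketch of this scoring in Python, assuming the caller can already count covered positives/negatives and the positive bindings retained by the specialization (those counts are placeholders, not from the slides):

    import math

    # Hedged sketch of FOIL-style weighted information gain as described above.
    # pos/neg are numbers of covered positive/negative examples for a clause;
    # pos_bindings is the number of positive bindings kept after specializing c2 into c1.

    def info(pos, neg):
        """I(c) = log2(p / (p + n)); -inf if the clause covers no positives."""
        return math.log2(pos / (pos + neg)) if pos > 0 else float("-inf")

    def weighted_info_gain(pos_bindings, pos1, neg1, pos2, neg2):
        """WIG(c1, c2) = p+ * (I(c1) - I(c2)), with c1 a specialization of c2."""
        return pos_bindings * (info(pos1, neg1) - info(pos2, neg2))

    # Example: a clause covering 10+/10- specialized to one covering 6+/1-,
    # keeping 6 positive bindings.
    print(weighted_info_gain(6, 6, 1, 10, 10))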

  • Slide 21/44

    PROGOL

    Branch-and-bound top-down search

    Uses P - N - l as the scoring function:

    P is the number of positive examples covered

    N is the number of negative examples covered

    l is the number of literals in the clause

    Preprocessing step: build a bottom clause, using a positive example and B, to restrict the search space. Uses mode declarations to restrict the language.

    B is not limited to ground facts.

    While doing branch-and-bound top-down search:

    Only use literals appearing in the bottom clause to refine clauses.

    The learned clause is a generalization of this bottom clause.

    Can set a depth bound on variable chaining and theorem proving.
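    For contrast with FOIL's gain, PROGOL's compression-style score is a one-liner; the sketch below is purely illustrative, with the counts supplied by the caller.

    # Hedged sketch: PROGOL scores a candidate clause by P - N - l, trading coverage
    # of positives against covered negatives and clause length.

    def progol_score(pos_covered, neg_covered, num_literals):
        return pos_covered - neg_covered - num_literals

    # Example: a 3-literal clause covering 20 positives and 2 negatives scores 15.
    print(progol_score(20, 2, 3))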

  • Slide 22/44

    Example of a Bottom Clause

    E+ = {p(a), p(b)}
    E- = {p(c), p(d)}
    B  = {r(a,b), r(a,c), r(c,d), q(X,Y) :- r(Y,X)}

    modes: p(var), q(var,var), r(var,const), r(var,var)

    Select a seed from the positive examples, p(a), randomly or by order (the first uncovered positive example)

    Gather all relevant literals (by forward chaining, add anything from B that is allowable)

    Introduce variables as required by the modes:

    p(a) :- r(a,b), r(a,c), r(c,d), q(b,a), q(c,a), q(d,c), r(a,b), r(a,c), r(c,d)

    p(A) :- r(A,b), r(A,c), r(C,d), q(B,A), q(C,A), q(D,C), r(A,B), r(A,C), r(C,D)
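    The "gather relevant literals, then variabilize" step can be pictured with a much-simplified sketch. This ignores mode declarations and the forward chaining of non-ground rules such as q(X,Y) :- r(Y,X), so it illustrates the idea rather than PROGOL's actual bottom-clause construction.

    # Hedged sketch: a simplified bottom clause from a seed and ground facts only.

    def bottom_clause(seed, ground_facts):
        head_pred, head_args = seed
        reachable = set(head_args)                   # constants connected to the seed
        body = []
        changed = True
        while changed:
            changed = False
            for pred, args in ground_facts:
                if any(a in reachable for a in args) and (pred, args) not in body:
                    body.append((pred, args))
                    if set(args) - reachable:
                        reachable |= set(args)
                        changed = True
        var_of = {}                                  # constant -> variable, assigned consistently
        def lift(atom):
            pred, args = atom
            return (pred, tuple(var_of.setdefault(a, chr(ord("A") + len(var_of))) for a in args))
        return lift(seed), [lift(atom) for atom in body]

    facts = [("r", ("a", "b")), ("r", ("a", "c")), ("r", ("c", "d"))]
    print(bottom_clause(("p", ("a",)), facts))
    # -> (('p', ('A',)), [('r', ('A', 'B')), ('r', ('A', 'C')), ('r', ('C', 'D'))])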

  • Slide 23/44

    Iterate to Learn Multiple Rules

    Select a seed from the positive examples to build a bottom clause.

    Get some rule "If A ∧ B then P". Now throw away all positive examples that were covered by this rule (see the sketch after this slide).

    Repeat until there are no more positive examples.

    [Figure: scatter of + and - examples, marking the first seed selected, the first rule learned, and the second seed selected.]
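    A minimal sketch of this covering loop; learn_one_rule and covers stand in for whatever rule learner (e.g. the bottom-clause search above) and coverage test are used, and are not defined on the slides.

    # Hedged sketch: sequential covering. Learn one rule at a time and remove
    # the positive examples it covers until none remain.

    def learn_rule_set(positives, negatives, learn_one_rule, covers):
        rules = []
        remaining = list(positives)
        while remaining:
            seed = remaining[0]                              # first uncovered positive example
            rule = learn_one_rule(seed, remaining, negatives)
            if not any(covers(rule, e) for e in remaining):  # guard: avoid looping forever
                break
            rules.append(rule)
            remaining = [e for e in remaining if not covers(rule, e)]
        return rules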

  • Slide 24/44

    From Last Time

    Why ILP is not just decision trees:

    The language is first-order logic

    A natural representation for multi-relational settings

    Thus, a natural representation for full databases

    Not restricted to the classification task.

    So then, what is ILP?

  • Slide 25/44

    What is ILP? (An obscene generalization)

    A way to search the space of first-order clauses.

    With restrictions, of course:

    θ-subsumption and search-space ordering

    Refinement operators:

    Applying substitutions

    Adding literals

    Chaining variables

  • Slide 26/44

    More From Last Time

    Evaluation of a hypothesis requires finding substitutions for each example.

    This requires a call to PROLOG for each example.

    For PROGOL, only one substitution is required.

    For FOIL, all substitutions are required (recall p⁺ in the scoring function).
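    A toy illustration of that difference (not how either system is implemented): an existence check can stop at the first substitution, while FOIL's count must enumerate them all.

    # Hedged sketch: one-solution vs all-solutions evaluation over a generator of bindings.

    def bindings(clause, example, background):
        """Placeholder generator yielding each substitution that proves `example`;
        a real implementation would call out to a Prolog engine."""
        yield from ()

    def is_covered(clause, example, background):
        # PROGOL-style check: one substitution suffices, so stop at the first.
        return next(bindings(clause, example, background), None) is not None

    def count_bindings(clause, example, background):
        # FOIL-style count: all substitutions are needed for the p+ term of the gain.
        return sum(1 for _ in bindings(clause, example, background))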

  • Slide 27/44

    An Example: MIDOS

    A system for finding statistically interesting subgroups

    Given a population and its distribution over some class

    Find subsets of the population for which the distribution over this class is significantly different (an example)

    Uses ILP-like search for definitions of subgroups

  • Slide 28/44

    MIDOS

    Example: a shop-club database

    customer(ID, Zip, Sex, Age, Club)
    order(C_ID, O_ID, S_ID, Pay_mode)
    store(S_ID, Location)

    Say we want to find good_customer(ID)

    FOIL or PROGOL might find:

    good_customer(C) :- customer(C,_,female,_,_), order(C,_,_,credit_card).

  • Slide 29/44

    MIDOS

    Instead, let's find a subset of the customers that contains an interesting proportion of members

    Target: [member, non_member]

    Reference distribution: 1371 total

    customer(C,_,female,_,_), order(C,_,_,credit_card).

    Distribution: 478 total, 1.54 ‰

  • Slide 30/44

    MIDOS

    How-to:

    Build a clause in an ILP fashion

    Refinement operators:

    Apply a substitution (to a constant, for example)

    Add a literal (uses given foreign-link relationships)

    Use a score for statistical interest

    Do a beam search

  • Slide 31/44

    MIDOS

    Statistically interesting scoring:

    g: relative size of the subgroup

    p0i: relative frequency of value i in the entire population

    pi: relative frequency of value i in the subgroup

    score = g / (1 - g) · Σ_{i=1..n} (p0i - pi)²
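    A small sketch of this score in Python; the class-count dictionaries are supplied by the caller and the function name is only illustrative.

    # Hedged sketch of the MIDOS interestingness score described above:
    # (g / (1 - g)) * sum_i (p0_i - p_i)^2, where g is the relative subgroup size.
    # Assumes the subgroup is a proper, non-empty subset of the population (0 < g < 1).

    def midos_score(subgroup_counts, population_counts):
        """Both arguments map class value -> count, e.g. {'member': ..., 'non_member': ...}."""
        n_sub = sum(subgroup_counts.values())
        n_pop = sum(population_counts.values())
        g = n_sub / n_pop                            # relative size of the subgroup
        deviation = sum(
            (population_counts[v] / n_pop - subgroup_counts.get(v, 0) / n_sub) ** 2
            for v in population_counts
        )
        return g / (1 - g) * deviation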

  • Slide 32/44

    MIDOS Example

    1378 total population, 478 female credit-card buyers

    906 members in the total population, 334 members among the female credit-card buyers:

    g / (1 - g) · Σ_{i=1..n} (p0i - pi)²
        = (478/1378) / (1 - 478/1378) · [ (906/1378 - 334/478)² + (472/1378 - 144/478)² ]
        ≈ 0.00154

  • Slide 33/44

    Scaling in Data Mining

    Scaling to large datasets: increasing number of training examples

    Scaling to the size of the examples: increasing number of ground facts in the background knowledge

    [Tang, Mooney, Melville, UT Austin, MRDM (SIGKDD) 2003.]

  • Slide 34/44

    BETH

    Addresses scaling to the number of ground facts in the background knowledge

    Problem: the PROGOL bottom clause is too big. This makes the search space too big.

    Solution: don't build a bottom clause.

    Use FOIL-like search, BUT guide the search using some seed example.

  • Slide 35/44

    Efficiency Issues

    Representational Aspects

    Search

    Evaluation

    Sharing computations

    Memory-wise scalability

  • Slide 36/44

    Representational Aspects

    Example:

    Student(string sname, string major, string minor)
    Course(string cname, string prof, string cred)
    Enrolled(string sname, string cname)

    In a natural join of these tables there is a one-to-one correspondence between the join result and the Enrolled table

    Data mining tasks on the Enrolled table are really propositional

    MRDM is overkill

  • Slide 37/44

    Representational Aspects

    Three settings for data mining:

    Find patterns within individuals represented as tuples (single table, propositional)

    e.g. which minor is chosen with what major

    Find patterns within individuals represented as sets of tuples (each individual induces a sub-database)

    Multiple tables, restricted to some individual

    e.g. a student taking course A usually also takes course B

    Find patterns within the whole database (multiple tables)

    e.g. courses taken by student A are also taken by student B

  • Slide 38/44

    Search

    Space restriction

    Bottom clauses, as seen above

    Syntactic biases and typed logic

    Modes, as seen above

    Can add types to variables to further restrict the language

    Search biases and pruning rules

    PROGOL's bound (relies on the anti-monotonicity of coverage)

    Stochastic search

  • Slide 39/44

    Evaluation

    Evaluating a clause: get some measure of coverage

    Match each example to the clause: run multiple logical queries.

    Query optimization methods from the DB community

    Relational algebra operator reordering

    BUT: queries for a DB are set-oriented (bottom-up), queries in PROLOG find a single solution (top-down).

  • Slide 40/44

    Evaluation

    More options:

    k-locality: given some bindings, literals in a clause become independent, e.g. ?- p(_,Y), q(Y,Z), r(Y,U).

    Given a binding for Y, the proofs of q and r are independent.

    So, find only one solution for q; if no solution is found for r, there is no need to backtrack.

    Relax θ-subsumption using stochastic estimates: sample the space of substitutions and decide on subsumption based on this sample.

  • Slide 41/44

    Sharing Computations

    Materialization of features

    Propositionalization

    Pre-compute some statistics

    Joint distribution over attributes of a table

    Query selectivity

    Store proofs, reuse them when evaluating new clauses

  • Slide 42/44

    Memory-wise Scalability

    All full ILP systems work on in-memory databases

    Exception: TILDE, which learns multi-relational decision trees

    The trick: make the loop over examples the outer loop

    Current solution: encode data compactly

  • Slide 43/44

    References

    Dzeroski and Lavrac, Relational Data Mining, Springer, 2001.

    David Page, ILP: Just Do It, ILP 2000.

    Tang, Mooney, Melville, Scaling Up ILP to Large Examples: Results on Link Discovery for Counter-Terrorism, MRDM 2003.

    Blockeel and Sebag, Scalability and Efficiency in Multi-relational Data Mining, SIGKDD Explorations 5(1), July 2003.

  • Slide 44/44

    Overview

    Theory 1

    Theory 2

    Theory 3

    Performance

    Results

    Conclusion

    Future Work