Mathematics – A new Domain for Datamining? Simon Colton simonco@cs.york.ac.uk http://www.dai.ed.ac.uk/~simonco Universities of Edinburgh & York United Kingdom


Post on 12-Jan-2016


Page 1: Mathematics – A new Domain for Datamining? Simon Colton simonco@cs.york.ac.uk simonco Universities of Edinburgh & York United

Mathematics – A new Domain for Datamining?

Simon Colton

simonco@cs.york.ac.uk

http://www.dai.ed.ac.uk/~simonco

Universities of Edinburgh & York

United Kingdom


Mathematics is the new Biology

Many databases of math information
  Massive potential for datamining

This talk
  Overview of mathematics databases
  Hurdles to overcome for datamining
  Suggested methods
  Potential rewards


Mathematical Databases

MathWorld encyclopedia
  8974 entries, 153958 cross-references, 1400 pages

MathSciNet citation service
  10843 reviews, 151350 articles, 358104 authors

Mizar library of formalised maths
  666 articles, 2000 concept definitions

Mathematica CAS functions
  Tens of thousands of computer algebra functions


Mathematical Databases

Encyclopedia of Integer Sequences
  60,000 sequences with terms, definitions, etc.

Inverse Symbolic Calculator
  50 million constants, 400 tables

GAP library (CAS)
  6 million groups

Ad hoc databases everywhere
  Geometry junkyard, My favourite constants


Problems with the Data

Highly heterogeneous
  No agreed-upon format for concepts, conjectures

Distributed
  Hundreds of websites

Dynamic
  E.g. 50 new integer sequences daily

Really need to impose homogeneity


Suggestions for Datamining

Conjectures: simple relationships between concepts
  Equivalence, implication, nonexistence, moonshine

Need to worry about interestingness
  Plausibility, complexity, surprisingness

Concept formation to get correct statements
  Composition, tweaking, monster-barring
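
The simpler conjecture types listed above (equivalence, implication, nonexistence) can be illustrated with a minimal sketch. It assumes each concept is represented only by a finite set of example numbers, so anything it reports is an empirical conjecture over that range, not a theorem. The function name `conjectures` and the sample concepts are hypothetical, chosen for illustration, and are not taken from the talk.

```python
def conjectures(name_a, examples_a, name_b, examples_b):
    """Propose simple empirical conjectures relating two concepts.

    Each concept is given as a finite collection of examples, so the
    results hold only for the data supplied (an assumption of this sketch).
    """
    a, b = set(examples_a), set(examples_b)
    found = []
    if a == b:
        found.append(f"{name_a} <=> {name_b}")    # equivalence
    elif a <= b:
        found.append(f"{name_a} => {name_b}")     # implication
    elif b <= a:
        found.append(f"{name_b} => {name_a}")     # implication (reverse)
    if not (a & b):
        found.append(f"no {name_a} is {name_b}")  # nonexistence
    return found


LIMIT = 50
primes = [n for n in range(2, LIMIT)
          if all(n % d for d in range(2, int(n ** 0.5) + 1))]
odd_primes = [p for p in primes if p > 2]
squares = [n * n for n in range(1, 8)]  # 1, 4, ..., 49

print(conjectures("odd prime", odd_primes, "prime", primes))
print(conjectures("prime", primes, "square", squares))
```

A real system would still need the interestingness filters mentioned above, since most statements found this way are trivial.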


Potential Rewards - Example

NumbersWithNames program
  http://machine-creativity.com/programs/nwn
  Datamining the Encyclopedia of Integer Sequences

Perfect numbers are pernicious
  Perfect: sum of divisors is twice the number
  Pernicious: prime number of 1s in binary
  6, 28, 496, ...

Found by looking for subsequences
  Lots more similar examples
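
The subsequence check behind this example can be reproduced with a short sketch. It recomputes both sequences directly from the definitions on the slide instead of reading them from the Encyclopedia, and it only tests numbers below an arbitrary bound (`LIMIT`, an assumption of this sketch). It is not the NumbersWithNames implementation.

```python
def sigma(n):
    """Sum of all positive divisors of n (including n itself)."""
    total = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total


def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def is_perfect(n):
    # Perfect: sum of divisors is twice the number.
    return n > 0 and sigma(n) == 2 * n


def is_pernicious(n):
    # Pernicious: prime number of 1s in the binary representation.
    return is_prime(bin(n).count("1"))


LIMIT = 10_000  # arbitrary search bound for this sketch
perfect = [n for n in range(1, LIMIT) if is_perfect(n)]
pernicious = {n for n in range(1, LIMIT) if is_pernicious(n)}

# The conjecture is suggested when one sequence is contained in the other.
is_subsequence = all(n in pernicious for n in perfect)
print(perfect, is_subsequence)
```

For even perfect numbers the conjecture is in fact a theorem: by the Euclid-Euler form 2^(p-1) * (2^p - 1) with p prime, the binary expansion is p ones followed by p-1 zeros.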


Potential Rewards: Money & Fame

Money
  EPSRC funded big project: e-science
  E-maths initiative being discussed

Fame
  Monstrous Moonshine Conjectures
  Found by accident (numbers 196883 & 196884)
  Led to Fields Medal (see paper)
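
A coincidence of this kind is something a program could look for. The sketch below assumes two small hand-entered tables: early coefficients of the j-function and the smallest dimensions of irreducible representations of the Monster group. Those values are standard but were added here for illustration and are not listed on the slide. The helper `near_matches` is hypothetical, and the original observation was made by a person, not by a search like this one.

```python
def near_matches(table_a, table_b, tolerance=1):
    """Return pairs (a, b) drawn from two tables that differ by at most tolerance."""
    return [(a, b) for a in table_a for b in table_b
            if abs(a - b) <= tolerance]


# Assumed inputs for this sketch (not taken from the slides):
j_coefficients = [196884, 21493760]         # early j-function coefficients
monster_dimensions = [1, 196883, 21296876]  # smallest Monster representations

print(near_matches(j_coefficients, monster_dimensions))

# The second coefficient is also a sum of the first three dimensions.
print(21493760 == 1 + 196883 + 21296876)
```

On real databases a search this loose would return many accidental matches, which is where the plausibility and surprisingness measures come in.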


Conclusions and Future Work

Consider mathematics as a datamining domain
  Much data available, but there are problems
  Techniques required are simple
  Possible to make important conjectures

Cross-domain/database sharing of data
  Projects like NumbersWithNames
  http://machine-creativity.com/programs/nwn