Mathematics – A new Domain for Datamining?
Simon Colton
http://www.dai.ed.ac.uk/~simonco
Universities of Edinburgh & York
United Kingdom
Mathematics is the new Biology
Many databases of mathematical information
Massive potential for datamining
This talk:
  Overview of mathematics databases
  Hurdles to overcome for datamining
  Suggested methods
  Potential rewards
Mathematical Databases
MathWorld encyclopedia: 8974 entries, 153958 cross-references, 1400 pages
MathSciNet citation service: 10843 reviews, 151350 articles, 358104 authors
Mizar library of formalised maths: 666 articles, 2000 concept definitions
Mathematica CAS functions: tens of thousands of computer algebra functions
Mathematical Databases
Encyclopedia of Integer Sequences: 60,000 sequences with terms, definitions, etc.
Inverse Symbolic Calculator: 50 million constants, 400 tables
GAP library (CAS): 6 million groups
Ad hoc databases everywhere: Geometry Junkyard, My favourite constants
Problems with the Data
Highly heterogeneous: no agreed-upon format for concepts or conjectures
Distributed: hundreds of websites
Dynamic: e.g. 50 new integer sequences daily
Really need to impose homogeneity
Suggestions for Datamining
Conjectures are simple relationships between concepts: equivalence, implication, nonexistence, moonshine
Need to worry about interestingness: plausibility, complexity, surprisingness
Concept formation to get correct statements: composition, tweaking, monster-barring
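The first three conjecture types can be illustrated with a minimal sketch. It assumes each concept is represented only by its set of examples over a small finite universe. The concept names, the `make_conjectures` helper and the universe 1..200 are hypothetical choices for illustration, not taken from any existing system. Agreement on finite data only suggests a conjecture; it proves nothing.

```python
from itertools import combinations
from math import isqrt


def num_divisors(n):
    """Count the positive divisors of n by trial division."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)


# Hypothetical toy concepts, each given as a predicate on positive integers.
CONCEPTS = {
    "prime": lambda n: num_divisors(n) == 2,
    "square": lambda n: isqrt(n) ** 2 == n,
    "odd_divisor_count": lambda n: num_divisors(n) % 2 == 1,
    "even": lambda n: n % 2 == 0,
    "multiple_of_4": lambda n: n % 4 == 0,
    "prime_square": lambda n: num_divisors(n) == 2 and isqrt(n) ** 2 == n,
}


def make_conjectures(concepts, universe):
    """Compare example sets and propose conjectures supported by the data."""
    examples = {
        name: frozenset(n for n in universe if pred(n))
        for name, pred in concepts.items()
    }
    conjectures = []
    # Nonexistence: a concept with no examples in the universe.
    for name, ex in examples.items():
        if not ex:
            conjectures.append(("nonexistence", name, None))
    # Equivalence and implication between concepts that do have examples.
    nonempty = sorted(name for name, ex in examples.items() if ex)
    for a, b in combinations(nonempty, 2):
        if examples[a] == examples[b]:
            conjectures.append(("equivalence", a, b))
        elif examples[a] < examples[b]:
            conjectures.append(("implication", a, b))
        elif examples[b] < examples[a]:
            conjectures.append(("implication", b, a))
    return conjectures


for conjecture in make_conjectures(CONCEPTS, range(1, 201)):
    print(conjecture)
```

On this toy data the sketch proposes that squares are exactly the numbers with an odd number of divisors, that multiples of 4 are even, and that no prime is a square. Ranking such output by interestingness is a separate step that is not attempted here.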
Potential Rewards - Example
NumbersWithNames program: http://machine-creativity.com/programs/nwn
  Datamining the Encyclopedia of Integer Sequences
Perfect numbers are pernicious
  Perfect: sum of divisors is twice the number
  Pernicious: prime number of 1s in binary
  6, 28, 496, ...
Found by looking for subsequences
Many more similar examples
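The example can be reproduced with a short sketch. It computes both sequences below a small bound from the definitions on the slide and then tests whether one is a subsequence of the other. The function names and the bound of 1000 are assumptions made for illustration. This is not the NumbersWithNames implementation, which works on stored sequence terms rather than recomputing them.

```python
def divisor_sum(n):
    """Sum of all positive divisors of n, including n itself."""
    return sum(d for d in range(1, n + 1) if n % d == 0)


def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def is_perfect(n):
    """Perfect: the sum of divisors is twice the number."""
    return divisor_sum(n) == 2 * n


def is_pernicious(n):
    """Pernicious: the number of 1s in the binary expansion is prime."""
    return is_prime(bin(n).count("1"))


def is_subsequence(small, big):
    """True if every term of `small` occurs in `big`, in the same order."""
    remaining = iter(big)
    return all(term in remaining for term in small)


LIMIT = 1000  # assumed bound; a stored sequence also has finitely many terms
perfect = [n for n in range(1, LIMIT) if is_perfect(n)]
pernicious = [n for n in range(1, LIMIT) if is_pernicious(n)]

print(perfect)
print([bin(n) for n in perfect])
print(is_subsequence(perfect, pernicious))
```

A match on finitely many terms is only evidence for the conjecture, so anything found this way still needs a proof or a wider search.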
Potential Rewards: Money & Fame
Money
  EPSRC-funded big project: e-science
  E-maths initiative being discussed
Fame
  Monstrous Moonshine conjectures
  Found by accident (numbers 196883 & 196884)
  Led to Fields Medal (see paper)
Conclusions and Future Work
Consider mathematics as a datamining domain
  Much data available, but there are problems
  Techniques required are simple
  Possible to make important conjectures
Cross-domain and cross-database sharing of data
  Projects like NumbersWithNames: http://machine-creativity.com/programs/nwn