Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University


Post on 25-Dec-2015


Page 1:

Challenges in Physically Inspired Machine Learning (PIALM Task Force)

Dymitr Ruta, PhD

BT Group

Bogdan Gabrys, PhD

Bournemouth University

Page 2:

© British Telecommunications plc

Agenda

• Motivation

• The links between physics and information theory

• Data fields methodology for classification, clustering, data condensation and visualisation

• Information theoretic learning (ITL)

• Hybrid large scale optimisation (simulations)

• Concluding remarks

• Discussion, Q&A

Page 3:

Motivation (Business): Ability to Predict is the Key to Survival & Success

• DISCOVER the data driven problem that can be improved using intelligent learning techniques

• DESCRIBE the problem and its characteristics, prior knowledge, input and output data

• MODEL the relationship between inputs and outputs by adopting existing algorithms or devising new ones

• LEARN to reproduce outputs based on previously unseen inputs

• PREDICT the future outputs and save/earn money

Extract Clean Pre-process …… Tune Implement Productise Deploy Support …

Page 4:

Motivation (Personal): Physical phenomena guide artificial learning mechanisms

• Deep analogies between information and matter, energy and uncertainty, complexity and entropy

• Any knowledge can only be conveyed using a certain amount of matter/energy

• Physics limits the ability to access, learn and know

• Convergence of matter and information at the quantum level: “It From Bit”

• Computational intelligence sciences are in chaos:
  – Lack of a unified theory of information and its processing
  – Vast amounts of data, yet mostly numerical data are used
  – Many models, too many assumptions, poor performance

• Guidance of well established physical models

Page 5:

Key Analogies Between Physics and Information Theory

• Energy, Work → Uncertainty, Information

• Law of the Total Energy/Uncertainty Preservation

• Thermodynamic Entropy → Shannon Entropy

• Matter, Space → Information, Knowledge Space

• Heisenberg Principle of Uncertainty → Breiman Principle of Uncertainty

• Information exists only in the physical context

• Physics and information theory converge at the quantum level

Page 6:

Boundaries of information processing

• Physical constraints on information processing
  – Mass/energy, speed of light, location, time

• Spatial bounds on information capacity
  – Quantum mechanics at the elementary particle level
  – Gravitational collapse into a black hole at the macro scale

• Communication is a dynamic process and requires a certain energy transmitted with power P

[Two bound formulas garbled in extraction: one in terms of G, c and R, the other in terms of A and P]

[S.Lloyd et al. Phys. Rev. 93(10) 2004]

Page 7:

Quantum effects about to emerge

Turing / von Neumann: Universal Machine
We can make computers

Landauer: Information is Physical
Computers need cooling fans

Deutsch: Information is Quantum
Computers get weird

Page 8:

Where lies the problem: stop the atom

Page 9:

Quantum computing

State   Amplitude (α + iβ)      Probability (|α|² + |β|²)
000     a = 0.37 + i 0.04       0.14
001     b = 0.35 + i 0.43       0.31
010     c = 0.09 + i 0.31       0.10
011     d = 0.30 + i 0.30       0.18
100     e = 0.11 + i 0.18       0.04
101     f = 0.40 + i 0.01       0.16
110     g = 0.09 + i 0.12       0.02
111     h = 0.15 + i 0.16       0.05

Qubit: $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, with $|\alpha|^2 + |\beta|^2 = 1$

Classical bit: $\mathrm{bit} \in \{0, 1\}$

• Multitude of states, inherent parallelism

• Applications:
  – Search in the unsorted database
  – Factoring large numbers (cryptography)
  – Simulating quantum effects in complex systems

500 qubit system: $2^{500}$ states at a single pulse
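
The amplitude table above can be checked numerically. The following is a minimal Python sketch, assuming the eight amplitudes exactly as printed on the slide (they are rounded to two decimals, so the probabilities sum to 1 only approximately); the variable names are illustrative.

```python
# Amplitudes of a 3-qubit register as listed on the slide (rounded values).
amplitudes = {
    "000": complex(0.37, 0.04),
    "001": complex(0.35, 0.43),
    "010": complex(0.09, 0.31),
    "011": complex(0.30, 0.30),
    "100": complex(0.11, 0.18),
    "101": complex(0.40, 0.01),
    "110": complex(0.09, 0.12),
    "111": complex(0.15, 0.16),
}

# Probability of observing each state is the squared modulus |alpha + i*beta|^2.
probabilities = {state: abs(a) ** 2 for state, a in amplitudes.items()}
total = sum(probabilities.values())

for state, p in probabilities.items():
    print(state, round(p, 2))
print("sum of probabilities:", round(total, 3))

# Number of basis states addressed by a 500-qubit register.
n_states_500 = 2 ** 500
print("decimal digits in 2^500:", len(str(n_states_500)))
```

The normalisation check is only approximate here because the slide values are rounded; an exact state vector would sum to 1.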

Page 10:

Information Thermodynamics & Complexity

$$H(X) = \sum_{x \in A_X} p(x) \log \frac{1}{p(x)}$$

• Landauer: any logically irreversible data processing (e.g. erasing a bit) must be accompanied by a corresponding entropy increase of the environment, i.e. waste heat of at least $kT \ln 2$ per bit

• Equivalence between thermodynamic and information (Shannon) entropy

• Information complexity: size, dimensionality, structure

• Complexity measures the cost of obtaining information

• Kolmogorov Algorithmic Complexity: the length of the shortest program code that can obtain the requested information

• Information distance: $D_I(x, y) = \max\{K(x|y), K(y|x)\}$
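
As a small worked illustration of the entropy formula and the Landauer bound on this slide, the sketch below computes Shannon entropy in bits and the minimum heat per erased bit at a given temperature. The function names and the choice of 300 K are assumptions made for the example, not part of the original slides.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant k in J/K (exact SI value)


def shannon_entropy(probs, base=2.0):
    """H(X) = sum_x p(x) * log(1 / p(x)); zero-probability terms contribute 0."""
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)


def landauer_limit(temperature_kelvin):
    """Minimum heat in joules dissipated per erased bit: k * T * ln 2."""
    return BOLTZMANN * temperature_kelvin * math.log(2)


fair_coin = shannon_entropy([0.5, 0.5])      # maximal uncertainty for two outcomes
biased_coin = shannon_entropy([0.9, 0.1])    # less uncertain, so lower entropy
heat_per_bit = landauer_limit(300.0)         # assumed room temperature

print(fair_coin, biased_coin, heat_per_bit)
```

Kolmogorov complexity itself is not computable, so it is deliberately left out of this sketch.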

Page 11:

Data Particles – The Prime Inspiration

• Across different scales in physics, two particle interaction paradigms are dominant:
  – Dynamic particle models, where particles act upon each other and/or the environment and move accordingly
  – Static or statistical particle models, where scale and complexity force a statistical description of particles

• Both areas are now the subject of our exploration towards a possible synergic merger:
  – Dynamic data fields provide a whole methodology for dealing with mobile data particles, while…
  – Kernel Machines and Information Theoretic Learning typically treat data statically
  – Can a unified methodology be proposed?

Page 12:

Data Field Models

[Equation garbled in extraction: matrix-form calculation of the distance matrix D from the data matrices with subscripts S and T]

• Distance matrix calculation

• Charged data points

• Central, potential field

• Attracting/repelling force
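
The slide's own matrix formula did not survive extraction, so the following Python sketch is only an assumption about its intent: it computes pairwise squared distances through the standard expansion ‖a − b‖² = ‖a‖² − 2a·b + ‖b‖², and then evaluates a central, gravity-like potential generated by charged data points. The function names, the softening constant `eps` and the 1/r form of the potential are all hypothetical choices for illustration.

```python
import math


def squared_distances(xs, xt):
    """Pairwise squared Euclidean distances via |a|^2 - 2 a.b + |b|^2."""
    norms_s = [sum(v * v for v in a) for a in xs]
    norms_t = [sum(v * v for v in b) for b in xt]
    return [
        [
            # Clamp at 0 to guard against tiny negative rounding errors.
            max(norms_s[i] - 2.0 * sum(p * q for p, q in zip(a, b)) + norms_t[j], 0.0)
            for j, b in enumerate(xt)
        ]
        for i, a in enumerate(xs)
    ]


def potential(point, data, charges, eps=1e-9):
    """Attracting central potential at `point`: -sum_i q_i / r_i (assumed form)."""
    d2 = squared_distances([point], data)[0]
    return -sum(q / math.sqrt(d + eps * eps) for q, d in zip(charges, d2))


D2 = squared_distances([[0.0, 0.0], [3.0, 4.0]], [[0.0, 0.0]])
V = potential([1.0, 0.0], [[0.0, 0.0]], [1.0])
print(D2, V)
```

A negative charge in this sketch would flip the sign of its contribution, which is one simple way to model a repelling rather than attracting source.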

Page 13:

Electrostatic Field for Classification

Page 14:

Field generated clustering

• All data points are left free to move and merge into a single point

• Data hierarchies are arranged as time passes

• Data trajectories form dynamic clustering dendrograms

[Figure panels: Gravity Field, Lennard-Jones Potential]
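
The merging process described on this slide can be sketched in a few lines. The example below is a deliberately simplified, hypothetical version: one-dimensional data, unit masses, an inverse-square attraction, a capped step per iteration and a fixed merge radius. None of these parameters come from the slides. The recorded merge order plays the role of the dendrogram.

```python
def gravity_clustering(points, step=0.05, merge_radius=0.1, max_iter=10000):
    """Let 1D particles attract each other and record merges as (time, members_a, members_b)."""
    # Each particle is [position, mass, tuple of original point indices].
    particles = [[float(p), 1.0, (i,)] for i, p in enumerate(points)]
    merges = []
    for t in range(max_iter):
        if len(particles) == 1:
            break
        # Net inverse-square attraction on each particle, converted to a capped move.
        moves = []
        for a in particles:
            force = 0.0
            for b in particles:
                if a is b:
                    continue
                d = b[0] - a[0]
                force += b[1] * d / (abs(d) ** 3 + 1e-9)
            moves.append(max(-step, min(step, step * force)))
        for p, m in zip(particles, moves):
            p[0] += m
        # Merge neighbours that came within the merge radius (mass-weighted position).
        particles.sort(key=lambda p: p[0])
        merged = [particles[0]]
        for p in particles[1:]:
            q = merged[-1]
            if p[0] - q[0] <= merge_radius:
                mass = q[1] + p[1]
                merges.append((t, q[2], p[2]))
                merged[-1] = [(q[0] * q[1] + p[0] * p[1]) / mass, mass, q[2] + p[2]]
            else:
                merged.append(p)
        particles = merged
    return merges


merges = gravity_clustering([0.0, 0.3, 5.0])
for time, left, right in merges:
    print(time, left, right)
```

With these inputs the two nearby points merge first and the distant point joins later, which is the hierarchy a dendrogram would display.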

Page 15:

Dynamic Data Condensation

• Terabytes of complex corporate data remain unexploited

• State-of-the-art machine learning techniques scale as $O(n^2)$

• Real-time and adaptive models require frequent retraining

• Data are being condensed using:
  – Random sub-sampling
  – Parzen density based methods
  – Multi-resolution spatial analysis
  – Hierarchical clustering models
  – …..

• …but dynamic data condensation has not been approached

• …but labelled data are not being condensed?

Page 16:

Soft Fixed Field Electrostatic Condensation Process

• Builds a soft Parzen density estimate for each class of data

• Normalises and freezes the original class distribution

• An electrostatic field with a Gaussian dependence on the distance is built: [equation not preserved in the transcript]

• The data are left free to move and merge towards lower energy states, while the original field continuously guards the distribution

• Fast matrix implementation in Matlab
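
The first step of the process, a Parzen density estimate per class with a Gaussian window, can be sketched as below. This is a one-dimensional Python illustration under assumed names and an assumed window width `h`; it is not the authors' Matlab implementation. The `drift` function is the derivative of the estimate, i.e. the direction in which a free data particle would move towards higher density.

```python
import math


def parzen_density(x, samples, h):
    """1D Parzen estimate with a Gaussian window of width h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-((x - s) ** 2) / (2.0 * h * h)) for s in samples)


def drift(x, samples, h):
    """Derivative of the Parzen estimate at x (positive means move right)."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(
        (s - x) / (h * h) * math.exp(-((x - s) ** 2) / (2.0 * h * h)) for s in samples
    )


def class_densities(x, data_by_class, h):
    """One frozen density value per class label at location x."""
    return {label: parzen_density(x, pts, h) for label, pts in data_by_class.items()}


data = {"A": [-1.0, 1.0], "B": [4.0, 5.0]}  # hypothetical labelled samples
print(class_densities(0.0, data, h=1.0))
print(drift(0.5, [1.0], h=1.0))
```

Keeping one estimate per class is what allows labelled data to be condensed without mixing the class distributions.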

Page 17:

99% data reduction, 99% performance retention

Page 18:

Discriminant Function Visualisation

[Figure panels: Quadratic, Decision Tree]

$$d = \arg\max_{j=1,\dots,C} P_j(x)$$

$$D = \max_{j=1,\dots,C} P_j(x)$$

Page 19:

Visualisation of Classifier Fusion

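$$d_{\mathrm{fus}} = \arg\max_{j=1,\dots,C} \; \mathop{F}_{i=1}^{N} P_{ij}$$

• Mean: $d_{\mathrm{mean}} = \arg\max_{j=1,\dots,C} \sum_{i=1}^{N} P_{ij}$

• Max: $d_{\mathrm{max}} = \arg\max_{j=1,\dots,C} \max_{i=1,\dots,N} P_{ij}$

• Min: $d_{\mathrm{min}} = \arg\max_{j=1,\dots,C} \min_{i=1,\dots,N} P_{ij}$

• Product: $d_{\mathrm{prod}} = \arg\max_{j=1,\dots,C} \prod_{i=1}^{N} P_{ij}$

• Majority Vote: $d_{\mathrm{vote}} = \arg\max_{j=1,\dots,C} \mathop{\mathrm{vote}}_{i=1}^{N} P_{ij}$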
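
The five fusion rules can be compared on a toy example. The Python sketch below assumes that P[i][j] is the support given by classifier i to class j (N classifiers, C classes); the function names and the example matrix are hypothetical.

```python
import math
from collections import Counter


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda j: values[j])


def fuse(P, rule):
    """Return the winning class index for an N x C support matrix under a fusion rule."""
    columns = list(zip(*P))  # one tuple of N supports per class
    if rule == "mean":
        scores = [sum(col) / len(col) for col in columns]
    elif rule == "max":
        scores = [max(col) for col in columns]
    elif rule == "min":
        scores = [min(col) for col in columns]
    elif rule == "product":
        scores = [math.prod(col) for col in columns]
    elif rule == "vote":
        # Each classifier votes for its own top class; count the votes per class.
        votes = Counter(argmax(row) for row in P)
        scores = [votes.get(j, 0) for j in range(len(columns))]
    else:
        raise ValueError(f"unknown rule: {rule}")
    return argmax(scores)


P = [
    [0.60, 0.40],  # classifier 1 prefers class 0
    [0.30, 0.70],  # classifier 2 strongly prefers class 1
    [0.55, 0.45],  # classifier 3 weakly prefers class 0
]
decisions = {rule: fuse(P, rule) for rule in ("mean", "max", "min", "product", "vote")}
print(decisions)
```

In this example the majority vote picks class 0 while the four score-based rules pick class 1, because one confident classifier outweighs two hesitant ones once the supports are combined numerically.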

Page 20:

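Information Theoretic Learning for data transformation

• Mutual Information: $I(C, Y) = H(C) - H(C|Y)$

• Renyi’s Entropy: $H_R(X) = \frac{1}{1-\alpha} \log \sum_{x \in A_X} p(x)^{\alpha}$

• Information potentials:

$$I(C, Y) = \underbrace{\sum_c \int_Y p(c, y)^2 \, dy}_{V_{IN}} + \underbrace{\sum_c \int_Y P(c)^2 p(y)^2 \, dy}_{V_{ALL}} - 2 \underbrace{\sum_c \int_Y p(c, y) P(c) p(y) \, dy}_{V_{BTW}}$$

Information Forces

$$\frac{\partial I}{\partial y_{ci}} = \frac{\partial V_{IN}}{\partial y_{ci}} + \frac{\partial V_{ALL}}{\partial y_{ci}} - 2 \frac{\partial V_{BTW}}{\partial y_{ci}}$$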

Linear Feature Transformation

[Torkkola NIPS 2001]
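
For discrete distributions, the entropy and mutual information definitions on this slide can be evaluated directly. The sketch below is a hedged Python illustration with hypothetical function names; it works on a joint probability table rather than on the Parzen-based continuous estimates used in the cited work, and it does not implement the information potentials or forces.

```python
import math


def shannon_entropy(probs):
    """H = -sum p log2 p, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def renyi_entropy(probs, alpha=2.0):
    """Renyi entropy of order alpha (alpha != 1): log2(sum p^alpha) / (1 - alpha)."""
    if alpha == 1.0:
        raise ValueError("alpha = 1 is the Shannon limit; use shannon_entropy")
    return math.log2(sum(p ** alpha for p in probs if p > 0)) / (1.0 - alpha)


def mutual_information(joint):
    """I(C, Y) = H(C) - H(C|Y) for a table joint[c][y] = p(c, y)."""
    p_c = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    h_c_given_y = -sum(
        p * math.log2(p / p_y[j])
        for row in joint
        for j, p in enumerate(row)
        if p > 0
    )
    return shannon_entropy(p_c) - h_c_given_y


independent = [[0.25, 0.25], [0.25, 0.25]]   # Y says nothing about C
dependent = [[0.5, 0.0], [0.0, 0.5]]         # Y determines C exactly
print(renyi_entropy([0.25] * 4), mutual_information(independent), mutual_information(dependent))
```

The quadratic (alpha = 2) case is the one that pairs naturally with Gaussian Parzen windows, which is why it is the default in this sketch.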

Page 21:

Information Theoretic Learning for classification and clustering

• ITL Framework [Principe et al 1999]

• Class label transmission [Archambeau et al 2004]: a new generic method for classification based on ITL and Parzen density model

• Generalised information distances used for feature generation [Kaplan & Hafner 2006]

• Classification with unlabelled data using ITL-linked density divergence minimisation [Jeong et al 2005]

• Clustering by separating cell identities using MIM [Schneideman et al 2003]

• Unsupervised Clustering by MIM between data and parameters [Herschkowitz & Nadal 1999]

Page 22:

The Challenges to Tackle

• Data obesity and data quality issues

• Model and data complexity control

• Multidimensional information uncertainty and fusion

• Natural language processing

Zadeh’s Generalised Theory of Information Uncertainty:

Information is a generalised constraint

Most Swedes are tall: $GC(h) = \mu_{\mathrm{likely}}\left(\int_a^b h(u)\, \mu_{\mathrm{tall}}(u)\, du\right)$

Page 23:

Particle Dynamics based Exploration Models

• Simulated Annealing – random particle exploration of the input space in a cooling environment that gradually slows the particle's velocity

• Stochastic Diffusion Search – random agent exploration with one-to-one communication

• Ant Colony Optimisation – spatial path optimisation inspired by ant-laid pheromone trails

• Particle Swarm Optimisation – swarm dynamically led by the local best (one-to-all communication)

• Particle filters – a flexible sequential predictor based on sampling from a sequence of probability distributions using a large number of particles
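
Of the models listed above, simulated annealing is the simplest to sketch. The Python example below is a minimal illustration under assumed settings (linear cooling schedule, uniform proposals, a fixed random seed and a one-dimensional quadratic objective); these choices are hypothetical and not taken from the slides.

```python
import math
import random


def simulated_annealing(f, x0, steps=2000, t0=1.0, step_size=0.5, seed=0):
    """Minimise f starting from x0; returns the best position and value seen."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, f_best = x, fx
    for k in range(steps):
        # Linear cooling: the temperature falls from t0 towards (almost) zero.
        temperature = t0 * (1.0 - k / steps) + 1e-9
        candidate = x + rng.uniform(-step_size, step_size)
        fc = f(candidate)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if fc <= fx or rng.random() < math.exp((fx - fc) / temperature):
            x, fx = candidate, fc
            if fx < f_best:
                best, f_best = x, fx
    return best, f_best


best_x, best_value = simulated_annealing(lambda x: (x - 3.0) ** 2, x0=0.0)
print(best_x, best_value)
```

As the temperature drops, uphill moves become increasingly unlikely, which corresponds to the particle slowing down in the cooling environment.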

Page 24:

Business Vision for the Future

• Distributed data mining

• Multimedia mining (voice, text, video)

• Open and flexible data structures

• Unified data processing framework

• Online secured predictive services

• Networked, evolving and adaptable software

• Automated knowledge discovery

• Artificial awareness: self-aware software (>2020)

Page 25:

Composite Optimisation Problem

• Each component treated separately

• Lack of coordination

• Modelling inconsistencies

• The challenge: Full Optimisation

Page 26:

Task Force Achievements

• Seminars given:
  – IDSIA, Switzerland, Prof. J. Schmidhuber
  – Birmingham University, Dr Peter Tino
  – Aston University, Prof. David Lowe

• Conferences:
  – ICCMSE’2007 with a paper published in American Institute of Physics (AIP) Conference Proceeding Series

• Publications:
  – D. Ruta, B. Gabrys. Reducing Spatial Data Complexity for Classification Models. Accepted to the International Conference of Computational Methods in Sciences and Engineering ICCMSE 2007, American Institute of Physics Proceeding Series
  – D. Ruta, B. Gabrys. A Framework for Machine Learning based on Dynamic Physical Fields. Accepted to the Special Issue of Natural Computing Journal on Nature-inspired Learning and Adaptive Systems

• Establishing an active group of about 20 researchers networking around the PIALM and related issues

Page 27:

PIALM Follow-up and Future Activities

• Proposal submitted to the EU 7th to merge different PIALM directions and follow up research centred around Information Theoretical Learning and Dynamic Particle Models

• Transforming the PIALM contacts into a prospective project support group with a regular meetings agenda, newsletter and closer collaborative ventures

• Organisation of Special Sessions during related Conferences

• Further applications for networking/travel grants

• Widening the scope of PIALM into several focus themes to strengthen the link with other NISIS projects and better address changing needs of the society

Page 28:

Conclusions

• Business analytics quite disparate from state-of-the-art research in machine learning, pattern recognition etc.

• Over-complex black-box type models unusable in business applications

• Customer analytics and modelling tools are gaining importance for customer-centric service providers

• Online predictive and adaptable services soon to emerge

• Nature continues to provide inspirations for data-driven modelling and learning
