A Brief Tour of Machine Learning
What is Machine Learning?
• Very multidisciplinary field – statistics, mathematics, artificial intelligence, psychology, philosophy, cognitive science…
• In a nutshell – developing algorithms that learn from data
• Historically – flourished from advances in computing in the early 1960s, with a resurgence in the late 1990s
Main areas in Machine Learning
#1 Supervised learning – assumes a teacher exists to label/annotate data
#2 Unsupervised learning – no need for a teacher, try to learn relationships automatically
#3 Reinforcement learning – biologically plausible, try to learn from reward/punishment stimuli/feedback
Machine Learning Area #1 Supervised Learning
More about Supervised Learning
Perhaps the most well studied area of machine learning – lots of nice theory adapted from statistics/mathematics.
Assume the existence of a training and test set
Main sub-areas of research are:
• Pattern recognition (discrete labels)
• Regression (continuous labels)
• Time series analysis (temporal dependence in data)
An i.i.d. assumption is commonly made.
The formalisation of data
• How do we formally describe our data?
Object + Label

An object is commonly represented as a feature vector that describes it:

x_i = (x_i^1, x_i^2, ..., x_i^d)

The individual features can be real, discrete, symbolic… e.g. patient symptoms: temperature, sex, eye colour…

The label is the property of the object that we want to predict in the future using our training data – e.g. in cancer screening the labels could be Y = {normal, benign, malignant}
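The object + label representation can be sketched in code; the patient records below are invented illustrations (echoing the slide's examples):

```python
# A minimal sketch of labelled objects as feature vectors.
# Each object is a d-dimensional vector x_i = (x_i^1, ..., x_i^d);
# features may be real-valued (temperature) or symbolic (sex, eye colour).
patients = [
    {"features": (38.5, "female", "blue"),  "label": "benign"},
    {"features": (36.8, "male",   "brown"), "label": "normal"},
    {"features": (39.1, "female", "green"), "label": "malignant"},
]

# The label space Y for a cancer-screening task:
Y = {"normal", "benign", "malignant"}

for p in patients:
    assert p["label"] in Y  # every label must come from Y
```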
The formalisation of data (continued)
• What is training and test data?

[Figure: a training set of labelled digit images (2, 7, 6, 1, 7) alongside new test images whose labels are either not known or withheld from the learner.]

We learn from the training data, and try to predict new, unseen test data. More formally, we have a set of n training examples (information pairs: object x + label y) drawn from some unknown probability distribution P(X, Y):

(x_1, y_1), (x_2, y_2), ..., (x_n, y_n) s.t. (x_i, y_i) ~ P(X, Y)
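The formal setup can be sketched in code; the toy distribution P(X, Y) below is invented for illustration:

```python
import random

# A minimal sketch of the train/test formalism: n labelled pairs
# (x_i, y_i) drawn i.i.d. from an "unknown" distribution P(X, Y).
# Here P is a toy distribution: x is a point in the unit square and
# y = 1 iff the point lies above the diagonal x2 = x1.
random.seed(0)

def draw_pair():
    x = (random.uniform(0, 1), random.uniform(0, 1))
    y = 1 if x[1] > x[0] else 0
    return x, y

# n i.i.d. information pairs (x_1, y_1), ..., (x_n, y_n)
pairs = [draw_pair() for _ in range(100)]
train = pairs[:80]                         # learner sees objects AND labels
test_objects = [x for x, _ in pairs[80:]]  # labels withheld from the learner
```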
More about Pattern Recognition
Lots of algorithms/techniques – the main contenders:
1. Support Vector Machines (SVM)
2. Nearest Neighbours
3. Decision Trees
4. Neural Networks
5. Multivariate Statistics
6. Bayesian algorithms
7. Logic programming
The mighty SVM algorithm
• Very popular technique – lots of followers, relatively new
• Conceptually simple technique – related to the Perceptron, it is a linear classifier (separates the data into half-spaces)
[Figure: two classes of examples (☺ vs. ■) separated by a linear decision boundary.]
Concept – keep the classifier simple and don't overfit the training data, so that the classifier generalises well on new test data (Occam's razor)
Concept – if the data are not linearly separable, use a kernel map Φ into a higher-dimensional feature space where the data may become separable
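The kernel concept can be sketched with a kernel perceptron, a simpler relative of the SVM (this omits the SVM's max-margin optimisation; the XOR data and polynomial kernel are illustrative choices):

```python
# Kernel perceptron: a linear classifier in an implicit feature space.
def kernel(u, v):
    """Polynomial kernel (1 + u.v)^2: implicitly maps the data into a
    higher-dimensional feature space without computing it explicitly."""
    dot = sum(a * b for a, b in zip(u, v))
    return (1 + dot) ** 2

# XOR data: NOT linearly separable in the original 2-D space.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, 1, 1, -1]

alpha = [0, 0, 0, 0]          # per-example mistake counts
for _ in range(10):           # perceptron epochs
    for i, (xi, yi) in enumerate(zip(X, y)):
        f = sum(alpha[j] * y[j] * kernel(X[j], xi) for j in range(len(X)))
        if yi * f <= 0:       # mistake: strengthen example i
            alpha[i] += 1

def predict(x):
    f = sum(alpha[j] * y[j] * kernel(X[j], x) for j in range(len(X)))
    return 1 if f > 0 else -1

assert [predict(x) for x in X] == y  # XOR learnt via the kernel trick
```

In the mapped space the XOR classes become linearly separable, so the linear mistake-driven rule converges; in the original space no linear classifier could do this.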
Hot topics in SVM’s
• Kernel design – central to applying SVMs to data, e.g. when the objects are text documents and the features are words, the kernel can incorporate domain knowledge about grammar.
• Applying the kernel technique to other learning algorithms, e.g. Neural Networks
The trusty old Nearest Neighbour algorithm
• Born in the 1960s – probably the simplest of all algorithms to understand.
• Decision rule – classify a new test example by finding the closest example in the training set and predicting the same label as that neighbour.
• Lots of theory justifying its convergence properties.
• Very lazy technique, not very fast – it has to search the whole training set for each test example.
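The decision rule can be sketched in a few lines; the toy training set is invented for illustration:

```python
import math

# A minimal 1-nearest-neighbour sketch: no training phase at all,
# just a search over the training set at prediction time (lazy learning).
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((4.0, 4.2), "B"), ((4.1, 3.9), "B")]

def nn_classify(x):
    # Find the closest training example (Euclidean distance) and
    # predict its label.
    nearest = min(train, key=lambda pair: math.dist(pair[0], x))
    return nearest[1]

print(nn_classify((1.1, 0.9)))  # -> A (closest examples are class A)
```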
Problems with Nearest Neighbours
• Examples are viewed as points in Euclidean space, so the method can be very sensitive to feature scaling.
• Finding computationally efficient ways to search for the Nearest Neighbour example.
Decision Trees
• Many different varieties: C4.5, CART, ID3…
• Algorithms build classification rules using a tree of if-then statements.
• The tree is constructed using Minimum Description Length (MDL) principles (trying to make the tree as simple as possible)
[Figure: an example decision tree – a root test IF temperature > 65 leads to a further test IF dehydrated = yes, with leaves "patient has fever", "patient has flu" and "patient has pneumonia".]
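The example tree can be written as if-then code; the assignment of diagnoses to branches is an assumption, since the slide's layout is ambiguous:

```python
# A decision tree is just nested if-then rules learnt from data.
# Thresholds and diagnoses follow the slide's toy example; the branch
# layout is an assumed reading of the diagram.
def diagnose(temperature, dehydrated):
    if temperature > 65:
        if dehydrated:
            return "pneumonia"
        return "flu"
    return "fever"

print(diagnose(70, True))   # -> pneumonia
print(diagnose(60, False))  # -> fever
```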
Benefits/Issues with Decision Trees
• Instability – minor changes to the training data make huge changes to the decision tree
• User can visualise/interpret the hypothesis directly, can find interesting classification rules
• Problems with continuous real attributes, which must be discretised.
• Large AI following, and widely used in industry
Mystical Neural Networks
• Very flexible; learning is a gradient descent process (back-propagation)
• Training neural networks involves a lot of design choices:
– what network structure, how many hidden layers…
– how to encode the data (values must lie in [0, 1])
– use momentum to speed up convergence
– use weight decay to keep the network simple
Training a neural network
[Figure: a feed-forward network with an input layer (menopausal status, ultrasound score, CA125), a hidden layer of sigmoid units, and an output layer giving a prediction between 0 and 1.]

The learnt hypothesis is represented by the weights that interconnect the neurons.

[Figure: the error surface E(w) plotted over the weight space (w1, w2).]

The aim in training the neural network is to find the weight vector w that minimises the error E(w) on the training set – a gradient descent problem.
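The gradient descent idea can be sketched with a single sigmoid neuron (a full back-propagation network is omitted); the toy data, learning rate and epoch count are invented for illustration:

```python
import math
import random

# Minimise squared error E(w) for one sigmoid neuron by gradient descent.
# Toy task: target y = 1 iff x1 + x2 > 1 (linearly separable).
random.seed(1)

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1 / (1 + math.exp(-z))

data = [((x1, x2), 1.0 if x1 + x2 > 1 else 0.0)
        for x1, x2 in [(random.random(), random.random()) for _ in range(200)]]

w = [0.0, 0.0, 0.0]        # two weights + a bias
lr = 0.5                   # learning rate
for _ in range(2000):      # gradient descent epochs
    for (x1, x2), y in data:
        out = sigmoid(w[0] * x1 + w[1] * x2 + w[2])
        grad = (out - y) * out * (1 - out)   # dE/dz for squared error
        w[0] -= lr * grad * x1
        w[1] -= lr * grad * x2
        w[2] -= lr * grad

accuracy = sum((sigmoid(w[0] * x1 + w[1] * x2 + w[2]) > 0.5) == (y == 1.0)
               for (x1, x2), y in data) / len(data)
```

Each update moves w a small step against the error gradient, descending the E(w) surface sketched above.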
Interesting applications
• Bioinformatics:– genetic/protein code analysis– microarray analysis– gene regulatory pathways
• WWW:– classifying text/html documents– filtering images– filtering emails
Bayesian Algorithms
• Try to model interrelationships between variables probabilistically.
• Can model expert/domain knowledge directly into the classifier as prior belief in certain events.
• Use basic axioms of probability theory to extract probabilistic estimates
Bayesian algorithms in practice
• Lots of different algorithms – Relevance Vector Machine (RVM), Naïve Bayes, Simple Bayes, Bayesian Belief Networks (BBN)…
• Has a large following – especially Microsoft Research
[Figure: a small belief network – nodes for Weather = sunny, Temperature < 65 and Humidity > 100 feeding into the decision to play tennis or play Monopoly.]

Causal links between features can be modelled
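A minimal Naïve Bayes sketch along these lines; the weather/activity records echo the slide's toy example but are invented, and the features are assumed conditionally independent given the class:

```python
from collections import Counter, defaultdict

# Naive Bayes: P(class | features) is proportional to
# P(class) * product over features of P(feature | class).
data = [  # ((weather, temperature), activity)
    (("sunny", "hot"),  "tennis"),   (("sunny", "mild"), "tennis"),
    (("sunny", "hot"),  "tennis"),   (("rainy", "mild"), "monopoly"),
    (("rainy", "hot"),  "monopoly"), (("rainy", "cold"), "monopoly"),
]

priors = Counter(label for _, label in data)
cond = defaultdict(Counter)        # cond[(i, label)][value] = count
for features, label in data:
    for i, v in enumerate(features):
        cond[(i, label)][v] += 1

def posterior(features, label):
    p = priors[label] / len(data)
    for i, v in enumerate(features):
        # simple add-one smoothing so unseen values don't zero the product
        p *= (cond[(i, label)][v] + 1) / (priors[label] + 2)
    return p

def classify(features):
    return max(priors, key=lambda lb: posterior(features, lb))

print(classify(("sunny", "hot")))  # -> tennis
```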
Issues with Bayesian algorithms
• Tractability – finding solutions requires numerical approximations or computational shortcuts
• Can model causal relationships between variables
• Need lots of data to estimate probabilities using observed training data frequencies
Very important side problems
• Feature Selection/Extraction – using Principal Component Analysis, Wavelets, Canonical Correlation, Factor Analysis, Independent Component Analysis
• Imputation – what to do with missing features?
• Visualisation – make the hypothesis human readable/interpretable
• Meta-learning – how to add functionality to existing algorithms, or combine the predictions of many classifiers (Boosting, Bagging, Confidence and Probability Machines)
Very important side problems (continued)
• How to incorporate domain knowledge into a learner
• Trade off between complexity (accuracy on training) vs. generalisation (accuracy on test)
• Pre-processing of data: normalising, standardising, discretising.
• How to test: leave-one-out, cross-validation, stratification, online, offline…
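The testing schemes above can be sketched as k-fold cross-validation; the data and the trivial majority-vote "classifier" are invented for illustration (with k = n this becomes leave-one-out):

```python
from collections import Counter

# k-fold cross-validation: split the data into k folds, train on k-1
# folds and test on the held-out fold, then average the accuracies.
def cross_validate(data, k):
    folds = [data[i::k] for i in range(k)]
    accs = []
    for i, test in enumerate(folds):
        train = [ex for j, f in enumerate(folds) if j != i for ex in f]
        # the "classifier" here is just the majority label in the train set
        majority = Counter(lbl for _, lbl in train).most_common(1)[0][0]
        accs.append(sum(lbl == majority for _, lbl in test) / len(test))
    return sum(accs) / k

data = [(i, "pos" if i % 3 else "neg") for i in range(30)]
print(round(cross_validate(data, 5), 2))  # -> 0.67 (20 of 30 are "pos")
```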
Machine Learning Area #2 Unsupervised Learning
An introduction to Unsupervised Learning
• No need for a teacher/supervisor
• Mainly clustering – trying to group objects into sensible clusters
• Novelty detection – finding strange examples in data
[Figure: left, examples grouped into clusters; right, novelty detection with outlying examples flagged.]
Algorithms available
• For clustering: EM algorithm, K-Means, Self Organising Maps (SOM)
• For novelty detection: 1-Class SVM, support vector regression, Neural Networks
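As one example, the K-Means loop can be sketched in a few lines; the 1-D toy data and choice of k are invented for illustration:

```python
import random

# K-Means: alternate (1) assigning each point to its nearest centre and
# (2) moving each centre to the mean of the points assigned to it.
def kmeans(points, k, iters=20):
    random.seed(0)
    centres = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: abs(p - centres[j]))
            clusters[i].append(p)
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return sorted(centres)

points = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
print(kmeans(points, 2))  # centres near 1.0 and 9.0, one per group
```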
Issues and Applications
• Very useful for extracting information from data.
• Used in medicine to identify disease subtypes.
• Used to cluster web documents automatically.
• Used to identify customer target groups in business.
• Not much publicly available data to test algorithms with.
Machine Learning Area #3 Reinforcement Learning
Reinforcement Learning
Learning inspired by nature
An introduction
• Most biologically plausible – feedback given through stimuli reward/punishment
• A field with a lot of theory but a need for real-life applications (other than playing backgammon)
• But also encompasses the large field of Evolutionary Computing
• Applications are more open ended
• Getting closer to what the public consider AI.
Traditional Reinforcement Learning
• Techniques use dynamic programming to search for an optimal strategy
• Algorithms search to maximise their reward.
• Q-Learning (Chris Watkins next door) is the most well-known technique.
• The only successful applications are to games and toy problems.
• A lack of real-life applications.
• Very few researchers in this field.
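A minimal Q-learning sketch on a toy corridor environment (the states, reward and parameters are all invented for illustration):

```python
import random

# Q-Learning on a 1-D corridor: states 0..4, actions left (-1) and
# right (+1), reward +1 only for reaching state 4. Q(s, a) values are
# learnt purely from this reward feedback.
random.seed(0)
n_states, actions = 5, [-1, +1]
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, eps = 0.5, 0.9, 0.3   # learning rate, discount, exploration

for _ in range(500):                # episodes
    s = 0
    while s != n_states - 1:
        if random.random() < eps:   # explore
            a = random.choice(actions)
        else:                       # exploit the current Q estimates
            a = max(actions, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s2 == n_states - 1 else 0.0
        # Q-learning update: bootstrap from the best next action
        best_next = max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

policy = [max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)]
print(policy)  # [1, 1, 1, 1]: always move right, toward the reward
```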
Evolutionary Computing
• Inspired by the process of biological evolution.
• Essentially an optimisation technique – the problem is encoded as a chromosome.
• We find new/better solutions to the problem by sexual reproduction (crossover) and mutation.
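The crossover-and-mutation loop can be sketched as a minimal genetic algorithm; the OneMax objective (evolve a chromosome of all 1s), population size and rates below are all invented for illustration:

```python
import random

# A minimal genetic algorithm: the problem is encoded as a bit-string
# chromosome; fitness is the number of 1 bits (OneMax).
random.seed(0)
L, POP, GENS, MUT = 20, 30, 60, 0.02

fitness = sum   # count of 1 bits in a chromosome
pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(POP)]

def pick_parent():
    # tournament selection: best of 3 random individuals
    return max(random.sample(pop, 3), key=fitness)

for _ in range(GENS):
    nxt = []
    for _ in range(POP):
        p, q = pick_parent(), pick_parent()
        cut = random.randrange(1, L)             # one-point crossover
        child = p[:cut] + q[cut:]
        child = [b ^ 1 if random.random() < MUT else b  # mutation
                 for b in child]
        nxt.append(child)
    pop = nxt

best = max(pop, key=fitness)
print(fitness(best))  # close to the optimum L = 20
```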
Techniques available in Evolutionary Computing
• Lower level optimisers:
– Evolutionary Programming, Evolutionary Algorithms
– Genetic Programming, Genetic Algorithms
– Evolutionary Strategy
– Simulated Annealing
• Higher level optimisers:
– TABU search
– Multi-objective optimisation
[Figure: solutions plotted against Objective 1 and Objective 2, showing a Pareto front of optimal solutions – which one should we pick?]
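Extracting the non-dominated solutions can be sketched as follows (minimising both objectives; the candidate points are invented): a solution is on the Pareto front if no other solution is at least as good on both objectives and strictly better on one.

```python
# Pareto front extraction for two objectives, both to be minimised.
points = [(1, 9), (2, 7), (3, 8), (4, 4), (6, 5), (8, 2), (9, 3)]

def dominates(a, b):
    """a dominates b if a is no worse on both objectives and differs."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b

front = [p for p in points if not any(dominates(q, p) for q in points)]
print(sorted(front))  # -> [(1, 9), (2, 7), (4, 4), (8, 2)]
```

The algorithm only tells us which solutions are Pareto-optimal; choosing one of them is left to the user's trade-off between the objectives.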
Issues in Evolutionary Computing
• How to encode the problem is very important
• Setting mutation/crossover rates is very ad hoc
• Very computationally/memory intensive
• Not much theory can be developed – frowned upon by machine learning theorists