Post on 15-Mar-2020

Machine Learning

Eden Au, James Fulton

https://prezi.com/p/wcvtw_0ssnyv/

http://joelgrus.com/2013/06/09/post-prism-data-science-venn-diagram/

https://uk.mathworks.com/discovery/machine-learning.html

Supervised Learning

“ Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs.

The Machine

● Linear regression
● K-nearest neighbours
● Support Vector Machines
● Random Forest
● Naïve Bayes

Wikipedia

Linear Regression - Ordinary Least Squares

Assumptions:

1. Linearity (solved by feature engineering e.g. polynomial regression)
2. Error-free inputs (otherwise use an errors-in-variables method, e.g. total least squares)
3. Common variance (solved by weighted or generalized least squares)
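The linearity point above can be illustrated with a minimal sketch. It assumes hypothetical noise-free data generated from y = 3 + 2x², and uses the closed-form least-squares solution for a single input. The helper names (`fit_line`, `sum_squared_error`) are illustrative, not taken from the slides. Fitting on the raw input leaves a large error; fitting on the engineered feature x² recovers the relationship.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_error(xs, ys, intercept, slope):
    """Sum of squared residuals of the fitted line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: a quadratic relationship, not a linear one.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [3.0 + 2.0 * x ** 2 for x in xs]

# Fit on the raw input, then on the engineered feature x**2.
raw_fit = fit_line(xs, ys)
squared = [x ** 2 for x in xs]
poly_fit = fit_line(squared, ys)

raw_error = sum_squared_error(xs, ys, *raw_fit)
poly_error = sum_squared_error(squared, ys, *poly_fit)
print(raw_error, poly_error)
```

The model is still linear in its parameters; only the input has been transformed, which is why ordinary least squares still applies.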

Wikipedia

K-nearest neighbours

1. Inherently non-linear (non-parametric)
2. Simple
3. Parameter selection
4. High memory requirement - slow prediction stage
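A minimal sketch of a k-nearest-neighbours classifier, assuming Euclidean distance and a simple majority vote. The training points and labels are hypothetical. It shows why the method is simple but memory-hungry: there is no training step, the whole training set is kept, and every prediction scans all of it. The parameter to select is `k`.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-class training set.
train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (0.5, 0.5)))
print(knn_predict(train_points, train_labels, (5.5, 5.5)))
```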

Main Challenges - Feature Engineering

Shallow learning requires feature engineering

Artificial Neural Network

Neural Networks

Each layer consists of

1. Linear combination of inputs (as in linear regression)
2. Non-linear ‘activation’

Multiple layers of network enable

1. Sophisticated features to be learned

Problem:

1. So many parameters - requires much more data
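The layer structure described above can be sketched as a forward pass. This is a minimal illustration with hand-picked, hypothetical weights; in practice the weights are learned from data. The two-layer network below computes an XOR-like function, which a single linear layer cannot represent.

```python
import math


def dense_layer(inputs, weights, biases, activation):
    """One layer: a linear combination of the inputs, then a non-linear activation."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        linear = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(linear))
    return outputs


def relu(value):
    return max(0.0, value)


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


# Hypothetical hand-picked parameters (9 in total, even for this tiny network).
hidden_weights = [[1.0, -1.0], [-1.0, 1.0]]
hidden_biases = [0.0, 0.0]
output_weights = [[1.0, 1.0]]
output_biases = [-0.5]


def forward(inputs):
    """Two stacked layers: the hidden layer builds features, the output layer uses them."""
    hidden = dense_layer(inputs, hidden_weights, hidden_biases, relu)
    return dense_layer(hidden, output_weights, output_biases, sigmoid)[0]


for pair in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(pair, round(forward(pair), 3))
```

The output is above 0.5 only when the two inputs differ. The parameter count grows quickly with layer width and depth, which is the data-hunger problem noted above.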

Convolutional Neural Network

CNN

1. Leverages spatio-temporal relationships
2. Applies discrete convolution operations
3. Kernels are the only trainable parameters - reduce # parameters
4. Insights in kernels:

Compressing and separating information

The value of each pixel does not matter, the relationships among neighbouring pixels do.
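A small sketch of this point, assuming a hypothetical 4x4 image and a hand-written 2x2 kernel (in a CNN the kernel values would be learned). The kernel sums to zero, so it responds only to differences between neighbouring pixels: adding a constant brightness to every pixel leaves the output unchanged.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products.

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it is a cross-correlation.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


# Hypothetical image: dark left half, bright right half.
image = [[0, 0, 9, 9] for _ in range(4)]
brighter = [[value + 100 for value in row] for row in image]

# Zero-sum kernel that responds to left-to-right changes (vertical edges).
edge_kernel = [[-1, 1],
               [-1, 1]]

edges = convolve2d(image, edge_kernel)
edges_brighter = convolve2d(brighter, edge_kernel)
print(edges)
print(edges == edges_brighter)
```

The same four kernel weights are reused at every position, which is where the reduction in trainable parameters comes from.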

Challenges

1. Data quality (GIGO)
2. Data quantity (overfitting)
3. Black box (neural networks)
4. Feature engineering (for shallow learning)
5. Domain expertise
6. No guarantee

Unsupervised Learning

“ Unsupervised machine learning algorithms infer patterns from data, without reference to known outcomes

What is the underlying structure?

https://www.bbc.co.uk/news/science-environment-47267081

https://xkcd.com/1838/

Anomaly Detection

“ Can automatically discover unusual data points in your dataset
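One simple way to do this, sketched here as an assumption rather than the method used in the talk, is a robust z-score based on the median and the median absolute deviation (MAD). The readings are hypothetical; the 0.6745 scaling and the 3.5 threshold follow a common convention for this score.

```python
import statistics


def find_anomalies(values, threshold=3.5):
    """Return indices whose median/MAD-based robust z-score exceeds the threshold."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread at all, so nothing can be scored
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > threshold
    ]


# Hypothetical temperature readings with one misplaced decimal point.
temperatures = [14.1, 14.3, 13.9, 14.0, 141.0, 14.2, 13.8, 14.4]
print(find_anomalies(temperatures))
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value barely moves them.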

Anomaly Detection: example

- Looking for misrecorded values

Credit: National Science Foundation

Anomaly Detection: example

- Classifying extreme events

https://www.ncdc.noaa.gov/extremes/cei/definition

Clustering

“ Allows you to automatically split the dataset into groups according to similarity
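A minimal sketch of one clustering method, k-means (Lloyd's algorithm), chosen here for illustration; the talk does not say which algorithm was used. The points and starting centroids are hypothetical. The algorithm alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points.

```python
import math


def nearest_index(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def k_means(points, initial_centroids, iterations=10):
    """Alternate the assignment step and the centroid update step."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest_index(point, centroids)].append(point)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    labels = [nearest_index(point, centroids) for point in points]
    return centroids, labels


# Hypothetical 2-D data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, initial_centroids=[points[0], points[3]])
print(centroids)
print(labels)
```

The number of clusters and the starting centroids are choices the user has to make, and the result can depend on them.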

Clustering

Clustering: Example

Antarctic ocean temperature profiles

Latent Variable Models

“ Decomposing the dataset into multiple components

Latent Variable Models

Non‑random correlation structures and dimensionality reduction in multivariate climate data - Martin Vejmelka et al.

Latent Variable Models: Example

Component Analysis
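The slide does not say which component analysis is meant; the sketch below assumes principal component analysis (PCA). It finds the direction of greatest variance as the leading eigenvector of the covariance matrix, using power iteration so that only the standard library is needed. The data are hypothetical: two variables that roughly follow y = 2x.

```python
import math


def first_principal_component(rows, iterations=100):
    """Leading eigenvector of the sample covariance matrix, by power iteration."""
    n, dims = len(rows), len(rows[0])
    means = [sum(row[d] for row in rows) / n for d in range(dims)]
    centred = [[row[d] - means[d] for d in range(dims)] for row in rows]
    cov = [
        [sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(dims)]
        for a in range(dims)
    ]
    vector = [1.0] * dims
    for _ in range(iterations):
        product = [sum(cov[a][b] * vector[b] for b in range(dims)) for a in range(dims)]
        norm = math.sqrt(sum(p * p for p in product))
        vector = [p / norm for p in product]
    # Project each centred row onto the component: one number per observation.
    scores = [sum(r[d] * vector[d] for d in range(dims)) for r in centred]
    return vector, scores


# Hypothetical observations of two strongly correlated variables.
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
component, scores = first_principal_component(rows)
print(component)
print(scores)
```

Each observation is reduced from two numbers to one score along the component, which is the dimensionality reduction the cited paper's title refers to.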

Autoencoder

“ Tries to learn how to compress data down to the most important components

Autoencoder: Basic Structure

Image Credit: https://www.jeremyjordan.me/autoencoders/

Autoencoder: Application

Dimensionality reduction and finding ‘extreme’ weather events
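A minimal sketch of how both uses can work, under stated assumptions: a tiny linear autoencoder (2-D input, 1-D code, 2-D output) trained by plain gradient descent on hypothetical data lying on the line y = 2x. Flagging 'extreme' samples by their reconstruction error is a common approach and is assumed here; the slides do not give the method. Real autoencoders use non-linear layers and a deep learning library.

```python
def train_linear_autoencoder(data, steps=2000, learning_rate=0.1):
    """Gradient descent on the mean squared reconstruction error."""
    encoder = [0.5, 0.5]  # weights mapping a 2-D input to a single code value
    decoder = [0.5, 0.5]  # weights mapping the code back to 2-D
    n = len(data)
    for _ in range(steps):
        grad_enc = [0.0, 0.0]
        grad_dec = [0.0, 0.0]
        for x in data:
            code = encoder[0] * x[0] + encoder[1] * x[1]
            residual = [decoder[d] * code - x[d] for d in range(2)]
            # Error signal passed back through the decoder to the code.
            back = decoder[0] * residual[0] + decoder[1] * residual[1]
            for d in range(2):
                grad_dec[d] += 2 * residual[d] * code / n
                grad_enc[d] += 2 * back * x[d] / n
        encoder = [encoder[d] - learning_rate * grad_enc[d] for d in range(2)]
        decoder = [decoder[d] - learning_rate * grad_dec[d] for d in range(2)]
    return encoder, decoder


def reconstruction_error(x, encoder, decoder):
    """Squared distance between a sample and its compressed-then-restored version."""
    code = encoder[0] * x[0] + encoder[1] * x[1]
    return sum((decoder[d] * code - x[d]) ** 2 for d in range(2))


# Hypothetical 'typical' samples: all on the line y = 2x.
typical = [(t / 10, 2 * t / 10) for t in range(-10, 11)]
encoder, decoder = train_linear_autoencoder(typical)

typical_error = max(reconstruction_error(x, encoder, decoder) for x in typical)
extreme_error = reconstruction_error((1.0, -2.0), encoder, decoder)
print(typical_error, extreme_error)
```

Samples that resemble the training data pass through the one-number bottleneck almost unchanged, while a sample that breaks the usual pattern is reconstructed poorly and stands out.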

Topic Modelling

“ Latent variable models applied to text to boost your literature searches

Topic Modelling: Example

Finding the topics of active research and research network structure

Causal Inference

“What are the dynamics of the system? What drives what?

Causal Inference: Example

Finding direct and indirect teleconnections

Generative Adversarial Networks

“ Learns the distribution function of data so that you can draw more unique samples

GANs: What they do

- Generator takes random input and tries to create a fake image
- Discriminator tries to tell the difference between real and fake images

https://thispersondoesnotexist.com/

GANs: Example

Generating new, unique examples using what the network has discovered about the data set

GANs: Example

Used to emulate a simulator

Opportunities/resources

“ In data science, 80% of the time is spent preparing data with the remaining 20% spent complaining about the need to prepare data...

Questions?

Gaussian Processes

“ Fit a statistical model with minimum assumptions which will return a value and an uncertainty in that value
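A minimal sketch of Gaussian process regression, showing how a prediction comes with both a value and an uncertainty. It assumes a zero-mean prior, a squared-exponential (RBF) kernel with unit length scale, and a small hand-written linear solver so that only the standard library is needed. The training data (samples of sin(x) with a gap around x = 2) are hypothetical.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def gp_predict(train_x, train_y, query, noise=1e-6):
    """Posterior mean and variance at `query` for a zero-mean GP prior."""
    K = [
        [rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(train_x)]
        for i, a in enumerate(train_x)
    ]
    k_star = [rbf(a, query) for a in train_x]
    mean = sum(k * w for k, w in zip(k_star, solve(K, train_y)))
    variance = rbf(query, query) - sum(k * w for k, w in zip(k_star, solve(K, k_star)))
    return mean, variance


# Hypothetical observations with a gap between x = 1 and x = 3.
train_x = [0.0, 1.0, 3.0, 4.0]
train_y = [math.sin(x) for x in train_x]

at_data = gp_predict(train_x, train_y, 1.0)
in_gap = gp_predict(train_x, train_y, 2.0)
far_away = gp_predict(train_x, train_y, 10.0)
print(at_data, in_gap, far_away)
```

The variance is close to zero at an observed point, larger inside the gap, and returns to the prior variance far from any data. The kernel and its length scale are modelling choices that would normally be tuned.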

Gaussian Processes: problem

Gaussian Processes: example

Fitting model to fill gaps in data

Gaussian Processes: example

Optimising an expensive experiment or physical model
