Machine Learning Lecture for Methodological Foundations of Biomedical Informatics Fall 2015 (BMSC-GA 4449) Sisi Ma NYU Langone Medical Center CHIBI

Machine Learning
Lecture for Methodological Foundations of Biomedical Informatics

Fall 2015 (BMSC-GA 4449)

Sisi Ma
NYU Langone Medical Center

CHIBI

What type of problems can machine learning solve?

Current active projects on Kaggle as of Oct. 26, 2015:

• Real Estate
• Artificial Intelligence
• Retail Sales
• Conservation
• Climate

What type of problems can machine learning solve?

Predominantly:

Classification

How to classify?

Main ways to classify:
- Unsupervised
- Supervised

Unsupervised Learning

Group similar items together

Comics credit: http://nlp.cs.berkeley.edu/comics.shtml

Unsupervised Learning

Since the definition of similarity is arbitrary, one can get different labeling solutions.

Unsupervised Learning

The solution depends on both: (1) which variables are used to construct the similarity metric, and (2) how the similarity metric is constructed.

Lowe, 2012

Image Credit: https://en.wikipedia.org/wiki/Metric_(mathematics)
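To illustrate how both choices change the answer, the sketch below finds the nearest neighbor of a query point under Euclidean and Manhattan distance, and again after restricting the comparison to a single variable. The points and the helper names (`euclidean`, `manhattan`, `project`, `nearest`) are made up for illustration and do not come from the lecture.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences ("city block" distance).
    return sum(abs(x - y) for x, y in zip(a, b))

def project(point, keep):
    # Keep only the selected variables (coordinates) of a point.
    return tuple(point[i] for i in keep)

def nearest(query, points, metric, keep=None):
    # Name of the point closest to the query under the given metric,
    # optionally using only the variables listed in `keep`.
    if keep is not None:
        query = project(query, keep)
        points = {name: project(p, keep) for name, p in points.items()}
    return min(points, key=lambda name: metric(query, points[name]))

# Hypothetical data: two candidate neighbors for a query at the origin.
points = {"A": (3.0, 3.0), "B": (0.0, 5.0)}
query = (0.0, 0.0)

print(nearest(query, points, euclidean))            # A (distance about 4.24 vs 5.0)
print(nearest(query, points, manhattan))            # B (distance 6.0 vs 5.0)
print(nearest(query, points, euclidean, keep=[0]))  # B (only the first variable is used)
```

The same three points give a different "most similar" item depending on the metric and on the variables included, which is why different groupings can come out of the same data.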

Unsupervised Learning

How do we know the solution is good? It corresponds to something we care about.
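One way to check such a correspondence, assuming an external label of interest is available for each item, is cluster purity: the fraction of items that carry the majority label of their cluster. The sketch below is an illustration only; the function name `purity`, the cluster assignments, and the outcome labels are hypothetical.

```python
from collections import Counter

def purity(cluster_ids, labels):
    # Fraction of items whose label equals the majority label of their cluster.
    by_cluster = {}
    for c, y in zip(cluster_ids, labels):
        by_cluster.setdefault(c, []).append(y)
    majority_total = sum(
        Counter(ys).most_common(1)[0][1] for ys in by_cluster.values()
    )
    return majority_total / len(labels)

# Hypothetical clustering of six patients and an outcome we care about.
clusters = [0, 0, 0, 1, 1, 1]
outcome = ["responder", "responder", "non-responder",
           "non-responder", "non-responder", "non-responder"]

print(purity(clusters, outcome))  # 5 of 6 items match their cluster's majority label
```

Purity rises trivially as the number of clusters grows (one cluster per item gives 1.0), so it is only meaningful when comparing solutions with a similar number of clusters.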

Unsupervised Learning

Supervised Learning

Supervised Learning

Overfitting

Duda, 2ed

Supervised Learning

Overfitting

Image Credit: https://commons.wikimedia.org/wiki/File:Overfitting.svg

Supervised Learning

How do I know if I am overfitting?

Validation Data

Supervised Learning

How do I know if I am overfitting?

Duda, 2ed
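A minimal sketch of the validation-data check, under assumed toy data: a very flexible model (1-nearest-neighbor, which memorizes the training set) is compared with a simple fixed threshold rule. The data, the two noisy training labels, and the cutoff of 5.0 are all hypothetical; the cutoff is fixed by hand rather than learned to keep the example short.

```python
def one_nn_predict(train, x):
    # Memorizes the training set: returns the label of the closest training point.
    _, nearest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_label

def threshold_predict(x, cutoff=5.0):
    # Simple rule: class 1 at or above the cutoff, class 0 below it.
    return 1 if x >= cutoff else 0

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

# Hypothetical 1-D data. The underlying rule is "1 if x >= 5";
# the training labels at x = 3.0 and x = 8.0 are deliberately noisy.
train = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0),
         (6.0, 1), (7.0, 1), (8.0, 0), (9.0, 1)]
validation = [(2.9, 0), (3.2, 0), (5.5, 1), (7.9, 1), (8.2, 1), (1.5, 0)]

def flexible(x):
    return one_nn_predict(train, x)

for name, model in [("1-NN", flexible), ("threshold", threshold_predict)]:
    print(name, accuracy(model, train), accuracy(model, validation))
```

On this toy data the 1-NN model is perfect on the training set but poor on the validation set, while the threshold rule is worse on training and better on validation. A large gap between training and validation performance is the signal of overfitting.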

Supervised Learning

Support Vector Machine

Key Characteristics of SVM
• Maximum gap to prevent overfitting
• QP problems can be solved with standard methods
• Soft margins to tolerate noise
• Kernel trick for linearly non-separable data

Statnikov et al., 2011

Most modern algorithms have built-in mechanisms to minimize overfitting.
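The maximum-gap and soft-margin ideas in the SVM list above can be sketched with the standard soft-margin objective, 0.5 * ||w||^2 + C * (sum of hinge losses). The code below does not solve the QP; it only evaluates that objective for two hand-picked hyperplanes on made-up data with one noisy point. The data, the candidate hyperplanes, and the values of `C` are assumptions for illustration.

```python
import math

def hinge_loss(w, b, x, y):
    # Zero when the point is on the correct side with margin at least 1.
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)

def soft_margin_objective(w, b, data, C=1.0):
    # Small ||w|| means a wide margin; C sets the price of margin violations.
    norm_sq = sum(wi * wi for wi in w)
    return 0.5 * norm_sq + C * sum(hinge_loss(w, b, x, y) for x, y in data)

def margin_width(w):
    # Distance between the two margin hyperplanes.
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical data; the last point is a noisy negative among the positives' side.
data = [((2.0, 0.0), +1), ((3.0, 1.0), +1),
        ((-2.0, 0.0), -1), ((-3.0, -1.0), -1),
        ((0.5, 0.0), -1)]

wide = ((0.5, 0.0), 0.0)     # wide margin, tolerates the noisy point
narrow = ((2.0, 0.0), -2.5)  # narrow margin, classifies every point correctly

for C in (1.0, 10.0):
    for name, (w, b) in [("wide", wide), ("narrow", narrow)]:
        print(C, name, margin_width(w), soft_margin_objective(w, b, data, C))
```

With a small `C` the wide-margin hyperplane has the lower objective even though it violates the margin for the noisy point; with a large `C` the narrow one wins. Tuning `C` is how the soft margin trades noise tolerance against fitting the training data.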

Predictive Modeling: A Simplified General Framework

Validation Data

Predictive Modeling: Cross Validation for error estimation and model selection

Ma et al., 2015 (in preparation)
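A minimal sketch of k-fold cross-validation used for model selection, assuming a toy 1-D dataset and a k-nearest-neighbor classifier as the model family. It is not the protocol from the cited work; the function names, the data, and the candidate settings are hypothetical.

```python
def k_fold_indices(n, k):
    # Split range(n) into k interleaved folds; each index lands in exactly one fold.
    return [list(range(i, n, k)) for i in range(k)]

def knn_predict(train, x, n_neighbors):
    # Majority vote among the n_neighbors closest training points (labels are 0/1).
    neighbors = sorted(train, key=lambda pair: abs(pair[0] - x))[:n_neighbors]
    votes = sum(label for _, label in neighbors)
    return 1 if 2 * votes > len(neighbors) else 0

def cv_error(data, n_neighbors, k=4):
    # Each fold is held out once; the model is built on the remaining folds.
    errors = 0
    for fold in k_fold_indices(len(data), k):
        held_out = set(fold)
        train = [d for i, d in enumerate(data) if i not in held_out]
        errors += sum(
            knn_predict(train, data[i][0], n_neighbors) != data[i][1] for i in fold
        )
    return errors / len(data)

# Hypothetical data and candidate model settings.
data = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0),
        (6.0, 1), (7.0, 1), (8.0, 0), (9.0, 1)]
candidates = [1, 3]

scores = {m: cv_error(data, m) for m in candidates}
best = min(scores, key=scores.get)
print(scores, "selected n_neighbors =", best)
```

The cross-validated error of the selected setting tends to be optimistic, because the same folds were used to choose it. A common remedy is nested cross-validation, where an outer loop estimates error and an inner loop performs the selection.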

Machine Learning vs Statistics

Robert Tibshirani

Machine Learning vs Statistics

One major difference between machine learning and statistics: how is the model evaluated?

Machine Learning vs Statistics

What is a good model? According to most statisticians, especially in practice:

Most commonly evaluated by R-squared (Breiman, 2001)
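For reference, the usual definition of R-squared is given below. The notation is an assumption, since the slides do not define it: y_i are the observed outcomes, \hat{y}_i the model's fitted values, and \bar{y} the mean of the observed outcomes.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

When the sums run over the same observations that were used to fit the model, R-squared measures goodness of fit to those data rather than performance on new data.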

Machine Learning vs Statistics

What is a good model? According to machine learning researchers:

Performance on validation data

The Future

What’s the job?

Homework

Research bias-variance decomposition and answer the following question from “An Introduction to Statistical Learning”.
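As a starting point, the decomposition is usually written as below for squared-error loss. The notation is an assumption, not taken from the slides: y_0 is a new outcome at input x_0, \hat{f} is the model fitted on a random training set, and \epsilon is the irreducible noise.

```latex
E\left[ \left( y_0 - \hat{f}(x_0) \right)^2 \right]
  = \operatorname{Var}\left( \hat{f}(x_0) \right)
  + \left[ \operatorname{Bias}\left( \hat{f}(x_0) \right) \right]^2
  + \operatorname{Var}(\epsilon)
```

More flexible models tend to lower the bias term and raise the variance term, which ties the decomposition back to overfitting.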

Resources