
Page 1: CS598 Machine Learning in Computational Biology (Lecture 1

CS598 Machine Learning in

Computational Biology (Lecture 1: Introduction)

Professor Jian Peng Teaching Assistant: Rongda Zhu

Introduction

Instructor:

• Jian Peng
  • Office: 2118 SC
  • Office hours: Thursday, 3:15pm-4:45pm
  • Email: [email protected]

• My own research: Computational Biology and Graphical Models

Teaching Assistant:

• Rongda Zhu, PhD student, Department of Computer Science ([email protected])

• Rongda’s research: Machine Learning and Probabilistic Inference

Course website: http://web.engr.illinois.edu/~jianpeng/teaching/CS598_Fall15/index.htm

Course Information

Schedule (tentative)

• Introductory lectures (Aug 25 to Sep 8)
  • Biological data analysis
  • Probabilistic models

• Student presentations (Sep 8 to Dec 3)
  • Research survey
  • Research article

• Course projects
  • Proposal presentation (Oct 6 & 8)
  • Final presentation (Dec 8 & 10)

Objectives

Introduction to computational biology

• Important problems in computational biology
• Machine learning techniques for data analysis
• Understanding how the methods work

Learning to do research

• Paper presentation
  • Ability to present key ideas to others
  • Ability to ask critical questions

• Course project experience
  • Hands-on practice with real datasets
  • Proposing and performing independent research
  • Active participation in the field

Prerequisites

Biology:

• Basic concepts in molecular biology
• Reference: Molecular Biology for Computer Scientists by Lawrence Hunter

Machine Learning:

• Probability and statistics
• Optimization
• Textbook: Pattern Recognition and Machine Learning by Christopher Bishop

Grading

• Class attendance: 10%

• Presentation: 30%

• Course project: 60%
  • Proposal
  • Report
  • Presentation

Presentation

• Discuss the papers you would like to present with me at least one week before your presentation

• Research survey (at least five papers)
  • Methodology: applications of a method to different problems
  • Research problem: the state-of-the-art methods for that problem

• Research article (preferred)
  • Background: What is the problem? Why is it important?
  • Methodology: How does it work?
  • Results: What are the findings? What can be concluded?

• Open-ended Q & A and debate

Questions about the presentation?

Course Project

Computational techniques

• Novel machine learning algorithms
• Efficient algorithms that scale to large datasets
• New probabilistic models for biological data

Biological problems

• New biological findings
• Improvements over existing methods
• New computational biology problems

The goal is to produce work that could be published or presented at conferences or in journals.

Course Project

• Proposal presentation (Oct 6 & 8)
  • Written proposal due by Oct 4, at least four pages
  • Discuss your project with me in September
  • 15-minute presentation in class
  • If you do not have a project by Sep 20, I will give you a list of potential projects

• Final presentation (Dec 8 & 10)
  • Report due by Dec 12, at least eight pages
  • 15-minute oral presentation and poster

Course Project

• Team size
  • One or two students
  • Make your own contribution clear in the project report

• Implementation
  • Put your code and data on GitHub
  • Get your hands dirty and work on real-world datasets
  • Your contribution should be original

Questions about the course project?

Introduce yourself

Why is computational biology hard?

• High-dimensional

• Noisy

• Huge

• Sparse

Biological Data

Sequence data

• Protein/DNA sequence
• Generative and discriminative models for sequences
• Deep learning
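As a minimal sketch of a generative sequence model, the snippet below fits a first-order Markov chain to DNA sequences and scores new sequences by log-likelihood. The function names, the Laplace pseudocount and the uniform distribution over the first base are illustrative assumptions, not methods taken from the course.

```python
import math

ALPHABET = "ACGT"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def train_markov_chain(sequences, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next base | current base)
    from training DNA sequences, with additive (Laplace) smoothing."""
    counts = [[pseudocount] * 4 for _ in range(4)]
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[INDEX[prev]][INDEX[nxt]] += 1
    # Normalise each row into a probability distribution over the next base.
    return [[c / sum(row) for c in row] for row in counts]


def log_likelihood(seq, transitions):
    """Log-probability of seq under the chain, assuming a uniform first base."""
    total = math.log(1.0 / len(ALPHABET))
    for prev, nxt in zip(seq, seq[1:]):
        total += math.log(transitions[INDEX[prev]][INDEX[nxt]])
    return total


model = train_markov_chain(["ACGT" * 10])
print(log_likelihood("ACGTACGT", model) > log_likelihood("AAAAAAAA", model))  # True
```

Training one such chain per class (for example, two sequence families) and comparing the two log-likelihoods of a new sequence gives a simple log-odds classifier, which is a bridge from generative to discriminative use.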

Matrix data

• Gene expression
• Dimensionality reduction and feature selection
• Low-rank approximation
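Low-rank approximation of a matrix such as gene expression can be sketched with a truncated SVD, which (by the Eckart-Young theorem) gives the best rank-k approximation in Frobenius norm. The matrix below is synthetic (a hypothetical 100 genes x 20 samples built from three latent programs plus noise), and `low_rank_approximation` is a hypothetical helper name.

```python
import numpy as np


def low_rank_approximation(X, k):
    """Best rank-k approximation of X in Frobenius norm, obtained by
    keeping only the k largest singular values and their vectors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :], s


rng = np.random.default_rng(0)
# Hypothetical expression matrix: 100 genes x 20 samples, 3 latent programs plus noise.
genes_by_programs = rng.normal(size=(100, 3))
programs_by_samples = rng.normal(size=(3, 20))
X = genes_by_programs @ programs_by_samples + 0.1 * rng.normal(size=(100, 20))

X3, s = low_rank_approximation(X, k=3)
residual = np.linalg.norm(X - X3, "fro")
# The residual equals the energy in the discarded singular values.
print(np.isclose(residual, np.sqrt(np.sum(s[3:] ** 2))))  # True
```

Centering the data before taking the SVD turns the same computation into PCA.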

Biological Data

Network data

• Molecular network
• Random walk algorithms
• Graphical models and approximate inference
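Random walk with restart is one common random walk algorithm on molecular networks: each node is scored by how often a walker that keeps jumping back to a seed node visits it. The sketch below assumes an undirected, unweighted adjacency matrix and a restart probability of 0.5; the function name and defaults are illustrative choices.

```python
import numpy as np


def random_walk_with_restart(A, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Stationary distribution of a random walk on adjacency matrix A that,
    at every step, returns to node `seed` with probability `restart`."""
    A = np.asarray(A, dtype=float)
    degree = A.sum(axis=0)
    degree[degree == 0] = 1.0  # isolated nodes: avoid division by zero
    W = A / degree             # column-stochastic transition matrix
    e = np.zeros(A.shape[0])
    e[seed] = 1.0
    p = e.copy()
    for _ in range(max_iter):
        # Walk one step along the edges, or restart at the seed.
        p_next = (1 - restart) * W @ p + restart * e
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# A four-node path graph 0 - 1 - 2 - 3, seeded at node 0.
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
print(np.round(random_walk_with_restart(path, seed=0), 3))
```

On this path graph the scores decay with distance from the seed (exactly 26/45, 14/45, 4/45 and 1/45, i.e. about 0.578, 0.311, 0.089 and 0.022), which is why such scores are used to rank network neighbours of a seed gene.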

Heterogeneous data

• Dimensionality reduction
• Probabilistic models for data integration
• Network-based data integration

Machine Learning

Supervised learning

• Prediction:
  • Classification: SVM, logistic regression, random forest
  • Structured output: CRF, structured SVM

• Feature finding:
  • Sparse learning: LASSO and elastic net
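To make sparse learning concrete, here is a minimal LASSO solver using cyclic coordinate descent with soft-thresholding. It assumes the objective (1/2n)||y - Xw||^2 + lam * ||w||_1 and feature columns that are not all zero; the synthetic data (50 samples, 200 features, three truly nonzero weights) and lam = 0.1 are hypothetical. The elastic net adds an L2 penalty to the same objective.

```python
import numpy as np


def soft_threshold(z, lam):
    """Proximal operator of lam * |w|: shrink z toward zero by lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimise (1 / 2n) * ||y - X w||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # assumes no all-zero feature columns
    residual = y.astype(float)          # y - X @ w, with w = 0
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = X[:, j] @ residual / n + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            residual += X[:, j] * (w[j] - w_new)  # keep residual = y - X @ w
            w[j] = w_new
    return w


rng = np.random.default_rng(1)
X = rng.normal(size=(50, 200))          # hypothetical: 50 samples x 200 features
true_w = np.zeros(200)
true_w[:3] = [2.0, -3.0, 1.5]           # only three features matter
y = X @ true_w + 0.1 * rng.normal(size=50)

w = lasso_coordinate_descent(X, y, lam=0.1)
print(np.flatnonzero(w))                # indices of the selected features
```

The soft-threshold step sets a coefficient exactly to zero whenever its partial correlation falls below lam, which is what makes LASSO usable for feature selection when features far outnumber samples.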

Unsupervised learning

• Dimensionality reduction and embedding:
  • Manifold learning: Isomap, LLE, t-SNE
  • Component analysis: PCA, ICA

• Probabilistic modeling:
  • Graphical models: HMM, Bayesian networks, RBM
  • Methodology: variational inference, sampling
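For the HMMs listed above, a basic inference routine is the forward algorithm, which sums over all hidden-state paths in time linear in the sequence length. This sketch rescales the forward variables at every step to avoid numerical underflow; the two-state "AT-rich / GC-rich" model and all of its probabilities are made-up values for illustration.

```python
import numpy as np


def hmm_forward(obs, start, trans, emit):
    """Log-likelihood log P(obs) of an HMM via the scaled forward algorithm.

    start[i]    = P(z_1 = i)
    trans[i, j] = P(z_{t+1} = j | z_t = i)
    emit[i, k]  = P(x_t = k | z_t = i)
    """
    alpha = start * emit[:, obs[0]]  # unnormalised forward variables at t = 1
    log_lik = 0.0
    for t in range(1, len(obs)):
        scale = alpha.sum()
        log_lik += np.log(scale)
        # Propagate through the transitions, then weight by the emission probability.
        alpha = (alpha / scale) @ trans * emit[:, obs[t]]
    return log_lik + np.log(alpha.sum())


start = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
emit = np.array([[0.4, 0.1, 0.1, 0.4],   # hypothetical AT-rich state over A, C, G, T
                 [0.1, 0.4, 0.4, 0.1]])  # hypothetical GC-rich state
obs = ["ACGT".index(base) for base in "ATGCGC"]
print(hmm_forward(obs, start, trans, emit))
```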

TODO after this class

• Please read “Molecular Biology for Computer Scientists” by Lawrence Hunter

Examples of my research projects

Protein sequence, structure and function

ACDEEEFGHIKL----MPQRSTVWY
ACDE--FGHIKLRMQP----STVWY

[Figure: sequence → structure → function]
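Pairwise alignments like the one above are computed by dynamic programming. The sketch below is the textbook Needleman-Wunsch global alignment with a simple match/mismatch score and a linear gap penalty; these scores are assumptions chosen for illustration. Real protein aligners typically use a substitution matrix such as BLOSUM62 with affine gap penalties, so this output need not reproduce the alignment shown.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), where '-' marks a gap.
    """
    def pair(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# The two sequences from the slide, with their gaps removed.
total, top, bottom = needleman_wunsch("ACDEEEFGHIKLMPQRSTVWY", "ACDEFGHIKLRMQPSTVWY")
print(total)
print(top)
print(bottom)
```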

Network analysis for disease modeling

Human disease network → network analysis → new disease biology (potential drug targets)

Pharmacogenomics and cancer genomics

Figure from the DREAM challenge website

Integration of heterogeneous data

“Search” engine for drug discovery

[Figure: heterogeneous network linking drugs, proteins, diseases, side effects, pathways, cell types and mutations through perturbation, association, interaction, membership and on/off relations]

Diffusion Component Analysis

Network embedding
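One way to illustrate diffusion-based network embedding: compute random-walk-with-restart diffusion states for every node, then compress their logarithms with a truncated SVD. This is a simplified stand-in, not the DCA method itself; DCA, as it is usually described, fits low-dimensional node vectors to the diffusion states with a multinomial logistic model. The SVD shortcut, the restart probability of 0.5, the 1/n offset and the helper names are all assumptions.

```python
import numpy as np


def diffusion_states(A, restart=0.5):
    """Column j is the random-walk-with-restart distribution seeded at node j,
    i.e. restart * (I - (1 - restart) * W)^(-1) with W column-normalised."""
    A = np.asarray(A, dtype=float)
    degree = A.sum(axis=0)
    degree[degree == 0] = 1.0   # isolated nodes: avoid division by zero
    W = A / degree
    n = A.shape[0]
    return restart * np.linalg.solve(np.eye(n) - (1 - restart) * W, np.eye(n))


def embed_nodes(A, dim=2, restart=0.5):
    """Node vectors from a truncated SVD of log diffusion states
    (hypothetical helper; a simplified stand-in for DCA's fitting step)."""
    Q = diffusion_states(A, restart)
    n = Q.shape[0]
    L = np.log(Q + 1.0 / n)     # the 1/n offset (an assumption) avoids log(0)
    _, s, Vt = np.linalg.svd(L)
    return Vt[:dim].T * np.sqrt(s[:dim])   # row j embeds node j (column j of L)


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
print(np.round(embed_nodes(A, dim=2), 3))
```

Working with diffusion states rather than raw adjacency lets nodes that share many multi-step neighbours end up close together, even when they are not directly connected.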

Variational inference

• Discriminance sampling for partition function estimation

• Combining variational inference and sampling approaches

[Figure: Restricted Boltzmann Machine and Deep Boltzmann Machine, annotated with sampling, classification and approximate inference]
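A simple baseline for partition function estimation is plain importance sampling. The sketch below applies it to a tiny binary RBM whose hidden units are summed out analytically, draws visible states from a uniform proposal, and compares the estimate with exact enumeration. It is not the discriminance sampling method named above, and a uniform proposal is only workable for very small models; the RBM sizes and random weights are hypothetical.

```python
import itertools

import numpy as np


def unnormalized_marginal(v, W, b, c):
    """p*(v) = sum_h exp(-E(v, h)) for a binary RBM with
    E(v, h) = -(v W h + b.v + c.h); the hidden units are summed out exactly."""
    return np.exp(v @ b) * np.prod(1.0 + np.exp(c + v @ W), axis=-1)


def exact_partition_function(W, b, c):
    """Z by brute-force enumeration of all 2^n visible states (tiny models only)."""
    states = np.array(list(itertools.product([0, 1], repeat=len(b))), dtype=float)
    return unnormalized_marginal(states, W, b, c).sum()


def importance_sampling_partition_function(W, b, c, n_samples, rng):
    """Unbiased estimate of Z = E_q[p*(v) / q(v)] with uniform proposal q(v) = 2^-n."""
    n = len(b)
    v = rng.integers(0, 2, size=(n_samples, n)).astype(float)
    return 2.0 ** n * unnormalized_marginal(v, W, b, c).mean()


rng = np.random.default_rng(0)
W = 0.5 * rng.normal(size=(6, 3))   # hypothetical RBM: 6 visible, 3 hidden units
b = 0.1 * rng.normal(size=6)
c = 0.1 * rng.normal(size=3)

print("exact Z:    ", exact_partition_function(W, b, c))
print("IS estimate:", importance_sampling_partition_function(W, b, c, 20000, rng))
```

The variance of this estimator grows quickly as the model's distribution moves away from the proposal, which is the motivation for better proposals, annealing schemes, and combining sampling with variational approximations.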