Computational Systems Biology Deep Learning in the Life Sciences 1 6.802 20.390 20.490 HST.506 6.874 Area II TQE (AI) David Gifford Lecture 1 February 4, 2019 http://mit6874.github.io


Page 1: Computational Systems Biology Deep Learning in the Life ... · • 7.09 Quantitative and Computational Biology • 7.32 Systems Biology • 7.33 Evolutionary Biology: Concepts, Models

Computational Systems Biology
Deep Learning in the Life Sciences

6.802 20.390 20.490 HST.506 6.874 Area II TQE (AI)

David Gifford
Lecture 1

February 4, 2019

http://mit6874.github.io


Your guides

[email protected]

http://mit6874.github.io

[email protected]

[email protected]


[email protected]

You should have received the Google Cloud coupon URL in your email


Recitations (this week)
Thursday 4-5pm 36-155
Friday 4-5pm 36-155

Office hours are after recitation at 5pm in the same room

(PS1 help and advice)


Approximately 8% of deep learning publications are in bioinformatics


Welcome to a new approach to life sciences research

• Enabled by the convergence of three things
  • Inexpensive, high-quality collection of large data sets (sequencing, imaging, etc.)
  • New machine learning methods (including ensemble methods)
  • High-performance Graphics Processing Unit (GPU) machine learning implementations
• Result is completely transformative


Your background

• Calculus, Linear Algebra
• Probability, Programming
• Introductory Biology


Alternative MIT subjects

• 6.047/6.878 Computational Biology: Genomes, Networks, Evolution
• 6.S897/HST.956: Machine Learning for Healthcare (2:30pm 4-270)
• 8.592 Statistical Physics in Biology
• 7.09 Quantitative and Computational Biology
• 7.32 Systems Biology
• 7.33 Evolutionary Biology: Concepts, Models and Computation
• 7.57 Quantitative Biology for Graduate Students
• 18.417 Introduction to Computational Molecular Biology
• 20.482 Foundations of Algorithms and Computational Techniques in Systems Biology


Machine Learning is the ability to improve on a task with more training data

• Task T to be performed
  • Classification, Regression, Transcription, Translation, Structured Output, Anomaly Detection, Synthesis, Imputation, Denoising
• Measured by Performance Measure P
• Trained on Experience E (Training Data)
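
As a rough illustration of the T / P / E framing, the sketch below uses a made-up task: classifying 1-D points drawn from two Gaussians with a nearest-centroid rule. The data, the classifier, and every name in it are assumptions chosen for brevity; none of them come from the lecture.

```python
import random

def sample(n, rng):
    """Experience E: n labeled points; class 0 ~ N(-2, 1), class 1 ~ N(+2, 1)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 if label == 1 else -2.0
        data.append((rng.gauss(center, 1.0), label))
    return data

def train(data):
    """Task T (classification): learn one centroid per class.
    A class that never appears in the training data gets centroid 0.0."""
    centroids = []
    for label in (0, 1):
        xs = [x for x, y in data if y == label]
        centroids.append(sum(xs) / len(xs) if xs else 0.0)
    return centroids

def predict(centroids, x):
    """Assign x to the class with the nearer centroid."""
    return 0 if abs(x - centroids[0]) <= abs(x - centroids[1]) else 1

def accuracy(centroids, data):
    """Performance measure P: fraction of held-out points classified correctly."""
    return sum(predict(centroids, x) == y for x, y in data) / len(data)

rng = random.Random(0)
test_set = sample(2000, rng)
results = {n: accuracy(train(sample(n, rng)), test_set) for n in (2, 20, 2000)}
for n, acc in results.items():
    print(f"training examples: {n:5d}  accuracy: {acc:.3f}")
```

With very few training examples the accuracy depends heavily on which points happen to be drawn; with many examples it should settle near the best achievable value for this toy data.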


https://arxiv.org/abs/1710.10196
Trained on 30,000 images from CelebA-HQ

Synthetic Celebrities


This subject is the red pill


Welcome

L1 Feb. 5 Machine learning in the computational life sciences
L2 Feb. 7 Neural networks and TensorFlow
R1 Feb. 7 Machine Learning Overview and PS1
L3 Feb. 12 Convolutional and recurrent neural networks

Problem Set: Softmax MNIST (PS1)


PS1: TensorFlow Warm Up


Regulatory Elements / ML models and interpretation

L4 Feb. 14 Protein-DNA interactions
R2 Feb. 14 Neural Networks and TensorFlow
Feb. 19 (Holiday - President’s Day)
L5 Feb. 21 Models of Protein-DNA Interaction
R3 Feb. 21 Motifs and models
L6 Feb. 26 Model interpretation (Gradient methods, black box)

Problem Set: Regulatory Grammar


PS2: Genomic regulatory codes


The Expressed Genome / Dimensionality reduction

L7 Feb. 28 The expressed genome and RNA splicing
R4 Feb. 28 Model interpretation
L8 Mar. 5 PCA, dimensionality reduction (t-SNE), autoencoders
L9 Mar. 7 scRNA-seq and cell labeling
R5 Mar. 7 Compressed state representations

Problem Set: scRNA-seq tSNE


PS3: Parametric tSNE


Gene Regulation / Model selection and uncertainty

L10 Mar. 12 Modeling gene expression and regulation
L11 Mar. 14 Model uncertainty, significance, hypothesis testing
R6 Mar. 14 Model selection and L1/L2 regularization
L12 Mar. 19 Chromatin accessibility and marks
L13 Mar. 21 Predicting chromatin accessibility
R7 Mar. 21 Chromatin accessibility

Problem Set: CTCF Binding from DNase-seq


PS4: Chromatin Accessibility


Genotype -> Phenotype, Therapeutics

L14 Apr. 2 Discovering and predicting genome interactions
L15 Apr. 4 eQTL prediction and variant prioritization
R8 Apr. 4 Lead SNPs to causal SNPs; haplotype structure
L16 Apr. 9 Imaging and genotype to phenotype
L17 Apr. 11 Generative models: optimization, VAEs, GANs
R9 Apr. 11 Generative models
L18 Apr. 18 Deep Learning for eQTLs
L19 Apr. 23 Therapeutic Design
L20 Apr. 25 Exam Review
L21 Apr. 30 Exam

Problem Set: Generative models for medical records


PS5: Generative Models

Sample 1: discharge instructions: please contact your primary care physician or return to the emergency room if [*omitted*] develop any constipation. [*omitted*] should be had stop transferred to [*omitted*] with dr. [*omitted*] or started on a limit your medications. * [*omitted*] see fult dr. [*omitted*] office and stop in a 1 mg tablet to tro fever great to your pain in postions, storale. [*omitted*] will be taking a cardiac catheterization and take any anti-inflammatory medicines diagness or any other concerning symptoms.


Your programming environment


Your computing resource


Your grade is based on 5 problem sets, an exam, and a final project

• Five Problem Sets (40%)
  • Individual contribution
  • Done using Google Cloud, Jupyter Notebook
• In class exam (1.5 hours), one sheet of notes (30%)
• Final Project (30%)
  • Done individually or in teams (6.874 by permission)
  • Substantial question


Amgen could not reproduce the findings of 47/53 (89%) landmark preclinical cancer papers

http://www.nature.com/nature/journal/v483/n7391/pdf/483531a.pdf


Direct and conceptual replication is important

• Direct replication is defined as attempting to reproduce a previously observed result with a procedure that provides no a priori reason to expect a different outcome

• Conceptual replication uses a different methodology (such as a different experimental technique or a different model of a disease) to test the same hypothesis; tries to avoid confounders

https://elifesciences.org/content/6/e23383


Reproducibility Project: Cancer Biology Registered Report/Replication Study Structure

• A Registered Report details the experimental designs and protocols that will be used for the replications, and experiments cannot begin until this report has been peer reviewed and accepted for publication.

• The results of the experiments are then published as a Replication Study, irrespective of outcome but subject to peer review to check that the experimental designs and protocols were followed.

https://elifesciences.org/content/6/e23383


Claim precision is key to science

• “We have discovered the regulatory elements”
• “We have predicted the regulatory elements”

• “The variant causes a difference in gene expression”
• “The variant is associated with a difference in gene expression”


Interventions enable causal statements

• Observation-only data can be influenced by confounders

• A confounder is an unobserved variable that explains an observed effect

• Interventions on a variable allow for the detection of its direct and indirect effects
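
The difference between observing and intervening can be sketched with a small simulation. Everything here is an assumption made for illustration: a hidden variable Z drives both X and Y, X has no effect on Y, and an "intervention" means we choose X ourselves, independently of Z.

```python
import random

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
n = 5000

# Hidden confounder Z (never seen by the analyst).
z = [rng.gauss(0, 1) for _ in range(n)]

# Observational data: Z drives both X and Y; X does not influence Y at all.
x_obs = [zi + rng.gauss(0, 0.5) for zi in z]
y_obs = [zi + rng.gauss(0, 0.5) for zi in z]

# Interventional data: X is set by the experimenter, so it no longer tracks Z.
x_do = [rng.gauss(0, 1) for _ in range(n)]
y_do = [zi + rng.gauss(0, 0.5) for zi in z]

r_obs = correlation(x_obs, y_obs)
r_do = correlation(x_do, y_do)
print(f"observational correlation:  {r_obs:.2f}")
print(f"interventional correlation: {r_do:.2f}")
```

In the observational data X and Y are strongly correlated even though neither causes the other; once X is set by intervention the correlation should fall to roughly zero, which is what licenses the statement that X has no effect on Y in this toy model.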


ML resolves Protein-DNA binding events


• Who - what protein(s) are binding?
• Where - where are they binding?
• Why - what chromatin state and sequence motif causes their binding?
• When - what differential binding is observed in different cell states or genotypes?
• How - are accessory factors or modifications of the factor involved?


How can we establish ground truth?

• Replicate experiments should have consistent observations

• Independent tests for the same hypothesis (different antibody, different assay)

• Statistical test against a null hypothesis - what is the probability of seeing the reads at random? We need a null model for this test.
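
One possible null model, assumed here purely for illustration, is that reads fall uniformly at random along the genome, so the read count in a fixed window follows a Poisson distribution with the background rate. The lecture does not say which null model is used; the function name and the numbers below are hypothetical.

```python
import math

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam): the chance of seeing at least
    k reads in a window if reads land at random with mean lam."""
    below_k = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - below_k)

background_rate = 2.0   # assumed mean reads per window under the null
observed_reads = 10     # assumed read count in the window of interest

p_value = poisson_tail(observed_reads, background_rate)
print(f"P(at least {observed_reads} reads | null) = {p_value:.2e}")
```

A small p-value says the observed pile-up of reads is unlikely under this particular null; it says nothing about whether the null model itself was a good description of the background.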


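Problem Set 1 Structure

[Diagram of the computation graph. Recoverable labels:]
• y: tf.placeholder, shape [None, 10]
• x: tf.placeholder, shape [None, 784]
• W: tf.Variable, shape [784, 10]
• b: tf.Variable, shape [10]
• Operations: tf.matmul, +, tf.nn.softmax, loss function, optimizer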
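
The shapes in the Problem Set 1 structure ([None, 784] for x, [784, 10] for W, [10] for b, [None, 10] for y) can be traced in a plain-Python sketch. This is not the problem set solution: the cross-entropy loss, the zero initialization, the toy batch, and the single gradient step on b alone are all assumptions made to keep the example short.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, W, b):
    """softmax(x W + b); x is [batch, 784], W is [784, 10], b is [10]."""
    probs = []
    for row in x:
        logits = [sum(xi * W[i][j] for i, xi in enumerate(row)) + b[j]
                  for j in range(len(b))]
        probs.append(softmax(logits))
    return probs

def mean_cross_entropy(probs, y):
    """Assumed loss function: batch average of -sum(y * log(p))."""
    return -sum(t * math.log(p) for pr, yr in zip(probs, y)
                for p, t in zip(pr, yr)) / len(y)

def one_hot(k, n=10):
    return [1.0 if j == k else 0.0 for j in range(n)]

# Toy batch of two "images" with made-up pixel values and labels.
x = [[0.0] * 784, [1.0] * 784]
y = [one_hot(3), one_hot(7)]
W = [[0.0] * 10 for _ in range(784)]
b = [0.0] * 10

probs = forward(x, W, b)
loss_before = mean_cross_entropy(probs, y)

# One plain gradient-descent step on b only: dL/db_j = mean(p_j - y_j).
lr = 1.0
grad_b = [sum(p[j] - t[j] for p, t in zip(probs, y)) / len(y) for j in range(10)]
b = [bj - lr * g for bj, g in zip(b, grad_b)]
loss_after = mean_cross_entropy(forward(x, W, b), y)
print(loss_before, loss_after)
```

With all-zero parameters every class gets probability 0.1, so the starting loss is ln(10); the step on b should lower it. In TensorFlow the optimizer node computes and applies such gradients for every variable.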


Programming model

Big idea: Express a numeric computation as a graph.

Graph nodes are operations which have any number of inputs and outputs

Graph edges are tensors which flow between nodes
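
A toy version of this idea in plain Python, as a sketch only: the class names `Node` and `Input` and the function `run` are hypothetical and are not TensorFlow's API. Building the graph computes nothing; values only flow along the edges when the graph is run.

```python
class Node:
    """An operation with any number of inputs; its result is the value
    that flows along its outgoing edges."""
    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = list(inputs)

class Input(Node):
    """A node with no inputs whose value is supplied when the graph is run."""
    def __init__(self):
        super().__init__(op=None)

def run(fetch, feeds):
    """Evaluate `fetch` by recursively evaluating the nodes it depends on."""
    if fetch in feeds:
        return feeds[fetch]
    if fetch.op is None:
        raise ValueError("input node was not given a value")
    args = [run(node, feeds) for node in fetch.inputs]
    return fetch.op(*args)

# Build the graph for (a + b) * c. No arithmetic happens here.
a, b, c = Input(), Input(), Input()
total = Node(lambda u, v: u + v, [a, b])
product = Node(lambda u, v: u * v, [total, c])

# Run the graph with concrete values on the input edges.
result = run(product, {a: 2.0, b: 3.0, c: 4.0})
print(result)  # 20.0
```

Separating construction from execution is what lets a framework inspect the whole graph before running it, for example to differentiate it or place parts of it on a GPU.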


Programming model: NN feedforward


Programming model: NN feedforward

Variables are 0-ary stateful nodes which output their current value. (State is retained across multiple executions of a graph.)

(parameters, gradient stores, eligibility traces, …)


Programming model: NN feedforward

Placeholders are 0-ary nodes whose value is fed in at execution time.

(inputs, variable learning rates, …)


Programming model: NN feedforward

Mathematical operations:
MatMul: Multiply two matrix values.
Add: Add elementwise (with broadcasting).
ReLU: Activate with elementwise rectified linear function.
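
What these three operations compute can be spelled out on nested Python lists. This is a sketch of the arithmetic only, with hypothetical helper names and made-up values; TensorFlow's own kernels work on tensors and support far more general broadcasting.

```python
def matmul(a, b):
    """MatMul: multiply an [m, k] matrix by a [k, n] matrix."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(m, v):
    """Add: elementwise sum, broadcasting the length-n vector v
    across every row of the [m, n] matrix."""
    return [[x + bias for x, bias in zip(row, v)] for row in m]

def relu(m):
    """ReLU: elementwise max(0, x)."""
    return [[max(0.0, x) for x in row] for row in m]

x = [[1.0, -2.0],
     [0.5, 3.0]]           # shape [2, 2]
W = [[1.0, 0.0, -1.0],
     [2.0, 1.0, 0.0]]      # shape [2, 3]
b = [0.5, -0.5, 0.0]       # shape [3], broadcast over both rows

h = relu(add(matmul(x, W), b))
print(h)  # [[0.0, 0.0, 0.0], [7.0, 2.5, 0.0]]
```

The first row of x W + b is entirely negative, so ReLU zeroes it; in the second row only the last entry is clipped.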


In code, please!

1. Create model weights, including initialization
   a. W ~ Uniform(-1, 1); b = 0
2. Create input placeholder x
   a. m * 784 input matrix
3. Create computation graph

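# TensorFlow 1.x graph-mode API
import tensorflow as tf

# 1. Model weights with initialization: b = 0, W ~ Uniform(-1, 1)
b = tf.Variable(tf.zeros((100,)))
W = tf.Variable(tf.random_uniform((784, 100), -1, 1))

# 2. Input placeholder x: an m * 784 matrix (m is left unspecified)
x = tf.placeholder(tf.float32, (None, 784))

# 3. Computation graph: h_i = ReLU(x W + b)
h_i = tf.nn.relu(tf.matmul(x, W) + b)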


How do we run it?

So far we have defined a graph.

We can deploy this graph with a session: a binding to a particular execution context (e.g. CPU, GPU)


Getting output

sess.run(fetches, feeds)

Fetches: List of graph nodes. Return the outputs of these nodes.

Feeds: Dictionary mapping from graph nodes to concrete values. Specifies the value of each graph node given in the dictionary.

import numpy as np
import tensorflow as tf

b = tf.Variable(tf.zeros((100,)))
W = tf.Variable(tf.random_uniform((784, 100), -1, 1))
x = tf.placeholder(tf.float32, (None, 784))
h_i = tf.nn.relu(tf.matmul(x, W) + b)

sess = tf.Session()
# Variables must be initialized before the graph is run
sess.run(tf.global_variables_initializer())
# Fetch h_i, feeding a random 64 x 784 batch for x
sess.run(h_i, {x: np.random.random((64, 784))})


Basic flow

1. Build a graph
   a. Graph contains parameter specifications, model architecture, optimization process, …
   b. Somewhere between 5 and 5000 lines
2. Initialize a session
3. Fetch and feed data with Session.run
   a. Compilation, optimization, etc. happens at this step (you probably won’t notice)


This subject is the red pill