the measuring technology
DESCRIPTION
- PowerPoint PPT PresentationTRANSCRIPT
Learning Causal Structure from Overlapping Manipulations
Causal Discovery from Mass Cytometry DataPresenters: Ioannis Tsamardinos and Sofia Triantafillou
Institute of Computer Science, Foundation for Research and Technology, HellasComputer Science Department, University of Cretein collaboration with Computational Medicine Unit, Karolinska Institutet1
The Measuring Technology2Mass Cytometry3Single cells measurementsSample sizes in the millions, minimal costPublic data availableUp to ~30 proteins measured at a timeApplicationsCell countingCell sorting (gating)Identifying signaling responsesDrug screeningDe novo, personalized pathway / causal discovery (?)
3Mass Cytometry4
[Image by Bendall et al., Science 2011]Software for some of the applications mentioned is available4Cell Sorting (Gating)Immune system cells can be distinguished based on specific surface markers.
Process resembles a decision tree5
[Image by Bodenmiller et al., Nat. Biotech. 2012]Identifying Signaling ResponsesImmune responses are triggered by specific activators Signaling responses are sub-population specific.Mass cytometry for identifying signaling effects:Functional proteins (non-surface) are also marked (e.g., pSTAT3 and pSTAT5)Activators are applied to stimulate a response to diseaseCells are sorted by sub-populationChanges in protein abundance/phosphorylation in each subpopulation are quantified
6
Columns correspond to Different subpopulationsDifference in log2 mean intensity of the stimulated condition compared with the unstimulated control[Image by Bendall et al., Science 2011]Drug ScreeningUnwanted signaling responses should be suppressed for disease treatmentMass cytometry for drug screeningAfter stimulation, cells are treated with potential drugs (inhibitors)Cells are sorted by sub-populationDose-response curves are identifiedPer activatorPer sub-populationPer inhibitor
7
[Image by Bodenmiller et al., Nat. Biotech. 2012]7The Public Data813 activatorsBendall Data913 surface 18 functional variables several subpopulationsDonor 113 surface 18 functional variables several subpopulationsDonor 2[Single-Cell Mass Cytometry of Differential Immune and Drug Responses Across a Human Hematopoietic Continuum, Bendall et al., Science 332, 687 (2011)]no activatorno activator
Bodenmiller Data: Time Course1010 surface14 functional variables0 min1 min 5 min15 min30 min60 min120 min240 min11 activatorsEach well produces a data set.10 surface 14 functional variables several subpopulations [Multiplexed mass cytometry profiling of cellular states perturbed by small-molecule regulators, Bodenmiller et al., Nature Biotechnology 30, 9 (2012) ]A plate with 96 wellsno activatorBodenmiller Data: 8 donors118 donors11 activatorsA plate with 96 wellsEach well produces a data set.10 surface 14 functional variables several subpopulations [Multiplexed mass cytometry profiling of cellular states perturbed by small-molecule regulators, Bodenmiller et al., Nature Biotechnology 30, 9 (2012) ]
Bodenmiller Data: Inhibitors12Inhibitor(drug) in 7 dosages11 activators27 inhibitors10 surface 14 functional variables several subpopulations [Multiplexed mass cytometry profiling of cellular states perturbed by small-molecule regulators, Bodenmiller et al., Nature Biotechnology 30, 9 (2012) ]Bodenmiller dataBendall dataInhibitor data8donor dataTime course dataActivatorsTimeDonorsInhibitorsSubpopulationsProteinsCollection of datasets with :All activators1 time point (30)1 donorAll InhibitorsAll SubpopulationsAll 10+14 markers measuredData summary13Data Summary14 BTK ERK HLADR IgM NFkB P38 PLCg2 S6 SHP2 SLP76 STAT3 STAT5 ZAP70 Creb CrkL CXCR4 H3 IkB a Ki67 MAPKAPK2 Src AKT LAT STAT1BCRGCSFIFNaLPSPMAPVO4 Flt3L IL3GMCSFIL7SCFTNFTPOIFN-gIL-2IL-3BodenmillerBendallACTIVATORSFunctional ProteinsBothCausal Discovery in Mass CytometryFeedback loopsLatent variablesNon-linear relationsUnfaithfulness
A typical day in the cellImage courtesy of Dr. Brad MarshA Basic Approach16Local Causal Discovery17Nothing causes XXYZUse stimulus as instrumental binary variableXYZXYZXYZXYZXYZXYZXYZXYZXYZAssumptions:Causal Markov ConditionReichenbachs Common Cause PrincipleNo feedback cyclesIssue #1: Signaling is Sub-Population SpecificGate dataData were gated by the initial researchers in Cytobank.orgAnalyze sub-populations independentlyGated sub-populations differ between Bodenmiller and Bendallcd4+, cd8+, nk sub-populations in common.
18Bodenmiller Bendallcd14+hladr-,cd14+hladrhighcd14+hladrmidcd14+surf-cd14-hladr-cd14-hladrhighcd14-hladrmidcd14-surf-cd4+cd8+dendriticigm+igm-nk
Pre-B IIMature CD38lo BPre-B IMature CD38mid BImmature BPlasma cellnkMyelocyte
Mature CD4+ TNaive CD4+ TCMPNaive CD8+ TMature CD8+ TCD11b- MonocyteCD11bmid MonocyteCD11bhi Monocyte
MPPHSC Megakaryocyte Erythroblast PlateletMEPPlasmacytoid DCGMPIssue #2:Dormant RelationsRelations may appear only during signalingPool together unstimulated and stimulated data
Different parts of the pathway maybe activated by different activatorsAnalyze data from different activators independently
19Issue #3: Testing Independence20SP1P2MIC: Science test/ problems: not conditional, you have to select some parameters plus you need to run simulations to define p-values for a specific sample size.20Issue #4Make Reliable Predictions21SP1P2p-valuedependentindependent
Issue #5: Identify Outlier Experiments22Inhibitor(drug) in 7 dosages27 inhibitors11 stimuliInhibitor data for zero dosage and 8 donor data should represent the same joint distributionDo they?
DistanceIssue #5: Identify Outlier ExperimentsInhibitor data for zero dosage and 8 donor data should represent the same joint distributionDo they?
23Given a pair of plates:For each activator, rank correlations (of markers), compute spearman correlation of rankingDistance = 1-min correlation over activators2324Time Course DataInhibitor DataFor every inhibitorAll necessary dependencies and independencies holdNoNoYesYesYesPipeline for making causal predictionsYesYesNo24Causal Postulates A list of predicted causal pairs, each tagged for a specific population and activator, ranked according to a score quantifying the frequency of appearance.
250.5482
pPlcg2pSTAT30.8750.5512
pPlcg2pZap700.81250.7152
pSlp76pSHP20.81250.6708
pSHP2pSTAT30.78570.8526
pPlcg2pP380.75 0.6166
pPlcg2pZap700.75 0.5688
pSlp76pZap700.75 0.4557
pSTAT3pBtk0.70590.5688
pSHP2pZap700.7143cd14-hladr-cd14-hladrmidcd14-hladrmiddendriticcd14+hladr-cd14-hladr-cd14-hladr-cd14-hladrmidcd14-hladr-0.4557
BCRpS6pErk0.7037igm-288 predictions in 14 sub-populationsInternal Validation26ActivatorProtein1Protein2ActivatorProtein3Protein242% of the predicted triplets are also reportedDespite strict thresholds and multiple testing
Theory+algorithms: [Tillman et. al. 2008, Triantafillou et. al 2010, Tsamardinos et. al 2012]Check whether predicted triplet has also been reportedActivatorProtein1Protein3ActivatorProtein3Protein1ORProtein2Protein2Validation on Bendall Data0.2411PMAERKSTAT3CD8+ 0.44440.51850.3114PMAP38STAT30.2459IFNaBTKSHP20.42310.1802IFNaSTAT5ZAP700.5185PMAERKSTAT3NK0.51850.4444PMAS6NFKbPMAS6STAT30.4815PMAERKZAP700.40740.33410.25020.53960.2236Bendall DataPMAS6STAT30.4074LPSSHP2ZAP700.4444CD4+ 0.14760.4352!Measurements in Bendall data are taken 15 minutes after activation2727Validation on Bendall Data280.2411PMAERKSTAT3CD8+ 0.44440.51850.3114PMAP38STAT30.2459IFNaBTKSHP20.42310.1802IFNaSTAT5ZAP700.5185PMAS6STAT30.4074LPSSHP2ZAP700.4444PMAERKSTAT3NK0.51850.4444PMAS6NFKbPMAS6STAT30.4815PMAERKZAP700.4074CD4+ 0.0100.3500.33410.18430.1300.3600.24980.25020.53960.22360.2300.2300.18910.0500.4500.17580.0000.3600.29360.1300.4900.15610.0200.1500.07930.0500.0200.06280.0000.0500.11130.0000.2400.2221ConflictingConfirmingCorrelation0.14760.435228ResultsHundreds of predictions to-be-tested; Experiments under way!Internal validation using non-trivial inferencesPromising validation on another collection of dataset (Bendall)Evidence of batch effects and/or biological reasons of variabilityMethod based on the most basic causal discovery assumptions29A Not So Basic Approach30Data summary31Bodenmiller dataBendall dataInhibitor data8donor dataTime course dataActivatorsTimeDonorsInhibitorsSubpopulations
Our basic approach utilized 12.5% of the available data setsCo-analyzing data sets from different experimental conditions with overlapping variable sets32
Condition A
Condition B
Condition C
Condition D
Different experimental conditionsDifferent variable setsData can not be pulled together because they come from different distributions
Principles of causality links them to the underlying causal graphCo-analyzing data sets from different experimental conditions with overlapping variable sets33
Condition A
Condition B
Condition C
Condition D
Identify a single causal graph that simultaneously fits all data33What type of causal graph?34ABDCManipulations in SMCMs35ABDCReverse Engineering36ABDCEABDCEABDCEABDCEObserved (in) dependenciesIndependencies as constraints37ABCA-C does not exist(A-B does not existORA-B is into BB-C does not existOROR B-C is into B)AND Statistical errors38
What happens with statistical errors?Conflicts make SAT instance unsatisfiable!
Sort constraints!*well, not really
Comparing p-values39
39COmbINE Algorithm40
COmbINEAlgorithm that transforms independence constrains to SAT instanceSummary of semi Markov Causal models that best fits all data sets simultaneously
Eric Ellis40Similar AlgorithmsSBCSD: [Hyttinen et al., UAI, 2013]Inherently less compact representation of path constraints.Does not handle conflicts; non applicable to real data.In addition, it admits cycles.Scales up to 14 variablesLininf [Hyttinen et al., UAI 2012, JMLR 2012]Linear relations only.Scales up poorly (6 variables in total with overlapping variables, 10 without).In addition, it admits cycles.
41
Execution Time in SecondsPerformance on Simulated Data42
Application on Mass Cytometry data43
cd4+ T-cellscd8+ T-cellsResponse to PMA
Summary and ConclusionsMass Cytometry data a good domain for causal discoveryHundreds of robust causal postulatesApproach:Conservative: local discovery, performing all tests, independent analysis of populationsOpportunistic: using 2 thresholds for (in)dependency
New algorithm that can handle different experimental conditions overlapping variable subsets deal with statistical errors
Numerous directions open for future work on this collection of dataExperiments under way!44Acknowledgements and CreditIoannis TsamardinosAssociate ProfLab Head
Sofia TriantafillouPh.D. Candidate
Vincenzo LaganiResearch FellowJesper TegnrProfUnit Head
Angelika SchmidtPost-Doc
David Gomez-Cabrero, Project Leader
45
Funded by: STATegra EU project (stategra.eu)