
Decomposing spatially dependent and cell type specific contributions to cellular heterogeneity

Qian Zhu1, Sheel Shah2,3, Ruben Dries1, Long Cai2*, Guo-Cheng Yuan1*

1. Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard T. H. Chan School of Public Health, Boston, MA 02215, USA
2. Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
3. UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA 90095, USA
*Co-corresponding authors: [email protected] (L.C.); [email protected] (G.C.Y.)

Abstract

Both the intrinsic regulatory network and spatial environment are contributors to cellular identity and result in cell state variations. However, their individual contributions remain poorly understood. Here we present a systematic approach to integrate both sequencing- and imaging-based single-cell transcriptomic profiles, thereby combining whole-transcriptomic and spatial information from these assays. We applied this approach to dissect the cell-type and spatial domain associated heterogeneity within the mouse visual cortex region. Our analysis identified distinct spatially associated signatures within glutamatergic and astrocyte cell compartments, indicating strong interactions between cells and their surrounding environment. Using these signatures as a guide to analyze single cell RNAseq data, we identified previously unknown, but spatially associated subpopulations. As such, our integrated approach provides a powerful tool for dissecting the roles of intrinsic regulatory networks and spatial environment in the maintenance of cellular states.

Introduction

Human and other multicellular organisms are composed of diverse cell types characterized by distinct gene expression patterns. Within each cell type, there is also considerable heterogeneity. The source of cellular heterogeneity remains poorly understood, but it is commonly thought to be modulated by the balance between intrinsic regulatory networks and extrinsic cellular microenvironment (Swain et al., 2002; Jaenisch and Bird 2003). Recently, the rapid development of single-cell technologies has enabled accurate and simultaneous measurements of cell position and gene expression (Yuan et al. 2017), thus providing an excellent opportunity to systematically dissect the differential roles of intrinsic and extrinsic factors on mediating cellular heterogeneity.

bioRxiv preprint first posted online Mar. 2, 2018; doi: http://dx.doi.org/10.1101/275156. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.

Currently, there are two major, complementary approaches for single-cell transcriptomic profiling. The first is single-cell RNA sequencing (scRNAseq) (Tang et al. 2009; Islam et al. 2011; Dalerba et al., 2011; Deng et al., 2014; Jaitin et al., 2014; Macosko et al. 2015; Klein et al. 2015). By combining single-cell isolation, library amplification, and massively parallel sequencing, scRNAseq provides the most comprehensive view of transcriptomes. The second approach is single-molecule fluorescence in situ hybridization (smFISH) (Raj et al., 2008; Lubeck and Cai, 2014; Chen et al., 2015; Moffitt et al. 2016; Shah et al. 2016a; Shah et al. 2016b), which can be used to detect mRNA transcripts with high sensitivity while maintaining the spatial context. With sequential rounds of smFISH imaging, it is now feasible to profile the expression level of hundreds of genes for each cell in tissues. Each technology features a distinct set of advantages and limitations. The sequential FISH technology carries the advantage of measuring the transcriptome with high accuracy in its native spatial environment, but current implementations profile only a few hundred genes, whereas single-cell RNAseq provides whole-transcriptome estimation but requires cells to be removed from their environment, resulting in a loss of spatial information.

It is clear that an integrative analysis framework, involving single-cell RNAseq and sequential FISH, would bring together the benefits of both technologies to better characterize both cell type and spatially dependent variations. To this end, we developed a computational approach that contains two major components: First, the single cell RNAseq data is used as a guide to accurately determine the cell-types corresponding to the cells profiled by sequential FISH. Second, distinct spatial domain patterns are systematically detected from sequential FISH data. These spatial patterns are then in turn used to dissect the environment-associated variation in a single-cell RNAseq dataset.

This integrated approach has enabled us to systematically dissect the respective contribution of cell type and spatially dependent factors in mediating cell-state variation (Fig. 1a), which has eluded previous studies. Most existing studies focused on identifying cell-type differences, but, as shown below in our analysis of the mouse visual cortex region, cell-type differences represent only one component in cell-state variation (schematically represented as the cell intrinsic dimension in Fig. 1a), whereas local environment plays a significant role in mediating gene activities, probably through cell-cell interactions and signaling (represented as the spatial dimension in Fig. 1a). As each technology has its own strengths and weaknesses, the integrated approach presented here provides a powerful modeling framework that is broadly applicable to analyzing diverse tissues from various model systems.

Results


Mapping scRNAseq cell-types on seqFISH data

Given that scRNAseq, as a whole transcriptomic approach, can provide signatures for a diverse set of cell types, we took advantage of the whole-transcriptomic information obtained from scRNAseq data (Tasic et al., 2016) and developed a supervised cell-type mapping approach by integrating seqFISH and scRNAseq data (Fig. 1b). Our goal differs from previous studies (Achim et al., 2015; Satija et al., 2015; Halpern et al., 2017; Karaiskos et al., 2017), where scRNAseq data were mapped onto conventional ISH images to predict cell locations. Of note, ISH images are not quantitative, multiplexed, or single-cell resolution. In a seqFISH experiment, transcripts from hundreds of genes are detected directly in individual cells in their native spatial environment at single molecule resolution.

Our strategy is to use scRNAseq data to capture the large cell type differences and then further investigate spatial patterning within each major cell type. We analyzed a published scRNAseq dataset targeting the mouse visual cortex regions (Tasic et al., 2016). Eight major cell types (GABAergic, glutamatergic, astrocytes, 3 oligodendrocyte groups, microglia, and endothelial cells) were identified from scRNAseq analysis (Tasic et al., 2016). To estimate the minimal number of genes that is required for accurate cell-type mapping, we randomly selected a subset from the list of differentially expressed (DE) genes across these cell types, and applied a multiclass support vector machine (SVM) (Cortes and Vapnik, 1995; Fan et al., 2008) model using only the expression levels of these genes. The performance was evaluated by cross-validation. By using only 40 genes, we can already achieve an average level of 89% mapping accuracy. Not surprisingly, increasing the number of genes leads to better performance (92% for 60 genes, and 96% for 80 genes). Therefore, there is significant redundancy in transcriptomic profiles, which can be compressed into fewer than 100 genes.
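The gene-subset experiment described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the function name `subset_accuracy`, the 5-fold split, and the simulated expression matrix are all assumptions, and only the `LinearSVC` settings are taken from the Methods.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def subset_accuracy(expr, labels, de_genes, n_genes, n_folds=5, seed=0):
    """Mean cross-validated accuracy of a linear SVM on a random gene subset."""
    rng = np.random.default_rng(seed)
    # Draw n_genes column indices at random from the DE gene list
    chosen = rng.choice(de_genes, size=n_genes, replace=False)
    clf = LinearSVC(class_weight="balanced", dual=False, max_iter=10000, tol=1e-4)
    return cross_val_score(clf, expr[:, chosen], labels, cv=n_folds).mean()

# Synthetic stand-in for a cells x genes matrix with 3 cell types (assumption)
rng = np.random.default_rng(1)
labels = np.repeat([0, 1, 2], 60)
centers = rng.normal(0, 1, size=(3, 200))
expr = centers[labels] + rng.normal(0, 1, size=(180, 200))
de_genes = np.arange(200)          # pretend every gene is differentially expressed

for n in (40, 60, 80):
    print(n, round(subset_accuracy(expr, labels, de_genes, n), 3))
```

Repeating the draw with several seeds and averaging would give an accuracy-versus-gene-count curve of the kind summarized in the text.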

We then investigated a seqFISH dataset for the mouse visual cortex area (Shah et al., 2016a). A 1 mm by 1 mm contiguous area of the mouse visual cortex was imaged with 4 barcoded rounds of hybridization to decode 100 unique transcripts followed by 5 rounds of non-combinatorial hybridization to quantify 25 highly expressed genes (Supplementary Table 1). These rounds of imaging were preceded by imaging of the DAPI stain in the region and followed by imaging of the Nissl stain in the region. The images were aligned and transcripts decoded as described in Shah et al. 2016. Transcripts were assigned to cells that were segmented based on Nissl and DAPI staining. Using this technology, we were able to quantify the expression levels of these 125 genes with high accuracy in a total of 1597 cells.

After computing differentially expressed genes across the 8 major cell types in (Tasic et al, 2016), we selected the top 43 (P<1e-20) of these 125 genes for cell-type classification. These genes contain both highly expressed (>50 copies per cell) and lowly expressed genes (<10 copies per cell). Cross-validation analysis shows that, using these 43 genes as input, the SVM model accurately mapped 90.1% of the cells in the scRNAseq data to the correct cell-type. Therefore, we proceeded by using these 43 genes (Supplementary Table 2) to map cell-types in the seqFISH data.

As a first step, we preprocessed the seqFISH data by using a multi-image regression algorithm in order to reduce potential technical biases due to non-uniform imaging intensity variation (Methods). We further adopted a quantile normalization (Bolstad et al., 2003) approach to calibrate the scaling and distribution differences between scRNAseq and seqFISH experiments. For most genes, the quantile-quantile (q-q) plot normalization curve is strikingly linear (Supplementary Fig. 1), suggesting a high degree of agreement between the two datasets despite technological differences.
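One common way to calibrate two distributions is to map each value to the matching quantile of a reference. The sketch below shows that idea for a single gene; it is an illustration only, and the exact variant of quantile normalization used for the seqFISH data may differ. The function name `quantile_match` and the mid-rank quantile levels are assumptions.

```python
import numpy as np

def quantile_match(source, reference):
    """Map values in `source` onto the empirical distribution of `reference`."""
    source = np.asarray(source, dtype=float)
    ranks = np.argsort(np.argsort(source))      # rank 0..n-1 of each source value
    probs = (ranks + 0.5) / len(source)         # mid-rank quantile levels in (0, 1)
    return np.quantile(reference, probs)        # look up the same quantiles in reference

# Toy values for one gene: three seqFISH-like cells, four scRNAseq-like cells
matched = quantile_match([3.0, 1.0, 2.0], [10.0, 20.0, 30.0, 40.0])
print(matched)
```

The mapping preserves the ordering of cells within the source platform while borrowing the scale of the reference platform, which is why a near-linear q-q curve indicates that the two platforms already agree up to scaling.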

Then, the SVM classification model was applied to the bias-corrected, quantile-normalized seqFISH data to assign cell types. Of note, we found that better performance may be achieved by further calibrating model parameters to accommodate platform differences. The results of multiclass SVM are calibrated across models (Platt, 1999) and converted to probabilities. We excluded the 5.5% of cells that could not be confidently mapped to a single cell-type (probability of 0.5 or less). Among the mapped cells, 51% are glutamatergic neurons, 35% are GABAergic neurons, 4.5% are astrocytes, and other glial cell types and endothelial cells make up the remaining 4% of cells (Fig. 1c).
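A hedged sketch of the calibrate-then-threshold step is given below. It uses scikit-learn's `CalibratedClassifierCV` with sigmoid (Platt-style) calibration as a stand-in; whether the original analysis used this class is not stated, and the helper name `map_cell_types`, the 3-fold calibration, and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

def map_cell_types(train_x, train_y, query_x, min_prob=0.5):
    """Predict cell types with calibrated probabilities and flag confident calls."""
    svm = LinearSVC(class_weight="balanced", dual=False, max_iter=10000, tol=1e-4)
    model = CalibratedClassifierCV(svm, method="sigmoid", cv=3)  # Platt-style scaling
    model.fit(train_x, train_y)
    probs = model.predict_proba(query_x)
    labels = model.classes_[probs.argmax(axis=1)]
    confident = probs.max(axis=1) > min_prob    # 0.5 or less counts as unassigned
    return labels, confident, probs

# Synthetic "scRNAseq" training cells and "seqFISH" query cells (assumption)
rng = np.random.default_rng(0)
centers = rng.normal(0, 2, size=(3, 20))
train_y = np.repeat([0, 1, 2], 40)
train_x = centers[train_y] + rng.normal(size=(120, 20))
query_y = np.repeat([0, 1, 2], 10)
query_x = centers[query_y] + rng.normal(size=(30, 20))

labels, confident, probs = map_cell_types(train_x, train_y, query_x)
print(f"{(~confident).mean():.1%} of query cells left unassigned")
```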

To validate our predictions, we first checked the expression of known marker genes and compared the average gene expression profiles between scRNAseq and seqFISH data. Indeed, this comparison shows a high degree of similarity (Fig. 1c). Notably, marker genes have expected high expression in the matched cell types, such as Gja1 and Mfge8 in astrocytes, Laptm5 and Abca9 in microglia, Cldn5 in endothelial cells, Tbr1 and Gda in glutamatergic neurons, and Slc5a7 and Sox2 in GABA-ergic neurons. The majority of cell types have a high Pearson correlation (>0.8) between matched cell types' average expression profile; even for the rare cell-type microglia, the correlation remains reasonably high (0.75) (Fig. 1d). We are also able to distinguish early maturing oligodendrocytes in the seqFISH data based on Itpr2 expression (Fig. 1c, OPC.1 column) as previously reported (Zeisel et al, 2015). Inhibitory GABA-ergic neurons and excitatory glutamatergic neurons exhibit strong anti-correlation to each other (Fig. 1d).

As an additional validation, we examined the Nissl and DAPI staining images which are known to have distinct patterns between astrocytes and neuronal cell types. As Nissl is a neuronal stain and DAPI stains DNA, astrocytes are typically associated with DAPI but not Nissl, whereas neurons are stained for both. Our cell-type mapping results highly agree with these patterns. Over 89% of predicted astrocytes exhibit strong DAPI staining but weak or no Nissl staining across cortex columns (Supplementary Fig. 2, Supplementary Table 3). Taken together, these analyses strongly indicate that the vast majority of cells were mapped to the correct cell types.

By combining cell type predictions from scRNAseq and positional information from seqFISH, we were able to construct a single-cell resolution landscape of cell type spatial distribution (Fig. 1e). As expected, this landscape is very complex, with different cell types intermixed with each other (Fig. 1e). On the other hand, it is clear that there remains significant heterogeneity within each cell-type.

A systematic approach to identify multicellular niche from spatial genomics data

Microenvironment in tissues can contribute to heterogeneity in addition to cell type specific expression patterns. To systematically dissect the contributions of microenvironments on gene expression variation, we developed a novel hidden Markov random field (HMRF) approach (Zhang et al., 2001) to infer the organizational structure of the visual cortex in an unbiased manner. An overview of this approach is illustrated in Fig. 2a. The basic assumption is that the visual cortex can be divided into domains with coherent gene expression patterns. A domain may be formed by a cluster of cells from the same cell-type, but it may also consist of multiple cell-types. In the latter scenario, the expression patterns of cell-type specific genes may not be spatially coherent, but environmentally associated genes would express in spatial domains. HMRF enables the detection of spatial domains by systematically comparing the gene signature of each cell with its surroundings to search for coherent patterns. Briefly, we computationally constructed an undirected graph to represent the spatial relationship among the cells, connecting any pair of cells that are immediate neighbors (Fig 2a,b). Each cell is represented as a node in this graph. The domain state of each cell is influenced by two sources (Fig 2b): 1) its gene expression pattern, and 2) the domain states of neighboring cells. The total contribution of neighboring cells can be mathematically represented as a continuous energy field, and the optimal solution is identified by searching for the equilibrium of the energy field (see Methods for mathematical details).
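To make the two influences on a cell's domain state concrete, here is a deliberately simplified sketch. It swaps the full HMRF inference for a Potts-style energy minimized by iterated conditional modes (ICM), with unit-variance Gaussian expression costs, a k-nearest-neighbor graph, and k-means initialization. All of these choices, along with the names `knn_graph`, `icm_domains`, and `beta`, are assumptions made for illustration and are not the model detailed in the Methods.

```python
import numpy as np
from scipy.spatial import cKDTree
from sklearn.cluster import KMeans

def knn_graph(coords, k=4):
    """Undirected neighbor lists linking each cell to its k nearest cells."""
    _, idx = cKDTree(coords).query(coords, k=k + 1)   # k + 1 because a cell finds itself
    nbrs = [set() for _ in range(len(coords))]
    for i, row in enumerate(idx):
        for j in row:
            if j != i:
                nbrs[i].add(int(j))
                nbrs[int(j)].add(i)
    return [sorted(s) for s in nbrs]

def icm_domains(expr, nbrs, n_domains, beta=1.0, n_iter=10, seed=0):
    """Assign domains by balancing expression fit against neighbor agreement."""
    states = KMeans(n_clusters=n_domains, n_init=10, random_state=seed).fit_predict(expr)
    for _ in range(n_iter):
        # Mean expression per domain (global mean if a domain happens to be empty)
        means = np.array([expr[states == d].mean(axis=0) if np.any(states == d)
                          else expr.mean(axis=0) for d in range(n_domains)])
        # Source 1: how well each cell's expression fits each domain
        data_cost = ((expr[:, None, :] - means[None]) ** 2).sum(axis=2) / 2
        for i in range(len(expr)):
            # Source 2: how many neighbors currently sit in each domain
            agree = np.bincount(states[nbrs[i]], minlength=n_domains)
            states[i] = np.argmin(data_cost[i] - beta * agree)
    return states

# Synthetic tissue: left and right halves differ in mean expression (assumption)
rng = np.random.default_rng(0)
coords = rng.uniform(size=(300, 2))
truth = (coords[:, 0] >= 0.5).astype(int)
expr = rng.normal(size=(300, 5)) + 2.0 * truth[:, None]
states = icm_domains(expr, knn_graph(coords), n_domains=2)
```

Raising `beta` gives more weight to neighboring cells and yields smoother domains; setting it to zero reduces the procedure to clustering on expression alone.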

Next, we applied our HMRF model to analyze the 1597-cell mouse visual cortex seqFISH dataset. For the visual cortex region, the detection of spatial patterns is confounded by the fact that different cell types tend to be mixed together. To reduce this confounding effect, we systematically removed genes that are strongly associated with specific cell-types. We further narrowed down the gene list by identifying genes with spatially coherent gene expression patterns using a Silhouette metric (see Methods). This resulted in a list of 69 genes (Supplementary Table 4) that were used to identify spatial domains.

HMRF modeling of the visual cortex region revealed 9 spatial domains (Fig. 2c). These domains have distinct spatial patterns; some display a layered organization that resembles the anatomical structure (Sunkin et al., 2013). For example, four of the domains are located on the outer layers of the cortex and are therefore labeled as O1, O2, O3, and O4, respectively (Fig. 2c). The locations of these layers roughly correspond to the well-characterized L1, L6, and external capsule (EC) layers, respectively. Four domains are located on the inside of the cortex and are therefore labeled as I1a, I1b, I2, and I3, respectively (Fig. 2c). These domains roughly correspond to the L2-5 layers. These inner domains are less pronounced than the outer domains, which is consistent with previous anatomical analysis. Finally, one domain is sporadically distributed across the inner layers of the cortex and is therefore labeled as IS (Fig 2c).

By overlaying cell type annotations, we see that each domain generally consists of a mixture of GABA-ergic, glutamatergic neurons and astrocytes interacting in each environment (e.g. domain I1a in Supplementary Fig. 3). We further revisited Nissl staining to observe the physical characteristics of these cells in HMRF-defined domain environment. Strikingly, the domains identified by HMRF correspond very well with distinct shapes of the cells in the outer layer domains O2, O3, O4, which exhibit the characteristics of elongated cells, small size, large circular cells respectively (Fig 2c). Some of these differences are cell-type related. However, very often within a cell-type, such as glutamatergic neurons, there remain significant morphological differences across domains, as described in the next section, suggesting that spatial position accounts for a large part of the morphological variation in these cells, consistent with known morphological diversity in the cortex. Overall, cells located in different HMRF domains are associated with distinct morphologies.

The decomposition of mouse visual cortex into spatial domains suggests that a spatial gene expression program is shared across cells in proximity. Differential gene expression analysis identified distinct signatures associated with each spatial domain (Fig. 3a). For example, genes Calb1, Cpne5, Nov are preferentially expressed in inner domains (I1a, I1b), whereas genes Tbr1, Serpinb11, Capn13 are highly enriched in outer domains (O1, O2). Different outer domains can be further distinguished by additional markers, such as Mmp8 (O2), Spag6 (O1), and Neurod4 (O1). The spatial marker genes are highly consistent with their spatial expression in Allen Brain Atlas (Sunkin et al., 2013) ISH images, such as Calb1, Cpne5, Nov, Gda, and Tbr1 (see Supplementary Figs. 4, 5). Additional genes such as Nell1, Aldh3b2, Gdf5 are also consistent with cell clusters in an independent dataset (Zeisel et al. 2015) (Supplementary Fig. 5). Taken together, these analyses strongly suggest that our model for analyzing seqFISH data is able to detect functionally, morphologically, and transcriptionally distinct spatial environments.

Integrative analysis identified cell-type, environmental interactions

Glutamatergic neurons mediate the neuronal circuit in the visual cortex by playing a primarily excitatory function. It is also well-known that the behavior of different glutamatergic neurons can be very different (Andjelic et al., 2008; Tasic et al., 2016). By combining cell-type mapping and spatial domain identification, we set out to dissect the source of heterogeneity within glutamatergic cells.

First, nearly all glutamatergic cells express cell-type specific markers such as Tbr1 and Gda (Fig 3b top). Beyond this shared cell type identity, there is substantial heterogeneity within glutamatergic cells in a spatially dependent manner. As glutamatergic cells are spread across all 9 domains, each subset expresses a different gene signature in accordance with domain annotation (Fig. 3b middle). Furthermore, an additional set of gene signatures is differentially expressed between glutamatergic cells in different domains (Fig. 3b bottom). For example, Neurog1 is an IS-domain specific gene upregulated in glutamatergic cells but not in GABA-ergic neurons or other cell types (Fig 3b bottom). Other genes, such as Vmn1r65 and Psmd5, follow a similar specific pattern (Fig 3b bottom). Collectively, the domain-specific signatures map out the spatial patterns of expression within glutamatergic cells, demonstrating their power to differentiate subgroups of this cell type (Supplementary Fig. 6). Additionally, these spatially dependent variations within glutamatergic neurons have strong support from cell morphology. We compared the morphology of cells at the boundary of two layers for six different snapshot regions (Fig 3c). In every case, domain boundaries clearly mark the boundaries of layers that possess visually identifiable cell shape characteristics (the three groups of cells in panel L6a, L6b, EC of Fig 3c). Therefore, glutamatergic cells in different domains show striking morphological differences, further supporting the validity of our domain partition results. Together, these analyses strongly suggest that spatial domain variation plays an important role in mediating cellular heterogeneity within a common cell-type.

Using HMRF domain information to reanalyze scRNAseq data

Single-cell RNAseq data does not contain spatial information. However, by integrating information from seqFISH data analysis, we were able to identify metagene signatures associated with different spatial domains. Briefly, for each domain we defined a metagene signature representing the gene set that is specifically expressed in this domain (see Methods). Using these metagenes as a guide, we were able to infer the spatial location of a cell based on the activities of these metagenes, defined as the average gene expression level. We used this approach to dissect the contribution of environmental factors to transcriptomic heterogeneity within glutamatergic cells from single-cell RNAseq data. t-SNE and k-means clustering analyses revealed a landscape of subpopulations associated with distinct metagene activities, which was strikingly consistent with seqFISH data analysis (Fig 4a, b).
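Since metagene activity is defined as an average expression level, the scoring step can be sketched in a few lines. The example below is illustrative only: the function name `metagene_activity`, the two toy signatures, and the expression values are made up, and the gene names are borrowed from the domain markers mentioned earlier purely as labels.

```python
import numpy as np

def metagene_activity(expr, gene_names, signatures):
    """Average expression of each domain's signature genes, per cell."""
    index = {g: i for i, g in enumerate(gene_names)}
    domains = sorted(signatures)
    scores = np.column_stack([
        expr[:, [index[g] for g in signatures[d] if g in index]].mean(axis=1)
        for d in domains
    ])
    return domains, scores          # scores has shape (cells, domains)

# Toy cells x genes matrix (hypothetical values)
gene_names = ["Calb1", "Cpne5", "Tbr1", "Capn13"]
expr = np.array([[4.0, 2.0, 0.0, 0.0],
                 [0.0, 1.0, 3.0, 5.0]])
signatures = {"inner": ["Calb1", "Cpne5"], "outer": ["Tbr1", "Capn13"]}

domains, scores = metagene_activity(expr, gene_names, signatures)
best = [domains[k] for k in scores.argmax(axis=1)]
print(best)
```

In an actual analysis the score matrix, rather than the single best domain, would typically be passed to clustering or visualization.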

For simplicity, these clusters were labeled according to their enriched metagene signatures. We identified differentially expressed genes in the single-cell RNAseq data between the aforementioned clusters, and examined their biological functions by using gene set enrichment analysis. Interestingly, different biological processes were associated with different domains, suggesting functional differences between the spatial domains (Fig 4d).

Importantly, subpopulations detected by metagene analysis are well enriched in the manual layer annotations provided by the authors of the dataset (Fig 4c). For example, cluster 5 (annotated as domain I1b based on metagene analysis) is enriched in L1-L2/3 dissected cells from Tasic et al (P<1.2e-6) (Fig 4c). Cluster 4 (marked as domain O1) is enriched in L6b dissection labels (P<0.0017). Cluster 9 (marked as domain IS) is enriched in L4 dissection label (P<0.016). Overall, these results demonstrate the value of our analysis in reinterpreting the scRNAseq dataset by mapping our spatial HMRF-derived signatures to RNAseq, which contains no spatial information. Thus, integrating seqFISH data analysis provides new insights into scRNAseq data.

HMRF analysis reveals region-specific variation among astrocytes

Next, we investigated the environment effect on astrocytes, which are also known to have substantial heterogeneity (Ben Haim and Rowitch, 2016). Our cell type mapping identified 47 astrocytes in the seqFISH data. These cells all expressed key astrocyte markers (Fig 5a, box 1) but were spread across 5 HMRF domains (O1, O2, O3, I1a, and I3) (Fig. 5a). Furthermore, it is notable that several groups of environment associated genes are identified, indicative of key environmental processes. These signature genes are confirmed to be expressed in astrocytes according to bulk astrocyte RNAseq database (Zhang et al., 2016) (Supplementary Fig. 7). As an example, Sox2 and Loxl1 in our domain I1a are two of the most highly ranked astrocyte genes in bulk sequencing. Coexpression of these genes with other ECM (extracellular matrix) markers in the same state, such as Acta2 and Col5a1, implicates an important role of ECM in this domain of astrocytes, which has been previously linked to the differentiation and reprogramming of astroglial lineage (Niu et al., 2015). While these ECM genes are upregulated in domain I1a, in other domains such as outer O1, O2, they are notably absent or down-regulated. Therefore, domain-specific astrocyte gene expression may reveal functional differences in different microenvironments.

Conclusion

A major goal in single-cell analysis is to systematically dissect the contributions of cell-types and environment on mediating cell-state variability (Regev et al., 2017). To achieve this goal, we presented an HMRF-based computational approach to combine the strengths of sequencing and imaging-based single-cell transcriptomic profiling strategies. We showed that our method can be used to correctly detect spatial domains in the mouse visual cortex region. In doing so, we were able to identify environment-associated variations within a common cell-type. Our analysis also demonstrated that novel insights can be gleaned from single-cell data by an integration of information from complementary technologies. In particular, integrating single-cell RNAseq data allows us to map cell-types more accurately than in seqFISH data analysis, whereas integrating seqFISH data allows us to extract spatial structure in single cell RNAseq data analysis. Future work will continue to investigate the mechanisms underlying the interactions between cell-type and microenvironment.

Author Contributions

Conception and supervision of project: G.C.Y., L.C. Conception of HMRF and SVM models: Q.Z., G.C.Y. Conducting and supervision of computational analyses: Q.Z., G.C.Y. Conducting and supervision of seqFISH experiments: S.S., L.C. Writing: Q.Z., S.S., R.D., G.C.Y., L.C. All authors contributed ideas for this work. All authors reviewed and approved the manuscript. This research was supported by a Claudia Barr Award and NIH grant R01HL119099 to G.C.Y. and NIH R01HD075605 to L.C.

Methods

SeqFISH data generation

SeqFISH data in the mouse visual cortex region was generated as described previously (Shah et al., 2016a). Briefly, 100 genes were encoded using a temporal barcoding method and 25 genes were quantified individually. To encode 100 genes, 4 rounds of hybridization were performed using 5 distinct fluorescence channels. Out of a total possible 625 barcodes, 100 were chosen such that loss of signal in any given hybridization still allows accurate decoding of the spot. Every transcript was hybridized in every round using a given probe set. After hybridization, the signal was amplified using smHCR and images were taken at predefined locations in the mouse visual cortex. The DNA probes along with the amplification polymers were digested using DNase I, leaving behind a naked RNA for re-hybridization with the next probe set. A round of imaging with DAPI staining was done before any RNA hybridization to image all nuclei in the fields and a final round of Nissl staining was imaged to identify cell boundaries. Cells were segmented based on DAPI staining, Nissl staining, and RNA point density. Once all imaging rounds were completed, these images were aligned using a 2D normalized cross correlation and each spot was decoded based on the unique color switching pattern. For the 25 genes that were labelled without any encoding, simple spot counting was done to identify the number of transcripts. These transcripts were then assigned to cells based on the location of the transcript and the segmentation masks. For more details regarding the seqFISH method, please refer to Shah et al. 2016. The spatial coordinates of the cells are provided in Supplementary Data.
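The barcode arithmetic above (5 colors over 4 rounds gives 5^4 = 625 codes) and the dropout-tolerance requirement can be illustrated with a small checksum construction. The text does not say how the 100 barcodes were actually chosen; the scheme below, in which the fourth color is the sum of the first three modulo 5, is one assumed way to obtain codes that remain unique when any single round is lost.

```python
from itertools import product

N_ROUNDS, N_COLORS = 4, 5

def robust_barcodes():
    """Barcodes whose last color is a checksum of the first three (mod 5)."""
    return [(a, b, c, (a + b + c) % N_COLORS)
            for a, b, c in product(range(N_COLORS), repeat=N_ROUNDS - 1)]

def decodable_after_dropout(codes):
    """True if the codes stay unique when any single round is missing."""
    for lost in range(N_ROUNDS):
        reduced = {code[:lost] + code[lost + 1:] for code in codes}
        if len(reduced) != len(codes):
            return False
    return True

codes = robust_barcodes()
all_codes = list(product(range(N_COLORS), repeat=N_ROUNDS))
print(len(all_codes), len(codes), decodable_after_dropout(codes))
```

This construction yields 125 dropout-tolerant codes, so any 100 of them would satisfy the stated requirement, whereas the full set of 625 codes would not.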


SeqFISH data normalization and bias correction

The seqFISH gene expression matrix, represented by -log (count+1), was normalized by row and column z-scoring to remove cell-specific and gene-specific biases. Potential field imaging biases were estimated and removed by using a multi-image regression algorithm similar to one described previously (Caicedo et al., 2017). Briefly, for each gene, the imaging bias at each binned location was estimated by averaging the normalized gene expression levels over 8 neighboring bins within each field followed by averaging across all fields. The estimated bias was then modeled by principal component analysis (PCA). The contributions of the four most significant PCs were estimated by linear regression and removed from the normalized gene expression matrix (Supplementary Fig 8).
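The first normalization step, row and column z-scoring, might look like the sketch below. Several details are assumptions rather than facts from the text: the matrix is taken to be genes by cells, rows are standardized before columns, a plain `log1p` transform is used for the toy input, and the function name `zscore_rows_and_columns` is hypothetical. The imaging-bias regression is not covered here.

```python
import numpy as np

def zscore_rows_and_columns(log_expr):
    """Z-score each gene (row) across cells, then each cell (column) across genes."""
    by_gene = (log_expr - log_expr.mean(axis=1, keepdims=True)) \
        / log_expr.std(axis=1, keepdims=True)
    return (by_gene - by_gene.mean(axis=0, keepdims=True)) \
        / by_gene.std(axis=0, keepdims=True)

# Toy genes x cells count matrix (assumption), log-transformed before scaling
rng = np.random.default_rng(0)
counts = rng.poisson(5, size=(20, 50))
normalized = zscore_rows_and_columns(np.log1p(counts))
print(normalized.shape)
```

Because the column step runs last, only the per-cell mean and standard deviation are exactly 0 and 1 afterwards; the per-gene statistics are approximately standardized.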

Cell type mapping

Single-cell RNAseq data for the mouse visual cortex were obtained from Gene Expression Omnibus (GSE71585). Cell-type information corresponding to 1723 cells was obtained from the original paper (Tasic et al., 2016). In this analysis, we considered the 8 major cell types: GABAergic, glutamatergic, astrocytes, 3 oligodendrocyte groups, microglia, and endothelial cells. Differentially expressed genes among different cell types were identified by MAST (Finak et al., 2015).

We trained classifiers of cell types from the single-cell RNAseq dataset by using the multiclass SVM formulation. For each cell type, we built a classifier as follows. Let $x_i$, $i = 1, \ldots, n$, be the gene expression pattern for the $i$-th cell, and let $y_i$ code for cell-type identity: $y_i = 1$ if cell $i$ belongs to the specified cell type and $-1$ otherwise. We selected the linear kernel, which produces the two parallel hyperplanes that best separate the two classes. The objective function is defined as follows:

$$\text{minimize} \quad C \sum_{i=1}^{n} \zeta_i^2 + \|w\|^2 / 2$$

$$\text{subject to} \quad 1 - \zeta_i \le y_i (w \cdot x_i - b), \quad \zeta_i \ge 0 \qquad \text{(Eq. 1)}$$

Here $w$ is the normal vector to the hyperplane used to represent the margin. The squared hinge loss function $\sum_{i=1}^{n} \zeta_i^2$ is used here to quantify the misclassification error relative to the margin. $C$ is a regularization parameter that trades off misclassification of the training data against simplicity of the decision function. A lower $C$ increases the ability of the model to generalize to unseen data at the cost of a larger fitting error. For testing data, the sign of $w \cdot x_i - b$ is used to predict cell type identity. We used the Python LinearSVC implementation, which is part of the scikit-learn 0.19 library (Pedregosa et al., 2011), with the following parameter setting: class_weight=balanced, dual=False, max_iter=10000, and tol=1e-4.
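To make the objective in Eq. 1 concrete, the sketch below evaluates it with NumPy for a given weight vector and offset, setting each slack variable to its smallest feasible value, $\zeta_i = \max(0, 1 - y_i(w \cdot x_i - b))$. It is an illustration of the formula only, not the LinearSVC solver used in the analysis; the function names are hypothetical, and the convention of assigning a decision value of exactly zero to the positive class is an assumption.

```python
import numpy as np

def svm_objective(w, b, X, y, C):
    """Eq. 1 objective with each slack set to its smallest feasible value."""
    margins = y * (X @ w - b)
    slack = np.maximum(0.0, 1.0 - margins)   # zeta_i
    return C * np.sum(slack ** 2) + 0.5 * np.dot(w, w)

def predict(w, b, X):
    """Sign of w.x - b gives the predicted class (+1 or -1); zero maps to +1 here."""
    return np.where(X @ w - b >= 0, 1, -1)
```

Shrinking $w$ lowers the $\|w\|^2/2$ term but pushes points inside the margin, which raises the squared hinge term; $C$ sets the exchange rate between the two.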


Using the SVM model formulated as above, we first tested how many genes are needed for accurate cell mapping. To this end, we randomly sampled subsets of 20, 40, 60, and 80 genes from the list of differentially expressed genes and, for each gene set, built a vanilla SVM classification model to map each cell in the single-cell RNAseq dataset to its corresponding cell type. The accuracy was evaluated by using 4-fold cross-validation. Our results indicated that a high accuracy (>90%) can be obtained with 40 or more genes.
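The subsampling procedure can be sketched as a small cross-validation harness. To keep the example dependency-light, a nearest-centroid rule stands in for the SVM; this substitution is an assumption made purely for illustration and is not the classifier used in the analysis. The function names, the cells-by-genes layout of `X`, and the random seed are likewise hypothetical.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, X_test):
    """Stand-in classifier (not the SVM from the text): nearest class centroid."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]

def cv_accuracy(X, y, n_genes, n_folds=4, seed=0):
    """k-fold cross-validated accuracy on a random subset of n_genes columns."""
    rng = np.random.default_rng(seed)
    genes = rng.choice(X.shape[1], size=n_genes, replace=False)
    folds = np.array_split(rng.permutation(X.shape[0]), n_folds)
    correct = 0
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        pred = nearest_centroid_predict(X[train][:, genes], y[train], X[test][:, genes])
        correct += int((pred == y[test]).sum())
    # Every cell is tested exactly once across the folds
    return correct / X.shape[0]
```

Calling `cv_accuracy` for `n_genes` in (20, 40, 60, 80) reproduces the shape of the experiment; swapping in a different classifier only requires replacing the stand-in function.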

To map cell types in the seqFISH data, we made a few modifications to account for the platform differences. First, since 125 genes were profiled by seqFISH, we used those among them that were top differentially expressed genes (P<1e-20) in the scRNAseq dataset for cell-type mapping. Based on the subsampling analysis described above, these 43 genes were sufficient for accurate cell-type mapping. Second, the scRNAseq data were z-score transformed so that the dynamic range was comparable with seqFISH data. Third, we used quantile normalization (Bolstad et al., 2003) to convert seqFISH data so that the statistical distribution was almost identical to single-cell RNAseq data. Fourth, we chose the regularization parameter C to maximize the cross-platform correlation between the cell-type specific gene expression profiles, resulting in an estimate of C=1e-6. Finally, to account for the possibility that certain cells cannot be unequivocally assigned to a single cell type, we used Platt scaling (Platt, 1999) to convert SVM output to a probability measure and then used a probability cutoff of 0.5 to retain cells that can be confidently mapped to a single cell type. 97 (5%) cells did not pass this filter.
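Two of the steps above lend themselves to a short illustration. The sketch below maps one vector of values onto the empirical distribution of a reference vector while preserving ranks, which is one simple way to realize quantile normalization against a reference, and applies the sigmoid form used in Platt scaling. Whether normalization was applied per gene or globally is not stated in the text, and the Platt coefficients `a` and `b` would normally be fitted on held-out data; here they are hypothetical inputs, as are the function names.

```python
import numpy as np

def quantile_match(values, reference):
    """Map `values` onto the empirical distribution of `reference`, keeping ranks."""
    values = np.asarray(values, dtype=float)
    ranks = np.argsort(np.argsort(values))     # 0 .. n-1, ties broken by order
    probs = (ranks + 0.5) / values.size        # mid-rank quantile levels
    return np.quantile(np.asarray(reference, dtype=float), probs)

def platt_probability(decision_value, a, b):
    """Platt's sigmoid: P(y = 1 | f) = 1 / (1 + exp(a * f + b))."""
    return 1.0 / (1.0 + np.exp(a * decision_value + b))
```

A cell would then be retained when its largest class probability reaches the 0.5 cutoff described above.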

Hidden Markov random field

Hidden Markov random field (HMRF) is a graph-based model commonly used for pattern recognition in image data analyses (Li, 2003; Zhang et al., 2001). In a common setting, HMRF is used to model the spatial distribution of a signal, such as the pixel intensities over a 2D image. The spatial structure is represented as a set of nodes on a regular grid, where neighboring nodes are connected to each other. The spatial pattern is “hidden” in the sense that it must be indirectly estimated from other variables that can be directly measured. The most important assumption in HMRF is the Markov property, which states that the spatial constraints can be reduced to considering only correlation between immediately neighboring nodes. This simplifying assumption implies that the joint distribution can be decomposed as a product of much smaller components, each defined on a fully connected subgraph (termed a clique). As has been done previously, we decomposed the graph into size-2 components (or edges in the graph), which provides a convenient means of estimating the MRF by using pairwise energies.

Specifically, let $S = \{s_i\}$ be the nodes in the graph. The set of nodes and the adjacency relation defined by the local neighborhood graph form the neighborhood system $(S, N)$, where $N_i$ denotes the set of neighbors of node $s_i$. Every node is associated with observed signal values $x_i$. Let $C = \{c_i\}$, with $c_i \in \{1, \ldots, K\}$, represent the set of possible classes of patterns. The probability that a node $s_i$ is associated with class $c_i$, given its observed signal and the classes $c_{N_i}$ of its neighbors, is specified by the following equation:

$$P(c_i \mid x_i, s_i, c_{N_i}) = \frac{1}{Z_1} \, P(x_i \mid c_i, s_i) \, P(c_i \mid s_i, c_{N_i}) \qquad \text{(Eq. 2)}$$

On the right-hand side, the term $P(x_i \mid c_i, s_i)$ models the effect of the node $s_i$'s own gene expression, whereas $P(c_i \mid s_i, c_{N_i})$ models the effect of the neighboring cells' configuration $c_{N_i}$. The combined effect of these two terms is schematically shown in Fig. 2. The latter term is further determined by the Gibbs distribution:

$$P(c_i \mid s_i, c_{N_i}) = \frac{1}{Z_2} \exp\Big( -\beta \sum_{j \in N_i} U(c_i, c_j) \Big) \qquad \text{(Eq. 3)}$$

where $U(c_i, c_j)$ is referred to as the energy function, and $Z_1$ and $Z_2$ are normalizing constants. The exact formulation of $U(c_i, c_j)$ depends on the specific application, and it encodes the assumption about how neighboring nodes interact with each other. Here we use a special case, the Potts model:

$$U(c_i, c_j) = \begin{cases} -1, & \text{if } c_i = c_j \\ 0, & \text{otherwise} \end{cases} \qquad \text{(Eq. 4)}$$

which means that the effects of neighboring cells are additive. Essentially, $P(c_i \mid s_i, c_{N_i})$ expresses the total energy as a summation of pairwise interaction energies with neighbors. The parameter $\beta$ reflects the strength of interactions.
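A small numerical sketch may help connect Eqs. 2 to 4. The code below computes the Potts neighborhood term from a list of neighbor classes and combines it with an expression likelihood. The likelihood and neighborhood values used in the final line are the ones printed in Fig. 2b for domains 1, 2, and 3; the neighbor labels, the value of beta, the zero-based class indexing, and the function names are hypothetical choices for illustration.

```python
import numpy as np

def potts_prior(neighbor_classes, n_classes, beta):
    """Eq. 3 with the Potts energy of Eq. 4: weight exp(beta * number of matching neighbors)."""
    counts = np.bincount(np.asarray(neighbor_classes), minlength=n_classes)
    # -beta * sum(U) equals beta * count, because U = -1 for each matching neighbor
    weights = np.exp(beta * counts)
    return weights / weights.sum()

def posterior(likelihood, prior):
    """Eq. 2: normalized product of the expression term and the neighborhood term."""
    unnorm = np.asarray(likelihood) * np.asarray(prior)
    return unnorm / unnorm.sum()

# Values for domains 1, 2, 3 as printed in Fig. 2b
post = posterior([0.30, 0.20, 0.50], [0.50, 0.27, 0.23])
```

Rounded to two decimals, `post` gives 0.47, 0.17, and 0.36, matching the combined probabilities shown in Fig. 2b: expression alone favors domain 3, but the neighborhood term shifts the assignment to domain 1.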

Application to seqFISH data

The HMRF model described above is naturally applicable to the analysis of seqFISH data. Here each class of patterns corresponds to a spatial domain. The observed signals are gene expression levels measured by seqFISH, whose distribution is modeled as a multivariate Gaussian random variable. The application of HMRF to seqFISH data analysis involves the following 4 components: 1) neighborhood graph representation, 2) gene selection, 3) domain number selection, and 4) implementation and model inference. The details of each component are described below.

1) Neighborhood graph representation. An undirected graph was constructed to represent the spatial relationship between the cells. Each node represents a cell, and each edge connects a pair of neighboring cells. The neighborhood size was chosen such that on average each cell has five neighboring cells.

2) Gene selection. We selected a subset of genes whose expression patterns tend to be spatially coherent based on the following analysis. For each gene $g$, cells were divided into two mutually exclusive sets, corresponding to high expression (denoted as $L_1$) and low expression (denoted as $L_0$) respectively, at the 90th percentile expression level cutoff. The spatial coherence of the gene was quantified as the Silhouette coefficient (Rousseeuw, 1987) of the spatial distance associated with these two cell sets. Specifically, the Silhouette coefficient is calculated as:

$$\mathcal{S}_g = \frac{1}{|L_1|} \sum_{s_i \in L_1} \frac{m_i - n_i}{\max(m_i, n_i)} \qquad \text{(Eq. 5)}$$

where, for a given cell $s_i$ in set $L_1$, $m_i$ is defined as the average distance between $s_i$ and any cell in $L_0$, and $n_i$ is defined as the average distance between $s_i$ and any other cell in $L_1$. Here, we used the rank-normalized, exponentially transformed distance to quantify the local physical distance between two cells. For a pair of cells $s_i$ and $s_j$, this distance is defined as $r(s_i, s_j) = 1 - p^{\mathrm{rank}_d(s_i, s_j) - 1}$, where $\mathrm{rank}_d(s_i, s_j)$ is the mutual rank (Obayashi and Kinoshita, 2011) of $s_i$ and $s_j$ in the vectors of Euclidean distances $Euc(s_i, *)$ and $Euc(s_j, *)$. Hence, this exponentially weighted function (Moffat and Zobel, 2008) is designed to give more emphasis to closely located cells while saturating the distance of far-away cells at a large value. $p$ is a rank-weighting constant ($0 < p < 1.0$) set at 0.95. The statistical significance of $\mathcal{S}_g$ was evaluated by random permutation, and the genes associated with significant values of $\mathcal{S}_g$ (p-value < 0.05) were selected as spatially coherent.

3) Domain number selection. We used k-means clustering results as initialization for the HMRF domains. The value of k was selected based on the gap statistic (Tibshirani et al., 2001).

4) Implementation and model inference. The model parameters were inferred by using the Expectation-Maximization (EM) algorithm (Dempster et al., 1977). We developed a new implementation based on the MRITC R package (Feng et al., 2012) and the GraphColoring Java package (Brélaz, 1979). The implementation contains modifications to accommodate arbitrary neighborhood graph topology. The domain assignment for each cell was determined by using maximum a posteriori estimation, which can be viewed as the equilibrium state of the energy function.
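Components 1 and 2 above can be sketched in NumPy. The first function picks a distance cutoff so that the resulting graph has an average of at least five neighbors per cell; using a single global distance cutoff is an assumption, since the text does not say how the neighborhood size was parameterized. The remaining functions compute the rank-based distance and the Silhouette coefficient of Eq. 5, taking the mutual rank as the geometric mean of the two directed ranks and the exponent as mutual rank minus one. All function names are hypothetical, and at least two high-expression cells are assumed. The permutation test is not shown.

```python
import numpy as np

def pairwise_dist(coords):
    """Euclidean distance matrix for an (n_cells, 2) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def cutoff_for_mean_degree(coords, target=5.0):
    """Smallest distance cutoff whose graph has mean degree >= target (assumed rule)."""
    dist = pairwise_dist(coords)
    n = len(coords)
    pair_d = np.sort(dist[np.triu_indices(n, k=1)])
    n_edges = int(np.ceil(target * n / 2.0))   # mean degree = 2 * edges / n
    return pair_d[n_edges - 1]

def rank_distance(coords, p=0.95):
    """r = 1 - p**(mutual rank - 1); mutual rank is the geometric mean of both ranks."""
    dist = pairwise_dist(coords)
    np.fill_diagonal(dist, np.inf)                    # a cell is ranked last against itself
    rank = dist.argsort(axis=1).argsort(axis=1) + 1   # rank[i, j]: rank of j from i's view
    mutual = np.sqrt(rank * rank.T)
    return 1.0 - p ** (mutual - 1.0)

def spatial_silhouette(coords, high_mask, p=0.95):
    """Eq. 5: mean over high-expression cells (L1) of (m - n) / max(m, n)."""
    r = rank_distance(coords, p)
    scores = []
    for i in np.flatnonzero(high_mask):
        others = high_mask.copy()
        others[i] = False
        m = r[i, ~high_mask].mean()   # average distance to low-expression cells (L0)
        n = r[i, others].mean()       # average distance to the other L1 cells
        scores.append((m - n) / max(m, n))
    return float(np.mean(scores))
```

A gene whose high-expression cells sit together in space yields a positive coefficient, whereas evenly dispersed high-expression cells yield a negative one, which is the behavior the gene-selection step relies on.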

Domain-specific gene signatures

For each spatial domain, we identified a subset of genes that were significantly (p<0.05) up-regulated in the domain compared to cells in other regions, using the two-sample, one-sided t-test. A metagene expression was defined as the average expression level for this gene subset. We determined domain gene signatures for glutamatergic cells across the HMRF domains (see Supplementary Table 6) by summarizing the activities of genes that are simultaneously associated with a specific domain and cell type.

Analysis of spatial structure in the single-cell RNAseq data

In order to systematically characterize the spatial structure within the single-cell RNAseq data, we summarized the gene signature associated with each spatial domain as a metagene. There are nine metagenes in total, corresponding to domains I1a, I1b, O1, O2, O3, O4, IS, I2, and I3 respectively (defined in Supplementary Table 6). The overall activity of a metagene in each cell was quantified as the mean z-scored expression of all constituent genes in the signature and further binarized based on the bimodality of the distribution. A t-SNE analysis was performed on this matrix using the Rtsne package with parameters pca_scale=T, perplexity=35. Cell subpopulations with similar metagene expression patterns were identified by K-means clustering analysis (K=9).
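The metagene activity described above can be sketched as follows, assuming a genes-by-cells expression matrix. The binarization rule is not specified in the text beyond its reliance on bimodality; splitting at the largest gap between sorted activity values is a stand-in chosen here for simplicity, and both function names are hypothetical.

```python
import numpy as np

def metagene_activity(expr, gene_idx):
    """Mean z-scored expression of the signature genes in each cell (expr: genes x cells)."""
    sub = expr[gene_idx]
    z = (sub - sub.mean(axis=1, keepdims=True)) / sub.std(axis=1, keepdims=True)
    return z.mean(axis=0)

def binarize_bimodal(activity):
    """Assumed rule: threshold at the midpoint of the largest gap in sorted activity."""
    ordered = np.sort(activity)
    k = int(np.diff(ordered).argmax())
    threshold = 0.5 * (ordered[k] + ordered[k + 1])
    return (activity > threshold).astype(int)
```

Stacking the binarized vectors for the nine metagenes gives the cells-by-metagenes matrix on which the t-SNE and K-means steps operate.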

For each subpopulation discovered from the metagene clustering above, we found differentially expressed (DE) genes for the population (2-sample t-test, unequal variance, P<0.05). With the DE genes, we carried out Gene Ontology enrichment analysis for each of the 9 subpopulations to construct a functional enrichment profile in Fig. 4 (hypergeometric test P<0.05, top 500 DE genes analyzed per group, multiple hypothesis corrected by the q-value procedure (Storey and Tibshirani, 2003)). Here we used genes expressed in glutamatergic cells as the background gene set when doing enrichment analysis.

Tasic et al. also provide layer information for a glutamatergic cell subset based on the layer from which the cells were manually dissected using different Cre-lines. To test whether the extracted subpopulation based on metagenes is enriched for a certain manually dissected layer of cells, we also performed a hypergeometric test, corrected for multiple hypotheses, comparing manual annotations of cells to our HMRF domain-based annotations.
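Both enrichment analyses above rest on the upper tail of the hypergeometric distribution, which can be written directly with the Python standard library. In this sketch, `M` is the size of the background set, `n` the number of background items carrying the annotation, `N` the size of the tested set, and `k` the observed overlap; the function name is hypothetical and the q-value correction is not shown.

```python
from math import comb

def hypergeom_pvalue(k, M, n, N):
    """P(X >= k) when drawing N items from M, of which n carry the annotation."""
    upper = min(n, N)
    total = comb(M, N)
    # Sum the probability mass over all overlaps at least as large as k
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total
```

For the Gene Ontology analysis, `N` would be the 500 top DE genes and `M` the number of genes expressed in glutamatergic cells; for the layer comparison, the same function applies with cells in place of genes.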

Code Availability

Code is deposited at https://bitbucket.org/qzhu/smfish-hmrf.

Data Availability

Expression data, spatial coordinates, SVM prediction results and HMRF segmentation results are deposited at https://bitbucket.org/qzhu/smfish-hmrf.

Ethical Compliance

Animal research was conducted in compliance with all relevant ethical regulations and other institutional requirements.

References

Achim, K., Pettit, J.B., Saraiva, L.R., Gavriouchkina, D., Larsson, T., Arendt, D., and Marioni, J.C. (2015). High-throughput spatial mapping of single-cell RNA-seq data to tissue of origin. Nat. Biotechnol. 33, 503–509.


Andjelic, S., Gallopin, T., Cauli, B., Hill, E.L., Roux, L., Badr, S., Hu, E., Tamas, G., and Lambolez, B. (2008). Glutamatergic Nonpyramidal Neurons From Neocortical Layer VI and Their Comparison With Pyramidal and Spiny Stellate Neurons. J. Neurophysiol. 101, 641–654.

Bolstad, B.M., Irizarry, R.A., Astrand, M., and Speed, T.P. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19, 185–193.

Ben Haim, L., and Rowitch, D.H. (2016). Functional diversity of astrocytes in neural circuit regulation. Nat. Rev. Neurosci. 18, 31–41.

Brélaz, D. (1979). New methods to color the vertices of a graph. Commun. ACM 22, 251–256.

Caicedo, J.C., Cooper, S., Heigwer, F., Warchal, S., Qiu, P., Molnar, C., Vasilevich, A.S., Barry, J.D., Bansal, H.S., Kraus, O., et al. (2017). Data-analysis strategies for image-based cell profiling. Nat. Methods 14, 849–863.

Chen, K.H., Boettiger, A.N., Moffitt, J.R., Wang, S., and Zhuang, X. (2015). Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090.

Cortes, C., and Vapnik, V. (1995). Support-Vector Networks. Mach. Learn. 20, 273–297.

Dalerba, P., Kalisky, T., Sahoo, D., Rajendran, P.S., Rothenberg, M.E., Leyrat, A.A., Sim, S., Okamoto, J., Johnston, D.M., Qian, D., et al. (2011). Single-cell dissection of transcriptional heterogeneity in human colon tumors. Nat. Biotechnol. 29, 1120–1127.

Dempster, A.P., Laird, N.M., and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39, 1–38.

Deng, Q., Ramsköld, D., Reinius, B., and Sandberg, R. (2014). Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science 343, 193–196.

Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., and Lin, C.-J. (2008). LIBLINEAR: A Library for Large Linear Classification. J. Mach. Learn. Res. 9, 1871–1874.

Feng, D., Tierney, L., and Magnotta, V. (2012). MRI Tissue Classification Using High-Resolution Bayesian Hidden Markov Normal Mixture Models. J. Am. Stat. Assoc. 107, 102–119.

Finak, G., McDavid, A., Yajima, M., Deng, J., Gersuk, V., Shalek, A.K., Slichter, C.K., Miller, H.W., McElrath, M.J., Prlic, M., et al. (2015). MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 16, 278.

Halpern, K.B., Shenhav, R., Matcovitch-Natan, O., Tóth, B., Lemze, D., Golan, M., Massasa, E.E., Baydatch, S., Landen, S., Moor, A.E., et al. (2017). Single-cell spatial reconstruction reveals global division of labour in the mammalian liver. Nature 542, 1–5.

Islam, S., Kjällquist, U., Moliner, A., Zajac, P., Fan, J.B., Lönnerberg, P., and Linnarsson, S. (2011). Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res. 21, 1160–1167.

Jaitin, D.A., Kenigsberg, E., Keren-Shaul, H., Elefant, N., Paul, F., Zaretsky, I., Mildner, A., Cohen, N., Jung, S., Tanay, A., et al. (2014). Massively parallel single cell RNA-Seq for marker-free decomposition of tissues into cell types. Science 343, 776–779.

Jaenisch, R., and Bird, A. (2003). Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Nat. Genet. 33 Suppl, 245–254.

Karaiskos, N., Wahle, P., Alles, J., Boltengagen, A., Ayoub, S., Kipar, C., Kocks, C., Rajewsky, N., and Zinzen, R.P. (2017). The Drosophila embryo at single-cell transcriptome resolution. Science 358, 194–199.

Klein, A.M., Mazutis, L., Akartuna, I., Tallapragada, N., Veres, A., Li, V., Peshkin, L., Weitz, D.A., and Kirschner, M.W. (2015). Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201.

Li, S.Z. (2003). Modeling image analysis problems using Markov random fields. In Stochastic Processes: Modelling and Simulation, 473–513.

Lubeck, E., Coskun, A.F., Zhiyentayev, T., Ahmad, M., and Cai, L. (2014). Single-cell in situ RNA profiling by sequential hybridization. Nat. Methods 11, 360–361.

Macosko, E.Z., Basu, A., Satija, R., Nemesh, J., Shekhar, K., Goldman, M., Tirosh, I., Bialas, A.R., Kamitaki, N., Martersteck, E.M., Trombetta, J.J., Weitz, D.A., Sanes, J.R., Shalek, A.K., Regev, A., and McCarroll, S.A. (2015). Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell 161, 1202–1214.

Moffat, A., and Zobel, J. (2008). Rank-biased precision for measurement of retrieval effectiveness. ACM Trans. Inf. Syst. 27, 1–27.

Moffitt, J.R., Hao, J., Bambah-Mukku, D., Lu, T., Dulac, C., and Zhuang, X. (2016). High-performance multiplexed fluorescence in situ hybridization in culture and tissue with matrix imprinting and clearing. Proc. Natl. Acad. Sci. 113, 14456–14461.

Niu, W., Zang, T., Smith, D.K., Vue, T.Y., Zou, Y., Bachoo, R., Johnson, J.E., and Zhang, C.L. (2015). SOX2 reprograms resident astrocytes into neural progenitors in the adult brain. Stem Cell Reports 4, 780–794.

Obayashi, T., and Kinoshita, K. (2011). COXPRESdb: a database to compare gene coexpression in seven model animals. Nucleic Acids Res. 39, D1016–D1022.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., and Vanderplas, J. (2011). Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830.

Platt, J. (1999). Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv. Large Margin Classif. 10, 61–74.


Raj, A., van den Bogaard, P., Rifkin, S.A., van Oudenaarden, A., and Tyagi, S. (2008). Imaging individual mRNA molecules using multiple singly labeled probes. Nat. Methods 5, 877–879.

Regev, A., Teichmann, S.A., Lander, E.S., Amit, I., Benoist, C., Birney, E., Bodenmiller, B., Campbell, P.J., Carninci, P., Clatworthy, M., et al. (2017). Science Forum: The Human Cell Atlas. Elife 6, e27041.

Rousseeuw, P.J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65.

Satija, R., Farrell, J.A., Gennert, D., Schier, A.F., and Regev, A. (2015). Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502.

Shah, S., Lubeck, E., Zhou, W., and Cai, L. (2016a). In Situ Transcription Profiling of Single Cells Reveals Spatial Organization of Cells in the Mouse Hippocampus. Neuron 92, 342–357.

Shah, S., Lubeck, E., Schwarzkopf, M., He, T.-F., Greenbaum, A., Sohn, C.H., Lignell, A., Choi, H.M.T., Gradinaru, V., Pierce, N.A., et al. (2016b). Single-molecule RNA detection at depth by hybridization chain reaction and tissue hydrogel embedding and clearing. Development 143, 2862–2867.

Storey, J.D., and Tibshirani, R. (2003). Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. U.S.A. 100, 9440–9445.

Sunkin, S.M., Ng, L., Lau, C., Dolbeare, T., Gilbert, T.L., Thompson, C.L., Hawrylycz, M., and Dang, C. (2013). Allen Brain Atlas: An integrated spatio-temporal portal for exploring the central nervous system. Nucleic Acids Res. 41, D996–D1008.

Swain, P.S., Elowitz, M.B., and Siggia, E.D. (2002). Intrinsic and extrinsic contributions to stochasticity in gene expression. Proc. Natl. Acad. Sci. 99, 12795–12800.

Tang, F., Barbacioru, C., Wang, Y., Nordman, E., Lee, C., Xu, N., Wang, X., Bodeau, J., Tuch, B.B., Siddiqui, A., et al. (2009). mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods 6, 377–382.

Tasic, B., Menon, V., Nguyen, T.N., Kim, T.K., Jarsky, T., Yao, Z., Levi, B., Gray, L.T., Sorensen, S.A., Dolbeare, T., et al. (2016). Adult mouse cortical cell taxonomy revealed by single cell transcriptomics. Nat. Neurosci. 19, 335–346.

Tibshirani, R., Walther, G., and Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. J. R. Stat. Soc. Ser. B (Statistical Methodol.) 63, 411–423.

Yuan, G.C., Cai, L., Elowitz, M., Enver, T., Fan, G., Guo, G., Irizarry, R., Kharchenko, P., Kim, J., Orkin, S., et al. (2017). Challenges and emerging directions in single-cell analysis. Genome Biol. 18.

Zeisel, A., Munoz-Manchado, A.B., Codeluppi, S., Lonnerberg, P., La Manno, G., Jureus, A., Marques, S., Munguba, H., He, L., Betsholtz, C., et al. (2015). Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science 347, 1138–1142.

Zhang, Y., Brady, M., and Smith, S. (2001). Segmentation of Brain MR Images Through a Hidden Markov Random Field Model and the Expectation-Maximization Algorithm. IEEE Transactions on Medical Imaging 20, 45–57.

Zhang, Y., Sloan, S.A., Clarke, L.E., Caneda, C., Plaza, C.A., Blumenthal, P.D., Vogel, H., Steinberg, G.K., Edwards, M.S.B., Li, G., et al. (2016). Purification and Characterization of Progenitor and Mature Human Astrocytes Reveals Transcriptional and Functional Differences with Mouse. Neuron 89, 37–53.

Figures

Figure 1. Overall goal of the project and cell type prediction in seqFISH data.

a. Cellular heterogeneity is driven by both cell-type (indicated by shape) and environmental factors (indicated by colors). ScRNAseq based studies can only detect cell-type related variation, because spatial information is lost.

b. Our goal is to decompose the contributions of each factor by developing methods to integrate scRNAseq and seqFISH data.

c. Prediction results evaluated by the comparison of cell-type average expression profile across technologies for 8 major cell types. Values represent expression z-scores. Genes are ordered by significance of differential expression in scRNAseq.

d. Correlation between reference and predicted cell type averages ranges from 0.75 to 0.95.

e. Integration of seqFISH and scRNAseq data (illustrated by b) enables cell-type mapping with spatial information in the adult mouse visual cortex. Each cell type is labeled by a different color. Cell shape information is obtained from segmentation of cells from images (see Methods).

Figure 2. Spatial domain dissection in seqFISH data using hidden Markov random field (HMRF) approach.

a. A schematic overview of the HMRF model. A neighborhood graph represents the spatial relationship between imaged cells (indicated by the circles) in the seqFISH data. The edges connect cells that are neighboring to each other. seqFISH-detected multigene expression profiles are used together with the graph topology to identify spatial domains. In contrast, k-means and other clustering methods do not utilize spatial information, therefore the results are expected to be less coherent (illustrated in the dashed box).


b. An intuitive illustration of the basic principles in a HMRF model. For a hypothetical cell (indicated by the question mark), its spatial domain assignment is inferred from combining information from gene expression ($x_i$) and neighborhood configuration ($c_{N_i}$). The color of each node represents the cell's expression and the number inside each node is the domain number. In this hypothetical example, combining such information results in the cell being assigned to domain 1, instead of domain 3 (see Methods).

c. HMRF identifies spatial domain configuration in the mouse visual cortex region. Distinct domains reveal a resemblance to the layer organization of the cortex. Naming of domains: I1a, I1b, I2, I3 are inner domains distributed in the inner layers. O1-O4 are outer domains. IS is the inner scattered state. These domains are associated with cell morphological features such as distinct cell shape differences in outer layer domains. Cell shape information is obtained from segmentation of cells from images (see Methods).

d. General domain signatures that are shared between cells within domains.

Figure 3. HMRF analysis identified domain associated heterogeneity within glutamatergic cells.

a. Three major sources of variations in glutamatergic neurons. (Top): cell type specific signals Gda and Tbr1. (Middle): general domain signatures as in Fig 2d, summarized into metagenes’ expression. (Bottom): glutamatergic specific domain signatures, found by comparing glutamatergic cells across domains and removing signatures that also vary across domains in other cell types.

b. Snapshots of single cells. Each row is a snapshot of cells at the boundary of two layers. Each of the two columns is a type of annotation: (left column) cell type, (right column) HMRF domains. Cell type is incapable of explaining layer-to-layer morphological variations: e.g. glutamatergic cells (orange) are present in all layers yet morphological differences exist within glutamatergic cells. HMRF domains better capture the boundary of two layers in each case, in that the domains can separate distinct morphologies.

Figure 4. Reanalysis of single-cell RNAseq data (from Tasic et al) with domain signatures summarized into metagenes.

a. t-SNE plot shows how glutamatergic cells from Tasic et al cluster according to glutamatergic-specific domain signatures aggregated as metagenes (shown in (b)). Colors indicate k-means clusters (k=9). Each cluster is annotated by its enriched metagene activity.

b. Binarized metagene expression profiles for the glutamatergic cells. Red: population that highly expresses the metagene.


c. Spatial clusters defined according to metagenes are enriched in manual layer dissection annotations. Column: layer information obtained from microdissection. Row: metagene based cell clusters.

d. Inferred spatial clusters of glutamatergic neurons are enriched in distinct GO biological processes.

Figure 5. Spatially dependent astrocyte variation revealed by HMRF. Neighborhood cell type composition for the 47 astrocyte cells (columns). Cells are ordered by HMRF domain annotations. The heatmap shows single cell expression of astrocytes clustered by domain-specific genes. Blue-box highlights the common signatures expressed in each domain's astrocyte population.

Figure 1 image (labels recovered from the extracted text). Panels a to e. Panel titles and axes: "Cell intrinsic dimension", "Spatial dimension", "Visual cortex", "Mapping cell types to seqFISH"; data sources seqFISH and scRNAseq. Cell types: Micro., Astro., Endo., OPC.1, OPC.2, Oligo., Glut. N., GABA. N. Color scales: -3.0 to 3.0 and -0.8 to 0.8. Gene labels: Fbll1, Hdx, Mrgprb1, Slc5a7, Cecr2, Nell1, Pax6, Col5a1, Dcx, Arhgef26, Itpr2, Wrn, Calb1, Abca9, Cpne5, Npy2r, Cldn5, Bmpr1b, Spag6, Slc4a8, Vps13c, Sumf2, Pld1, Ankle1, Blzf1, Cdc5l, Cyp2j5, Rrm2, Csf2rb2, Gm805, Tnfrsf1b, Vmn1r65, Laptm5, Olr1, Mertk, Slco1c1, Mfge8, Gja1, Gda, Omg, Sox2, Rhob, Tbr1.

Figure 2 image (labels recovered from the extracted text). Panel a: "Neighborhood graph", "Spatial gene expression" (g1, g2, g3, ...), "HMRF domains", "K-means, spatially unaware". Panel b: example values for domains 1, 2, 3: P(x_i | c_i, s_i) = 0.30, 0.20, 0.50; P(c_i | s_i, c_{N_i}) = 0.50, 0.27, 0.23; P(c_i | x_i, s_i, c_{N_i}) = 0.47, 0.17, 0.36. Panel c: cortical layers L1, L2/3, L4, L5, L6, EC; HMRF domains O1, I1a, I1b, IS, I2, I3, O2, O3, O4. Panel d: "Domain signatures", columns O1, O2, O3, O4, IS, I2, I3, I1a, I1b; color scale 0.0 to 4.0. Gene labels: Itpr2, Cpne5, Gda, Calb1, Nov, Nkd2, Rhob, Wrn, Ankle1, Lefty2, Lhx3, Hoxb3, Lmod1, B3gat2, Nlrp12, Psmd5, Zfp182, Arhgef26, Laptm5, Slc4a8, Mertk, Vmn1r65, Vps13c, Bmpr1b, Csf2rb2, Dcx, Sema3e, Pax6, Slco1c1, Mfge8, Gja1, Omg, Sumf2, Ano7, Cilp, Rrm2, Aldh3b2, Egln3, Pld1, Foxd1, Gdf5, Capn13, Serpinb11, Tbr1, Hn1l, Adcy4, Nhlh1, Clec5a, Gpc4, Ddb2, Blzf1, Senp1, Tnfrsf1b, Mgam, Nfkb2, Dcstamp, Olr1, Mmgt1, Ppp1r3b, Poln, Abca9, Cdc6, Spag6, Foxa1, Galnt3, Gdf2, Sis, Murc, Nfkbiz, Cdc5l, Creb1, Slc5a7, Col5a1, Mrc1, Nes, Cldn5, Loxl1, Acta2, Lyve1, Nell1, Hnf1a, Cecr2, Ctla4, Pld5, Mmp8, Mrgprb1, Cyp2j5.

Figure 3 image (labels recovered from the extracted text). Panel a: "Glutamatergic Neuron"; row groups "Cell type specific" (Gda, Tbr1), "Domain metagenes" (O2, I1a, O4, I1b, O1, I2, I3, O3, IS), and "Domain and cell-type specific signatures"; color scale 0.0 to 4.0. Gene labels: Amigo2, Acta2, Foxd1, Vmn1r65, Egln3, Neurod4, Lyve1, Loxl1, Sox2, Cdc5l, Obsl1, Fbll1, Neurog1, Cpne5, Ddb2, Nfkb2, Olr1, Blzf1, Nhlh1, Zfp715, Psmd5, Nlrp12, Senp1, Hn1l, Nkd2, Omg, Hdx, Cdh1, Hoxb8, Ankle1, Rrm2, Foxd4, Ctla4, Mmp8, Sis, Mrgprb1, Pld5, Gdf5, Cilp, Galnt3, Gm805, Barhl1, Gm6377, Gdf2, Npy2r, Ano7, Cdc6, Dcstamp. Panel b: columns "Cell type" (Glut. N., GABA. N.) and "Spatial domain" (O1, I1a, I1b, IS, I2, I3, O2, O3, O4); rows labeled by layers L2/3; L6a, L6b, EC; L6a, L6b; L4, L5; L1, L2/3.


[Figure panels a to d. Recoverable labels:
"Metagene expression": one plot per domain metagene (O2, I1a, O4, I1b, O1, I2, I3, O3, IS).
"Metagene-derived cell clusters (9)": clusters 1 to 9, with the corresponding metagene expression given as 1: I3; 2: (none); 3: O2; 4: O1; 5: I1b; 6: O3; 7: O4; 8: I2; 9: IS.
"Metagene clusters" versus "Tasic et al layers" (L1-L2/3, L4, L5, L6a, L6b, L6): heatmap of -log10(P value); color scale ticks 0.0, 0.5, 1.3, 3.0.
Enrichment heatmap over Kmeans.1 to Kmeans.9, -log10(P value), color scale ticks 0.5, 1.3, 3.0, with the terms: Anterograde trans-synaptic signaling; Signal release from synapse; Intracellular transport; Neurotransmitter transport; Cell-cell signaling; Modulation of chemical synaptic trans.; Exocytosis; Synaptic vesicle transport; Translational initiation; Macromolecule catabolic process; Synaptic trans. - dopaminergic; Cytoskeleton dep. intracellular transport; Ensheathment of neurons; Lipid biosynthetic process; Glial cell differentiation; Oligodendrocyte differentiation; Cell proliferation; Inflammatory response; Cell migration; Macrophage activation; Cell activation involved in immune resp.; Mitochondrion disassembly; ERAD pathway; Resp. to endoplasmic reticulum stress; Macroautophagy; Reg. of long term synaptic potentiation.]


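[Figure. Bar plot of "Neighbors (%)" (axis 0% to 100%) by "HMRF domains", and a heatmap (color scale 0.0 to 3.0) with gene groups numbered 1 to 6: group 1 labeled "Astrocyte specific genes", groups 2-6 labeled "Domain specific genes". Gene labels: Gja1, Mfge8, Arhgef26, Mertk, Slco1c1, Itpr2, Acta2, Mrc1, Nes, Col5a1, Sox2, Loxl1, Calb1, Omg, Wrn, Npy2r, Clec5a, Zfp715, Barhl1, Lefty2, Aldh3b2, Kcnip2, Ankle1, Foxa1, Pld1, Spag6, Capn13, Mrgprb1, Hoxb8, Lhx3, Fam69c, Cilp, Gata6, Egln3, Hoxb3, Abca9, Gdf5, Neurog1, Sumf2, Sema3e, Tbr1, Gpc4, Nfkbiz, Nov, Cyp2j5, Rhob, Slc5a7, Lyve1.]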
