high-throughput, image-based screening of genetic … each associated with more than one genetic...

18
1 High-throughput, image-based screening of genetic variant libraries George Emanuel 1,2,3 , Jeffrey R. Moffitt 1,3 , and Xiaowei Zhuang 1,3,4 1 Howard Hughes Medical Institute, 2 Graduate Program in Biophysics, 3 Department of Chemistry and Chemical Biology, 4 Department of Physics, Harvard University, Cambridge, MA 02138, USA Correspondence to [email protected] Abstract Image-based, high-throughput, high-content screening of pooled libraries of genetic perturbations will greatly advance our understanding biological systems and facilitate many biotechnology applications. Here we introduce a high-throughput screening method that allows highly diverse genotypes and the corresponding phenotypes to be imaged in numerous individual cells. To facilitate genotyping by imaging, barcoded genetic variants are introduced into the cells, each cell carrying a single genetic variant connected to a unique, nucleic-acid barcode. To identify the genotype-phenotype correspondence, we perform live-cell imaging to determine the phenotype of each cell, and massively multiplexed FISH imaging to measure the barcode expressed in the same cell. We demonstrated the utility of this approach by screening for brighter and more photostable variants of the fluorescent protein YFAST. We imaged 20 million cells expressing ~60,000 YFAST mutants and identified novel YFAST variants that are substantially brighter and/or more photostable than the wild-type protein. peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not . http://dx.doi.org/10.1101/143966 doi: bioRxiv preprint first posted online May. 30, 2017;

Upload: dangkiet

Post on 24-Apr-2018

215 views

Category:

Documents


1 download

TRANSCRIPT

1

High-throughput,image-basedscreeningofgeneticvariantlibraries

GeorgeEmanuel1,2,3,JeffreyR.Moffitt1,3,andXiaoweiZhuang1,3,4

1HowardHughesMedicalInstitute,2GraduatePrograminBiophysics,3DepartmentofChemistryandChemicalBiology,4DepartmentofPhysics,HarvardUniversity,Cambridge,MA02138,USA

[email protected]

Abstract

Image-based,high-throughput,high-contentscreeningofpooledlibrariesofgeneticperturbationswillgreatlyadvanceourunderstandingbiologicalsystemsandfacilitatemanybiotechnologyapplications.Hereweintroduceahigh-throughputscreeningmethodthatallowshighlydiversegenotypesandthecorrespondingphenotypestobeimagedinnumerousindividualcells.Tofacilitategenotypingbyimaging,barcodedgeneticvariantsareintroducedintothecells,eachcellcarryingasinglegeneticvariantconnectedtoaunique,nucleic-acidbarcode.Toidentifythegenotype-phenotypecorrespondence,weperformlive-cellimagingtodeterminethephenotypeofeachcell,andmassivelymultiplexedFISHimagingtomeasurethebarcodeexpressedinthesamecell.WedemonstratedtheutilityofthisapproachbyscreeningforbrighterandmorephotostablevariantsofthefluorescentproteinYFAST.Weimaged20millioncellsexpressing~60,000YFASTmutantsandidentifiednovelYFASTvariantsthataresubstantiallybrighterand/ormorephotostablethanthewild-typeprotein.

peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/143966doi: bioRxiv preprint first posted online May. 30, 2017;

2

High-throughputscreeningofgeneticperturbationsisplayinganincreasinglyimportantroleinadvancingbiologyandbiotechnology.Forexample,byobservingtheeffectsofalargenumberofaminoacidchangeswithinaselectedprotein,large-scalescreeningcanenableefficientsearchesforfluorescentproteinsbetteradaptedforbioimagingorproteinandnucleicaciddrugswithdesiredtherapeuticproperties.Itcanalsoenabletheexaminationofhowmutationsofaproteinaffectcellfunctionorphysiology.Sinceeachcelliscomposedofmanygenes,large-throughputscreeningcanalsoallowtheeffectsofinhibitionoractivationofindividualgenesorcombinationsofgenestobetestedatthegenomicscale,whichwillhelpdeciphertheeffectsofgenesandgenenetworksoncellularbehaviors.

Large-scalescreeningeffortsaregreatlyfacilitatedbypooled,high-diversitylibrariesofgeneticperturbationsbecauseoftheeaseandscalabilityassociatedwiththeconstructionofthesepools.Byusingmethodssuchaserror-pronePCR1orcloningwithlarge,definedpoolsofarray-synthesizedoligonucleotides2,itisoftenpossibletocreatepooledlibrarieswithaverylargenumberofgeneticvariantsorperturbationsusingasimilardegreeofeffortasrequiredtomakeanyindividuallibrarymember.Thescreeningofpooledlibrariesthendependscriticallyontheabilitytomeasurenotonlythedesiredphenotypebutalsothegenotypeofthecorrespondinglibrarymembers.Themeasuredphenotypescanbesimpleinsomecases,suchasproteinaffinityasmeasuredbystandardpull-downassays3,4,orcellularfluorescenceasdeterminedbyFACS5,andinbothcases,thegenotypecanbedeterminedviasortingorenrichinglibrarymemberswiththedesiredphenotypefollowedbytechniquessuchassequencingtodeterminethegenotype.However,therearemanyphenotypesthatcannotbequantifiedwiththesetechniques.Phenotypesrangingfromcellularmorphologyanddynamicstotheintracellularorganizationofdifferentcellularcomponentsrequirehigh-resolution,time-lapseopticalmicroscopytobemeasured.Unfortunately,ithasprovenchallengingtocombinepooledlibraryscreeningwithhigh-resolutionopticalmicroscopybecauseitisdifficulttoisolateorrecoverindividuallibrarymembersbasedontheirimagedphenotypeandthendeterminetheirgenotype.If,however,onehadtheabilitytomeasurethegenotypeofindividuallibrarymembersalsobyimaging,thenlibrarymemberisolationandrecoverywouldnotberequired,makingitpossibletocombinetheeaseandscalabilityofpooledlibraryscreeningwithimaging-basedphenotypemeasurements.

Herewereportanovelhigh-throughput,imaging-basedscreeningmethodthatallowsthecharacterizationofbothphenotypeandgenotypeforpooledpopulationsofgeneticallydiversecells.Inthismethod,weassociateeachgeneticvariantwithauniquebarcodecomposedofaseriesofshortoligonucleotidehybridizationsites.Afterintroducingthebarcodedgeneticvariantlibraryintoapopulationofcellsandmeasuringphenotypeswithimaging,thecellsarefixedandthebarcodesaredeterminedusingmultiplexederrorrobustfluorescenceinsituhybridization(MERFISH),amethodthatutilizescombinatoriallabelingandsequentialimagingtoidentifyalargenumberofbarcodes6.Wetestedthefeasibilityandquantifiedtheaccuracyofthisscreeningapproachbyscreeningalibraryof1.5millioncellswith80,000uniquebarcodesinwhichcellseitherdidordidnotexpressthefluorescentproteinmMaple37.Basedonthesemeasurements,weestimatedthatourgenotypemisidentificationrateislessthan1%forindividualcells.Todemonstratethepowerofthisapproach,weutilizedittoimprovethebrightnessandphotostabilityofYFAST,arecentlydevelopedfluorescentproteinthat

peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/143966doi: bioRxiv preprint first posted online May. 30, 2017;

3

becomesfluorescentuponbindingtoanexogenouschromophore8.Withthisapproach,weefficientlyscreened20millionE.colicellscontaining~160,000uniquebarcodesand~60,000uniqueYFASTmutants,whichresultedintheidentificationofYFASTvariantwithsubstantiallyincreasedbrightnessandphotostability.Byutilizinghigh-resolutionimaging,weenvisionthatthisapproachalsohasthepotentialtoscreentheeffectofgeneticperturbationsonothercellularpropertiesthatwouldbedifficulttomeasurewithnon-imagingmethods,suchasthemorphologyanddynamicsofthecell,ortheintracellulardistributionsofproteins/RNAsandthespatialorganizationofthegenome.

InourpreviousdemonstrationofMERFISH6,weutilizedthisapproachtomassivelyincreasethemultiplexingofsingle-moleculefluorescenceinsituhybridization9,10andmeasurealargenumberofdistinctRNAspeciessimultaneouslyinsinglecells6.WesubsequentlyimprovedthethroughputofMERFISHtoallowalargenumberofcellstobemeasuredinashortperiodoftime11.HerewereasonedthatwecouldusethisapproachtoidentifythegenotypeofindividualcellsifeachcellcontainsauniqueDNAbarcodethatisassociatedwiththegenevariantofinterest.Thus,cellscouldbeidentifiedbyeffectivelymeasuringtheidentityoftheRNAthatitexpressedfromthebarcode.Toconstructsuchbarcodes,wedesignednucleotidesequencesthatcontainaconcatenationofafixednumberofhybridizationsites,wherethesequenceofeachhybridizationsitewasoneofapairofuniquesequencesassociatedwiththatsite(Fig.1A).Wetermthesesequencesreadoutsequences.EffectivelyeachbarcodesequencecanbethoughtofasrepresentinganN-bitbinarycode,wherethereisauniquereadoutsequencetorepresenta“1”ora”0”ineachbit.Thus,eachbarcodeiscomprisedofasetofNhybridizationsitesdrawnfrom2Nuniquereadoutsequences:readoutsequence1-0,readoutsequence1-1,readoutsequence2-0,readoutsequence2-1,…,readoutsequenceN-0,readoutsequenceN-1.Forexample,abarcodesequencecorrespondingtothebinaryword101…1wouldconsistofreadoutsequence1-1,followedbyreadoutsequence2-0,thenreadoutsequence3-1,…,andfinallyreadoutsequenceN-1.Asimilarapproachcouldbeusedtoconstructalternativebarcodesusingmoreuniquereadoutsequencesateachsite,i.e.threesiteswouldcorrespondtoaternarycode,orusingtheabsenceofasiteasameasurablesignal,i.e.a‘1’couldbethepresenceofasitewhilea‘0’couldberepresentedbyitsabsence.

Asanillustrationofonewayinwhichthisbarcodingschemecouldbeusedtoidentifygeneticvariants,wecreatedalibraryofplasmidsexpressingtheseN-bitbarcodesandalibraryofplasmidsexpressingaseriesofgeneticvariantsofaproteinofinterest.Tocreateabarcodedlibraryofgeneticvariants,wefusedthebarcodesequencesandthesequencesexpressingthegeneticvariantstocreatealibraryofnewplasmids,eachofwhichexpressesarandomcombinationofageneticvariantandabarcode,andintroducedthelibraryintoE.colicellssuchthateachcellonlyexpressesoneplasmid(Fig.1B)(seeMaterialsandMethodsforthedetails).Toreducethechanceofabarcodeappearinginthelibraryassociatedwithmorethanonegeneticvariant,webottleneckedthediversityofthebarcodedgeneticvariantlibrarybylimitingthenumberofcellsexpressingthebarcodedgeneticvariantlibraryto1to10%ofthetotalbarcodediversityof2N.Wethendeterminedwhichbarcodeisassociatedwithwhichgeneticvariantbyextractingtheseplasmidsandsequencingthemwithnextgenerationsequencingtoconstructalook-uptable.Thesemeasurementsalsodetectedaremainingsmallfractionofbarcodesthatwere

peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/143966doi: bioRxiv preprint first posted online May. 30, 2017;

4

eachassociatedwithmorethanonegeneticvariant.Ifdetected,thesebarcodeswereremovedfromfurtheranalysis.

Toscreenthisbarcodedgeneticvariantlibrary,weimagedcellularphenotypeswhilethecellswerestillalive(Fig.1C).Then,thecellswerefixedwithoutremovingthemfromthemicroscope,andtheRNAexpressedfromthebarcodeswerereadoutusingmultiplexederror-robustfluorescenceinsituhybridization(MERFISH)6.

Duringthebarcodereadoutprocess,multiplehybridizationroundswereusedandfluorescentlylabeledreadoutprobescomplementarytoeachreadoutsequenceonthebarcodewerehybridizedineachroundtodetectwhichreadoutsequencesarepresentinwhichcells.First,readoutprobe1-0,complementarytoreadoutsequence1-0,wasintroducedsothatitcanhybridizetocellsthatcontainreadoutsequence1-0,namelythecellscontainingbarcodeswhosefirstbitis“1”,causingthosecellstobecomebrightlyfluorescent.Allthecellswereimagedandthenthefluorescencesignalremovedfromthesample.Then,readoutprobe1-1washybridized.Sinceeverybarcodecontainseitherreadoutsequence1-0orreadoutsequence1-1,thecellsthatdidnotbecomefluorescentinthefirstroundshouldbecamefluorescentinthesecondround.Thevalueforbit1foreachcellwasthenassignedbasedonthefluorescenceintensityratiobetweenthesetworounds.ThisprocesswasiterateduntilallNbitswereprobed.Toreducethenumberofhybridizationrounds,weusedmultiplecolorimagingwithspectrallydistinctfluorescentdyestoprobeforthepresenceofmultiplereadoutsequencessimultaneouslyineachround11.Three-colorimagingwasusedinthisworkthoughareadoutschemeusingmorecolorsisalsopossible.

SinceeachcellexpressesmanycopiesofitscorrespondingbarcodeRNA,thefluorescencesignalwasverybright,andhencethereadouterrorrateforeachbitwasverysmall.Thus,wedidnotfinditnecessarytouseerrorcorrectingcodesinthiscase,aswepreviouslyusedforMERFISH6,butinsteadall2Npossiblebarcodescouldbeusedinprinciple.However,inpractice,toavoidabarcodeappearingpairedwithmultiplemutantsinthesamelibrary,webottleneckedthenumberofuniquelibrarymembers,asdescribedabove.Theuseofonlyasubsetofbarcodesallowedtheexperimentalmeasurementofthefrequencywithwhichunusedbarcodesweredetected,whichinturnallowedustoquantifytherateofmisidentifyingthegenotypeofacell,aswedescribebelow.Withaconstantdegreeofbottlenecking,thistypeofinternalerrormeasurementanderrorrobustnessismaintainedevenasthenumberofpossiblebinarybarcodesincreasesexponentiallywiththenumberofbits.Thus,thisbarcodingschemeshouldallowmillionsofuniquebarcodestobemeasuredwithonlytensofhybridizationrounds.

Totesttheaccuracyofthisscreeningapproach,wecreatedalibrarycontainingonlytwo“geneticvariants”,themTagBFP2geneandthefusionofmTagBFP2andmMaple3genes(Fig.2A).Twobarcodedlibrarieswerecreatedbymerginga21-bitbinarybarcodelibrary,consistingofmorethan2million(221)uniquebarcodes,withthetwoplasmids(onecontainingthemTagBFP2andtheothercontainingmTagBFP2-mMaple3gene),oneforeachlibrary,byisothermalassembly.Then,eachofthesecompletelibrarieswaselectroporatedintoE.colicells,andafixednumberofcellswereextractedtobottlenecktheselibrariesto~40,000uniquemembers.Theplasmidswereextractedfromthecellsandsequenced

peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/143966doi: bioRxiv preprint first posted online May. 30, 2017;

5

todeterminewhichofthe~2millionpossiblebarcodeswerepresentineachlibraryand,thus,associatedwiththepresenceorabsenceofmMaple3.Sequencingconfirmedthatinthemixtureofthetwolibraries,~80,000uniquebarcodeswerepresent,asexpected,representing4%ofall221possiblebarcodes.

Wethencharacterizedthiscombinedlibraryusingourscreeningstrategy.WefirstmeasuredthefluorescencepropertiesofthecellsexpressingmTagBFP2ormTagBFP2-mMaple3byilluminatingwith405-nmlighttomeasuremTagBFP2fluorescence,illuminatingwith405-nmlightforanadditional~4sinordertoswitchthemMaple3proteintoitsred-shiftedfluorescentstate,andthenmeasuringthefluorescenceintensityofthered-shiftedmMaple3byilluminatingwith560-nmlight.Wethenfixedthecellswithmethanolandacetoneandreadoutthebarcodesineachcellusingtheproceduredescribedabove.WeutilizedalcoholfixationasopposedtocrosslinkingfixativessuchaparaformaldehydesinceithasbeenestablishedthatalcoholfixationgreatlyenhancestherateofhybridizationtoRNA12.

Duringthebarcodereadingstep,weexaminedthefluorescencesignalobservedforindividualcellsduringdifferentroundsofhybridizationandimaging.Indeed,asexpected,wefoundthatcellsthatwerebrightforonereadoutofagivenbitweredimfortheotherreadout(Fig.2B).Forall1.5millioncellsobserved,atwo-dimensional(2D)histogramofthebit1measurements,i.e.thefluorescenceintensitiesdeterminedintheprobe1-0imagingroundandprobe1-1imaginground,wasconstructed,andthishistogramsuggeststherearetwodistinctpopulationsofcells(Fig.2C).Thefirstpopulationappearedbrightwhenhybridizedtoprobe1-0anddimwhenhybridizedtoprobe1-1whilethesecondpopulationappeareddimwhenhybridizedtoprobe1-0andbrightwhenhybridizedtoprobe1-1.Thisobservationisconsistentwiththereadoutsequence1-0beingpresentinthefirstpopulationandthereadoutsequence1-1beingpresentinthesecondpopulation.However,asubstantialfractionofcellsappeareddarkinbothimagingrounds,possiblybecausetheyarenotexpressingsufficientbarcodeRNA,ortheyareinsufficientlypermeabilizedforreadoutprobehybridization.Wethereforeusedathresholdingstrategytoremovethesedimcellsfromfurtheranalysis.Specifically,werequirethatthe“0”or“1”readoutsignalforeachbitislargerthanthemedianintensityobservedforthatreadoutsignalacrossallcells.Morethan600,000measuredcellssatisfiedthisconservativecriterion,andthebarcodesexpressedinthesecellsweredetermined.Amongthesecells,84%ofthemeasuredbarcodesmatchedabarcodecontainedinthelibraryasdeterminedbysequencing(Fig.2D).

Fortheunmatched16%ofthecells,anexperimentalerrormusthaveoccurred.Eitherthebarcodewaspresentinthelibrarybutitwasnotdetectedbysequencingorthebarcodewasnotpresentinthelibraryandanerroroccurredduringbarcodeimagingthatmisidentifiedthebarcode.Whilewedidnotusethesecellsinfurtheranalysis,thepresenceofthisunmatchedfractioncanbeusedtoestimateourexperimentalerrorratesinbarcodeidentification.Wefirstnotethatthelibraryonlycontains4%ofallpossiblebarcodesforthe21-bitbinaryencodingused,asdescribedabove,soassumingthatareadouterrorintheimagingprocessisequallylikelytoresultinacellbeingassignedanyofthe221barcodes,thereisa96%chancethattheerrorresultsinidentifyingabarcodethatdoesnotmatchoneinthelibrary(type1error).Next,wedenotetheprobabilitythatthebarcodeinacellisincorrectlydeterminedasx.Thenthisprobabilityxmultipliedby96%shouldbeequalto16%,thefractionofcellsthatwefoundcontainingbarcodesthatdidnotmatchanyoneinthelibrary.Hence,xshouldbeequalto0.167andthe

peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/143966doi: bioRxiv preprint first posted online May. 30, 2017;

6

probabilitythattheerrorresultsinadifferentbarcodethatispresentinthelibraryshouldbeonly4%ofx,whichis~0.67%(type2error).Wenotethatonlytype2errorswouldaffectourfinalresultsbecauseonlytheseerrorshavethecapacitytogenerateamisidentifiedgenotype.Type1errorswouldbedetectedasunmatchedbarcodesanddiscarded.Therefore,ourestimatedmisidentificationrate,i.e.theprobabilitythatbothanerroroccursandtheerroryieldsabarcodethisisalreadypresentinthelibrary,islessthanonepercent.

ThefidelityofthebarcodemeasurementcanalsobeverifiedbycomparingthemeasuredphenotypeofcellsfromthemMaple3fluorescenceleveltothephenotypepredictedbythemeasuredbarcode.WeusedthephenotypemeasurementsdescribedabovetodeterminethemTagBFP2fluorescenceintensityandthemMaple3fluorescenceintensityofeachcell(Fig.2F).AllcellsthatcontainedthesamemeasuredRNAbarcodeweregroupedandthemedianratioofmMaple3intensitytomTagBFP2intensitywascalculatedforeachgroupcontainingatleast5cells.SincefromsequencingweknowwhichbarcodesshouldbeassociatewithmMaple3,wecalculatedthemedianintensityratioforcellsidentifiedforeachofthesebarcodesandconstructedahistogramoftheseratios.Usingthesameapproach,weconstructedahistogramoftheratiosforthebarcodesknowntoonlybeassociatedwithmTagBFP2(Fig.2G).Thesetwodistributionsarelargelyseparatedwithonlyasmalloverlap.WesetathresholdbasedontheintersectionpointofthetwohistogramssuchthatthecellswithfluorescenceintensityratioslargerthanthisthresholdwereclassifiedascontainingthemTagBFP2-mMaple3fusionproteinandthecellswiththeintensityratiobelowthethresholdascontainingmTagBFP2.Wethencomparedthesephenotypeassignmentsbasedonfluorescenceintensitytothegenotypesbasedonthemeasuredbarcodes.Wefoundthatlessthan1%ofcellshaveaphenotypeandgenotypedisagreement,indicatingarateofmisidentificationcomparabletothatestimatedabove.However,wenotethatthiserrorrateislikelyanoverestimatesincethenaturalspreadintheintensitydistributionofcellsineachgroupshouldallowfromsomeoverlapofthesedistributions.Basedontheabovequantifications,weconcludethatourhigh-throughputimaging-basedscreeningapproachiscapableofaccuratelyassociatingthephenotypewiththegenotypeinalargenumberofcells.

Todemonstratetheutilityofourapproachforscreeningalargelibraryofmutantstofindproteinswithdesiredproperties,wescreenedforimprovedvariantsofarecentlydevelopedfluorescentprotein,YFAST.YFASTisnotitselffluorescentbutonlybecomesfluorescentuponbindingtoanexogenous,GFP-likechromophore,suchasHBRorHMBR(Fig.3A)8.WecreatedalibraryofYFASTvariants,mergeditwithour21-bitbarcodelibrary,andsequencedtheresultingplasmidlibrarytobuildthelookuptablebetweenvariantandbarcode.

WesoughttosimultaneouslyimprovetwopropertiesofYFAST:thefluorophorebrightnessandthephotobleachingkinetics.Wenotethatwhilefluorophorebrightnessisapropertythatcanbemeasuredandselectedviamoretraditionalscreeningmethods,e.g.FACS,photobleachingkineticsrequireatime-lapsemeasurementofthefluorescencefromasinglecelland,thus,wouldbechallengingtoperformwithotherapproaches.Inparticular,weobservedthatthephotobleachingdecayofYFASTfollowsadoubleexponentialdecaywithonecomponentdecayingmuchfasterthantheother.Wesoughttoidentifymutantsthateliminatethisfastdecayingcomponent.

peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/143966doi: bioRxiv preprint first posted online May. 30, 2017;

7

TomeasurethebrightnessofdifferentYFASTvariantswhilecontrollingforpotentialvariationsintheexpressionlevel,wefusedYFASTvariantstomTagBFP2andimagedindividualcellswithboth405-nmand488-nmillumination,respectively(Fig.3A).TherelativebrightnessofYFASTwascalculatedastheratioofthebackgroundsubtractedYFASTfluorescenceintensitymeasuredwith488-nmilluminationinthepresenceofthechromophoretothemTagBFP2intensitymeasuredwith405-nmilluminationintheabsenceofthechromophore.TocharacterizethephotobleachingkineticsofYFASTvariants,wemeasuredthedecreaseinintensityover20framesunderconstant488-nmilluminationalone(Fig.3B).

WeconstructedaseriesofYFASTvariantlibrariesthatcontain(1)allpossiblesingleaminoacidsubstitutions,insertions,anddeletions(termedsingle-amino-acidlibraries),(2)multiplemutationssurroundingthechromophorebindingpocketineachlibrarymember(chromophoreadjacentlibrary),(3)acombinationofthebestmutationsidentifiedfromthechromophoreadjacentlibraryandthebestmutationsidentifiedfromthesingleaminoacidlibrariesthatarealsonearthechromophore,or(4)allpossiblesingleaminoacidsubstitutions,insertions,anddeletionsbasedonafavorablemutantderivedfromtheabovelibraries.Intotal,weconstructedlibrariescontainingroughly60,000uniqueYFASTvariantsassociatedwith~160,000barcodes,andweusedourhigh-throughput,image-basedscreeningmethodtomeasurethebrightness,photobleachingkinetics,andgenotypeforthislibraryofvariantsacross20milliontotalcells.Fromeachcell’sphotobeachingdecaycurve,wedeterminetherelativeamplitudeofthefastdecaycomponent(fast-photobleachingamplitude).Wethengroupedcellsbasedonthegenotypesmeasuredandcomputedthemedianbrightnessandfast-photobleachingamplitudeforeachofthesecellgroups(Fig.3C).

Weobserveawiderangeofrelativebrightnessvaluesandfast-photobleachingamplitudes.Toconfirmthatthesevariationsrepresenttruephenotypicvariability,wefirstselectedtwovariants,thewildtypeYFAST(greendotinFig.3C)andavariantwithamuchsmaller(nearlyeliminated)fast-photobeachingamplitude(bluedotinFig.3C),andplottedthemeasuredphotobeachingdecaycurvesforthehundredsofmeasuredindividualcellsthatcontainedthetwobarcodesassociatedwiththesegenotypes.Thephotobleachingdecaycurvesmeasuredforthesetwosetsofcellsclearlyseparateintotwopopulationsthatcorrelatestronglywiththeirgenotypes,indicatingahighdegreeofreproducibilityinthemeasurementsofphenotypeswithinindividualcells.Next,weindividuallyclonedthesetwoYFASTvariantsandmeasuredtheirpropertiesinpureculture.Weobservenearlyidenticalimprovementinphotobleachingkineticsinthesemeasurementsaswedidwhenthesephenotypesweremeasuredinthecontextofthevariantlibrary.Byscreening~60,000variantsofYFAST,weidentifiedmutantsthataresubstantiallybrighter,ormutantsthathaveeliminatedthefast-photobeachingcomponent,ormutantswithimprovementsinbothaspects(Fig.3F,G).Moreover,becausewehaveanexactgenotypemeasuredforeachphenotype,wehaveproducedarichdatasetofYFASTmutationsandtheirphenotypicconsequencesthatcouldbeexaminedtoextractinformationonboththebiophysicalpropertiesofthisproteinaswellasinformationthatcouldguidefuturemutationalscreens.

Insummary,wedevelopedamethodforimage-basedscreeningoflargegeneticvariantlibrariesbyco-expressingthegeneticvariantsandbarcodethatcanidentifythesegeneticvariantsincells,anddeterminingboththephenotypesofthegeneticvariantsandthebarcodesinthesamecellsusingimaging.ByreadingoutbarcodesusingmassivelymultiplexedFISH,wedemonstratedtheabilityto

peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/143966doi: bioRxiv preprint first posted online May. 30, 2017;

8

screenhundredsofthousandsofbarcodesthatcorrespondtotensofthousandsofuniquegeneticvariations.Usingthisapproach,weidentifiedmutationsintheYFASTprotein,arecentlydiscoveredligand-dependentfluorescentprotein,withsubstantiallyimprovedbrightnessandphotostability.Weexpectthatthisnovelhigh-throughput,image-basedscreeningmethodcanbeappliedbroadlytoimprovingpropertiesoridentifyingnewpropertiesofproteinsandnucleicacids,aswellastodecipheringtheeffectsofgenesandgenenetworksoncellular/organismbehaviorsatthegenomicscale.

peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/143966doi: bioRxiv preprint first posted online May. 30, 2017;

9

MaterialsandMethods

Barcodelibraryassembly

Thebarcodelibraryconsistsofasetofplasmids,eachcontainingaDNAbarcodesequencethatencodesaRNAdesignedtorepresentasingle22-bitbinarywordthatistranscribedbythelpppromoter.Everybarcodeinthelibraryhasexactly22readoutsequences,onecorrespondingtoeachbit,designedtobereadoutbyhybridizingfluorescentprobeswiththecomplementarysequence.Although22bitsarepresentinthebarcode,toreducethenumberofhybridizationrounds,experimentswereconductedreadingouteither18or21ofthepossiblebits.Foreachbitposition,weassignedone20-mersequencetoencodeavalueof0andanother20-mersequencetoencodeavalueof1.Toaidquickhybridization,theseencodingsequenceswereconstructedfromathree-letternucleotidealphabet,onewithonlyA,T,andC,inordertodestabilizeanypotentialsecondarystructures13.TheutilizedsequencesweredrawnfromthosepreviouslyusedforMERFISHwithadditionalsequencesdesignedusingapproachesdescribedpreviously11.Foreachbarcode,thebitsareconcatenatedwithasingleGseparatingeach.

Weassembledthisbarcodelibrarybyligatingamixtureofshort,overlappingoligonucleotides.Foreachpairofadjacentbits,therearefouruniquecombinationsofbitvalues.Eachcorrespondingsequencewassynthesizedasasingle-strandedoligo.Theseoligoswerethenligatedtofromcomplete,double-strandedbarcodesthatcontainconcatenatedsequencesofallbitswithallpossiblebitvalues.

Fortheligationstep,alloligosweremixedanddilutedsothateacholigowaspresentataconcentrationof100nM.ThemixturewasphosphorylatedbyincubationwithT4polynucleotidekinase(16µLoligomixture,2µLT4ligasebuffer,2µLPNK(NEB,M0201S))at37°Cfor30minutesandligatedbyadding1µLT4ligase(NEB,M0202S)andincubatingfor1houratroomtemperature.

Toprepareaplasmidlibrarycontainingthesebarcodesequencesalongwiththedesiredpromoter,wedilutedtheligationproduct10-foldandamplifiedbylimited-cyclePCRonaBio-RadCFX96usingPhusionpolymerase(NEB,M0531S0)andEvaGreen(Biotium,3100).ThePCRproductwasruninanagarosegelandthebandoftheexpectedlengthwasextractedandpurified(ZymoZymocleanGelDNARecoveryKit,D4002).Thepurifiedproductwasinsertedbyisothermalassembly14for1hourat50°C(NEBNEBuilderHiFiDNAAssemblyMasterMix,E2621L)intoaplasmidbackbonefragmentcontainingthecolE1origin,theampicillinresistancegene,andotherelementstakenfromthepZseriesofplasmids15.Theassembledplasmidswerepurified(ZymoDNACleanandConcentration,D4003),elutedinto6µLwater,mixedwith10µLofelectro-competentE.colionice(NEB,C2986K),andelectroporatedusinganAmaxaNucleofectorII.Immediatelyafterelectroporation,1mLSOCwasaddedandtheculturewasincubatedat37°Conashakerforonehour.Subsequently,theSOCculturewasdilutedinto50mLofLB(Teknova,L8000)supplementedwith0.1mg/mLcarbenicillin(ThermoFisher,10177-012)andplacedontheshakerat37°Covernight.Thefollowingday,theculturewasminiprepped(ZymoZyppyPlasmidMiniprepKit,D4019),yieldingthecompletebarcodelibrary.

Assemblingproteinmutantlibraries

peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/143966doi: bioRxiv preprint first posted online May. 30, 2017;

10

Tocreatealibraryofmutantproteins,shortnucleotidesequencescontainingregionsoftheproteinofinterestwiththedesiredmutationsweresynthesizedascomplexoligonucleotidepools.Tothencreatethedesiredmutantgenesfromthesepools,weamplifiedthepoolanditscorrespondingexpressionplasmidvialimitedcyclePCRandassembledthesefragmentsusingisothermalassembly14.TheexpressionbackbonewasderivedfromthecolE1originandthechloramphenicolresistancegenefromthepZseriesofplasmids15.Oligopoolsynthesisispronetodeletions,whichcouldleadtoframeshiftmutationsthatproducenon-viableproteins.Toremovethesevariantspriortomeasurement,theproteinvariantsweretranslationallyfusedupstreamtothechloramphenicolresistanceprotein.TheseconstructswereelectroporatedintoE.coli,asdescribedabove,andtheseculturesgrowninthepresenceofchloramphenicoltoselectonlyforproteinvariantsthatdidnothaveframe-shiftmutationsandwhichcould,thus,translatecomponentchloramphenicolresistance.Theseplasmidswerere-isolatedviaplasmidminiprepandthegeneticvariantsextractedviaPCRpriortocombinationwiththebarcodelibrary.

Mergingmutationlibrarieswiththebarcodelibrary

Tomergeamutantlibrarywiththebarcodelibrary,thecorrespondinghalvesofeachplasmidlibrarywereamplifiedbylimited-cyclePCR.Ofnote,theforwardprimerforamplifyingthebarcodelibrarycontained20randomnucleotidessothateachassembledplasmidcontaineda20-meruniquemolecularidentifier(UMI).Also,theproteinmutanthalfcontainedtheplasmid’sreplicationorigin(colE1)whilethebarcodehalfcontainedtheampicillinresistancegeneensuringthatonlyplasmidscontainingbothhalveswerecompetent.ThetwohalveswereassembledbyisothermalassembleandtransfectedintoelectrocompetentE.coliasdescribedearlier.AfterincubatinginSOCfor1hourat37°C,theculturewasagaindilutedinto50mLandgrownuntilitreachedanOD600of~1.Tolimitthepossibilitythatasinglebacteriumhadtakenupmorethanoneplasmid,plasmidswereextractedagainfromthisculture,andre-electroporatedintofreshE.coliatadefined,lowconcentration.Thisculturewasthengrownandre-dilutedtothedesirednumberofcellsassuminganOD600of1correspondedto800millioncells.Thedilutedculturewasincubatedat37°Covernightandthefollowingdayitwasarchivedforfutureimagingexperimentsbydiluting1:1in50%glycerol(Teknova,G1796),separatinginto100µLaliquots,andstoredat-80°C.Theremainingculturewasmini-preppedtouseasaPCRtemplateforconstructingthebarcodetogenotypelookuptable.

Constructingbarcodetogenotypelookuptable

Sincebarcodesandmutantsareassembledrandomly,nextgenerationsequencingwasusedtoconstructalook-uptablefrombarcodetomutant.Thetotallengthoftheproteinmutantandthebarcodeexceededthereadlengthofthesequencingplatformused(IlluminaMiSeq).Tocircumventthischallenge,multiplefragmentswereextractedfromeachlibrary,sequencedindependentlyandgroupedcomputationallyusingtheUMI.

Themini-preppedlibrarieswerepreparedforsequencingbytwosequentiallimited-cyclePCRs.ThefirstPCRextractedthedesiredregionwhileaddingthesequencingprimingregions,andthesecondPCR

peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/143966doi: bioRxiv preprint first posted online May. 30, 2017;

11

addedmultiplexingindicesandtheIlluminaadaptersequences.BetweenPCRs,theproductwaspurifiedinanagarosegelandthefinalproductwasgelpurifiedpriortosequencing.

Foreachsequencingread,thecorrespondingbarcodeormutantsequencewasextracted.ThereadswerethengroupedbycommonUMIandthemostfrequentlyoccurringbarcodeandproteinmutantseenforeachUMIwasassignedtothatUMI,constructingthebarcodetomutantlookuptableforeveryvariantinthelibrary.

Phenotypeandbarcodeimaging

Eachlibrarywaspreparedforimagingbythawingthe100µLaliquotfrom-80°Ctoroomtemperatureanddilutinginto2mLLBsupplementedwith0.1mg/mLcarbenicillin.Imagingcoverslips(Bioptechs,0420-0323-2)in60-mm-diametercellculturedisheswerepreparedbycoveringin1%polyethylenimine(Sigma-Aldrich,P3143-500ML)inwaterfor30minutesfollowedbyasinglewashwithphosphatebufferedsaline(PBS).TheE.coliculturewasdiluted10-foldintoPBS,pouredintotheculturedish,andspunat100gfor5minutestoadherecellstothesurface.

ThesamplecoverslipwasassembledintoaBioptech’sFCS2flowchamber.Aperistalticpump(Gilson,MINIPULS3)pulledliquidthroughthechamberwhilethreecomputer-controlledvalves(Hamilton,MVPandHVXM8-5)wereusedtoselecttheinputfluid.ThesamplewasimagedonacustommicroscopebuiltaroundaNikonTi-UmicroscopebodywithaNikonCFIPlanApoLambda60xoilimmersionobjectivewith1.4NA.Illuminationwasprovidedat405,488,560,647,and750nmusingsolid-statesingle-modelasers(Coherent,Obis405nmLX200mW;Coherent,GenesisMX488-1000;MPBCommunications,2RU-VFL-P-2000-560-B1R,MPBCommunication,2RU-VFL-P-1500-647-B1R;andMPBCommunications,2RU-VFL-P-500-750-B1R)inadditiontotheoverheadhalogenlampforbrightfieldillumination.TheGaussianprofilefromthelaserswastransformedintoatop-hatprofileusingarefractivebeamshaper(Newport,GBS-AR14).Theintensityofthe488-,560-,and647-nmlaserswascontrolledbyanacousto-optictunable-filter(AOTF),the405-nmlaserwasmodulatedbyadirectdigitalsignal,andthe750-nmlaserandoverheadlampwereswitchedbymechanicalshutters.Theexcitationilluminationwasseparatedfromtheemissionusingacustomdichroic(Chroma,zy405/488/561/647/752RP-UF1)andemissionfilter(Chroma,ZET405/488/461/647-656/752m).TheemissionwasimagedontoanAndoriXon+888EMCCDcamera.Duringacquisition,thesamplewastranslatedusingamotorizedXYstage(Ludl,BioPrecision2)andkeptinfocususingahome-builtautofocussystem.

Phenotypemeasurementswereconductedimmediatelyaftercellsweredepositedontothecoverslipandinsertedintotheflowchamber.FortheYFASTmutants,imageswerefirstacquiredinPBSintheabsenceofthechromophorewith405-nmilluminationtomeasurethemTagBFP2fluorescencefollowedbyanimagewithbrightfieldilluminationforalignmentbetweenmultipleimagingrounds.Then10µMHMBRorHBRinPBSwasflowedoverthecellsandafluorescenceimagewasacquiredwith488-nmilluminationtomeasureYFASTandabrightfieldimageforalignment,followedbytwentyoreightyimagesat8.4Hzwithconstant488-nmilluminationtomeasurethedecreaseinintensityuponphotobleaching.Ineachimaginground,imageswereacquiredat1,427or3,223locationsinthesample.

peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/143966doi: bioRxiv preprint first posted online May. 30, 2017;

12

Followingthephenotypemeasurement,thecellswerefixedbyincubationinamixtureofmethanolandacetoneata4:1ratiofor30minutes.Topreventsaltsfromprecipitatingandcloggingtheflowsystem,waterwasflowedbeforeandafterthefixationmixture.Oncefixed,thecellswerewashedin2xSalineSodiumChloride(SSC)andhybridizationbegan.

TodeterminetheRNAbarcodeexpressedwithineachcell,MERFISHwasperformedusingsimilarprotocolstothosedescribedpreviously11.Foreachhybridizationround,thesamplewasincubatedfor30minutesinhybridizationbuffer[2xSSC;5%w/vdextransulfate(EMDMillipore,3730-100ML),5%w/vethylenecarbonate(Sigma-Aldrich,E26258-500G),0.05%w/vyeasttRNA,and0.1%v/vMurineRNaseinhibitor(NEB,M0314L)]withamixtureofreadoutprobeslabeledwitheitherATTO565,Cy5,orAlexa750(Bio-SynthesisInc)eachataconcentrationof10nM.Thedyeswerelinkedtothereadoutprobesthroughadisulfidebond.Then,thehybridizationbufferwasreplacedbyanoxygen-scavengingbufferforimaging[2xSSC;50mMTrisHClpH8,10%w/vglucose(Sigma-Aldrich,G8270),2mMTrolox(Sigma-Aldrich,238813),0.5mg/mLglucoseoxidase(Sigma-Aldrich,G2133),and40µg/mLcatalase(Sigma-Aldrich,C100-500mg)].Eachpositionintheflowcellwasimagedwith750-,647-,and560-nmilluminationfromlongesttoshortestwavelengthfollowedbybrightfieldilluminationbeforecontinuingtothenextlocation.Followingtheimagingofallregions,thedisulfidebondlinkingthedyestothereadoutprobeswerecleavedbyincubatingthesamplein50mMtris(2-carboxyethyl)phosphine(TCEP;Sigma-Aldrich,646547-10X1ML)in2xSSCfor15minutes.Thesamplewasthenrinsedin2xSSCandthenexthybridizationroundstarted.Foreachroundofhybridization,threereadoutswithspectrallydiscernabledyeswerehybridizedsimultaneously.Altogether,with14hybridizationrounds,all42readoutscorrespondingto21bitsweremeasuredin40hours.Forsmallerlibraries,theimagingareaandthenumberofhybridizationroundsweredecreasedtoreducethemeasurementtimeto22hours.

Imageanalysis

Tocorrectforresidualilluminationvariationsacrossthecamera,aflatfieldcorrectionwasperformed.Everyimagewasdividedbythemeanintensityimageforallimageswiththegivenilluminationcolor.Then,theimagesfordifferentroundscorrespondingtothesameregionwereregisteredusingtheimageacquiredunderbrightfieldilluminationbyup-sampledcross-correlationcreatinganormalizedimagestackofallimagesateachpositionintheflowchamber.Iftheradialpowerspectraldensityofanygivenbrightfieldimagedidnotcontainsufficienthighfrequencypower,theimagewasdesignatedasout-of-focusandallimagesforthecorrespondingregionwereexcludedforfurtheranalysis.

Toextractcellintensities,theedgesofeachcellweredetectedusingtheCannyedgedetectionalgorithmonthefirstimageacquiredwith405-nmillumination.Theedgesthatformedclosedboundarieswerefilledinandclosedregionsofpixelswereextracted.Ifagivenclosedpixelregionhadafilledareaofmorethan20pixelsandtheratioofthefilledareatotheareaoftheconvexhullwasgreaterthan0.9,itwasclassifiedasacell.Toincreasethecelldetectionefficiency,thedetectedcellswerethenremovedfromthebinaryimage,theimagewasdilated,filled,anderodedandcellswereextractedagain.Thisallowedcellswheregapsexistinthedetectededgestostillbedetected.Foreachcell,themeanintensitywasextractedforthecorrespondingpixelsineveryimage.

peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/143966doi: bioRxiv preprint first posted online May. 30, 2017;

13

Fromthecellintensities,thephenotypesandbarcodeswerecalculated.Foreachmeasuredreadoutsequence,themeasuredintensitywasnormalizedbysubtractingtheminimumandtakingthemedianofthesignalobservedforthatreadoutsequenceinallcells.Todeterminewhetherabarcodecontaineda“1”ora“0”ateachbit,themeasuredintensityofthe“1”readoutsequenceandthe“0”readoutsequenceforthatbitwerecompared.Specifically,athresholdwasselectedontheratioofthesetwovalues.Ifthisratiowasabovethethreshold,thebitwascalledasa“1”.Ifitfellbelow,thebitwascalledasa“0”.Becausethe“1”and“0”readoutsequencesassociatedwitheachbitmightbeassignedtodifferentfluorophores,itwasnecessarytooptimizethisthresholdforeachbitindividually.Thisoptimizationwasperformedbyrandomlyselecting150barcodes(atrainingset)fromthesetofknownbarcodescontainedinthelibrary,asidentifiedbysequencing.Aninitialsetofthresholdswasselectedandthefractionofcellsmatchingthesebarcodeswasdetermined.Thethresholdforeachbitwasthenvariedindependentlytoidentifythethresholdsetthatmaximizesthisfraction.Thisoptimizedthresholdsetwasthenusedfordeterminingthebitvaluesforallcells.

Oncethebarcodewasdeterminedforeachcells,cellsweregroupedbybarcodeandthemedianofthevariousphenotypevalueswascomputedtodeterminethemeasuredphenotypeforthegenotypecorrespondingtothatbarcode.ForYFASTmeasurements,thenormalizedintensitywasdeterminedbytheratiooftheYFASTfluorescenceintensitiesunder488-nmilluminationinthepresenceoftheYFASTchromophoretothemTagBFP2fluorescenceintensitiesunder405-nmilluminationintheabsenceofchromophore.Toaccountforthenon-negligiblefluorescencebackgroundpresentinE.coliupon488-nmillumination,themagnitudeofthebackgroundwassubtractedbeforecalculatingthefluorescenceratio.Thebackgroundwasestimatedbycalculatingthemedianintensityofallcellsupon488-nmilluminationpredictedtocontainanon-fluorescentYFASTmutant.Cells,groupedbybarcode,wereassignedtothedarkpopulationifthePearsoncorrelationcoefficientbetweenthe488-nmintensityandthe405-nmintensityforthegroupedcellsfellbelowthearbitrarythresholdof0.2.Whenthetwointensitiesareuncorrelated,itsuggeststhenumberofYFASTproteinsinthecellsassociatedwiththatbarcodedoesnotaffectthebrightnessofthecellandhencetheYFASTshouldbedark.TodeterminetherelativeamplitudeofthefastdecaycomponentofthephotobleachingmeasurementofeachYFASTmutant,wefitthebackgroundsubtractedphotobleachingcurvetothesumoftwoexponentials,withoneofthedecayratessettothefastdecayratedeterminedfromtheoriginalYFAST.TodeterminethefastdecayrateoftheoriginalYFAST,itsbackgroundsubtractedphotobleachingcurvewasfittothesumoftwoexponentialswithbothdecayratesasadjustableparametersandthefasterofthetwodecayrateswasselected.

peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/143966doi: bioRxiv preprint first posted online May. 30, 2017;

14

References

1. Cadwell,R.C.&Joyce,G.F.RandomizationofgenesbyPCRmutagenesis.PCRMethodsAppl.2,28–33(1992).

2. Kosuri,S.&Church,G.M.Large-scaledenovoDNAsynthesis:technologiesandapplications.Nat.Methods11,499–507(2014).

3. Smith,G.P.Filamentousfusionphage:novelexpressionvectorsthatdisplayclonedantigensonthevirionsurface.Science228,1315–7(1985).

4. Miltenyi,S.,Müller,W.,Weichel,W.&Radbruch,A.HighgradientmagneticcellseparationwithMACS.Cytometry11,231–238(1990).

5. Anderson,M.T.etal.Simultaneousfluorescence-activatedcellsorteranalysisoftwodistincttranscriptionalelementswithinasinglecellusingengineeredgreenfluorescentproteins.Proc.Natl.Acad.Sci.93,8508–8511(1996).

6. Chen,K.H.,Boettiger,A.N.,Moffitt,J.R.,Wang,S.&Zhuang,X.Spatiallyresolved,highlymultiplexedRNAprofilinginsinglecells.Science(80-.).348,aaa6090--aaa6090(2015).

7. Wang,S.,Moffitt,J.R.,Dempsey,G.T.,Xie,X.S.&Zhuang,X.Characterizationanddevelopmentofphotoactivatablefluorescentproteinsforsingle-molecule-basedsuperresolutionimaging.Proc.Natl.Acad.Sci.111,8452–8457(2014).

8. Plamont,M.-A.etal.Smallfluorescence-activatingandabsorption-shiftingtagfortunableproteinimaginginvivo.Proc.Natl.Acad.Sci.U.S.A.113,497–502(2016).

9. Femino,A.M.,Fay,F.S.,Fogarty,K.&Singer,R.VisualizationofSingleRNATranscriptsinSitu.Science(80-.).280,585–590(1998).

10. Raj,A.,vandenBogaard,P.,Rifkin,S.A.,vanOudenaarden,A.&Tyagi,S.ImagingindividualmRNAmoleculesusingmultiplesinglylabeledprobes.Nat.Methods5,877–879(2008).

11. Moffitt,J.R.etal.High-throughputsingle-cellgene-expressionprofilingwithmultiplexederror-robustfluorescenceinsituhybridization.Proc.Natl.Acad.Sci.201612826(2016).doi:10.1073/pnas.1612826113

12. Shaffer,S.M.,Wu,M.-T.,Levesque,&M.J.,Raj,A..TurboFISH:AMethodforRapidSingleMoleculeRNAFISH.PLoSOne8,e75120(2013).

13. Zhang,Z.,Revyakin,A.,Grimm,J.B.,Lavis,L.D.&Tjian,R.Single-moleculetrackingofthetranscriptioncyclebysub-secondRNAdetection.Elife3,e01775(2014).

14. Gibson,D.G.etal.EnzymaticassemblyofDNAmoleculesuptoseveralhundredkilobases.Nat.Methods6,343–345(2009).

15. Lutz,R.&Bujard,H.IndependentandtightregulationoftranscriptionalunitsinEscherichiacoliviatheLacR/O,theTetR/OandAraC/I1-I2regulatoryelements.NucleicAcidsRes.25,1203–10(1997).

peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/143966doi: bioRxiv preprint first posted online May. 30, 2017;

15

Figures

Fig.1.Ahigh-through,image-basedscreeningmethodusingmassivelymultiplexedfluorescenceinsituhybridization.(A)SchematicdepictionofanRNAbarcode.EachRNAbarcodeconsistsoftheconcatenationofaseriesofhybridizationsequences,eachofwhichcanbeassociatedwithadifferentbitinanN-bitbinarybarcode.Eachhybridizationsequencecanutilizeoneoftworeadoutsequencesuniquetothatposition,withonereadoutsequenceassociatedwitha“1”atthatbitandanotherwitha“0”.(B)Schematicdepictionoflibraryconstruction.Thelibraryofbarcodesismergedwithalibraryofgeneticvariantsandtransformedintobacteria.Thecorrespondencebetweenthebarcodesandgeneticvariantsisdeterminedbysequencing.(C)Schematicdiagramoftheimage-basedphenotype-genotypecharacterization.Thephenotypeisfirstcharacterizedinsurface-adheredcells.Then,thecellsarefixed,andmultipleroundsofhybridizationareusedtomeasurethebarcodes.Duringthefirstround,readoutprobe1-0isaddedandcellswithbarcodesthatread“0”inthefirstbit,i.e.whichcontainthereadoutsequence1-0,shouldbindtotheprobeandbecomefluorescent,whereascellswithbarcodesthatread“1”inthefirstbitshouldremaindark.Oncereadoutprobe1-0isextinguished,readoutprobe1-1isaddedandthecellswithbarcodesthatread“1”inthefirstbit,whichcontainthereadoutsequence1-1,shouldbecomefluorescent.Thisdifferenceinfluorescenceintensityallowsthevalueofbit1tobedeterminedforeachcell.Thisprocessisrepeatedfortheremainingbits.Aftermeasuringallbits,thebarcodeisdetermined,revealingtheidentityofthegeneticvariantcontainedinthecellandlinkingthemeasuredphenotypetothegenotype.

peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/143966doi: bioRxiv preprint first posted online May. 30, 2017;

16

Fig.2.Performancecharacterizationofthescreeningmethodbymeasuring600,000cellscontaining80,000uniquebarcodesassociatedwithtwoknowngenotypesandphenotypes.(A)Schematicdiagramofthelibraryconstituents.Amongthe80,000distinct21-bitbarcodes,halfareassociatedwiththemTagBFP2genewhiletheotherhalfareassociatedwiththemTagBFP2-mMaple3fusiongene.BothsetsofcellswillexpressRNAbarcodes,butonlythoseexpressingmTagBFP2-mMaple3willbefluorescentinthe560-nmchannel.(B)Fluorescentimagesforeachreadoutfromasubsetofthefull21bits.Imagesafterthereadoutsequencecorrespondingtoa“0”(top)ortoa“1”(middle)arehybridizedareshown.Thedifferenceimage(bottom)indicateswhethera“0”or“1”isassignedtothebarcodewithinthatcellinthatbit(redandgrayindicate“0”and“1”,respectively).(C)Two-dimensionalhistogramofnormalizedfluorescenceintensitiesforreadout0andreadout1ofbit1foreachcell.Thefluorescenceintensitiesarenormalizedtothemedianvalues.Thedottedlinedepictsthethresholdusedforeliminatingcellsthatappeardiminbothreadouts.Theshadeofgreenindicatesthenumberofcells.(D)Thepercentofbarcodesdecodedintheimagingexperimentthatmatchbarcodesdeterminedtobeinthelibrarybysequencing(orange)andthenumberofcells(magenta)abovethereadoutintensitythresholdwithvaryingthresholdmagnitude.Thedottedlinecorrespondswiththethresholdof1shownin(C).(E)Abundanceofeachbarcode.Theabundanceisthenumberofcellsintheimagingexperimentassignedtoeachbarcode.(F)FluorescenceimageofmTagBFP2andfluorescenceimageofpost-activationmMaple3ofthesameregionas(B).(G)HistogramsofmedianmMaple3fluorescenceintensitynormalizedtomTagBFPintensityforbarcodesassociatedwiththemMaple3-mTagBFP2fusiongene(red)andforthoseassociatedwiththemTagBFP2-gene(cyan).

peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/143966doi: bioRxiv preprint first posted online May. 30, 2017;

17

Fig.3.ScreeningYFASTmutantlibrariesforimprovedbrightnessandphotobleachingkinetics.(A)SchematicdiagramofYFASTlibrarydesign.YFASTisdarkonitsown,butitbecomesfluorescentuponbindingtotheligand,forexampleHMBR8.AlibraryofYFASTvariantsfusedtomTagBFP2fornormalizationismergedwithalibraryofbarcodesandtransformedintoE.colicells.(B)Theinitialintensityandthephotobleachingcurve(blackcircles)oftheoriginalYFASTmeasuredfromasinglecellinthelibraryscreenmeasurement.Theinitialintensity(graydashedline)ismeasuredastheintensityofYFASTfluorescenceunder488-nmilluminationafterbackgroundsubtractionnormalizedtothemTagBFP2fluorescenceintensityunder405-nmillumination.Thephotobleachingcurve(circles)ismeasuredbyilluminationwith488-nmlighttoexciteYFASTonly.Thecurveisfittoadoubleexponentialdecay(redline)withthebackgroundleveldeterminedbytheintensityofcellsthathavedarkYFASTvariants.(C)Scatterplotoftherelativebrightnessandfast-photobleachingamplitudeforeachmeasuredmutant.Eachpointdepictsthemedianbrightnessandfast-photobleachingamplitudeofall

peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/143966doi: bioRxiv preprint first posted online May. 30, 2017;

18

cellsassociatedwithonemutantinthelibrary.ThebrightnessvaluesarenormalizedtothatoftheoriginalYFAST.Hereonlythemutantsthatcontainatleast10imagedcellsaredepicted.TheoriginalYFAST(green)andseveralselectedmutants(blue,red,andcyan)withgreaterbrightnessand/ormuchsmallerfast-photobleachingamplitudesaremarked.(D)ThephotobleachingcurvesforindividualcellcorrespondingtotheoriginalYFAST(green)andoneselectedmutant(bluedotin(C))fromthelibrarymeasurement.Thefluorescenceintensitiesoftheinitialtimepointarenormalizedto1.(E)TheaveragebleachingcurvesfortheoriginalYFAST(green)andoneselectedmutant(bluedotin(C))measuredinisolationinpureculture.TheoriginalYFASTandmutantwereindividuallyexpressedintheE.colicellstogetherwithamTagBFP2usingthesameplasmidconstructasdescribedforthelibraryconstructs.TheinitialbrightnessvaluesofYFASTandmutantarenormalizedasdescribedin(D).(FandG)Barchartsoftherelativebrightnessvalues(F)andfast-photobleachingamplitudes(G)forseveralselectedmutantsasmarkedin(C)bythesamecolors.ThebrightnessvaluesarenormalizedtothatoftheoriginalYFAST.*indicatesavalueclosetozeroandhencenotvisibleinthebargraph.

peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/143966doi: bioRxiv preprint first posted online May. 30, 2017;