supporting the scientific data lifecycle · 2016-01-07 · • horizon 2020 project starting april...
TRANSCRIPT
![Page 1: Supporting the scientific data lifecycle · 2016-01-07 · • Horizon 2020 project starting April or May • Budget 11.1 Million Euros (800.000 for dCache) • 26 Partners • Duration](https://reader034.vdocuments.site/reader034/viewer/2022052613/5f1af98f9940231acd56e502/html5/thumbnails/1.jpg)
Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 1
Patrick Fuhrmann
On behalf of the project team
Supporting the scientific data lifecycle
Content
• How are software features selected?
• How are software features funded?
• Hardening new features.
• Exploring new communities.
• Responding to new technologies (hardware and software).
• Something about INDIGO-DataCloud.
• Essentially a random walk focusing on things I thought might be interesting.
Some words on why and when
dCache does what it does.
How are software features selected?
• Scientific communities believe that Open Source software grows on trees.
• Consequently, they are not willing to contribute to development and software management at all.
• They assume that complaints are very valuable contributions.
• The next consequence is that Open Source teams mainly implement software features that are required by the labs where the core team members are hosted.
• In order to explore new communities and satisfy their software requirements, Open Source projects need external money.
How are new features funded?
• This is where “National” and “European” projects come into play.
• For dCache, this:
– was EMI
– is the German national LSDMA project
– and will be INDIGO-DataCloud
• The drawback: They tell you what they want to see in your code.
Funded features are not necessarily those you need?
• However, dCache has some invariant objectives:
– The master plan (last slide of this presentation).
– Be up to date on new technologies, either software or hardware.
– Attract new communities, as their specific requirements, if they can be fulfilled, make dCache even better.
• It can be a bit tricky to tune the funding projects exactly into the direction of our objectives.
• So, let’s see how dCache managed (and manages) that...
Funding influences dCache development topics
• 2010 to 2013: Standardization
– NFS 4.1/pNFS
– HTTP/WebDAV
– Contributing to the Dynamic Federation
• 2015 to 2018: INDIGO-DataCloud
– Data Life Cycle
– Multi-Tier Storage
– Quality of Service
– Migration
– Archiving
– AAI
• Deploying new technologies into production and exploring new communities
From 2013 to now, between two very demanding development projects, EMI and INDIGO-DataCloud, we slowed down development to:
• Deploy newly implemented technologies into production.
• Explore new communities and learn about their needs.
Deploying NFS into production
• CMS Grid Infrastructure @ DESY
• The DESY Cloud
• Fermilab (various Intensity Frontier experiments)
• And the issues
New production systems based on dCache NFS.
• NFS 4.1/pNFS: direct low-latency access (worker nodes, HPC)
• dCache backend storage layer
• Wide area: FTS, Globus (Online)
• Sync & Share: laptops, mobile devices
See Paul’s presentation on Thursday.
CMS Tier II @ DESY
• Slowly migrating CMS Grid worker nodes to NFS 4.1 data access.
• Good experience as long as the network is stable.
Figure: Job Efficiency (NFS – dCap). Axes: Execution Time (hours), Job Efficiency (CPU/Wall Time). Series: NFS 4.1/pNFS, dCap.
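The efficiency metric named on the plot axis, CPU time divided by wall time, can be sketched as below. The function name and the sample numbers are illustrative assumptions, not measurements from the talk.

```python
def job_efficiency(cpu_seconds: float, wall_seconds: float) -> float:
    """Return CPU time divided by wall-clock time for one job."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


# Hypothetical job: 45 minutes of CPU time inside a 60 minute wall-clock window.
example = job_efficiency(45 * 60, 60 * 60)
print(f"{example:.2f}")
```

A job that waits on I/O accumulates wall time without CPU time, so a value closer to 1 suggests that data access is keeping the CPU busy.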
As with all new specs, there are issues
• Network problems cause the system to behave unpredictably.
• Data servers behind firewalls
• Weak clients on VMs
• Specification violation
– Infinite state recovery with the Linux kernel
Exploring new communities.
• Jülich-Aachen Research Association, JADE
– “Supercomputing and modeling for the Human Brain (SMHB)”, associated to the European Human Brain Project (Plenary by KH Meier)
• MoSGrid
– Scientific Gateway for molecular simulation.
• VAVID
– Data Gateway for analyzing wind energy infrastructures
JADE
Figure labels: Aachen, Jülich
Projects in HPC
HPC jobs on supercomputer
HPC jobs get access to dCache storage.
With the start of INDIGO-DataCloud, its money and a larger team (8+3), we can continue to explore new horizons. (Back to development mode)
Responding to new technologies
• New disk technologies
– Open Ethernet Disks (HGST)
• New object-store back-ends
– CEPH
• New European projects (INDIGO-DataCloud)
– Focusing on Data Quality of Service and
– Data Lifecycle Management
HGST Open Ethernet Disks
• Small ARM CPU with Ethernet piggybacked on a regular disk.
• Spec:
– Any Linux (Debian on demo)
– CPU: 32-bit ARM, 512 Level 2
– 2 GB DDR3 DRAM
• 1792 MB available
– Block storage driver as SCSI sda
– Ethernet network driver as eth0
HGST Open Ethernet Disks (cont.)
• The additional CPU is not used by the disk itself and can run an arbitrary customer OS.
• The disk is seen as a regular block device.
• Not yet on the market.
• dCache got 5 disks and we are evaluating running pool nodes on the disk itself.
• See talk on Thursday.
Response to CEPH
• CEPH complements dCache perfectly.
– Simplifies operating dCache disks.
– dCache already accesses data as an object store anyway.
• dCache is evaluating a ‘two step approach’.
– Each pool sees its own object space in CEPH.
– All pools have access to the entire space, which is a slight change of dCache pool semantics.
• Would merge CEPH and dCache advantages
– Multi-tier (Tape, Disk, SSD)
– Multi-protocol support for a common namespace.
• All protocols see the same namespace
– All the dCache AAI features
• Support for X.509, Kerberos, username/password
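The difference between the two steps can be illustrated with a toy model. This is a minimal sketch under stated assumptions: `ObjectStore` and `Pool` are hypothetical in-memory classes invented for illustration; they are neither the Ceph API nor dCache code.

```python
from typing import Dict, Optional, Tuple


class ObjectStore:
    """In-memory stand-in for an object store (hypothetical, not the Ceph API)."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, str], bytes] = {}

    def put(self, space: str, key: str, data: bytes) -> None:
        self._objects[(space, key)] = data

    def has(self, space: str, key: str) -> bool:
        return (space, key) in self._objects


class Pool:
    """Toy pool that keeps its files in one object space of the store."""

    def __init__(self, name: str, store: ObjectStore,
                 shared_space: Optional[str] = None) -> None:
        self.name = name
        self.store = store
        # Step 1: a private space named after the pool.
        # Step 2: one space shared by all pools.
        self.space = shared_space if shared_space is not None else name

    def write(self, file_id: str, data: bytes) -> None:
        self.store.put(self.space, file_id, data)

    def can_serve(self, file_id: str) -> bool:
        return self.store.has(self.space, file_id)


store = ObjectStore()

# Step 1: each pool sees only its own object space.
a1, b1 = Pool("pool-a", store), Pool("pool-b", store)
a1.write("file-1", b"payload")
print(a1.can_serve("file-1"), b1.can_serve("file-1"))  # True False

# Step 2: all pools have access to the entire space.
a2, b2 = Pool("pool-a", store, "shared"), Pool("pool-b", store, "shared")
a2.write("file-2", b"payload")
print(a2.can_serve("file-2"), b2.can_serve("file-2"))  # True True
```

In the second step any pool can serve any file, which is the change in pool semantics the slide mentions.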
INDIGO-DataCloud Cheat-Sheet
• Horizon 2020 project starting April or May
• Budget 11.1 million euros (800,000 for dCache)
• 26 partners
• Duration 30 months
• The project aims for an Open Source data and computing platform targeted at scientific communities, deployable on multiple hardware, and provisioned over private and public e-infrastructures.
See Ludek’s presentation on Wednesday
INDIGO in a nutshell
1. Self-service, on-demand
2. Access through the network
3. Resource pooling
4. Elasticity (with infinite resources)
5. Pay as you go
In the end, Applications Rule.
Stolen from Davide Salomoni (Project Director)
dCache involvement in INDIGO
• dCache is mostly involved in WP4, which is about Virtual Infrastructures (IaaS).
• For storage systems like dCache, this essentially means SDS (Software-Defined Storage), which according to Wikipedia is:
– Software-defined storage (SDS) is an evolving concept for computer data storage software to manage policy-based provisioning and management of data storage independent of hardware.
SDS according to dCache
• User/PaaS-defined “Quality of Service” management
– User/PaaS-defined “Access Latency”
• SSD or tape, depending on application requirements.
– User/PaaS-defined “Data Protection”
• On one disk, two disks or three tapes, depending on how precious your data is.
– User/PaaS-defined “Data Migration Policies”
• Like Amazon Glacier versus S3
• Automatic storage-tier migration
– Based on access profile
• All this wouldn’t be needed if SSDs were cheap and 100% reliable.
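How a user-defined quality-of-service request could drive tier selection and automatic migration is sketched below. Everything here is an assumption made for illustration: the class and function names, the latency thresholds and the access counts are invented and do not come from dCache or INDIGO-DataCloud.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Tier(Enum):
    SSD = "ssd"
    DISK = "disk"
    TAPE = "tape"


@dataclass(frozen=True)
class QosRequest:
    """Hypothetical user/PaaS request; field names are illustrative."""
    max_latency_seconds: float  # requested access latency
    copies: int                 # requested data protection (number of copies)


def choose_tier(request: QosRequest) -> Tier:
    """Pick the cheapest tier assumed to meet the requested latency."""
    # Threshold values are invented for illustration only.
    if request.max_latency_seconds < 0.01:
        return Tier.SSD
    if request.max_latency_seconds < 60:
        return Tier.DISK
    return Tier.TAPE


def placement(request: QosRequest) -> List[Tier]:
    """One entry per requested copy, all on the chosen tier."""
    return [choose_tier(request)] * request.copies


def migrate_by_access(current: Tier, reads_last_30_days: int) -> Tier:
    """Automatic tier migration based on a simple access profile."""
    if reads_last_30_days == 0:
        return Tier.TAPE   # cold data moves to the cheapest tier
    if reads_last_30_days > 100:
        return Tier.SSD    # hot data moves to the fastest tier
    return current


archive = QosRequest(max_latency_seconds=3600, copies=3)
print(placement(archive))
print(migrate_by_access(Tier.DISK, reads_last_30_days=0))
```

The archive request above lands on three tape copies, matching the "three tapes" case for precious data with relaxed latency.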
dCache is well prepared
Historically, dCache supports multi-tier storage and the corresponding transitions.
Figure: Virtual File-system Layer (NFS/pNFS, GridFTP, HTTP, WebDAV, xRootd/dCap) on top of SSDs, Spinning Disks, and Tape, Blu-ray…, with automatic and manual media transitions.
Recently added
We optimized the handling of the ‘small file’ problem with disk <-> tape transitions.
Figure labels: Tape System, Containers
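The slide only names "Tape System" and "Containers". Assuming this means that small files are bundled into larger containers before being written to tape, a greedy packing step could look like the sketch below. The function name, the sizes and the packing strategy are illustrative assumptions, not the dCache implementation.

```python
from typing import Dict, List


def pack_into_containers(file_sizes: Dict[str, int],
                         container_size: int) -> List[List[str]]:
    """Greedily group files, in the given order, into size-bounded containers."""
    containers: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, size in file_sizes.items():
        # Close the current container when the next file would overflow it.
        if current and used + size > container_size:
            containers.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        containers.append(current)
    return containers


# Hypothetical file sizes in MB and a 100 MB container limit.
files = {"a": 40, "b": 30, "c": 50, "d": 10}
print(pack_into_containers(files, 100))  # [['a', 'b'], ['c', 'd']]
```

A file larger than the limit ends up alone in its own container, so nothing is dropped. Fewer, larger objects mean fewer tape operations per byte stored.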
What’s missing
• Mainly a common agreement (standard) on how to trigger transitions. (Protocol, API??)
• We have some experience with SRM; however, it seems not to be suitable for this purpose.
• Another candidate is CDMI (SNIA), which is an industry standard.
• Migration policies are already discussed, documented and implemented within RDA (Practical Policy Working Group).
• Details will only be available after the INDIGO kickoff meeting at the end of April ’15.
Summary
Magically, up to now, at the right moment, there was always an EU or national project funding dCache for exactly those features or activities that dCache was planning to do anyway, and with that they helped us follow our master plan:
The support of the Complete Scientific Big Data Life Cycle Management.
Scientific Data Lifecycle
• High Speed Data Ingest
• Fast Analysis: NFS 4.1/pNFS
• Wide Area Transfers (Globus Online, FTS) by GridFTP
• Visualization & Sharing by WebDAV, OwnCloud
Don’t forget
Upcoming dCache Workshop
The END
Further reading: www.dCache.org