The Science DMZ: Recent Developments Eli Dart, Network Engineer ESnet Science Engagement Lawrence Berkeley National Laboratory WRNP17 Belém, Brazil May 16, 2017 © 2017, Energy Sciences Network


The Science DMZ: Recent Developments

Eli Dart, Network Engineer
ESnet Science Engagement
Lawrence Berkeley National Laboratory

WRNP17

Belém, Brazil

May 16, 2017

© 2017, Energy Sciences Network


Overview

• Science DMZ As Platform
• Modern Research Data Portal
• Pacific Research Platform
  – PRP
  – NRP
• Note: This talk assumes you already understand the Science DMZ
  – If you haven't encountered the Science DMZ, several folks in RNP can help you, including Leandro Ciuffo and Alex Moura
  – Or check out the fasterdata knowledge base:
    • http://fasterdata.es.net/science-dmz/

2 – ESnet Science Engagement ([email protected]) - 5/15/17


Science DMZ As A Platform

• Once there are many Science DMZs in your network, more things become possible
• Easy file transfer is good, but what else can we do?
  – Update the architecture of data portals
  – Build services between institutions
  – Interconnect facilities
• Several efforts underway to do these things


Science Data Portals

• Large repositories of scientific data
  – Climate data
  – Sky surveys (astronomy, cosmology)
  – Many others
  – Data search, browsing, access
• Many scientific data portals were designed 15+ years ago
  – Single-web-server design
  – Data browse/search, data access, user awareness all in a single system
  – All the data goes through the portal server
    • In many cases by design
    • E.g. embargo before publication (enforce access control)


Legacy Portal Design

[Figure: legacy portal design. Labels: WAN, border router, firewall, enterprise network, perfSONAR hosts, 10GE links, portal server, filesystem (data store). Legend: browsing path, query path, data path. Portal server applications: web server, search, database, authentication, data service.]

• Very difficult to improve performance without architectural change
  – Software components all tangled together
  – Difficult to put the whole portal in a Science DMZ because of security
  – Even if you could put it in a DMZ, many components aren't scalable
• What does architectural change mean?


Example of Architectural Change – CDN

• Let's look at what Content Delivery Networks did for web applications
• CDNs are a well-deployed design pattern
  – Akamai and friends
  – Entire industry in CDNs
  – Assumed part of today's Internet architecture
• What does a CDN do?
  – Store static content in a separate location from dynamic content
    • Complexity isn't in the static content – it's in the application dynamics
    • Web applications are complex, full-featured, and slow
      – Databases, user awareness, etc.
      – Lots of integrated pieces
    • Data service for static content is simple by comparison
  – Separation of application and data service allows each to be optimized
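The split between static and dynamic content can be sketched in a few lines. This is only an illustration, not how any particular CDN works: the host names `app.example.org` and `cdn.example.net`, the extension list, and the helper `asset_url` are all assumptions made for the example.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

# Hypothetical hosts, for illustration only.
ORIGIN_BASE = "https://app.example.org/"   # rich, dynamic web application
CDN_BASE = "https://cdn.example.net/"      # simple, fast static data service

# Assumed set of file types treated as static content.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".mp4", ".woff2"}


def asset_url(path: str) -> str:
    """Return the URL a page should reference for a site-relative path.

    Static objects are pointed at the CDN; everything else stays on the
    origin web server, which keeps handling the application logic.
    """
    # Look only at the path component so query strings do not affect the choice.
    ext = posixpath.splitext(urlsplit(path).path)[1].lower()
    base = CDN_BASE if ext in STATIC_EXTENSIONS else ORIGIN_BASE
    return urljoin(base, path.lstrip("/"))


print(asset_url("/img/logo.png"))
print(asset_url("/account/settings"))
```

The page itself decides where each object comes from, so the two services can be tuned independently.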


Classical Web Server Model

• Web browser fetches pages from web server
  – All content stored on the web server
  – Web applications run on the web server
    • Web server may call out to local database
    • Fundamentally all processing is local to the web server
  – Web server sends data to client browser over the network
• Perceived client performance changes with network conditions
  – Several problems in the general case
  – Latency increases time to page render
  – Packet loss + latency cause problems for large static objects

[Figure: a browser on residential broadband reaches the web server (WEB) at a hosting provider across a transit network; the path is long distance / high latency.]
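One common way to see why packet loss plus latency hurts large objects is the Mathis et al. approximation for loss-limited TCP throughput, rate ≤ (MSS / RTT) · (C / √p) with C = √(3/2). The sketch below applies it to two round-trip times. The model is not part of the talk, it describes Reno-style congestion control only, and the MSS, loss rate, and RTT values are assumptions chosen for illustration.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on loss-limited TCP throughput, in bits/s.

    Mathis et al. model: rate <= (MSS / RTT) * (C / sqrt(p)), C = sqrt(3/2).
    """
    if rtt_s <= 0 or not (0 < loss_rate <= 1):
        raise ValueError("rtt_s must be > 0 and loss_rate must be in (0, 1]")
    c = math.sqrt(1.5)
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


MSS = 1460    # bytes; typical for a 1500-byte MTU (assumed)
LOSS = 1e-4   # one packet in 10,000 lost (assumed)

near = mathis_throughput_bps(MSS, 0.005, LOSS)   # 5 ms RTT: nearby server
far = mathis_throughput_bps(MSS, 0.100, LOSS)    # 100 ms RTT: distant server

print(f"5 ms RTT:   {near / 1e6:.1f} Mbit/s")
print(f"100 ms RTT: {far / 1e6:.1f} Mbit/s")
```

Under this model, with the loss rate held fixed, throughput is inversely proportional to RTT, which is the motivation for moving large static objects closer to the client.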


Solution: Place Large Static Objects Near Client

[Figure: the browser on residential broadband reaches the web server (WEB) at the hosting provider over a long-distance / high-latency path through the transit network; a CDN data server (DATA) serves the browser over a short-distance / low-latency path.]

• CDN provides static content "close" to client
  – Latency goes down
    • Time to page render goes down
    • Static content performance goes up
  – Load on web server goes down (no need to serve static content)
  – Web server still manages complex behavior
    • Local reasoning / fast changes for application owner
• Significant win for web application performance


Client Simply Sees Increased Performance

• Client doesn't see the CDN as a separate thing
  – Web content is all still viewed in a browser
    • Browser fetches what the page tells it to fetch
    • Different content comes from different places
    • User doesn't know/care
• CDNs provide an architectural solution to a performance problem
  – Not brute-force
  – Work smarter, not harder

[Figure: two panels. One shows the browser reaching the web server (WEB; rich, slow) and a CDN data server (DATA; simple, fast) across the 'Net; the other shows only the browser and the web server (WEB) across the 'Net.]


Architectural Examination of Data Portals

• Common data portal functions (most portals have these)
  – Search/query/discovery
  – Data download method for data access
  – GUI for browsing by humans
  – API for machine access – ideally incorporates search/query + download
• Performance pain is primarily in the data handling piece
  – Rapid increase in data scale eclipsed legacy software stack capabilities
  – Portal servers often stuck in enterprise network
• Can we "disassemble" the portal and put the pieces back together better?
  – Use Science DMZ as a platform for the data piece
  – Avoid placing complex software in the Science DMZ


Legacy Portal Design

[Figure: legacy portal design. Labels: WAN, border router, firewall, enterprise network, perfSONAR hosts, 10GE links, portal server, filesystem (data store). Legend: browsing path, query path, data path. Portal server applications: web server, search, database, authentication, data service.]


Next-Generation Portal Leverages Science DMZ

[Figure: next-generation portal design. Labels: WAN, border router, Science DMZ switch/router, firewall, enterprise network, perfSONAR hosts, 10GE links, DTNs, API DTNs (data access governed by portal), filesystem (data store), portal server. Legend: browsing path, query path, data path; the data transfer path is drawn separately from the portal query/browse path. Portal server applications: web server, search, database, authentication.]


Put The Data On Dedicated Infrastructure

• We have separated the data handling from the portal logic
• Portal is still its normal self, but enhanced
  – Portal GUI, database, search, etc. all function as they did before
  – Query returns pointers to data objects in the Science DMZ
  – Portal is now freed from ties to the data servers (run it on Amazon if you want!)
• Data handling is separate, and scalable
  – High-performance DTNs in the Science DMZ
  – Scale as much as you need to without modifying the portal software
• Outsource data handling to computing centers or campus central storage
  – Computing centers are set up for large-scale data
  – Let them handle the large-scale data, and let the portal do the orchestration of data placement


The Pacific Research Platform Creates a Regional End-to-End Science-Driven “Big Data Freeway System”

NSF CC*DNI Grant $5M 10/2015-10/2020

• PI: Larry Smarr, UC San Diego Calit2
• Co-PIs:
  - Camille Crittenden, UC Berkeley CITRIS
  - Tom DeFanti, UC San Diego Calit2
  - Philip Papadopoulos, UC San Diego SDSC
  - Frank Wuerthwein, UC San Diego Physics and SDSC


PRP Provides Interoperability

• Science DMZs at participating sites ensure interoperability
• PRP engineers work to ensure they interoperate
  – Globus data transfer between DTNs
  – perfSONAR
• Some variation in DTNs
  – Some have FIONA DTNs
    • FIONA == Flash I/O Network Appliance
    • Designed by PRP engineers at UC San Diego
    • https://fasterdata.es.net/science-dmz/DTN/fiona-flash-i-o-network-appliance/
  – Some have DTNs connected to HPC storage
• Key – they all interoperate, removing integration burden from scientists


PRP Science Drivers

• Multiple science areas
  – Astronomy and astrophysics
  – Biomedical applications
  – Life sciences
  – Particle physics
  – Virtual reality and data visualization
• http://prp.ucsd.edu/


National Research Platform (NRP)

• Replicate the PRP on a national scale
• Interoperable, high-performance cyberinfrastructure
  – Built to serve domain science
  – Scale up to ~200 institutions
• First workshop to be held this summer
  – Domain science input
  – Policy questions
  – Architecture, scalability
  – Include campus IT, regional networks, national networks, funding agencies, etc. in a common conversation.


Petascale DTN Project

• Another example of building on the Science DMZ
• Supports all data-intensive applications which require large-scale data placement
• Collaboration between HPC facilities
  – ALCF, NCSA, NERSC, OLCF
• Goal: per-Globus-job performance at 1PB/week level
  – 15 gigabits per second
  – With checksums turned on, etc.
  – No special shortcuts, no arcane options
• Reference data set is 4.4TB of astrophysics model output
  – Mix of file sizes
  – Many directories
  – Real data!


Petascale DTN Project

[Figure: March 2017 transfer rates for the L380 data set between the DTN clusters alcf#dtn_mira (ALCF), nersc#dtn (NERSC), olcf#dtn_atlas (OLCF), and ncsa#BlueWaters (NCSA). Rates shown: 10.0, 17.6, 14.8, 19.3, 17.4, 17.0, 32.4, 25.3, 18.3, 16.3, 24.1, and 24.0 Gbps (site-pair assignment not recoverable from the text).]

Data set: L380
Files: 19260
Directories: 211
Other files: 0
Total bytes: 4442781786482 (4.4T bytes)
Smallest file: 0 bytes (0 bytes)
Largest file: 11313896248 bytes (11G bytes)
Size distribution:
  1 - 10 bytes: 7 files
  10 - 100 bytes: 1 files
  100 - 1K bytes: 59 files
  1K - 10K bytes: 3170 files
  10K - 100K bytes: 1560 files
  100K - 1M bytes: 2817 files
  1M - 10M bytes: 3901 files
  10M - 100M bytes: 3800 files
  100M - 1G bytes: 2295 files
  1G - 10G bytes: 1647 files
  10G - 100G bytes: 3 files
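To put the data set statistics and the reported rates in perspective, the sketch below derives the mean file size and the time a full transfer would take at the slowest and fastest rates shown. It assumes each rate is sustained for the whole transfer, which the slide does not state; the variable and function names are made up for the example.

```python
TOTAL_BYTES = 4_442_781_786_482
FILE_COUNT = 19_260

# File counts per size bucket, as listed for the L380 data set.
SIZE_HISTOGRAM = {
    "1 - 10": 7, "10 - 100": 1, "100 - 1K": 59, "1K - 10K": 3170,
    "10K - 100K": 1560, "100K - 1M": 2817, "1M - 10M": 3901,
    "10M - 100M": 3800, "100M - 1G": 2295, "1G - 10G": 1647, "10G - 100G": 3,
}

# Rates from the March 2017 figure, in Gbit/s.
RATES_GBPS = [10.0, 17.6, 14.8, 19.3, 17.4, 17.0, 32.4, 25.3, 18.3, 16.3, 24.1, 24.0]


def transfer_minutes(total_bytes: int, rate_gbps: float) -> float:
    """Minutes needed to move total_bytes at a constant rate (assumed sustained)."""
    return total_bytes * 8 / (rate_gbps * 1e9) / 60


histogram_total = sum(SIZE_HISTOGRAM.values())
mean_file_mb = TOTAL_BYTES / FILE_COUNT / 1e6

print(f"histogram accounts for {histogram_total} of {FILE_COUNT} files")
print(f"mean file size: {mean_file_mb:.1f} MB")
print(f"slowest ({min(RATES_GBPS)} Gbit/s): {transfer_minutes(TOTAL_BYTES, min(RATES_GBPS)):.1f} min")
print(f"fastest ({max(RATES_GBPS)} Gbit/s): {transfer_minutes(TOTAL_BYTES, max(RATES_GBPS)):.1f} min")
```

The mean hides a wide spread: thousands of files are under 100 KB while a few exceed 10 GB, which is why the data set is a realistic test rather than a best case.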


Thanks!

[email protected] (ESnet)
Lawrence Berkeley National Laboratory

http://fasterdata.es.net/

http://my.es.net/

http://www.es.net/


Extra Slides


What Is Science Engagement?

• Technology people working with scientists to help solve problems
  – Improve data transfer performance
  – Improve data workflows (e.g. to require less human effort)
  – Improve experiment operations
  – …and more…
• Using experience gained from helping scientists to improve cyberinfrastructure
  – Network design
  – Tool design
  – System design


Engagement Is Important: Old Model

• Scientist as integrator
  – Requires scientists to discover new technologies
  – Requires scientists to become expert in new technologies
  – Requires scientists to assemble distinct technologies into an integrated solution that works for them
  – Some scientists do this brilliantly – most do not


Engagement Is Important: New Model

• Scientist as collaborator
  – Technologists understand technology
  – Technologists understand enough of the science to see how technology fits
  – Technologists help scientists adopt a useful solution
  – This is much more productive, and requires science engagement
