Global Data Grids: The Need for Infrastructure
Paul Avery, University of Florida
Extending the Grid Reach in Europe, Brussels, Mar. 23, 2001


TRANSCRIPT

Page 1: Global Data Grids: The Need for Infrastructure

Paul Avery
University of Florida
http://www.phys.ufl.edu/~avery/
[email protected]

Extending the Grid Reach in Europe
Brussels, Mar. 23, 2001
http://www.phys.ufl.edu/~avery/griphyn/talks/avery_brussels_23mar01.ppt

Page 2: Global Data Grid Challenge

“Global scientific communities, served by networks with bandwidths varying by orders of magnitude, need to perform computationally demanding analyses of geographically distributed datasets that will grow by at least 3 orders of magnitude over the next decade, from the 100 Terabyte to the 100 Petabyte scale.”

Page 3: Data Intensive Science: 2000-2015

Scientific discovery increasingly driven by IT
  Computationally intensive analyses
  Massive data collections
  Rapid access to large subsets
  Data distributed across networks of varying capability

Dominant factor: data growth (1 Petabyte = 1000 TB)
  2000: ~0.5 Petabyte
  2005: ~10 Petabytes
  2010: ~100 Petabytes
  2015: ~1000 Petabytes?
  (the implied growth rate is sketched after this slide)

How to collect, manage, access, and interpret this quantity of data?
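
The projected volumes above imply a steep compound growth rate. The following is a minimal sketch, not part of the original talk, that computes the annual growth implied by the stated endpoints (~0.5 PB in 2000 to ~1000 PB in 2015):

```python
# Sketch: compound annual growth rate implied by the projections above.
# Endpoints taken from the slide: ~0.5 PB in 2000, ~1000 PB in 2015.

def cagr(start_pb: float, end_pb: float, years: int) -> float:
    """Compound annual growth rate between two data volumes."""
    return (end_pb / start_pb) ** (1.0 / years) - 1.0

growth = cagr(0.5, 1000.0, 2015 - 2000)
print(f"Implied data growth: ~{growth:.0%} per year")  # roughly 66% per year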

Page 4: Data Intensive Disciplines

High energy & nuclear physics
Gravity wave searches (e.g., LIGO, GEO, VIRGO)
Astronomical sky surveys (e.g., Sloan Sky Survey)
Global “Virtual” Observatory
Earth Observing System
Climate modeling
Geophysics

Page 5: Data Intensive Biology and Medicine

Radiology data
X-ray sources (APS crystallography data)
Molecular genomics (e.g., Human Genome)
Proteomics (protein structure, activities, …)
Simulations of biological molecules in situ
Human Brain Project
Global Virtual Population Laboratory (disease outbreaks)
Telemedicine
Etc.
Commercial applications not far behind

Page 6: The Large Hadron Collider at CERN

[Figure: the “Compact” Muon Solenoid (CMS) detector at the LHC, with a “standard man” shown for scale]

Page 7: LHC Computing Challenges

Complexity of LHC environment and resulting data
Scale: Petabytes of data per year (100 PB by 2010)
Global distribution of people and resources

CMS Experiment: 1800 physicists, 150 institutes, 32 countries

Page 8: Global LHC Data Grid Hierarchy

[Figure: hierarchy diagram with Tier 0 (CERN) at the top, fanning out to Tier 1 centers, each serving multiple Tier 2 centers and, below them, Tier 3 and Tier 4 sites]

Tier 0: CERN
Tier 1: National Lab
Tier 2: Regional Center at University
Tier 3: University workgroup
Tier 4: Workstation
(a simple tree model of this hierarchy is sketched after this slide)

GriPhyN: R&D, Tier2 centers, unify all IT resources
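
As an illustration only (the classes below and the use of placeholder site names are not from the talk), the tier hierarchy in the legend above can be modeled as a simple tree, with each tier fanning out to the tier below:

```python
# Illustrative sketch of the tier hierarchy described on this slide.
# Site names are placeholders taken from the slide's legend, not real centers.

from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    tier: int
    children: list["Site"] = field(default_factory=list)

    def add(self, child: "Site") -> "Site":
        """Attach a lower-tier site and return it for chaining."""
        self.children.append(child)
        return child

def show(site: Site, indent: int = 0) -> None:
    """Print the hierarchy, one indented line per site."""
    print("  " * indent + f"Tier {site.tier}: {site.name}")
    for child in site.children:
        show(child, indent + 1)

cern = Site("CERN", tier=0)
lab = cern.add(Site("National Lab", tier=1))
regional = lab.add(Site("Regional Center at University", tier=2))
workgroup = regional.add(Site("University workgroup", tier=3))
workgroup.add(Site("Workstation", tier=4))

show(cern)
```

In practice each tier fans out to many children (one Tier 0, several Tier 1 national labs, many Tier 2 regional centers, and so on), which is the point of the diagram on this slide.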

Page 9: Global LHC Data Grid Hierarchy

[Figure: data flow through the tiers. The Experiment’s Online System (fed at ~PBytes/sec from the detector) delivers ~100 MBytes/sec to the Tier 0+1 CERN Computer Center (>20 TIPS); 2.5-10 Gb/sec links connect CERN to Tier 1 national centers (France, USA, Italy, UK); ~622 Mbits/sec links connect Tier 2 regional centers; 100-1000 Mbits/sec links reach institutes (~0.25 TIPS) and their physics data caches; Tier 3 and Tier 4 cover workstations and other portals.]

Bunch crossing every 25 nsec; 100 triggers per second; each event is ~1 MByte in size (a rough data-rate estimate is sketched after this slide).
Physicists work on analysis “channels”; each institute has ~10 physicists working on one or more channels.
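
The trigger numbers quoted in the figure annotation lead directly to the ~100 MBytes/sec rate and the petabyte-per-year scale mentioned on Page 7. A back-of-the-envelope sketch, assuming roughly 10^7 live seconds of data taking per year (an assumption, not a number from the talk):

```python
# Back-of-the-envelope estimate from the trigger figures quoted on this slide.
# LIVE_SECONDS_PER_YEAR is an assumption (~10^7 s/yr), not a number from the talk.

TRIGGER_RATE_HZ = 100          # events recorded per second (from the slide)
EVENT_SIZE_MB = 1.0            # ~1 MByte per event (from the slide)
LIVE_SECONDS_PER_YEAR = 1e7    # assumed effective data-taking time per year

rate_mb_per_s = TRIGGER_RATE_HZ * EVENT_SIZE_MB
volume_pb_per_year = rate_mb_per_s * LIVE_SECONDS_PER_YEAR / 1e9  # MB -> PB

print(f"Raw data rate:   ~{rate_mb_per_s:.0f} MBytes/sec")
print(f"Raw data volume: ~{volume_pb_per_year:.0f} PB per year")
```

Reconstructed, simulated, and derived datasets multiply this raw figure, consistent with the “Petabytes of data per year” scale quoted earlier.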

Page 10: Global Virtual Observatory

[Figure: components connected through common Standards]
Source Catalogs, Image Data
Specialized Data: Spectroscopy, Time Series, Polarization
Information Archives: derived & legacy data (NED, Simbad, ADS, etc.)
Discovery Tools: Visualization, Statistics
Multi-wavelength astronomy, multiple surveys

Page 11: GVO: The New Astronomy

Large, globally distributed database engines
  Integrated catalog and image databases
  Multi-Petabyte data size
  Gbyte/s aggregate I/O speed per site
High speed (>10 Gbits/s) backbones
  Cross-connecting, correlating the major archives
Scalable computing environment
  100s–1000s of CPUs for statistical analysis and discovery
(a rough transfer-time estimate for these rates is sketched below)
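
To give a feel for these numbers, here is a rough sketch, with assumptions noted in the comments, of how long it would take to move a single petabyte at the quoted backbone and per-site I/O rates (idealized: no protocol overhead or contention):

```python
# Idealized transfer-time estimate for the GVO rates quoted on this slide.
# Assumes a 1 PB data slice and ignores protocol overhead and contention.

PETABYTE_BYTES = 1e15

def transfer_days(data_bytes: float, rate_bytes_per_s: float) -> float:
    """Time in days to move data_bytes at a sustained rate_bytes_per_s."""
    return data_bytes / rate_bytes_per_s / 86_400

backbone_rate = 10e9 / 8   # 10 Gbits/s backbone, expressed in bytes/s
site_io_rate = 1e9         # ~1 GByte/s aggregate I/O per site

print(f"Over a 10 Gb/s backbone:  ~{transfer_days(PETABYTE_BYTES, backbone_rate):.0f} days")
print(f"At one site's ~1 GB/s I/O: ~{transfer_days(PETABYTE_BYTES, site_io_rate):.0f} days")
```

Even under ideal conditions this is well over a week per petabyte, which is why multi-petabyte archives are paired here with >10 Gb/s backbones and GByte/s per-site I/O.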

Page 12: Infrastructure for Global Grids

Page 13: Grid Infrastructure

Grid computing sometimes compared to the electric grid
  You plug in to get a resource (CPU, storage, …)
  You don’t care where the resource is located
This analogy might have an unfortunate downside
  You might need different sockets!

Page 14: Role of Grid Infrastructure

Provide essential common Grid infrastructure
  Cannot afford to develop separate infrastructures
Meet needs of high-end scientific collaborations
  Already international and even global in scope
  Need to share heterogeneous resources among members
  Experiments drive future requirements
Be broadly applicable outside science
  Government agencies: national, regional (EU), UN
  Non-governmental organizations (NGOs)
  Corporations, business networks (e.g., supplier networks)
  Other “virtual organizations”
Be scalable to the global level
  But EU + US is a good starting point

Page 15: A Path to Common Grid Infrastructure

Make a concrete plan
Have clear focus on infrastructure and standards
Be driven by high-performance applications
Leverage resources & act coherently
Build large-scale Grid testbeds
Collaborate with industry

Page 16: Building Infrastructure from Data Grids

3 Data Grid projects recently funded
  Particle Physics Data Grid (US, DOE)
    Data Grid applications for HENP
    Funded 2000, 2001
    http://www.ppdg.net/
  GriPhyN (US, NSF)
    Petascale Virtual-Data Grids
    Funded 9/2000 – 9/2005
    http://www.griphyn.org/
  European Data Grid (EU)
    Data Grid technologies, EU deployment
    Funded 1/2001 – 1/2004
    http://www.eu-datagrid.org/

HEP in common
Focus: infrastructure development & deployment
International scope

Page 17: Background on Data Grid Projects

They support several disciplines
  GriPhyN: CS, HEP (LHC), gravity waves, digital astronomy
  PPDG: CS, HEP (LHC + current experiments), nuclear physics, networking
  DataGrid: CS, HEP, earth sensing, biology, networking
They are already joint projects
  Each serving the needs of multiple constituencies
  Each driven by high-performance scientific applications
  Each has international components
  Their management structures are interconnected
Each project is developing and deploying infrastructure
  US$23M (additional proposals for US$35M)
What if they join forces?

Page 18: A Common Infrastructure Opportunity

GriPhyN + PPDG + EU-DataGrid + national efforts (France, Italy, UK, Japan)
  Have agreed to collaborate and develop joint infrastructure
  Initial meeting March 4 in Amsterdam to discuss issues
  Future meetings in June, July
Preparing management document
  Joint management, technical boards + steering committee
  Coordination of people, resources
  An expectation that this will lead to real work
Collaborative projects
  Grid middleware
  Integration into applications
  Grid testbed: iVDGL
  Network testbed (Foster): T3 = Transatlantic Terabit Testbed

Page 19: iVDGL: International Virtual-Data Grid Laboratory

A place to conduct Data Grid tests at scale
A concrete manifestation of world-wide grid activity
A continuing activity that will drive Grid awareness
A basis for further funding
Scale of effort
  For national, international scale Data Grid tests, operations
  Computationally and data intensive computing
  Fast networks
Who
  Initially US-UK-EU
  Other world regions later
  Discussions with Russia, Japan, China, Pakistan, India, South America

Page 20: iVDGL Parameters

Local control of resources vitally important
  Experiments, politics demand it
  US, UK, France, Italy, Japan, ...
Grid exercises
  Must serve clear purposes
  Will require configuration changes that are not trivial
  “Easy”, intra-experiment tests first (10-20%, national, transatlantic)
  “Harder” wide-scale tests later (50-100% of all resources)
Strong interest from other disciplines
  Our CS colleagues (wide scale tests)
  Other HEP + NP experiments
  Virtual Observatory (VO) community in Europe/US
  Gravity wave community in Europe/US/(Japan?)
  Bioinformatics

Page 21: Revisiting the Infrastructure Path

Make a concrete plan
  GriPhyN + PPDG + EU DataGrid + national projects
Have clear focus on infrastructure and standards
  Already agreed
  COGS (Consortium for Open Grid Software) to drive standards?
Be driven by high-performance applications
  Applications are manifestly high-performance: LHC, GVO, LIGO/GEO/Virgo, …
  Identify challenges today to create tomorrow’s Grids

Page 22: Revisiting the Infrastructure Path (cont)

Leverage resources & act coherently
  Well-funded experiments depend on Data Grid infrastructure
  Collaborate with national laboratories: FNAL, BNL, RAL, Lyon, KEK, …
  Collaborate with other Data Grid projects: US, UK, France, Italy, Japan
  Leverage new resources: DTF, CAL-IT2, …
  Work through the Global Grid Forum
Build and maintain large-scale Grid testbeds
  iVDGL
  T3
Collaboration with industry: next slide
EC investment in this opportunity
  Leverage and extend existing projects, worldwide expertise
  Invest in testbeds
  Work with national projects (US/NSF, UK/PPARC, …)
  Part of same infrastructure

Page 23: Collaboration with Industry

Industry efforts are similar, but only in spirit
  ASP, P2P, home PCs, …
  IT industry mostly has not invested in Grid R&D
  We have different motives, objectives, timescales
Still many areas of common interest
  Clusters, storage, I/O
  Low cost cluster management
  High-speed, distributed databases
  Local and wide-area networks, end-to-end performance
  Resource sharing, fault-tolerance, …
Fruitful collaboration requires clear objectives
EC could play an important role in enabling collaborations

Page 24: Status of Data Grid Projects

GriPhyN
  US$12M funded by NSF/ITR 2000 program (5 year R&D)
  2001 supplemental funds requested for initial deployments
  Submitting 5-year proposal ($15M) to NSF
  Intend to fully develop production Data Grids
Particle Physics Data Grid
  Funded in 1999, 2000 by DOE ($1.2M per year)
  Submitting 3-year proposal ($12M) to DOE Office of Science
EU DataGrid
  10M Euros funded by EU (3 years, 2001 – 2004)
  Submitting proposal in April for additional funds
Other projects?

Page 25: Grid References

Grid Book: www.mkp.com/grids
Globus: www.globus.org
Global Grid Forum: www.gridforum.org
PPDG: www.ppdg.net
EU DataGrid: www.eu-datagrid.org/
GriPhyN: www.griphyn.org

Page 26: Summary

Grids will qualitatively and quantitatively change the nature of collaborations and approaches to computing
Global Data Grids provide the challenges needed to build tomorrow’s Grids
We have a major opportunity to create common infrastructure
Many challenges during the coming transition
  New grid projects will provide rich experience and lessons
  Difficult to predict the situation even 3-5 years ahead