EGEE-II INFSO-RI-031688
Enabling Grids for E-sciencE
www.eu-egee.org
EGEE and gLite are registered trademarks
Network in EGEE
Building end-to-end network services for the Grid
Mathieu Goutelle – CNRS UREC, France
EGEE-II SA2 “Networking support”
GridNets 2006 – 2006-10-01 – San Jose, CA, USA
Outline
• Short presentation of EGEE,
• The network in EGEE:
  – Network services?
  – EGEE focus on end-to-end services in a multi-domain context.
• Network services:
  – Resource reservation,
  – Service Level Agreement.
• Operational services:
  – Monitoring,
  – EGEE Network Operational Centre.
• Summary & conclusion
EGEE in a nutshell…
• EGEE:
  – 1 April 2004 – 31 March 2006
  – 71 partners in 27 countries, federated in regional Grids
• EGEE-II:
  – 1 April 2006 – 31 March 2008
  – 91 partners in 32 countries
  – 13 Federations
• Objectives:
  – Large-scale, production-quality infrastructure for e-Science
  – Attracting new resources and users from industry as well as science
  – Improving and maintaining “gLite” Grid middleware
EGEE in a nutshell…
• More than 20 applications from 7 domains:
  – Astrophysics: MAGIC, Planck
  – Computational Chemistry
  – Earth Sciences: Earth Observation, Solid Earth Physics, Hydrology, Climate
  – Financial Simulation: E-GRID
  – Fusion
  – Geophysics: EGEODE
  – High Energy Physics: 4 LHC experiments (ALICE, ATLAS, CMS, LHCb); BaBar, CDF, DØ, ZEUS
  – Life Sciences: Bioinformatics (Drug Discovery, GPS@, Xmipp_MLrefine, etc.); Medical imaging (GATE, CDSS, gPTM3D, SiMRI 3D, etc.)
  – Multimedia
  – Material Sciences
  – …
EGEE Infrastructure
[Figure: map of the countries participating in EGEE]
Scale (June 2006):
~ 200 sites in 40 countries
~ 25 000 CPUs
> 10 PB storage
> 35 000 jobs per day
> 100 Virtual Organizations
Network infrastructure
Connects 32 NRENs
Over 3M users
Network infrastructure (cont.)
End-to-end network services?
• What type of services?
  – Network services are available to the EGEE sites:
    - Premium IP and similar (e.g. QBSS),
    - “lightpath” or network resource reservation,
    - IPv6, multicast…
  – Operational services are available to the EGEE sites:
    - Monitoring of the network (local & backbone),
    - Operational data (incident, maintenance).
• How to ensure the service continuity along the path?
  – In the last mile?
  – In a multi-domain context?
• What about service availability, interface standardization, inter-domain agreements, etc.?
EGEE focus
• Network services:
  – Network resource reservation:
    - Bandwidth Allocation and Reservation (BAR),
    - Dedicated talk on that subject (see session 1, “End to End Bandwidth Allocation and Reservation for Grid applications”).
  – Service Level Agreements (SLAs):
    - End-to-end SLAs?
• Operational services:
  – Monitoring:
    - Network Performance Monitoring (NPM),
    - Dedicated talk on that subject (see session 2, “Federated Network Performance Monitoring for the Grid”).
  – Coordination of operational actions:
    - Concept of the EGEE Network Operational Centre (ENOC).
Network resource reservation
• Based on the framework currently being built by the GÉANT2 project:
  – Hides the multi-domain and multi-technology issues;
  – Provides at the Grid level:
    - A seamless interface for service requests at the “customer” layer;
    - A high-level view of the network, where requests specify characteristics rather than a particular service;
    - Reduced configuration lead-time;
    - A description of the service level.
• Issues remain:
  – A component (BAR, see dedicated talk) gives access to these interfaces at the middleware layer, but the application layer is not yet ready;
  – Need for sub-management of the macroscopic reserved resource at the Grid level;
  – What about domains outside the GÉANT2 cloud?
Quick look at the BAR architecture
• Clear demarcation between the Grid and the network:
  – The network is hidden from the Grid (technology, multi-domain issues…);
  – The Grid is hidden from the network (which only knows one “EGEE” user);
  – Allows a two-stage process (reservation & activation) suitable in a Grid context.
[Figure: BAR architecture diagram. Labels: EGEE (HLM; BAR at Site 1 and Site 2); Network (L-NSAP, NSAP; L-Network, Network 1, Network 2, Network 3); Extended QoS Network.]
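The two-stage process (reservation, then activation) can be illustrated with a small state machine. This is only a sketch under stated assumptions: the class names, the states and the single-capacity `ToyNetwork` are hypothetical and are not the BAR or GÉANT2 interfaces. It shows a request expressed as characteristics (endpoints, bandwidth) and a network side that refuses activation unless a reservation was made first.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    REQUESTED = "requested"
    RESERVED = "reserved"
    ACTIVE = "active"
    RELEASED = "released"


@dataclass
class Reservation:
    """A request described by characteristics, not by a named network service."""
    src_site: str
    dst_site: str
    bandwidth_mbps: int
    state: State = State.REQUESTED


class ToyNetwork:
    """Hypothetical stand-in for the network side of the demarcation."""

    def __init__(self, capacity_mbps: int) -> None:
        self.free_mbps = capacity_mbps

    def reserve(self, r: Reservation) -> bool:
        # Stage 1: book capacity ahead of time, without activating anything.
        if r.state is not State.REQUESTED or r.bandwidth_mbps > self.free_mbps:
            return False
        self.free_mbps -= r.bandwidth_mbps
        r.state = State.RESERVED
        return True

    def activate(self, r: Reservation) -> bool:
        # Stage 2: only a previously booked reservation can be activated.
        if r.state is not State.RESERVED:
            return False
        r.state = State.ACTIVE
        return True

    def release(self, r: Reservation) -> None:
        # Give the capacity back once the reservation is no longer needed.
        if r.state in (State.RESERVED, State.ACTIVE):
            self.free_mbps += r.bandwidth_mbps
            r.state = State.RELEASED
```

Separating the two stages lets a Grid job book capacity when it is scheduled and activate it only when the data transfer actually starts.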
SLAs
• “SLAs”?
  – Description of the characteristics of the service provided (e.g. after a successful resource reservation request);
  – Provided by each domain crossed by the data path;
  – Either filled in manually or generated automatically if the request is handled entirely by software;
  – Definition of templates in cooperation with GÉANT2:
    - Based on previous work inside EGEE and answers from GÉANT2 to some open issues (procedures, demarcation point…).
• SLA template:
  – Administrative part (contact, duration, troubleshooting procedures);
  – SLS (Service Level Specification) part.
• The end-to-end SLA is formed from the individual SLAs provided by all domains along the end-to-end path.
SLAs (cont.)
• EGEE end-to-end SLA template:
  – Concatenation of the individual SLAs of each participating domain;
  – SLA between the borders of the NREN cloud (border-to-border SLA).
• Difficulty of accommodating the “last mile”:
  – The “last-mile” network may not be participating (no resource reservation system, no SLA, etc.);
  – Try to address this with static information on these networks to provide service characteristics to the user/application.
[Figure: SLA diagram. Labels: EGEE RC A, Campus/MAN, NREN 1, GEANT, NREN 2, Campus/MAN, EGEE RC B; SLA 1, SLA 2, SLA 3; border-to-border connectivity; end-to-end connectivity.]
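One way to picture the concatenation of per-domain SLAs is to combine the SLS part of each domain along the path. The sketch below is an illustration only: the `SLS` fields and the composition rules (minimum bandwidth, summed delay, multiplied availability) are assumptions chosen for the example, not rules taken from the EGEE or GÉANT2 templates. The `static` flag marks values filled in from static information about a non-participating last-mile network.

```python
from dataclasses import dataclass
from math import prod
from typing import List


@dataclass(frozen=True)
class SLS:
    """Hypothetical Service Level Specification for one domain."""
    domain: str
    bandwidth_mbps: float
    one_way_delay_ms: float
    availability: float   # fraction between 0 and 1
    static: bool = False  # True if taken from static information, not an SLA


def concatenate(path: List[SLS]) -> SLS:
    """Combine per-domain specifications into one specification for the path."""
    if not path:
        raise ValueError("path must contain at least one domain")
    return SLS(
        domain=" + ".join(s.domain for s in path),
        # Assumed rule: the narrowest domain limits the bandwidth.
        bandwidth_mbps=min(s.bandwidth_mbps for s in path),
        # Assumed rule: delays add up along the path.
        one_way_delay_ms=sum(s.one_way_delay_ms for s in path),
        # Assumed rule: independent availabilities multiply.
        availability=prod(s.availability for s in path),
        # The result is only indicative if any segment relied on static data.
        static=any(s.static for s in path),
    )
```

Calling `concatenate` on the NREN and backbone segments alone gives a border-to-border view; adding the campus segments with `static=True` gives an end-to-end view that is flagged as partly indicative.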
SLA institution
• All domains involved in network services provisioning to EGEE as part of the existing network infrastructure hierarchy have to be categorized as one of:
  – Compliant with the Premium IP service,
  – Supportive of the Premium IP service,
  – Indifferent to the Premium IP service.
EGEE focus
• Network services:
  – Network resource reservation:
    - Bandwidth Allocation and Reservation (BAR),
    - Dedicated talk on that subject (see session 1, “End to End Bandwidth Allocation and Reservation for Grid applications”).
  – Service Level Agreements (SLAs):
    - End-to-end SLAs?
• Operational services:
  – Monitoring:
    - Network Performance Monitoring (NPM),
    - Dedicated talk on that subject (see session 2, “Federated Network Performance Monitoring for the Grid”).
  – Operational Interface with the network:
    - Concept of the EGEE Network Operational Centre (ENOC).
Monitoring
• Not Yet Another Monitoring Framework!
  – Role of a Mediator between the various monitoring frameworks and the various clients (diagnostic tools, middleware, etc.);
  – Network Performance Monitoring (NPM) gives access to data collected by existing monitoring frameworks (site, backbone);
  – Use of the NMWG interface to access those frameworks and republish data;
  – Special requirements for some middleware components for faster access to data.
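The mediator role can be sketched as a thin dispatcher that forwards one client query to every registered monitoring framework and republishes the answers in a single shape. Everything here is hypothetical: the `Mediator` class, the backend signature and the result fields are assumptions for illustration, and the sketch does not implement the NMWG schema or the actual NPM interfaces.

```python
from typing import Callable, Dict, List, Optional

# A backend answers (source, destination, metric) with a value, or None if
# the framework has no data for that query. This signature is an assumption.
Backend = Callable[[str, str, str], Optional[float]]


class Mediator:
    """Hypothetical mediator between clients and monitoring frameworks."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, name: str, backend: Backend) -> None:
        # Each existing framework (site, backbone...) plugs in as a backend.
        self._backends[name] = backend

    def query(self, src: str, dst: str, metric: str) -> List[dict]:
        # Ask every framework and republish the answers in one common shape.
        results = []
        for name, backend in self._backends.items():
            value = backend(src, dst, metric)
            if value is not None:
                results.append({
                    "framework": name,
                    "src": src,
                    "dst": dst,
                    "metric": metric,
                    "value": value,
                })
        return results
```

Clients such as diagnostic tools or middleware only talk to the mediator, so a new monitoring framework can be added without changing them.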
Operational Interface
• The network infrastructure of EGEE is mainly served by a set of NRENs via GÉANT2;
• Need for an entity coordinating all the NOCs involved and the Grid Operations:
  – Concept of an end-to-end Coordination Unit (GÉANT2);
  – Providing end-to-end operational support.
• A single point of contact as an operational interface between EGEE and GÉANT2/NRENs dealing with:
  – Network problem troubleshooting,
  – Interactions with network providers and Grid sites,
  – Notifications from NRENs,
  – Network SLA installation and monitoring.
• Two Functional Entities inside EGEE:
  – EGEE Network Operational Centre (ENOC);
  – A Network Trouble Ticket Manager – GGUS.
ENOC
• From the EGEE point of view:
  – GGUS acts as the first-line support (interacts with the user);
  – Support units are the second-level support.
• From the NRENs’ point of view:
  – EGEE (via the ENOC) is a single entity;
  – The ENOC is the only point of contact for the NRENs (submitter of the problem).
[Figure: ENOC support diagram. Labels: Users, GGUS, Support Units, ENOC, NRENs, GÉANT2, EGEE Network.]
ENOC (cont.)
• Main challenges:
  – To create a network support structure inside EGEE;
  – To define the associated network operational procedures.
• The ENOC is the user support for network failures:
  – End-to-end network problem troubleshooting;
  – Coordination unit for the actions of all the entities involved in a network incident;
  – Try to have an overall view of the end-to-end service, gathering information from all the involved domains;
  – SLA management: installation and monitoring.
• ENOC Operational Procedures have been defined and validated during the first phase of EGEE;
• EGEE-II will fully implement ENOC.
ENOC (cont.)
• ENOC Service:
  – Collect tickets from NRENs which agree to provide them to the ENOC;
  – Forward to GGUS the ones that seem relevant (possible impact on the Grid infrastructure);
  – Receive tickets assigned to ENOC by the GGUS 1st level support;
  – Troubleshoot them with the help of monitoring tools;
  – Contact identified faulty domains or reassign ticket to the associated site if there is no evidence of a backbone problem (e.g. LAN issue).
• Main Issues:
  – Load on the ENOC team (amount of info, etc.);
  – Heterogeneity of systems the ENOC has to deal with (languages, trouble ticket format, monitoring, etc.).
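The two decision points in the ENOC service can be written down as small functions. This is a sketch under assumptions, not the ENOC procedures themselves: the `NrenTicket` fields, the relevance test (an NREN ticket is relevant if it touches at least one EGEE site) and the returned action strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class NrenTicket:
    """Hypothetical NREN ticket, reduced to what the relevance test needs."""
    nren: str
    affected_sites: Set[str]


def forward_to_ggus(ticket: NrenTicket, egee_sites: Set[str]) -> bool:
    """Forward only tickets with a possible impact on the Grid infrastructure.

    Assumed criterion: the ticket affects at least one known EGEE site.
    """
    return bool(ticket.affected_sites & egee_sites)


def route_assigned_ticket(site: str, faulty_domain: Optional[str]) -> str:
    """Choose the next action for a ticket that GGUS assigned to the ENOC.

    faulty_domain is the domain identified during troubleshooting, or None
    when there is no evidence of a backbone problem (e.g. a LAN issue).
    """
    if faulty_domain is not None:
        return f"contact:{faulty_domain}"
    return f"reassign:{site}"
```

Filtering at the first step is what keeps the load on the ENOC team and on GGUS manageable when many NRENs send their tickets.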
ENOC status
• ENOC team is ready!
  – 5 people (2 FTE) including one dedicated to it.
• ENOC receives operational information from GÉANT2 and 10 NRENs (more to come):
  – About 80% of all the EGEE sites covered;
  – An average of 5 tickets handled per day;
  – 8 different languages.
• Building tools to follow up or enhance the network support:
  – Network Operational Database (interconnection of administrative domains between the EGEE resource centres);
  – TT parsing and filtering tool;
  – Dashboard to present overall status of the “EGEE network”.
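A trouble ticket (TT) parsing and filtering tool has to bring heterogeneous ticket formats into one common shape before anything else can be done with them. The sketch below assumes two invented source formats ("NREN-A" and "NREN-B") with made-up field names, plus a hypothetical common `Ticket` record; none of this reflects the real ticket formats or the actual ENOC tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Ticket:
    """Hypothetical common format for tickets from any source."""
    source: str
    ticket_id: str
    kind: str      # "incident" or "maintenance"
    summary: str


def parse_nren_a(raw: dict) -> Ticket:
    # Invented format A: numeric id, free-text type, title field.
    return Ticket("NREN-A", str(raw["id"]), raw["type"].lower(), raw["title"])


def parse_nren_b(raw: dict) -> Ticket:
    # Invented format B: string reference, boolean "planned" flag.
    kind = "maintenance" if raw["planned"] else "incident"
    return Ticket("NREN-B", raw["ref"], kind, raw["description"])


# One parser per source; adding a source means adding one entry here.
PARSERS: Dict[str, Callable[[dict], Ticket]] = {
    "NREN-A": parse_nren_a,
    "NREN-B": parse_nren_b,
}


def normalize(source: str, raw: dict) -> Ticket:
    """Convert a raw ticket from a known source into the common format."""
    return PARSERS[source](raw)


def filter_by_keywords(tickets: Iterable[Ticket], keywords: Iterable[str]) -> List[Ticket]:
    """Keep tickets whose summary mentions any keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    return [t for t in tickets if any(k in t.summary.lower() for k in wanted)]
```

With one parser per source, the rest of the tooling (filtering, dashboard) only ever sees the common format.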
EGEE expectations
• Towards a better solution to our “multi-domain” and “end-to-end” issues
• Seamless access to network monitoring data:
  – GÉANT2 will provide such access (perfSONAR), from multiple domains, aggregating data from multiple frameworks;
• Network resource reservation:
  – Requests expressed not in terms of service but of characteristics;
  – The choice of the underlying technology to fulfil them is up to the network;
  – Answer to a request = SLA (depending on the current network status & load);
  – What about the last mile? The non-NREN domains?
• Standardization of the operational interface:
  – Trouble Ticket format (data schema and exchange format);
  – Access method.
Summary & conclusion
• Focus on providing end-to-end services in a multi-domain context:
  – Hiding the network complexity from the Grid (users, middleware, Grid support);
  – Hiding the Grid complexity from the network (single point of contact, operational interface);
• Many building blocks depend on the providers:
  – Resource reservation frameworks, SLA installation, backbone monitoring;
  – Fortunately, EGEE and GÉANT2 built up a strong collaboration!
• Many things remain pending:
  – Mainly on the operational side (homogenization of the network interface);
  – How to cope with domains outside the GÉANT2 cloud?
• The two infrastructures need to collaborate on these aspects.
Thank you for your attention!