Introduction LHCOPN dashboard (proposal functional design)
Monitor Working Group:
• Initiated in Bologna 10th & 11th December 2009
• WLCG MB mandate (see url below)
• First meeting 22nd January 2010
• TC 26th May 2010
• TC 15th June 2010
• Barcelona 28th and 29th June 2010: first proposal
Chairman: John Shade (CERN)
Website: https://twiki.cern.ch/twiki/bin/view/LHCOPN/MonWG
The full version of the functional design proposal is available at the URL above.
Presenter: Hanno Pet <[email protected]> (NL-T1 / SARA)
SARA Computing & Networking service, 25-6-2010
The problem
LHC experiments and WLCG users do not have enough insight into the functioning of the LHCOPN because:
• Monitoring is decentralized at T0/T1 sites
• Monitoring is not accessible to them
The dashboard should solve these problems!
Requirements (1/4)
The requirements of the dashboard are as follows:
• Must only provide information about the LHCOPN, keeping in mind how the application layers use the LHCOPN. This means a full mesh of measurements is required
• Must provide correct and up-to-date information about each site’s IPv4 connectivity in the LHCOPN
• Must be simple for the LHC experiments and the WLCG user community
• Must provide more in-depth information for the T0/T1 sites router operators. The router operators must be able to drill down into the dashboard to see which measurements are causing the degraded or down status
Requirements (2/4)
• Must display a full mesh of end-to-end IPv4 unicast connectivity in the LHCOPN between each T0/T1 site
• Must use the application programming interface (API) of the perfSONAR-MDM measurement points to collect the data which is necessary for the functioning of the dashboard
• Must collect and display One Way Delay data gathered by the perfSONAR-MDM measurement points (and other parameters in the future)
• Must store (historical) data in its own database
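To give a feel for what a full mesh means in terms of measurement count, the sketch below enumerates every directed site pair. The site names are taken from the example connectivity matrix in this presentation; the function name `full_mesh` is hypothetical and not part of perfSONAR-MDM.

```python
from itertools import permutations

# T0/T1 sites as named in the example connectivity matrix of this presentation.
SITES = [
    "CA-TRIUMF", "CH-CERN", "DE-KIT", "ES-PIC", "FR-CC-IN2P3", "IT-INFN-CNAF",
    "NDGF", "NL-T1", "TW-ASGC", "UK-T1-RAL", "US-BNL", "US-FNAL-CMS",
]


def full_mesh(sites):
    """Return every directed (source, destination) pair, excluding self-pairs."""
    return list(permutations(sites, 2))


pairs = full_mesh(SITES)
print(len(pairs))  # 12 sources * 11 destinations = 132 directed pairs
```

Because One Way Delay is measured per direction, A to B and B to A count as separate measurements.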
Requirements (3/4)
• Must add new data from perfSONAR-MDM measurement points to its own database every <to be defined> minute(s)
• Must refresh the dashboard status every <to be defined> minute(s)
• Must provide an API for T0/T1 sites to generate alarms in their own NMS
• Must be able to make end-to-end IPv4 unicast connectivity reports
Requirements (4/4)
• Must be accessible via a web (https) interface for the LHC experiments and WLCG users with a grid certificate
• More detailed information will be available for the T0/T1 sites router operators with a grid certificate
• Must provide an explanation of the impact if end-to-end IPv4 unicast connectivity between two sites becomes degraded or down, or if no data is available
Current perfSONAR-MDM implementation in LHCOPN (1/2)
The GEANT application service desk has installed perfSONAR-MDM measurement points at each T0/T1 site with the following applications/tools:
• Weathermap based on End to End Monitoring (E2EMON) information
• E2EMON information (no E2EMON measurement point)
• perfSONAR User Interface (UI)
• Alarm Service (Prototype based on Nagios)
Current perfSONAR-MDM implementation in LHCOPN (2/2)
• Hades Performance Measurements
• Bandwidth Test Control / Achievable Bandwidth (BWCTL, automated 1Gbit/s TCP Bandwidth Control Test)
• One Way Delay (OWD) measurements using OWAMP
• One Way Delay Variance / Jitter (OWDV) measurements using OWAMP
• Packet loss (measured between Hades nodes)
• Traceroute (number of hops between each pair of Hades nodes)
• Possibly duplicate packets (measured between Hades nodes)
• Possibly out of order packets (measured between Hades nodes)
Current perfSONAR-MDM setup and future dashboard
Dashboard approach
The first version of the dashboard must be based on:
• The “keep it simple” principle
• The data which perfSONAR-MDM is already collecting at the moment
The proposal is to base the first version of the dashboard on One Way Delay (OWD) measurements, made with the One Way Active Measurement Protocol (OWAMP), to “monitor” end-to-end IPv4 connectivity between each pair of sites in the LHCOPN (full mesh).
So OWAMP is “only” used to monitor connectivity and not yet used to monitor the delay itself.
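As a rough illustration of using OWD probes purely as a connectivity signal, the sketch below turns a series of probe results into an availability percentage. The input format (one entry per probe, `None` for a probe that never arrived) and the function name `availability` are assumptions made for this example; the presentation does not specify how perfSONAR-MDM exposes these data.

```python
from typing import Optional, Sequence


def availability(delays_ms: Sequence[Optional[float]]) -> Optional[float]:
    """Percentage of probes that arrived in the given timeframe.

    Each entry is the measured one-way delay in milliseconds, or None for a
    lost probe (assumed format). Returns None when there are no records.
    """
    if not delays_ms:
        return None  # nothing was returned for this timeframe
    received = sum(1 for delay in delays_ms if delay is not None)
    return 100.0 * received / len(delays_ms)


# The delay values themselves are ignored; only arrival counts.
print(availability([12.1, 12.3, None, 12.2]))  # 75.0
```

Keeping the "no records" case separate from 0% matters because the dashboard distinguishes a path that is down from a path for which no data is available.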
Later versions of the dashboard could include parameters that are new(er) to perfSONAR-MDM (e.g. packet loss, traceroute, achievable bandwidth, interface status, BGP status, OWD and OWDV).
How it might look (1/3) (current view)
End-to-end IPv4 unicast connectivity availability (current view)

From \ To | CA-TRIUMF | CH-CERN | DE-KIT | ES-PIC | FR-CC-IN2P3 | IT-INFN-CNAF | NDGF | NL-T1 | TW-ASGC | UK-T1-RAL | US-BNL | US-FNAL-CMS
CA-TRIUMF | - | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
CH-CERN | 100% | - | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
DE-KIT | 100% | 100% | - | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 0%
ES-PIC | 100% | 100% | 100% | - | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
FR-CC-IN2P3 | 100% | 100% | 100% | 100% | - | 100% | 100% | 100% | 100% | 100% | 100% | 100%
IT-INFN-CNAF | 100% | 100% | 100% | 100% | 100% | - | 100% | 100% | 75% | 100% | 100% | 100%
NDGF | 100% | 100% | 100% | 100% | 100% | 100% | - | 100% | 100% | 100% | 100% | 100%
NL-T1 | 100% | 100% | 100% | 100% | 100% | 0% | 100% | - | 100% | 100% | 100% | 100%
TW-ASGC | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | - | 100% | 100% | 100%
UK-T1-RAL | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | - | 100% | 100%
US-BNL | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | - | 100%
US-FNAL-CMS | 100% | 100% | 100% | 100% | 100% | No data | 100% | 100% | 100% | 100% | 100% | -

Date and time: 17-6-2010 12:30 UTC

What does "Normal" mean?
What does "Degraded" mean?
What does "Down" mean?
What does "No data" mean?
How it might look (2/3) (hourly view)
End-to-end IPv4 unicast connectivity availability, daily view, 17-6-2010
Columns: hour 1 to 24, then Availability
From CA-TRIUMF to NL-T1: 91%
How it might look (3/3) (weekly view)
End-to-end IPv4 unicast connectivity availability, weekly view, 17-6-2010
Columns: day 1 to 7, then Availability
From IT-INFN-CNAF to US-BNL: 95%
From IT-INFN-CNAF to CH-CERN: 100%
From IT-INFN-CNAF to US-FNAL-CMS: 85%
From IT-INFN-CNAF to FR-CC-IN2P3: 100%
From IT-INFN-CNAF to DE-KIT: 100%
From IT-INFN-CNAF to NDGF: 100%
From IT-INFN-CNAF to NL-T1: 85%
From IT-INFN-CNAF to ES-PIC: 100%
From IT-INFN-CNAF to UK-T1-RAL: 100%
From IT-INFN-CNAF to TW-ASGC: 100%
From IT-INFN-CNAF to CA-TRIUMF: 96%
Status on the dashboard
The status of the end-to-end IPv4 unicast connectivity between sites must be shown on the dashboard in the following way:
• Normal, availability of the end-to-end IPv4 unicast connectivity between site A and B is 100% in the given timeframe
• Degraded, availability of the end-to-end IPv4 unicast connectivity between site A and B is less than 100% but more than 0% in the given timeframe
• Down, availability of the end-to-end IPv4 unicast connectivity between site A and B is 0% in the given timeframe
• No data, the dashboard server can connect to the perfSONAR-MDM measurement point on site but receives no data from the measurement archives.
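The four states above could be derived from an availability percentage roughly as follows. This is a minimal sketch: the function name `status` and the convention of passing `None` when the measurement archives returned nothing are assumptions for illustration. The sketch checks for Down before Degraded, so that 0% is reported as Down.

```python
from typing import Optional


def status(availability_pct: Optional[float]) -> str:
    """Map an availability percentage for one timeframe to a dashboard state."""
    if availability_pct is None:
        return "No data"   # measurement point reachable, archives returned nothing
    if availability_pct >= 100.0:
        return "Normal"
    if availability_pct <= 0.0:
        return "Down"
    return "Degraded"      # anything strictly between 0% and 100%


for value in (100.0, 75.0, 0.0, None):
    print(value, "->", status(value))
```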
Notifications
Notification should be done via:
• RSS-feeds
• API for integration into T0/T1 site NMS systems for raising alarms
• Grid Notifications for LHC experiments
We need to discuss this with grid notification experts at the LHC experiments and ask them how they would integrate this in their dashboards.
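One possible shape for the data behind such notifications is a list of status changes per site pair, which an RSS feed or a site NMS could then consume. The sketch below is purely illustrative: the function name `status_changes` and the field names `from`, `to`, `old` and `new` are hypothetical, since the notification format has not been defined.

```python
import json


def status_changes(previous, current):
    """Yield one record per directed site pair whose status changed."""
    for pair, new_status in sorted(current.items()):
        old_status = previous.get(pair)
        if old_status != new_status:
            yield {"from": pair[0], "to": pair[1], "old": old_status, "new": new_status}


# Statuses keyed by (source site, destination site); values are dashboard states.
previous = {("DE-KIT", "US-FNAL-CMS"): "Normal", ("NL-T1", "CH-CERN"): "Normal"}
current = {("DE-KIT", "US-FNAL-CMS"): "Down", ("NL-T1", "CH-CERN"): "Normal"}

alarms = list(status_changes(previous, current))
print(json.dumps(alarms))
```

Reporting only transitions, rather than the full matrix on every refresh, keeps the number of alarms raised in a site NMS small.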
Questions
Interesting to know:
• Is this the right direction for the dashboard?
• Is perfSONAR-MDM able to support this?
• Is it possible to use OWAMP like this?
• Are T0/T1 sites going to use this?
• Are the LHC experiments going to use this?
• Are WLCG users (physicists) going to use this?
• Do we agree on the functional design?
WRAP UP
Read the full version of the functional design!
Please send your comments about this functional design to [email protected] before the 5th of July 2010!!
Thank you for your attention!