An introduction to network performance monitoring with perfSONAR
www.geant.org
Szymon Trocha (Poznań Supercomputing and Networking Center), WP6T3
DeiC, Fredericia (DK), 31 October 2019
Public
Motivations
• Identify problems when they happen or (better) earlier
• The tools must be available (at campus endpoints, at demarcations between networks, at exchange points, and near data resources such as storage and computing elements)
• Access to testing resources
Source: https://www.deic.dk/
Heterogeneous world
• The global Research & Education network ecosystem comprises hundreds of international, national, regional and local-scale networks
• While these networks all interconnect, each network is owned and operated by a separate organization (called a “domain”) with different policies, customers, funding models, hardware, bandwidth and configurations
• This complex, heterogeneous set of networks must operate seamlessly from “end to end” to support your science and research collaborations that are distributed globally
Challenges
• Delivering end-to-end performance
• Get the user, service delivery teams, local campus and metro/backbone network operators working together effectively
• Have tools in place
• Know your (network) expectations
• Be aware of network troubleshooting
Source: https://www.deic.dk/
What is perfSONAR?
• It’s infeasible to perform at-scale data movement all the time – as we see in other forms of science, we need to rely on simulations
• perfSONAR can be used to:
  • Set network performance expectations
  • Find network problems (“soft failures”)
  • Help fix these problems
  • All in multi-domain environments
• These problems are all harder when multiple networks are involved
• perfSONAR provides a standard way to publish monitoring data
• This data is interesting to network researchers as well as network operators
The good old times of iperf
• A good TOOL
• But a server has to be started on the remote end
• perfSONAR is a framework using a set of tools
  • Including iperf
Source: https://www.deic.dk/
The Toolkit
• Network performance comes down to a couple of key metrics:
  • Throughput (e.g. “how much can I get out of the network”)
  • Latency (time it takes to get to/from a destination)
  • Packet loss/duplication/ordering (for some sampling of packets, do they all make it to the other side without serious abnormalities occurring?)
• We can get many of these from a selection of measurement tools: the perfSONAR Toolkit
  • Plus more, like disk-to-disk transfer, HTTP or DNS request time
• The “perfSONAR Toolkit” is an open source implementation and packaging of the perfSONAR measurement infrastructure and protocols
• All components are available as RPMs, DEBs, and bundled as a CentOS ISO
• Very easy to install and configure (usually takes less than 30 minutes for default install)
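The key metrics listed on this slide can be illustrated with a small Python sketch. This is not how the perfSONAR tools compute them; the function names, the sequence-number input and the sample values are assumptions made for illustration only.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class PacketStats:
    sent: int
    lost: int
    duplicated: int
    reordered: int

    @property
    def loss_ratio(self) -> float:
        # Fraction of sent packets that never arrived.
        return self.lost / self.sent if self.sent else 0.0


def summarize_packets(sent: int, received: List[int]) -> PacketStats:
    """Count loss, duplication and reordering from received sequence numbers.

    Assumes packets were sent with sequence numbers 0..sent-1 (hypothetical
    input format). A packet counts as reordered if it arrives after a packet
    with a higher sequence number.
    """
    seen = set()
    duplicated = 0
    reordered = 0
    highest = -1
    for seq in received:
        if seq in seen:
            duplicated += 1
            continue
        seen.add(seq)
        if seq < highest:
            reordered += 1
        else:
            highest = seq
    return PacketStats(sent, sent - len(seen), duplicated, reordered)


def throughput_mbps(bytes_transferred: int, seconds: float) -> float:
    """Throughput in megabits per second."""
    return bytes_transferred * 8 / seconds / 1_000_000


def typical_latency_ms(rtt_samples_ms: List[float]) -> float:
    """Median round-trip time; the median is less sensitive to outliers."""
    return median(rtt_samples_ms)


# Made-up sample: packet 6 is lost, packet 5 arrives twice, packet 3 is late.
stats = summarize_packets(10, [0, 1, 2, 4, 3, 5, 5, 7, 8, 9])
```

With the made-up sample above, `stats` reports one lost, one duplicated and one reordered packet out of ten sent.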
The importance of regular testing
• We can’t wait for users to report problems and then fix them
• Things just break sometimes
  • Bad system or network tuning
  • Failing optics
  • Somebody messed around in a patch panel and kinked a fiber
  • Hardware goes bad
• Problems that get fixed have a way of coming back
• System defaults come back after hardware/software upgrades
• New employees may not know why the previous employee set things up a certain way and back out fixes
• Important to continually collect, archive, and alert on active test results
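As a rough illustration of alerting on archived test results, the Python sketch below flags paths whose latest throughput falls well below their own history. The data layout, the path names and the 80% threshold are assumptions for this example, not perfSONAR or MaDDash behaviour.

```python
from statistics import median
from typing import Dict, List


def find_alerts(
    history: Dict[str, List[float]],
    latest: Dict[str, float],
    min_fraction: float = 0.8,
) -> List[str]:
    """Return paths whose latest result is below min_fraction of the
    median of their archived results.

    history: archived throughput per path, in Gbps (hypothetical layout).
    latest:  most recent throughput per path, in Gbps.
    """
    alerts = []
    for path, samples in history.items():
        if not samples or path not in latest:
            continue  # nothing to compare against
        baseline = median(samples)
        if latest[path] < min_fraction * baseline:
            alerts.append(path)
    return sorted(alerts)


# Made-up archive and latest results for two directions of one path.
history = {"a->b": [9.1, 9.3, 9.2], "b->a": [9.0, 9.4, 9.2]}
latest = {"a->b": 9.0, "b->a": 4.1}
alerts = find_alerts(history, latest)
```

Comparing each path against its own baseline, rather than a single global number, is one way to catch a fix that has been quietly backed out on just one path.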
perfSONAR deployment possibilities
• Dedicated server (example)
  • A single CPU with multiple cores (2.7 GHz for 10 Gbps tests)
  • 4 GB RAM
  • 1 Gbps onboard (mgmt + delay)
  • 10 Gbps PCI-slot NIC (throughput)
• Low cost: small PC
  • e.g. GIGABYTE BRIX GB-BACE-3150, Intel NUC
• Ad hoc testing
  • Docker
Deployment styles
Beacon
Island
Mesh
Measurement node location criteria
• Where can it be integrated into the facility software/hardware management systems?
• Where can it do the most good for the network operators or users?
• Where can it do the most good for the community?
Figure labels: EDGE, NEXT TO SERVICES
Mesh building and results visualization
• Key challenges
  • Scheduling the tasks you want to run at each location
  • Visualization components to display results of the measurements from multiple hosts
• pSConfig is a template framework for describing and configuring a topology of tasks. If you manage more than one perfSONAR host, it helps by providing tools to automate each of the configuration tasks listed above.
• pSConfig Web Admin (PWA) is a web-based UI for perfSONAR administrators to define and publish pSConfig meshes, which automate the tests executed by test nodes and provide topology information to various services, such as MaDDash.
• MaDDash collects and presents two-dimensional monitoring data as a set of grids referred to as a dashboard.
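To make the idea of a template describing a topology of tasks more concrete, here is a minimal Python sketch that builds a pSConfig-style template as a dictionary and prints it as JSON. The section and field names follow the pSConfig template layout as recalled from the perfSONAR documentation and should be treated as assumptions; check any real template against the official documentation. The host names, the 30-second duration and the six-hour repeat interval are hypothetical.

```python
import json
from typing import Dict, List

# Hypothetical host names; replace with real measurement hosts.
HOSTS = ["ps-a.example.org", "ps-b.example.org", "ps-c.example.org"]


def build_template(hosts: List[str]) -> Dict:
    """Build a dict shaped like a pSConfig template (layout is an assumption)."""
    return {
        # Every host that takes part in the measurements.
        "addresses": {h: {"address": h} for h in hosts},
        # A full mesh: every host tests to every other host.
        "groups": {
            "example_mesh": {
                "type": "mesh",
                "addresses": [{"name": h} for h in hosts],
            }
        },
        # What to measure; the placeholders are filled in per host pair.
        "tests": {
            "example_throughput": {
                "type": "throughput",
                "spec": {
                    "source": "{% address[0] %}",
                    "dest": "{% address[1] %}",
                    "duration": "PT30S",
                },
            }
        },
        # When to measure (ISO 8601 duration).
        "schedules": {"every_6_hours": {"repeat": "PT6H"}},
        # A task ties together who, what and when.
        "tasks": {
            "example_task": {
                "group": "example_mesh",
                "test": "example_throughput",
                "schedule": "every_6_hours",
            }
        },
    }


def references_resolve(template: Dict) -> bool:
    """Return True if every task points at a defined group, test and schedule."""
    for task in template["tasks"].values():
        if task["group"] not in template["groups"]:
            return False
        if task["test"] not in template["tests"]:
            return False
        if task["schedule"] not in template["schedules"]:
            return False
    return True


template = build_template(HOSTS)
print(json.dumps(template, indent=2))
```

The point of the sketch is the separation of concerns: hosts, groups, tests and schedules are declared once, and a task only refers to them by name.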
perfSONAR how-to in 3 minutes
HOSTS
Choose your home
• Connect to network
Install Toolkit software
• By site administrator
Configure hosts
• Networking
• 2 interfaces
• Visibility
Point to central host
• To consume central mesh configuration
CENTRAL SERVER
Install and configure
• By mesh administrator
• Central data storage
• Dashboard GUI
• Home for mesh configuration
Configure mesh
• Who, what and when
• Every 6 hours (bandwidth)
• Be careful 10G -> 1G
Publish mesh configuration
• To be consumed by measurement hosts
Run dashboard
• Observe thresholds
• Look for errors
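The "who, what and when" decisions can be sanity-checked with a little arithmetic before the mesh is published. The Python sketch below lists the host pairs of a full mesh, flags pairs where a faster sender tests towards a slower receiver (one reading of the "10G -> 1G" caution), and estimates how much of each interval a host spends in throughput tests. Host names, link speeds and the 30-second test length are assumptions for illustration.

```python
from itertools import permutations
from typing import Dict, List, Tuple

# Hypothetical hosts and their link speeds in Gbps.
LINK_GBPS = {"ps-a": 10.0, "ps-b": 10.0, "ps-c": 1.0}


def mesh_pairs(hosts: List[str]) -> List[Tuple[str, str]]:
    """All ordered (source, destination) pairs in a full mesh."""
    return list(permutations(hosts, 2))


def risky_pairs(link_gbps: Dict[str, float]) -> List[Tuple[str, str]]:
    """Pairs where the source link is faster than the destination link."""
    return [
        (src, dst)
        for src, dst in permutations(link_gbps, 2)
        if link_gbps[src] > link_gbps[dst]
    ]


def busy_fraction(n_hosts: int, test_seconds: float, interval_hours: float) -> float:
    """Fraction of each interval one host spends in throughput tests.

    Assumes tests involving a host run one at a time, and that each host
    takes part in 2 * (n_hosts - 1) tests per interval (once as source and
    once as destination for every other host).
    """
    tests_per_host = 2 * (n_hosts - 1)
    return tests_per_host * test_seconds / (interval_hours * 3600)


pairs = mesh_pairs(list(LINK_GBPS))
risky = risky_pairs(LINK_GBPS)
load = busy_fraction(len(LINK_GBPS), test_seconds=30, interval_hours=6)
```

For this three-host example the mesh has six ordered pairs, two of which send from a 10G host to the 1G host, and each host is busy with throughput tests for well under 1% of a six-hour interval.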
Performance Measurement Platform (PMP): an example managed service
• Consists of a set of low-cost hardware nodes with preinstalled perfSONAR software
• The central components, which manage the platform elements and gather, store and present the performance data, are operated and maintained by the GÉANT project
• Coupled with GÉANT MPs to create a partial mesh for NRENs
• Small-node users can shape the predefined setup, configure additional measurements to their needs and get more familiar with the platform
• Can become an example experimentation and training platform for network measurement, network management and network performance
• Can provide an easy way to set up new perfSONAR small nodes on new small devices (not tied to the hardware) by providing ways to create images, plus guidelines
• Can serve as an example of a managed service
A Production GÉANT Service
Throughput
Latency / packet loss
IPv6 (and IPv4)
PMP update: current coverage
• +3 in African NRENs
• Some countries have >1
Map labels (countries covered): UK, PT, ES, NL, BY, EE, LT, DE, BE, RO, CY, AT, HU, RS, IT, ME, GE, AM, DK, PL, SI, HR, AL, FR, IE, LU, UA, GR
perfSONAR latest changes (4.2.0 - 4.2.2) and near future
• 4.2.0 contains a number of changes including:
  • New pScheduler Disk-to-Disk test
  • pScheduler Task Priorities
  • pSConfig Web Admin (PWA) RPMs
  • perfSONAR Ansible Roles
  • MaDDash ServiceNow Integration
• 4.2.1, 4.2.2
  • Bug fix releases
• 4.3 (1Q2020) plans
  • Transition to Python 3
• 4.4 (2H2020) plans
  • Improved archiving to non-Esmond sources (e.g. Elasticsearch)
  • Grafana integration support
Get Involved!
• http://www.perfsonar.net/
• http://docs.perfsonar.net/
• http://www.youtube.com/perfSONARProject/
• perfSONAR Consultancy and Expertise service
  • What can we help you with?
Thank you
Any questions?
© GÉANT Association on behalf of the GN4 Phase 3 project (GN4-3). The research leading to these results has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No. 856726 (GN4-3).
The scientific/academic work is financed from financial resources for science in the years 2019 - 2022 granted for the realization of the international project co-financed by Polish Ministry of Science and Higher Education.