An information services algorithm to heuristically summarize IP addresses for a distributed, hierarchical directory service

Marcos Portnoi*, Martin Swany*, Jason Zurawski†

*Computer and Information Sciences Dept., University of Delaware ([email protected], [email protected])
†Internet2 ([email protected])


What is perfSONAR?

• Infrastructure for network performance monitoring, making it easier to solve end-to-end performance problems on paths crossing several networks.
• Service Oriented
  – Each service has a well-defined function
  – Construct your own framework of arbitrary size and complexity
• Open Protocols (developed through the OGF’s NM-WG/NMC-WG/NML-WG)
• Consortium of Organizations
  – ESnet
  – GÉANT2/GÉANT3
  – Internet2
  – RNP
• Software Releases/Products
  – perfSONAR-MDM: Java based
  – perfSONAR-PS: Perl based

The architecture of perfSONAR

[Figure: the architecture of perfSONAR]

• Data Services
  – Close to the network: performing, storing, and exchanging measurements
  – Vary in type and capability
  – Interoperable via the aforementioned protocols
• Information Services
  – “Glue” that holds the infrastructure together
  – Locating information and services wherever they may be
  – Controlling access to services or altering the view of the available data
• Presentation (Analysis/Visualization)
  – Doing “useful” work, e.g. rendering performance data as a graph
  – Transforming data into other formats
  – Raising alarms based on prior baselines or when certain conditions are met

perfSONAR Information Services

• Each service and client is “aware” of the Information Services Plane
  – May be statically configured
  – Could dynamically locate where it may be (or find the closest instance)
• Services: must register their location and capabilities on a regular basis
  – Services have a name, associated domain, contact information
  – Services more than likely have measurement data (e.g. interfaces they are monitoring, or a set of host pairs that perform active tests)
    • Consist of a data type
    • Hostname/IP information
    • Other “Metadata”
  – Regular “push” of data to the Information Services Plane (e.g. facilitates a “heartbeat” to establish that the service is operational)
• Clients: interested in locating services
  – Act on behalf of the end user to do something useful
  – Interested in data of specific types, but may not know the address of a service
  – Consult the Information Services with specific questions (e.g. “I want metric A for domain X”)

perfSONAR Information Services

• Idea is similar to DNS for measurement services, and is not new (e.g. Globus MDS, gLite BDII)
• Home Lookup Service (hLS)
  – “Local” service that ideally lives in a domain
  – Accepts registrations directly from measurement services
  – Automatically “finds” the upper layer
• Global Lookup Service (gLS)
  – “Global” cloud of information services, similar to the root DNS system
  – Peer with each other to exchange information
  – Accept the registrations of hLS instances only
  – Currently several maintained by partners in the perfSONAR project

Lookup Service = distributed directory for services

• The Lookup Service (LS) is a distributed directory, composed of levels.
  – Local directories (hLS): knowledge of local services (measurement tools, archives) that directly register.
  – Global directories (gLS) of local directories [hLSs]

Summarization in the Information Service

• Trade-off between information size and locality to the service
• Information loss occurs the further away it travels
  – A client in the same domain may want to know specifics on an IP routed interface and links
  – A client across the country will simply want to know if there are metrics related to interface utilization in the domain
• Information size will decrease as we shed unnecessary information
  – Service: origin of the data set, should be the largest
  – hLS:
    • Contains a copy of the service set and is able to answer any and all queries a service could (draws activity away from the service)
    • Contains a reduced data set of “everything” in the hLS
  – gLS:
    • Contains a copy of the data sets for all registered hLSs
    • Contains a further reduced data set built from these sets

Service discovery using the Lookup Service

1. Services register to hLS

2. hLS finds a gLS

3. hLS “Summarizes” internal data set

4. Client is interested in finding data – locates a gLS to speak with

5. Query from Client to gLS to locate services. Response will be address of hLS

6. Similar query to hLS to find something. Response is a service to ask

7. Service and Client transfer data
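The seven steps above can be mimicked with a toy in-memory model. This is only a sketch: the class names, method names, and the `.example` addresses are hypothetical, and the real perfSONAR services exchange messages over the network rather than calling each other directly. The "summary" here simply drops the identity of the individual services, which is one possible reading of step 3.

```python
class HomeLookupService:
    """Toy hLS: keeps full registrations and derives a reduced summary."""

    def __init__(self, address):
        self.address = address
        self.registrations = {}  # service address -> set of (domain, metric)

    def register(self, service, domain, metric):
        # Step 1: a service registers what it can offer.
        self.registrations.setdefault(service, set()).add((domain, metric))

    def summarize(self):
        # Step 3: forget which service holds what, keep only what exists.
        return set().union(*self.registrations.values())

    def query(self, domain, metric):
        # Step 6: answer with the services a client should ask.
        return sorted(service for service, data in self.registrations.items()
                      if (domain, metric) in data)


class GlobalLookupService:
    """Toy gLS: only stores the summaries of registered hLS instances."""

    def __init__(self):
        self.summaries = {}  # hLS address -> summary

    def register(self, hls):
        # Step 2: the hLS has found this gLS and pushes its summary.
        self.summaries[hls.address] = hls.summarize()

    def query(self, domain, metric):
        # Step 5: answer with the addresses of matching hLS instances.
        return sorted(address for address, summary in self.summaries.items()
                      if (domain, metric) in summary)


hls = HomeLookupService("hls.udel.example")
hls.register("snmp-ma.udel.example", "udel.edu", "utilization")
gls = GlobalLookupService()
gls.register(hls)
print(gls.query("udel.edu", "utilization"))  # which hLS to ask next
print(hls.query("udel.edu", "utilization"))  # which service to contact (step 7)
```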

Summarization: What’s Important?

• Hostnames
  – Measurement points (MPs) may have 100s of measurements for a domain
  – Hostnames are not important; domains (and subdomains) are
  – E.g. damsl.cis.udel.edu has 3 pieces of info we care about as we move away from the MP: cis.udel.edu, udel.edu, edu
• Metrics (e.g. eventTypes)
  – Interface utilization is different from interface drops, which is different from Layer 4 achievable bandwidth
  – Enumerate all; don’t summarize
    • Note that with better adoption of the OGF Hierarchy of Characteristics, we could summarize the metrics as well (Lowekamp et al., Grid2003)
• IP Addresses (v4 and v6)
  – Observations: natural structure, divided into CIDR ranges operated by a given administrative domain
  – Wish to find common CIDR descriptions for a given set of (unrelated) addresses
  – May not know a priori which domain/operator owns an address (and may not look this up)
  – Crux of this work
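The hostname rule from the first bullet (drop the host label, keep every enclosing domain) is simple enough to sketch. The function name `domain_suffixes` is an assumption made for this illustration and is not taken from the perfSONAR code base.

```python
def domain_suffixes(hostname):
    """Return the enclosing domains of a hostname, most specific first."""
    labels = hostname.strip(".").split(".")
    # Skip index 0 (the host label itself) and join each remaining tail.
    return [".".join(labels[i:]) for i in range(1, len(labels))]


print(domain_suffixes("damsl.cis.udel.edu"))
```

For the slide's example this yields the three pieces of information named there: cis.udel.edu, udel.edu, and edu.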

The Problem of Balancing Compression and Miss Rate

• A key aspect of Information Summarization is dealing with IP addresses (v4 and v6)
• IP summarization must fulfill two goals:
  – Decrease the original set of IP addresses by a reasonable amount.
    • i.e., achieve a good compression rate.
  – But it must not summarize too much.
    • Over-summarizing results in claiming many more IP addresses than the original set.
    • Less precision.

The Problem of Balancing Compression and Miss Rate

• Current mode of operation:
  – hLS has some set of addresses from measurement tools
  – Simplistic approach: find natural cut points, determine a CIDR range
  – hLS may advertise something like a /20 (or larger).
    • It is claiming to have (in its directory) all 2^12 (4,096) hosts in the advertised /20 subnet!
  – Even if the hLS truly holds only a small portion of this range.
• Claiming a large subnet for a comparably small number of hosts within that subnet: extra burden in the search process.
  – Client will believe the advertising hLS does possess all hosts in the subnet.
  – Must query the hLS to confirm.
  – Multiple hLSs may overlap in what they claim
• If the desired IP address is not in the hLS:
  – Penalty: wasted time and resources to perform the confirming query.
  – Analogous to a cache miss, and the penalty to a miss penalty.
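The arithmetic behind the /20 claim, and what counts as a "miss", can be checked with Python's standard `ipaddress` module. This is a minimal sketch: the function names, the specific /20 network, and the two registered addresses are made up for the illustration.

```python
import ipaddress


def claimed_addresses(subnet):
    """Number of addresses an advertised subnet claims to contain."""
    return ipaddress.ip_network(subnet).num_addresses


def is_miss(query, advertised, registered):
    """True when the advertisement covers the address but the hLS lacks it."""
    ip = ipaddress.ip_address(query)
    held = {ipaddress.ip_address(r) for r in registered}
    return ip in ipaddress.ip_network(advertised) and ip not in held


advertised = "134.55.208.0/20"                    # hypothetical advertisement
registered = ["134.55.209.41", "134.55.213.205"]  # what the hLS really holds

print(claimed_addresses(advertised))                    # 2**12 addresses
print(is_miss("134.55.210.1", advertised, registered))  # wasted confirming query
print(is_miss("134.55.209.41", advertised, registered))
```

With only two registered hosts, every other address in the range that a client asks about costs one wasted confirming query.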

The Problem of Balancing Compression and Miss Rate

• Less compression → more precision → more “space” consumed
• More compression → less precision → more miss penalty
• IP summarization must balance compression and miss rate.
• The optimum balance between compression and miss rate depends on the administrator’s interpretation.

Our heuristic for IPv4 summarization

• Our heuristic summarizes a list of IP addresses by employing IP subnet addresses to represent the actual host IP addresses controlled by an hLS.

[Figure: host IP addresses → IP summarization heuristic engine → summarizing subnets]

Input (host IP addresses):
198.129.248.121, 134.55.217.89, 134.55.219.9, 134.55.209.41, 134.55.218.5, 134.55.213.205,
134.55.213.74, 198.124.194.9, 134.55.42.10, 134.55.208.126, 198.124.216.157, 134.55.217.82,
134.55.42.18, 198.124.238.1, 134.55.217.6, 134.55.200.74, 192.168.201.5, 192.107.175.3,
134.55.222.62, 134.55.221.42, 134.55.218.70, 134.55.217.113, …

Output (summarizing subnets):
134.55.0.0/16, 134.167.160.49/32, 138.18.155.22/32, 192.0.0.0/9, 192.150.29.210/32,
192.150.31.78/32, 192.168.201.0/26, 192.188.106.140/32, 198.0.0.0/8
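One property any such summary should have is that every original host address falls inside at least one advertised subnet. The sketch below checks this for the addresses visible on the slide, using the standard `ipaddress` module. The slide's input list is truncated, so only the visible addresses are tested, and the helper name `covering` is an assumption of this example.

```python
import ipaddress

SUMMARY = [
    "134.55.0.0/16", "134.167.160.49/32", "138.18.155.22/32", "192.0.0.0/9",
    "192.150.29.210/32", "192.150.31.78/32", "192.168.201.0/26",
    "192.188.106.140/32", "198.0.0.0/8",
]

SAMPLE = [
    "198.129.248.121", "134.55.217.89", "134.55.219.9", "134.55.209.41",
    "134.55.218.5", "134.55.213.205", "134.55.213.74", "198.124.194.9",
    "134.55.42.10", "134.55.208.126", "198.124.216.157", "134.55.217.82",
    "134.55.42.18", "198.124.238.1", "134.55.217.6", "134.55.200.74",
    "192.168.201.5", "192.107.175.3", "134.55.222.62", "134.55.221.42",
    "134.55.218.70", "134.55.217.113",
]


def covering(address, summary):
    """Return the summarizing subnets that contain the given address."""
    ip = ipaddress.ip_address(address)
    return [s for s in summary if ip in ipaddress.ip_network(s)]


uncovered = [a for a in SAMPLE if not covering(a, SUMMARY)]
print("uncovered addresses:", uncovered)
```

The /32 entries in the summary are hosts the heuristic left unsummarized; they presumably correspond to addresses in the truncated part of the input.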

How does it do it?

• The heuristic constructs a special data structure: a PATRICIA trie
  – Within which the inner nodes are placeholders, and the leaves contain the data
• Example:

[Figure: example PATRICIA trie]

How does it do it?

• For our needs:
  – The inner nodes are the subnet addresses,
  – The leaves are the actual host IP addresses.
• Data set we will manipulate:
  – 10.10.0.1
  – 10.10.0.2
  – 10.10.0.3
  – 10.10.0.4

(a) First IP address inserted

[Figure: trie after inserting 10.10.0.1]

(b) Second IP address inserted

[Figure: trie after inserting 10.10.0.2]

• Root: 0.0.0.0/0 (00000000.00000000.00000000.00000000)
  – Inner node: 10.10.0.0/30 (00001010.00001010.00000000.000000xx)
    • Leaf: 10.10.0.1/32 (00001010.00001010.00000000.00000001)
    • Leaf: 10.10.0.2/32 (00001010.00001010.00000000.00000010)
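The inner node in panel (b) is the longest bit prefix shared by the two inserted addresses. A minimal sketch of that computation follows, using the standard `ipaddress` module; the function name `common_prefix` is an assumption of this example.

```python
import ipaddress


def common_prefix(a, b):
    """Smallest IPv4 subnet that contains both host addresses."""
    x = int(ipaddress.IPv4Address(a))
    y = int(ipaddress.IPv4Address(b))
    # Bits that differ are set in x ^ y; everything above the highest
    # differing bit is shared by both addresses.
    prefix_len = 32 - (x ^ y).bit_length()
    # strict=False masks off the host bits of a for us.
    return ipaddress.ip_network(f"{a}/{prefix_len}", strict=False)


print(common_prefix("10.10.0.1", "10.10.0.2"))  # 10.10.0.0/30
```

The last octets are 00000001 and 00000010, so the two addresses agree on their first 30 bits, which gives the 10.10.0.0/30 node shown on the slide.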

(c) Third IP address inserted

[Figure: trie after inserting 10.10.0.3]

(d) Last IP address inserted

[Figure: trie after inserting 10.10.0.4]
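The four insertions can be replayed with a small path-compressed binary trie in the spirit of PATRICIA. This is a sketch under stated assumptions, not the authors' implementation: `Node`, `insert`, `build`, and `dump` are names invented here, the root is fixed at 0.0.0.0/0 as in panel (b), and each inner node is simply the smallest subnet containing its two children.

```python
import ipaddress


class Node:
    def __init__(self, network):
        self.network = network  # ipaddress.IPv4Network
        self.children = []      # at most two entries


def common_supernet(a, b):
    """Smallest network that contains both networks a and b."""
    net = a if a.prefixlen <= b.prefixlen else b
    while not (a.subnet_of(net) and b.subnet_of(net)):
        net = net.supernet()
    return net


def insert(node, leaf):
    """Insert a /32 leaf somewhere below node."""
    for i, child in enumerate(node.children):
        if leaf.network == child.network:
            return                      # duplicate address, nothing to do
        if leaf.network.subnet_of(child.network):
            insert(child, leaf)         # descend into the matching inner node
            return
        joint = common_supernet(child.network, leaf.network)
        if joint != node.network:
            # child and leaf share a longer prefix than node: split here.
            inner = Node(joint)
            inner.children = [child, leaf]
            node.children[i] = inner
            return
    node.children.append(leaf)


def build(addresses):
    root = Node(ipaddress.ip_network("0.0.0.0/0"))
    for address in addresses:
        insert(root, Node(ipaddress.ip_network(address + "/32")))
    return root


def dump(node, depth=0):
    """Render the trie as indented lines, children in address order."""
    lines = ["  " * depth + str(node.network)]
    for child in sorted(node.children, key=lambda n: n.network):
        lines += dump(child, depth + 1)
    return lines


print("\n".join(dump(build(["10.10.0.1", "10.10.0.2", "10.10.0.3", "10.10.0.4"]))))
```

After the second insertion this reproduces the structure of panel (b). After the fourth, the inner nodes are 10.10.0.0/29, 10.10.0.0/30, and 10.10.0.2/31, which are the candidate summarizing subnets for this data set.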

How does it do it?

• Uses three metrics to decide which inner nodes to pick:
  – Distance: notion of how many IPs a subnet claims that do not actually exist in the network;
  – Density: number of actual IP addresses over total number of possible IPs in a subnet;
  – Minimum Subnet Mask: avoids subnets that are too large.
• User-controllable by two parameters.

Metric: Distance

• Distance: notion of how many IPs a subnet claims that do not actually exist in the network.
  – Difference, in bits, between a child node’s mask and its parent’s mask.
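As defined above, distance is a subtraction of prefix lengths. A minimal sketch with the standard `ipaddress` module follows; the function name and the containment check are assumptions of this example.

```python
import ipaddress


def distance(parent, child):
    """Difference, in bits, between the child's mask and the parent's mask."""
    p = ipaddress.ip_network(parent)
    c = ipaddress.ip_network(child)
    if not c.subnet_of(p):
        raise ValueError(f"{child} is not inside {parent}")
    return c.prefixlen - p.prefixlen


print(distance("10.10.0.0/29", "10.10.0.4/32"))  # 3
print(distance("0.0.0.0/0", "10.10.0.0/29"))     # 29
```

Each bit of distance doubles the address range: a parent at distance d from its child covers 2^d times as many addresses as the child does, which is why a large distance signals many claimed but nonexistent hosts.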

Metric: Density

• Density: number of actual IP addresses over total number of possible IPs in a subnet.
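Density can be sketched as a ratio. One assumption is made loudly here: "total number of possible IPs" is taken to mean all 2^(32 − mask) addresses of the subnet, including network and broadcast addresses; the slides do not say which convention the heuristic uses. The function name is also invented for this example.

```python
import ipaddress


def density(subnet, hosts):
    """Fraction of the subnet's address space occupied by known hosts."""
    net = ipaddress.ip_network(subnet)
    inside = {ipaddress.ip_address(h) for h in hosts
              if ipaddress.ip_address(h) in net}
    return len(inside) / net.num_addresses


hosts = ["10.10.0.1", "10.10.0.2", "10.10.0.3", "10.10.0.4"]
print(density("10.10.0.0/29", hosts))  # 4 hosts out of 8 addresses
print(density("10.0.0.0/8", hosts))    # 4 hosts out of 2**24 addresses
```

The same four hosts fill half of the /29 but only about 2.4e-7 of the /8, which illustrates why density is a useful signal against overly broad summaries.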

Metric: Minimum Subnet Mask

• MinMask: avoids subnets that are too large;
  – Ensures no node with mask < minMask will be selected as a summarizing node.
  – This metric takes precedence over the previous ones.
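Because minMask takes precedence, it can be pictured as a filter applied to candidate nodes before distance and density are consulted. The sketch below shows only that filter; the function name and the candidate list are hypothetical.

```python
import ipaddress


def allowed_candidates(candidates, min_mask=8):
    """Keep only candidate subnets whose mask is at least min_mask bits."""
    if not 0 <= min_mask <= 32:
        raise ValueError("min_mask must be between 0 and 32 for IPv4")
    return [c for c in candidates
            if ipaddress.ip_network(c).prefixlen >= min_mask]


candidates = ["0.0.0.0/0", "128.0.0.0/1", "198.0.0.0/8", "192.0.0.0/9"]
print(allowed_candidates(candidates))              # the /0 and /1 are rejected
print(allowed_candidates(candidates, min_mask=9))  # the /8 is rejected as well
```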

How it integrates with perfSONAR LS

• Two parameters control the summarization algorithm
  – summarization_granularity: controls the granularity or coarseness of the summarization.
  – summarization_minMask: controls the minimum mask that a summarizing node must have. Accepts values from 0 to 32 (IPv4).
    • Default = 8.

Granularity

• Granularity: from 0 to 3
  – 0: finer, less compressed summarization (more IP addresses).
  – 3: coarser, more compressed, less precise summarization (fewer IP addresses).
• To compose this parameter: mapping of threshold values from distance and density.
  – Empirical.
• Internally, convert from granularity to distance and density by means of equations.

Granularity  Distance  Density
0            4         1e-5
1            8         1e-6
2            12        1e-7
3            16        1e-8
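The slides state that the conversion is done by equations but do not give them. The closed forms below are an assumption: they are simply the simplest expressions that reproduce the four rows of the table, and the function name `thresholds` is invented for this sketch.

```python
def thresholds(granularity):
    """Map a granularity level (0..3) to (distance, density) thresholds."""
    if granularity not in (0, 1, 2, 3):
        raise ValueError("granularity must be 0, 1, 2 or 3")
    distance = 4 * (granularity + 1)       # 4, 8, 12, 16
    density = 10.0 ** -(5 + granularity)   # 1e-5, 1e-6, 1e-7, 1e-8
    return distance, density


for g in range(4):
    print(g, thresholds(g))
```

Whether the real implementation uses these expressions, or how it compares a node's metrics against the thresholds, is not stated in the slides.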

Some IP summarization techniques

• Route aggregation algorithms:
  – Degermark, M., Brodnik, A., Carlsson, S., & Pink, S. (1997). Small forwarding tables for fast routing lookups. SIGCOMM Computer Communication Review, 27, 3-14.
  – Draves, R., King, C., Venkatachary, S., & Zill, B. (1999). Constructing optimal IP routing tables. In INFOCOM '99: Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies, Proceedings, vol. 1, pp. 88-97.
  – Nilsson, S., & Karlsson, G. (1998). Fast address look-up for internet routers (pp. 11-22). Chapman & Hall, Ltd.
  – Srinivasan, V., & Varghese, G. (1998). Faster IP lookups using controlled prefix expansion. SIGMETRICS Performance Evaluation Review, 26, 1-10.
  – Waldvogel, M., Varghese, G., Turner, J., & Plattner, B. (1997). Scalable high speed IP routing lookups. SIGCOMM Computer Communication Review, 27, 25-36.

Some IP summarization techniques

• Main objective of those efforts: IP lookup performance improvement for routing.
  – They utilize “next hop” information to make decisions.
• Our algorithm is not primarily intended to be used for routing.
  – “Next hop” information is not available.

Summarization algorithm currently in perfSONAR

• Relies on a voting scheme to identify subnets that represent most of the original IP addresses.
• For each original address, the algorithm expands all subnets.
  – Stores them into a list.
  – If a subnet was already expanded by a previous address, increment its vote counter.
• Select candidates for summarizing addresses by picking subnets that have at least one original /32 IP address child.
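The vote-counting part of this scheme can be sketched as follows. This is a reading of the bullets above, not the perfSONAR code: "expands all subnets" is interpreted as generating every enclosing prefix from /0 to /32, a subnet's first expansion is counted as one vote, and the candidate selection step is left out because the slides do not give enough detail to reproduce it.

```python
import ipaddress
from collections import Counter


def count_votes(addresses):
    """Count, for every enclosing subnet, how many addresses expand to it."""
    votes = Counter()
    for address in addresses:
        for prefix_len in range(33):  # /0 through /32
            subnet = ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)
            votes[subnet] += 1
    return votes


votes = count_votes(["10.10.0.1", "10.10.0.2", "10.10.0.3", "10.10.0.4"])
for name in ("10.10.0.0/29", "10.10.0.0/30", "10.10.0.2/31", "10.10.0.4/32"):
    print(name, votes[ipaddress.ip_network(name)])
```

Every address votes for 0.0.0.0/0 and for every other short prefix it shares with its neighbours, so nothing in the counting itself discourages very large subnets.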

Summarization algorithm currently in perfSONAR

• User cannot influence address selection.
• Final summarizing subnets might be of any size.
  – Very large sets (/8s) were common: there is nowhere near that amount of data available
• Distinctively, our heuristic allows for control of the compression level of the summarization.
• Also implements mechanisms to avoid selecting summarizing subnets that might be considered too large.

Conclusion

• Algorithm is being evaluated as a replacement for the current summarization algorithm in perfSONAR hLS and gLS instances
• Experimental results are being collected on PlanetLab to evaluate efficiency and accuracy
• Additional enhancements to the heuristic are being evaluated
• Questions?
• Thanks!
  – Marcos Portnoi ([email protected])
  – Martin Swany ([email protected])
  – Jason Zurawski ([email protected])