Department of Computer Science - State University of New York, Binghamton
1
Non-Uniform Information Dissemination for Dynamic
Grid Resource Discovery
Vishal Iyengar, Sameer Tilak,
Michael J. Lewis, Nael B. Abu-Ghazaleh
2
Outline
Motivations
Resource Discovery and Information Dissemination
Approach
Non-uniformity
Non-Uniform Protocols
New Dissemination Protocols
Results
Directions for the future
3
Motivations
Information is at a premium in Grids
The scale of the Grid is on the way up – potentially, millions of hosts, each with several services/objects
Each service/object needs resources
  The requirement can be specific or general
  Critical parameters are decided by the service/object
How do we make information and/or meta-information available across the Grid?
4
Resource Discovery
Problem: How to efficiently find resources that conform to certain specifications provided by a Grid Client?
Harchol-Balter et al. [1] suggested algorithms (Name-Dropper)
  Location service for nodes in a network
Condor Matchmaking [2] uses class ads to match requirements with resources
  Class Ads are Name-Value pairs
  Sent and Matched passively
  Our work is similar to their flocking mechanism
Maheswaran et al. [3] suggested protocols, similar to the ones we propose, for information dispersal
  Use data dissemination efficiency to tune the system
  We feel that including topology and resource variations could be useful
5
Synopsis
Designed and studied 4 information dissemination protocols
  Two are implementations of randomized protocols from Sensor Networks, applied to the Grid context
  Two others build on the above randomized protocols, bringing some intuition to the dissemination
Simulations involve different topologies and different data models to get a broader perspective
Emphasis is on reducing Network Overhead while keeping Error within reasonable limits
Salient feature is the configurability of each protocol to system needs
6
Information Dissemination
So, in order to find resources, we need to
  Know about them
  Allow every node in the Grid to participate equally in this process
  Do so without utilizing too much Bandwidth
Iamnitchi et al. [4] suggested that a peer-to-peer approach might be useful
  Reactive approach driven by queries
We agree. But there may be other ways to use these P2P networks to get information to where it may be used
  Proactive approach based on data and its dispersal
  Eventually, hope for an equilibrium between reactive and proactive
7
The Approach
Our approach borrows from a related problem domain – Sensor Networks
We think of Grid environments as having some resemblance to Sensor Networks
Both have a large number of nodes that need to behave in P2P fashion
Each node gathers information that needs to be known to the others
Tilak et al. [4] proposed the use of Non-uniformity to disseminate data among nodes in a Sensor Network
8
Non-Uniformity
In our context, Information has 2 main dimensions
  Temporal – How old is it?
  Spatial – How far is the source?
We try to relate these two by using the following premise
  Any application to be scheduled should be as close to its origin as possible
  This reduces the overhead of sending data to a remote location
We propose to
  Let neighbors know about each other more frequently and/or with more detail
  Let far-off nodes know less about each other and/or with less frequency
9
Non-Uniform Protocols
Randomized protocols that take the decision to forward probabilistically. We incorporated these in the Grid context
Unbiased – Each data item is forwarded with a probability X, and discarded with a probability 1-X
Biased – Each data item is forwarded with a probability inversely proportional to its distance from source
Based on these protocols and their probabilistic forwarding policies we built 2 more dissemination protocols
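The two forwarding rules above can be sketched as follows. This is only an illustration, not the authors' implementation: the function names are hypothetical, and the Biased rule assumes the probability is exactly 1/hops (the slides say only "inversely proportional to distance").

```python
import random


def forward_unbiased(p, rng=random):
    """Unbiased rule: forward with fixed probability p, discard with 1 - p."""
    return rng.random() < p


def forward_biased(hops_from_source, rng=random):
    """Biased rule: forwarding probability falls with distance from the source.

    Assumption: probability = 1 / hops, capped at 1 for hops <= 1.
    """
    prob = 1.0 / max(1, hops_from_source)
    return rng.random() < prob
```

With these rules, an item one hop from its source is always forwarded under Biased, while an item four hops away is forwarded about a quarter of the time.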
10
New Dissemination Protocols
Change Sensitive Protocol (CSP)
Resources and their availability change constantly
Those that change too rapidly or too slowly are not useful farther away from the source
  If they change too fast, far-off nodes will hear about them too late, making them too unstable for scheduling purposes
  If they change slowly, then sending this information wastes bandwidth
So, aggressively propagate only information that changes moderately
Propagate fast and slow changing information less often
The trick here is in defining what is slow and fast changing
  Different data models simulated
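A minimal sketch of the CSP idea, assuming the change rate is measured as the fraction of consecutive samples that differ. The thresholds `slow` and `fast` and the probabilities `p_high` and `p_low` are hypothetical values chosen for illustration; the slides do not state how slow and fast are defined.

```python
def observed_change_rate(history):
    """Fraction of consecutive samples in `history` that differ (0.0 to 1.0)."""
    if len(history) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(history, history[1:]) if a != b)
    return changes / (len(history) - 1)


def csp_forward_probability(change_rate, slow=0.1, fast=0.9,
                            p_high=0.9, p_low=0.2):
    """Propagate moderately changing data aggressively, the rest less often.

    Assumption: a simple two-level rule with illustrative thresholds.
    """
    if slow <= change_rate <= fast:
        return p_high
    return p_low
```

A node would then forward an item when a uniform random draw falls below the returned probability, in the same way as the randomized protocols.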
11
New Dissemination Protocols (Contd.)
Priority Dissemination Protocol (PDP)
In the preceding protocols, the intermediate nodes make forwarding decisions for every data item they process
But what about Site Autonomy?
  Providers should be able to decide where information about their resources is seen
So, we suggest a protocol in which each source decides the priority of its information
  Intermediate nodes abide by these priorities
  Certain high-end/unique resources may need more coverage; otherwise requests for them will not be satisfied
  Others are easily available and can do with less coverage
  Useful in commercial Grids with accounting capability?
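One way to picture PDP is a data item that carries a priority set once by its source, which intermediate nodes honor without overriding. The sketch below assumes the priority is a number in [0, 1] used directly as the forwarding probability; that mapping, the `ResourceUpdate` type, and its field names are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceUpdate:
    source: str      # host that owns the resource
    resource: str    # resource name
    value: float     # advertised value
    priority: float  # chosen by the source; immutable in transit (assumed in [0, 1])

    def __post_init__(self):
        if not 0.0 <= self.priority <= 1.0:
            raise ValueError("priority must be in [0, 1]")


def should_forward(update, rng=random):
    """Intermediate node abides by the source's priority (assumed mapping)."""
    return rng.random() < update.priority
```

Under this assumption, a unique resource advertised with priority 1.0 is always forwarded, while a commonly available one with a low priority travels only a short distance on average.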
12
Factors affecting Dissemination
Protocols decide the dissemination policy
Topology (Connectivity) of hosts makes a difference
  Used different topologies to see interaction with the protocols and effect on dissemination
  Topologies used included Waxman, Locality-Based, Pure Random etc., created using the GT-ITM tool
Variation in resource information representative of different resources
  Faster vs. Slower changing models to test CSP
  Models used: a simple Monotonic Step function, a Gaussian distribution, a Poisson distribution etc.
13
Experimental Setup
A prototype implementation in Java that has been tested on smaller clusters (16 – 32 nodes)
SSFNet Simulation testbed
  Used to design large scale networks
  In our context, 100-150 nodes may represent a few thousand end-hosts
Base case implementation
  Flooding
  Compare our protocols against it
Probabilistic protocols were implemented along with CSP and PDP
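To illustrate how the flooding base case relates to the probabilistic protocols, here is a toy simulation; it is not the SSFNet setup. It assumes that the source always sends to its neighbors, that every other node forwards a newly received item to all its neighbors with probability `forward_prob`, and that overhead is counted in messages rather than bytes. The function name and graph representation are hypothetical.

```python
import random
from collections import deque


def disseminate(adjacency, source, forward_prob, rng):
    """Spread one item from `source`; return (reached nodes, messages sent).

    forward_prob = 1.0 corresponds to flooding.
    """
    reached = {source}
    messages = 0
    queue = deque([source])
    while queue:
        node = queue.popleft()
        # The source always sends; other nodes decide probabilistically.
        if node != source and rng.random() >= forward_prob:
            continue
        for neighbor in adjacency[node]:
            messages += 1
            if neighbor not in reached:
                reached.add(neighbor)
                queue.append(neighbor)
    return reached, messages


# Example: a 6-node ring.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
flood_reached, flood_messages = disseminate(ring, 0, 1.0, random.Random(0))
```

On the ring, flooding reaches every node at the cost of two messages per node; lowering `forward_prob` reduces the message count but can leave distant nodes without the item.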
14
Performance Metrics
Error - Absolute and Weighted
  Host A’s local view of Host B compared to actual value at Host B
  Weighted with inverse of distance between them
Network Overhead
  Total number of bytes exchanged over the network
  Sum of the bytes sent by each host
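The metrics above can be written down as a short sketch. It assumes resource values are numeric, that distance is measured in hops, and that per-host byte counts are available in a dictionary; the function names are hypothetical.

```python
def absolute_error(local_view, actual_value):
    """Difference between Host A's view of Host B and B's actual value."""
    return abs(local_view - actual_value)


def weighted_error(local_view, actual_value, distance_hops):
    """Absolute error weighted by the inverse of the distance between hosts."""
    if distance_hops <= 0:
        raise ValueError("distance must be positive")
    return absolute_error(local_view, actual_value) / distance_hops


def network_overhead(bytes_sent_by_host):
    """Total bytes exchanged: the sum of the bytes sent by each host."""
    return sum(bytes_sent_by_host.values())
```

For example, a stale view of 40 against an actual value of 50 gives an absolute error of 10, and a weighted error of 2.5 when the hosts are four hops apart, so the same staleness counts for less the farther away the observer is.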
15
Results
Topology – Waxman 100 Nodes
RIVM – Monotonic Step
Flooding has the lowest Error values but pays for it with a very high network overhead
Unbiased with p = 0.5 cuts the overhead by a third but has a correspondingly high Error
16
Results
Topology – Waxman 100 Nodes
RIVM – Poisson Distribution
A similar trend with the Poisson Distribution as with the previous data model
Flooding doesn’t outperform the others in terms of Error here because of the difference in Data models – here values don’t change as often
17
Results
Topology – Pure Random 150 Nodes
RIVM – Uniform Distribution
The protocols provide a range of trade-off points between Error and Overhead
Another point to note – Biased works well for the first few hops while Unbiased 0.8 works well as the number of hops increases
18
Results
Overhead comparison for each RIVM across different topologies
Panels: Monotonic Step Function, Uniform Distribution
The protocols follow similar trends across different topologies and different data models
19
Results
Overhead comparison for each RIVM across different topologies
Panels: Gaussian Distribution, Poisson Distribution
20
To Wrap Up …
The idea of non-uniform dissemination is new to the area of Grid computing
The P2P approach is useful and looks to the future of Grids
Results are promising and motivate further research in this field
21
Directions for the Future
Design a framework that will use the information disseminated to make scheduling decisions more efficient – Query Forwarding
Protocols that will use the feedback from the Query Forwarding framework to disseminate data smartly – Adaptive Protocols
(Hierarchical) Protocols that can do neighbor discovery in addition to information dissemination
Aggregation of data – forward one item which compacts the data from multiple sources … (we found this non-trivial)
22
References
1. M. Harchol-Balter, P. Leighton, and D. Lewin. Resource discovery in distributed networks. In Proc. of ACM PODS 1999, pages 229-237, 1999.
2. A. R. Butt, R. Zhang, and Y. C. Hu. A Self-Organizing Flock of Condors. In SC '03, November 15-21, 2003, Phoenix, AZ.
3. M. Maheswaran and K. Krauter. A parameter-based approach to resource discovery in grid computing system. In GRID, pages 181-190, 2000.
4. S. Tilak, A. Murphy, and W. Heinzelman. Non-uniform information dissemination for sensor networks. In 11th IEEE International Conference on Network Protocols (ICNP'03), 2003.
5. V. Iyengar, S. Tilak, M. J. Lewis, and N. B. Abu-Ghazaleh. Non-uniform information dissemination for Dynamic Grid Resource Discovery. 2004.