2007.10 grid nets-slides
DESCRIPTION
Yehia El-khatib and Chris Edwards. "A Survey-based Study of Grid Traffic". In Proceedings of the International Conference on Networks for Grid Applications (GridNets 2007), Lyon, France, October 17-19 2007.
TRANSCRIPT
Diversity of Grid Traffic:
A Survey-based Study
Yehia El khatib, Christopher Edwards
Computing Department
Lancaster University
Outline
• Introduction
• Survey Goals
• Survey Process
• Survey Results
• Traffic Behaviour
• Future Work
• Conclusion
Introduction
• EC-GIN (Europe-China Grid InterNetworking) is a Framework 6 STREP project.
• EC-GIN aims to introduce a networking interface that provides programming abstractions to improve the performance of grid applications.
• The design of the interface requires an understanding of the network characteristics of grid applications.
Survey Goals
• The survey aims to highlight some of the characteristics of current grid applications:
• Scale and composition of the grid
• Dataset granularity
• Data delivery requirements (time restrictions, encryption, one-to-many services)
• Others: transport layer protocol, middleware, etc.
• Special network services
Survey Process
• Questionnaire Structure
• 2 pages, also an online version
• 11 multiple-choice questions + 1 open-ended question.
• Level of Detail
• As simple as possible.
• Target Audience
• Developers, administrators, and advanced users.
• Dissemination
• Research projects that are employing or developing a grid application.
Survey Results [outline]
1. Research Field
2. Scale
3. Composition
4. Dataset Granularity
5. Special Network Services
Survey Results [1/5]
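• Research Field
[Figure: pie chart of surveyed applications by research field. Fields: Visualization, Environmental Sciences, Medicine, Meteorology, Software Development, Particle Physics, Astronomy, Social Sciences, Mathematical Analysis, Engineering. Particle Physics is the largest at 18%; the other fields account for 6% or 13% each.]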
Survey Results [2/5]
• Scale
[Figure: two bar charts showing the percentage of surveyed applications by number of nodes and by number of domains. Bin labels in the extracted text: "<= 10, 10-100, 100-400, 400-1000, > 10000" and "3 – 10, 10 – 100, 100 – 1000, >= 1000".]
Survey Results [3/5]
• Composition
[Figure: pie chart "Overall Grid Composition" with segments Clusters, Desktop Machines, Embedded Devices, Mobile Devices]
• 47% are deployed only on clusters
• Image analysis applications
• Simulation applications
• 7% are deployed only on desktop machines
• Data management applications
Survey Results [4/5]
• Dataset Granularity
[Figure: two charts of dataset sizes; x-axis labels 10 kB, 100 kB, 1 MB, 10 MB, 100 MB, 1 GB, 10 GB, 100 GB, 1 TB; y-axis ticks 0-30 and 0-100]
• Most common dataset size is 10 MB
• 12% of all datasets are 100 GB in size
• 23% of all datasets are ≤ 1 MB
• 50% of all datasets are ≤ 10 MB
• 25% of all datasets are ≥ 10 GB
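The cumulative figures above imply shares for the size ranges in between. The short Python sketch below works these out by subtraction; it assumes that all percentages refer to the same set of datasets and that sizes were reported in the discrete steps shown on the chart axis. The variable names are illustrative only.

```python
# Figures quoted on the slide, in percent of all datasets.
at_most_1_mb = 23
at_most_10_mb = 50
at_least_10_gb = 25
exactly_100_gb = 12

# Shares implied by subtraction.
between_1_and_10_mb = at_most_10_mb - at_most_1_mb               # above 1 MB, up to 10 MB
between_10_mb_and_10_gb = 100 - at_most_10_mb - at_least_10_gb   # above 10 MB, below 10 GB
other_large = at_least_10_gb - exactly_100_gb                    # 10 GB or more, but not 100 GB

print(between_1_and_10_mb, between_10_mb_and_10_gb, other_large)  # 27 25 13
```

Under these assumptions, the 27% share just above 1 MB and up to 10 MB is consistent with 10 MB being the most common dataset size.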
Survey Results [5/5]
• Special Network Services
[Figure: bar chart of the percentage of surveyed applications (0% to 100%) answering Used, Would Be Used, Unnecessary, or Not Sure for each service: Transfer Delay Prediction, Advanced Network Reservation, Network Topology Information]
Traffic Behaviour [1/2]
• The results give a picture of traffic flow sizes that differs from common belief.
• We define five distinct classes of applications according to dataset sizes:
• Class A: less than 10 MB
• Class B: 0.5 – 100 MB
• Class C: 10 MB – 1 GB
• Class D: 100 kB – 100 GB
• Class E: 1 MB – 1 TB
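The five ranges overlap, so a single dataset size can be consistent with more than one class. A minimal Python sketch of the ranges follows. It assumes decimal units (1 MB = 10^6 bytes), a lower bound of 0 for Class A, and inclusive upper bounds for classes B to E; the names `CLASS_RANGES` and `classes_containing` are illustrative and not part of the survey.

```python
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12

# (low, high) dataset size per class in bytes, taken from the list above.
CLASS_RANGES = {
    "A": (0, 10 * MB),           # "less than 10 MB": upper bound exclusive
    "B": (0.5 * MB, 100 * MB),
    "C": (10 * MB, 1 * GB),
    "D": (100 * KB, 100 * GB),
    "E": (1 * MB, 1 * TB),
}

def classes_containing(size_bytes):
    """Return the classes whose size range contains size_bytes."""
    matches = []
    for name, (low, high) in CLASS_RANGES.items():
        if name == "A":
            inside = low <= size_bytes < high
        else:
            inside = low <= size_bytes <= high
        if inside:
            matches.append(name)
    return matches

print(classes_containing(5 * MB))    # ['A', 'B', 'D', 'E']
print(classes_containing(500 * GB))  # ['E']
```

The overlap suggests that the classes describe the span of dataset sizes an application handles rather than a partition of individual dataset sizes.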
Traffic Behaviour [2/2]
[Figure: pie chart of application classes: A 34%, B 20%, C 13%, D 13%, E 20%]
• The most common class is A, where datasets are no larger than 10 MB.
• Only 33% of all applications have datasets over 1 GB in size.
• Only 20% of all applications have datasets that stretch beyond 100 GB.
• All class C applications are deployed mostly on desktop machines.
• All class B applications are Astronomy and Meteorology applications, deployed over 100-300 nodes across 6-8 domains.
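The 33% and 20% figures can be cross-checked against the class shares on the pie chart (A 34%, B 20%, C 13%, D 13%, E 20%) and the upper bounds of the class ranges defined on the previous slide. The Python sketch below does this arithmetic; the names `CLASS_SHARE`, `UPPER_BOUND_GB`, and `share_above` are illustrative.

```python
# Class shares in percent, as read from the pie chart on this slide.
CLASS_SHARE = {"A": 34, "B": 20, "C": 13, "D": 13, "E": 20}

# Upper bound of each class's dataset size range in GB (from the class definitions).
UPPER_BOUND_GB = {"A": 0.01, "B": 0.1, "C": 1, "D": 100, "E": 1000}

def share_above(threshold_gb):
    """Total share (%) of classes whose range extends beyond threshold_gb."""
    return sum(
        share
        for name, share in CLASS_SHARE.items()
        if UPPER_BOUND_GB[name] > threshold_gb
    )

print(sum(CLASS_SHARE.values()))  # 100
print(share_above(1))             # 33
print(share_above(100))           # 20
```

Classes D and E together account for the 33% of applications with datasets over 1 GB, and class E alone accounts for the 20% beyond 100 GB.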
Future Work
• We intend to monitor the traffic created by a number of grid applications.
• We aim to present mathematical models of grid traffic that could be used to create artificial grid traffic (in simulators).
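As a rough illustration of what artificial traffic generation could look like, the Python sketch below draws dataset sizes using the class ranges and class shares reported in the Traffic Behaviour slides. It is not the authors' model. It assumes that the share of applications in a class can stand in for the probability of a transfer belonging to that class, that sizes are log-uniform within a class range, and that Class A starts at 10 kB (the smallest label on the dataset size chart). The names `CLASSES` and `sample_flow` are hypothetical.

```python
import math
import random

MB, GB, TB = 10**6, 10**9, 10**12

# (low, high) dataset size in bytes and share of applications per class.
# Class A has no stated lower bound; 10 kB is an assumption.
CLASSES = {
    "A": (10**4, 10 * MB, 0.34),
    "B": (0.5 * MB, 100 * MB, 0.20),
    "C": (10 * MB, 1 * GB, 0.13),
    "D": (10**5, 100 * GB, 0.13),
    "E": (1 * MB, 1 * TB, 0.20),
}

def sample_flow(rng):
    """Draw one (class, size_in_bytes) pair."""
    names = list(CLASSES)
    weights = [CLASSES[n][2] for n in names]
    name = rng.choices(names, weights=weights, k=1)[0]
    low, high, _ = CLASSES[name]
    # Log-uniform draw: an assumed placeholder, not a fitted model.
    size = math.exp(rng.uniform(math.log(low), math.log(high)))
    return name, size

rng = random.Random(2007)  # fixed seed for repeatability
for name, size in (sample_flow(rng) for _ in range(5)):
    print(f"class {name}: {size / MB:.2f} MB")
```

A model fitted to monitored traffic, as planned above, would replace the log-uniform placeholder with measured distributions.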
Conclusion
• We presented the outcome of a survey of grid application requirements and network behaviour.
• The results reflect a list of real demands of grid applications, which provides a solid starting point for the design of our interface.
• The suggested classification portrays the diversity in the traffic footprint of grid applications.