scaling to millions of concurrent sparql queries on the cloud
Post on 11-May-2015
Sep 2010
Scaling to Millions of Concurrent SPARQL Queries on the Cloud
OWLIM Replication Cluster @ Amazon EC2
Goals
• Test the scalability of OWLIM RC on a really large cluster
• Can we break the million queries per hour barrier?
#2OWLIM Replication Cluster @ AWS Sep 2010
INTRODUCTION
Berlin SPARQL Benchmark (BSBM)
• http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/
• Evaluates the performance of RDF query engines in an e-commerce use case
– searching products and navigating related information
• Randomized query mixes (25 SPARQL queries) are evaluated continuously
• Different dataset size & number of concurrent clients
– 25M, 100M and 200M triples
Benchmarking AWS
• Extensive performance tests of EC2 instances
– I/O, CPU, Network
– BSBM (SPARQL), RDF materialisation
• High Memory EC2 instances offer (surprisingly) good performance for RDF-related processing
– Comparable to local non-virtualised hardware
Benchmarking AWS – testbeds
Testbed        CPU cores       RAM (GB)   Virtualisation
Local-L        2×2.4 GHz       8          ESX
Local-XL       4×2.9 GHz       12         No
Local-3XL      8×3.3 GHz       48         No
L              2×2 ECU*        7.5        Xen
XL             4×2 ECU*        15         Xen
High-Mem XL    2×3.25 ECU*     17         Xen
High-Mem 2XL   4×3.25 ECU*     34         Xen
High-Mem 4XL   8×3.25 ECU*     68         Xen
High-CPU XL    8×2.5 ECU*      7          Xen
* 1 ECU provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor
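The ECU footnote can be turned into a rough aggregate clock range per instance type. The following is a back-of-the-envelope sketch only: it assumes the 1.0-1.2 GHz equivalence scales linearly with the number of ECUs, and the function name `ecu_to_ghz_range` is hypothetical.

```python
# Rough ECU -> GHz-equivalent conversion.
# Assumption: the footnote's 1.0-1.2 GHz per ECU scales linearly with ECU count.
GHZ_PER_ECU = (1.0, 1.2)

def ecu_to_ghz_range(cores, ecu_per_core):
    """Return (low, high) aggregate GHz-equivalent for an EC2 instance type."""
    total_ecu = cores * ecu_per_core
    return (total_ecu * GHZ_PER_ECU[0], total_ecu * GHZ_PER_ECU[1])

# (cores, ECU per core) taken from the testbed table
ec2_instances = {
    "L": (2, 2.0),
    "XL": (4, 2.0),
    "High-Mem XL": (2, 3.25),
    "High-Mem 2XL": (4, 3.25),
    "High-Mem 4XL": (8, 3.25),
    "High-CPU XL": (8, 2.5),
}

for name, (cores, ecu) in ec2_instances.items():
    low, high = ecu_to_ghz_range(cores, ecu)
    print(f"{name}: {cores * ecu:g} ECU, roughly {low:.1f}-{high:.1f} GHz-equivalent")
```

The result is only a coarse guide for comparing the EC2 rows with the local machines, since clock speed alone does not determine RDF query throughput.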
Benchmarking AWS – BSBM 100M results
[Chart: query mixes per hour (0 to 5000) vs. number of concurrent clients (1, 4, 16, 32, 64). Series: Local-L, L-ub, Local-XL, XL-ub, HM-XL-ub, HM-2XL-ub, Local-3XL, Local-3XL-SSD, HM-4XL-ub, HC-XL-ub.]
Benchmarking AWS – RDF materialisation
[Chart: materialisation time (sec), axis 0 to 6000. Labels: UMBEL, DBP-SKOS.]
OWLIM Replication Cluster
• Improves scalability with respect to concurrent user requests
• How does it work?
– Each write request is multiplexed to all repository instances
– Each read request is dispatched to one instance only
– To ensure load balancing, read requests are sent to the instance with the shortest execution queue
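The dispatch rules above can be sketched in a few lines. This is a minimal in-memory illustration, not OWLIM's implementation: the `Node` and `Master` classes and their methods are hypothetical names, and queues are never drained here, so the example only shows where each request is routed.

```python
class Node:
    """Stand-in for one repository instance (hypothetical, in-memory only)."""
    def __init__(self, name):
        self.name = name
        self.queue = []   # read requests waiting to execute
        self.log = []     # write requests applied to this replica

class Master:
    """Routes requests: writes go to every node, reads to the least-loaded one."""
    def __init__(self, nodes):
        self.nodes = list(nodes)

    def write(self, update):
        # Multiplex the write so that every replica stays identical.
        for node in self.nodes:
            node.log.append(update)

    def read(self, query):
        # Pick the node with the shortest execution queue (first one wins ties).
        target = min(self.nodes, key=lambda n: len(n.queue))
        target.queue.append(query)
        return target

nodes = [Node(f"slave-{i}") for i in range(3)]
master = Master(nodes)
master.write("INSERT DATA { <urn:s> <urn:p> <urn:o> }")
targets = [master.read(f"query-{i}").name for i in range(4)]
print(targets)  # ['slave-0', 'slave-1', 'slave-2', 'slave-0']
```

Because every node holds a full replica, any node can answer any read, which is what lets read throughput grow with the number of nodes.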
OWLIM CLUSTER ON EC2 – BENCHMARKS
AWS testbed setup
• OWLIM Replication Cluster
– One Master node, 10-100 Slave nodes
– 100 million triples / 16GB database size
• BSBM 100M dataset
– Each cluster node has a replica of the database
– 1000 concurrent BSBM clients
• Amazon EC2
– Master node – HM-2XL (34 GB RAM, 4×3.25 ECU)
– Slave nodes – HM-XL (17 GB RAM, 2×3.25 ECU)
– Ubuntu (x64)
Total QMpH (Query Mix per Hour)
[Chart: total QMpH (0 to 250,000) vs. cluster size (10 to 100 HM-XL nodes); BSBM-100M, 1000 concurrent clients.]
Total QMpH – summary
• (almost) Linear scalability of the cluster
• 20 nodes handle more than 1 million SPARQL queries per hour (40,000 QMpH)
– 1 Query Mix = 25 SPARQL queries
• 100 nodes handle 5 million SPARQL queries per hour (200,000 QMpH)
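The conversion between query mixes and individual queries in this summary is simple multiplication. A short sketch, using only the figures quoted above (the function name is illustrative):

```python
QUERIES_PER_MIX = 25  # 1 Query Mix = 25 SPARQL queries

def queries_per_hour(qmph):
    """Convert Query Mixes per Hour (QMpH) to SPARQL queries per hour."""
    return qmph * QUERIES_PER_MIX

# (cluster size, total QMpH) as quoted in the summary
for cluster_size, qmph in [(20, 40_000), (100, 200_000)]:
    print(f"{cluster_size} nodes: {qmph:,} QMpH = {queries_per_hour(qmph):,} queries/hour")
```

This gives 1,000,000 queries per hour for 20 nodes and 5,000,000 for 100 nodes.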
QMpH per cluster node
[Chart: QMpH per node (1800 to 2400) vs. cluster size (10 to 100 HM-XL nodes); BSBM-100M, 1000 concurrent clients, with a power trendline.]
QMpH per cluster node – summary
• Low parallelisation overhead
– Only 10% deterioration in QMpH per cluster node when the cluster grows 10 times (from 10 to 100 nodes)
– Cluster nodes handle 2,000-2,300 QMpH (a standalone HM-XL node on EC2 handles ~2,500 QMpH)
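One way to read these numbers is as per-node efficiency relative to a standalone instance. The sketch below uses only the rounded figures from the slides (2,000-2,300 QMpH per cluster node, ~2,500 QMpH standalone, 200,000 QMpH total at 100 nodes); the function names are illustrative.

```python
STANDALONE_QMPH = 2_500  # approximate standalone HM-XL throughput from the slide

def per_node_qmph(total_qmph, nodes):
    """Average throughput of a single node in the cluster."""
    return total_qmph / nodes

def efficiency(node_qmph, standalone=STANDALONE_QMPH):
    """Fraction of standalone throughput that a cluster node retains."""
    return node_qmph / standalone

for node_qmph in (2_000, 2_300):
    print(f"{node_qmph:,} QMpH per node = {efficiency(node_qmph):.0%} of standalone")

# Cross-check against the total reported for the 100-node cluster
print(per_node_qmph(200_000, 100))  # 2000.0
```

On these rounded inputs a cluster node retains roughly 80-92% of standalone throughput.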
What about the cost?
• 100,000 SPARQL queries per $1 on AWS
– ~4,000 Query Mixes / $
• 1 Query Mix = 25 SPARQL queries
– EC2 pricing
• Master node (on-demand HM-2XL) – $1.00/hour
• Slave node (on-demand HM-XL) – $0.50/hour
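The cost figure can be approximately reproduced from the hourly prices and the throughput reported earlier in the deck (200,000 QMpH with 100 Slave nodes). This is a sketch under the assumption that instance-hours are the only cost counted; the function names are illustrative.

```python
MASTER_USD_PER_HOUR = 1.00  # on-demand HM-2XL
SLAVE_USD_PER_HOUR = 0.50   # on-demand HM-XL
QUERIES_PER_MIX = 25

def cluster_cost_per_hour(slaves):
    """Hourly cost of one Master plus the given number of Slaves (instances only)."""
    return MASTER_USD_PER_HOUR + slaves * SLAVE_USD_PER_HOUR

def mixes_per_dollar(total_qmph, slaves):
    """Query Mixes obtained per USD for a cluster of the given size."""
    return total_qmph / cluster_cost_per_hour(slaves)

slaves, total_qmph = 100, 200_000
mixes = mixes_per_dollar(total_qmph, slaves)
print(f"cost: ${cluster_cost_per_hour(slaves):.2f}/hour")
print(f"{mixes:,.0f} Query Mixes per $ = {mixes * QUERIES_PER_MIX:,.0f} SPARQL queries per $")
```

With these inputs the cluster costs $51.00 per hour and delivers about 3,900 Query Mixes (about 98,000 queries) per dollar, consistent with the rounded ~4,000 and 100,000 on the slide.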
What about the cost (2)
[Chart: Query Mixes per 1 USD (3400 to 4600) vs. cluster size (10 to 100 nodes). Series: QMpH/$.]
DETAILED CLUSTER METRICS
Cluster monitoring
• Amazon CloudWatch provides instance-level monitoring for EC2
– CPU load, Bandwidth utilisation, I/O, …
– Minimum granularity of monitoring periods – 1 minute
• OWLIM Cluster metrics
– Monitor Master and a random Slave for ~180 min
– Many test runs
• A single run takes a few minutes
– Idle CPU/IO/Network on the diagrams is the time between test runs
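Since the monitoring charts interleave test runs with idle gaps, a per-minute series can be split into runs with a simple threshold. The following is an illustrative sketch only: the sample values are made up, and the 5% idle threshold and the `split_runs` name are assumptions, not part of the original setup.

```python
def split_runs(samples, idle_threshold=5.0):
    """Group consecutive per-minute samples above the threshold into runs.

    Returns a list of (start_minute, end_minute) pairs, both inclusive.
    """
    runs = []
    start = None
    for minute, value in enumerate(samples):
        busy = value > idle_threshold
        if busy and start is None:
            start = minute                    # a test run begins
        elif not busy and start is not None:
            runs.append((start, minute - 1))  # the run ended on the previous minute
            start = None
    if start is not None:                     # series ended in the middle of a run
        runs.append((start, len(samples) - 1))
    return runs

# Made-up CPU load (%) at 1-minute granularity, for illustration only
cpu_load = [1, 2, 60, 70, 65, 2, 1, 55, 58, 3]
print(split_runs(cpu_load))  # [(2, 4), (7, 8)]
```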
CPU load (Master)
[Chart: CPU load (%), axis 0 to 80, over time (0 to 185 min) for the Master node.]
CPU load (Slave)
[Chart: CPU load (%), axis 0 to 120, over time (0 to 155 min) for a random Slave node.]
Network traffic (Master)
[Chart: inbound and outbound network traffic (MB/s), axis 0 to 35, over time (0 to 185 min) for the Master node.]
Network traffic (Slave)
[Chart: inbound and outbound network traffic (MB/s), axis 0.00 to 0.12, over time (0 to 155 min) for a random Slave node.]
I/O (Slave)
[Chart: disk read and disk write (MB/s), axis 0.00 to 3.50, over time (0 to 170 min) for a random Slave node.]