M. Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹,
Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹, Jim Smith²
¹University of Manchester, UK ²University of Newcastle, UK
Acknowledgement: Much of the content in many of the slides has been authored by co-workers, especially Nedim Alpdemir and Jim Smith. Errors are mine, of course.
Experience on Performance Evaluation with OGSA-DQP
21 Sep 2005 AHM 2005 2
Outline of talk
• The OGSA-DQP system
• Impact of infrastructure layers
• Profiling
OGSA-DQP system
Unified view of and access to remote DBMSs
Selecting Resources in OGSA-DQP
[Figure labels: unified schema; machines]
Evaluating Queries in OGSA-DQP
query plan
High level architecture
Brief tour: an illustration
[Figure: a Client submits an OQL Query to the GDQS, which holds the resource list and obtains DB Schemas from GDS 1 and GDS 2 and WSDL from Web Services. The Polar* Query Optimiser Engine (OQL Parser, Logical Optimiser, Physical Optimiser, Partitioner, Scheduler) turns the query into a plan of print, exchange, hash join and scan operators split into partitions P1, P2, P3. The Distributed Query Execution Engine (GQES 1, GQES 2, GQES 3) receives the sub-plans, sends sub-queries to the GDSs as GDS Query Request Docs, exchanges data blocks, and makes the operation call to the Web Service.]
<?xml version="1.0" encoding="UTF-8"?>
<GDQDataSourceList xmlns="http://dqp.ogsadai.org.uk/schema/gdqs" >
  <importedDataSource>
    <GDSFactoryHandle>http://phoebus.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
    <GDSFactoryHandle>http://rpc676.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
    <GDSFactoryHandle>http://mygrid.ncl.cs.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
  </importedDataSource>
  <importedService>
    <wsdlURL>http://phoebus.cs.man.ac.uk:9090/axis/services/EntropyAnalyserService?WSDL</wsdlURL>
  </importedService>
</GDQDataSourceList>
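As a rough illustration of how a client might read such a resource list, the sketch below parses the document above with Python's standard library and separates data-source factories from imported services. The function name `list_resources` is made up for this example; it is not part of OGSA-DQP.

```python
import xml.etree.ElementTree as ET

# The resource list from the slide, reproduced verbatim.
RESOURCE_LIST = """<?xml version="1.0" encoding="UTF-8"?>
<GDQDataSourceList xmlns="http://dqp.ogsadai.org.uk/schema/gdqs">
  <importedDataSource>
    <GDSFactoryHandle>http://phoebus.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
    <GDSFactoryHandle>http://rpc676.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
    <GDSFactoryHandle>http://mygrid.ncl.cs.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
  </importedDataSource>
  <importedService>
    <wsdlURL>http://phoebus.cs.man.ac.uk:9090/axis/services/EntropyAnalyserService?WSDL</wsdlURL>
  </importedService>
</GDQDataSourceList>
"""

# Namespace prefix used only inside this script.
NS = {"gdqs": "http://dqp.ogsadai.org.uk/schema/gdqs"}


def list_resources(xml_text):
    """Return (GDS factory handles, imported service WSDL URLs)."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    factories = [
        e.text.strip()
        for e in root.findall("gdqs:importedDataSource/gdqs:GDSFactoryHandle", NS)
    ]
    services = [
        e.text.strip()
        for e in root.findall("gdqs:importedService/gdqs:wsdlURL", NS)
    ]
    return factories, services


factories, services = list_resources(RESOURCE_LIST)
print(len(factories), "data source factories;", len(services), "imported service")
```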
<?xml version="1.0" encoding="UTF-8"?>
<databaseSchema xmlns="">
  <logicalSchema>
    <table name="goterm">
      <column fullName="goterm_id" length="32" name="id">
        <sqlTypeName>varchar</sqlTypeName>
        <sqlJavaTypeID>12</sqlJavaTypeID>
      </column>
      <column fullName="goterm_type" length="55" name="type">
        <sqlTypeName>varchar</sqlTypeName>
        <sqlJavaTypeID>12</sqlJavaTypeID>
      </column>
      <column fullName="goterm_name" length="255" name="name">
        <sqlTypeName>varchar</sqlTypeName>
        <sqlJavaTypeID>12</sqlJavaTypeID>
      </column>
      <primaryKey>
        <columnFullName>id</columnFullName>
      </primaryKey>
    </table>
  </logicalSchema>
  <physicalSchema>
    <hostMachine>130.88.192.230</hostMachine>
    <database join_buffer_size="131072" max_join_size="4294967295">
      <physTable avgRowLength="67" dataLength="766784" indexLength="126976" name="goterm" rowFormat="Dynamic" rows="11369"/>
    </database>
  </physicalSchema>
  <GDSFHandle>http://phoebus.cs.man.ac.uk:9090/ogsa/services/ogsadai/GridDataServiceFactory</GDSFHandle>
</databaseSchema>
<?xml version="1.0" encoding="UTF-8"?>
<Partitions>
  <Partition>
    <evaluatorURI>http://130.88.198.195:9090/ogsa/services/ogsadai/dqp/GridQueryEvaluationFactory/hash-11025450-1076603541049</evaluatorURI>
    <Operator operatorID="0" operatorType="TABLE_SCAN">
      <tupleType>
        <type>goterm</type>
        <name>goterm.OID</name>
        <type>string</type>
        <name>goterm.id</name>
        <type>string</type>
        <name>goterm.type</name>
        <type>string</type>
        <name>goterm.name</name>
      </tupleType>
      <TABLE_SCAN>
        <dataResourceName> goterms </dataResourceName>
        <GDSHandle> http://130.88.192.230:9090/ogsa/services/ogsadai/GridDataServiceFactory/hash-31056514-1076603576481</GDSHandle>
        <tableName> goterms </tableName>
        <predicateExpr>
          <predicate>
            <comparativeOperator>LIKE</comparativeOperator>
            <leftOperand name=" goterm.id" type="13"/>
            <rightOperand name=" GO:0000%" type="16"/>
          </predicate>
        </predicateExpr>
      </TABLE_SCAN>
    </Operator> . . .
  </Partition> . . .
</Partitions>
Where we are
• The OGSA-DQP system
• Impact of infrastructure layers
• Profiling
• Work based on publicly available releases OGSA-DQP 2.0, OGSA-DAI 4.0, GT 3.2.
!! Some of the figures presented do not describe the behaviour of the system any more and refer to pre-optimisation stages.
• Complementarily, see OGSA-DAI papers in AHM2004-5.
Data Sources
● Protein_goterm (16803 rows @ 24B ~ 404KB)
ORF [varchar(55)]
GoTermIdentifier [varchar(32)]
Sample rows:
Q0010 GO:0000004
YAL037W GO:0005554
● Protein_interaction (4716 rows @ 47B ~ 227KB)
ORF1 [varchar(50)]
ORF2 [varchar(50)]
baitProtein [varchar(50)]
interactionType [varchar(5)]
repeats [int(11)]
experimenter [varchar(100)]
Sample rows:
YER081W YIL074C YIL074C Y2H 0 Uetz et al
YER081W YIL074C YER081W Y2H 2 Ito et al
Representation in WebRowSet
Table name            Original data size   XML overhead per row   Total XML overhead   Total size
protein_goterm        404 KB               160 B                  2.68 MB              ~3 MB
protein_interaction   226 KB               376 B                  1.77 MB              ~2 MB
<currentRow><columnValue>YAL037W</columnValue><columnValue>GO:0000004</columnValue>
</currentRow>
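The "Total XML overhead" column can be approximately reproduced by multiplying the row counts from the Data Sources slide by the per-row overhead. The short sketch below does that arithmetic; it assumes decimal megabytes (1 MB = 10^6 B), which is a guess at the convention behind the slide's figures, so the last digit may differ from the table by rounding.

```python
# Row counts and per-row WebRowSet overhead (bytes), as quoted on the slides.
TABLES = {
    "protein_goterm": (16803, 160),
    "protein_interaction": (4716, 376),
}


def total_overhead_bytes(rows, overhead_per_row):
    """Total XML markup added when every row is wrapped in WebRowSet tags."""
    return rows * overhead_per_row


for name, (rows, per_row) in TABLES.items():
    total = total_overhead_bytes(rows, per_row)
    # Assumes 1 MB = 1,000,000 bytes.
    print(f"{name}: {total} B, about {total / 1e6:.2f} MB of XML overhead")
```

On these numbers the markup is several times larger than the data it carries, which is why the total size is dominated by the overhead column.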
(1) Access Techniques
Scan-1, a full scan of the protein_goterm table: select * from protein_goterm;
Scan-2, a full scan of the protein_interaction table: select * from protein_interaction;
Join, which is an equi-join of the two tables: select i.ORF2 from protein_goterm as p, protein_interaction as i where p.ORF=i.ORF1;
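To make the three workloads concrete, here is a small self-contained sketch that runs them against an in-memory SQLite database. SQLite merely stands in for the real DBMS, the schema follows the column names on the Data Sources slide, and the third protein_goterm row is invented so that the join returns something; none of this reproduces the measured setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE protein_goterm (ORF VARCHAR(55), GoTermIdentifier VARCHAR(32));
CREATE TABLE protein_interaction (
    ORF1 VARCHAR(50), ORF2 VARCHAR(50), baitProtein VARCHAR(50),
    interactionType VARCHAR(5), repeats INT, experimenter VARCHAR(100));
""")

conn.executemany(
    "INSERT INTO protein_goterm VALUES (?, ?)",
    [
        ("Q0010", "GO:0000004"),    # sample row from the slides
        ("YAL037W", "GO:0005554"),  # sample row from the slides
        ("YER081W", "GO:0000004"),  # invented row, only so the join matches
    ],
)
conn.executemany(
    "INSERT INTO protein_interaction VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("YER081W", "YIL074C", "YIL074C", "Y2H", 0, "Uetz et al"),
        ("YER081W", "YIL074C", "YER081W", "Y2H", 2, "Ito et al"),
    ],
)

# Scan-1, Scan-2 and Join, with the query text used on the slide.
scan_1 = conn.execute("select * from protein_goterm").fetchall()
scan_2 = conn.execute("select * from protein_interaction").fetchall()
join = conn.execute(
    "select i.ORF2 from protein_goterm as p, protein_interaction as i "
    "where p.ORF=i.ORF1"
).fetchall()

print(len(scan_1), len(scan_2), join)
```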
Configurations
• JDBC: local/remote
• GDS: local/remote, sync/async
• OGSA-DQP: GQES co-located with GDS; GQES calls GDS asynchronously
Scan-1 (protein_goterm)
[Bar chart: execution time (sec.) by data access mode. Modes shown: local JDBC, remote JDBC, local Synch-GDS, remote Synch-GDS, local Asynch-GDS, remote Asynch-GDS, OGSA-DQP-scan. Bar labels recovered from the slide: 0.43, 5.33, 6.33, 117.00, 218.33, 156.37.]
Scan-2 (protein_interaction)
[Bar chart: query execution time (sec.) by data access mode. Modes shown: local JDBC, remote JDBC, local Synch-GDS, remote Synch-GDS, local Asynch-GDS, remote Asynch-GDS, OGSA-DQP-scan. Bar labels recovered from the slide: 0.34, 2.00, 3.00, 32.33, 63.00, 47.27.]
Join (both tables)
[Bar chart: query execution time (sec.) by data access mode. Modes shown: local JDBC, remote JDBC, local Synch-GDS, remote Synch-GDS, local Asynch-GDS, remote Asynch-GDS, OGSA-DQP-join. Bar labels recovered from the slide: 0.24, 2.00, 3.00, 97.67, 197.00, 349.07.]
Remote Access by Block
[Figure: elapsed time (s) against tuples per block (1, 5, 10, 20, 30, 40, 50) for remote access, Synch versus Asynch. Two panels: protein_interactions (y axis 0 to 60 s) and protein_goterms (y axis 0 to 225 s).]
Block access benefits local cases too (e.g. DQP access to protein_goterms drops from 156 s to 28 s).
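The reason block size matters is that each request pays a fixed per-call cost, so fetching more tuples per call means fewer calls. A minimal sketch of that idea follows; `blocks` and `calls_needed` are names invented here, and the code counts requests only, it does not model the measured timings.

```python
import math
from typing import Iterator, List, Sequence


def blocks(tuples: Sequence, block_size: int) -> Iterator[List]:
    """Yield consecutive blocks of at most block_size tuples."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    for start in range(0, len(tuples), block_size):
        yield list(tuples[start:start + block_size])


def calls_needed(n_rows: int, block_size: int) -> int:
    """Number of requests needed to deliver n_rows in blocks."""
    return math.ceil(n_rows / block_size)


# protein_goterm has 16803 rows; block sizes are the x-axis values of the chart.
for size in (1, 5, 10, 20, 30, 40, 50):
    print(f"{size:>2} tuples per block -> {calls_needed(16803, size)} calls")
```

Going from 1 to 50 tuples per block cuts the number of requests for this table from 16803 to 337.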
(2) Breaking Down Costs
8.1
26.8 (4.8+1+22)
15.1 (9.6+5.5)
Total Cost: 28.6 secs
(3) Parallelizing Operation Call
select p.ORF, go.id, calculateEntropySlow(p.sequence)
from protein_sequences p, goterms go, protein_goterms pg
where go.id=pg.GOTermIdentifier and p.ORF=pg.ORF and pg.ORF like "YCL0%" and go.id like "GO:0%";
Configuration
Two parameters:
• a number of copies of the WS are available;
• a number of spare machines are available; the compiler plants an op-call on each.
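The idea of spreading an expensive operation call over several service copies can be sketched with a thread pool, one worker per copy. Everything here is illustrative: `calculate_entropy` is a local Shannon-entropy function standing in for the remote calculateEntropySlow service, and `parallel_op_call` is a made-up name, not how the OGSA-DQP compiler plants op-calls.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def calculate_entropy(sequence: str) -> float:
    """Shannon entropy (bits per symbol); stand-in for the remote service."""
    n = len(sequence)
    if n == 0:
        return 0.0
    counts = Counter(sequence)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def parallel_op_call(sequences: Sequence[str], copies: int) -> List[float]:
    """Apply the operation to every sequence using `copies` workers.

    Each worker plays the role of one service copy. Results come back
    in input order because Executor.map preserves ordering.
    """
    with ThreadPoolExecutor(max_workers=copies) as pool:
        return list(pool.map(calculate_entropy, sequences))


print(parallel_op_call(["AAAA", "ACGT", "AACC"], copies=2))
```

Threads suit this sketch because a real service call spends its time waiting on the network; a CPU-bound local function like the stand-in would not itself speed up under Python threads.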
Measurements
[Figure: elapsed time (s), 0 to 160, against number of service copies (1 to 6); one series per number of op-call operators: 2, 3, 4, 5, 6.]
Lessons
• Increase granularity of inter-service communication, like access to GDS.
• Reduce delivery cost: coalesce root evaluator within GDQS (in progress).
• Parallelizing expensive operations can be beneficial (ongoing work).
• Translations performed in transfers and XML WebRowSet processing can be expensive (ongoing work).
Current Work
• New release
  • Supports WS-RF/WS-I
  • Builds on top of OGSA-DAI 7.0 (due in September)
  • Includes optimisations
• Investigating adaptive and fault-tolerant mechanisms
Contact
• OGSA-DAI and OGSA-DQP software:
  http://www.ogsadai.org.uk
  http://www.ogsadai.org.uk/dqp
• Project site, mentioning adaptivity and fault-tolerance:
  http://www.ncl.ac.uk/polarstar