symplectic.co.uk
VIVO ISF: Investigating Speed Factors
Graham Triggs, Head of Repository Systems
graham@symplectic.co.uk
@grahamtriggs
About the title..
This is not: VIVO-ISF versus pre-ISF
This is..
Practical use of VIVO 1.8
Challenges encountered
Solutions and suggestions
Loading Data
Datasets
Columns: Demo | Client #1 | Client #2
Users: 136 | 27,489 | 5,544
External co-authors: ~46,000 | ~120,000 | ~140,000
Articles: ~36,000 | ~110,000 | ~150,000
Events: ~8,000
Asserted triples: 6,683,071 | 12,372,999
Inferred triples: 6,848,955 | 12,236,798
Total triples: 13,532,026 | 24,609,797
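The triple counts in the last three rows are internally consistent: asserted plus inferred equals the total for both columns that report them, and inference roughly doubles each store. A quick Python check; the dictionary keys are just labels, since the transcript does not say which two datasets these columns belong to:

```python
# Triple counts copied from the Datasets slide. Which datasets the two
# columns refer to is not stated in the transcript, so they are labelled generically.
counts = {
    "first column": (6_683_071, 6_848_955, 13_532_026),
    "second column": (12_372_999, 12_236_798, 24_609_797),
}

def inference_ratio(asserted: int, inferred: int) -> float:
    """Inferred triples per asserted triple."""
    return inferred / asserted

for name, (asserted, inferred, total) in counts.items():
    # Asserted + inferred should reproduce the reported total exactly.
    assert asserted + inferred == total, name
    print(f"{name}: {inference_ratio(asserted, inferred):.2f} inferred per asserted")
```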
Demo Server
r3.large
• optimized for memory-intensive applications
• 2 vCPU (Intel Xeon E5-2670 v2 Ivy Bridge)
• 15.25 GiB memory
• 32 GB SSD instance storage
• added 50 GB SSD general purpose (gp2) storage
IO Problems
24 hours – data still not loaded
Unreserved SSD = IO limited by volume size
Small disks = low IO
(AWS gp2 = max 128 MiB/s, rising to 160 MiB/s; 3 IOPS per GiB)
4000 IOPS provisioning max – at $0.065 per IOPS-month ($260/month)
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html
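The "3 IOPS per GiB" rule is what makes a small gp2 volume slow. A rough Python sketch of the baseline calculation; the 100 IOPS floor is an assumption taken from AWS's gp2 documentation (check the EBS volume types page above for current limits), and burst credits and the per-volume maximum are deliberately ignored. The 50 GB volume is treated as 50 GiB:

```python
GP2_IOPS_PER_GIB = 3    # from the slide: 3 IOPS per GiB
GP2_MIN_BASELINE = 100  # assumption: AWS's documented minimum gp2 baseline

def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline (non-burst) IOPS for a gp2 volume.

    Burst credits and the per-volume IOPS ceiling are not modelled here.
    """
    return max(GP2_MIN_BASELINE, GP2_IOPS_PER_GIB * size_gib)

# The 50 GB gp2 volume added to the demo server.
print(gp2_baseline_iops(50))
```

A 50 GiB volume gets a baseline of 150 IOPS, which is close to the 155 read IOPS reported for the AWS VM on the IO Comparison slide, although the transcript does not say which volume fio ran against.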
New Server
Amazon EBS Provisioned IOPS (SSD) volumes:
• $0.125 per GB-month of provisioned storage
• $0.065 per provisioned IOPS-month
EX40-SSD:
• 32 GB RAM, 2 x 240 GB SSD, Intel Core i7-4770
• ~60 euros
• Load time: ~3 hours (plus inferencing / indexing)
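Together, the two AWS prices give the monthly cost of a Provisioned IOPS volume. A minimal sketch using only the slide's prices; the 50 GB / 4000 IOPS combination is an illustrative assumption (the demo server's gp2 size plus the provisioning maximum quoted earlier), not a configuration the talk priced:

```python
def piops_monthly_usd(size_gb: float, iops: int,
                      usd_per_gb_month: float = 0.125,
                      usd_per_iops_month: float = 0.065) -> float:
    """Monthly USD cost of an EBS Provisioned IOPS (SSD) volume at the slide's prices."""
    storage = size_gb * usd_per_gb_month   # pay for capacity
    performance = iops * usd_per_iops_month  # pay separately for IOPS
    return storage + performance

# Hypothetical example: a 50 GB volume provisioned at 4000 IOPS.
cost = piops_monthly_usd(50, 4000)
print(f"${cost:.2f} per month")
```

Almost all of the cost in that example is the IOPS charge rather than the storage.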
IO Comparison (fio)
Metric: AWS VM | Dedicated
Read IOPS: 155 | 91,937
Read bandwidth: 636 KB/s | 367.7 MB/s
Write IOPS: 23 | 11,345
Write bandwidth: 96 KB/s | 45.3 MB/s
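Dividing bandwidth by IOPS gives the average size of each IO, which is a useful sanity check on the fio numbers. A small Python sketch; treating 1 MB as 1024 KB (fio's traditional base-2 units) is an assumption, since the slide doesn't state the unit base:

```python
KB_PER_MB = 1024  # assumption: fio's default base-2 units

# fio results from the IO Comparison slide: (IOPS, bandwidth in KB/s).
results = {
    ("AWS VM", "read"): (155, 636),
    ("AWS VM", "write"): (23, 96),
    ("Dedicated", "read"): (91_937, 367.7 * KB_PER_MB),
    ("Dedicated", "write"): (11_345, 45.3 * KB_PER_MB),
}

def implied_block_kb(iops: float, bandwidth_kb_s: float) -> float:
    """Average size of each IO: bandwidth divided by operations per second."""
    return bandwidth_kb_s / iops

for (host, op), (iops, bw) in results.items():
    print(f"{host} {op}: ~{implied_block_kb(iops, bw):.1f} KB per IO")

# Ratio of dedicated-server IOPS to AWS VM IOPS for each operation.
for op in ("read", "write"):
    speedup = results[("Dedicated", op)][0] / results[("AWS VM", op)][0]
    print(f"{op} IOPS speedup: ~{speedup:.0f}x")
```

Every row works out at roughly 4 KB per IO, which suggests (though the slide doesn't say) that fio ran with a 4 KB block size; the dedicated server delivers roughly 500 to 600 times the IOPS.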
MySQL – Demo Dataset
2.0 GB RDF/XML
3.2 GB MySQL database (pre-inference)
6.1 GB MySQL database (post-inference)
Transfer slows dramatically after ~1 GB written
Regains speed after ~2 GB
Processing Data
Inferencing
Fast (~8-12 ms per individual)
However…
2 million individuals = 6-7 hours
Large datasets still slow down (up to 60 ms per individual)
Memory problems
Suspected cause: IndexListener
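At a constant per-individual cost, total inference time is linear in the number of individuals. A back-of-the-envelope Python sketch, assuming the cost per individual stays constant for the whole run (which, as the slide notes, it does not for large datasets):

```python
def inference_hours(individuals: int, ms_per_individual: float) -> float:
    """Wall-clock hours to infer over a dataset at a constant per-individual cost."""
    return individuals * ms_per_individual / 1000 / 3600

# Per-individual costs quoted on the slide: 8-12 ms normally, up to 60 ms when slow.
for ms in (8, 12, 60):
    print(f"{ms} ms/individual -> {inference_hours(2_000_000, ms):.1f} hours")
```

The top of the 8-12 ms range gives about 6.7 hours for 2 million individuals, in line with the slide's 6-7 hours; at 60 ms the same dataset would take over 33 hours.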
Triple store performance
Query for graphs:
• Co-authorship
Client #1:
• SDB – 10 secs
• TDB – 1 sec
Simple SQL Queries
Used the YourKit profiler to show the SQL executed
No evidence of complex queries
Combined predicates and functions appear to be processed in Java
Is the performance of TDB down to in-memory processing vs SQL parsing?
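To illustrate the distinction on this slide: a store can push a combined condition into the SQL, so the database returns only matching rows, or it can issue a simple SELECT and evaluate the rest of the condition in application code. A minimal Python sketch using the standard-library sqlite3 module as a stand-in for MySQL; the toy quads table and its columns are hypothetical, not SDB's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical quad table: graph, subject, predicate, object as integer ids.
conn.execute("CREATE TABLE quads (g INTEGER, s INTEGER, p INTEGER, o INTEGER)")
conn.executemany(
    "INSERT INTO quads VALUES (?, ?, ?, ?)",
    [(i % 3, i, i % 5, i * 7) for i in range(10_000)],
)

def filter_in_sql(pred: int, graph: int) -> list:
    """Both conditions evaluated by the database; only matches are returned."""
    return conn.execute(
        "SELECT s, o FROM quads WHERE p = ? AND g = ? ORDER BY s", (pred, graph)
    ).fetchall()

def filter_in_app(pred: int, graph: int) -> tuple:
    """Simple SELECT on one condition; the other is checked row by row in Python."""
    rows = conn.execute("SELECT g, s, p, o FROM quads WHERE p = ?", (pred,)).fetchall()
    matches = sorted((s, o) for g, s, p, o in rows if g == graph)
    return matches, len(rows)

in_sql = filter_in_sql(2, 1)
in_app, fetched = filter_in_app(2, 1)
assert in_sql == in_app  # same answer either way
print(f"{len(in_sql)} matches; app-side filtering fetched {fetched} rows")
```

Both approaches return the same rows, but the second pulls about three times as many out of the database and does the combining work outside it.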
MySQL Performance
Total rows: 24,647,663
select g, count(*) from Quads where g IN (-364693509095697557, 786347385076487474) GROUP BY g;
24 seconds
select count(*) from Quads;
14.72 seconds
select count(g) from Quads where g = 786347385076487474;
4.16 seconds
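The full count(*) has to count all ~24.6 million rows, whereas a condition on g can only avoid a scan if an index covers that column; the slides don't say which indexes the Quads table had. In MySQL the way to check is EXPLAIN. A minimal stand-in using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; the table layout and the index name quads_g are hypothetical:

```python
import sqlite3

def plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's query-plan description for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row has four columns; the last is the human-readable detail.
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Quads (g INTEGER, s INTEGER, p INTEGER, o INTEGER)")
query = "SELECT count(g) FROM Quads WHERE g = ?"
graph = 786347385076487474  # graph id from the slide's query

before = plan(conn, query, (graph,))
conn.execute("CREATE INDEX quads_g ON Quads (g)")  # hypothetical index
after = plan(conn, query, (graph,))

print("without index:", before)
print("with index:   ", after)
```

Without the index the plan is a table scan; with it, the planner searches the index instead. Checking the equivalent MySQL plan would show whether the 4.16 second single-graph query used an index or fell back to a scan.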
Redundant Effort
Co-author graph query executed:
• On page access
• On GraphML retrieval
Two queries = twice the effort
When each takes 10 secs rather than 1…
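One way to avoid the duplicate work is to run the co-author graph query once and reuse the result for both the page and the GraphML download. A minimal Python sketch using functools.lru_cache; all function names are hypothetical, the counter stands in for the slow SPARQL query, the "GraphML" is a simplified GraphML-like string, and a real deployment would also need to invalidate the cache when the data changes:

```python
from functools import lru_cache

query_count = 0  # counts how often the expensive query actually runs

@lru_cache(maxsize=256)
def coauthor_graph(person_uri: str) -> tuple:
    """Hypothetical stand-in for the co-authorship SPARQL query."""
    global query_count
    query_count += 1
    # Pretend result: (author, co-author) edges. Tuples are immutable,
    # so sharing one cached result between callers is safe.
    return ((person_uri, "coauthor-1"), (person_uri, "coauthor-2"))

def render_page(person_uri: str) -> str:
    edges = coauthor_graph(person_uri)
    return f"{len(edges)} co-authorship edges"

def render_graphml(person_uri: str) -> str:
    edges = coauthor_graph(person_uri)
    body = "".join(f'<edge source="{a}" target="{b}"/>' for a, b in edges)
    return f"<graphml><graph>{body}</graph></graphml>"

render_page("person-123")
render_graphml("person-123")
print("queries executed:", query_count)
```

With a 10 second query, serving the second request from the cache halves the total query time across the two requests.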
Result sets vs Triples
Number of triples not necessarily relevant
Small queries still execute quickly
Amount of data matched by SPARQL is important
• This may include data matched by parts of the query
• 1 author may have:
  • 90 publications
  • 10 investigator roles (grants)
So, More Triples?
Would subproperties give simpler queries with fewer results? e.g.
• vivo:hasAuthorship
• vivo:hasInvestigatorRole
as subproperties of vivo:relates
The parent property can be inferred and still be available
Should subproperties be used to ease understanding? e.g. vivo:bearerOf vs obo:RO_0000053
(The UI hides ontology terms behind labels, but developers still see them)
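A toy illustration of the trade-off, in plain Python rather than a triple store: asserting the proposed subproperties and inferring vivo:relates from them adds triples, but a query that names the subproperty matches only the data it needs. The counts reuse the example author from the Result sets vs Triples slide (90 publications, 10 investigator roles); the URIs and data are made up, and vivo:hasAuthorship / vivo:hasInvestigatorRole are the slide's proposal:

```python
# Hypothetical subproperty hierarchy proposed on the slide.
SUBPROPERTY_OF = {
    "vivo:hasAuthorship": "vivo:relates",
    "vivo:hasInvestigatorRole": "vivo:relates",
}

# Hypothetical data: one author with 90 authorships and 10 investigator roles.
asserted = {("ex:author1", "vivo:hasAuthorship", f"ex:authorship{i}") for i in range(90)}
asserted |= {("ex:author1", "vivo:hasInvestigatorRole", f"ex:role{i}") for i in range(10)}

def infer_parents(triples: set) -> set:
    """RDFS subPropertyOf entailment, one level deep: (s, sub, o) implies (s, parent, o)."""
    return {(s, SUBPROPERTY_OF[p], o) for s, p, o in triples if p in SUBPROPERTY_OF}

store = asserted | infer_parents(asserted)

def match(subject: str, predicate: str) -> list:
    """Objects matching a single triple pattern (subject, predicate, ?o)."""
    return [o for s, p, o in store if s == subject and p == predicate]

print("triples in store:", len(store))
print("match vivo:relates:", len(match("ex:author1", "vivo:relates")))
print("match vivo:hasAuthorship:", len(match("ex:author1", "vivo:hasAuthorship")))
```

The store holds twice as many triples for this author, yet a publications query on vivo:hasAuthorship matches 90 bindings instead of the 100 that vivo:relates returns, and queries written against vivo:relates keep working because the parent triples are inferred.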
Thank you!