Graph Analysis of Candidate GQL Features · 2/28/2019

Graph Analysis of Candidate GQL Features
Graph Query Language Project
Existing Languages Working Group
Thomas Frisendal, [email protected], @VizDataModeler
2019-02-26

The "Existing Languages Working Group"
• In preparation for the commencement of planning for GQL, interested parties drawn from industry (Neo4j, Oracle, Redis Labs and TigerGraph), the community (a noted data modelling expert and published technical author) and academia (the University of Talca in Chile) formed an informal working group called the "Existing Languages Working Group".

• We have worked incrementally on systematically identifying, surveying, analysing and comparing graph query language features drawn from the following existing query languages:
  • Cypher
  • PGQL
  • GSQL
  • SQL/PGQ [Framework:2020, Foundation:2020, SQL/PGQ IWD, ERF-035]
  • G-CORE

• We hope to compile a catalogue of:
  • the groups of features
  • the extent to which (if at all) these are supported in each language
  • exemplar syntax
  • supplementary artifacts to aid in understanding the underlying semantics
  • grammar constructs
  • any additional details of interest.

• The aim of mapping this landscape of existing query languages is to inform the design and development of GQL through a well-informed work plan, leading to a more robust outcome. It should support clear and meaningful discussions on scope and priorities, and facilitate clear, unambiguous design choices. It will also help us identify areas for consolidation and innovation, as well as opportunities for language interoperation in GQL (for example, with SPARQL).

Combatting Complexity: The ELWG Graph Database
• Establishing an analytical graph database for all 5 languages across all 212 features
• Down to the keyword level for each feature of each language, across 5 descriptive (text/syntax) dimensions
• Now in its 3rd edition
• Methodology:
  • Consolidate all sheets into one
  • Generate MERGE commands for the features tree and the 5 languages (by way of Excel formulas)
  • Some manual intervention (remove CRs and change semicolons to §)
  • Load into Neo4j
  • Connect all components
  • Build tags for Descriptors, Grammar Tags and Syntax Tags
  • Build a Keyword tag tree based on all 3 of the above
  • Do some reporting (this ppt and some Excel sheets)
• Will be made available to phase 2 and to the GQL design work (for analysis)
• Ambition: a pragmatic, analytical support tool, not a normative source
• Errare humanum est: please report errors and omissions (a few issues are already known)
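
The methodology generates MERGE commands with Excel formulas. The same idea can be sketched in Python, assuming a consolidated sheet with one row per feature and language. The Row fields, the "Data modification" group, the IN_GROUP and HAS_SYNTAX relationship types and the property names are all hypothetical; only the node labels follow the node types in the statistics slide. The clean() step mirrors the manual removal of CRs and the change from ; to §.

```python
# Hypothetical sketch: turn consolidated spreadsheet rows into Cypher MERGE
# statements, mimicking what the deck does with Excel formulas.
# Row layout, relationship types (IN_GROUP, HAS_SYNTAX) and property names
# are assumptions, not the ELWG's actual schema.
from dataclasses import dataclass


@dataclass
class Row:
    feature_id: str  # hypothetical feature key
    feature: str     # feature name, e.g. "Create a node"
    group: str       # feature group name
    language: str    # e.g. "OpenCypher", "G-CORE"
    syntax: str      # exemplar syntax cell from the sheet


def clean(text: str) -> str:
    # The manual step from the deck: drop carriage returns/line breaks and
    # turn ';' into '§' so no statement separator ends up inside a value.
    return text.replace("\r", " ").replace("\n", " ").replace(";", "§")


def quote(text: str) -> str:
    # Render a cleaned value as a single-quoted Cypher string literal.
    escaped = clean(text).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def merge_statement(row: Row) -> str:
    # Language-specific label, e.g. "G-CORE" -> "GCOREFeature", matching the
    # node types listed in the statistics slide.
    label = "".join(ch for ch in row.language if ch.isalnum()) + "Feature"
    fid = quote(row.feature_id)
    return (
        f"MERGE (g:FeatureGroup {{name: {quote(row.group)}}}) "
        f"MERGE (f:Feature {{id: {fid}}}) SET f.name = {quote(row.feature)} "
        f"MERGE (f)-[:IN_GROUP]->(g) "
        f"MERGE (lf:{label} {{featureId: {fid}}}) SET lf.syntax = {quote(row.syntax)} "
        f"MERGE (f)-[:HAS_SYNTAX]->(lf);"
    )


if __name__ == "__main__":
    rows = [
        Row("F001", "Create a node", "Data modification", "OpenCypher", "CREATE (n:Person);\r\n"),
        Row("F001", "Create a node", "Data modification", "G-CORE", "(illustrative syntax)"),
    ]
    for r in rows:
        print(merge_statement(r))
```

Emitting one self-contained statement per row keeps every output line independently executable, and using MERGE rather than CREATE means the load can be rerun without duplicating nodes or relationships.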

Current Meta Model

Statistics

Node type          Count  Min rels  Max rels
Feature              212         6        14
FeatureArea            6         1        17
FeatureGroup          30         2        27
InclDoc                5        80       549
InclLang            1306         4         4
Language               5       208       311
GCOREFeature         212         2        18
GSQLFeature          212         2        30
OpenCypherFeature    212         2        29
PGQLFeature          212         1        25
SQLFeature           212         2        29
DescriptorTag        401         1        22
GrammarTag           299         1       424
KeywordTag           659         1       247
SyntaxTag            214         1       247
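
The deck does not define Min rels and Max rels. Assuming they are the smallest and largest number of relationships attached to any one node of a given type (direction ignored), statistics of this shape can be computed on a toy graph as sketched below; the node ids and edges are made up for illustration.

```python
# Hypothetical sketch: per node label, count nodes and the min/max number of
# relationships attached to a node, over a small in-memory graph.
from collections import defaultdict


def label_stats(nodes, edges):
    """nodes: node id -> label; edges: (source id, target id) pairs.

    Returns label -> (node count, min rels, max rels). Each relationship is
    counted once at each of its two end nodes, ignoring direction.
    """
    degree = {node: 0 for node in nodes}
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    per_label = defaultdict(list)
    for node, label in nodes.items():
        per_label[label].append(degree[node])
    return {label: (len(d), min(d), max(d)) for label, d in per_label.items()}


nodes = {"f1": "Feature", "f2": "Feature", "g1": "FeatureGroup", "l1": "Language"}
edges = [("f1", "g1"), ("f2", "g1"), ("f1", "l1")]
print(label_stats(nodes, edges))
# {'Feature': (2, 1, 2), 'FeatureGroup': (1, 2, 2), 'Language': (1, 1, 1)}
```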

The Features Tree

Comparison of Planned or Implemented Features

Chart series: GCORE, GSQL, OpenCypher, PGQL, SQL

Implementation Status (Not = 'X')

GCORE: 72, GSQL: 152, Cypher: 168, PGQL: 113, SQL: 140

Implementation Status: Not Supported ('X')

GCORE: 118, GSQL: 54, Cypher: 43, PGQL: 99, SQL: 71
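
The two status counts per language do not add up to the 212 features (for G-CORE, 72 + 118 = 190), which suggests that empty cells were counted in neither slide. A minimal sketch under that assumption, treating any non-empty value other than 'X' as planned or implemented; the matrix layout, language names and status words are hypothetical.

```python
# Hypothetical sketch: count, per language, cells marked 'X' (not supported)
# versus any other non-empty value (taken here to mean planned or implemented).
from collections import Counter


def status_counts(matrix):
    """matrix: feature name -> {language: status cell text}."""
    supported = Counter()
    not_supported = Counter()
    for by_language in matrix.values():
        for language, cell in by_language.items():
            cell = cell.strip()
            if not cell:
                continue  # empty cell: left out of both counts
            if cell == "X":
                not_supported[language] += 1
            else:
                supported[language] += 1
    return supported, not_supported


matrix = {  # illustrative values only
    "Feature 1": {"LangA": "Implemented", "LangB": "X", "LangC": ""},
    "Feature 2": {"LangA": "Planned", "LangB": "X", "LangC": "Implemented"},
}
supported, not_supported = status_counts(matrix)
print(dict(supported), dict(not_supported))
# {'LangA': 2, 'LangC': 1} {'LangB': 2}
```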

The Descriptor Tags

The Grammar Tags

FunctionInvocation (Cypher)

NotDefined (SQL)

The Syntax Graph

Part of the Syntax Graph

Zooming in on a "Word" in the Syntax Graph

Even More Tags in the Keyword Graph

Essentially the Syntax Tags, enhanced with keywords extracted from the Descriptor and Grammar Tags
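
One way to read this slide is as a set union: a feature's keyword tags are its syntax tags plus words pulled out of its descriptor and grammar tags. The tokenization rule below (splitting CamelCase grammar names and dropping a short, assumed stop-word list) is a guess for illustration, not the ELWG procedure.

```python
# Hypothetical sketch: build a feature's keyword tags as its syntax tags plus
# words extracted from its descriptor and grammar tags.
import re

STOPWORDS = {"a", "an", "in", "is", "of", "on", "the", "to"}  # assumed list


def words(tag):
    # Split CamelCase grammar names ("FunctionInvocation" -> "Function Invocation"),
    # keep alphabetic runs, lower-case them and drop stop words.
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", tag)
    return {w.lower() for w in re.findall(r"[A-Za-z]+", spaced)} - STOPWORDS


def keyword_tags(syntax_tags, descriptor_tags, grammar_tags):
    # Syntax tags are kept as-is (lower-cased); only descriptor and grammar
    # tags go through word extraction.
    keywords = {t.lower() for t in syntax_tags}
    for tag in [*descriptor_tags, *grammar_tags]:
        keywords |= words(tag)
    return keywords


print(sorted(keyword_tags(["CREATE"], ["Create a node"], ["FunctionInvocation"])))
# ['create', 'function', 'invocation', 'node']
```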

Collected Keywords per Feature and Language

Using a Graph Algorithm to Measure Similarity of Expression (Jaccard)

AvgSim  Feature Name
1.00    And
1.00    Comparing values (equality)
1.00    Equality
1.00    Greater than
1.00    Greater than or equal to
1.00    Inequality
1.00    Less than
1.00    Less than or equal to
1.00    Negation
1.00    Or
1.00    Type coercions (i.e. implicit type conversions)
1.00    approximate 32-bit binary decimal number
1.00    approximate 64-bit binary decimal number
0.87    Edge directions: l-to-r
0.87    Specifying a conditional value
0.83    date
0.83    localtime
0.80    Check if a property exists on a node or an edge
0.79    Edge directions: r-to-l
0.79    Edge pattern with disjunction of labels
0.75    MATCH with more than one node/edge/path pattern (i.e. allowing for 'star'-shaped patterns etc.); essentially this can also be used to obtain a cross product
0.75    Edge pattern with direction
0.74    Subtraction
0.73    Edge directions: any direction

AvgSim  Feature Name
-       Dynamic property access (accessing a property of a node or edge by using a dynamically-computed string value as the key; e.g. allowing for the key to be passed in as a parameter)
-       Escaping characters
-       Flattening a list (transform a list into a series of rows; transpose)
-       Get all the elements of a list/collection/array excluding the first element
-       Get all the labels for a node
-       Get the identifier of a node or edge
-       Node pattern with label negation
-       interval
-       multidimensional array
0.06    Obtain the current date/time
0.07    Get all the nodes in a path
0.07    List/collection/array concatenation
0.08    Get all the edges in a path
0.08    Determine whether or not a value is a member of a multiset
0.08    Input graph specification
0.08    List equality
0.09    Create an edge
0.09    Get the edge label as a string
0.11    Subtraction operator for temporal types and durations
0.11    Create a node
0.11    Get the first element in a list/collection/array
0.11    Replace
0.12    Checking if a pattern exists
0.13    Amalgamate multiple values into a single list

Chart: AvgSim per feature (scale 0 to 1.20)
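
The slide names Jaccard similarity but not how AvgSim is aggregated. Assuming AvgSim is the mean Jaccard similarity |A ∩ B| / |A ∪ B| over every pair of languages that have keywords for a feature, a minimal sketch follows; the language keys and keyword sets in the usage line are hypothetical.

```python
# Hypothetical sketch: average pairwise Jaccard similarity between the keyword
# sets that different languages use for one feature.
from itertools import combinations
from typing import Dict, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    # |A ∩ B| / |A ∪ B|; callers pass non-empty sets, so the union is non-empty.
    return len(a & b) / len(a | b)


def avg_similarity(keywords_by_language: Dict[str, Set[str]]) -> Optional[float]:
    # Languages with no keywords for the feature are left out (assumption).
    sets = [kw for kw in keywords_by_language.values() if kw]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return None  # fewer than two languages to compare
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


print(avg_similarity({"L1": {"and"}, "L2": {"and"}, "L3": {"and"}}))  # 1.0
```

Under this reading, an AvgSim of 1.00 means every language uses exactly the same keyword set for the feature, as with And or Equality in the table above.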

10 Data Extracts in Excel (ELWG_reports_20190228.zip)
• CandidateFeatures_20190228
• DescriptorTags_20190228
• FeaturesNotSupported_20190228
• FeatureSyntaxSimilarity_20190228
• GrammarTags_20190228
• KeywordTagsAcrossLanguages_20190228
• KeyWordTagsCollections_20190228
• SyntaxSummary_20190228
• SyntaxTags_20190228
• SyntaxXref_20190228

Contact information:

Thomas Frisendal (Copenhagen, Denmark)

[email protected]
@VizDataModeler
linkedin.com/in/thomas-frisendal-19a56a