Using Neo4j for exploring the research graph connections made by RD-Switchboard Dr. Amir Aryani (ANDS), and Dr. Jingbo Wang (NCI) October 2016

Upload: amiraryani

Post on 10-Feb-2017

Using Neo4j for exploring the research graph connections made by RD-Switchboard

Dr. Amir Aryani (ANDS), and Dr. Jingbo Wang (NCI)

October 2016

Agenda

• Background: RD-Switchboard & Research Graph

• Neo4j: Queries

• NCI: Graph connections made by RD-Switchboard using NCI’s metadata

• Q & A

Background

Challenge of Cross-Platform Discovery of Research Data

{All started here!}

Research Data Australia Suggested Links

March 2014, Version 12

Data Description Registry Interoperability (DDRI) Working Group

Research Data Alliance

Goal: enabling cross-platform discovery between research data infrastructures

Precipitous Growth

• RDA Launch / First Plenary, March 2013, Gothenburg, Sweden: 240 participants; first Working Groups and Interest Groups

• RDA Second Plenary, September 2013, Washington, DC, USA: 380 participants from 22 countries; first “neutral space” community meeting (Data Citation Summit); first Organizational Partner Meet-up; first BOFs

• RDA Third Plenary, March 2014, Dublin, Ireland: 497 participants from 32 countries; first Organizational Assembly; 6 co-located events; 14 BOF, 12 Working Groups, 22 Interest Groups

• RDA Fourth Plenary, September 2014, Amsterdam, Netherlands: 550 participants from 40 countries; 1st RDA Deliverables presented; Organizational Assembly and first OAB / Council meeting; 10 co-located events; 11 BOF, 14 Working Groups, 36 Interest Groups

• RDA Fifth Plenary, March 2015, San Diego, CA, USA: 383 participants from 30 countries; 2nd RDA Deliverables presented; Organizational Assembly / Council meetings; 1st Adoption Day & Large scale data projects meeting; 10 BOF, 10 Working Groups, 20 Interest Groups; 10 joint sessions; 4 thematic plenary sessions

Research Data Alliance

June 2016: close to 4,200 members from 110 countries

DDRI WG Approach

Connecting datasets on the basis of co-authorship or other collaboration models such as joint funding and grants.

https://researchdata.ands.org.au/idmm-immunome-database-for-marsupials-and-monotremes/11139

Show 105 more publications

http://dx.doi.org/10.1371/journal.pone.0079092

One of the 105 articles …

doi:10.5061/dryad.4qq0v

Authors: Wong ESW, Nichol S, Warren WC, Belov K

Dryad Dataset

http://datadryad.org/resource/doi:10.5061/dryad.4qq0v

We have found another dataset from the same author…

Dataset

Researcher

Publication

Dataset

Using machines…

Connecting Datasets by Three Degrees of Separation
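The idea of reaching one dataset from another in at most three hops can be sketched in plain Python with a breadth-first search over a toy graph. This is only an illustration of the traversal, not RD-Switchboard's implementation: the node keys (including the hypothetical `researcher:shared-author`) and the way the four nodes are linked are assumptions loosely based on the example on the preceding slides.

```python
from collections import deque

# Toy graph: node key -> node type. Keys are illustrative assumptions,
# not identifiers taken from the Research Graph schema.
NODES = {
    "ands:idmm": "dataset",
    "researcher:shared-author": "researcher",  # hypothetical key
    "doi:10.1371/journal.pone.0079092": "publication",
    "doi:10.5061/dryad.4qq0v": "dataset",
}

# Undirected links, assumed for illustration:
# dataset - researcher - publication - dataset
EDGES = [
    ("ands:idmm", "researcher:shared-author"),
    ("researcher:shared-author", "doi:10.1371/journal.pone.0079092"),
    ("doi:10.1371/journal.pone.0079092", "doi:10.5061/dryad.4qq0v"),
]


def build_adjacency(edges):
    """Turn an undirected edge list into a neighbour lookup."""
    adjacency = {}
    for left, right in edges:
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)
    return adjacency


def datasets_within(start, max_hops, nodes, edges):
    """Return {dataset_key: hops} for datasets reachable in at most max_hops."""
    adjacency = build_adjacency(edges)
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in sorted(adjacency.get(current, ())):
            if neighbour not in hops:
                hops[neighbour] = hops[current] + 1
                queue.append(neighbour)
    return {
        key: distance
        for key, distance in hops.items()
        if nodes[key] == "dataset" and key != start
    }


print(datasets_within("ands:idmm", 3, NODES, EDGES))
```

With a limit of three hops the Dryad dataset is found; with a limit of two it is not, which is why three degrees of separation is the interesting threshold for this dataset, researcher, publication, dataset pattern.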

http://researchgraph.org/schema/

More info

http://researchgraph.org/schema

https://github.com/researchgraph/schema

https://github.com/rd-switchboard/Inference

Neo4j

Neo4j Graph Browser

Neo4j Queries

1. Find a Dataset

2. Find a Publication

3. Find a Grant

4. Find a Researcher

5. Find links to ORCID

6. Find datasets that have DOI

7. Find DOIs using prefix

8. Find highly connected datasets

9. Connections with multiple degrees of separation

10. Find shortest path between two researchers

Find a Dataset

match (n:dataset) where n.doi='10.5524/100166' return n

match (n:dataset) where n.title='The genome of the Australian dragon lizard Pogona vitticeps' return n

Find a Publication

match (n:publication) where n.doi='10.5170/CERN-2014-008.181' return n

match (n:cern:publication) where n.title='LHC Results - Highlights' return n

Find a Grant

match (n:grant) where n.purl='purl.org/au-research/grants/arc/LP0991658' return n

match (n:grant) where n.title='Hyper-accumulations of monosulfidic sediments' return n

Find a Researcher

match (n:researcher) where n.scopus_id='37071260700' return n

match (n:researcher) where n.orcid='0000-0002-7875-2902' return n

match (n:researcher) where n.last_name='Rajiah' and n.first_name='Kingston' return n

Find links to ORCID

match (n:dataset:dryad)--(o:orcid) return count(n)

match (n:dataset:ands)--(o:orcid) where n.ands_group='The University of Sydney' return n limit 10

Find Datasets With DOI

match (n:dataset) where exists (n.doi) return count(n)

Find DOIs using Prefix

match (n:dataset) where n.doi=~'10.4225/.*' return n limit 10
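One detail of the prefix pattern is worth illustrating: in a regular expression an unescaped `.` matches any character, so `10.4225/.*` is slightly looser than a literal prefix test. The Python sketch below shows the difference using whole-string matching, which is also how Cypher's `=~` operator applies a pattern. The sample strings are made up for illustration and are not real identifiers from the graph.

```python
import re

LOOSE = re.compile(r"10.4225/.*")    # pattern as written on the slide
STRICT = re.compile(r"10\.4225/.*")  # same pattern with the dot escaped


def has_prefix(doi, pattern):
    """True when the whole string matches the pattern."""
    return pattern.fullmatch(doi) is not None


# Hypothetical sample values, for illustration only.
samples = ["10.4225/03/EXAMPLE", "10x4225/not-a-doi", "10.5524/100166"]

for value in samples:
    print(value, has_prefix(value, LOOSE), has_prefix(value, STRICT))
```

For real DOI data the two patterns will almost always return the same rows; escaping the dot simply makes the intent explicit.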

Find Highly Connected Datasets

match (n:ands:dataset)--(x) return n.key, n.title, count(x) order by count(x) DESC limit 25
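What this query computes is a degree ranking: for each matching dataset, count its relationships and sort in descending order. A minimal Python sketch of the same counting over a toy edge list follows. The node keys are hypothetical, and a key prefix stands in for the `ands` and `dataset` labels.

```python
from collections import Counter

# Hypothetical toy edge list; keys are made up for illustration.
EDGES = [
    ("dataset:ands-1", "researcher:r1"),
    ("dataset:ands-1", "publication:p1"),
    ("dataset:ands-1", "grant:g1"),
    ("dataset:ands-2", "researcher:r1"),
    ("dataset:dryad-1", "publication:p1"),
]


def most_connected(edges, key_prefix, limit):
    """Rank nodes whose key starts with key_prefix by number of relationships."""
    degree = Counter()
    for left, right in edges:
        degree[left] += 1
        degree[right] += 1
    ranked = sorted(
        ((key, count) for key, count in degree.items() if key.startswith(key_prefix)),
        key=lambda item: (-item[1], item[0]),  # highest degree first, then by key
    )
    return ranked[:limit]


print(most_connected(EDGES, "dataset:ands", 25))
```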

Connections with Multiple Degrees of Separation

match (n:ands:dataset)-[*1..3]-(d:dryad:dataset) return n.title, d.key limit 25

Find Shortest Path Between Two Researchers

MATCH p=shortestPath((d1:dryad:dataset {doi:'10.5061/dryad.4qq0v'})-[*]-(d2:ands:dataset {doi:'10.1186/1471-2172-12-48'})) RETURN p
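The query on this slide connects two dataset nodes, and the same pattern applies to any pair of nodes, including two researchers. As a rough illustration of what a shortest-path search does over unweighted relationships, here is a breadth-first search in Python over a toy graph. The node keys and links are hypothetical, and this is a sketch of the general technique rather than a description of Neo4j's internal algorithm.

```python
from collections import deque

# Hypothetical toy graph as an undirected adjacency map.
ADJACENCY = {
    "dataset:dryad": {"publication:p1"},
    "publication:p1": {"dataset:dryad", "researcher:r1", "researcher:r2"},
    "researcher:r1": {"publication:p1", "dataset:ands"},
    "researcher:r2": {"publication:p1", "publication:p2"},
    "publication:p2": {"researcher:r2", "dataset:ands"},
    "dataset:ands": {"researcher:r1", "publication:p2"},
    "dataset:isolated": set(),
}


def shortest_path(adjacency, start, goal):
    """Return one shortest path as a list of node keys, or None if unreachable."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in sorted(adjacency.get(current, ())):
            if neighbour in previous:
                continue
            previous[neighbour] = current
            if neighbour == goal:
                # Walk back from the goal to the start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


print(shortest_path(ADJACENCY, "dataset:dryad", "dataset:ands"))
```

In the toy graph there are two routes between the datasets; the search returns the shorter one through `researcher:r1` and returns `None` for a node with no connections.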

NCI: Graph connections made by RD-Switchboard using NCI’s metadata

nci.org.au @NCInews

Research Data Collections: 10 PB+

• CMIP5: 3 PB

• Astronomy (Optical): 200 TB

• Water Ocean: 1.5 PB

• Atmosphere: 2.4 PB

• Earth Observ.: 2 PB

• Marine Videos: 10 TB

• Geophysics: 300 TB

• Weather: 340 TB

© National Computational Infrastructure 2015

NERDIP: National Environment Research Data Platform

Current research record status

Each individual catalogue record describes a linear relationship among entities:

• Researcher A (use) Data 1 (Supported by) Grant a (Generate) Paper I, II

• Researcher B (use) Data 1 (Supported by) Grant b (Generate) Paper II, III

• Researcher B (use) Data 2 (Supported by) Grant b (Generate) Paper IV

Graph database structure

The relational database is converted and presented as a graph database using Research Data Switchboard (RD-Switchboard):

[Diagram: the catalogue records merged into a single graph. Nodes: Researcher A, Researcher B, Data 1, Data 2, Grant a, Grant b, Paper I, Paper II, Paper III, Paper IV. Edge labels: use, Supported by, Generate.]
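The step from linear catalogue records to a shared graph can be sketched in a few lines of Python: each record contributes nodes and edges, and entities that appear in more than one record collapse into a single node. This is a simplified illustration, not RD-Switchboard's code. The record layout, the `records_to_graph` and `neighbours` helpers, and the choice to attach the Generate edge to the grant are all assumptions made for the example.

```python
# The three example records from the slide, in an assumed dictionary layout.
RECORDS = [
    {"researcher": "Researcher A", "data": "Data 1", "grant": "Grant a",
     "papers": ["Paper I", "Paper II"]},
    {"researcher": "Researcher B", "data": "Data 1", "grant": "Grant b",
     "papers": ["Paper II", "Paper III"]},
    {"researcher": "Researcher B", "data": "Data 2", "grant": "Grant b",
     "papers": ["Paper IV"]},
]


def records_to_graph(records):
    """Merge linear records into de-duplicated node and edge sets."""
    nodes, edges = set(), set()
    for record in records:
        nodes.update([record["researcher"], record["data"], record["grant"]])
        nodes.update(record["papers"])
        edges.add((record["researcher"], "use", record["data"]))
        edges.add((record["data"], "supported_by", record["grant"]))
        for paper in record["papers"]:
            # Assumption: the Generate edge follows the linear order of the record.
            edges.add((record["grant"], "generate", paper))
    return nodes, edges


def neighbours(edges, node):
    """Return the sorted keys of all nodes directly linked to the given node."""
    linked = set()
    for source, _label, target in edges:
        if source == node:
            linked.add(target)
        elif target == node:
            linked.add(source)
    return sorted(linked)


nodes, edges = records_to_graph(RECORDS)
print(len(nodes), "nodes,", len(edges), "edges")
print(neighbours(edges, "Data 1"))
```

Once merged, a single lookup on Data 1 shows both researchers and both grants, a connection that no individual catalogue record exposes on its own.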

User question: RD-switchboard query:

NCI GeoNetwork architecture http://geonetwork.nci.org.au

Catalogue system infrastructure

Harvest and synchronization

RD-Switchboard benefits so far…

• Identify the missing critical metadata entries;

• Identify errors in the catalogue entries;

• Provide an analytical view of how research data has been used so far (highly utilised or underutilised?);

• Evaluate the impact of the datasets, researchers and institutes;

• Encourage the usage of URI, DOI and ORCID, etc.

[Diagram nodes: researcher 1, researcher 2, paper 1, paper 2, dataset, data2, data3, data4, data5]

Any conflict of interest?

Possible collaboration?

eResearch BOF

Tuesday 11 October 2016 / 16:35

BoF: Research Graph: Connecting Researchers, Research Data, Publications and Grants using the Graph Technology

Dr. Amir Aryani (ANDS), Twitter: @amir_at_ands

Dr. Jingbo Wang (NCI)