
NISO/DCMI Webinar: Semantic Mashups Across Large, Heterogeneous Institutions: Experiences from the VIVO Service May 22, 2013 Speaker: John Fereira, Senior Programmer/Analyst and Technology Strategist at Cornell University http://www.niso.org/news/events/2013/dcmi/vivo

Upload: national-information-standards-organization-niso

Post on 22-Jan-2015



TRANSCRIPT

1. NISO/DCMI Webinar: Semantic Mashups Across Large, Heterogeneous Institutions: Experiences from the VIVO Service. May 22, 2013. Speaker: John Fereira, Senior Programmer/Analyst and Technology Strategist at Cornell University. http://www.niso.org/news/events/2013/dcmi/vivo

2. Semantic mashups across large, heterogeneous institutions: experiences from the VIVO service. John Fereira, Cornell University

3. Overview
- What is VIVO?
- History of VIVO
- High-level overview
- Ingesting data into VIVO
- Exposing data in VIVO

4. What is VIVO?
- VIVO is not an acronym
- A semantic web application that enables the discovery of research and scholarship across disciplines in an institution
- VIVO enables collaboration and understanding across an institution and among institutions, and not just for scientists
- A powerful search/browse functionality for locating people and information within or across institutions

5. What is VIVO?
- An ontology editor: VIVO includes a VIVO ontology which can be modified and extended
- An instance editor: instances of classes such as a Person, Organization, Event, etc. can be created, modified, and deleted
- Content can also be brought into VIVO in automated ways from local systems of record, such as HR, grants, course, and faculty activity databases, or from database providers such as publication aggregators and funding agencies

6. What is VIVO?
- VIVO is a content disseminator
- Views of People, Organizations, etc. can be highly customized
- VIVO provides visualizations such as topic maps and co-authorship networks
- Open data means other applications can use it

7. A brief history of VIVO
- 2003 - VIVO created for local use at Cornell University for life sciences collaboration
- 2007 - Reimplemented using RDF, OWL, Jena and SPARQL
- 2007 - Implemented at Cornell and the University of Florida as production systems

8. A brief history of VIVO
- 2009 - Seven institutions received $12.2 million in funding from the National Center for Research Resources of the NIH to enable a national network of scientists
- 2010 - Version 1.0 released as open source
- 2013 - Now at version 1.5.1
- 2013 - Transitioning from a funded project to a sustainable community open source project

9. A high-level overview
- Core ideas
- Searching/browsing
- Self editing
10. Core ideas
- Research and researchers should be discoverable independently of administrative hierarchies
- Relationships are as interesting as the facts
- It's the network, not just the nodes
- Static data models are too confining
- Granular data management allows multiple views and re-purposing
- Discovery is improved by linking pages to surrounding context

11. VIVO and Linked Open Data
- VIVO enables authoritative data about researchers to become part of the Linked Open Data (LOD) cloud
- Tim Berners-Lee, http://www.w3.org/2009/Talks/0204-ted-tbl

12. Linked Data principles (Tim Berners-Lee)
- Use URIs as names for things
- Use HTTP URIs so that people can look up those names
- When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
- Include links to other URIs so that people can discover more things
- http://linkeddata.org

13. VIVO in the LOD cloud

14. Searching and browsing
- Triple store indexed into a SOLR instance
- Searches are against SOLR
- Instance data comes from the triple store
- An example

15. Food security

16. Self editing
- Users can edit their own profile
- The system can delegate editing to proxy editors
- Some data can be locked
- An example

17. Editable and non-editable fields

18. Most text fields support rich text

19. External concepts for terms

20. Data ingest (harvesting)

21. VIVO harvests much of its data automatically from verified sources
- Reduces the need for manual input of data
- Provides an integrated and flexible source of publicly visible data at an institutional level
- Data, data, data
- Individuals may also edit and customize their profiles to suit their professional needs
- External data sources
- Internal data sources

22. Ingesting data with the VIVO Harvester
- A pipeline of tools
- Tools are written in Java, using Jena APIs
- Can fetch data from a variety of data formats
- Data can be sanitized and disambiguated
- Data is ingested directly into the triple store; does not require the VIVO web app to be running
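The third Linked Data principle above (when someone looks up a URI, provide useful information using the standards) is what VIVO's RDF views deliver. A minimal sketch of building a content-negotiated request with the Python standard library, reusing the University of Florida profile URI that appears later in the deck; the function name is illustrative, and no network call is made:

```python
import urllib.request

# Build (but do not send) a content-negotiated request for a VIVO
# profile URI. Asking for application/rdf+xml instead of the default
# HTML is how a client gets the RDF behind a profile page.
def rdf_request(uri, accept="application/rdf+xml"):
    """Return a urllib Request that asks the server for RDF."""
    req = urllib.request.Request(uri)
    req.add_header("Accept", accept)
    return req

req = rdf_request("http://vivo.ufl.edu/display/n25562")
# Dereferencing would be: urllib.request.urlopen(req).read()
```

This is the same pattern the deck later shows with curl on slide 43.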
23. Harvesting pipeline
- Fetcher/Parser
- Translate: maps intermediate RDF to VIVO RDF
- Transfer to local triple store (Jena TDB)
- Disambiguate using Scoring/Matching
- ChangeNamespace (mint unique URIs)
- Diff with previous model to create subtractions
- Transfer to VIVO triple store

24. Fetching and parsing
- Fetches data from a URL, database, or local file
- Many different types of fetchers: CSV fetcher, JDBC fetcher, SimpleXMLFetcher, JSONFetcher
- Output is an intermediate RDF format, one file per record
- A fake namespace is used

25. [Example source record, garbled in the transcript: an AIMS profile for Valeria Pesce, Information Management Specialist at the Food and Agriculture Organization of the United Nations (FAO), record ID 108074 (http://aims.fao.org/node/108074), with photo, personal website (http://www.valeriapesce.name), country (Italy), a biography describing six years of work at the Global Forum on Agricultural Research (GFAR) on metadata standards and protocols for managing and exchanging information between systems, and keywords including agINFRA, AgriDrupal, AgriFeeds, AgriVIVO, CIARD RING, Drupal, information management, interoperability, Linked Open Data (LOD), RDF, and the Semantic Web]

26. Translate
- Map the fake namespace to VIVO classes and properties
- Uses an XSLT transform
- Unique ID for each record
- node-person:Organization becomes foaf:Organization
- Relationships created

27. [Translated RDF example: Pesce, Valeria; [email protected]; Food and Agriculture Organization of the United Nations (FAO)]

28. Transfer
- Load RDF into the TDB triple store
- Duplicate URIs are not loaded
- Further operations are made in the triple store

29. Scoring/Matching
- Disambiguates People, Organizations, etc. based upon property values
- Supports Equality, NameCompare, NormalizedLevenshteinDifference, and Soundex algorithms
- Each property is weighted: firstName: 0.5, lastName: 0.5, email: 1.0; MatchThreshHold: 1.0

30. Matching
- Determines what should be done with a record which matches another record, based upon its score:
- Replace old record
- Merge records
- Ignore record

31. ChangeNamespace
- Match the old namespace pattern in a configuration file: http://vivo.example.com/harvest/aims_users/person/
- Specify the namespace in VIVO: http://agrivivodev.mannlib.cornell.edu/vivo/individual/
- Mint a new URI in the VIVO namespace: http://agrivivodev.mannlib.cornell.edu/vivo/individual/n123456

32. Diff of previous harvest
- Compare the TDB model with the previous harvest
- Generate vivo-additions.rdf
- Generate vivo-subtractions.rdf

33. Final transfer
- Load the vivo-subtractions.rdf file into SDB
- Load the vivo-additions.rdf file into SDB

34. Data ingest alternatives
- Karma: an information integration tool which provides a GUI for modeling data into an ontology
- Google Refine: good for one-time ingests and has a VIVO RDF plugin
- VIVO admin tools can load RDF

35. Exposing data in VIVO
- VIVO web pages
- View data as RDF
- Query a SPARQL endpoint and transform the results
- Drupal front end

36. Default VIVO theme

37. Cornell VIVO

38. Griffiths University

39. Melbourne Find an Expert

40. Visualization
- Completed work: co-author visualization, sparklines, VIVO world activity map

41. VIVO 1.0 source code was publicly released on April 14, 2010: 87 downloads by June 11, 2010; 917 downloads by July 16, 2010. The more institutions adopt VIVO, the more high-quality data will be available to understand, navigate, manage, utilize, and communicate progress in science and technology. (06/2010)

42. View RDF from profile page

43. Requesting RDF using an Accept header
- curl -H "Accept: application/rdf+xml" -X GET http://vivo.ufl.edu/display/n25562
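The Scoring/Matching step described above weights per-property similarities and compares the sum against a threshold. A minimal sketch, assuming a normalized Levenshtein measure and the example weights from slide 29 (firstName 0.5, lastName 0.5, email 1.0, threshold 1.0); the record data is made up, and this is not the actual Harvester implementation:

```python
# Illustrative Score/Match sketch; record values are hypothetical.

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def normalized_similarity(a, b):
    """1.0 for identical strings, approaching 0.0 as they diverge."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

WEIGHTS = {"firstName": 0.5, "lastName": 0.5, "email": 1.0}
THRESHOLD = 1.0

def score(incoming, existing):
    """Weighted sum of per-property similarities between two records."""
    return sum(w * normalized_similarity(incoming[p], existing[p])
               for p, w in WEIGHTS.items())

def is_match(incoming, existing):
    return score(incoming, existing) >= THRESHOLD

a = {"firstName": "Jane", "lastName": "Doe", "email": "jdoe@example.edu"}
b = {"firstName": "J",    "lastName": "Doe", "email": "jdoe@example.edu"}
```

A record pair whose weighted score reaches the threshold is then handed to the matching step, which decides whether to replace, merge, or ignore.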
44. Retrieving data with SPARQL
- Fuseki SPARQL endpoint installed (not included)
- Callable with a SPARQL client
- Semantic Services: manages custom SPARQL queries, exposes a URL for external sites
- Can ask for output as HTML, XML, or JSON

45. Semantic Services application

46. Hector Abruna in VIVO

47. Hector Abruna on Chemistry site

48. Viewing VIVO data with Drupal
- Import data with the Feeds module and Linked Data Importer
- Examples

49. CALS Impact Statements

50. AgriVIVO home page

51. AgriVIVO map page

52. AgriVIVO

53. VivoSearch: search across multiple VIVO sites

54. VIVO SearchLight bookmarklet

55. VIVO SearchLight

56. Some links
- Vivoweb: http://vivoweb.org
- Vivoweb on SourceForge: http://www.sourceforge.net/projects/vivo
- VivoSearch: http://vivosearch.org
- VIVO wiki on DuraSpace: https://wiki.duraspace.org/display/VIVO
- Mailing lists: http://sourceforge.net/p/vivo/sfx-list/

57. Thank you

58. NISO/DCMI Webinar: Semantic Mashups Across Large, Heterogeneous Institutions: Experiences from the VIVO Service, May 22, 2013. Questions? All questions will be posted with presenter answers on the NISO website following the webinar: http://www.niso.org/news/events/2013/dcmi/vivo

59. Thank you for joining us today. Please take a moment to fill out the brief online survey. We look forward to hearing from you!
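As a closing illustration of the SPARQL retrieval described on slide 44: a hedged sketch of a SPARQL protocol GET request against a Fuseki endpoint, built with the Python standard library. The endpoint URL and query are hypothetical, and the request is constructed but not sent:

```python
import urllib.parse
import urllib.request

# Hypothetical local Fuseki endpoint; real deployments choose their own.
ENDPOINT = "http://localhost:3030/vivo/sparql"

# An illustrative query over VIVO's FOAF-typed person data.
query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name WHERE {
  ?person a foaf:Person ;
          foaf:lastName ?name .
} LIMIT 10
"""

# Encode the query into the URL per the SPARQL protocol.
url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
req = urllib.request.Request(url)
# Ask for results as JSON; XML (or HTML from a transforming service)
# works the same way with other Accept values, matching slide 44.
req.add_header("Accept", "application/sparql-results+json")
# Sending it would be: urllib.request.urlopen(req).read()
```

The "Semantic Services" layer mentioned on slide 44 essentially wraps queries like this behind stable URLs that external sites can call.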