Dr. Ina Blümel
VIVO Conference 2015, August 14, 2015
Growing the Future Linked Data Librarians: VIVO as a Means to Explain Open Science Issues in University Education
Agenda
• I will show you some of our current VIVO-related projects
• and what we are doing besides that at TIB’s Open Science Lab,
• especially how I also use VIVO for educational purposes
• I will give a glimpse of what is happening in Germany / Europe
in the CRIS context
That's us!
Current VIVO "core" team at TIB Open Science Lab
• plus two (former) students
• We are currently looking for, and really need, a computer scientist!
• …and we also want to hire a "linked data librarian" for metadata/ontology
work
How our VIVO engagement began
and some institutional context
• In 2013 we started TIB “Open Science Lab” (OSL), a small
interdisciplinary team
• Context: externally the Leibniz Research Alliance Science 2.0, internally the R&D department
• TIB: German national library for all areas of engineering, as well as
architecture, chemistry, information technology, mathematics and physics
• Calls itself the “largest specialist library” worldwide in those subject areas
• 160 million data sets available via the GetInfo portal (incl. AV media, 3D models
and research data)
• Budget €42 million (project grants: €2.9 million) in 2014
• Strategic guidelines, areas of activity: e.g. DataCite / research data, non-textual
data, NEW: open science (see English brochure)
• TIB advises Leibniz University Hannover on setting up its institutional CRIS
Another prerequisite for the developments
described below
• Hannover University of Applied Sciences and Arts (HsH), Information
Management
• Lecturing on Open Science, Research Information, Linked Open
Data and more, perfectly in line with TIB OSL topics
• Integrating Open Science issues, research information (RI) and LOD into
Information Management (IM) classes, involving students and young researchers;
projects led at HsH serve as a basis for TIB OSL developments
• thereby growing a new generation of "linked data librarians"
(useful for TIB as well)
Young librarians (who want to work in academic
libraries) should learn at least something about
• Open Science issues
• in the sense of how the WWW enables research to be more open
and collaborative, and how to examine and cultivate new opportunities
together with research communities!
• and they should deal with the question of which specific problems libraries
can help to solve in this context!
So: University of Applied Sciences, here we go
• Classes: Current developments in information science,
Semantic Web, summer term projects (always in the semester
before the Bachelor thesis is written)
2014
Student Project #1:
Identify and structure research information from various
websites
Cross-institutional VIVO
Challenge:
From the vast array of research information objects
on the web to structured research
information
If possible, automatically
Brainstorming: getting students to understand the relevance of research information and the questions asked of it: examples of cross-institutional searches (by various stakeholders)
Which articles have been accepted or distinguished by the relevant conferences?
Which work groups are working at the interface between computer science and biology?
Which industrial partners or public bodies are involved in e-learning projects?
Which universities focus on the geosciences?
What is the structure of cooperative activities (publications / projects) between research institutions?
How does the institute/staff structure of engineering institutes differ from that in natural science institutes?
Stakeholders and uses:
Public
Scholars
writing research proposals
analyses in science research … keeping up to date on the actors involved in a subject area
… establishing networks with colleagues
Task 2: define a community of interest, find
suitable websites as sources for VIVO ingest
Science 2.0 community
• Websites with publications,
projects, information about
organizations, persons, ...
• with structured and unstructured
information
Identify websites with repetitive,
similarly structured content that makes
setting up a harvesting pipeline worthwhile!
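Once a site with repetitive, similarly structured pages has been found, the harvesting step can start very small. The following Python sketch is not the pipeline used in the project; it assumes, hypothetically, that each publication on a page is marked up as `<li class="publication">`, and it uses only the standard-library `html.parser`.

```python
from html.parser import HTMLParser


class ListItemHarvester(HTMLParser):
    """Collect the whitespace-normalised text of <li> elements with a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.items = []
        self._depth = 0      # > 0 while inside a matching <li>
        self._buffer = []    # text fragments of the current item

    def handle_starttag(self, tag, attrs):
        if tag != "li":
            return
        if self._depth:
            self._depth += 1  # nested <li> inside a matching item
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "li" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                text = "".join(self._buffer)
                self.items.append(" ".join(text.split()))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def harvest(html, css_class="publication"):
    """Return one text string per matching list item, in document order."""
    parser = ListItemHarvester(css_class)
    parser.feed(html)
    parser.close()
    return parser.items
```

Each harvested string still has to be split into authors, title, venue and so on; the selector has to be adapted per site, which is why only sites with stable structures are worth the effort.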
Project Setting
• 16-week project
• 6th-semester bachelor students of library and information
science
• supported by an information scientist and a computer scientist
• identify and document research information items on the
websites
• map to the VIVO ontology
• certain steps were redefined or split up while the project was running,
according to the students' needs / prior knowledge
Identify and document research information items on
the websites (collaborative documentation using Etherpads)
Identify entities
Visualize relations with RDF graphs
Map to namespaces used in VIVO
How do we make students think in objects and
relations rather than attributes? Add two steps:
Identify entities
Visualize relations with RDF graphs
Map to namespaces used in VIVO
Much easier after the two preceding steps: adding
namespaces in the Etherpads
Taken over by a (former) TIB OSL developer
left: students, right: OSL
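Making the authorship a node of its own is exactly the "objects and relations rather than attributes" idea. The Python sketch below writes N-Triples for one person, one article and the `vivo:Authorship` node connecting them, following the VIVO-ISF 1.6 pattern (`vivo:relates` / `vivo:relatedBy`). The `http://example.org/individual/` base URI and the helper names are hypothetical.

```python
# VIVO-ISF 1.6 vocabulary plus a hypothetical base URI for the individuals.
VIVO = "http://vivoweb.org/ontology/core#"
FOAF = "http://xmlns.com/foaf/0.1/"
BIBO = "http://purl.org/ontology/bibo/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.org/individual/"  # hypothetical


def iri(value):
    """Format an IRI as an N-Triples term."""
    return f"<{value}>"


def literal(text):
    """Format a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def authorship_triples(person_id, name, article_id, title):
    """Person and article linked through a vivo:Authorship context node."""
    person = iri(BASE + person_id)
    article = iri(BASE + article_id)
    authorship = iri(f"{BASE}authorship-{person_id}-{article_id}")
    rdf_type, label = iri(RDF_TYPE), iri(RDFS_LABEL)
    relates, related_by = iri(VIVO + "relates"), iri(VIVO + "relatedBy")
    return [
        (person, rdf_type, iri(FOAF + "Person")),
        (person, label, literal(name)),
        (article, rdf_type, iri(BIBO + "AcademicArticle")),
        (article, label, literal(title)),
        (authorship, rdf_type, iri(VIVO + "Authorship")),
        (authorship, relates, person),
        (authorship, relates, article),
        (person, related_by, authorship),
        (article, related_by, authorship),
    ]


def to_ntriples(triples):
    """Serialise (subject, predicate, object) terms, one statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)
```

An author attribute on the article would be shorter, but the separate node is what lets VIVO attach further facts (such as author rank) to the relationship itself.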
Some challenges…
• inconsistent publication data, entered as free-form text in the CMS,
e.g. up to 13 different representations of the journal volume
• templates don’t provide RI in machine-readable formats
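Free-text volume statements can be normalised with a handful of regular expressions. This Python sketch covers only a few plausible variants (e.g. "Vol. 12", "Volume 12", "Bd. 12", "12(3)"); these examples are assumptions for illustration, not the 13 variants actually found, and a real pipeline would need one rule per observed form.

```python
import re

# Illustrative patterns only; each observed variant needs its own rule.
VOLUME_PATTERNS = [
    # "Vol. 12", "vol 12", "Volume 12", "Bd. 12", "Band 12", "Jg. 12", "Jahrgang 12"
    re.compile(r"\b(?:vol(?:ume)?|bd|band|jg|jahrgang)\.?\s*(\d+)", re.IGNORECASE),
    # "12(3)" at the start of the string: volume 12, issue 3
    re.compile(r"^\s*(\d+)\s*\(\s*\d+\s*\)"),
]


def normalize_volume(raw):
    """Return the volume number as a string, or None if no pattern matches."""
    for pattern in VOLUME_PATTERNS:
        match = pattern.search(raw)
        if match:
            return match.group(1)
    return None
```

Returning `None` for unmatched strings makes it easy to list the leftovers and add a pattern for each new variant.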
• Variable content, stable structures
• Duplicates with differing structures (publications, persons, …), e.g.
http://www.hiig.de/ausgewahlte-publikationen/ and http://www.hiig.de/ausgewaehlte-veroffentlichungen/
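Duplicates published on differently structured pages can often be matched with a normalised key. This Python sketch compares titles only (case-folded, accents and punctuation stripped); the record fields `title` and `source` are hypothetical, and in practice authors and year would be compared as well.

```python
import re
import unicodedata


def title_key(title):
    """Crude matching key: case-folded, accents stripped, non-alphanumerics removed."""
    decomposed = unicodedata.normalize("NFKD", title.casefold())
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", "", without_accents)


def deduplicate(records):
    """Keep the first record per title key and remember every source it was seen on."""
    merged = {}
    for record in records:
        key = title_key(record["title"])
        if key in merged:
            merged[key]["sources"].append(record["source"])
        else:
            merged[key] = {"title": record["title"], "sources": [record["source"]]}
    return list(merged.values())
```

The key folds "ä" to "a", so a transliteration such as "ae" (as in the two HIIG URLs) would need an extra mapping.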
Do man and machine draw the same conclusions?
http://www.hiig.de/kooperationen/
Partners are marked with a logo (image)
Luckily, "alt" attributes are available
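Because the partner logos carry `alt` attributes, the partner names can be read straight from the markup. A minimal Python sketch with the standard-library `html.parser`, assuming the logos are ordinary `<img>` elements:

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collect the non-empty alt attributes of <img> elements (e.g. partner logos)."""

    def __init__(self):
        super().__init__()
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img ... /> via handle_startendtag.
        if tag == "img":
            alt = (dict(attrs).get("alt") or "").strip()
            if alt:
                self.alt_texts.append(alt)


def partner_names(html):
    collector = AltTextCollector()
    collector.feed(html)
    collector.close()
    return collector.alt_texts
```

Images with an empty `alt` are skipped; those are exactly the cases where a human sees a partner but the machine does not.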
Results
• Discovery layer with aggregated research information
• Also an approach for bootstrapping institutional research
information systems from available web sources
• no substitute for those systems, but complementary to them
https://osl.tib.eu/vivo
Taming the wilderness of open research
information, Ina Blümel, Gabriel Birke
http://dx.doi.org/10.1145/2637748.2638443
Discovery layer idea continued in…
• Current grant application (1) for "Leibniz Discovery": a cross-institutional
VIVO for more Leibniz Association institutions (2)
• with the Leibniz Research Alliance Science 2.0, with Science 2.0
VIVO being part of it
→ towards a large-scale, VIVO-based aggregation of German
researcher profiles
(1) at WGL research funding 2016
(2) Leibniz connects >80 research institutions (ranging from the natural sciences and
engineering via economics and the social sciences to the humanities), including the three
German National Libraries TIB, ZBW and ZB MED
Integration of existing research information from
three different sources
1. structured information that Leibniz institutions provide in
accordance with institutional privacy agreements (CRIS,
internal databases, spreadsheets, …)
2. structured information from external data sources: specialist
databases, library catalogs (e.g. those of the German National Library
or OCLC WorldCat), repositories (arXiv, PubMed Central, LeibnizOpen)
and CrossRef + DataCite
3. manual entries: all researchers of the Leibniz Association
should be able to change or supplement the information at any
time
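When the three sources disagree, some precedence rule is needed. The slide only fixes that researchers can correct their data at any time; the order used in this Python sketch (manual entries over institutional data over external sources) is an assumption, as are the source and field names. Recording which source each value came from keeps the merge traceable.

```python
# Assumed precedence, lowest to highest; only "manual wins" follows from the slide.
PRECEDENCE = ("external", "institutional", "manual")


def merge_profile(records_by_source):
    """Merge per-source field dictionaries for one researcher.

    Higher-precedence sources overwrite lower ones; empty values never
    overwrite anything. Returns (merged_fields, provenance_per_field).
    """
    merged, provenance = {}, {}
    for source in PRECEDENCE:
        for field, value in records_by_source.get(source, {}).items():
            if value not in (None, ""):
                merged[field] = value
                provenance[field] = source
    return merged, provenance
```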
Benefits
for institutions
• complement any existing institutional CRIS or sites of the
Leibniz institutions
• help to systematize available, previously unstructured data
• and to transfer these, for example into a separate CRIS later
• help to get / provide better (reusable) data
Benefits
for Leibniz
• Increase visibility of diversity within the Leibniz
Association, support the external representation and public
relations of the scientific community
• Stimulate and facilitate production and use of synergies and
the generation of partnerships and projects within the
Leibniz Association
For the individual researcher
• automatic integration of existing structured research
information promises a convenient and sustainable database
for their own scientific information gathering, networking and, not
least, personal profiling
…meanwhile continuing Open Science lectures, adding Semantic
Web…
2015
Student Project #2:
A "real client" situation: comprehensive research information
from one institution, transformed and imported into VIVO
Institutional VIVO
Challenge:
Structuring data and getting a comprehensive
representation of TIB research staff's
activities for all needs (profiles, reports, …)
TIB as an institution
• Wants to present its own scientific activities: information on
projects and papers that is spread over many places (see later)
• Needs to facilitate reporting processes about those activities,
getting statistics for evaluation and funding (research
management)
• Wants to see indicators at a glance for certain topics in which its
staff is involved → knowledge management & more
…is not the only one
• Several major universities and a number of facilities belonging to
the four large German scientific associations (Max Planck,
Fraunhofer, Helmholtz and Leibniz) are introducing institutional
research information systems
• What they all have in common:
• research management as the driving force: reporting tools, etc.
• mostly proprietary CRIS implementations at institutional
level, … (Pure, Converis, et al.)
• a "run" on information events about CRIS (e.g. at the Leibniz Research
Alliance)
• RI paradigm: more of an institutional, "closed world" one
It's all about individuals
• People of the institution with any kind of research output /
involvement → identify entities like projects, journal articles,
department affiliations and how they are connected
• Vanity: own CV, own scholarly output at a glance
• want to identify and connect with peers
• RI paradigm: more of an open, "discovery world" one
• establish networks, see the success of "Facebooks for
Scientists", e.g. ResearchGate
• connect and integrate with other information and person IDs
like GND ID, ORCID, …: researchers do NOT want to
maintain several redundant profile pages
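Linking profiles to person IDs starts with getting the IDs right. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, so typos can be caught before linking. The Python sketch below expects the bare `xxxx-xxxx-xxxx-xxxx` form (no `https://orcid.org/` prefix) and checks syntax and checksum only, not whether the iD is actually registered.

```python
import re

ORCID_PATTERN = re.compile(r"^(\d{4})-(\d{4})-(\d{4})-(\d{3}[\dX])$")


def orcid_check_digit(base_digits):
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """True if the string has ORCID syntax and a matching check character."""
    match = ORCID_PATTERN.match(orcid)
    if not match:
        return False
    digits = "".join(match.groups())
    return orcid_check_digit(digits[:-1]) == digits[-1]
```

For example, `is_valid_orcid("0000-0002-1825-0097")` accepts ORCID's own sample iD, while changing its last digit makes the check fail.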
Why not try both?
• Institutional RI system AND linkable, openly reusable research
information
• and a way to tackle another problem:
Do not exchange reports, but data!
Building up a TIB VIVO (considering both
people's and institutions' needs)
Main challenge:
• from the vast array of research information objects on institutes'
homepages, internal wiki pages, …
• to structured, well-connected, non-redundant research information in
one place (TIB VIVO)
• that can be reused anywhere, in any other context
Project setting:
• 13 weeks (March to June 2015) of 5 hours each, 9 participants with little to almost no
knowledge of research information systems and the Semantic Web
Step 1: Consider sources,
define research information objects and how all this could be
connected in a global graph
• Various websites, wiki pages, intranet
→ organising, restructuring, connecting
publications, cooperations,
projects,
people
Target: RDF (the main "challenge" for IM students,
but still for many librarians, too)
Identify entities
Visualize relations with RDF graphs
Map to the VIVO data model
Look up more examples at https://wiki.duraspace.org/display/VIVO/VIVO-ISF+1.6+Relationship+Diagrams
Step 2 (after collecting and pre-structuring
source data)
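The VIVO-ISF 1.6 relationship diagrams model, for example, a job as a `vivo:Position` node that relates a person and an organisation, rather than as a job-title attribute on the person. The Python sketch below writes such a small graph as Graphviz DOT, which is one convenient way to draw it during the "visualize relations" step; the `ex:` individuals are hypothetical, and prefixed names stand in for full IRIs.

```python
# Prefixed names stand in for full IRIs; the "ex:" individuals are hypothetical.
POSITION_TRIPLES = [
    ("ex:jane", "rdf:type", "foaf:Person"),
    ("ex:tib-osl", "rdf:type", "vivo:Department"),
    ("ex:position1", "rdf:type", "vivo:Position"),
    ("ex:position1", "vivo:relates", "ex:jane"),
    ("ex:position1", "vivo:relates", "ex:tib-osl"),
    ("ex:jane", "vivo:relatedBy", "ex:position1"),
    ("ex:tib-osl", "vivo:relatedBy", "ex:position1"),
]


def to_dot(triples):
    """Render (subject, predicate, object) triples as a Graphviz digraph."""
    lines = ["digraph rdf {", "  rankdir=LR;"]
    for subject, predicate, obj in triples:
        lines.append(f'  "{subject}" -> "{obj}" [label="{predicate}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(POSITION_TRIPLES))
```

The output can be rendered with Graphviz (e.g. `dot -Tpng`); drawing the graph first makes it easier to see that the position, not the person, carries the job title.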
Step 3: make it machine-readable
• Adding ontologies used in VIVO (and, by the way, learning how
to look up documentation, representations / examples in VIVO and
elsewhere)
• I introduced OpenRefine with the RDF extension to my students; others
would have used Karma. It doesn’t matter which tool…
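OpenRefine's RDF extension works by mapping table columns onto an RDF "skeleton". The Python sketch below is not OpenRefine; it is a stand-in showing the same idea with the standard `csv` module: each row becomes a `bibo:AcademicArticle`, and the hypothetical columns `id`, `title` and `doi` are mapped to `rdfs:label` and `bibo:doi`. The base URI is hypothetical too.

```python
import csv
import io

BASE = "http://example.org/individual/"  # hypothetical namespace
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
ARTICLE = "<http://purl.org/ontology/bibo/AcademicArticle>"
# Column-to-property mapping: the part one would configure as a skeleton.
COLUMN_PROPERTIES = {
    "title": "<http://www.w3.org/2000/01/rdf-schema#label>",
    "doi": "<http://purl.org/ontology/bibo/doi>",
}


def quote(text):
    """N-Triples string literal with backslashes, quotes and newlines escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def csv_to_ntriples(csv_text):
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['id'].strip()}>"
        lines.append(f"{subject} {RDF_TYPE} {ARTICLE} .")
        for column, prop in COLUMN_PROPERTIES.items():
            value = (row.get(column) or "").strip()
            if value:  # skip empty cells instead of emitting empty literals
                lines.append(f"{subject} {prop} {quote(value)} .")
    return "\n".join(lines)
```

Whatever tool is used, the mapping table is the part students actually have to think about; the serialisation is mechanical.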
Steps 4 and 5: adapt VIVO templates to the TIB design, add a
German localisation, and import the RDF data
By the way, students liked to look up and learn from the templates of numerous great US VIVOs!
Departments – People – Research Areas
Profile Page (research areas derived from
publications and projects)
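How the profile page derives research areas is not spelled out here. One simple reading, sketched in Python under that assumption, is to count how often each subject term occurs across a person's publications and projects and show the most frequent ones; the `areas` field and the `top_n` cut-off are hypothetical.

```python
from collections import Counter


def derive_research_areas(publications, projects, top_n=3):
    """Rank research areas by the number of outputs they are attached to.

    Assumption: every publication/project is a dict with a list of subject
    terms under "areas". Each term counts at most once per output.
    """
    counts = Counter()
    for item in [*publications, *projects]:
        unique_areas = list(dict.fromkeys(item.get("areas", [])))
        counts.update(unique_areas)
    return [area for area, _ in counts.most_common(top_n)]
```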
But this year the project group also took on some further work
packages in line with OSL (and VIVO) activities…
Documentation
• MediaWiki-based collaborative authoring tool
developed by OSL
• Students work (while getting to know VIVO
themselves…):
• Suggest chapters
• Collaboratively seed content ranging from, e.g.,
how to set up VIVO to how to structure and ingest
data
• Add more chapters
• …
Collaborative WWW platform handbuch.io (here: "Handbuch
CoScience" with young researchers as target audience),
modelled on the Springer Open book "Opening Science"
The content will change over time, and revisions of the book chapters are provided as the text evolves.
First used in March 2014, in conjunction with our book
sprint "CoScience" at CeBIT 2014 (inviting experts from the
Open Science community as authors)
Just to tell the whole story: later in 2014, OSL received FOSTER
(EU) funding for producing video lectures with the authors:
active involvement of the audience (comments, interaction)
Nothing new, but we have a workaround at TIB: content-based
indexing and display in TIB's AV portal, ensuring
citability by assigning persistent identifiers
Coming back to the students' VIVO work…
• Language matters (sometimes)
• at least when you deal with your personal profile / scholarly
representations and interests
• organisational structures and job titles (more on this later)
• So we started a German VIVO localisation
• Using VIVO language files on GitHub
• Looking up all terms from the VIVO frontend
• and collecting them in a dictionary
• Discussing them collaboratively
Quick and dirty approach
• Transfer terms to the files on GitHub
• Check translations and edit where necessary:
https://github.com/VIVO-DE/VIVO-languages/tree/master/vivo-1.6
Re-usable and versionable
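For the localisation, a useful check is which keys of the English language file still lack a German value. The Python sketch below assumes Java-style `.properties` files of the kind VIVO uses for its interface texts, and implements only a simplified reader (no line continuations, escape sequences or whitespace-only separators); the file contents are passed in as strings.

```python
def parse_properties(text):
    """Simplified .properties reader: key=value or key:value, '#'/'!' comments."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # Split at the first '=' or ':' separator; lines without one are ignored.
        positions = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not positions:
            continue
        cut = min(positions)
        entries[line[:cut].strip()] = line[cut + 1:].strip()
    return entries


def missing_translations(base_text, localized_text):
    """Keys present in the base file but absent (or empty) in the localized one."""
    base = parse_properties(base_text)
    localized = parse_properties(localized_text)
    return sorted(key for key in base if not localized.get(key))
```

Run against the English and German files, the result is a to-do list for the collaborative translation round.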
In this context: ontology adaptation
(Bachelor thesis)
• How can the US VIVO match the specific needs of German academic
institutions?
• Most of VIVO-ISF works fine, except for some of the
organisational structures (1. organisations) and job titles (2. positions)
• Twofold approach / sources
1. University institutional websites
2. The ongoing National Science Council's "Kerndatensatz
Forschung" (KDSF, research core dataset) project
1. Look up and generalize organizational
structures from university institutional websites
2. Look up and map to the National Science Council's
"Kerndatensatz Forschung" (KDSF) project
• KDSF = definition of a "core data set" for research
information metadata → the German "to-be" CRIS standard for
data exchange and for reporting and comparing institutions
(to better distribute national and organisational funding)
• Official beta version released mid-2015:
http://kdsf.fit.fraunhofer.de/beta/ Request for comments
(more on this later)
Soon to appear in a new OA information science series
Recommendations for the KDSF (German research core data
set), July 2015 (via the German Initiative for Network Information
(DINI) working group)
• Consistent use of common namespaces
• Consistent use of object properties instead of data properties
• Furthermore, the consistent inclusion of IDs such as ORCID in the
KDSF should be sought in the context of international
standardization efforts.
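To see why object properties are preferable, compare two ways of stating an affiliation. The Python sketch below uses hypothetical `ex:` properties (not KDSF or VIVO terms): with a data property, two spellings of the same lab yield two separate groups, whereas with an object property both people point to one node that can be labelled, identified and linked once.

```python
# Hypothetical example data in prefixed-name form; literals keep their quotes.
DATA_PROPERTY_STYLE = [
    ("ex:jane", "ex:affiliationName", '"TIB Open Science Lab"'),
    ("ex:john", "ex:affiliationName", '"Open Science Lab (TIB)"'),
]
OBJECT_PROPERTY_STYLE = [
    ("ex:jane", "ex:affiliatedWith", "ex:tib-osl"),
    ("ex:john", "ex:affiliatedWith", "ex:tib-osl"),
    ("ex:tib-osl", "rdfs:label", '"TIB Open Science Lab"'),
]


def members(triples, predicate):
    """Group subjects by the object they point to via `predicate`."""
    groups = {}
    for subject, pred, obj in triples:
        if pred == predicate:
            groups.setdefault(obj, []).append(subject)
    return groups
```

`members(DATA_PROPERTY_STYLE, "ex:affiliationName")` yields two groups for the same lab, while the object-property version yields a single `ex:tib-osl` group.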
KDSF kick-off: before national standardisation, let's
have a look at international efforts!
VIVO already mentioned
Torulf Lind (Swedish Research Council)
An ecosystem approach to national research
information governance
See also: PIDs in scholarly LOD-compliant systems, a "resolvable problem" mentioned in "Persistent Identifiers and URLs" (3 June 2015) by Martin Fenner (citing the Den Haag Manifesto, 2011)
Enhancing VIVO visualisations: a just-started project
using TIB VIVO entities
• obtain deeper insights into TIB's people, publications and
research areas → basis for e.g. strategy decisions (remember:
do not exchange reports, but data)
Starting point: "PivotPaths"
by Marian Dörk, HS Potsdam
New implementation with D3.js
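A D3.js view in the spirit of PivotPaths needs the VIVO entities as plain nodes and links. The Python sketch below builds that structure from publication records; the record shape (`id`, `authors`, `areas`) and the type prefixes in the node IDs are assumptions, since the TIB VIVO export format is not described here. The result can be serialised with `json.dumps` and loaded in the browser.

```python
import json


def pivot_graph(publications):
    """Nodes/links connecting people, publications and research areas.

    Assumption: each publication is a dict with "id", "authors" and "areas".
    Node IDs get a type prefix so that, e.g., a person and an area with the
    same name do not collide.
    """
    nodes, links = {}, []
    for pub in publications:
        pub_id = f"publication:{pub['id']}"
        nodes[pub_id] = {"id": pub_id, "group": "publication"}
        for person in pub.get("authors", []):
            person_id = f"person:{person}"
            nodes.setdefault(person_id, {"id": person_id, "group": "person"})
            links.append({"source": person_id, "target": pub_id})
        for area in pub.get("areas", []):
            area_id = f"area:{area}"
            nodes.setdefault(area_id, {"id": area_id, "group": "area"})
            links.append({"source": pub_id, "target": area_id})
    return {"nodes": list(nodes.values()), "links": links}


if __name__ == "__main__":
    sample = [{"id": "pub1", "authors": ["Jane Doe"], "areas": ["Linked Data"]}]
    print(json.dumps(pivot_graph(sample), indent=2))
```

Publications sit in the middle of the graph, which matches the people → publications → research areas pivot described on the slide.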
Some thoughts on VIVO from an educational perspective
• Research information and networking systems: upcoming
default activities for modern research libraries
• New topics arise, e.g.
• the rapidly evolving nature of scholarly objects
• information discovery in a linked open data world
• new impact measures and visualisations
• VIVO and its consistently open research information paradigm
• work well for practical projects in an information science
education context
• to convey some of the key skills for future librarians
Speaking of evolving topics in the library context,
I should certainly mention another OSL activity
• Horizon Report Library Edition, 2014
• Since 2005, the NMC (New Media Consortium, an international community of experts in
educational technology) Horizon Reports have been the world's leading annual
technology radar for school and university teaching
• Examines key trends, significant challenges, and
emerging technologies for their potential impact on
academic and research libraries worldwide
• TIB in collaboration with University of Applied
Sciences (HTW) Chur, ETH-Bibliothek Zurich
• With more than 1 million downloads, the most
successful Horizon Report of all time
Some more activity: conferences
• …where we have presented VIVO and other CRIS-related activities
• SWIB13, SWIB14; co-organisation of VIVOcamp13 at SWIB13
with Valeria Pesce, John Ferreira, et al.
• CRIS 2014, Rome
• i-Know14, Graz
• ELAG14; co-organisation of a VIVO bootcamp with Violeta Ilik, Ted
Lawless, et al.
• Science 2.0 conferences, Hamburg, 2014+15
• OAI9 (CERN Workshop on Innovations in Scholarly Communication),
Geneva, 2015
• IATUL15, Hannover
After presenting our VIVO-related activities
at European conferences or national meetings…
• we discovered there is a huge interest
• in alternatives to traditional institutional CRIS
• in cross-institutional discovery systems
• especially concerning these values:
• no cost for the "product" itself
• open source
• LOD representation
• but there are also reservations:
• A community project?
• Do I get the functionality my institution needs (esp.
mandatory reports)?
• US research landscape?
• Who is/will be using it in Germany?
So the decision was made!
• 1st German-language VIVO workshop
• 9 September in Hannover
• In conjunction with DINI’s CRIS working group meeting
• Not really a hands-on workshop, rather for decision makers,
as there are no VIVO implementations and no community in Germany yet
• VIVO implementations in DE/EU: lessons learned & challenges
• CRIS landscape in DE and current developments: standards, etc.
• Current extensions, adaptations in development
Outcome of yesterday's BoF meeting:
"VIVO in Europe" task force
• Growing the VIVO community in the EU
• Expand the idea of open, linkable research information in
the EU
• Get more instances running
• Similar needs to adapt the US version?
• Get VIVO (again) more involved with euroCRIS
• …
• Let's get in contact!
Thank you for your attention!