4th German-Russian Young Researchers Forum, Saint-Petersburg, August 2014
Introduction Vision Technology Solutions Conclusion
Machine Support for
Interacting with Scientific Publications,
Improving Information Retrieval, and
Assessing Quality of Scientific Output
4th German-Russian Young Researchers Forum 2014
Christoph Lange (1, 2)
(1) Enterprise Information Systems, Institute for Applied Computer Science, University of Bonn
(2) Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Sankt Augustin
http://langec.wordpress.com/about
2014-07-07
Lange (Bonn) Interacting with Scientific Publications; Assessing Quality of Scientific Output 2014-07-07
Hello, World!
2011 PhD at Jacobs Univ. Bremen, Germany: software for
collaborating on mathematical documents [Lan11]
2011/12 Univ. Bremen, Germany: making knowledge of
different complexity manageable for computers
[OntoIOp13]
2012/13 Univ. Birmingham, UK: enabling domain
experts to make mathematical models
machine-verifiable [KLR]
2013– Enterprise Information Systems @ Univ. Bonn,
Germany / Organized Knowledge @ Fraunhofer IAIS:
enterprise information integration [AL14], data
quality assessment, . . .
Assess Quality of Scientific Output (I)
Vision: answer the following questions about the quality
of scientific output:
Author “What is a good workshop to discuss my latest
idea?”
Senior Researcher “Should I accept an invitation to the
programme committee of this conference?”
PhD Student “What are the best publications I should
read to get started?”
Reviewer “Is this paper based on high-quality data?”
Assess Quality of Scientific Output (II)
How? – Semantic Web / Linked Open Data technology
weak artificial intelligence – does not aim at
replacing, but at supporting humans
practically applicable, and scalable to the size of
the Web (→ search engine example)
suitable for connecting data from heterogeneous sources:
scientific publications (bibliographic metadata, citations and full text)
social networks (in science? – ResearchGate, Mendeley, etc.)
research data
Linked Open Data: schema.org
initiative of search engines (Google, Yandex, . . . )
structuring web page content (creative works,
events, organisations, persons, places, products)
Example (Movie description)
Avatar
Director: James Cameron (born August 16, 1954)
Science fiction
Trailer
As plain HTML:
<div class="movie">
  <h1>Avatar</h1>
  <div class="director">Director: James Cameron (born August 16, 1954)</div>
  <span class="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html">Trailer</a>
</div>
With schema.org microdata:
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    Director: <span itemprop="name">James Cameron</span>
    (born <span itemprop="birthDate">August 16, 1954</span>)
  </div>
  <span itemprop="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
</div>
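To illustrate how a machine can read the annotated markup, here is a minimal sketch in Python that turns the microdata attributes of the movie example into (subject, property, value) statements. It uses only the standard library's html.parser. The class name MicrodataParser and the identifiers _:item1, _:item2 are made up for this illustration, and the sketch assumes well-formed markup without void elements; it is not a complete microdata processor.

```python
from html.parser import HTMLParser

HTML = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    Director: <span itemprop="name">James Cameron</span>
    (born <span itemprop="birthDate">August 16, 1954</span>)
  </div>
  <span itemprop="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
</div>
"""


class MicrodataParser(HTMLParser):
    """Collects (subject, property, value) triples from simple microdata.

    Hypothetical helper for illustration; assumes well-formed markup
    in which every start tag has a matching end tag.
    """

    def __init__(self):
        super().__init__()
        self.triples = []
        self.items = []    # identifiers of the items whose scope is open
        self.stack = []    # one (opens_item, text_prop) entry per open element
        self.counter = 0

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("itemprop")
        opens_item = "itemscope" in a
        text_prop = None
        if opens_item:
            # A new item; the identifier is an invented local name.
            self.counter += 1
            item = f"_:item{self.counter}"
            if prop and self.items:
                self.triples.append((self.items[-1], prop, item))
            self.items.append(item)
            if "itemtype" in a:
                self.triples.append((item, "type", a["itemtype"]))
        elif prop and self.items:
            if "href" in a:
                # For links, the property value is the link target.
                self.triples.append((self.items[-1], prop, a["href"]))
            else:
                # Otherwise the value is the element's text, collected below.
                text_prop = (self.items[-1], prop, [])
        self.stack.append((opens_item, text_prop))

    def handle_data(self, data):
        for _, text_prop in self.stack:
            if text_prop is not None:
                text_prop[2].append(data)

    def handle_endtag(self, tag):
        if not self.stack:
            return
        opens_item, text_prop = self.stack.pop()
        if text_prop is not None:
            subject, prop, parts = text_prop
            self.triples.append((subject, prop, "".join(parts).strip()))
        if opens_item:
            self.items.pop()


parser = MicrodataParser()
parser.feed(HTML)
for triple in parser.triples:
    print(triple)
```

The resulting statements correspond to the nodes and edges of a graph: one item of type Movie, one of type Person, and the properties connecting them.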
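[Figure: the same description as a graph. A node of type Movie has the name "Avatar", the genre "Science fiction", the trailer "../movies/. . ." and a director edge to a node of type Person, which has the name "James Cameron" and the birthDate "August 16, 1954".]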
Social Data with schema.org
review or rating of a creative work, organization or
product (written by a person)
social network of a person: “knows”, “works for”, “is
colleague of”, “has parent/sibling/spouse/child/relative”
Example (Reviews of a movie)
[Figure: graph. A node of type Movie with the name "Avatar" has two reviews. Each review has an author and a reviewRating with a ratingValue (6 and 8.5). The authors are nodes of type Person named "Pünktchen" and "Anton", connected by a "knows" edge.]
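The review graph can be queried once it is written down as statements. The following Python sketch is only an illustration: the node identifiers (movie, review1, person1, ...) are invented, and the assignment of the ratings 6 and 8.5 to the two reviewers, as well as the direction of the "knows" edge, are assumptions, since the slide text does not preserve them.

```python
# Each statement is (subject, property, value).
# ASSUMPTION: which reviewer gave which rating, and who knows whom,
# is chosen arbitrarily for this example.
triples = [
    ("movie", "type", "Movie"),
    ("movie", "name", "Avatar"),
    ("movie", "reviews", "review1"),
    ("movie", "reviews", "review2"),
    ("review1", "author", "person1"),
    ("review1", "reviewRating", "rating1"),
    ("review2", "author", "person2"),
    ("review2", "reviewRating", "rating2"),
    ("rating1", "ratingValue", 6),
    ("rating2", "ratingValue", 8.5),
    ("person1", "type", "Person"),
    ("person1", "name", "Pünktchen"),
    ("person2", "type", "Person"),
    ("person2", "name", "Anton"),
    ("person1", "knows", "person2"),
]


def objects(subject, prop):
    """All values of the given property for the given subject."""
    return [o for s, p, o in triples if s == subject and p == prop]


def average_rating(movie):
    """Mean ratingValue over all reviews of the movie."""
    values = [
        value
        for review in objects(movie, "reviews")
        for rating in objects(review, "reviewRating")
        for value in objects(rating, "ratingValue")
    ]
    return sum(values) / len(values)


def ratings_by_acquaintances(person, movie):
    """Rating values of reviews written by people the given person knows."""
    known = set(objects(person, "knows"))
    return [
        value
        for review in objects(movie, "reviews")
        if set(objects(review, "author")) & known
        for rating in objects(review, "reviewRating")
        for value in objects(rating, "ratingValue")
    ]


print(average_rating("movie"))
print(ratings_by_acquaintances("person1", "movie"))
```

The second query shows why the social data matters: ratings can be filtered by the relationship between the reader and the reviewer instead of being averaged over everyone.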
schema.org in a Search Engine
Workshop Quality
Author: “What is a good workshop to discuss my latest
idea?”
Workshop Quality: Examples
Low-quality workshop
1st International Workshop on Applied Networking
(but all non-invited submissions are from authors from
the same institution as the chairs)
High-quality workshop
focused topic, 10 editions so far, balanced continuity and
renewal in organising committee, number of
submissions not decreasing, international participation,
part of a high-profile conference
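One of the warning signs above, that all non-invited submissions come from the chairs' own institution, can be turned into a simple indicator once the workshop metadata is available as data. The Python sketch below is an illustration only: the Submission record, the function name and the sample data are hypothetical, and the indicator is one possible signal, not an established quality measure.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    title: str
    invited: bool
    institutions: frozenset  # institutions of all authors


def chair_institution_share(submissions, chair_institutions):
    """Share of non-invited submissions with at least one author
    from an institution of the workshop chairs."""
    regular = [s for s in submissions if not s.invited]
    if not regular:
        return 0.0
    local = [s for s in regular if s.institutions & chair_institutions]
    return len(local) / len(regular)


# Hypothetical sample data modelled on the low-quality example.
chairs = frozenset({"University A"})
workshop = [
    Submission("Invited talk", True, frozenset({"University B"})),
    Submission("Paper 1", False, frozenset({"University A"})),
    Submission("Paper 2", False, frozenset({"University A"})),
]

share = chair_institution_share(workshop, chairs)
print(share)
```

A share close to 1 reproduces the low-quality pattern; the criteria listed for the high-quality workshop (number of editions, committee renewal, submission counts) would need further fields in the data.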
Workshop Quality: Data
Semantic Publishing Challenge [DL14]
@ Extended Semantic Web Conference 2014
One task focused on extracting Linked Data from
CEUR-WS.org workshop proceedings volumes
1,200 workshops since 1995
open access
most important publisher for computer science
workshops
semi-structured HTML tables of contents
unstructured PDF full-text
A team from Saint-Petersburg (ITMO University)
won the award for the best-performing tool [KK14]
Conference Quality
Senior Researcher: “Should I accept an invitation to the
programme committee of this conference?”
Conference Quality in the Past: Ranking
CORE (Computing Research and Education Association
of Australasia) and ERA (Excellence in Research for
Australia) rankings of 2008, 2010 and 2013:
infrequent and non-transparent
Paper Quality in the Past: Impact Factor
PhD Student: “What are the best publications I should
read to get started?”
Impact Factor
Average number of
citations of recent articles
journals only
not comparable across
disciplines
can be influenced by
journal editors
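For reference, the commonly used two-year variant of the impact factor can be written as follows. The slide does not say which variant is meant, and the symbols are chosen here for illustration: C_y(t) is the number of citations received in year y by items the journal published in year t, and N_t is the number of citable items the journal published in year t.

```latex
\[
  \mathrm{IF}_{y} \;=\; \frac{C_{y}(y-1) + C_{y}(y-2)}{N_{y-1} + N_{y-2}}
\]
```

Because both numerator and denominator are counted per journal, the value says nothing about an individual paper.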
Paper Quality in the Future
Multidimensional, context-sensitive analysis:
trend detection, topic analysis, expert search,
community dynamics, research performance at
different levels (e.g. [OM14])
context-sensitive citation analysis
e.g. 2014 Semantic Publishing Challenge task 2 (using
PubMedCentral XML metadata) [DL14]
“good citation”: B’s contribution is based on A’s
methodology
“bad citation”: A cited in a footnote in the “related work”
section
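The distinction between "good" and "bad" citations can be sketched as a weighted citation count. In the Python example below, the context labels, the weights and the sample citations are all assumptions made for illustration; real weights would have to come from an analysis of the citing text.

```python
from collections import defaultdict

# ASSUMPTION: illustrative weights per citation context.
WEIGHTS = {
    "uses_methodology": 1.0,       # "good citation": B builds on A's methodology
    "related_work_footnote": 0.1,  # "bad citation": A mentioned in a footnote
}

# Hypothetical citations: (citing paper, cited paper, context).
citations = [
    ("B", "A", "uses_methodology"),
    ("C", "A", "related_work_footnote"),
    ("C", "D", "related_work_footnote"),
]


def weighted_citation_scores(citations, weights):
    """Sum the context weights of all citations each paper receives.

    Contexts without a known weight contribute nothing.
    """
    scores = defaultdict(float)
    for _citing, cited, context in citations:
        scores[cited] += weights.get(context, 0.0)
    return dict(scores)


scores = weighted_citation_scores(citations, WEIGHTS)
for paper, score in sorted(scores.items()):
    print(paper, score)
```

With plain counting, paper A would have two citations and paper D one; with context weights, the methodological citation dominates A's score.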
Data Quality
Reviewer: “Is this paper based on high-quality data?”
Quality metrics of an evolving dataset [DLA14]
Data Quality Assessment
Quality := “fitness for use” – categories [Zav+13]:
[Figure: data quality dimensions grouped into the categories Accessibility, Contextual, Intrinsic and Representation. Dimensions shown: Availability, Licensing*, Interlinking*, Security*, Performance*, Relevancy, Trustworthiness, Understandability, Timeliness, Completeness, Syntactic Validity, Semantic Accuracy, Consistency, Conciseness, Rep.-Conciseness, Interoperability, Interpretability, Versatility*. An edge between two dimensions (Dim1, Dim2) means that the two dimensions are related.]
Enable authors to upload data with their papers!
Give peer reviewers access to data quality metrics
Starting collaboration with GESIS (social science)
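As an illustration of what a data quality metric for reviewers could look like, the Python sketch below computes a simple completeness score for each version of an evolving dataset. The metric definition, the property names and the sample records are assumptions for this example; they are not the definitions used in [Zav+13] or [DLA14].

```python
# ASSUMPTION: properties every record is expected to carry.
REQUIRED = ("title", "author", "year")


def completeness(records, required_properties):
    """Share of records with a non-empty value for every required property."""
    if not records:
        return 0.0
    complete = sum(
        all(record.get(prop) not in (None, "") for prop in required_properties)
        for record in records
    )
    return complete / len(records)


# Hypothetical dataset versions uploaded alongside a paper.
versions = {
    "2014-01": [
        {"title": "Paper 1", "author": "A. Author", "year": 2013},
        {"title": "Paper 2", "author": "B. Author"},
    ],
    "2014-06": [
        {"title": "Paper 1", "author": "A. Author", "year": 2013},
        {"title": "Paper 2", "author": "B. Author", "year": 2014},
        {"title": "Paper 3", "author": "C. Author", "year": 2014},
        {"title": "Paper 4", "author": "", "year": 2014},
    ],
}

report = {
    version: completeness(records, REQUIRED)
    for version, records in versions.items()
}
for version, score in report.items():
    print(version, score)
```

Reporting the score per version lets a reviewer see whether the data behind a paper improved or degraded between releases.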
Directions: Jailbreaking the PDF
“exploring ways to
access scholarly
data in modern
ways”
free peer-reviewed
scientific knowledge
from being locked
up in PDF
documents
Directions: Pact with the Devil
Openness vs. impact
Springer:
conference linked
data
Elsevier: executable
paper challenge
ResearchGate: open
reviews
Conclusion
Scientists need help with assessing the quality of
scientific output.
Having PDF documents peer-reviewed by human
experts is not sufficient.
We need better quality metrics than the impact
factor.
Not just paper quality matters, but also data quality.
Semantic Web/Linked Data technology helps to
provide complementary machine support. . .
. . . and is a gate into openness.
References
[AL14] S. Auer and C. Lange. “Interlinking Data and Knowledge in Enterprises, Research and Society with Linked Data”. In: Proceedings of the 11th International Baltic Conference on Databases and Information Systems (Baltic DB&IS). (Tallinn, Estonia, June 8–11, 2014). Ed. by H.-M. Haav, A. Kalja, and T. Robal. Invited paper. Tallinn, Estonia: Tallinn University of Technology Press, 2014, pp. 3–12.
[DL14] A. Di Iorio and C. Lange, eds. Semantic Publishing Challenge (Extended Semantic Web Conference, Semantic Web Evaluation Track). (Anissaras, Greece, May 25, 2014). 2014. URL: http://2014.eswc-conferences.org/program/semwebeval.
[DLA14] J. Debattista, C. Lange, and S. Auer. “Representing Dataset Quality Metadata using Multi-Dimensional Views”. 2014. Submitted.
[KK14] M. Kolchin and F. Kozlov. “Unstable markup: A template-based information extraction from web sites with unstable markup”. In: Semantic Publishing Challenge (Extended Semantic Web Conference, Semantic Web Evaluation Track). (Anissaras, Greece, May 25, 2014). Ed. by A. Di Iorio and C. Lange. 2014. URL: http://2014.eswc-conferences.org/program/semwebeval.
[KLR] M. Kerber, C. Lange, and C. Rowat. ForMaRE. Formal Mathematical Reasoning in Economics. URL: http://cs.bham.ac.uk/research/projects/formare/ (visited on 2013-02-10).
[Lan11] C. Lange. “Enabling Collaboration on Semiformal Mathematical Knowledge by Semantic Web Integration”. PhD thesis. Jacobs University Bremen, 2011.
[OM14] F. Osborne and E. Motta. “Understanding Research Dynamics”. In: Semantic Publishing Challenge (Extended Semantic Web Conference, Semantic Web Evaluation Track). (Anissaras, Greece, May 25, 2014). Ed. by A. Di Iorio and C. Lange. 2014. URL: http://2014.eswc-conferences.org/program/semwebeval.
[OntoIOp13] OntoIOp (Ontology, Model and Specification Integration and Interoperability), an OMG Standard Development Initiative. 2013. URL: http://ontoiop.org (visited on 2013-10-09).
[Zav+13] A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, and S. Auer. “Quality Assessment Methodologies for Linked Open Data”. In: Semantic Web Journal (2013). This article is still under review. URL: http://www.semantic-web-journal.net/content/quality-assessment-linked-open-data-survey.