Machine Support for Interacting with Scientific Publications, Improving Information Retrieval, and Assessing Quality of Scientific Output


DESCRIPTION

4th German-Russian Young Researchers Forum Saint-Petersburg, August 2014

TRANSCRIPT

Page 1: Machine Support for Interacting with Scientific Publications Improving Information Retrieval, and Assessing Quality of Scientific Output

Introduction Vision Technology Solutions Conclusion

Machine Support for Interacting with Scientific Publications, Improving Information Retrieval, and Assessing Quality of Scientific Output

4th German-Russian Young Researchers Forum 2014

Christoph Lange (1, 2)

1 Enterprise Information Systems, Institute for Applied Computer Science, University of Bonn

2 Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Sankt Augustin

http://langec.wordpress.com/about

2014-07-07

Lange (Bonn) Interacting with Scientific Publications; Assessing Quality of Scientific Output 2014-07-07 1


Hello, World!

2011: PhD at Jacobs Univ. Bremen, Germany: software for collaborating on mathematical documents [Lan11]

2011/12: Univ. Bremen, Germany: making knowledge of different complexity manageable for computers [OntoIOp13]

2012/13: Univ. Birmingham, UK: enabling domain experts to make mathematical models machine-verifiable [KLR]

2013–: Enterprise Information Systems @ Univ. Bonn, Germany / Organized Knowledge @ Fraunhofer IAIS: enterprise information integration [AL14], data quality assessment, ...


Assess Quality of Scientific Output (I)

Vision: answer the following questions about the quality of scientific output:

Author: “What is a good workshop to discuss my latest idea?”

Senior Researcher: “Should I accept an invitation to the programme committee of this conference?”

PhD Student: “What are the best publications I should read to get started?”

Reviewer: “Is this paper based on high-quality data?”


Assess Quality of Scientific Output (II)

How? – Semantic Web / Linked Open Data technology

weak artificial intelligence – does not aim at replacing humans, but at supporting them

practically applicable, and scalable to the size of the Web (→ search engine example)

suitable for connecting data from heterogeneous sources:

scientific publications (bibliographic metadata, citations and full text)

social networks (in science? – ResearchGate, Mendeley, etc.)

research data


Linked Open Data: schema.org

initiative of search engines (Google, Yandex, ...)

structuring web page content (creative works, events, organisations, persons, places, products)

Example (Movie description), as rendered in a browser:

Avatar

Director: James Cameron (born August 16, 1954)

Science fiction

Trailer


The same example as plain HTML, without machine-readable structure:

<div class="movie">
  <h1>Avatar</h1>
  <div class="director">Director: James Cameron (born August 16, 1954)</div>
  <span class="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html">Trailer</a>
</div>


The same example annotated with schema.org microdata:

<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    Director: <span itemprop="name">James Cameron</span>
    (born <span itemprop="birthDate">August 16, 1954</span>)
  </div>
  <span itemprop="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
</div>


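The resulting graph of machine-readable statements:

Movie item: type Movie; name "Avatar"; genre "Science fiction"; trailer ../movies/. . .; director → Person item

Person item: type Person; name "James Cameron"; birthDate "August 16, 1954"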
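How a consumer such as a search engine might read the annotated movie markup back into such a graph can be sketched in a few lines of Python. This is a simplified illustration, not a complete microdata processor: it assumes well-formed markup, ignores attributes such as itemref and itemid, and keeps relative URLs as written. The class name MicrodataSketch and the node identifiers (_:item1, ...) are made up for this example.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they are never pushed on the stack.
VOID = {"br", "img", "meta", "link", "hr", "input"}

HTML = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    Director: <span itemprop="name">James Cameron</span>
    (born <span itemprop="birthDate">August 16, 1954</span>)
  </div>
  <span itemprop="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
</div>
"""


class MicrodataSketch(HTMLParser):
    """Collects (subject, property, value) statements from microdata markup."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self.items = []   # identifiers of the items whose scope is open
        self.stack = []   # one entry per open element: (opened_item, text_prop, chunks)
        self.counter = 0

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("itemprop")
        opened, text_prop = False, None
        if "itemscope" in a:
            # A new item starts; link it to the enclosing item if it is a property value.
            self.counter += 1
            item = f"_:item{self.counter}"
            if prop and self.items:
                self.triples.append((self.items[-1], prop, item))
            if "itemtype" in a:
                self.triples.append((item, "type", a["itemtype"]))
            self.items.append(item)
            opened = True
        elif prop and self.items:
            if "href" in a:
                # For links, the property value is the link target, not the link text.
                self.triples.append((self.items[-1], prop, a["href"]))
            else:
                text_prop = prop
        if tag not in VOID:
            self.stack.append((opened, text_prop, []))

    def handle_data(self, data):
        # Text belongs to every open element that carries a text-valued property.
        for _, text_prop, chunks in self.stack:
            if text_prop:
                chunks.append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.stack:
            return
        opened, text_prop, chunks = self.stack.pop()
        if text_prop:
            self.triples.append((self.items[-1], text_prop, "".join(chunks).strip()))
        if opened:
            self.items.pop()


parser = MicrodataSketch()
parser.feed(HTML)
triples = parser.triples
for triple in triples:
    print(triple)
```

Each printed tuple corresponds to one edge of the graph: the Movie item carries name, genre, trailer and director, and the director edge points to a second item of type Person.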


Social Data with schema.org

review or rating of a creative work, organization or product (written by a person)

social network of a person: “knows”, “works for”, “is colleague of”, “has parent/sibling/spouse/child/relative”

Example (Reviews of a movie), as a graph:

Movie item: type Movie; name "Avatar"; two "reviews" edges, one to each review

Each review: an "author" edge to a person and a "reviewRating" edge to a rating; the two ratings have ratingValue 6 and 8.5

The two authors: type Person; names "Pünktchen" and "Anton"; connected by a "knows" edge
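Once reviews are available as statements like these, simple aggregate questions become graph traversals. The following minimal sketch computes the average rating of the movie from the example. The node identifiers (_:movie, _:r1, ...) and the helper names are hypothetical, and the author statements are left out because the computation does not need them.

```python
from statistics import mean

# Statements from the review example; node identifiers are made up for illustration.
triples = [
    ("_:movie", "type", "Movie"),
    ("_:movie", "name", "Avatar"),
    ("_:movie", "reviews", "_:r1"),
    ("_:movie", "reviews", "_:r2"),
    ("_:r1", "reviewRating", "_:rating1"),
    ("_:r2", "reviewRating", "_:rating2"),
    ("_:rating1", "ratingValue", 6.0),
    ("_:rating2", "ratingValue", 8.5),
]


def objects(subject, prop):
    """Return all values of the given property for the given subject."""
    return [o for s, p, o in triples if s == subject and p == prop]


def average_rating(item):
    """Follow reviews -> reviewRating -> ratingValue and average the values."""
    values = [
        value
        for review in objects(item, "reviews")
        for rating in objects(review, "reviewRating")
        for value in objects(rating, "ratingValue")
    ]
    return mean(values) if values else None


avg = average_rating("_:movie")
print(avg)  # 7.25
```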


schema.org in a Search Engine


Workshop Quality

Author: “What is a good workshop to discuss my latest idea?”


Workshop Quality: Examples

Low-quality workshop

1st International Workshop on Applied Networking (but all non-invited submissions are from authors from the same institution as the chairs)

High-quality workshop

focused topic, 10 editions so far, balanced continuity and renewal in organising committee, number of submissions not decreasing, international participation, part of a high-profile conference
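One of the warning signs named above, non-invited submissions coming only from the chairs' own institution, can be turned into a simple indicator once workshop metadata is available as data. The sketch below is only an illustration under assumptions: the Submission record, the function name and the sample affiliations are hypothetical, and a submission counts as "local" as soon as any of its authors shares an affiliation with a chair.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    affiliations: set   # affiliations of all authors of the submission
    invited: bool = False


def same_institution_share(submissions, chair_affiliations):
    """Share of non-invited submissions with an author from a chair's institution."""
    regular = [s for s in submissions if not s.invited]
    if not regular:
        return None  # no non-invited submissions, indicator undefined
    local = sum(1 for s in regular if s.affiliations & chair_affiliations)
    return local / len(regular)


chairs = {"University X"}  # hypothetical affiliation of the chairs

# Hypothetical workshop A: every non-invited submission comes from University X.
workshop_a = [
    Submission({"University X"}),
    Submission({"University X"}),
    Submission({"University X"}),
    Submission({"University Y"}, invited=True),
]

# Hypothetical workshop B: submissions come from several institutions.
workshop_b = [
    Submission({"University X"}),
    Submission({"University Y"}),
    Submission({"University Z"}),
    Submission({"University Y", "University W"}),
]

share_a = same_institution_share(workshop_a, chairs)
share_b = same_institution_share(workshop_b, chairs)
print(share_a, share_b)  # 1.0 0.25
```

A value close to 1.0 would flag a workshop for closer inspection; it is one signal among the several listed for a high-quality workshop, not a verdict on its own.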


Workshop Quality: Data

Semantic Publishing Challenge [DL14] @ Extended Semantic Web Conference 2014

One task focused on extracting Linked Data from CEUR-WS.org workshop proceedings volumes:

1,200 workshops since 1995

open access

most important publisher for computer science workshops

semi-structured HTML tables of contents

unstructured PDF full text

A team from Saint-Petersburg (ITMO University) won the award for the best-performing tool [KK14]


Conference Quality

Senior Researcher: “Should I accept an invitation to the programme committee of this conference?”


Conference Quality in the Past: Ranking

CORE (Computing Research and Education Association of Australasia) and ERA (Excellence in Research for Australia) rankings of 2008, 2010 and 2013: infrequent and non-transparent


Paper Quality in the Past: Impact Factor

PhD Student: “What are the best publications I should read to get started?”

Impact Factor

average number of citations of recent articles

journals only

not comparable across disciplines

can be influenced by journal editors
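For reference, the "average number of citations of recent articles" is usually computed over a two-year window. The formula below states this common two-year variant; the slide itself does not say which variant is meant, and the symbols are chosen here for illustration.

```latex
\[
  \mathrm{IF}_y \;=\; \frac{C_y}{N_{y-1} + N_{y-2}}
\]
% C_y     : citations received in year y by items the journal published in years y-1 and y-2
% N_{y-k} : number of citable items the journal published in year y-k
```

With made-up numbers, a journal that published 60 and 40 citable items in the two preceding years and received 300 citations to them in year y would have an impact factor of 300 / (60 + 40) = 3.0.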


Paper Quality in the Future

Multidimensional, context-sensitive analysis:

trend detection, topic analysis, expert search, community dynamics, research performance at different levels (e.g. [OM14])

context-sensitive citation analysis, e.g. 2014 Semantic Publishing Challenge task 2 (using PubMedCentral XML metadata) [DL14]

“good citation”: B’s contribution is based on A’s methodology

“bad citation”: A cited in a footnote in the “related work” section
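The distinction between the two kinds of citation can be made concrete with a small sketch that counts the citations of a paper by context instead of adding them up into a single number. Everything here is an assumption for illustration: the Citation record, its fields and the three context labels are hypothetical, and in practice the context would have to be extracted from the full text or from structured metadata.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    citing: str         # identifier of the citing paper
    cited: str          # identifier of the cited paper
    section: str        # section in which the citation occurs
    in_footnote: bool   # whether the citation sits in a footnote
    uses_method: bool   # whether the citing paper builds on the cited methodology


def citation_profile(citations, paper):
    """Count the citations of a paper, separated by citation context."""
    profile = Counter()
    for c in citations:
        if c.cited != paper:
            continue
        if c.uses_method:
            profile["builds on methodology"] += 1
        elif c.in_footnote and c.section == "related work":
            profile["footnote in related work"] += 1
        else:
            profile["other"] += 1
    return profile


# Hypothetical citations of paper "A".
citations = [
    Citation("B", "A", "approach", False, True),
    Citation("C", "A", "related work", True, False),
    Citation("D", "A", "related work", True, False),
    Citation("E", "A", "introduction", False, False),
    Citation("B", "F", "evaluation", False, False),
]

profile = citation_profile(citations, "A")
print(dict(profile))
```

A plain citation count would report four citations of paper A; the profile shows that only one of them builds on its methodology.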


Data Quality

Reviewer: “Is this paper based on high-quality data?”

Quality metrics of an evolving dataset [DLA14]


Data Quality Assessment

Quality := “fitness for use” – categories [Zav+13]:

[Figure: quality dimensions grouped into the categories Representation, Contextual, Intrinsic and Accessibility; the legend marks pairs of related dimensions (Dim1, Dim2). Dimensions shown: Relevancy, Conciseness, Timeliness, Representational Conciseness, Interoperability, Consistency, Interpretability, Understandability, Versatility*, Availability, Performance*, Interlinking*, Syntactic Validity, Trustworthiness, Licensing*, Semantic Accuracy, Completeness, Security*. Asterisks as in the original figure.]

Enable authors to upload data with their papers!

Give peer reviewers access to data quality metrics

Starting collaboration with GESIS (social science)


Directions: Jailbreaking the PDF

“exploring ways to access scholarly data in modern ways”

free peer-reviewed scientific knowledge from being locked up in PDF documents


Directions: Pact with the Devil

Openness vs. impact

Springer: conference linked data

Elsevier: executable paper challenge

ResearchGate: open reviews


Conclusion

Scientists need help with assessing the quality of scientific output.

Having PDF documents peer-reviewed by human experts is not sufficient.

We need better quality metrics than the impact factor.

Not just paper quality matters, but also data quality.

Semantic Web/Linked Data technology helps to provide complementary machine support ...

... and is a gate into openness.


References

[AL14] S. Auer and C. Lange. “Interlinking Data and Knowledge in Enterprises, Research and Society with Linked Data”. In: Proceedings of the 11th International Baltic Conference on Databases and Information Systems (Baltic DB&IS). (Tallinn, Estonia, June 8–11, 2014). Ed. by H.-M. Haav, A. Kalja, and T. Robal. Invited paper. Tallinn, Estonia: Tallinn University of Technology Press, 2014, pp. 3–12.

[DL14] A. Di Iorio and C. Lange, eds. Semantic Publishing Challenge (Extended Semantic Web Conference, Semantic Web Evaluation Track). (Anissaras, Greece, May 25, 2014). 2014. URL: http://2014.eswc-conferences.org/program/semwebeval.

[DLA14] J. Debattista, C. Lange, and S. Auer. “Representing Dataset Quality Metadata using Multi-Dimensional Views”. 2014. Submitted.

[KK14] M. Kolchin and F. Kozlov. “Unstable markup: A template-based information extraction from web sites with unstable markup”. In: Semantic Publishing Challenge (Extended Semantic Web Conference, Semantic Web Evaluation Track). (Anissaras, Greece, May 25, 2014). Ed. by A. Di Iorio and C. Lange. 2014. URL: http://2014.eswc-conferences.org/program/semwebeval.

[KLR] M. Kerber, C. Lange, and C. Rowat. ForMaRE. Formal Mathematical Reasoning in Economics. URL: http://cs.bham.ac.uk/research/projects/formare/ (visited on 2013-02-10).

[Lan11] C. Lange. “Enabling Collaboration on Semiformal Mathematical Knowledge by Semantic Web Integration”. PhD thesis. Jacobs University Bremen, 2011.

[OM14] F. Osborne and E. Motta. “Understanding Research Dynamics”. In: Semantic Publishing Challenge (Extended Semantic Web Conference, Semantic Web Evaluation Track). (Anissaras, Greece, May 25, 2014). Ed. by A. Di Iorio and C. Lange. 2014. URL: http://2014.eswc-conferences.org/program/semwebeval.

[OntoIOp13] OntoIOp (Ontology, Model and Specification Integration and Interoperability), an OMG Standard Development Initiative. 2013. URL: http://ontoiop.org (visited on 2013-10-09).

[Zav+13] A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, and S. Auer. “Quality Assessment Methodologies for Linked Open Data”. In: Semantic Web Journal (2013). This article is still under review. URL: http://www.semantic-web-journal.net/content/quality-assessment-linked-open-data-survey.