capturing the ineffable: collecting, analysing, and automating web document quality assessments

Post on 23-Jan-2018


Capturing the Ineffable:

Collecting, Analysing, and Automating

Web Document Quality Assessments

Davide Ceolin, Julia Noordegraaf, Lora Aroyo

Outline

• Introduction

• Nichesourcing Web Document Quality Assessments

• User Studies

• Conclusion and Future Work

Introduction

Web Document Quality Assessment

• Source criticism

  • Methodological practice from the humanities

  • e.g., from the American Library Association:

    • How was the source located?

    • What type of source is it?

    • Who is the author and what are the qualifications of the author in regard to the topic that is discussed?

    • When was the information published?

    • In which country was it published?

    • What is the reputation of the publisher?

    • Does the source show a particular cultural or political bias?

• How does it apply to Web sources?

Web Document Quality Assessment

What is the quality of each of these documents?

First document: Authoritative source ✓, Accurate ✓, Precise ✓, Complete ✓, Neutral (?)

Second document: Blog post (?), Accurate (?), Precise (?), Complete (?), Neutral ✗

Quality and Quality Markers

• We adapt source criticism to Web documents and aim to automate quality estimation by:

  • Gathering quality assessments (mostly from experts).

  • Looking for markers (document features) that correlate with them.

Objectives

• Analyse the consistency of quality assessments.

• Are quality assessments consistent among users, over time, etc.?

• Analyse user ability to interpret document features.

• Can the users estimate the quality of a document from its sentiment or trustworthiness level?

• Analyse the predictability of quality assessments.

• Can we automatically estimate the quality of a document?

Nichesourcing Web Document Quality Assessments

Dataset, Features, and Quality Dimensions

• Dataset: documents about vaccinations

  • Initially, 50 documents from various sources (blogs, authorities, etc.)

• Features

  • Information (automatically) extracted from documents using AlchemyAPI & Web of Trust.

  • Entities, Topics, Sentiment, Emotions, Trustworthiness.

• Quality dimensions

  • Overall quality, accuracy, completeness, precision, trustworthiness, readability, neutrality.

WebQ: Nichesourcing Web Quality Assessments

• Setup:

  • 6 documents per participant.

  • Random selection.

  • Even distribution of assessments.

• Scenario:

  Suppose you are asked to write an article about the debate on vaccinations triggered by the measles outbreak in 2015 at Disneyland in California.
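The setup combines random selection with an even spread of assessments over documents. The slides do not say how WebQ achieves this; the following is only a sketch of one way to do it, and the function name `assign_documents`, the greedy least-assessed-first strategy, and the fixed seed are all assumptions made for illustration.

```python
import random
from collections import Counter


def assign_documents(doc_ids, n_participants, per_participant=6, seed=0):
    """Give each participant `per_participant` distinct documents while keeping
    the number of assessments per document as even as possible.

    Hypothetical helper: not taken from the WebQ application.
    """
    if per_participant > len(doc_ids):
        raise ValueError("not enough documents for one participant")
    rng = random.Random(seed)
    counts = Counter({doc: 0 for doc in doc_ids})
    assignments = []
    for _ in range(n_participants):
        # Shuffle first so that ties between equally assessed documents are
        # broken at random; the stable sort then puts the least assessed first.
        pool = list(doc_ids)
        rng.shuffle(pool)
        pool.sort(key=lambda doc: counts[doc])
        chosen = pool[:per_participant]
        for doc in chosen:
            counts[doc] += 1
        assignments.append(chosen)
    return assignments, counts


# 50 documents and 20 participants, as in the first user study.
assignments, counts = assign_documents([f"doc{i}" for i in range(50)], n_participants=20)
spread = max(counts.values()) - min(counts.values())
print(spread)  # difference between the most and the least assessed document
```

Picking the least assessed documents first keeps the per-document counts within one of each other, while the shuffle keeps the choice among equally assessed documents random.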

WebQ: Task 1

• Documents are anonymized.

• Users choose documents that meet their quality criteria based on features only.

• All feature values are shown, alone and together.

WebQ: Task 2

• Read each of the 6 articles.

• Assess it.

  • Rate completeness, accuracy, etc.

  • Likert scale 1-5.

• Annotate the article to explain the ratings.

  • Articles are proxied & annotated through AnnotatorJS.
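One way to picture what Task 2 produces is a record per participant and document, holding a 1-5 Likert rating for each quality dimension plus the explanatory annotations. The sketch below is an assumption about how such a record could be modelled; the class name `Assessment` and its fields are hypothetical and not taken from WebQ, while the dimension names are those listed on the dataset slide.

```python
from dataclasses import dataclass, field

# Quality dimensions as listed on the dataset slide.
DIMENSIONS = (
    "overall quality", "accuracy", "completeness", "precision",
    "trustworthiness", "readability", "neutrality",
)


@dataclass
class Assessment:
    """Hypothetical record of one participant's assessment of one document."""
    participant: str
    document: str
    ratings: dict                                    # dimension name -> Likert score (1-5)
    annotations: list = field(default_factory=list)  # notes explaining the ratings

    def __post_init__(self):
        # Reject unknown dimensions and scores outside the 1-5 Likert scale.
        for dimension, score in self.ratings.items():
            if dimension not in DIMENSIONS:
                raise ValueError(f"unknown dimension: {dimension}")
            if not isinstance(score, int) or not 1 <= score <= 5:
                raise ValueError(f"{dimension}: score must be an integer from 1 to 5")


example = Assessment(
    participant="p01",
    document="doc7",
    ratings={"overall quality": 4, "accuracy": 4, "neutrality": 2},
    annotations=["Cites official figures but argues one side only."],
)
```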

User Studies

Setup

• User Study 1

  • Participants: 20 final-year UvA journalism students.

  • Duration: 60 minutes.

• User Study 2

  • Participants: 20 RMA media scholars.

  • Duration: 45 minutes.

  • Improvements (learnt from User Study 1).

Results

• Data collected:

  • 104 (US1) + 47 (US2) assessments.

  • 238 (US1) + 89 (US2) annotations.

• No significant difference between Use Cases (Wilcoxon signed-rank test).

  • Assessments are assimilable.

• Assessment predictability (SVC):

  • Up to 63% accuracy (5 classes).

  • Up to 89% accuracy (2 classes).

• Promising predictability. We will try other algorithms.
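The two accuracy figures refer to predicting a rating either as one of the five Likert values or as one of two coarser classes. The slides do not say how the two classes were formed; the sketch below assumes that ratings of 4 or 5 count as "high" and the rest as "low". The ratings are made-up illustrative values, not data from the studies, and no classifier is trained here.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels exactly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def to_two_classes(ratings, threshold=4):
    """Collapse 1-5 Likert ratings into 'high'/'low' (the threshold is an assumption)."""
    return ["high" if r >= threshold else "low" for r in ratings]


# Made-up ratings for illustration only.
true_ratings = [5, 4, 2, 3, 1, 4]
predicted_ratings = [4, 4, 2, 2, 1, 5]

five_class = accuracy(true_ratings, predicted_ratings)
two_class = accuracy(to_two_classes(true_ratings), to_two_classes(predicted_ratings))
print(five_class)  # 0.5
print(two_class)   # 1.0
```

With the same predictions, the 2-class score can never be lower than the 5-class score, because every exact match remains a match after collapsing, while near misses such as 4 versus 5 become matches.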

Results

• Highest correlation with overall quality:

  • Accuracy

  • Trustworthiness

  • Precision

  • Completeness

• Given the task at hand, neutrality is not relevant.

• Weak correlation between task 1 and overall quality (task 2).

  • Users were mostly unable to interpret those features.
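The ranking of dimensions by their correlation with overall quality can be illustrated as follows. The slides do not name the correlation coefficient used; this sketch assumes Spearman's rank correlation, a common choice for ordinal Likert ratings, and the ratings shown are made-up illustrative values.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Made-up Likert ratings for illustration only.
overall = [5, 4, 2, 3, 1, 4]
dimensions = {
    "accuracy": [5, 4, 2, 3, 2, 4],
    "neutrality": [3, 1, 4, 2, 3, 5],
}
for name in sorted(dimensions, key=lambda d: -spearman(dimensions[d], overall)):
    print(name, round(spearman(dimensions[name], overall), 2))
```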

Conclusion

• We collected Web document quality assessments.

  • WebQ, a nichesourcing application.

  • 2 user studies with experts.

  • Clearly defined task.

  • Controlled dataset.

• We analysed the assessments and automated their prediction.

  • The task matters more than subjectivity.

  • Assessments are quite uniform and coherent.

  • Features in isolation are not very meaningful.

  • The application setup is important.

(Current and) Future Work

• We are working on, or plan to work on:

  • Extending the dataset (currently ~1,500 documents).

  • Scaling up the experiments and gathering more assessments.

  • Involving laymen via crowdsourcing.

  • Extending the analyses.

  • Utilising other automated reasoning approaches.

https://qupid-project.net/

d.ceolin@vu.nl

Thank you!
