![Page 1: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/1.jpg)
www.polimedia.nl
Building the PoliMedia system; data- and user-driven
![Page 2: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/2.jpg)
eHumanities group - PoliMedia 2
Who are we?

Laura Hollink
• Assistant professor at VU
• Modeling, linking and enrichment of data
• Data-driven research
• @laurahollink

Max Kemman
• Junior researcher at EUR
• Human-Computer Interaction
• User-driven research
• @MaxJ_K

PoliMedia team
Henri Beunders (EUR), Jaap Blom (NISV), Laura Hollink (VU), Geert-Jan Houben (TU Delft), Damir Juric (TU Delft), Max Kemman (EUR), Martijn Kleppe (EUR), Johan Oomen (NISV)

Funded by CLARIN-NL
![Page 3: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/3.jpg)
Linking Politics to Media
![Page 4: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/4.jpg)
The research questions
• How is a person, subject or process covered & visualised by the media?
• How do debates and arguments develop over a longer period of time?
• Analysing the changing ideas, arguments and presentation in different media
![Page 5: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/5.jpg)
Issues with current approach
![Page 6: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/6.jpg)
Issues with current approach
![Page 7: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/7.jpg)
Goal: explicit links to different media types in one system
![Page 8: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/8.jpg)
PoliMedia system

PoliMedia Portal
- Browse: debate and date
- Search: debate and person

Linked collections
- Newspapers (KB)
- Television (Sound and Vision)
- Radio (KB)
- Staten Generaal Digitaal (KB)

Data-driven (Laura) & user-driven (Max)
![Page 9: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/9.jpg)
Data
![Page 10: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/10.jpg)
Debate data
Handelingen der Staten-Generaal, or Dutch Hansard, from 1945-1995

Some provenance:
1. Transcripts are made of the complete debates of the Dutch parliament.
2. Published online by the government on http://www.statengeneraaldigitaal.nl/ (1818-1995) and http://officielebekendmakingen.nl/ (from 1995).
3. The PoliticalMashup project has translated government PDF and TXT files into XML, including URIs as identifiers; see http://politicalmashup.nl/
4. We build on that.
![Page 11: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/11.jpg)
Structure of the debate data

Including:
• who, when, what
• identifiers for subparts of the debate
• chronological order of speakers
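The structure listed above can be pictured as a small nested data model. The sketch below is illustrative only: the class names, field names and example URIs are assumptions made for this example, not the PoliticalMashup XML schema or the PoliMedia implementation.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Speech:
    uri: str      # identifier for this subpart of the debate (made-up format)
    speaker: str  # who
    text: str     # what
    order: int    # position in the chronological order of speakers


@dataclass
class Debate:
    uri: str
    day: date     # when
    topic: str
    speeches: List[Speech] = field(default_factory=list)

    def speakers_in_order(self) -> List[str]:
        """Return speaker names in the order in which they spoke."""
        return [s.speaker for s in sorted(self.speeches, key=lambda s: s.order)]


# Hypothetical example data; the URIs and names are placeholders.
debate = Debate(
    uri="http://example.org/debate/1",
    day=date(1975, 3, 4),
    topic="Example topic",
    speeches=[
        Speech("http://example.org/debate/1/speech/2", "Speaker B", "placeholder text", 2),
        Speech("http://example.org/debate/1/speech/1", "Speaker A", "placeholder text", 1),
    ],
)
print(debate.speakers_in_order())
```

Giving every speech its own identifier is what makes it possible to link an individual speech, rather than a whole debate, to a media item.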
![Page 12: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/12.jpg)
Media data
• Newspaper articles
  – at the National Library of the Netherlands
  – Many newspapers 1950-1995
  – Text + images of newspaper layout
• Radio bulletins
  – Transcripts of ANP news
• Newscasts
  – in the Academia collection of the Netherlands Institute for Sound and Vision
![Page 13: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/13.jpg)
Semantic model
![Page 14: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/14.jpg)
Semantic model
Reuse of vocabularies:
Simple Event Model (SEM), Dublin Core, FOAF, links to ISOCAT data categories.
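As a rough illustration of what reusing these vocabularies could look like, the sketch below describes one speech as a SEM event with an actor, using Dublin Core for the date and FOAF for the speaker's name, and prints it as N-Triples. The vocabulary terms (`sem:Event`, `sem:Actor`, `sem:hasActor`, `sem:subEventOf`, `dcterms:date`, `foaf:name`) are real; the `http://example.org/polimedia/` namespace, the identifiers and the exact mapping are assumptions and may differ from the actual PoliMedia model.

```python
SEM = "http://semanticweb.cs.vu.nl/2009/11/sem/"
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/polimedia/"  # hypothetical namespace


def uri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Render a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe_speech(speech_id, debate_id, speaker_id, speaker_name, day):
    """Return (subject, predicate, object) triples for one speech."""
    speech = uri(EX + speech_id)
    debate = uri(EX + debate_id)
    speaker = uri(EX + speaker_id)
    return [
        (speech, uri(RDF_TYPE), uri(SEM + "Event")),
        (speech, uri(SEM + "subEventOf"), debate),
        (speech, uri(SEM + "hasActor"), speaker),
        (speech, uri(DCTERMS + "date"), literal(day)),
        (speaker, uri(RDF_TYPE), uri(SEM + "Actor")),
        (speaker, uri(FOAF + "name"), literal(speaker_name)),
    ]


def to_ntriples(triples) -> str:
    """Serialise triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = describe_speech("speech/1", "debate/1", "person/1", "Speaker A", "1975-03-04")
print(to_ntriples(triples))
```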
![Page 15: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/15.jpg)
Linked Data
• Data openly accessible in a Semantic Web standard
• Easy to combine with other Semantic Web data
• E.g. DBpedia data on politicians and parties
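To make "easy to combine" concrete, the sketch below joins a local triple to external triples through a shared identifier. It assumes, purely for illustration, that a person in the data set is linked to a DBpedia resource with `owl:sameAs` and that the party is available through DBpedia's `dbo:party` property; all resource URIs are placeholders and no data is fetched.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DBO_PARTY = "http://dbpedia.org/ontology/party"

# Hypothetical local triple: a person in the debate data linked to DBpedia.
local_triples = [
    ("http://example.org/polimedia/person/1", OWL_SAME_AS,
     "http://dbpedia.org/resource/Example_Politician"),
]

# Hypothetical external triples, standing in for data retrieved from DBpedia.
external_triples = [
    ("http://dbpedia.org/resource/Example_Politician", DBO_PARTY,
     "http://dbpedia.org/resource/Example_Party"),
]


def parties_for(person, local, external):
    """Follow sameAs links from a local person, then look up the party."""
    same = {o for s, p, o in local if s == person and p == OWL_SAME_AS}
    return sorted(o for s, p, o in external if s in same and p == DBO_PARTY)


print(parties_for("http://example.org/polimedia/person/1",
                  local_triples, external_triples))
```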
![Page 16: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/16.jpg)
Linking debates to the newspaper articles that cover them
• Challenges:
  – How to link documents that are so different in nature?
  – Can we use the structure of the debates: people, chronological order of speeches, introductions to each new topic, etc.?
  – How can we do this efficiently, using the access mechanisms of the archives?
![Page 17: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/17.jpg)
Linking approach
![Page 18: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/18.jpg)
Detect topics
The MALLET topic model package
• Unsupervised analysis of text
• “a Topic consists of a cluster of words that frequently occur together”
• [see http://mallet.cs.umass.edu/topics.php]
• Input:
  – Text
  – Number of iterations
  – Number of topics/clusters
• Output:
  – Words that cluster around one topic
• Example:
  – Text: a speech in a debate from 1975
  – Number of iterations: 2000
  – Number of topics: 1
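With the number of topics set to 1, every word in the text is assigned to the same topic, so the top words of that topic are roughly the most frequent words that are not stopwords. The sketch below uses that observation as a stand-in; it is a plain word count, not MALLET and not a topic model, and the stopword list and sample text are made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real run would use a full Dutch list.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "for", "on", "that"}


def single_topic_words(text: str, top_n: int = 5):
    """Return the top_n most frequent non-stopword words in the text."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    # Sort by descending count, then alphabetically, for a stable result.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ordered[:top_n]]


speech = ("The housing budget is debated. Housing costs rise "
          "and the budget for housing is cut.")
print(single_topic_words(speech, top_n=2))
```

With more than one topic the word count no longer applies, because the model then has to decide which occurrences belong to which cluster.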
![Page 19: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/19.jpg)
Create Queries
![Page 20: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/20.jpg)
Evaluation
• Experiment 1: named entities (NEs) in speech
• Experiment 2: NEs + topics in speech
• Experiment 3: NEs + topics in speech and debate
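The three experiments differ only in which terms go into the query. A minimal sketch of that difference follows, assuming a plain keyword query in which multi-word terms are quoted. The function name, the query syntax and the example terms are hypothetical; the query format actually sent to the newspaper archive is not given here.

```python
def build_query(named_entities, speech_topics=(), debate_topics=()):
    """Join terms into one keyword query, dropping duplicates but keeping order."""
    seen = set()
    terms = []
    for term in [*named_entities, *speech_topics, *debate_topics]:
        key = term.lower()
        if key in seen:
            continue
        seen.add(key)
        # Quote multi-word terms so they are treated as a phrase.
        terms.append(f'"{term}"' if " " in term else term)
    return " ".join(terms)


# Made-up example terms.
nes = ["Jan Jansen", "Rotterdam"]
speech_topics = ["housing", "budget"]
debate_topics = ["budget", "rent"]

experiment_1 = build_query(nes)                                # NEs in speech
experiment_2 = build_query(nes, speech_topics)                 # + topics in speech
experiment_3 = build_query(nes, speech_topics, debate_topics)  # + topics in debate
print(experiment_3)
```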
![Page 21: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/21.jpg)
Results
• A linked open data set of Dutch parliamentary debates.
• With links to URLs of newspaper articles and radio bulletins at the National Library of the Netherlands (KB).
• A system that supports researchers in finding the data to answer their questions.
![Page 22: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/22.jpg)
User-driven: What do scholars want?

• Why user research?
• Understanding the user [1, 2]
  – Acceptance
  – Performance
  – Capabilities
  – Weaknesses
• Goal
  – Creating a system that is intuitive and helpful to the users

[1] Y. Liu, A. Osvalder, and M. Karlsson, “Considering the importance of user profiles in interface design,” no. May, 2010
[2] J. Preece, Y. Rogers, and H. Sharp, “Interaction Design: Beyond Human-Computer Interaction,” Design, vol. 18, no. 1, pp. 68-68, 2002
![Page 23: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/23.jpg)
User research in the development process
• Examine search behaviour of users
  – Survey regarding search strategies
  – Interviews
• User wishes → user requirements
• Wireframes → Prototype
• Evaluation → New prioritization of remaining user requirements
• Final version
![Page 24: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/24.jpg)
Survey: General search strategies
• N=294
• Popular search engines

[Chart: how often respondents use each search engine. Response scale: Very often, Often, Regularly, Sometimes, Never, Don’t know it. Search engines shown: Google, Google Images, Google Scholar, YouTube, JSTOR, KB, Flickr, EBSCO, Nationaal Archief, Web of Knowledge, Uitzending Gemist, Yahoo!, Bing, Academia.nl, Europeana, Scopus, Microsoft Academic Search, EUscreen, Arkyves.]
![Page 25: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/25.jpg)
Survey: General search strategies

Search strategy and score:
1. Keywords: 4.75
2. Advanced search: 3.36
3. Related terms: 2.52
4. Boolean: 2.42
5. Browsing subject categories: 2.29
6. Filters: 2.19
7. Thesaurus: 1.87
8. Visualization: 1.22
![Page 26: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/26.jpg)
Survey: Conclusions

• Google is the dominant search engine
• This has two consequences
  1. People compare other search systems to their experience with Google
  2. The search task is mainly performed by using keywords
![Page 27: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/27.jpg)
Interviews
• N=5
• Quantitative (n=2) as well as qualitative (n=4)
• Main themes
  – How do people search currently?
  – What could be improved about current search systems?
  – What should PoliMedia offer, given its goals?
• Results
  – 39 user wishes
  – Prioritized internally
    • 19 user wishes deemed out of scope
    • 20 user requirements
![Page 28: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/28.jpg)
Interviews: Findings

• Key issue is to provide a good overview of data
  – Why are search results retrieved?
  – How are search results ranked?
• Assumptions of relevance
  – Does a higher frequency of keywords indicate higher relevance to the query?
  – Do longer segments (speeches and articles) indicate higher importance?
• Many more or less out-of-scope wishes to make current research easier
  – Sentiment metadata
  – Context metadata
  – Ability to export to own software
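The two relevance assumptions raised in the interviews can be written down as a simple ranking rule: order segments by how often the keywords occur, and prefer longer segments when frequencies tie. The sketch below only illustrates those assumptions; it is not the ranking used by the PoliMedia system, and the function names and sample segments are made up.

```python
import re


def keyword_frequency(text, keywords):
    """Count how many tokens in the text match one of the keywords."""
    tokens = re.findall(r"\w+", text.lower())
    wanted = {k.lower() for k in keywords}
    return sum(1 for t in tokens if t in wanted)


def rank(segments, keywords):
    """Sort by keyword frequency, breaking ties by length (longer first)."""
    return sorted(
        segments,
        key=lambda seg: (keyword_frequency(seg, keywords), len(seg)),
        reverse=True,
    )


segments = [
    "Short remark on housing.",
    "A long speech on housing policy, housing costs and the housing budget.",
    "A speech about something else entirely.",
]
ranked = rank(segments, ["housing"])
for segment in ranked:
    print(keyword_frequency(segment, ["housing"]), segment)
```

Showing the frequency next to each result, as the loop does, is one simple way to tell users why a result was retrieved and how it was ranked.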
![Page 29: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/29.jpg)
Wireframes: Search interface

• Clear and immediate keyword search
• Support for Booleans and (some) Google search operators
• Separate advanced search
![Page 30: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/30.jpg)
Wireframes: Search results

• Keyword search remains prominent

• User-chosen ranking of results
• Keyword highlighting
• Overview of related media
• Support for filtering
![Page 31: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/31.jpg)
Wireframes: Debate page
• Keyword search remains prominent
• Overview of people in debate
• Easy access to related material
![Page 32: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/32.jpg)
Prototype v1.0
![Page 33: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/33.jpg)
Evaluation
• Eye tracking evaluation of the search system
  – Search system was still in development
• N=24
  – History
  – Political communication
• Goals
  – Gain understanding of distribution of attention
  – Collect general feedback on interface
![Page 34: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/34.jpg)
Evaluation: Eye tracking

• Viewing duration
• Search bar received little attention after search results were displayed
• Facets received a lot of attention
• Page-search (CTRL+F) mainly received attention on debate page view

| Task        | Search bar | Facets | Search results | Page-search |
|-------------|------------|--------|----------------|-------------|
| Known item  | 17%        | 22%    | 60%            | 2%          |
| Exploratory | 6%         | 12%    | 80%            | 2%          |
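Viewing-duration shares like those above can be derived by summing fixation durations per area of interest and dividing by the total. The sketch below shows that calculation under assumed inputs: the `(area, duration_ms)` format and the sample fixations are hypothetical and are not the study's data.

```python
from collections import defaultdict


def viewing_shares(fixations):
    """Return the rounded percentage of total viewing time per area.

    fixations: iterable of (area_of_interest, duration_ms) pairs.
    """
    totals = defaultdict(float)
    for area, duration in fixations:
        totals[area] += duration
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {area: round(100 * t / grand_total) for area, t in totals.items()}


# Made-up fixations for one task.
fixations = [
    ("search bar", 300),
    ("facets", 500),
    ("search results", 1000),
    ("search results", 200),
]
shares = viewing_shares(fixations)
print(shares)
```

Because each share is rounded separately, a row of percentages need not add up to exactly 100, which may be why the known-item row sums to 101%.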
![Page 35: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/35.jpg)
Evaluation: Usability feedback

• The ranking of search results was an issue for users
• The year filter should be a slider
• The debate page should be greatly improved
  – Better identification for speaker, party, topic, relevance to query
  – Provide filters on the debate page as well
![Page 36: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/36.jpg)
Prototype v2.0
![Page 37: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/37.jpg)
Prototype v2.0 - query
![Page 38: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/38.jpg)
Prototype v2.0 - filter speaker
![Page 39: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/39.jpg)
Prototype v2.0 - filter role
![Page 40: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/40.jpg)
Prototype v2.0 - debate
![Page 41: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/41.jpg)
Prototype v2.0 - highlight speech
![Page 42: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/42.jpg)
Prototype v2.0 - link newspaper
![Page 43: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/43.jpg)
Prototype v2.0 - newspaper
![Page 44: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/44.jpg)
Prototype v2.0 - link radio
![Page 45: Building the PoliMedia search system; data- and user-driven](https://reader036.vdocuments.site/reader036/viewer/2022062709/559137961a28ab07498b4600/html5/thumbnails/45.jpg)
Conclusion
• PoliMedia: data- or user-driven?
• Continuous interplay
  – Users gave input for usefulness of links
  – Data limits what features we can offer to users
• Collection quality and usability are both critical to users [3]

[3] Xie, I. (2006). Evaluation of digital libraries: Criteria and problems from users’ perspectives. Library & Information Science Research, 28(3), 433–452. doi:10.1016/j.lisr.2006.06.002