when search becomes research and research becomes search
SIGIR'13 Workshop on Exploration, Navigation and Retrieval of Information in Cultural Heritage (ENRICH).
![Page 1: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/1.jpg)
When Search becomes Research and Research becomes Search
SIGIR’13 Workshop on Exploration, Navigation and Retrieval of Information in Cultural Heritage (ENRICH)
August 1, 2013, Dublin, Ireland
Jaap Kamps, University of Amsterdam
![Page 2: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/2.jpg)
(Re)search, (Re)searchers
• My current main interest is search related to/supporting research (amongst a few dozen other things)
• So what’s different if your searchers are researchers, and their search is (part of) their research?
• This talk is rather speculative -- no iron-clad formal results -- but I hope to convince you that this is (at least) an interesting use case
• And an area with great opportunities to work in...
![Page 3: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/3.jpg)
Outline
• DATA: The Web and Online Heritage
• Issues: Archival Silence
• USERS: Digital Heritage -- Digital Humanities
• Challenges: Digital Methods
• TOOLS: Supporting Complex Search Tasks
• (Re)search: Digital Methods <-> Complex Search
![Page 4: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/4.jpg)
Lots of CH online
![Page 5: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/5.jpg)
CH is digitized on a massive scale
![Page 6: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/6.jpg)
Europeana: millions of objects from 1000s of providers
![Page 7: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/7.jpg)
The UK Web Archive
• Permission-based selective archiving since 2004
• 30% success rate
• 131,164 websites, 54,604 instances, ~14TB WARCs
• Domain crawl from 12 April 2013 to implement non-print legal deposit
• Expected to crawl between 4-5 million UK websites
• Access in reading rooms only
http://www.webarchive.org.uk
Terabytes of Archived Web Data
(From: Hockx-Yu, Web Archiving and Scholarly Use of Web Archives, 2013)
![Page 8: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/8.jpg)
What’s the problem?
![Page 9: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/9.jpg)
Not really that much traffic...
![Page 10: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/10.jpg)
Europeana Web Traffic Report – Q4 2012
Month by Month Overview

| Month | Visits | Unique Visitors | Page Views | Time on site/visit (mm:ss) | Bounce rate |
| --- | --- | --- | --- | --- | --- |
| October 2012 | 534,830 | 441,096 | 2,017,751 | 00:02:17 | 50.27% |
| November 2012 | 612,902 | 505,177 | 2,299,244 | 00:02:16 | 49.79% |
| December 2012 | 530,747 | 439,919 | 2,079,335 | 00:02:19 | 48.80% |
2. Portal Search
• 338,574 Visits with Search (36.10% increase from Q3 2012; 52.89% increase from Q4 2011). Visits with Search is the number of visits during which at least one portal search occurred.
• 743,292 Total Unique Searches (37.82% increase from Q3 2012; 31.67% increase from Q4 2011). Total Unique Searches is the number of times a search is performed on Europeana (duplicate searches within a single visit are excluded).
3. Object Views, Social Actions & Click-throughs
• 2,361,589 Object Views (8.10% increase from Q3 2012; 45.55% increase from Q4 2011). The number of times Europeana object pages have been viewed. Repeated views of a single page are counted.
• 778,046 Search Result Views (10.06% increase from Q3 2012; 8.32% decrease from Q4 2011). The number of times Europeana search results pages have been viewed. Repeated views of a single page are counted.
• 2,975 Social Actions (46.19% increase from Q3 2012; 22.27% increase from Q4 2011). The number of times a user has clicked on a social share icon within the portal.
• KPI 27: 30,000 object shares in 2012. Jan – Dec 2012: 9,609 shares (from portal).
Let’s say: less traffic than we hoped for...
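To put the report figures in proportion, here is a small back-of-the-envelope calculation. It is only a sketch: it assumes that the quarterly "Visits with Search" figure covers the same three months as the monthly table, which the report excerpt does not state explicitly.

```python
# Figures copied from the Europeana Q4 2012 traffic report quoted above.
monthly = {
    "October 2012": {"visits": 534_830, "page_views": 2_017_751},
    "November 2012": {"visits": 612_902, "page_views": 2_299_244},
    "December 2012": {"visits": 530_747, "page_views": 2_079_335},
}
visits_with_search = 338_574                 # "Portal Search" section
shares_2012, shares_target = 9_609, 30_000   # KPI 27

total_visits = sum(m["visits"] for m in monthly.values())
total_page_views = sum(m["page_views"] for m in monthly.values())

# Assumption: "Visits with Search" refers to the same quarter as the table.
pages_per_visit = total_page_views / total_visits
search_share = visits_with_search / total_visits
kpi_progress = shares_2012 / shares_target

print(f"{pages_per_visit:.2f} page views per visit")
print(f"{search_share:.1%} of visits included a search")
print(f"{kpi_progress:.1%} of the KPI 27 share target reached")
```

Under that assumption, a visit averages a little under four page views, roughly one visit in five includes a portal search, and about a third of the share target was reached.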
![Page 11: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/11.jpg)
How often are web archives used?
• Archiving institutions’ focus on data collection, not usage
• 19 of 29 IIPC members’ archives (listed on website) have full or partial online access, often permission-based
• Large scale national web archives have restricted access – dark archives
• e.g. Danish National Web Archive, over 280TB: online access for researchers with PhD or higher level, 20 users since 2005
• “Document-centric” access methods
• No agreed way of calculating / benchmarking access statistics
• Little evidence of scholarly use of web archives, making it difficult to understand requirements
(From: Hockx-Yu, Web Archiving and Scholarly Use of Web Archives, 2013)
![Page 12: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/12.jpg)
Archival Silence
• Many online collections suffer from low traffic...
• After years of hard work, the data is there
• But the users aren’t queuing up to come and explore the data
• Why is that happening?
![Page 13: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/13.jpg)
Digital heritage collections online are incunabula
![Page 14: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/14.jpg)
Our infrastructure changed in a revolutionary way
![Page 15: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/15.jpg)
Our technology changed in a revolutionary way
![Page 16: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/16.jpg)
How radically did information access methods change?
![Page 17: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/17.jpg)
![Page 18: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/18.jpg)
Think outside the box?
• Are we too “framed” by the type of systems that we had before?
• And by those that emerged on the Web?
• (cf. Diane Kelly’s “Contours and Convergence”, KSJ lecture at ECIR’13.)
![Page 19: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/19.jpg)
Wrap Up (I)
• We have made wonderful progress: CH data is out there in huge volume
• More, better, richer, ... every day
• Use of the data is often lagging behind
• We should learn from “the Web”
• But also do really different things!
• (This takes time -- at least a generation)
![Page 20: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/20.jpg)
Right, something really different -- but what?
![Page 21: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/21.jpg)
CH as Web search?
• Should we really try to “copy” the Web?
• Web search optimizes fast, shallow search
• on highly dynamic data with massive #s of user signals
• Could we be *ahead* of the Web (rather than following them)?
![Page 22: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/22.jpg)
Let’s do the obvious :)
• Look seriously at the scholarly use of the CH information we have accumulated?
• Get in touch with researchers and find out how they (want to) use the data and why they are *not* using our tools
• (In fact, heritage institutions traditionally focused on scholars, emphasis on the general public is quite recent...)
![Page 23: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/23.jpg)
Digital Heritage -- Digital Humanities
e-Humanities
![Page 24: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/24.jpg)
The Times They are a-Changin’ ?
![Page 25: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/25.jpg)
Something exciting is happening!
• Digital Humanities emerging fast in response to massive volume of data
• Digitization of historic sources
• Heritage of the future is digital
• User-generated content in new media
• In short: for many research questions a lot of relevant data is available!
![Page 26: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/26.jpg)
Change in Character

| 1.0 | 2.0 |
| --- | --- |
| Collection-centered | User-centered |
| Supply-driven | Demand-driven |
| Professionals | Amateurs |
| Individual scholar | Team or lab |
| Small scale | Large scale |
| Qualitative | Quantitative |
![Page 27: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/27.jpg)
Change = Radical!
• Change in research paradigm?
• Traditional humanities based on interpretative paradigm
• Empirical sciences based on a truth-finding paradigm
• Did the “success criterion” change?
• Use tools of the exact sciences for the benefit of the traditional paradigm?
![Page 28: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/28.jpg)
(Actual empirical science is also less rigorous)
![Page 29: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/29.jpg)
DH requires new data-driven research methods
![Page 30: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/30.jpg)
"Google and the politics of tabs" by Govcom.org, Amsterdam, 2008.
Website historiography
![Page 31: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/31.jpg)
Innovation and Evaluation of Information
A CHI98 Workshop
Gene Golovchinsky and Nicholas J. Belkin

Abstract

This report summarizes a workshop held at CHI 98 that focused on several aspects of information exploration, including user interfaces, theory, and evaluation. Information exploration is a common activity that spans a variety of media and is an integral component of many information seeking behaviors that people engage in. The complexity of this activity, and the need to support it appropriately, led us to propose this workshop. Over the course of two days, we examined several aspects of this problem, struggled with a few definitions, and came away with a better understanding of the design space. Here we summarize those efforts.

Introduction

Traditional Information Retrieval is concerned with improving effectiveness of indexing and retrieval mechanisms, and with supporting one information seeking behavior: specified searching through query formulation. This has been predicated on support for one kind of user population, with one kind of information need. But the networked information environment has resulted in a shift in the user population of information retrieval systems. This change has introduced new classes of users, in the sense of levels of expertise, and has also made clear that there are different kinds of information needs and different kinds of information seeking behaviors than those supported by traditional IR systems and techniques. This workshop focused on developing understanding of one such information seeking behavior, Information Exploration, on interface design for supporting this behavior, and on evaluation methods and measures for assessing such interfaces.

Information Exploration addresses the goal of refining a vague concept into a more thorough understanding of the problem which led to the information interaction. We believe that information exploration research falls squarely in the domain of human-computer interaction with some emphasis on information retrieval, rather than vice versa. Thus one of the thrusts of this workshop was to attempt to characterize the activities users engage in, to design for those activities, and to identify evaluation techniques and measures that provide appropriate insights into users' behavior and performance.

Organization

About 20 people participated in the workshop. They were chosen on the basis of initial brief submitted position papers, and represented a broad spectrum of industry and academia. Participants came from France, Canada, Germany, and the U.S. After acceptance, participants were asked to submit longer (4-5 page) position statements that described relevant research and perspectives a few weeks prior to the workshop. These papers were made available through the workshop web site, and participants were encouraged to review and comment on them.

Submissions were organized into three categories: Interface, Evaluation and Theory. Each category was further subdivided into themes that suggested themselves. Thus a number of interface submissions concerned information visualization; three of five evaluation-related submissions focused on expertise, and the theory section split evenly between frameworks and representation of information.

On the morning of the first day, workshop activities were organized based on the three topics we had initially defined. After the morning introductory session, we split the workshop into three new working groups, based on the results of that discussion.

Figure 1. Information exploration (gray box) situated in the broader task. The black "method" box may involve a recursive information exploration step to identify information sources.

Discussion Highlights

It seems obligatory for a workshop to debate the definition of the concept that brought people together; we embraced this orthodoxy with a vengeance. One of the recurring themes of

SIGCHI Bulletin, Volume 31, Number 1, January 1999, p. 22
Essentially these are complex search strategies!
![Page 32: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/32.jpg)
Wrap Up (II)
• Digital Humanities is emerging fast and leads to new data-driven research methods
• Motivated by humanities research questions
• Essentially they are crawling, cleaning, tokenizing, ranking, exploring, visualizing
• Basically the stuff *we* are experts in
• Can we build tools that support their research task from beginning to end?
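The steps listed above map onto a very small processing chain. The following is a minimal sketch of the cleaning, tokenizing and ranking steps only, with crawling replaced by a dictionary of already fetched pages. The function names and the raw term-count scoring are illustrative assumptions, not the methods used by any particular Digital Humanities project.

```python
import re
from collections import Counter
from html import unescape


def clean(html: str) -> str:
    """Strip tags and decode entities (crude; real pages need an HTML parser)."""
    return unescape(re.sub(r"<[^>]+>", " ", html))


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def rank(pages: dict[str, str], query: str) -> list[tuple[str, int]]:
    """Score each page by how often the query terms occur; drop pages scoring 0."""
    query_terms = tokenize(query)
    scored = []
    for page_id, html in pages.items():
        counts = Counter(tokenize(clean(html)))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((page_id, score))
    # Highest score first; ties broken by page id for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical stand-in for crawled pages.
pages = {
    "a": "<p>Web archive &amp; web search</p>",
    "b": "<p>Museum catalogue</p>",
    "c": "<h1>Web</h1>",
}
ranking = rank(pages, "web")
print(ranking)
```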
![Page 33: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/33.jpg)
(Re)search?
• Interactively construct complex strategy
• data sources, selections, processing, back-and-forth, ...
• Explore all results using facets/aspects
• explore the whole data set -- not just 10 links
• Store, share, and refine search strategies
• “Session” may take minutes, hours, days, ...
![Page 34: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/34.jpg)
How to get there?
![Page 35: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/35.jpg)
(1) Intensive collaborations with CH institutions
![Page 36: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/36.jpg)
(2) Include researchers: Co-creation, Living Lab, ...
![Page 37: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/37.jpg)
(3) Build not a tool, but the toolmaker’s tools
![Page 38: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/38.jpg)
Team up with Arjen de Vries and Spinque :)
![Page 39: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/39.jpg)
Search strategy from building blocks
![Page 40: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/40.jpg)
Strategy Builder
Each block = data or manipulations
Build dedicated search engine “on the fly”
![Page 41: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/41.jpg)
Research methods become search strategies
Store, refine, reuse, share strategies
(Re)search!
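One way to read "strategy from building blocks" is as a list of named steps, each either a data source or a manipulation, that can be stored as plain data and re-run later. The sketch below illustrates that idea only. The block names, the record fields and the `run` function are hypothetical and do not represent Spinque's actual strategy builder or API.

```python
import json

# Registry of building blocks: name -> function(records, **params) -> records.
BLOCKS = {}


def block(name):
    """Register a function as a named building block."""
    def register(fn):
        BLOCKS[name] = fn
        return fn
    return register


@block("filter_year")
def filter_year(records, *, year):
    return [r for r in records if r["year"] == year]


@block("contains")
def contains(records, *, term):
    return [r for r in records if term.lower() in r["text"].lower()]


def run(strategy, records):
    """Apply each step of the strategy to the output of the previous one."""
    for step in strategy:
        records = BLOCKS[step["block"]](records, **step.get("params", {}))
    return records


# Hypothetical records and strategy, for illustration only.
records = [
    {"id": 1, "year": 2011, "text": "Hurricane Irene"},
    {"id": 2, "year": 2012, "text": "Hurricane Sandy hits New York"},
    {"id": 3, "year": 2012, "text": "US Election results"},
]
strategy = [
    {"block": "filter_year", "params": {"year": 2012}},
    {"block": "contains", "params": {"term": "sandy"}},
]
hits = run(strategy, records)

# Because a strategy is plain data, it can be stored, shared and re-run.
stored = json.dumps(strategy)
print(hits == run(json.loads(stored), records))
```

Keeping the strategy as data rather than code is what makes "store, refine, reuse, share" cheap: refining a strategy means editing or appending a step, not rewriting a program.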
![Page 42: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/42.jpg)
Web Archive (New Media scholars)
![Page 43: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/43.jpg)
Thaer Samar, PhD/programmer
Hugo Huurdeman, PhD researcher
Anat Ben-David, Postdoc
Arjen de Vries, Jaap Kamps, Richard Rogers
Paul Doorenbosch, René Voorburg
Victor-Jan Vos
![Page 44: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/44.jpg)
WebART Goals
• Evaluating current curation and selection procedures of Web archives
• Getting insights into current use of Web archives
• Developing new methods and tools for research using Web archives
![Page 45: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/45.jpg)
Flickr: koninklijkebibliotheek
KB: Web archive since 2007
Statistics:
• 4,000+ websites
• 17,000+ harvests
• 7+ Terabyte
Selective approach
![Page 46: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/46.jpg)
![Page 47: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/47.jpg)
![Page 48: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/48.jpg)
![Page 49: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/49.jpg)
“Wayback Machine” interface
![Page 50: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/50.jpg)
• WebARTist (pilot - beta 1)
• Initial dataset (corpus)
• 432 crawls, 16 months (13.64 GB)
Full-text search engine
KB CommonCrawl+nu.nl
(Dutch news aggregator)
![Page 51: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/51.jpg)
![Page 52: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/52.jpg)
![Page 53: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/53.jpg)
![Page 54: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/54.jpg)
![Page 55: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/55.jpg)
![Page 56: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/56.jpg)
![Page 57: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/57.jpg)
![Page 58: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/58.jpg)
![Page 59: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/59.jpg)
WebARTist: Use case
• Digital Methods Winter School (Jan. ’13)
• Co-design workshop (“Living Lab”)
• researchers & developers
• first use of WebARTist
![Page 60: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/60.jpg)
Word frequency analysis
[Chart: word frequency over time; y-axis from 0 to 800, x-axis from 17/05/2011 to 06/01/2013]
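A word frequency analysis over an archive boils down to counting a term per time bucket. The following is a minimal sketch, assuming each crawl is available as a (date, text) pair and that monthly buckets are wanted. Both are assumptions for illustration, not a description of how WebARTist computes its charts.

```python
import re
from collections import Counter
from datetime import date


def term_frequency_by_month(snapshots, term):
    """Count occurrences of `term` per (year, month) over dated text snapshots."""
    term = term.lower()
    counts = Counter()
    for day, text in snapshots:
        tokens = re.findall(r"\w+", text.lower())
        counts[(day.year, day.month)] += tokens.count(term)
    return dict(sorted(counts.items()))


# Hypothetical snapshots, for illustration only.
snapshots = [
    (date(2012, 10, 29), "Sandy hits; sandy floods"),
    (date(2012, 10, 30), "Storm Sandy"),
    (date(2012, 11, 2), "Election news"),
]
series = term_frequency_by_month(snapshots, "Sandy")
print(series)
```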
![Page 61: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/61.jpg)
Co-Word Analysis
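Co-word analysis counts how often pairs of terms appear together in the same unit of text. A minimal sketch follows, assuming the unit is a whole document and the vocabulary of interest is given up front. The function name and the sample texts are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def co_word_counts(texts, vocabulary):
    """Count, for each pair of vocabulary words, the documents containing both."""
    vocab = {word.lower() for word in vocabulary}
    pairs = Counter()
    for text in texts:
        present = sorted(vocab & set(re.findall(r"\w+", text.lower())))
        # Sorting makes each pair appear in one canonical (alphabetical) order.
        pairs.update(combinations(present, 2))
    return pairs


# Hypothetical documents, for illustration only.
texts = [
    "Syria conflict and refugees",
    "Refugees flee Syria",
    "Election results",
]
pairs = co_word_counts(texts, ["syria", "refugees", "election"])
print(pairs)
```

The resulting pair counts can be read as a weighted edge list and handed to a graph visualization tool.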
![Page 62: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/62.jpg)
Outlink Analysis
[Figure: outlink counts per target domain, in three panels: Syria, Sandy, US Election]
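An outlink analysis of this kind can be approximated by extracting the links from each archived page and counting the external hosts they point to. The sketch below uses only the Python standard library. The `outlink_counts` function and the sample pages are assumptions for illustration, not the tooling used in the pilot.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def outlink_counts(pages, own_host):
    """Count links per external host; relative and same-host links are skipped."""
    counts = Counter()
    for html in pages:
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            host = urlparse(href).netloc.lower()
            if host and host != own_host:
                counts[host] += 1
    return counts


# Hypothetical archived pages, for illustration only.
pages = [
    '<a href="http://edition.cnn.com/x">a</a> <a href="/intern">b</a>',
    '<a href="http://edition.cnn.com/y">c</a>'
    '<a href="http://www.nu.nl/z">d</a>'
    '<a href="http://bbc.co.uk/">e</a>',
]
counts = outlink_counts(pages, own_host="www.nu.nl")
print(counts.most_common())
```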
![Page 63: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/63.jpg)
![Page 64: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/64.jpg)
![Page 65: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/65.jpg)
![Page 66: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/66.jpg)
Geomapping location Wire service
![Page 67: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/67.jpg)
Temporal Image Analyses
![Page 69: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/69.jpg)
Pilot Tools: Scalable Full Text Search++
User interface
Search engine
Inverted index
Hadoop Distributed File System
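The inverted index layer in this stack maps each term to the documents that contain it, which is what makes full-text search fast. A minimal in-memory sketch follows. It assumes simple word tokenization and AND semantics for multi-term queries, and it says nothing about how the pilot's Hadoop-based index is actually built.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every query term (AND semantics)."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical documents, for illustration only.
docs = {
    "d1": "Web archive of Dutch news",
    "d2": "Dutch museum collections",
    "d3": "News about the web",
}
index = build_inverted_index(docs)
print(search(index, "dutch news"))
```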
![Page 70: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/70.jpg)
Some Lessons (pilot)
• Fun, creative (but hard for control freaks)
• unexpected really new ideas!
• It is really co-design -- a dialog:
• researchers keep talking in “solutions”
• unaware of the full potential?
• Search engine used to explore
• Then want to use their own tools
• Emphasis on aggregates, visualizations
![Page 71: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/71.jpg)
Ongoing
• Started designing support for the whole task
• Want folks to stay in the system!
• Connect source data to later “information graphics”
• For the research prototype: no polished graphics
• Volume/Hadoop slow things down
• 1. Port “search by strategy” to Hadoop (slow, asynchronous)
• 2. After (complex) selection on Hadoop, instantiate a dedicated environment (fast, interactive, bounded size)
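The two-step plan above can be pictured as a slow pass over everything followed by a small, fast structure over the selection. The sketch below simulates this in memory. `batch_select` merely stands in for an asynchronous Hadoop job, and `InteractiveSubset` for the dedicated environment. Both names, the record fields and the size limit are hypothetical.

```python
import re


def batch_select(collection, predicate, limit):
    """Stage 1: scan the full collection once and keep at most `limit` matches.

    Stands in for a slow, asynchronous job over the whole archive.
    """
    selected = []
    for record in collection:
        if predicate(record):
            selected.append(record)
            if len(selected) >= limit:
                break
    return selected


class InteractiveSubset:
    """Stage 2: a bounded in-memory index over the selection for fast lookups."""

    def __init__(self, records):
        self.records = records
        self.index = {}
        for position, record in enumerate(records):
            for token in set(re.findall(r"\w+", record["text"].lower())):
                self.index.setdefault(token, []).append(position)

    def lookup(self, term):
        return [self.records[i] for i in self.index.get(term.lower(), [])]


# Hypothetical archive records, for illustration only.
archive = [
    {"id": 1, "year": 2011, "text": "Hurricane Irene"},
    {"id": 2, "year": 2012, "text": "Hurricane Sandy hits New York"},
    {"id": 3, "year": 2012, "text": "Sandy aftermath"},
    {"id": 4, "year": 2012, "text": "US Election results"},
]
subset = InteractiveSubset(batch_select(archive, lambda r: r["year"] == 2012, limit=1000))
print([r["id"] for r in subset.lookup("sandy")])
```

The limit keeps the second stage bounded in size, which is what allows it to stay interactive however large the underlying archive grows.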
![Page 72: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/72.jpg)
Projects with museums, archives, libraries, archaeology
![Page 73: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/73.jpg)
Wrap Up (III)
• How far can we push this to support research in a generic way?
• Working on many sources, processing components and ways to combine them into search strategies
• Working on richer data (also from research use)
• Working on scale
• Data is still a crucial issue/factor
• Researchers always want what isn’t there
• Data quality/noise/completeness issues
![Page 74: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/74.jpg)
Work on (Re)search?
• (Re)search leads to radically different modes of information access!
• (NB: Recall the panel!)
• Digital humanities is happening right now
• No shortage of data, dedicated users, ...
• Still lots of low-hanging fruit
• Great opportunities for young researchers!
![Page 75: When Search becomes Research and Research becomes Search](https://reader033.vdocuments.site/reader033/viewer/2022051411/547ab1ceb4af9fcf498b458c/html5/thumbnails/75.jpg)
Questions?
• We’re hiring!
• 2 PhD positions (4 years), 2 postdocs (6 months/1 year).
• WebART: http://webarchiving.nl/
• ExPoSe: http://staff.science.uva.nl/~kamps/expose/
• Thank you to all collaborators: Arjen de Vries, Richard Rogers, Hugo Huurdeman, Thaer Samar, Anat Ben-David, Maarten Marx, Wouter Alink, ...