Aruna Balasubramanian, Yun Zhou, W Bruce Croft, Brian N Levine and Arun Venkataramani
Department of Computer Science, University of Massachusetts, Amherst
Web Search From a Bus
Why web search from a bus?
Open access points are commonly available
Intermittent Internet connectivity from vehicles is possible• no subscription cost• useful when no other connectivity is available
Web search is the second most common web activity (survey by pewinternet.org)
Connectivity characteristics of testbeds
Goal: Build web search in the presence of frequent disconnections and short connection durations
Web search process
[Figure: <your favorite search engine> results page, then “Retrieving web…”, “Retrieving images…”, “Retrieving…”]
Adapting to vehicular networks
Why is it challenging?
Interactive• several exchanges are needed between the user and the search engine
Results are imprecise• a response may not be relevant• relevance is difficult to measure
Thedu• Proxy architecture: sustains interaction• IR contribution: increases the usefulness of returned responses
Thedu proxy
Sits between the vehicle and the search engine
When the proxy receives a query from a vehicle, it• retrieves URLs and snippets• prefetches URL contents, including images• stores responses and maintains state
When a vehicle connects to the proxy• the vehicle downloads pending responses
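The proxy behavior above can be sketched as follows. This is a minimal illustration, not Thedu's implementation: it assumes an in-memory store keyed by a hypothetical vehicle ID, and `search` and `fetch` are hypothetical stand-ins for the requests to the search engine and to web sites.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Response:
    query: str
    url: str
    snippet: str
    content: bytes  # prefetched page contents (Thedu also prefetches images)


@dataclass
class Proxy:
    # Hypothetical callables: search(query) -> [(url, snippet)], fetch(url) -> bytes
    search: Callable[[str], List[Tuple[str, str]]]
    fetch: Callable[[str], bytes]
    # State kept per vehicle while it is disconnected
    pending: Dict[str, List[Response]] = field(default_factory=lambda: defaultdict(list))

    def on_query(self, vehicle_id: str, query: str) -> None:
        """Query received: retrieve URLs and snippets, prefetch contents, store responses."""
        for url, snippet in self.search(query):
            content = self.fetch(url)  # prefetch now so the vehicle need not fetch later
            self.pending[vehicle_id].append(Response(query, url, snippet, content))

    def on_connect(self, vehicle_id: str) -> List[Response]:
        """Vehicle connected: hand over all pending responses and drop them from the store."""
        return self.pending.pop(vehicle_id, [])


if __name__ == "__main__":
    proxy = Proxy(search=lambda q: [("http://example.com/", "Example snippet")],
                  fetch=lambda url: b"<html>...</html>")
    proxy.on_query("bus-1", "cnn")
    print(len(proxy.on_connect("bus-1")))
```

Handing over everything on connect is a simplification: with short contacts only part of a bundle may get through, which is why the order in which responses are sent matters.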
Client and proxy architecture
[Figure: client side (vehicle): user, web interface, store query, process response; server side (proxy): queries for vehicle, fetch URL/images, prioritize response, pending responses; the proxy exchanges queries and responses with the search engine and web sites; vehicle and proxy exchange new queries and response bundles over intermittent connectivity]
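The client side of this architecture can be sketched as a queue of stored queries that is flushed on each contact, plus a handler that files the returned bundle by query. This is a hypothetical sketch: `send` stands in for the exchange with the proxy, and a bundle is assumed to be a list of (query, url, content) tuples.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Assumed bundle format: list of (query, url, content) tuples
Bundle = List[Tuple[str, str, bytes]]


@dataclass
class Client:
    outbox: List[str] = field(default_factory=list)  # queries stored while disconnected
    results: Dict[str, List[Tuple[str, bytes]]] = field(default_factory=dict)

    def submit(self, query: str) -> None:
        """Store a query from the web interface until the next connection."""
        self.outbox.append(query)

    def on_connect(self, send: Callable[[List[str]], Bundle]) -> None:
        """Upload stored queries, then process the response bundle the proxy returns."""
        queries, self.outbox = self.outbox, []
        for query, url, content in send(queries):
            self.results.setdefault(query, []).append((url, content))
```

A real client would keep queries until the proxy acknowledges them, since a contact can end mid-transfer; the sketch clears the outbox immediately for brevity.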
How to prioritize?
Search engines use relevance scores to rank responses• scores are not comparable across queries
Even if a response is relevant, it may not be useful• the query “chants 2007” needs only one response
Thedu• normalizes relevance scores to make them comparable across queries• classifies query type to capture user intent
http://www.netlab.hut.fi/chants-2007/
Query-type classification
Query-type classification• Homepage query: “cnn”, “chants 2007”• Non-homepage query: “Harry potter review”
Thedu classifies using the URL, snippet and title fields• e.g., “chants 2007” on Google• <url> http://www.netlab.hut.fi/chants-2007 </url>• <snippet> Welcome to the home page of the ACM MobiCom workshop on Challenged Networks (CHANTS 2007). </snippet>• <title> chants workshop </title>
Homepage indicators• query terms occur in URL• all query terms occur in title or snippet• less than 3 words• URL is root
Non-homepage indicators• query is in question form• top URL is Wikipedia• length greater than 3 words
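The indicators above can be turned into a rule-based classifier. The slides do not say how Thedu combines them, so this sketch assumes a simple vote (homepage wins only if it has strictly more indicators); the question-word list and substring matching are also assumptions.

```python
from typing import List, Tuple
from urllib.parse import urlparse

# Assumed set of words that mark a question-form query
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "which", "is", "are", "can", "does", "do"}


def indicators(query: str, url: str, title: str, snippet: str) -> Tuple[List[bool], List[bool]]:
    terms = query.lower().split()
    parsed = urlparse(url)
    text = (title + " " + snippet).lower()
    homepage = [
        all(t in url.lower() for t in terms),  # query terms occur in URL
        all(t in text for t in terms),         # all query terms occur in title or snippet
        len(terms) < 3,                        # less than 3 words
        parsed.path in ("", "/"),              # URL is root
    ]
    non_homepage = [
        bool(terms) and (terms[0] in QUESTION_WORDS or query.rstrip().endswith("?")),  # question form
        "wikipedia.org" in parsed.netloc.lower(),  # top URL is Wikipedia
        len(terms) > 3,                            # length greater than 3 words
    ]
    return homepage, non_homepage


def classify(query: str, url: str, title: str, snippet: str) -> str:
    """Hypothetical vote over the indicators of the top-ranked result."""
    hp, nhp = indicators(query, url, title, snippet)
    return "homepage" if sum(hp) > sum(nhp) else "non-homepage"


if __name__ == "__main__":
    print(classify("chants 2007", "http://www.netlab.hut.fi/chants-2007", "chants workshop",
                   "Welcome to the home page of the ACM MobiCom workshop on Challenged Networks (CHANTS 2007)."))
```

For the “chants 2007” example, three homepage indicators fire (terms in URL, terms in snippet, fewer than 3 words) and no non-homepage indicator does, so the vote returns "homepage".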
Relevance score normalization
Modified language model framework
D: document, Q: query, C: collection
Normalized score: based on the Kullback-Leibler divergence (distance between Q and D)
P(w|D): probability of word w occurring in document D
P(w|C): probability of word w occurring in collection C
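The exact normalized score from the slide was lost, so the following is only one standard way to combine these quantities, not necessarily Thedu's formula: score(Q, D) = KL(Q||C) − KL(Q||D) = Σ_w P(w|Q) log(P(w|D) / P(w|C)). A document that looks like the collection background scores 0, which gives every query the same reference point. The sketch assumes a maximum-likelihood query model and a Dirichlet-smoothed document model with an assumed μ = 2500.

```python
import math
from collections import Counter
from typing import List


def normalized_score(query: List[str], doc: List[str], coll: Counter, coll_len: int,
                     mu: float = 2500.0) -> float:
    """sum_w P(w|Q) * log(P(w|D) / P(w|C)), i.e. KL(Q||C) - KL(Q||D)."""
    q_counts = Counter(query)
    d_counts = Counter(doc)
    score = 0.0
    for w, qc in q_counts.items():
        p_wq = qc / len(query)      # maximum-likelihood query model
        p_wc = coll[w] / coll_len   # collection model
        if p_wc == 0:
            continue                # term unseen in the collection: skipped (assumption)
        # Dirichlet-smoothed document model (smoothing choice is an assumption)
        p_wd = (d_counts[w] + mu * p_wc) / (len(doc) + mu)
        score += p_wq * math.log(p_wd / p_wc)
    return score
```

Positive scores mean the document explains the query better than the collection background does; negative scores mean worse.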
Thedu protocol
1. Sort responses in the order of normalized score
2. For each response r for query q:
2a. Update E_q with p_r
2b. If q is a homepage query and E_q …, do not send r
2c. Else send r to the vehicle
E_q: expected relevance of all responses sent for query q
p_r: probability that r is relevant for q
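The protocol steps can be sketched as a selection loop. The condition in step 2b was lost in extraction, so this sketch assumes a hypothetical threshold (default 1.0, i.e. "at least one relevant response expected") and updates E_q only for responses actually sent, following the definition of E_q. How p_r is derived from the normalized score is not stated, so it is taken as an input.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Candidate:
    query: str
    url: str
    score: float       # normalized score
    p_relevant: float  # p_r: probability that r is relevant for its query


def select(responses: Iterable[Candidate], homepage: Dict[str, bool],
           threshold: float = 1.0) -> List[Candidate]:
    """Return the responses to send, in priority order (threshold is hypothetical)."""
    expected: Dict[str, float] = {}  # E_q per query
    to_send: List[Candidate] = []
    # Step 1: highest normalized score first
    for r in sorted(responses, key=lambda c: c.score, reverse=True):
        e_q = expected.get(r.query, 0.0)
        # Step 2b: a homepage query that already expects enough relevant responses gets no more
        if homepage.get(r.query, False) and e_q >= threshold:
            continue
        # Steps 2a and 2c: account for r in E_q and send it
        expected[r.query] = e_q + r.p_relevant
        to_send.append(r)
    return to_send
```

Because expected relevance is a sum of per-response probabilities, E_q grows by exactly p_r for every response sent, so a homepage query stops consuming scarce contact time once its sent responses are likely to contain a relevant one.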
Evaluation goals
What is the delay in getting search results?
How many results were relevant to the user?
Evaluation Tools
DieselNet
Indri search engine
TREC (Text Retrieval Conference)• Predefined web data collection (10 GB)• Predefined set of queries (100 homepage + 50 content)• Relevance judgments (which documents are relevant for each query)
Thedu’s query-type classifier accuracy: 88%
Deployment on DieselNet
Thedu vs Proxy-less server
Thedu• March 26 to March 30• Bundles responses• Returns responses in prioritized order• Maintains state
Proxy-less server• April 30 to May 5• Bundles responses• Returns responses in FIFO order• No state
Connectivity duration
Mean connection duration: 35 sec
Mean disconnection duration: 8 min
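A quick calculation shows how little time these means leave for transfers. It assumes, as a simplification, that connections and disconnections strictly alternate, so one cycle lasts one mean connection plus one mean disconnection.

```python
mean_connection_s = 35.0
mean_disconnection_s = 8 * 60.0

# Simplifying assumption: connection and disconnection periods alternate
cycle_s = mean_connection_s + mean_disconnection_s  # 515 s per cycle
connected_fraction = mean_connection_s / cycle_s    # share of time with connectivity
contacts_per_hour = 3600.0 / cycle_s

print(f"connected {connected_fraction:.1%} of the time, "
      f"about {contacts_per_hour:.1f} contacts per hour")
# prints "connected 6.8% of the time, about 7.0 contacts per hour"
```

Under this assumption a bus is connected less than 7% of the time, so a query typically waits minutes for the next contact and each contact carries only about half a minute of transfer.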
Thedu vs Proxy-less architecture
[Figure: delay until first relevant response, Thedu vs. stateless proxy]
Extending Thedu
Can we use connectivity among buses to improve throughput?
Are we limited to academic search engines?• Convince commercial search providers to provide relevance scores• Or, assign scores based on ranking
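Assigning scores from ranking could look like the sketch below. The mapping is hypothetical (reciprocal rank, 1/rank); the slides do not say which rank-to-score function would be used.

```python
from typing import List, Tuple


def scores_from_ranking(urls: List[str]) -> List[Tuple[str, float]]:
    """Hypothetical mapping: the result at rank k (starting at 1) gets score 1/k."""
    return [(url, 1.0 / rank) for rank, url in enumerate(urls, start=1)]
```

Such scores are comparable across queries only in a weak sense: every query's top result gets 1.0 whether or not it is actually relevant, so they carry less information than true relevance scores.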
Are users really happy with search results and delay?
traces.cs.umass.edu
Simulation Results
Inter-meeting times