aruna balasubramanian, yun zhou, w bruce croft, brian n levine and arun venkataramani department of...

Post on 08-Jan-2018


Aruna Balasubramanian, Yun Zhou, W Bruce Croft, Brian N Levine and Arun Venkataramani

Department of Computer Science, University of Massachusetts, Amherst

Web Search From a Bus

Why web search from a bus?

Open access points are commonly available

Intermittent Internet connectivity from vehicles is possible
• no subscription cost
• useful when no other connectivity is available

Web search is the 2nd most common web activity (survey by pewinternet.org)

Connectivity characteristics of testbeds

Goal: Build web search in the presence of frequent disconnections and small connectivity duration

Web search process

<your favorite search engine>: Retrieving web… Retrieving images… Retrieving…

Adapting to vehicular network

Why challenging?

Interactive
• several exchanges between user and search engine needed

Results imprecise
• response may not be relevant
• difficult to measure relevance

Thedu
• Proxy architecture: sustain interaction
• IR contribution: increase usefulness of returned response

Thedu proxy

Between vehicle and search engine

When proxy receives query request from vehicle
• retrieves URLs and snippets
• prefetches URL contents including images
• stores responses and maintains state

When vehicle connects to proxy
• downloads pending responses
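The proxy behavior listed above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class name `ProxySketch`, the `search` and `fetch` callables, and the per-contact `budget` are hypothetical and do not come from the slides.

```python
from collections import defaultdict, deque


class ProxySketch:
    """Toy stand-in for the proxy's per-vehicle state (all names hypothetical)."""

    def __init__(self, search, fetch):
        self.search = search  # callable: query -> list of (url, snippet)
        self.fetch = fetch    # callable: url -> page contents
        self.pending = defaultdict(deque)  # vehicle id -> queued responses

    def on_query(self, vehicle_id, query):
        # Retrieve URLs and snippets, prefetch the contents, queue the results.
        for url, snippet in self.search(query):
            self.pending[vehicle_id].append({
                "query": query,
                "url": url,
                "snippet": snippet,
                "content": self.fetch(url),
            })

    def on_connect(self, vehicle_id, budget):
        # Hand over as many pending responses as this contact allows;
        # the rest stay queued until the vehicle connects again.
        queue = self.pending[vehicle_id]
        batch = []
        while queue and len(batch) < budget:
            batch.append(queue.popleft())
        return batch
```

Because the queue lives on the proxy, a vehicle that disconnects mid-download simply picks up the remaining responses at its next contact.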

Client and proxy architecture

Client side (vehicle): user, web interface, store query, process response

Server side (proxy): queries for vehicle, fetch URL/images, prioritize response, pending responses; connects to the search engine and web sites

Over intermittent connectivity: new queries and queries go from vehicle to proxy; responses return as response bundles

How to prioritize?

Search engines use relevance scores to rank responses
• scores not comparable across queries

Even if a response is relevant it may not be useful
• Query “chants 2007” needs only one response

Thedu
• Normalize relevance scores: comparable across queries
• Classify query-type: to capture user intent

http://www.netlab.hut.fi/chants-2007/

Query-Type classification

Query-type classification
• Homepage query: “cnn”, “chants 2007”
• Non-homepage query: “Harry potter review”

Thedu classifies using the URL, snippet and title fields
• E.g., “chants 2007” on Google
• <url> http://www.netlab.hut.fi/chants-2007 </url>
• <snippet> Welcome to the home page of the ACM MobiCom workshop on Challenged Networks (CHANTS 2007). </snippet>
• <title> chants workshop </title>

Homepage
• Query terms occur in URL
• All query terms occur in title or snippet
• Less than 3 words
• URL is root

Non-homepage
• Query is in question form
• Top URL is Wikipedia
• Length greater than 3 words
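The features above could be combined in many ways; the slides do not say how Thedu weighs them. The sketch below simply counts how many features point each way, which is an assumption made for illustration. The function name, the question-word list, and the use of substring matching are likewise hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical list of words that mark a query as being in question form.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "which"}


def classify_query(query, top_url, title, snippet):
    """Vote-count over the listed features (the voting rule is an assumption)."""
    terms = query.lower().split()
    url = top_url.lower()
    parsed = urlparse(url)
    text = (title + " " + snippet).lower()

    homepage_votes = sum([
        all(t in url for t in terms),    # query terms occur in URL (substring match)
        all(t in text for t in terms),   # all query terms occur in title or snippet
        len(terms) < 3,                  # less than 3 words
        parsed.path in ("", "/"),        # URL is a site root
    ])
    non_homepage_votes = sum([
        query.strip().endswith("?")
        or (bool(terms) and terms[0] in QUESTION_WORDS),  # question form
        "wikipedia.org" in parsed.netloc,                 # top URL is Wikipedia
        len(terms) > 3,                                   # more than 3 words
    ])
    return "homepage" if homepage_votes > non_homepage_votes else "non-homepage"
```

On the slide's own example, the query “chants 2007” with the URL, title and snippet shown earlier collects three homepage votes and no non-homepage votes, so this sketch labels it a homepage query.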

Relevance score normalization

Modified language model framework

D: Document, Q: Query, C: Collection

Normalized score

Kullback-Leibler divergence (distance between Q and D)

Probability of word occurring in document

Probability of word occurring in collection
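The slide's formula is not preserved in this transcript, so the sketch below shows only the ingredients named above: a Kullback-Leibler divergence between a query model and a document model built from the probability of a word in the document and in the collection. The linear (Jelinek-Mercer) smoothing, the weight `lam`, and the function names are assumptions; this is not Thedu's normalized score.

```python
import math
from collections import Counter


def smoothed_prob(word, doc_counts, coll_counts, lam=0.5):
    """Mix P(word | document) with P(word | collection); lam is an assumed weight."""
    p_doc = doc_counts[word] / max(sum(doc_counts.values()), 1)
    p_coll = coll_counts[word] / max(sum(coll_counts.values()), 1)
    return lam * p_doc + (1 - lam) * p_coll


def kl_divergence(query, document, collection, lam=0.5):
    """KL(Q || D) over query words; lower means the document is closer to the query.

    Assumes every query word appears at least once in the collection,
    otherwise the smoothed probability would be zero.
    """
    q, d, c = Counter(query), Counter(document), Counter(collection)
    total = sum(q.values())
    kl = 0.0
    for word, count in q.items():
        p_q = count / total
        p_d = smoothed_prob(word, d, c, lam)
        kl += p_q * math.log(p_q / p_d)
    return kl
```

A document that contains the query words ends up with a smaller divergence than one that does not, which is the ordering a ranking function needs.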

Thedu protocol

1. Sort responses in the order of normalized score

2. For response r for query q,

2a. Update

2b. If q is homepage query and do not send

2c. Else send response to vehicle

: expected relevance of all responses sent for a query q

: probability that r is relevant for q
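The protocol steps above lost their symbols in this transcript, including the condition in step 2b. The sketch below fills that gap with an assumption: a homepage query stops receiving responses once the expected relevance already sent for it reaches a `threshold`. The threshold, the `Response` fields, and the function name are hypothetical, and the check is done before the update so that the first response for a query is always sent.

```python
from dataclasses import dataclass


@dataclass
class Response:
    query: str
    url: str
    score: float       # normalized relevance score, comparable across queries
    p_relevant: float  # probability that this response is relevant for its query


def select_responses(responses, homepage_queries, threshold=1.0):
    """Pick responses to send, in priority order (stopping rule is an assumption)."""
    expected = {}  # query -> expected relevance of the responses sent so far
    to_send = []
    # Step 1: sort by normalized score, best first.
    for r in sorted(responses, key=lambda r: r.score, reverse=True):
        so_far = expected.get(r.query, 0.0)
        # Step 2b (assumed condition): a homepage query that has already
        # received enough expected relevance gets nothing more.
        if r.query in homepage_queries and so_far >= threshold:
            continue
        # Step 2a: update the expected relevance; step 2c: send the response.
        expected[r.query] = so_far + r.p_relevant
        to_send.append(r)
    return to_send
```

With this rule a homepage query such as “cnn” typically receives one strong response, leaving the scarce connection time for queries that benefit from several results.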

Evaluation goals

What is the delay in getting search results?

How many results were relevant to the user?

Evaluation Tools

DieselNet

Indri search engine

TREC (Text Retrieval Conference)
• Predefined web data collection (10 GB)
• Predefined set of queries (100 homepage + 50 content)
• Relevance judgments (which documents are relevant for a query)

Thedu’s query-type classifier accuracy: 88%

Deployment on DieselNet

Thedu vs Proxy-less server

Thedu
• March 26 to March 30
• Bundles responses
• Returns responses in prioritized order
• Maintains state

Proxy-less server
• April 30 to May 5
• Bundles responses
• Returns responses in FIFO order
• No state

Connectivity duration

Mean connection duration: 35 sec

Mean disconnection duration: 8 min

Thedu vs Proxy-less architecture

[Figure: delay until first relevant response, Thedu vs. stateless proxy]

Extending Thedu

Can we use connectivity among buses to improve throughput?

Are we limited to academic search engines?
• Convince commercial search providers to provide relevance scores
• Or, assign scores based on ranking

Are users really happy with search results and delay?
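The option above of assigning scores based on ranking could look like the sketch below. Reciprocal rank is an assumption chosen for illustration; the slides do not name a scoring function, and `scores_from_ranks` is a hypothetical helper.

```python
def scores_from_ranks(urls):
    """Map a ranked result list to scores in (0, 1] using reciprocal rank."""
    return {url: 1.0 / rank for rank, url in enumerate(urls, start=1)}
```

Since the top result of every query gets the same score, such values are at least on a common scale across queries, although they carry less information than true relevance scores.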

traces.cs.umass.edu

Simulation Results

Inter-meeting times
