
Harvesting Collective Intelligence

Temporal Behavior in Yahoo Answers

Christina Aperjis, Social Computing Lab, HP Labs (christina.aperjis@hp.com)

Bernardo A. Huberman, Social Computing Lab, HP Labs ([email protected])

Fang Wu, Social Computing Lab, HP Labs ([email protected])

ABSTRACT

When harvesting collective intelligence, a user wishes to

maximize the accuracy and value of the acquired information

without spending too much time collecting it. We empiri-

cally study how people behave when facing these conflicting

objectives using data from Yahoo Answers, a community

driven question-and-answer site. We take two complemen-

tary approaches.

We first study how users behave when trying to maximize

the amount of the acquired information, while minimizing

the waiting time. We identify and quantify how question

authors at Yahoo Answers trade off the number of answers

they receive and the cost of waiting. We find that users are

willing to wait more to obtain an additional answer when

they have only received a small number of answers; this im-

plies decreasing marginal returns in the amount of collected

information. We also estimate the user’s utility function

from the data.

Our second approach focuses on how users assess the qual-

ities of the individual answers without explicitly considering

the cost of waiting. We assume that users make a sequence

of decisions, deciding to wait for an additional answer as long

as the quality of the current answer exceeds some threshold.

Under this model, the probability distribution for the num-

ber of answers that a question gets is an inverse Gaussian,

which is a Zipf-like distribution. We use the data to validate

this conclusion.

1. INTRODUCTION

When searching for an answer to a question, people gen-

erally prefer to get both high quality and speedy answers.

The fact that it usually takes longer to find a better an-

swer creates a speed-accuracy tradeoff which is inherent in

information seeking. Stopping the search for information

early gives speed, while stopping the search for information


late gives accuracy. This paper studies how people behave

with respect to this tradeoff. A natural setting to study the

speed-accuracy tradeoff is Yahoo Answers.

Yahoo Answers is a community-driven question-and-answer

site that allows users to both submit questions to be an-

swered by the community and answer questions posed by

other users. With more than 21 million unique users in the

United States and 90 million worldwide, it is the leading

social Q&A site.

At Yahoo Answers, users post questions seeking to har-

vest the collective intelligence of others in the system. Once

a user submits a question, the question is posted on the site.

Other users can then submit answers to the question, which

are also posted on the site. When a question author is sat-

isfied with the answers he received, he closes the question,

and thus terminates his search for answers. The user then

uses information from the answers he received to build his

own final answer to his question.

There are two aspects that users value with respect to the

final answers they obtain: accuracy and speed, and thus they

try to maximize the accuracy of their final answers without

waiting too long. The accuracy of the final answer depends

on the accuracy of all individual answers that the question

received.

Anyone posting a question faces the following tradeoff at

any given point in time. He can either build his final answer

or wait. If he waits, he may achieve a higher accuracy in the

future, but also incurs a cost for waiting. The user wishes

to build his final answer at the optimal stopping time. We

take two complementary approaches to study users’ behavior

with respect to this stopping problem.

Our first approach studies the speed-accuracy tradeoff by

using the number of answers as a proxy for accuracy. In

particular, we assume that the user approximates the accu-

racy of his final answer by the number of answers that his

question gets. Thus, the user faces the following tradeoff:

he prefers more answers to fewer, but does not want to wait

too long. We analyze Yahoo Answers data to identify and

quantify this tradeoff. Our first finding is that users are will-

ing to wait more to obtain one additional answer when they

have only received a small number of answers; this implies

decreasing marginal returns in the number of answers. For-

mally, this implies a concave utility function in the amount

of information. We then estimate the utility function from


the data.

Our second approach considers the qualities of the individ-

ual answers without explicitly computing the cost of wait-

ing. We assume that users decide to wait as long as the

value of the current answer exceeds some threshold. Under

this model, the probability distribution for the number of

answers that a question gets is an inverse Gaussian, which

is a Zipf-like distribution. We use the data to validate this

conclusion.

The rest of the paper is organized as follows. In Section 2

we review related work. In Section 3 we describe Yahoo An-

swers, focusing on the rules that are important for our anal-

ysis. In Section 4 we empirically study the speed-accuracy

tradeoff by using the number of answers as a proxy for ac-

curacy. In Section 5 we focus on how users assess quality.

We conclude in Section 6.

2. RELATED WORK

This paper studies behavior with respect to stopping when

people face speed-accuracy tradeoffs using Yahoo Answers

data. Three streams of research are thus related to our

work: (1) behavior with respect to stopping, (2) speed-

accuracy tradeoffs, and (3) empirical studies on Yahoo An-

swers. These are briefly discussed below.

The behavior of people with respect to stopping problems

has been studied extensively in the context of the secretary

problem [5]. In the classical secretary problem applicants are

interviewed sequentially in a random order, and the goal is

to maximize the probability of choosing the best applicant.

The applicants can be ranked from best to worst with no

ties. After each interview, the applicant is either accepted

or rejected. If the decision maker knows the total number of

applicants n, for large n the optimal policy is to interview

and reject the first n/e applicants (where e is the base of

the natural logarithm) and then to accept the next who is

better than these interviewed candidates [5].
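For concreteness, the n/e rule is easy to simulate. A minimal sketch (illustrative only; the population size, trial count, and seed are arbitrary choices, not from [5]):

```python
import math
import random

def secretary_trial(n, rng):
    """One run of the classical secretary problem under the n/e rule:
    reject the first n/e applicants, then accept the first later
    applicant who beats all of them."""
    ranks = list(range(n))                # 0 is the best applicant
    rng.shuffle(ranks)
    cutoff = round(n / math.e)
    benchmark = min(ranks[:cutoff]) if cutoff else n
    for rank in ranks[cutoff:]:
        if rank < benchmark:              # first applicant better than the benchmark
            return rank == 0              # success iff it is the overall best
    return False                          # benchmark never beaten: the best was rejected

rng = random.Random(0)
n, trials = 100, 20000
wins = sum(secretary_trial(n, rng) for _ in range(trials))
print(f"success rate ~ {wins / trials:.3f} (theory: 1/e ~ 0.368)")
```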

Experimental studies of both the classical secretary prob-

lem and variants show that people tend to stop too early

and give insufficient consideration to the yet-to-be-seen ap-

plicants (e.g., [3]). On the other hand, when there are search

costs and recall (backward solicitation) of previously in-

spected alternatives is allowed, people tend to search longer

than the optimum [16]. We note a key difference with the

setting of information seeking: while in the secretary prob-

lem only one secretary can be hired, an information seeker

can combine information from multiple sources to build a

more accurate answer for his question. Moreover, in the

secretary problem the decision maker does not face a speed-

accuracy tradeoff because time does not affect his payoff.

The speed-accuracy tradeoff has been considered in vari-

ous settings. One example is a setting where a group coop-

erates to solve a problem [8]. In psychology, on the other

hand, the speed-accuracy tradeoff is used to describe the

tradeoff between how fast a task can be performed and how

many mistakes are made in performing the task (e.g., [15]).

There has been a number of empirical studies that use

data from Yahoo Answers and other question answering

communities. Data from Yahoo Answers have been used

to predict whether a particular answer will be chosen as the

best answer [1], and whether a user will be satisfied with the

answers to his question [11]. Content analysis has been used

to study the criteria with which users select the best answers

to their questions [10]. Shah et al. study the effect of user

participation on the success of a social Q&A site [13]. Aji

and Agichtein analyze the factors that influence how the Ya-

hoo Answers community responds to a question [2]. Finally,

various characteristics of user behavior in terms of asking

and answering questions have been considered in [6]. To the

best of our knowledge, there have been no studies that con-

sider user behavior in terms of the speed-accuracy tradeoff

in either Yahoo Answers or any other question answering

communities.

3. YAHOO ANSWERS

In this section we describe Yahoo Answers, focusing on

the rules that are important for our analysis. We also briefly

describe the data we use.

Yahoo Answers is a question-and-answer site that allows

users to both submit questions to be answered and answer

questions asked by other users. Once a user submits a ques-

tion, the question is posted on the Yahoo Answers site.

Other users can then see the question and submit answers,

which are also posted on the site. According to standard Ya-

hoo Answers terminology, the user that asks the question is

called the asker, and a user answering is called an answerer.

In this paper we study the behavior of the asker, and thus

the word user is used to describe the asker.

Once the user starts receiving answers to his question, he

can choose the best answer at any point in time. After the

best answer to a question is selected, the question does not

receive any additional answers. We thus say that a user

closes the question when he chooses the best answer. Clos-

ing the question is equivalent to terminating the search for

answers to the question.

We expect that the user is satisfied with the answers he

received, when he closes the question. The user then uses

information from these answers to build his own final answer

to his question. Throughout the paper, we use the term final

answer to refer to the conclusion that the question author

draws by reading the answers to his question. The final

answer is not posted on the Yahoo Answers site, and is often

not recorded.

Questions have a 4-day open period. If a question does not

receive any answers within the 4-day open period, it expires

and is deleted. However, before the question expires, the

asker has the option to extend the time period a question

is open by four more days. The time can only be extended

once. In both datasets, most questions are open for less

than four days (96 hours). Throughout the paper we only

consider questions that were open for less than 100 hours.

If the asker does not choose a best answer to his ques-

tion within the 4-day open period, then the question is up

for voting, that is other Yahoo Answers users can vote to

determine the best answer to the question.


For the purposes of this paper we only consider questions

for which the best answer was selected by the asker. The

reason is that we are interested in the time that the asker

terminates his search for information by closing the question.

If the asker selects the best answer, this is the time that

the best answer was selected. On the other hand, if the

asker does not select a best answer, we have no relevant

information (we do not know when and whether the asker

built his final answer).

In this paper we use two Yahoo Answers datasets. Dataset

A consists of 81,832 questions and is a subset of a dataset

collected in early 2008 by Liu et al. [11]. Dataset B consists

of 1,536 questions and is a subset of a dataset crawled in

October 2008 by Aji and Agichtein [2]. We use subsets of the

originally collected data because we only consider questions

that were closed by the asker in less than 100 hours. For each

question in these datasets we know the time the question was

posted, the arrival time of each answer to the question, and

the time that the asker closed the question by selecting the

best answer.

One could argue that there is no reason for a user to close

his question before the 4-day open period is over. In par-

ticular, he could use the information from the answers he

has received up to now, and still wait until the four days

are over. However, users often want to get a final answer

and not have to rethink the question again. Figure 1 shows

histograms of the number of hours that questions were open

(before the asker chose the best answer). Notice that a high

percentage of questions closes within one day after the ques-

tion was posted: 38% for Dataset A and 29% for Dataset

B.

4. SPEED-ACCURACY TRADEOFF

There are various reasons for which users post questions at

Yahoo Answers. Users may need help in performing a task,

seek advice or support, or may be merely trying to satisfy

their curiosity about some subject. In any case, a user is

trying to use information from the answers he receives to

build his own final answer to his question. A user wants to

get an accurate final answer without waiting too long. The

accuracy of the user’s final answer is subjective and hard

to measure. In this section we use the number of answers

as an approximation of accuracy. Thus, we expect that a

user’s utility increases in the total number of answers that

his question receives, and decreases in the time he waits for

answers to arrive.

Section 4.1 develops our hypotheses drawing on a utility

model. In Section 4.2 we test our first hypothesis. In Section

4.3 we introduce a discrete choice model, which we estimate

in Section 4.4 to test the remaining hypothesis. Finally, in

Section 4.5 we discuss the form of the utility function.

4.1 Utility Model

Let n be the total number of answers at the time the

user builds his final answer. We assume that the user gets

utility u(n). Furthermore, we assume that the user incurs a

cost c(t) for waiting for time t. Thus, the user is seeking to

maximize u(n) − c(t).

[Figure 1: Histograms of the number of hours that questions were open (before the asker chose the best answer) for Datasets A and B.]

If u(n+1) − u(n) < E[c(T)], then close the question.
If u(n+1) − u(n) > E[c(T)], then wait.

Figure 2: Myopic decision rule.

Suppose that n answers have arrived. The user can either

terminate his search by choosing the best answer, or wait

for additional answers. If he terminates his search now, he

can build his final answer using the n answers that he has

received, and thus get utility u(n). If he chooses to wait and

a new answer arrives t time units later, then he will have

n + 1 answers, but will have also incurred a cost c(t) for

waiting. His utility will then be u(n+ 1)− c(t). The user is

better off stopping if u(n+1)−u(n) < c(t), and continuing if

u(n+1)−u(n) > c(t). In words, the user decides to close the

question if the cost of waiting for one more answer exceeds

the incremental benefit. On the other hand, the user decides

to wait for one more answer if the cost of waiting is smaller

than the incremental benefit of having one more answer.

Our previous description assumes that the user knows

when the next answer will arrive, which is not the case in

reality. More realistically, we can assume that the user uses

an estimate to calculate his cost. Thus, the user closes the

question if u(n+1) − u(n) < c(τ), and waits for the next answer if u(n+1) − u(n) > c(τ), where τ is now the user's

estimate on how long he will have to wait until the next

answer arrives. More generally, let T be a random variable

that describes the user’s belief on how long it will take until

the next answer arrives. Then, the user is better off closing

the question if u(n+1) − u(n) < E[c(T)], and continuing if u(n+1) − u(n) > E[c(T)].

The strategy we just described is myopic, since it assumes

that a user decides whether to wait (i.e., not to close the

question) by only considering whether he is better off wait-

ing for one more answer. Alternatively, if the user knew

when each answer is going to arrive in the future, we could

consider a global optimization problem: if the i-th answer

is expected to arrive at time ti, the user would choose to

close the question at the time tj that maximizes u(j)−c(tj).

However, in the context of Yahoo Answers it is impossible

for users to know when all future answers will arrive. It is

thus more realistic to assume that users myopically optimize

as randomness is realized.

We summarize the myopic decision rule in Figure 2. It

implies that a user is more likely to close the question when

u(n+1) − u(n) is small and/or E[c(T)] is large.
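For concreteness, Figure 2's rule can be stated directly in code. A minimal sketch, where the concave utility and the linear waiting cost are purely illustrative stand-ins for the objects estimated later in Sections 4.4 and 4.5:

```python
def myopic_decision(n, expected_wait_cost, u):
    """Myopic rule of Figure 2: close iff the expected cost of waiting
    for the next answer exceeds the marginal benefit u(n+1) - u(n)."""
    return "close" if u(n + 1) - u(n) < expected_wait_cost else "wait"

# Illustrative (assumed) concave utility and linear waiting cost.
u = lambda n: 10.0 * n ** 0.5        # decreasing marginal returns
expected_cost = 0.6 * 3.0            # cost per hour * expected hours to next answer
for n in (1, 5, 12):
    print(n, myopic_decision(n, expected_cost, u))   # wait, wait, close
```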

We next develop our hypotheses building on the myopic

decision rule. Our hypotheses can be grouped in two cate-

gories. The first category is based on the assumption that

the marginal benefit of having one more answer decreases

as more answers arrive; the second considers how users esti-

mate when the next answer will arrive.

The user’s valuation for having n answers is u(n). We

expect that u(n) is concave, i.e., the marginal benefit of

having one more answer decreases as the number of answers

increases. According to the myopic decision rule (Figure

2), the user is more likely to close his question when u(n+

1) − u(n) is small. Since we expect that u(n + 1) − u(n) is

decreasing in n, the user is more likely to close his question

when n is large, i.e., when he has already received a large

number of answers. We test this in two ways, outlined in

Hypotheses 1 and 2.

Hypothesis 1. The amount of time that a user waits be-

fore closing his question is decreasing in the number of an-

swers that the question has received.

Hypothesis 2. A user is more likely to close his question

if the question has received many answers.

The user believes that the time until the next answer ar-

rives is described by some random variable (which can be

degenerate if he is only using an estimate). It is reasonable

to assume that a user forms his belief using the information

available to him: the arrival times of previous answers and

the time he has waited since the last answer arrived.

A particularly important summary statistic is the last

inter-arrival time, i.e., the time between the arrivals of the

two most recent answers. The last inter-arrival time is an

estimate of the inverse current arrival rate of answers. Thus,

the user may use the last inter-arrival time as an estimate of

the next inter-arrival time, i.e., the time between the arrival

of the last answer and the next answer. More generally, the

user may form a belief on the next inter-arrival time that

depends on the last inter-arrival time in some increasing

fashion. Then, if the last inter-arrival time is large, the user

expects to wait a long time until he receives another answer,

thus incurring a large waiting cost. This encourages the user

to close the question now. This is the content of Hypothesis 3.

Hypothesis 3. A user is more likely to close his question

if the last inter-arrival time is large.

This hypothesis is based on the assumption that the last

inter-arrival time may be used as an estimate for the next

inter-arrival time. However, if a long time has elapsed since

the last answer arrived (e.g., a longer period than the last

inter-arrival time), the user becomes less certain about this

estimate. The increased uncertainty may lead him to expect

a greater waiting cost until the next answer arrives. In turn,

this encourages the user to close the question, as is outlined

in Hypothesis 4.

Hypothesis 4. A user is more likely to close his ques-

tion if a long time has elapsed since the most recent answer

arrived.

Hypothesis 1 is tested in Section 4.2. Then, in Section 4.3

we introduce a discrete choice model, which we estimate in

Section 4.4 to test Hypotheses 2, 3, and 4.

4.2 Time Between Last Arrival and Closure

In this section we test whether a user waits longer before

closing his question when the question has received a small

number of answers (Hypothesis 1).

                          Dataset A           Dataset B
Correlation coefficient   -0.126***           -0.148***
95% conf. interval        [-0.132, -0.119]    [-0.196, -0.098]
Observations              81,832              1,536

Table 1: Correlation between the number of answers (TotalAnswers) and the time that the user waits before closing the question (ElapsedTime). *, **, and *** denote significance at 1%, 0.5%, and 0.1% respectively.

[Figure 3: Mean elapsed time (in hours) between the arrival of the last answer and the closing of the question, for Dataset A. The horizontal axis shows the total number of answers at the time the question was closed.]

For every question we consider the following variables:

• TotalAnswers: the total number of answers that the

question received. This is the number of answers at

the time that the asker closed the question.

• ElapsedTime: the time between the arrival of the last

answer and the time the user closed the question.

We test for correlation between TotalAnswers and Elapsed-

Time. The results are presented in Table 1. For both

datasets, we find that TotalAnswers and ElapsedTime are

negatively correlated, bringing support for Hypothesis 1.
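The test itself is a standard Pearson correlation. A sketch of the computation (scipy assumed available; the data below are synthetic stand-ins, since the question logs are not reproduced here):

```python
import numpy as np
from scipy.stats import pearsonr

# Stand-in data: in the paper, total_answers and elapsed_time (hours between
# the last answer and closure) come from the question logs of Datasets A and B.
rng = np.random.default_rng(0)
total_answers = rng.poisson(6, size=2000) + 1
elapsed_time = np.maximum(0.1, 40 - 2.0 * total_answers + rng.normal(0, 15, 2000))

r, p_value = pearsonr(total_answers, elapsed_time)
print(f"correlation = {r:.3f}, p = {p_value:.2g}")   # paper: -0.126 (A), -0.148 (B)
```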

Figure 3 shows the mean elapsed time between the time

that the last answer arrived and the time that the user chose

the best answer in hours (i.e., the mean value of Elapsed-

Time) as a function of the total number of answers at the

time (TotalAnswers).[1] We observe that the time between

the last answer and the closure of the question decreases as

the number of answers increases. For instance, users wait

more than 50 hours on average before closing their questions

[1] We do not include questions with more than 18 answers in Figure 3, because for popular questions we have fewer observations in the dataset, resulting in more noise in the mean value of ElapsedTime. We note, however, that the correlation test of Table 1 is based on all the questions in the datasets.

when they have only received one answer, while they wait

less than 35 hours on average before closing their questions

when they have received five answers.

Both Table 1 and Figure 3 suggest that the user is will-

ing to wait more (and incur more cost from waiting) for

an additional answer if only a few answers have arrived up

to now. This implies that the marginal benefit of having

one additional answer decreases as the number of answers

increases.

4.3 Model Specification

In this section we introduce a logit model, which we estimate in Section 4.4 to test Hypotheses 2, 3, and 4.

A user posts a question. Then, at various points in time

he revisits Yahoo Answers to see the answers that his ques-

tion has received, and decides whether to close the question

by selecting the best answer. We are interested in the prob-

ability that the user closes the question during a given visit.

For every visit we consider the following variables:

• p: the probability that the user closes the question

during the visit.

• n: the number of answers that the question has re-

ceived by the time of the visit.

• l: the last inter-arrival time, i.e., the time between the

arrivals of the two most recent answers (at the time of

the visit).

• w: the time since the last answer arrived, i.e., the time

that the user has been waiting for an answer since the

last arrival. This is equal to the difference between the

time of the visit and the arrival time of the most recent

answer.

The number of observations per question depends on the

inter-arrival times between answers.
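One plausible reading of this construction, as a sketch: expand each question into per-visit rows (n, l, w, closed), assuming, as in Section 4.4, that the asker checks back at a fixed interval after each answer. The function below illustrates the bookkeeping; it is not the exact procedure used for the regressions:

```python
def visit_rows(answer_times, close_time, visit_gap=1.0):
    """Expand one question into per-visit rows (n, l, w, closed), assuming
    the asker checks every visit_gap hours after an answer arrives.
    answer_times: sorted answer arrival times (hours since posting);
    close_time: when the best answer was chosen. Starts at the second
    answer, since the last inter-arrival time l is undefined before that."""
    rows = []
    for i in range(1, len(answer_times)):
        n = i + 1                                    # answers received so far
        l = answer_times[i] - answer_times[i - 1]    # last inter-arrival time
        next_event = answer_times[i + 1] if i + 1 < len(answer_times) else close_time
        t = answer_times[i] + visit_gap
        while t < next_event:                        # visits where the user waits
            rows.append((n, l, t - answer_times[i], 0))
            t += visit_gap
        if next_event == close_time:                 # final visit: question closed
            rows.append((n, l, close_time - answer_times[i], 1))
    return rows

print(visit_rows([0.5, 2.0, 3.5], close_time=6.2))
```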

Recall the utility model introduced in Section 4.1. If the

user closes the question at n answers, his utility is u(n).

Suppose that the user believes that the next answer will ar-

rive in time T , where T is some random variable. Moreover,

we assume that the user uses the last inter-arrival time l and

the time since the last answer w to form his belief; that is

T depends on l and w, and we write T (l, w). Then, the user

expects to obtain utility u(n+1) − E[c(T(l, w))] from wait-

ing. According to the myopic decision rule (Figure 2), the

user decides whether to close the question or not depending

on which of the expressions u(n), u(n+1) − E[c(T(l, w))] is

larger.

We now perturb u(n) and u(n+1) − E[c(T(l, w))] with some noise. In particular, we assume that the user's utility is

u(n) + ε0

if he closes the question, and

u(n+1) − E[c(T(l, w))] + ε1

if he waits for the next answer, where ε0 and ε1 are independent and type 1 extreme value distributed.


Then, the difference ε1 − ε0 is logistically distributed. Thus, the probability of closing the question is

p = Pr[u(n) + ε0 > u(n+1) − E[c(T(l, w))] + ε1]
  = Pr[ε1 − ε0 < E[c(T(l, w))] − (u(n+1) − u(n))]
  = Λ(E[c(T(l, w))] − (u(n+1) − u(n))),

where Λ(z) = 1/(1 + e^(−z)) is the logistic function.

The previous argument gives rise to the logit model, a standard discrete choice model in microeconometrics (see, e.g., [4]).

In Section 4.4 we estimate the following model:

p = Λ(α + β1 n + β2 l + β3 w),   (1)

so that

E[c(T(l, w))] − (u(n+1) − u(n)) = α + β1 n + β2 l + β3 w.

This implies that the marginal benefit of having one more answer when n answers have arrived is

u(n+1) − u(n) = αu − β1 n   (2)

and the expected cost of waiting for the next answer is

E[c(T(l, w))] = αc + β2 l + β3 w,   (3)

where

αc − αu = α.

Equations (2) and (3) are used in Section 4.5 to interpret the estimated parameters of (1).

4.4 Model Estimation

In this section we use logistic regression to estimate (1),

i.e., we estimate the probability that a user closes his ques-

tion as a function of (i) the number of answers (n), (ii) the

last inter-arrival time (l), and (iii) the time that the user has

waited since the last answer arrived (w). We find that the

probability of closing the question increases with all three

variables, supporting Hypotheses 2, 3, and 4 respectively.

We estimate (1) assuming that users visit Yahoo Answers

to check for new answers to their question every hour after

the last answer arrived. The maximum likelihood estima-

tors are given in Tables 2 and 3. All parameter estimates

are statistically significant at the 0.001 level. We also used

a generalized additive model [7] to fit the data, which sug-

gested that the assumed linearity in (1) is a good fit for the

data.
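A sketch of this estimation step using statsmodels (assumed available). The rows below are synthetic, generated from coefficients close to the estimates in Table 2, so the fit should roughly recover them:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
N = 50000
n = rng.integers(1, 20, N).astype(float)           # answers so far
l = rng.exponential(5.0, N)                        # last inter-arrival time (hours)
w = rng.exponential(8.0, N)                        # time since last answer (hours)
logit_p = -4.0 + 0.03 * n + 0.025 * l + 0.02 * w   # roughly the paper's estimates
closed = (rng.random(N) < 1.0 / (1.0 + np.exp(-logit_p))).astype(float)

X = sm.add_constant(np.column_stack([n, l, w]))
fit = sm.Logit(closed, X).fit(disp=0)
print(fit.params)                                  # [alpha, beta1, beta2, beta3]
```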

It is worth noting that our results do not heavily depend

on our assumption that users check for new answers every

hour. We get similar estimates for β1, β2 and β3, if we

assume that users check for answers every 2 hours, every 5

hours, or every 30 minutes. For instance, if we assume that

users check for new answers every 2 hours, we get (β1, β2, β3) = (0.026, 0.027, 0.022) instead of (β1, β2, β3) = (0.027, 0.028, 0.021) for Dataset B.

              Dataset A            Dataset B
α             -4.006*** (0.0116)   -4.469*** (0.0064)
β1             0.031*** (0.0016)    0.036*** (0.0006)
β2             0.025*** (0.0005)    0.029*** (0.0002)
β3             0.018*** (0.0002)    0.021*** (0.0001)
Observations   1,104,568            53,413

Table 2: The effect of the number of answers, the last inter-arrival time, and the time since the last answer on the probability of closing the question, with p as the dependent variable. Questions of Datasets A and B with fewer than 20 answers are considered in these regressions. *, **, and *** denote significance at 1%, 0.5%, and 0.1% respectively. Standard errors are given in parentheses.

              Estimate
α             -4.408*** (0.0603)
β1             0.027*** (0.0005)
β2             0.028*** (0.0002)
β3             0.021*** (0.0001)
Observations   54,914

Table 3: The effect of the number of answers, the last inter-arrival time, and the time since the last answer on the probability of closing the question, with p as the dependent variable, for the whole of Dataset B. *, **, and *** denote significance at 1%, 0.5%, and 0.1% respectively. Standard errors are given in parentheses.

Due to a sampling bias, Dataset A contains dispropor-

tionately many questions with more than 20 answers. To de-

crease the effect of this bias in the logistic regression we only

considered questions with less than 20 answers for Dataset

A. For Dataset B we run two logistic regressions: in one we

only use questions with less than 20 answers (for the sake

of comparison with Dataset A in Table 2); in the other we

consider the whole dataset. The latter is shown in Table 3.

We observe that the estimates we get for the two datasets

are close to each other. This suggests that our regressions

are capturing how users behave with respect to when they

close their questions, and that this behavior did not signifi-

cantly change in the months between the times that the two

datasets were collected.

We can now draw qualitative conclusions by considering

the signs of the estimated coefficients. We use the fact that

the sign of a coefficient gives the sign of the corresponding

marginal effect (since the logistic function is increasing).

First, the probability of closing the question is greater

when more answers have arrived, which supports Hypothesis


2. This implies that the marginal benefit of having one addi-

tional answer decreases as the number of answers increases,

and is consistent with Table 1 and Figure 3 of Section 4.2.

Second, the probability of closing the question is greater

when the last inter-arrival time is greater, which supports

Hypothesis 3. The inverse of the inter-arrival time gives

the rate at which answers arrive. Thus, when the last inter-

arrival time is large, i.e., there is a time gap between the last

answer and the answer before it, the user may expect that

he will have to wait a long time until he receives the next

answer. This perceived high cost of waiting may encourage

the user to close the question sooner when the last inter-

arrival time is large.

Third, the probability of closing the question is greater

when more time has elapsed since the last answer, which

supports Hypothesis 4. As more time elapses since the last

answer, the uncertainty increases, since the user does not

know when the next answer will arrive. The increased un-

certainty may lead him to expect a greater waiting cost until

the next answer arrives. In turn, this encourages the user to

close the question.

4.5 Utility and Cost

The following lemma establishes a specific quadratic form

for the utility function.

Lemma 1. If (2) holds, then

u(n) = (αu + β1/2) n − (β1/2) n^2 + u(0).   (4)

Proof. If (2) holds, then

u(n) = u(0) + Σ_{i=0}^{n−1} (αu − β1 i) = (αu + β1/2) n − (β1/2) n^2 + u(0).

We observe that the utility function given in (4) is con-

cave on [0,∞) for any value of αu as long as β1 > 0, which

is the case for all the logistic regressions we run in Section

4.4. Moreover, for any fixed β1 > 0 and αu > 0, the utility

is unimodal: it is initially increasing (for n < ⌊αu/β1 + 0.5⌋) and then decreasing (for n > ⌈αu/β1 + 0.5⌉). The latter may

occur due to information overload; after a very large num-

ber of answers, the benefit of having one more answer may

be so small that the cost of reading it exceeds the benefit,

thus creating a disutility to the user.[2] Nevertheless, since

questions at Yahoo Answers rarely get a very large number

of answers, the utility function given by (4) may be increas-

ing throughout the domain of interest if αu/β1 is sufficiently

large.

We note that from (1) we estimate β1 and α, but cannot

estimate αu. For the sake of illustration we plot the esti-

mated utility u(n) from Dataset A (i.e., with β1 = 0.031)

[2] The myopic decision rule is consistent with a utility function u(n) that is decreasing for large values of n. For those values of n the cost of waiting clearly exceeds the benefit, and the user decides to close the question.

[Figure 4: Estimated u(n) from Dataset A for αu ∈ {1, 2, 3, 4}.]

for various values of αu in Figure 4. Since αu = −α + αc

and in this case α ≈ −4, we consider αu ∈ {1, 2, 3, 4} in the

plots; we also assume that u(0) = 0. A reasonable domain

to consider is [0, 50], since questions rarely get more than 50

answers. We observe that when αu is small (e.g., αu = 1),

then the estimated u(n) is decreasing for large values of n

within the [0, 50] region; suggesting an information overload

effect. On the other hand, for αu ∈ {2, 3, 4}, the estimated

utility function is increasing throughout [0, 50]. Moreover,

as αu increases, the curvature of the estimated utility de-

creases, something we can also conclude from (4).
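A few lines suffice to reproduce this calculation; the sketch below evaluates (4) with u(0) = 0 and the Dataset A estimate β1 = 0.031, and locates the peak of the utility on [0, 50]:

```python
import numpy as np

def u(n, alpha_u, beta1):
    """Quadratic utility of Lemma 1 with u(0) = 0."""
    return (alpha_u + beta1 / 2.0) * n - (beta1 / 2.0) * n ** 2

beta1 = 0.031                    # Dataset A estimate
n = np.arange(51)                # questions rarely get more than 50 answers
for alpha_u in (1, 2, 3, 4):
    peak = n[np.argmax(u(n, alpha_u, beta1))]
    # The continuous peak is at alpha_u/beta1 + 0.5; a peak at n = 50
    # (the grid edge) means u is increasing throughout [0, 50].
    print(alpha_u, peak)
```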

We next consider the cost of waiting. Equation (3) sug-

gests that the expected cost increases linearly in both the

last inter-arrival time l and the time since the last answer

arrival w. However, it is not possible to get a specific form

for the function c(t) in the way we did for u(n) in Lemma

1, because we do not have any information on T (l, w). If

we make assumptions on T (l, w), we can draw conclusions

about c(t). For instance, if we assume that the user is using

a single estimate τ(l, w) on the time until the next answer

arrives, then (3) implies that

c(τ(l, w)) = αc + β2l + β3w.

Moreover, if the estimate τ(l, w) is linear in l and w we con-

clude that the cost of waiting is linear. On the other hand,

a concave cost would be consistent with a convex estimate,

and a convex cost would be consistent with a concave esti-

mate.

5. ASSESSING QUALITY

Our previous analysis considers how the decision problem

of the user depends on the number of answers and time.

There is a third aspect that affects the user’s decision to close

his question: the quality of the answers that have arrived

up to now. In this section, we use an alternative model that

incorporates quality, but does not incorporate time and the


number of answers in the detail of Section 4. Our approach

here is inspired by [12, 9].

Let Xn be the value of the n-th answer. This is in general

subjective, and depends on the asker’s interpretation and

assessment. We assume that the value of an answer depends

on both its quality and on the time that the user had to

wait to get it. For instance, Xn may be negative if the

waiting time was very large and the answer was not good

(according to the user’s judgement). We model the values

of the answers as a random walk, and assume that

Xn+1 = Xn + Zn, (5)

where the random variables Zn are independent and iden-

tically distributed. For instance, if the user just got a high

quality answer, he believes that the next answer will most

likely also have high quality. Similarly, if the user did not

have to wait long for an answer, he expects that the next

answer will probably arrive soon. We note that (5) is con-

sistent with the availability heuristic [14].

Every time that a user sees an answer, he derives utility

that is equal to the answer’s value. We assume that the

user discounts the value he receives from future answers ac-

cording to a discount factor δ. Let V (x) be the maximum

infinite-horizon value[3] when the value of the last answer is

x. Then,

V(x) = x + max{0, δ · E(V(x + Z))}.

In particular, the user decides to close the question if the

value of closing exceeds the value of waiting for an addi-

tional answer. If he closes the question, the user gets no

future answers, and thus gets future value equal to 0. On

the other hand, if the user does not close the question, he

gets value E(V (x + Z)) in the future, which he discounts

by δ. Depending on which term is greater, the user decides

whether to close the question or not.

We observe that V (x) is increasing in x, which implies

that E(V (x+Z)) is increasing in x. We conclude that there

exists a threshold x∗ such that it is optimal for the user

to stop (i.e., close the question) when the value of the last

answer is smaller than x∗ and to continue when the value of

the last answer is greater than x∗. The threshold x∗ satisfies

E(V(x∗ + Z)) = 0.
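The threshold can be computed numerically once Z and δ are specified. A discretized value-iteration sketch, with an assumed standard Gaussian step and an illustrative discount factor (neither is pinned down by the model above):

```python
import numpy as np

delta = 0.9                                   # illustrative discount factor
xs = np.linspace(-10.0, 10.0, 201)            # grid of last-answer values x
steps = np.random.default_rng(2).normal(0.0, 1.0, 1000)  # draws of Z ~ N(0, 1)

def continuation(V):
    """Monte Carlo estimate of E[V(x + Z)] at every grid point (edges clamp)."""
    return np.array([np.interp(x + steps, xs, V).mean() for x in xs])

V = xs.copy()                                 # start from V(x) = x
for _ in range(100):                          # value iteration to a fixed point
    V = xs + np.maximum(0.0, delta * continuation(V))

cont = continuation(V)
x_star = xs[np.argmax(cont > 0)]              # first x where waiting has value > 0
print(f"stopping threshold x* ~ {x_star:.2f}")
```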

From an initial answer value, the user waits for additional

answers, with values following a random walk as specified

by (5), until the value of an answer first hits the threshold

value. Thus, the number of answers until the user terminates

the search is a random variable. In the limit of true Brown-

ian motion, the first passage times are distributed according

to the inverse Gaussian distribution. Then, the probability

density of the number of answers to a question is given by

f(x) = √(λ/(2π)) · x^(−3/2) · exp( −λ(x − µ)^2 / (2µ^2 x) ),   (6)

where µ is the mean and λ is a scale parameter. We note

that the variance is equal to µ^3/λ.
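A quick simulation of the stopping model makes the connection concrete. The Gaussian step with negative drift below is an assumption chosen so that the walk reaches the threshold; all parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

def answers_until_stop(x0=3.0, threshold=0.0, drift=-0.5, sigma=1.0):
    """Number of answers until the value random walk (5) first falls below
    the stopping threshold; drift < 0 makes the passage time finite."""
    x, count = x0, 0
    while x > threshold:
        x += rng.normal(drift, sigma)          # one i.i.d. step Z_n
        count += 1
    return count

counts = np.array([answers_until_stop() for _ in range(20000)])
# For Brownian motion the passage time is inverse Gaussian with
# mean (x0 - threshold)/|drift| and lambda = (x0 - threshold)^2/sigma^2.
print(f"mean ~ {counts.mean():.2f}, variance ~ {counts.var():.2f}")
```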

[3] It is straightforward to extend the results to a finite-horizon problem.

[Figure 5: Empirical and inverse Gaussian fitted cumulative distributions for Dataset B. The points are the empirical cumulative distribution function of the number of answers; the curve is the cumulative distribution function of the maximum likelihood inverse Gaussian.]

We use Dataset B to test the validity of (6). We find that

the maximum likelihood inverse Gaussian has µ = 6.1 and

λ = 5.8. Figure 5 shows the empirical and fitted cumulative

distribution functions. We observe that the inverse Gaussian

distribution is a very good fit for the data.
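The inverse Gaussian MLE has a closed form, so the fit takes a few lines. A sketch (the stand-in data below will not reproduce the Dataset B estimates µ = 6.1, λ = 5.8):

```python
import numpy as np

def invgauss_mle(x):
    """Closed-form MLEs for the inverse Gaussian:
    mu_hat = sample mean; lambda_hat = n / sum(1/x_i - 1/mu_hat)."""
    x = np.asarray(x, dtype=float)
    mu_hat = x.mean()
    lam_hat = len(x) / np.sum(1.0 / x - 1.0 / mu_hat)
    return mu_hat, lam_hat

# Stand-in for the Dataset B answer counts (1,536 questions in the paper).
answers_per_question = np.random.default_rng(4).poisson(6, 1536) + 1
print(invgauss_mle(answers_per_question))
```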

An important property of the inverse Gaussian distribu-

tion is that for large variance, the probability density is well-

approximated by a straight line with slope -3/2 for larger

values of x on a log-log plot; thus generating a Zipf-like dis-

tribution. This can be easily seen by taking logarithms on

both sides of (6). In Figure 6 we plot the frequency distribu-

tion of the number of answers on log-log scales. We observe

that the slope at the tail is approximately -3/2.

[Figure 6: The frequency distribution of the number of answers on log-log scales for Dataset B.]
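To spell out the logarithm step, (6) gives

log f(x) = (1/2) log(λ/(2π)) − (3/2) log x − λ(x − µ)^2/(2µ^2 x),

and when the variance µ^3/λ is large (i.e., λ/µ^2 is small), the last term is negligible over a wide range of x, leaving a line of slope −3/2 in log-log coordinates.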

6. CONCLUSION

This paper empirically studies how people behave when

they face speed-accuracy tradeoffs. We have taken two com-

plementary approaches.

Our first approach is to study the speed-accuracy tradeoff

by using the number of answers as a proxy for accuracy. In

particular, we assume that the user approximates the ac-

curacy of his final answer by the number of answers that

his question gets. Thus, the user faces the following trade-

off: he prefers more answers to fewer, but does not want to

wait too long. We analyze Yahoo Answers data to identify

and quantify this tradeoff. We find that users are willing

to wait longer to obtain one additional answer when they

have only received a small number of answers; this implies

decreasing marginal returns in the number of answers, or

equivalently, a concave utility function. We then estimate

the utility function from the data.

Our second approach focuses on how users assess the qual-

ities of the individual answers without explicitly considering


the cost of waiting. We assume that users make a sequence

of decisions to wait for another answer, deciding to wait as

long as the current answer exceeds some threshold in value.

Under this model, the probability distribution for the num-

ber of answers that a question gets is an inverse Gaussian,

which is a Zipf-like distribution. We use the data to validate

this conclusion.

It remains an open question how to combine these two

approaches in order to study the speed-accuracy tradeoff by

jointly considering the number of answers, their qualities,

and their arrival times.

We conclude by noting that our results could be used by

Yahoo Answers or other question answering sites to priori-

tize the way questions are shown to potential answerers in

order to maximize social surplus. The key observation is

that a question receives answers at a higher rate when it is shown on the first page at Yahoo Answers. On the other

hand, the rate at which answers are received also depends on

the quality of the question. Using appropriate information

about these rates as well as the utility function estimated

in this paper, the site can position open questions with the

objective of maximizing the sum of users’ utilities.

7. ACKNOWLEDGEMENTS

We gratefully acknowledge Eugene Agichtein for providing

the datasets [11, 2], as well as detailed information on how

the data was collected.

8. REFERENCES

[1] L. A. Adamic, J. Zhang, E. Bakshy, and M. S. Ackerman. Knowledge sharing and Yahoo Answers: Everyone knows something. In WWW 2008, 2008.

[2] A. Aji and E. Agichtein. Exploring interaction dynamics in knowledge sharing communities. In Proceedings of the 2010 International Conference on Social Computing, Behavioral Modeling and Prediction (SBP 2010), 2010.

[3] J. N. Bearden, A. Rapoport, and R. O. Murphy. Sequential observation and selection with rank-dependent payoffs: An experimental study. Management Science, 52(9):1437–1449, 2006.

[4] C. A. Cameron and P. K. Trivedi. Microeconometrics: Methods and Applications. Cambridge University Press, 2005.

[5] E. B. Dynkin. The optimum choice of the instant for stopping a Markov process. In Sov. Math. Dokl., volume 4, 1963.

[6] Z. Gyongyi, G. Koutrika, J. Pedersen, and H. Garcia-Molina. Questioning Yahoo! Answers. In Proceedings of the First Workshop on Question Answering on the Web, 2008.

[7] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer, 2nd edition, 2009.

[8] B. A. Huberman. Is faster better? A speed-accuracy law for problem solving. Technical report.

[9] B. A. Huberman, P. L. T. Pirolli, J. E. Pitkow, and R. M. Lukose. Strong regularities in World Wide Web surfing. Science, 280(5360):95–97, April 1998.

[10] S. Kim, J. S. Oh, and S. Oh. Best-answer selection criteria in a social Q&A site from the user-oriented relevance perspective. In Proceedings of the 70th Annual Meeting of the American Society for Information Science and Technology (Milwaukee), 2007.

[11] Y. Liu, J. Bian, and E. Agichtein. Predicting information seeker satisfaction in community question answering. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2008), 2008.

[12] R. M. Lukose and B. A. Huberman. Surfing as a real option. In Proceedings of the First International Conference on Information and Computation Economies, pages 45–51, Charleston, South Carolina, USA, 1998. ACM Press.

[13] C. Shah, J. S. Oh, and S. Oh. Exploring characteristics and effects of user participation in online social Q&A sites. First Monday, 13(9), 2008.

[14] A. Tversky and D. Kahneman. Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5(2):207–232, 1973.

[15] W. A. Wickelgren. Speed-accuracy tradeoff and information processing dynamics. Acta Psychologica, 41(1):67–85, February 1977.

[16] R. Zwick, A. Rapoport, A. K. C. Lo, and A. V. Muthukrishnan. Consumer search: Not enough or too much? Experimental 0110002, EconWPA, Oct. 2001.