Costly Sequential Product Search and the Influence of
Rankings
Jack-William Barotta
December 17, 2019
Introduction
In the following paper, I will discuss the effects of product rankings on consumer behavior.
Namely, I will be discussing both the theoretical prediction and observed behavior of a
consumer in an effort to quantify the effect of ranking relevant products on overall
click-through and buy rates. I will begin with the motivation behind the idea of costly search.
I will then discuss the theoretical model proposed by Weitzman applied to ordered lists
on e-commerce sites such as Expedia. I will then show the main empirical results that
have been found in relation to consumer spending patterns and searching. Finally, I will
conclude by demonstrating how the theoretical predictions of the Weitzman model motivate
the results that are observed empirically in a natural field experiment conducted by the
company Expedia on their online platform. In doing so, we will potentially be able to pinpoint
the effect of search costs on the consumer’s utility whilst also discussing the limitations of the
model and potential improvements. I will now introduce the theoretical framework
of the problem, which closely follows the derivations of Weitzman 1979 [7] and Ursu 2017 [6].
The Mathematical Model
In the following section of this paper, I will set up the mathematical model that will be
utilized for a sequential search of the kind we see on many online platforms that list their
products. The most common websites that do this are Amazon and eBay [2]. After the
mathematical model is set up, we will compare its predictions with the field experiment
conducted at Expedia in an effort to connect the theory with the data. Note that the model
that I will be following here is largely based on the model presented in Ursu 2017 [6].
We will consider the single consumer problem. Suppose the consumer visits some website,
S, that organizes its search results as an ordered list containing L products. With this
organizational structure, each product, l ∈ {1, ..., L}, has some information displayed in
the list and an additional amount of information available to the consumer if the consumer
decides to go ahead and click the product. This is exactly how Amazon search results are
organized. On Amazon, consumers are generally offered a picture of the product, pricing,
and small amounts of information on the initial search result page that allows the consumer
to make some judgement on the exact valuation of the product. In addition, the consumer
can click on the item to learn more information, browse customer reviews, or view more
images to gain an even sharper opinion on the product. We will formalize this approach to
the online shopping model by denoting the prior level of utility that the object appears to
offer via the initial search results as vl [6]. In addition, we will denote the level of utility
that the object offers to the consumer after viewing more information as the posterior utility
level, ul. As such, we have the following relationship:
u_l = v_l + \varepsilon_l, \quad \varepsilon_l \sim N(0, \sigma_l^2) \qquad (1)
We capture the change in valuation from the prior to the posterior utility for the lth good
by εl, which is normally distributed with mean zero and standard deviation σl. This is
a modeling assumption [6]. The motivation behind it is that after learning more
information on product l, the utility gained from purchasing item l can go up or down due
to either positive/negative reviews or helpful/unhelpful additional information. As such,
we expect the utility change to be zero in expectation, but with appropriate
fluctuations about the mean [1]. By deciding to open the additional information tab, that
is, by clicking on the product, the consumer gains εl utility, which may be positive or
negative.
The final component of our modeling is the search cost. Opening each additional
page makes the consumer incur some search cost, cl, for the lth good. In addition, in the
spirit of the paper by Chen and Yao [1], I will model the search cost more generally as a
function of the position of the item, pl. Therefore, in order for the consumer to gain the
additional information and, more importantly, the additional utility of the lth good, εl,
the consumer incurs cost cl(pl) [1]. The search cost will satisfy:
c_l(p_l) > 0 \qquad (2)
c_l'(p_l) > 0 \qquad (3)
The first condition is there so that the solution that follows is not degenerate.
Namely, the consumer incurs some positive search cost from opening an additional
information tab, which gives her an incentive to weigh the pros and cons of continuing
the search process for the item. With a zero or even negative search cost, the consumer
could search all day in order to find the item with the highest posterior utility, leading
to a degenerate solution.
In addition, the second condition states that the search costs are increasing as
a function of position. This can be justified by the fact that the goods in positions at
the start of the search results are easily accessible and require little to no scrolling or
additional time to reach compared to items in positions further down the page. The search
cost turns out to be a function of position only [6]. The authors ruled out the effects of
other parameters in our problem, such as the utility gained from accessing the additional
information, εl, or even the prior utilities, vl. Finally, in order to stay in line with
Weitzman’s paper, we will assume there is an outside option of simply selecting no product,
which incurs no search cost [7]. We will give this outside option some
constant utility, say v0. I will now move on to quantify the optimal search that this model
leads to.
Quantifying the Optimal Search
We will consider the choice the consumer faces after already having made some non-negative
number of searches [6]. Suppose that the consumer has searched some number of
products on the page, and she is now deciding whether to select one
of the searched options or instead continue searching. Thus, the consumer faces the trade-off
between stopping the search and choosing an "opened" (searched) item, or incurring the
search cost and looking at another [1]. The guiding quantity in this scenario turns out to
be exactly what was derived in Weitzman’s 1979 paper, namely the reservation
utility.
Consider good l with reservation utility, zl. The reservation utility is defined analogously
to a reservation price: it is the utility level at which the consumer is indifferent between
searching for additional information on good l and not searching it at all [7]. Equivalently,
the search cost equals the expected gain from finding a posterior utility above zl. As such,
we can represent this as:
c_l(p_l) = \int_{z_l}^{\infty} (u_l - z_l) \, f(u_l) \, du_l \qquad (4)
where f(ul) is the probability density function of ul. As a note, we actually know this
density given our modeling assumption. We have that ul = vl + εl, which leads to E[ul] = vl
and var(ul) = σl^2. As such, we can transform our utility, ul, to a standard
normal random variable by defining:
\zeta = \frac{u_l - v_l}{\sigma_l} \qquad (5)
This quantity will come up again as we solve the integral utilizing both the PDFs and CDFs of
the normal distribution. We can proceed in solving the integral through the steps
demonstrated in Kim et al. 2010 [4]. Throughout the next sequence of mathematics,
I will utilize standard notation for random variables, with F(·) denoting the CDF of ul,
f(·) denoting the PDF of ul, and Φ(·) and φ(·) the corresponding CDF and PDF of the
standard normal distribution, ζ ∼ N(0, 1). We start from the expression for the search cost,
cl (see footnote 1):
c_l = \int_{z_l}^{\infty} (u_l - z_l) \, f(u_l) \, du_l \qquad (6)
I can now multiply and divide by a constant, 1 − F(zl), which captures the probability that
the posterior utility is greater than or equal to the reservation utility level, zl.
c_l = (1 - F(z_l)) \int_{z_l}^{\infty} (u_l - z_l) \, \frac{f(u_l)}{1 - F(z_l)} \, du_l \qquad (7)
Now computing the integral, and substituting in my standardized forms of the normal dis-
tribution, we have that:
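c_l = \left(1 - \Phi\left(\frac{z_l - v_l}{\sigma_l}\right)\right) \left[ v_l - z_l + \sigma_l \, \frac{\phi\left(\frac{z_l - v_l}{\sigma_l}\right)}{1 - \Phi\left(\frac{z_l - v_l}{\sigma_l}\right)} \right] \qquad (8)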
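The step from (7) to (8) compresses a short calculation. One way to fill it in is sketched below. It uses only the substitution (5), so that f(u_l) du_l = φ(ζ) dζ, and the standard identity that the integral of ζφ(ζ) from a to infinity equals φ(a). The shorthand \chi_l = (z_l - v_l)/\sigma_l is used here only to keep the lines short.

```latex
\begin{align*}
c_l &= \int_{z_l}^{\infty} (u_l - z_l)\, f(u_l)\, du_l
     = \int_{\chi_l}^{\infty} (v_l + \sigma_l \zeta - z_l)\, \phi(\zeta)\, d\zeta \\
    &= (v_l - z_l)\bigl(1 - \Phi(\chi_l)\bigr)
       + \sigma_l \int_{\chi_l}^{\infty} \zeta\, \phi(\zeta)\, d\zeta \\
    &= (v_l - z_l)\bigl(1 - \Phi(\chi_l)\bigr) + \sigma_l\, \phi(\chi_l) \\
    &= \bigl(1 - \Phi(\chi_l)\bigr)
       \left[ v_l - z_l + \sigma_l\, \frac{\phi(\chi_l)}{1 - \Phi(\chi_l)} \right].
\end{align*}
```

The last line is equation (8) with the argument (z_l - v_l)/\sigma_l written as \chi_l.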
We can now change variables in order to better interpret this result.
Define χl = (zl − vl)/σl, xl = cl/σl, and λ(χl) = φ(χl)/(1 − Φ(χl)). Doing so, we can
re-express equation (8) as:
x_l = (1 - \Phi(\chi_l)) \, (\lambda(\chi_l) - \chi_l) \qquad (9)
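Equation (9) can be evaluated and inverted numerically. The sketch below is one possible way to do so, not the procedure used in [4] or [6]. It relies on the fact that the right-hand side of (9) simplifies to φ(χ) − χ(1 − Φ(χ)), whose derivative is −(1 − Φ(χ)) < 0, so it is strictly decreasing and bisection applies. The function names (`g`, `solve_chi`, `reservation_utility`) and the bracket-expansion scheme are assumptions made for illustration.

```python
import math


def g(chi: float) -> float:
    """Right-hand side of equation (9): (1 - Phi(chi)) * (lambda(chi) - chi)."""
    pdf = math.exp(-0.5 * chi * chi) / math.sqrt(2.0 * math.pi)
    survival = 0.5 * math.erfc(chi / math.sqrt(2.0))  # 1 - Phi(chi)
    return pdf - chi * survival


def solve_chi(x: float, iterations: int = 200) -> float:
    """Invert x = g(chi) by bisection (g is strictly decreasing in chi)."""
    if x <= 0.0:
        raise ValueError("the normalized search cost x must be positive")
    lo, hi = -1.0, 1.0
    while g(lo) < x:  # widen the bracket to the left until g(lo) >= x
        lo *= 2.0
    while g(hi) > x:  # widen the bracket to the right until g(hi) <= x
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) > x:
            lo = mid  # g(mid) is too large, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def reservation_utility(v: float, sigma: float, cost: float) -> float:
    """Reservation utility z = v + sigma * chi, with chi solving (9) for x = cost / sigma."""
    return v + sigma * solve_chi(cost / sigma)


if __name__ == "__main__":
    # Hypothetical inputs: the same product evaluated at a cheap and a costly position.
    for cost in (0.05, 0.20, 0.50):
        print(cost, reservation_utility(v=1.0, sigma=1.0, cost=cost))
```

Because g is decreasing, a higher search cost maps to a lower χ and therefore a lower reservation utility, which is the channel through which position matters in the model.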
We can now utilize this expression to characterize the solution. The term λ appearing in
the expression for the effective search cost, xl, is analogous to a hazard rate. The hazard rate
measures the rate of death, or of stoppage of a contract, which here is the search for additional
information, as a function of our nondimensionalized number χl [6]. As such, we now have
an equation that determines xl. With xl = g(χl), where g is an invertible function, we can
solve this equation for either xl or χl, analytically where possible and otherwise with the
assistance of computational resources. This is incredibly helpful: since g is invertible,
we can solve for our reservation utility, zl, by solving for χl and then noting that:
z_l = v_l + \sigma_l \chi_l \qquad (10)
As such, we have shown a way of deriving a reservation utility for the lth good.
We can now take the reservation utility as solvable and apply the optimal search
theorems that are formulated in Weitzman’s 1979 work.
1 Note that the argument of cl is dropped for ease of exposition. The argument is taken to be the position, pl, as mentioned before.
Now having a way of deriving a reservation utility for the ordered goods available to the
consumer, we can state two more rules that come as a direct consequence. First, a given
consumer will stop searching through the remaining products when the maximum utility
observed so far, among the posterior utilities already realized, exceeds the reservation
utility of every non-searched option [7]. This rule is what prevents an infinite search.
Since all goods that could still be searched have reservation utilities below the best
posterior utility already realized, the consumer can be no better off by searching the
remaining goods. As such, the consumer decides to stop the search. Second, we have a rule
for once the search has been stopped. Given the same consumer, who has now stopped searching
through goods and has a set of posterior utilities of goods available to her, she will choose
the option with the highest utility among those whose posterior utility has been realized
[7]. This is intuitive and similar to the aforementioned rule. Let S denote
the set of products whose posterior utilities are known to the consumer. The consumer will
then choose s ∈ S such that:
p_s = \arg\max_{s \in S} u_s \qquad (11)
where ps is the product that is bought/chosen from the set of searched goods.
Outside the mathematical formalism, we have simply developed the notion that the
consumer will 1. have some reservation utilities known to her, 2. search through the
goods available until the stopping rule is reached, and 3. choose the best product
from those searched. We have now developed the mathematical framework necessary to
theoretically characterize the behavior of the individual when searching products on an
online website such as Amazon or Expedia. We will now enter an empirical discussion of
the field experiment conducted by Expedia before coming full circle to relate the empirical
regularities and the theoretical model.
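The three steps above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not an implementation taken from [6] or [7]. It searches products in descending order of reservation utility, which is Weitzman's selection rule [7], and it treats the outside option as utility already in hand. The function name `weitzman_search` and its argument names are hypothetical.

```python
from typing import List, Optional, Tuple


def weitzman_search(
    reservation: List[float],
    posterior: List[float],
    outside_utility: float = 0.0,
) -> Tuple[List[int], Optional[int]]:
    """Return (products searched, in order; product chosen, or None for the outside option).

    reservation[l] is z_l and posterior[l] is u_l. The posterior utility of a
    product is only read once that product has actually been searched.
    """
    # Step 1: rank products by reservation utility, highest first.
    order = sorted(range(len(reservation)), key=lambda l: reservation[l], reverse=True)

    searched: List[int] = []
    best_utility, best_item = outside_utility, None
    for l in order:
        # Step 2 (stopping rule): stop once the best utility in hand is at least
        # the reservation utility of every product that remains unsearched.
        if best_utility >= reservation[l]:
            break
        searched.append(l)
        # Step 3 (choice rule): keep track of the best posterior utility seen.
        if posterior[l] > best_utility:
            best_utility, best_item = posterior[l], l
    return searched, best_item


if __name__ == "__main__":
    # Hypothetical numbers: three products, outside option worth 0.
    print(weitzman_search([2.0, 1.5, 0.5], [1.0, 1.8, 3.0]))
```

In the hypothetical example, the third product is never searched even though its posterior utility is the highest, because the consumer cannot know that without paying the search cost.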
Application to Expedia
In a field experiment, Expedia was able to test the effect of position on the click rate and
buy-rate of their hotels. As motivation, suppose that, as an online retailer, you are interested
in exactly how to rank products on your website. There are many potential criteria, such as
most popular, highest rating, price, or relevance. It happens that most websites utilize a
relevance ranking where they place the most relevant products at the top of search results
with those of less relevance following sequentially. However, in order to test the relevance
of the products being displayed, a natural question arises: do products appear relevant
because of their position on the website, or because of intrinsic details of the product itself?
The online retailer, Expedia, sought to find the causal explanation for the click-through
rates of products in the top positions by utilizing a field experiment [3].
Expedia is a travel company whose main purpose is to aggregate travel-related
purchasing and display it on one consolidated platform [5]. Expedia features hotels, airline
tickets, and tickets for other forms of transportation, and even lets customers bundle
multiple aspects of their travel expenses. The website features a design that allows an
individual to type in locations, types of travel deals, and dates of travel in order to generate
the best possible deals for the customer. Doing so, an individual is shown results as seen
below:
Figure 1: A screenshot of a typical search result. Source: Expedia.com
The experimental setup followed by Expedia was quite simple. Each customer that
visited the site over an 8-month period would be randomly assigned to a control group or a
treatment group[6]. Customers come to Expedia to check out hotels, and as such, customers
in each of the two groups would receive different search results for the same exact query. The
control group received the regular Expedia ranking, whereas the treatment group received
a random ranking. The treatment group was utilized to see the effect that the position had
on the click-through rate and the buy-rate of the hotels.
Throughout the 8 months of the field experiment, which ran between 2012 and 2013,
166,036 queries were recorded in the Expedia dataset. It should be noted that while
completing this expository project, I did not have access to the dataset, and as such, I will
be displaying the results as found in Ursu [6]. In addition, individuals were put into the
treatment group 1/3 of the time. Interestingly, exposing a third of the individuals to the
treatment seems quite large given that there could potentially have been a
large revenue loss. However, due to the large number of samples that were taken for the
treatment group, we have a more complete data set, allowing us to remain more confident
in the results presented later. The results of the field experiment are presented in the
following section.
Results
The following two graphs demonstrate both the click through rate and the purchase rate
conditioned on a click versus the position of the product.
Click-Rate versus Position. Source: Ursu 2017.
There are two large patterns in the graphs displayed. I will first discuss the click-through
rate as a function of position. What we see is a click-through rate that decreases with
the position of the product. Namely, the products in the first few slots
receive a larger number of clicks, while the heavy tail of remaining products receives a
smaller number of clicks. The first position even receives almost double the number of
clicks of the second. If there was more time to add
extensions to this project, it would be nice to utilize some inverse problem methods found
in applied mathematics to try to back out the type of distribution of clicks that leads
to such a graph. The shape resembles a Poisson or
exponential distribution, with the majority of the data clustered around the leftmost
interval of the domain. As such, we see that the hotels that are in the top positions are
clicked more regardless of the products being offered.
This is a strong effect observed in the data. Namely, we are seeing that rankings
play a critical role in the click-through rate, and as such, have a profound effect
on the customer’s search. However, to gain greater insight into the decisions of the
consumer, let us now turn to the second graph, which displays
the buy-rate conditional on a click versus the position of the good.
Buy-Rate versus Position conditional on Click. Source: Ursu 2017.
What we see in the second graph is that there is an essentially flat conversion rate between
clicking through a product to gain additional information and actually purchasing the
product. The buy-rate conditional on clicking through to the product is roughly
uniform. As such, we can conclude that the fraction of clicks that lead to a purchase is not
a function of position. I will now link these results with the theoretical
results provided earlier.
The key notions here are that the click-through rate decreases as a function of position
yet the buy-rate, conditional on the click-through, is constant. We note that the Expedia
example is no different from what we would have expected to see in the theoretical
model. First, the consumer decides to go ahead and search through some of the products.
In our specific case, the searching the consumer does is analogous to clicking through on
the products that are being seen. Since the cost incurred for clicking through the products
at the top of the page is lower than that for products in lower
positions down the page, one would expect individuals to open more of the products’ pages
towards the top of the page. In the empirical results, we see that irrespective of the product’s
prior utility level, products towards the top of the page are clicked more. As such, the
theoretical motivation provided by Weitzman and advanced by the work of Kim
[4] and Ursu [6] is in close concord with the empirical results.
In addition to the first graph being carefully described by theory, we can also motivate
the second graph. The secondary rules of the Weitzman search, namely that the
individual will stop searching once the best posterior utility found exceeds the
reservation utilities of the remaining products and that the product achieving the maximum
posterior utility will be chosen, support the notion of the products’ buy-rates being
independent of position (see footnote 2).
Again, we gain a careful insight into the empirical evidence through the theory. Weitzman’s
theory of search predicts just this: once the objects are searched, i.e.
clicked through, one would not expect the position to play any further role [7]. This problem
parallels Weitzman’s Pandora’s box problem. After searching all of the boxes that should
be searched, given the criteria, the rule in place to select the item depends only on the
posterior utility. Since, under a random ranking, the posterior utilities should be independent
of ranking, one would expect a uniform buy-rate for each position. As such, we see a synthesis
of the theoretical model proposed over 30 years ago with the empirical evidence gathered
recently.
2 Conditional on the individual clicking through.
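The link between the model and the two graphs can be explored with a toy Monte Carlo simulation. The sketch below is illustrative only: it is not calibrated to the Expedia data, and every parameter value (number of positions, linear cost schedule, standard normal prior utilities, outside utility of zero) is an assumption chosen for convenience. It draws prior utilities independently of position, which mimics the random ranking, and records the click rate and the buy-rate conditional on a click at each position. Whether the conditional buy-rate comes out flat under these assumed parameters is something to check by running it, not something the sketch guarantees.

```python
import math
import random


def g(chi):
    """Right-hand side of equation (9), written as phi(chi) - chi * (1 - Phi(chi))."""
    pdf = math.exp(-0.5 * chi * chi) / math.sqrt(2.0 * math.pi)
    return pdf - chi * 0.5 * math.erfc(chi / math.sqrt(2.0))


def solve_chi(x):
    """Invert x = g(chi) by bisection; g is strictly decreasing."""
    lo, hi = -1.0, 1.0
    while g(lo) < x:
        lo *= 2.0
    while g(hi) > x:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if g(mid) > x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate(n_consumers=20000, n_positions=10, sigma=1.0,
             base_cost=0.05, cost_step=0.03, outside=0.0, seed=0):
    """Simulate Weitzman search under a random ranking (all parameters hypothetical)."""
    rng = random.Random(seed)
    # The cost rises linearly with position (an assumed schedule satisfying (2) and (3)),
    # so chi depends on position only.
    chi = [solve_chi((base_cost + cost_step * p) / sigma) for p in range(n_positions)]
    clicks = [0] * n_positions
    buys = [0] * n_positions
    for _ in range(n_consumers):
        # Random ranking: prior utilities are drawn independently of position.
        v = [rng.gauss(0.0, 1.0) for _ in range(n_positions)]
        z = [v[p] + sigma * chi[p] for p in range(n_positions)]  # equation (10)
        order = sorted(range(n_positions), key=lambda p: z[p], reverse=True)
        best_u, best_p = outside, None
        for p in order:
            if best_u >= z[p]:  # stopping rule
                break
            clicks[p] += 1
            u = v[p] + rng.gauss(0.0, sigma)  # posterior utility, equation (1)
            if u > best_u:
                best_u, best_p = u, p
        if best_p is not None:  # choice rule; None means the outside option
            buys[best_p] += 1
    click_rate = [c / n_consumers for c in clicks]
    buy_given_click = [b / c if c else float("nan") for b, c in zip(buys, clicks)]
    return click_rate, buy_given_click


if __name__ == "__main__":
    click_rate, buy_given_click = simulate()
    for p, (c, b) in enumerate(zip(click_rate, buy_given_click), start=1):
        print(f"position {p:2d}  click rate {c:.3f}  buy rate given click {b:.3f}")
```

Since the only thing that varies across positions here is the search cost, any decline in the simulated click rate with position is driven by that cost alone, which mirrors the argument made in the text.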
Conclusion
Throughout the paper, we have introduced the notions of a costly sequential search, the
application of price search to online retail, and the causal effects of position on the
click-through and buy rates for ranked products on an online platform. In doing so, a theoretical
model, based mainly on the work of Ursu [6], was utilized to capture the
essential elements of costly price search as derived in Weitzman’s landmark paper in 1979.
To complement the theoretical argument, we also discussed the field experiment completed by
Expedia as a means to support the theoretical predictions with empirical evidence.
We see that the click-through rate is indeed a decreasing function of position:
items with positions at the top of the page are clicked more. However, we also
note that conditional on a click-through to the product description and other additional
information, the buy-rate is independent of the position. This shows how the simple model
provides powerful insight into what drives the revenue of online retailers that feature ranked
results for queries. I thought that this project offered a nice blend of theoretical models, like
those discussed in the beginning of class, applied to a contemporary problem. I was
able to utilize skills learnt in maths courses whilst also motivating variables, concepts, and
equations through economic arguments, which allowed me to synthesize what I had fun
learning throughout the course. An interesting potential future work would be to pose a
choice modeling problem that looks at whether there are ways to better maximize revenue for
places like Expedia and Amazon by changing the ordering of products on their respective
websites. It would be interesting to see whether there was potential to move less relevant
products into higher positions and move best sellers to tertiary positions, for example, and
note the effects on revenue.
References
[1] Y. Chen and S. Yao. Sequential search with refinement: Model and application with
click-stream data. Management Science, 63(12):4345–4365, 2016.
[2] X. Jiang, Y. Xiao, and S. Li. Personalized Expedia hotel searches. 2013.
[3] N. Khantal, V. Kroshilina, and D. Maini. Rank hotels on Expedia.com to maximize
purchases. Technical report, 2013.
[4] J. B. Kim, P. Albuquerque, and B. J. Bronnenberg. Online demand under limited
consumer search. Marketing Science, 29(6):1001–1023, 2010.
[5] R. Law and F. Chen. Internet in travel and tourism, part II: Expedia. Journal of Travel
& Tourism Marketing, 9(4):83–87, 2000.
[6] R. M. Ursu. The power of rankings: Quantifying the effect of rankings on online consumer
search and purchase decisions. Marketing Science, 37(4):530–552, 2018.
[7] M. L. Weitzman. Optimal search for the best alternative. Econometrica: Journal of the
Econometric Society, pages 641–654, 1979.