arxiv:2012.06009v1 [cs.cv] 10 dec 2020

Vision-based Price Suggestion for Online Second-hand Items

Liang Han *

Stony Brook [email protected]

Zhaozheng Yin †

Stony Brook [email protected]

Zhurong XiaAlibaba Group

[email protected]

Li GuoAlibaba Group

[email protected]

Mingqian TangAlibaba Group

[email protected]

Rong JinAlibaba Group

[email protected]

Abstract

Different from shopping in physical stores, where peo-ple have the opportunity to closely check a product (e.g.,touching the surface of a T-shirt or smelling the scent ofperfume) before making a purchase decision, online shop-pers rely greatly on the uploaded product images to makeany purchase decision. The decision-making is challeng-ing when selling or purchasing second-hand items onlinesince estimating the items’ prices is not trivial. In this work,we present a vision-based price suggestion system for theonline second-hand item shopping platform. The goal ofvision-based price suggestion is to help sellers set effectiveprices for their second-hand listings with the images up-loaded to the online platforms.

First, we propose to better extract representative visualfeatures from the images with the aid of some other image-based item information (e.g., category, brand). Then, wedesign a vision-based price suggestion module which takesthe extracted visual features along with some statistical itemfeatures from the shopping platform as the inputs to de-termine whether an uploaded item image is qualified forprice suggestion by a binary classification model, and pro-vide price suggestions for items with qualified images bya regression model. According to two demands from theplatform, two different objective functions are proposed tojointly optimize the classification model and the regressionmodel. For better model training, we also propose a warm-up training strategy for the joint optimization. Extensiveexperiments on a large real-world dataset demonstrate theeffectiveness of our vision-based price prediction system.

*Liang Han is a PhD student at Department of Computer Science, StonyBrook University. He participated in this project as a research intern.

†Corresponding author. Zhaozheng Yin is an associate professor at De-partment of Computer Science and Department of Biomedical Informatics,Stony Brook University. He participated in this project during his sabbati-cal.

1. Introduction

Benefiting from the prevalence of smartphones, laptopsand internet, the E-commerce has been on an exponen-tial growth curve during the last couple of years. Tradi-tionally, the E-commerce is mainly Business-to-Customer(B2C), and products sold on the online platforms are brandnew. In recent years, inspired by the sharing economy, moreand more people are trying to cash in their second-handidle assets, which is heating up the Customer-to-Customer(C2C) second-hand trading. Moreover, a number of onlineplatforms such as eBay, Letgo, Xianyu and Mercari, haveemerged to help facilitate the online second-hand market.

Usually, the second-hand platforms do not make restric-tions on how the sellers set prices for their listings. How-ever, it is a great challenge for the sellers to set accurate orreasonable prices for their second-hand items. Comparedwith brand new products, most of the second-hand itemslisted on the platforms are unique, which means price refer-ences can be seldomly found, though the second-hand itemtransaction records can be saved. Accordingly, it can bea great help for sellers if the platforms can provide someprice suggestions for their listings. Considering the onlinebuyers rely heavily on the item images to make purchasedecisions, we believe that there is a strong relationship be-tween the item price and its image. In this work, we try todesign a price prediction system, which can provide effec-tive price suggestions for second-hand items listed on theonline platforms with the uploaded images.

When trying to solve this real-world problem, we con-front multiple challenges:

Nonstandard image shooting: In order to predict anitem’s price with its image, we need to extract representa-tive visual features first. However, unlike the images shot byprofessional photographers in a controlled photography stu-dio, the second-hand item images are usually taken by thesellers in their daily lives, and thus are with a wide rangeof varieties. The item in the image may be surrounded by

1

arX

iv:2

012.

0600

9v1

[cs

.CV

] 1

0 D

ec 2

020

Figure 1: Some challenges of vision-based price predictionfor online second-hand items. (a) Complex background, (b)blurred image, (c) uneven illumination, (d) inadequate iteminformation, (e,f) rare items, (g,h) brand impact.

a complex background (Figure 1(a)), blurred (Figure 1(b)),or shot with a bad illumination (Figure 1(c)), which greatlyincreases the difficulties of precisely extracting the usefulfeatures of the item from the image.

Inadequate image information: A picture is worth of athousand words. However, for E-commerce, an image maynot be able to provide the complete information of an item.For the Macbook in Figure 1(d), even a human expert canhardly predict a reasonable price for this Macbook withoutany additional specification information, such as the model,storage, memory, etc.

Numerous fine-grained categories: Unlike Airbnbwhere people mainly share their rooms, or Zipcar that fo-cuses on helping people share their cars, items listed ononline second-hand platforms may fall into numerous fine-grained categories, ranging from common categories suchas clothes (Figure 1(a)), cosmetics (Figure 1(c)), laptops(Figure 1(d)), to uncommon categories such as householdappliances (Figure 1(e)), fitness equipments (Figure 1(f)),etc. The price distribution can vary greatly for different cat-egories.

Brand impact: The influence of the item brand shouldnot be overlooked. Figure 1(g) and Figure 1(h) show twopairs of shoes with different brands. Though they are bothmen shoes, the prices of them are quite different. The brandimpact is quite common for many kinds of items such asclothes, cosmetics, phones, etc.

In spite of these challenges, we still have some strengthsto help us attack this challenging vision-based price predic-tion problem. First, each listed second-hand item containsan image, from which we can try to extract visual features to

perform vision-based price prediction. Second, some otherimage-related features of the item can be collected to as-sist the visual feature extraction, such as category, brand,etc. Third, with the increasing prevalence of online second-hand market, we can easily collect tons of data over time todevelop the price prediction system and keep refining it.

With these challenges and strengths in mind, we design avision-based price suggestion system for second-hand itemslisted on online platforms. First, we use some product infor-mation 1 to guide the visual feature extraction from images.With the extracted visual features and some statistical fea-tures 2, we then jointly train a classification model to deter-mine whether the uploaded image of an item is qualified forprice prediction, and a regression model to perform priceprediction for items with qualified images. For items withunqualified images, we encourage the sellers to either up-date the uploaded images (i.e., the current image may havebad qualities such as blur, uneven illumination, etc.) or addsome complementary item information such as text descrip-tion (for those items whose images have inadequate infor-mation). Then, we can provide price suggestions for theseitems with the updated images and/or the complementarytext description. The solution for items with unqualified im-ages involves human interaction, which is beyond the scopeof this work. In this paper, we focus on the price suggestionusing image information only. Our main contributions aresummarized as follows:

• We proposed a vision-based price suggestion sys-tem for second-hand items listed on online platforms,which is able to provide accurate or reasonable pricesuggestions for these second-hand items with their up-loaded images.

• An effective feature extraction method was presentedto better extract visual features from the images withthe assistance of some other image-related informationsuch as item attribute, brand, category, and specifica-tion.

• A joint optimization framework was designed to si-multaneously train (1) a binary classification model todetermine whether an uploaded image is qualified forprice prediction, and (2) a regression model to providethe price suggestions for items with qualified images.

1In this work, the image-related information we used include attribute,brand, category, and specification. For example, “a white cotton campus-style adidas T-shirt with size of XL”, attribute: white, cotton; brand: adi-das; category: T-shirt; specification: XL, campus-style.

2The statistical features used in this work include the first quartile, sec-ond quartile, third quartile and mean of the sold price of all historicalsecond-hand items; the first quartile, second quartile, third quartile andmean of the sold price of all historical second-hand items in the same cat-egory as the item whose price to be predicted; the first quartile, secondquartile, third quartile and mean of the sold price of all historical second-hand items sold by this seller.

2

Figure 2: Flowchart of the proposed vision-based price suggestion system.

In this module, according to different requirements ofimage quality for vision-based price prediction (prede-fined by the platform operator), two specific loss func-tions were designed for the training of the vision-basedprice suggestion module which consists of a binaryclassification model and a regression model.

• We proposed a training procedure for the joint opti-mization framework, which can train the classificationmodel and the regression model more stably and witha better performance.

2. Related WorkThe goal of price prediction for online second-hand

items is to suggest a listing price to the seller which willbe accepted by both the seller and the potential buyer witha high likelihood.

A number of previous works focus on developing effec-tive pricing strategies for many different kinds of products.Kusan et al. [1] use the fuzzy logic theory to develop agrading model for house-building price prediction, whichtakes the city plans, the nearness to medical, educationaland cultural buildings, the public transportation systems andsome other environmental factors as the influence factorsof the house price. In [2], the author proposes to predictthe price of second-hand cars based on some car characterssuch as model, make, mileage, volume of cylinder, withseveral different supervised machine learning algorithms.Schlosser and Boissier [4] propose to derive price reactionstrategies with a dynamic programming model in competi-tive markets with multiple offer dimensions, such as quality,rating and price, which is evaluated by a seller on Amazonfor the sale of second-hand books.

Recently, deep neural networks have shown great poweron various computer vision tasks such as semantic segmen-tation [10, 11], object detection [12, 13], generative mod-eling [14, 15, 7], and also price prediction [16, 17]. In [9]an optimized back-propagation neural network algorithm isused to build a price evaluation model, which is based oncirculated vehicle data along with a huge number of vehi-cle transaction data, to evaluate prices of second-hand cars.You et al. [3] employ a recurrent neural network to es-

timate the prices of online real estates by taking both thedeep visual features extracted from house pictures and thelocation information into consideration, in which the visualattributes play a major role. Law et al. [18] propose topredict house prices with traditional housing features suchas size, age and style as well as visual features extractedfrom both street images and satellite images by a deep neu-ral network model, which is considered to be able to bettercapture the urban qualities and improve the price estima-tion. Ye et al. [19] present a customized regression modelfor the dynamic pricing at Airbnb by considering listing fea-tures (room type, capacity, locations. etc), temporal infor-mation (seasonality), and supply and demand dynamics inthe model. Maestre et al. [8] propose to use reinforce-ment learning to solve a dynamic pricing problem, in whichprices are maximized while a balance between the revenueand fairness is well kept. These pricing algorithms workwell for a certain item category such as cars or houses, whilein our scenarios, we try to provide price suggestions for on-line second-hand items in numerous fine-grained categories.

Mercari, a Japanese E-commerce company, held a pricesuggestion challenge 3 recently to provide price suggestionsto online sellers by estimating the prices for products theyare selling. This challenge does the very similar task asours. However, no visual feature of the product was in-volved in their competition.

3. Vision-based Price Suggestion

The proposed vision-based price suggestion system aimsat providing effective price suggestions for online listedsecond-hand items based on the visual features extractedfrom the uploaded images along with some statistical fea-tures. Figure 2 presents the overall price suggestion pro-cess. When a seller takes an image for the second-handitem and uploads it to the mobile application for item list-ing, the uploaded image and some historical statistical fea-tures obtained from the platform will be collected by themobile App. Then, visual features will be extracted fromthe image (Section 3.2). The platform operator can enforcedifferent constraints on second-hand item price suggestion:

3https://www.kaggle.com/c/mercari-price-suggestion-challenge

3

Figure 3: Distribution of the original price (a) and the priceafter the logarithm operation (b).

(1) percentile constraint (i.e., the percent of second-handitems on the platform whose images are qualified for vision-based price suggestion), and (2) threshold constraint (i.e.,the second-hand item whose image is qualified for vision-based price suggestion should have its price prediction er-ror less than a given threshold). With these two differentoperation strategies, two price suggestion modules are de-signed: (1) vision-based price suggestion with percentileconstraint (Section 3.3.1) and (2) vision-based price sug-gestion with threshold constraint (Section 3.3.2). In eachmodule, a vision-based classification model takes as inputthe extract visual features along with the statistical features,and determine whether the uploaded image is qualified forprice suggestion. If yes, a suggested price of this listed itemwill be provided to the user by a vision-based regressionmodel, otherwise, the user will be requested to update theitem image or provide some text descriptions for better pricesuggestion (this is not included in this paper).

3.1. Data Preparation and Analysis

We collected an online dataset which contains 1,016,445second-hand items over three months to develop our vision-based price suggestion system. As the dataset accumu-lated over days contains abnormal data samples, first wedelete the items with irrationally low or high prices fromthe dataset, which removes 5% of the original data. Then,we check the refined data and find that the price distributionof these online second-hand items is heavily long-tailed,as shown in Figure 3(a). It is known that a Gaussian dis-tribution can be better modeled by a model trained withobjective functions such as mean square error loss, com-pared to a long-tailed distribution. Thus, we apply the log-arithm operation on the original second-hand item prices,and transform the long-tailed distribution of online second-hand items close to a Gaussian distribution (Figure 3(b)).

3.2. Visual Feature Extraction

Extracting effective features from the image is the mostfundamental yet crucial step for developing a vision-basedprice prediction model. When collecting the item imagefrom the platform, we find that some additional image-

Figure 4: Network architecture for visual feature extraction.Multiple single-task models are trained to extract differentvisual features. ResNet-50 is adopted to build the archi-tecture of each single-task model (denoted by dashed bluebox).

related information can be collected as well, such as cat-egory, brand, etc. Unfortunately, even though each item hasits own image, not all of these additional characteristics canbe collected for each item in the listing process. Some itemsmay have noisy category label, while some may have miss-ing brand information. Thus, directly taking these image-related characteristics as the input to the prediction modelis not a good idea.

Inspired by the multi-branch feature extraction modelproposed in [5], which has the ability of simultaneouslyperforming object detection from images with backgroundclutter and representative visual feature extraction of the de-tected object, we consider to build a visual feature extrac-tion model to extract informative and representative visualfeatures from the item image with its product information(e.g., category, brand, etc.). In this work, instead of build-ing one single multi-task model, we propose to build multi-ple single-task models to extract visual features. Figure 4shows the idea of the proposed visual feature extraction.First, each single-task model takes as input the item im-age, and performs a specific task (i.e., category prediction,attribute prediction, specification prediction and brand pre-diction). After each single-task model is well-trained, weemploy these single-task models on a new item image, andthe feature vectors generated by the second last layer of each

4

model are picked as the extracted visual features of this im-age. We use ResNet-50 for the architecture building of eachsingle-task model, which is denoted by the dashed blue boxin Figure 4.

3.3. Design of Vision-based Price Suggestion

After extracting the visual features, the next step is totrain a model with the extracted features to perform priceprediction. However, as discussed in the challenges in Sec-tion 1, some uploaded images are with inadequate item in-formation or bad image quality (blur, uneven illumination),and accordingly the features extracted from these imagesare not qualified for offering reasonable price suggestionsfor the items. Thus, instead of providing a suggested pricefor a second-hand item without considering the image qual-ity, first we need to determine whether an uploaded imagecan be used for offering an acceptable price suggestion. Inother words, we need a binary classification model to deter-mine whether an item image is qualified for reasonable priceprediction, and a regression model to perform price predic-tion for items with qualified images. We propose to usedeep neural networks for these two models, and the same ar-chitecture is shared by both the classification model and theregression model except for the output (these two models donot share the weights). Figure 5 presents the architecture ofthe regression/classification model, which is a feed-forwardneural network. The network takes as input the extractedvisual features of the item and some statistical features, andoutputs a predicted price for the second-hand item (regres-sion model) or a judgement of whether the extracted visualfeatures of the item are qualified for price prediction (clas-sification model).

Basically, the training of the binary classification modelrequires some positive samples (images which are quali-fied for reasonable price suggestions) and negative samples(images which are not qualified for reasonable price sug-gestions), and the separation of positive and negative sam-ples is usually determined by the price prediction regressionmodel. In turn, the performance of the binary classificationmodel will greatly influence the price prediction model, asit performs price suggestion only on positive items (itemswith qualified images for reasonable price suggestions).Since the classification and regression tasks are mingled to-gether, instead of training the classification model and re-gression model separately, we consider to jointly train them.

3.3.1 Vision-Based Joint Optimization with PercentileConstraint

There are different criteria to determine whether an item im-age is qualified for offering a reasonable price suggestion.From the platform’s point of view, the operator can use apercentile strategy, e.g., 50% of the item images are qual-

Figure 5: Architecture of the regression/classificationmodel. The model takes as input the extracted visual fea-tures of a second-hand item along with some statistical fea-tures, and provides suggested price for this item (regressionmodel) or determines whether the extracted visual featuresof the item are qualified for price prediction (classificationmodel).

ified for price suggestion, and the other 50% are not. Inthis scenario, the labels of the samples for training the clas-sification model (i.e., qualified or non-qualified for vision-based price prediction) is not manually determined by hu-man beings, but automatically determined by the regressionmodel. In other word, the classification model regards theitems with the top 50% best price suggestions provided bythe regression model as positive, and others as negative.We design a loss function to jointly optimize the classifi-cation model and the regression model with the percentileconstraint:

minΘ1,Θ2

{ 1

N

N∑i=1

C1(xi; Θ1) ·(yi − freg(xi; Θ2)

)2

+ β ·max(

0, δ − 1

N

N∑i=1

C1(xi; Θ1))} (1)

where C1 is an indicator function, which is defined as

C1(xi; Θ1) =

{0 if fcls(xi; Θ1) < 0.5

1 otherwise,(2)

5

where fcls and freg represent the binary classificationmodel and the regression model, respectively, whose archi-tectures are shown in Figure 5; Θ1 and Θ2 are the parame-ters of fcls and freg , respectively; N is the number of train-ing samples; xi is the ith training sample; yi is the sold price4 of the training sample xi; β is a weight to balance thesetwo terms; δ controls the percentage of positive items (itemswith qualified images for reasonable price suggestion), i.e.,how many items can be regarded as positive items.

The first term in Eq.1 tries to achieve two goals: first,training the price prediction regression model freg by min-imizing a square loss with the positive samples labeled bythe binary classification model; and second, training the bi-nary classification model fcls by requiring the classificationmodel to label the samples with the best price suggestions(the differences between the predicted prices given by theregression model and the sold prices are the smallest) aspositive (the item image is qualified for reasonable pricesuggestion). The second term in this equation is a constrainton at least how many items should be regarded as positiveitems.

3.3.2 Vision-Based Joint Optimization with ThresholdConstraint

The percentile-based joint optimization requires the plat-form operator to have some prior knowledge of the qualityof the item images to set the suitable percentile value. An-other criterion for labeling the samples to train the binaryclassification model is to set a threshold ε for the price pre-diction accuracy, i.e., if the difference between the predictedprice given by the regression model and the sold price of anitem is smaller than ε, this item is regarded as a positivetraining sample, otherwise, it is negative. In this scenario,the loss function for the joint optimization of the regressionmodel and the classification model will be

minΘ1,Θ2

{ 1

N

N∑i=1

C1(xi; Θ1) ·(yi − freg(xi; Θ2)

)2

+β ·max(

0, δ − 1

N

N∑i=1

C1(xi; Θ1))

+γ · 1

N

N∑i=1

[− C2(xi, yi; Θ2) · log

(fcls(xi; Θ1)

)−(

1−C2(xi, yi; Θ2))·(log(1−fcls(xi; Θ1)

))]}(3)

where γ is a parameter to scale the cross-entropy term, C1

is defined as the same as in Eq.2, andC2 is another indicator

4The price at which an item is sold. Sold price is usually regarded asthe true price of an item because this price is accepted by both the sellerand the buyer.

function which is defined as

C2(xi, yi; Θ2) =

{0 if |yi − freg(xi; Θ2)| > ε

1 otherwise(4)

The first term in Eq.3 plays the same role as the firstterm in Eq.1. The second term in Eq.3 makes a constraintto avoid a trivial solution that the binary classification modelclassifies all the samples as negative, and the regressionmodel offers very bad price suggestions for all the samples.The third term in Eq.3 is a cross entropy loss to train the bi-nary classification with the training samples labeled by theregression model.

3.4. Warm-up Training

When jointly training the classification model and the re-gression model either with Eq.1 or with Eq.3, the accuracyof the classification model will be very low at the beginningof the training, and the positive sample determined by theclassification model may not exactly be positive, this willfool the training of the regression model, and worsen theperformance of the price prediction. In order to obtain amore stable and better training result of joint optimization,we propose to optimize the regression model with all of thetraining samples at the beginning of the training process.When the classification model is relatively stable, we thenuse the positive samples labeled by the classification modelto further refine the regression model. Specifically, we firstlet C1 in Eq.1 and Eq.3 equal to 1 for all of the trainingsamples, and jointly optimize the two models with Eq.1 orEq.3 (we call this as the first training stage), then we defineC1 as Eq.2 to continue the training for more iterations (wecall this as the second training stage).

In our experiments, we jointly train the classificationmodel and the regression model with a learning rate of0.0005 for 1,700 epochs, then with a learning rate of 0.0002for 850 more epochs in the first training stage. In the secondtraining stage, we training these two models with a learningrate of 0.0005 for 3,400 epochs, then with a learning rate of0.0002 for 1,700 more epochs. We use a batch size of 4096,and the Adam optimizer [6] is adopted for optimization.

4. ExperimentsWe present the experiments in this section. First, we in-

troduce the dataset and evaluation metrics used in our ex-periments (Section 4.1- 4.2). Then, we evaluate the vision-based price suggestion both qualitatively and quantitatively(Section 4.3- 4.4). Thirdly, we perform ablation studies onthe features used in our system (Section 4.5) and the warm-up training mode (Section 4.6).

4.1. Dataset

We collect a large dataset from Xianyu, an onlinesecond-hand platform launched by China’s E-commerce gi-

6

Table 1: Summary of dataset used in our experiments.

class 1 2 3 4 5 6 7 8 9 10 11 12 13 total# training sample 184401 38292 11196 27215 16456 31362 125684 65357 184861 59221 88055 23330 36984 892414

# validation sample 5528 1196 290 817 484 972 3671 1926 5360 1747 2721 710 1029 26451# testing sample 20712 4359 1055 2895 1825 3656 13705 7253 20060 6323 9070 2503 4164 97580

Table 2: Accuracy of vision-based price suggestion withpercentile constraint. Positive items are with qualified im-ages for vision-based price suggestion, and the MALE andRMSLE values of the positive items are calculated.

percentile 30 40 50 60 70

# positive items 29468 39694 48871 60781 69794% positive items 30.20% 40.68% 50.08% 62.29% 71.52%

MALE 0.3573 0.3685 0.3864 0.4012 0.4132RMSLE 0.4828 0.5023 0.5302 0.5502 0.5666

ant, Alibaba, to evaluate our proposed system. The dataconsist of 13 classes: women clothes (1), men clothes(2), children clothes (3), women shoes (4), men shoes(5), children shoes (6), cosmetic (7), phone (8), Com-puter/Communication/Consumer (3C) product (9), house-hold appliance (10), sports equipment (11), baby product(12), and accessories (13). The data are divided into threesets: training, validation, and testing. Table 1 summarizesthe data used in our experiments.

4.2. Evaluation Metric

The goal of our proposed vision-based price suggestionsystem is to provide suggested prices for online second-hand items which will be accepted by both the seller andthe buyer, so the suggested price should be close to the trueprice (sold price) of the item. We adopt the Mean Abso-lute Log Error (MALE) and the Root Mean Square Log Er-ror (RMSLE) to quantitatively evaluate how the suggestedprices are close to the true prices of the second-hand items:

MALE =1

M

M∑i=1

∣∣∣log(pi)− log(pi)∣∣∣ (5)

RMSLE =

√√√√ 1

M

M∑i=1

(log(pi)− log(pi)

)2

(6)

where M is the number of testing samples, pi and pi are thepredicted price and sold price of product i, respectively.

4.3. Evaluation of Vision-based Price Suggestionwith Percentile Constraint

We first test the performance of the vision-based pricesuggestion with percentile constraint. In the training stage,

the binary classification model (fcls in Eq. 1) is trained withthe percentile constraint given by the platform operator (i.e.,how many percents of item samples whose images shouldbe regarded as qualified for vision-based price suggestion).When testing, the classification model classifies the positiveitems (items whose images are qualified for vision-basedprice suggestion) from the negative ones (items with un-qualified images for vision-based price suggestion), thenthe regression model (freg in Eq. 1) provides suggestedprices for the positive items. Table 2 summarizes the results,which shows that the percent of positive items classified bythe binary classification model is almost exactly the samewith the percentile constraint. This demonstrates the effec-tiveness of the classification model. Besides, as the numberof positive items increases (i.e., more items are classified aspositive), the MALE and RMSLE values of the positiveitems increase. This is intuitive, since the quality control forclassifying positive item images decreases with the increaseof percent of positive items.

Some examples of the vision-based price suggestionwith percentile constraint (percentile constraint is 50%) onpositive items are presented in Figure 6. For each second-hand item, the gap between the predicted price and the soldprice is very small, which demonstrates the effectiveness ofthe regression model. Figure 7 shows the images of somenegative samples classified by the classification model un-der the percentile restriction of 50%. Some of the negativesamples have images which do not contain sufficient iteminformation (e.g., images with red boundaries in Figure 7),and some are with poor image qualities such as bad illumi-nation (e.g., images with green boundaries in Figure 7) orimage blur (e.g., images with blue boundaries in Figure 7)so that the extracted visual features are not representative.Figure 6 and Figure 7 also illustrate that the classificationmodel is able to effectively classify the positive items andthe negative items.

4.4. Evaluation of Vision-based Price Suggestionwith Threshold Constraint

In this subsection, we test the vision-based price sugges-tion with threshold constraint. Given a threshold value, ifthe absolute log error of the predicted price and the soldprice of a second-hand item is smaller than this threshold,the item is regarded as a positive training sample, other-wise, the item is negative. Accordingly, the joint training

7

Figure 6: Examples of vision-based price suggestion with percentile constraint for positive items. Sold prices are with bluecolor (left), and predicted prices are with red color (right). All the prices are Chinese Yuan (CHN).

Figure 7: Examples of negative items classified by the clas-sification model in the vision-based price suggestion systemwith percentile constraint. Images with red boundaries haveinadequate item information, images with green boundariesare shot in bad illumination, and images with blue bound-aries are heavily blurred images.

Table 3: Quantitative evaluation of the regression model forprice prediction under different threshold constraints.

value of ε in Eq. 3 0.3 0.4 0.5 0.6 0.7

# positive items 34256 52561 67365 77264 80297% positive items 35.11% 53.86% 69.04% 79.18% 82.29%

MALE 0.3427 0.3776 0.4033 0.4265 0.4309RMSLE 0.4764 0.5223 0.5553 0.5878 0.5917

of the binary classification model and the regression modelis greatly influenced by the given threshold value. Table 3summaries the performance of the regression model underdifferent threshold constraints.

When the given threshold value ε in Eq. 3 increases, i.e.,the online platform looses its restriction on the definition ofpositive items, more second-hand items are considered to bewith qualified images for vision-based price prediction by

the binary classification model, thus the regression modelprovides suggested prices for more second-hand items. Inthis scenario, it is no doubt that the MALE and RMSLEvalues of the positive items (classified by the classificationmodel) will increase. We can imagine that if the classifica-tion model picks an item with a very high-quality image asa positive item, and the regression model offers suggestedprice only for this positive item, the MALE and RMSLEvalues will be very low, but this is not sufficient to demon-strate that the performance of the classification model andthe regression model are good. Thus, when evaluating thesetwo models, we should not only pay attention to theMALEand RMSLE metrics of the positive items, but also keepour eyes on the number of positive items.

Some positive items and negative items classified by thebinary classification model in the vision-based price sug-gestion module with threshold constraint (threshold valueε=0.5) are presented in Figure 8 and Figure 9, with the pre-dicted price by the regression model in this price suggestionsystem listed on the bottom right of each item image. Thesetwo figures also demonstrate the effectiveness of the classi-fication model and the regression model in the vision-basedprice suggestion module with threshold constraint.

4.5. Evaluation of Feature Extraction

To evaluate the effectiveness of the proposed visual fea-ture extraction strategy, we perform two comparative ex-periments. In one experiment, we adopt the proposed vi-sual feature extraction strategy (i.e., extract visual featureswith the assistance of image-related characteristics, we callit multiple single-task feature extraction (MSTFE)) to trainthe vision-based price prediction module with percentileconstraint (percent constraint on positive items is 50%), andthen we train this module again without using the proposedvisual feature extraction strategy (i.e., a ResNet-50 architec-

8

Figure 8: Examples of vision-based price suggestion with threshold constraint for positive items. Sold prices are with bluecolor (left), and predicted prices are with red color (right). All the prices are Chinese Yuan (CHN).

Figure 9: Examples of negative items classified by the clas-sification model in the vision-based price suggestion systemwith threshold constraint. Images with red boundaries haveinadequate item information, images with green boundariesare shot in bad illumination, and images with blue bound-aries are heavily blurred images.

ture is employed to extract visual features from the imageof an item directly to do the price prediction without us-ing the image-related characteristics, we call it single-taskfeature extraction (STFE)). We also train the vision-basedprice prediction module with threshold constraint (thresholdvalue ε in Eq. 3 is 0.5) with and without the proposed featureextraction strategy, respectively. The experiment results arepresented in Table 4, which shows that the proposed visualfeature extraction strategy is more effective on extractingrepresentative visual features for vision-based price predic-tion.

Since we use the extracted feature vector, not the pre-dicted brand, category, attribute and specification, as the in-put of the regression model for price prediction, given theimages of two iPhone4, the model will not output an av-erage price of all iPhone4, but predicts different prices for

Table 4: Quantitative evaluation of proposed visual featureextraction strategy.

constraint feature # positive items MALE RMSLE

percentile STFE 46583 0.4008 0.5421MSTFE 48871 0.3864 0.5302

threshold STFE 58949 0.4104 0.5572MSTFE 67365 0.4033 0.5553

Table 5: Evaluation of proposed warm-up training mode.

constraint training mode # positive items MALE RMSLE

percentile W/O warm-up 48130 0.3940 0.5343W warm-up 48871 0.3864 0.5302

threshold W/O warm-up 55575 0.4058 0.5484W warm-up 67365 0.4033 0.5553

these two iPhone4 according to the different extracted vi-sual features of these two items.

4.6. Evaluation of the Training Mode

Finally, we evaluate the effectiveness of the proposedwarm-up training mode. Two comparative experiments aredesigned. One is training the vision-based price sugges-tion module with percentile constraint (percent constrainton positive items is 50%) with and without warm-up train-ing mode, respectively, and the other is training the vision-based price suggestion module with threshold constraint(threshold value ε in Eq. 3 is 0.5) with and without warm-uptraining mode, respectively. The results are summarized inTable 5, from which we can see that the models trained withthe warm-up training are with better performance.

9

5. ConclusionIn this work, we presented a vision-based price predic-

tion system to provide price suggestions for online second-hand items with the uploaded item images, which can helpthe second-hand item sellers set listing prices for the itemeffectively. In the proposed system, we design different lossfunctions to train the price suggestion system, so that var-ious demands from the platform operator can be satisfied.Experiment results on a large real-world dataset demon-strate the effectiveness of the proposed price suggestion sys-tem.

References[1] Hakan Kusan, Osman Aytekin, and Ilker Ozdemir. “The use

of fuzzy logic in predicting house selling price.” Expert sys-tems with Applications 37, no. 3 (2010): 1808-1813. 3

[2] Sameerchand Pudaruth. “Predicting the price of used cars us-ing machine learning techniques.” Int. J. Inf. Comput. Tech-nol 4, no. 7 (2014): 753-764. 3

[3] Quanzeng You, Ran Pang, Liangliang Cao, and Jiebo Luo.“Image-based appraisal of real estate properties.” IEEEtransactions on multimedia 19, no. 12 (2017): 2751-2759.3

[4] Rainer Schlosser, and Martin Boissier. “Dynamic pricing un-der competition on online marketplaces: A data-driven ap-proach.” In Proceedings of the 24th ACM SIGKDD Interna-tional Conference on Knowledge Discovery & Data Mining,pp. 705-714. 2018. 3

[5] Yanhao Zhang, Pan Pan, Yun Zheng, Kang Zhao, YingyaZhang, Xiaofeng Ren, and Rong Jin. “Visual search at al-ibaba.” In Proceedings of the 24th ACM SIGKDD Interna-tional Conference on Knowledge Discovery & Data Mining,pp. 993-1001. 2018. 4

[6] Diederik P. Kingma, and Jimmy Ba. “Adam: A methodfor stochastic optimization.” arXiv preprint arXiv:1412.6980(2014). 6

[7] Xingping Dong, and Jianbing Shen. “Triplet loss in siamesenetwork for object tracking.” In Proceedings of the Euro-pean Conference on Computer Vision (ECCV), pp. 459-474.2018. 3

[8] Roberto Maestre, Juan Duque, Alberto Rubio, and JuanArevalo. “Reinforcement learning for fair dynamic pricing.”In Proceedings of SAI Intelligent Systems Conference, pp.120-135. Springer, Cham, 2018. 3

[9] Ning Sun, Hongxi Bai, Yuxia Geng, and Huizhu Shi. “Priceevaluation model in second-hand car system based on BPneural network theory.” In 2017 18th IEEE/ACIS Interna-tional Conference on Software Engineering, Artificial In-telligence, Networking and Parallel/Distributed Computing(SNPD), pp. 431-436. IEEE, 2017. 3

[10] Pedro O., Pinheiro, Ronan Collobert, and Piotr Dollar.“Learning to segment object candidates.” Advances in neuralinformation processing systems 28 (2015): 1990-1998. 3

[11] Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross Gir-shick. “Mask r-cnn.” In Proceedings of the IEEE interna-tional conference on computer vision, pp. 2961-2969. 2017.3

[12] Zhaowei Cai, and Nuno Vasconcelos. “Cascade r-cnn: Delv-ing into high quality object detection.” In Proceedings of theIEEE conference on computer vision and pattern recogni-tion, pp. 6154-6162. 2018. 3

[13] Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. “R-fcn:Object detection via region-based fully convolutional net-works.” Advances in neural information processing systems29 (2016): 379-387. 3

[14] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, BingXu, David Warde-Farley, Sherjil Ozair, Aaron Courville, andYoshua Bengio. “Generative adversarial nets.” In Advancesin neural information processing systems, pp. 2672-2680.2014. 3

[15] Alec Radford, Luke Metz, and Soumith Chintala. “Un-supervised representation learning with deep convolu-tional generative adversarial networks.” arXiv preprintarXiv:1511.06434 (2015). 3

[16] Steven Peterson, and Albert Flanagan. “Neural network he-donic pricing models in mass real estate appraisal.” Journalof real estate research 31, no. 2 (2009): 147-164. 3

[17] Wen Long, Zhichen Lu, and Lingxiao Cui. “Deep learning-based feature engineering for stock price movement predic-tion.” Knowledge-Based Systems 164 (2019): 163-173. 3

[18] Stephen Law, Brooks Paige, and Chris Russell. “Take a lookaround: using street view and satellite images to estimatehouse prices.” ACM Transactions on Intelligent Systems andTechnology (TIST) 10, no. 5 (2019): 1-19. 3

[19] Peng Ye, Julian Qian, Jieying Chen, Chen-hung Wu, YitongZhou, Spencer De Mars, Frank Yang, and Li Zhang. “Cus-tomized regression model for airbnb dynamic pricing.” InProceedings of the 24th ACM SIGKDD International Con-ference on Knowledge Discovery & Data Mining, pp. 932-940. 2018. 3

10

arxiv:2012.06009v1 [cs.cv] 10 dec 2020

Documents