September 2, 2003
Andreas S. Weigend | Chief Scientist, Amazon.com | Contact information is at http://www.weigend.com
Analyzing Customer Behavior at Amazon.com
KDD: August 2003; SAS: October 2003
Agenda
• 1. Data: sources, characterizations
• 2. Actions: e.g., personalization, pricing, promotions…
• 3. Two data sets for research: Share the Love network, ratings
• 4. Some reflections
• 5. Questions
1. Sources of Data
• Customer behavior
Overall use of the site
Buying vs selling
Community features
Purchase information
Session information
Individual click information: responses (and non-responses!) to links, ad campaigns, emails, …
Customer service contacts: email, phone, product returns
…
• Amazon.com performance
Page generation time
Search results
Delivery date relative to promised date
…
Customer satisfaction
How Many Sessions at Amazon.com per Day?
Definition of session (also called visit):
Begin: with the first HTTP request from that day (state kept via cookie)
End: midnight (Pacific time)
Q: Number of sessions per day? 4–5M
Recognized (customer ID known): 1M
Unrecognized (customer unknown): 2M
Robots: 1–2M
Q: How long is a “typical session”? What shape of distribution would you expect?
Fewer than 30% of all sessions are associated with a specific customer!
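The session definition above (one session per cookie per Pacific-time calendar day) can be illustrated with a minimal counting sketch. It assumes hits arrive as (cookie ID, timezone-aware timestamp) pairs; the function name `sessionize` is hypothetical, robots are not filtered out, and in practice one would pass `zoneinfo.ZoneInfo("America/Los_Angeles")` so that daylight saving time is handled. The example uses a fixed UTC-8 offset only to stay self-contained.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone, tzinfo


def sessionize(hits: list[tuple[str, datetime]], tz: tzinfo) -> dict[tuple[str, str], int]:
    """Count hits per session, keyed by (cookie ID, local calendar date).

    A session starts with the cookie's first request of the day and ends at
    local midnight, so all hits from one cookie on one local date form one session.
    """
    sessions: dict[tuple[str, str], int] = defaultdict(int)
    for cookie, ts in hits:
        local_day = ts.astimezone(tz).date().isoformat()  # expects an aware timestamp
        sessions[(cookie, local_day)] += 1
    return dict(sessions)


# Example with a fixed UTC-8 offset (Pacific Standard Time, no DST handling).
pst = timezone(timedelta(hours=-8))
hits = [
    ("a", datetime(2003, 2, 19, 7, tzinfo=timezone.utc)),   # 23:00 on Feb 18 in PST
    ("a", datetime(2003, 2, 19, 9, tzinfo=timezone.utc)),   # 01:00 on Feb 19 in PST
    ("b", datetime(2003, 2, 19, 20, tzinfo=timezone.utc)),
    ("b", datetime(2003, 2, 19, 21, tzinfo=timezone.utc)),
]
print(sessionize(hits, pst))
```

Cookie "a" yields two sessions because its hits straddle Pacific midnight, even though they are only two hours apart.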
Session Length Distribution
[Figure: counts (log scale, 1 to 10,000,000) vs. session length (number of hits, log base 10, 0 to 2.5); reference marks at 10, 30, 100 and 300 hits]
32% of sessions* have only a single hit, more than a smooth continuation of the distribution would predict. This indicates a mixture of processes.
*Non-robot and non-internal sessions only. February 19, 2003
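A histogram like the one described can be built by bucketing session lengths on a log10 scale. This is a minimal sketch that assumes robot and internal sessions have already been removed; the function names and the 0.25 bin width are illustrative choices, and the toy data are not Amazon.com figures.

```python
import math
from collections import Counter


def log10_histogram(lengths: list[int], bin_width: float = 0.25) -> dict[float, int]:
    """Count sessions per bin of log10(hits per session); keys are left bin edges."""
    counts: Counter = Counter()
    for n in lengths:
        if n < 1:
            raise ValueError("session length must be at least one hit")
        left_edge = math.floor(math.log10(n) / bin_width) * bin_width
        counts[round(left_edge, 6)] += 1
    return dict(sorted(counts.items()))


def single_hit_share(lengths: list[int]) -> float:
    """Fraction of sessions that consist of exactly one hit."""
    return sum(1 for n in lengths if n == 1) / len(lengths)


lengths = [1, 1, 1, 2, 3, 10, 100]   # toy data
print(log10_histogram(lengths))
print(single_hit_share(lengths))
```

Comparing the single-hit bin with a curve fitted to the longer sessions is one way to see the excess that suggests a mixture of processes.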
From Individual Sessions to Customers
• Analyze customer behavior over a 12-month period: from Aug 1, 2002 until July 31, 2003
Based on an internal research data set created for longitudinal studies: 100k customers selected randomly via 3 digits of their customer ID
Q: Of the customers who visited in the last 12 months, how many had made a purchase before that period?
About 50%
Q: For these previous customers, what were the number of visits and the number of purchases in the last 12 months?
“Previous customer”: to avoid bias due to new accounts, condition on accounts with at least one purchase before Aug 1, 2002
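The sampling and conditioning steps above can be sketched as three small filters. The slide says only that customers were selected "via 3 digits of their customer ID"; taking the last three digits and the suffix 123 are assumptions made here for illustration, and such a sample is random only if those digits carry no information about the customer. All function names are hypothetical.

```python
from datetime import date

WINDOW_START = date(2002, 8, 1)   # first day of the 12-month window
WINDOW_END = date(2003, 7, 31)    # last day of the 12-month window (inclusive)


def in_sample(customer_id: int, suffix: int = 123) -> bool:
    """Hypothetical 1-in-1000 sample: keep IDs whose last three digits equal `suffix`."""
    return customer_id % 1000 == suffix


def is_previous_customer(purchase_dates: list[date]) -> bool:
    """True if the account made at least one purchase before the window starts."""
    return any(d < WINDOW_START for d in purchase_dates)


def purchases_in_window(purchase_dates: list[date]) -> int:
    """Number of purchases that fall inside the 12-month window."""
    return sum(1 for d in purchase_dates if WINDOW_START <= d <= WINDOW_END)


history = [date(2002, 7, 31), date(2002, 8, 1), date(2003, 7, 31), date(2003, 8, 1)]
print(in_sample(4567123), is_previous_customer(history), purchases_in_window(history))
```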
How often did previous customers visit in the past 12 months?
36.5% of previous customers visited but did not purchase in the past 12 months
How often did previous customers purchase in the past 12 months?
[Figure annotations: 4 purchases, 5 purchases]
Median number of purchases per year: between 4 and 5
How much does each group contribute to the total purchases?
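The three questions above (share of visitors without a purchase, median purchases, contribution of each group) can all be computed from one list of per-customer purchase counts. This is a minimal sketch on toy data; the function names are hypothetical and the numbers are not Amazon.com results.

```python
from collections import Counter
from statistics import median


def contribution_by_group(purchases_per_customer: list[int]) -> dict[int, float]:
    """Share of all purchases contributed by the customers with exactly k purchases."""
    total = sum(purchases_per_customer)
    customers_with_k = Counter(purchases_per_customer)
    return {k: k * n / total for k, n in sorted(customers_with_k.items()) if k > 0}


def summarize(purchases_per_customer: list[int]) -> tuple[float, float, dict[int, float]]:
    """Return (share with zero purchases, median purchases, contribution by group)."""
    zero_share = purchases_per_customer.count(0) / len(purchases_per_customer)
    return zero_share, median(purchases_per_customer), contribution_by_group(purchases_per_customer)


# Toy counts for six previous customers who visited during the window.
print(summarize([0, 0, 1, 2, 2, 5]))
```

In this toy example the single customer with 5 purchases contributes half of all purchases, which is the kind of skew the contribution view is meant to expose.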
Findings
• A “typical” customer? NO! Traditional approaches: segmentation / clustering
• A typical random sample? NO! E.g., a sample of sessions ≠ a sample of customers
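Why a sample of sessions differs from a sample of customers can be shown with a small worked example: drawing sessions uniformly picks a customer with v visits v times as often (length-biased sampling). The function names are hypothetical and the visit counts are toy data.

```python
def mean_visits_customer_sample(visits: list[int]) -> float:
    """Mean visits per customer when every customer is equally likely to be drawn."""
    return sum(visits) / len(visits)


def mean_visits_session_sample(visits: list[int]) -> float:
    """Mean visits of the customer behind a uniformly drawn session.

    A customer with v sessions is v times as likely to be drawn, so the
    expectation is sum(v * v) / sum(v): frequent visitors are over-represented.
    """
    return sum(v * v for v in visits) / sum(visits)


visits = [1, 1, 1, 1, 6]   # toy data: four occasional visitors, one frequent visitor
print(mean_visits_customer_sample(visits))  # 2.0
print(mean_visits_session_sample(visits))   # 4.0
```

The same population looks twice as active when it is sampled through its sessions.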
Insights into the Shopping Process
Goal: Build models and obtain intuitions in order to drive actions
• What’s easy? What’s hard?
Conceptually easy…: data are a lot richer
Data richer both in quantity and in quality (e.g., relational data)
… but implementation is hard: clean data, access data, legacy, …
• An example of building up intuition about the online shopping process
Q: How long does it take a customer to make a purchase decision?
How long ago did a customer first look at the detail page of an item she purchased? (Conditioned on purchase)
Creating and Maintaining Product Space Awareness
• Q: What is the time scale of the decision-making process in shopping?
• Insights
20% of items bought today were looked at before
The shopping process extends significantly across time: the session is not a good atomic construct
Other relationships (not presented here): product group, price, gender
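The question "how long ago did a customer first look at the detail page of an item she purchased?" can be answered by joining purchases to each customer's earliest view of the same item. This is a minimal sketch assuming views and purchases arrive as (customer ID, item ID, timestamp) tuples; the event layout and function names are hypothetical, and the lag is measured in calendar days so that "looked at before" means "on an earlier day".

```python
from datetime import datetime
from typing import Iterable, Optional

Event = tuple[str, str, datetime]   # (customer_id, item_id, timestamp); hypothetical layout


def days_since_first_view(views: Iterable[Event], purchases: Iterable[Event]) -> list[Optional[int]]:
    """For each purchase, calendar days since the buyer first viewed the item's detail page.

    Returns None for a purchase with no logged detail-page view at or before it.
    """
    first_view: dict[tuple[str, str], datetime] = {}
    for customer, item, ts in views:
        key = (customer, item)
        if key not in first_view or ts < first_view[key]:
            first_view[key] = ts
    lags: list[Optional[int]] = []
    for customer, item, ts in purchases:
        seen = first_view.get((customer, item))
        if seen is None or seen > ts:
            lags.append(None)
        else:
            lags.append((ts.date() - seen.date()).days)
    return lags


def share_viewed_on_earlier_day(lags: list[Optional[int]]) -> float:
    """Fraction of purchased items whose detail page was first viewed on an earlier day."""
    return sum(1 for lag in lags if lag is not None and lag >= 1) / len(lags)


views = [("c1", "i1", datetime(2003, 2, 10, 10)), ("c1", "i1", datetime(2003, 2, 12, 10)),
         ("c2", "i2", datetime(2003, 2, 19, 9))]
purchases = [("c1", "i1", datetime(2003, 2, 19, 12)), ("c2", "i2", datetime(2003, 2, 19, 10)),
             ("c3", "i3", datetime(2003, 2, 19, 11))]
lags = days_since_first_view(views, purchases)
print(lags, share_viewed_on_earlier_day(lags))
```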
Levels of Analysis: Time Scales and Amount of Data
Level of analysis                    New data per day
Customer level                       1 MB
Purchase level                       10…100 MB
Session level (daily aggregates)     1…10 GB
Click level                          100 GB…1 TB
Presentation level*                  10+ TB
*What was displayed, whether or not it was clicked on
Summary: Dimensions of Models
• Time scales
• Flat vs relational
• Static vs dynamic
• Observable vs hidden
• Multi-scale / multi-level models
• Next: Two examples from the ends of the spectrum
No information from the current session
Only information from the current session
No Information from Current Session: Customer Profiling and Segmentation
Navigational style (e.g., searcher vs browser / clicker)
Level of playfulness and of interest in exploring
Leader vs follower
Degree of focus
Degree of price sensitivity
Degree of time sensitivity
Degree of sophistication
Attitude to complexity
Brand consciousness
Early adopter
...
Information Before the First Click: Where Do Visitors Come From?
Direct: no HTTP referrer, no Associates tag
Associates: companies and individuals (1M), identified by an Associates tag
Megadeals: AOL, MSN, …
Data: Wednesday, February 19, 2003
[Chart: percentage of all sessions by referrer. Categories: Direct, Associates, Megadeals, Search engines, Other. Values shown: 43%, 31%, 11%, 10%, 4.7%, 1.1%]
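The referrer categories above can be sketched as a classification of a session's first request. The rules for Direct and Associates follow the slide; the domain lists, the precedence of the Associates tag over the referrer, and the order in which Megadeals is checked before search engines are assumptions for illustration, and all names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical domain lists for illustration; a real list would be far longer.
MEGADEAL_DOMAINS = {"aol.com", "msn.com"}
SEARCH_ENGINE_DOMAINS = {"google.com", "yahoo.com"}


def _host_matches(host: str, domains: set[str]) -> bool:
    """True if host is one of the domains or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def classify_referrer(referrer: Optional[str], associates_tag: Optional[str]) -> str:
    """Assign a session to a referrer category from its first request."""
    if associates_tag:                 # assumption: the Associates tag takes precedence
        return "Associates"
    if not referrer:                   # no HTTP referrer and no Associates tag
        return "Direct"
    host = (urlparse(referrer).hostname or "").lower()
    if _host_matches(host, MEGADEAL_DOMAINS):
        return "Megadeals"
    if _host_matches(host, SEARCH_ENGINE_DOMAINS):
        return "Search engines"
    return "Other"


print(classify_referrer("http://www.google.com/search?q=books", None))
```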
Only Information from Current Session: Predict Intentions and Modalities of Current Session
• Examples of sessions
Planned vs impulse session
Personal vs job-related session
At home vs at work
In a hurry vs has time to kill
Ready to make a decision
• Task: Make dynamic predictions
Prediction about remaining number of pages
Prob(next page is last page)
Prob(buy in this session with coupon) vs Prob(buy in this session without coupon)
• Evaluation
Off-line analysis of past data
On-line experiments
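As an off-line baseline for the first two dynamic predictions, Prob(next page is last page) and the expected number of remaining pages can be estimated from the lengths of past sessions alone, i.e., a discrete hazard rate. This sketch conditions only on the number of pages seen so far; a real model would also use what was clicked. Function names and data are hypothetical.

```python
import math
from collections import Counter


def prob_next_page_is_last(session_lengths: list[int], pages_seen: int) -> float:
    """Empirical P(the next page is the last one | pages_seen pages viewed so far).

    Among past sessions longer than pages_seen pages, the share that ended
    after exactly pages_seen + 1 pages (a discrete hazard rate).
    """
    counts = Counter(session_lengths)
    still_active = sum(n for length, n in counts.items() if length > pages_seen)
    if still_active == 0:
        return math.nan
    return counts[pages_seen + 1] / still_active


def expected_remaining_pages(session_lengths: list[int], pages_seen: int) -> float:
    """Mean number of further pages among past sessions longer than pages_seen."""
    remaining = [n - pages_seen for n in session_lengths if n > pages_seen]
    return sum(remaining) / len(remaining) if remaining else math.nan


lengths = [1, 1, 2, 3, 3]   # toy session lengths (pages per session)
print(prob_next_page_is_last(lengths, 1))    # 1 of the 3 sessions with more than 1 page ended at page 2
print(expected_remaining_pages(lengths, 1))
```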
[Diagram: From observation to action. Elements: customer behavior; machine learning, game theory; customer intentions, state; company strategic goals; company actions; predictive models and algorithms]
2. From Observation to Action: Remarks on Personalization
• Two “projections”
Targeting: find a customer for a product, store, site feature, …
Recommending: find a product etc. for a customer
• “Collaborative filtering”, e.g., NetPerceptions
Sad news from August 2003…
How Does Amazon.com Make Recommendations?
The recommendations algorithm is now published:
Greg Linden, Brent Smith and Jeremy York: “Amazon.com Recommendations: Item-to-Item Collaborative Filtering,” IEEE Internet Computing 7(1): 76–80 (January/February 2003)
Similarity measure: cosine
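The cosine similarity measure named above can be illustrated on binary item vectors, where each item is represented by the set of customers who bought it. This is a minimal sketch assuming purchases arrive as (customer ID, item ID) pairs; the function names are hypothetical. It scores every item by brute force, whereas the paper describes precomputing a similar-items table offline, which this sketch does not reproduce.

```python
import math
from collections import defaultdict


def buyers_by_item(purchases: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Represent each item as the set of customers who bought it (a binary vector)."""
    buyers: dict[str, set[str]] = defaultdict(set)
    for customer, item in purchases:
        buyers[item].add(customer)
    return dict(buyers)


def cosine(a: set[str], b: set[str]) -> float:
    """Cosine similarity of two binary vectors stored as sets of nonzero coordinates."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def similar_items(item: str, buyers: dict[str, set[str]], k: int = 3) -> list[tuple[str, float]]:
    """Top-k items by cosine similarity to `item` (brute force over all items)."""
    target = buyers.get(item, set())
    scored = [(other, cosine(target, vec)) for other, vec in buyers.items() if other != item]
    scored = [pair for pair in scored if pair[1] > 0]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:k]


purchases = [("c1", "A"), ("c1", "B"), ("c2", "A"), ("c2", "B"), ("c3", "A"), ("c3", "C")]
print(similar_items("A", buyers_by_item(purchases)))
```

Item B ranks above C for item A because two of A's three buyers also bought B, while only one bought C.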
• Purchase similarities vs session similarities
Customers who bought … also bought …
Customers who shopped for … also shopped for …
• Some examples that use Amazon.com data
Visualization: The Hive Group (Ben Shneiderman)
Clustering, based on “Customers who bought … also bought …”: orgnet.com (Valdis Krebs)
Relational probabilistic models: CleverSet, Inc. (Bruce D’Ambrosio)
• Use the Amazon Web Services SOAP/XML interface to extract data, build models, create visualizations, build stores
[Figure: Book network derived from “people who bought … also bought …” data. Source: Valdis Krebs, orgnet.com]
Agenda
• 1. Data: sources, characterizations
• 2. Some remarks on personalization
• 3. Two data sets for the research community: A: Share-the-Love network; B: Ratings
• 4. Some reflections
• 5. Questions?
3. Two Data Sets for the Research Community
Have you ever received an email like this one?
Subject: Claudia Perlich has sent you a 10% discount
Claudia Perlich (your thoughtful pal) just bought the following item at Amazon.com and is using our Share the Love program to pass along an additional 10% discount to you. Click the links below to see more product information on your discount list and purchase the following item by ...
• Amazon Data Set A: Social network
“Each time you place an order for books, music, DVDs, or videos with us, we'll offer you the chance to e-mail your friends and give them an additional 10% off the items you bought. (You select which items, of course.)
If any of those people purchases one of those items within a week, you'll receive a credit to use the next time you shop with us!
Your credit will equal the dollar amount of your friend's 10% discount.”
• Amazon Data Set B: Ratings
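The Share the Love credit rule quoted above for Data Set A (a 10% discount for the friend, and a matching credit for the sender if the friend buys within a week) can be sketched per item. Counting day 7 as "within a week" and rounding half-up to whole cents are assumptions; the function name is hypothetical.

```python
from datetime import date, timedelta
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

DISCOUNT_RATE = Decimal("0.10")   # the friend's additional 10% discount
WINDOW = timedelta(days=7)        # "within a week"; counting day 7 as inside is an assumption


def stl_credit(item_price: Decimal, email_sent: date, friend_purchase: Optional[date]) -> Decimal:
    """Credit earned by the sender for one item passed along via Share the Love."""
    if friend_purchase is None or not (email_sent <= friend_purchase <= email_sent + WINDOW):
        return Decimal("0.00")
    # Credit equals the dollar amount of the friend's discount, rounded to cents.
    return (item_price * DISCOUNT_RATE).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(stl_credit(Decimal("24.99"), date(2003, 8, 1), date(2003, 8, 5)))   # 2.50
print(stl_credit(Decimal("24.99"), date(2003, 8, 1), date(2003, 8, 9)))   # 0.00
```

`Decimal` is used instead of `float` so that cent amounts are represented exactly.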
Amazon Data Set A: Share-the-Love (STL) Network
• Size
4.0M nodes
3.2M edges
1.5M items (distinct)
• Fields
IDs (obfuscated): sender ID, receiver ID
ZIP codes: sender ZIP, receiver ZIP
Date(s): when the item was bought by the sender and the STL email was sent; if purchased, when that item was bought by the receiver
Item: product ID (“ASIN” = Amazon Standard Identification Number), product group, price (as of the STL date)
Rating of item (1 … 5 stars)
Helpfulness of review (by other customers)
Amazon Data Set B: Ratings
• Ratings
When a customer writes a review about an item, she is also asked to rate the item by giving it between 1 and 5 stars
Amazon.com makes available a random sample of 4M of these ratings
• Fields
Rater ID (obfuscated)
Date (when the rating was submitted)
Item: product ID (ASIN), price (as of August 6, 2003), product group
Rating of item: number of stars given by this rater to this item
Helpfulness of review: feedback from customers who found this review “helpful” / “not helpful”, computed from:
How prolific are Amazon.com reviewers?
Some reviewers really are prolific!
More than 1M customers reviewed only a single item
What is the distribution of the number of reviews received per item?
3,800 reviews for Harry Potter 5
Ranking / ordering / surfacing / presenting of reviews
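Both distributions asked about above (reviews per reviewer and reviews per item) come from the same two-step count: count ratings per key, then count how many keys share each total. This is a minimal sketch assuming ratings are (rater ID, item ID, stars) tuples; the tuple layout and names are hypothetical, and the data are toy values.

```python
from collections import Counter

RATER, ITEM = 0, 1   # positions in a hypothetical (rater_id, item_id, stars) tuple


def count_histogram(ratings: list[tuple[str, str, int]], key: int) -> dict[int, int]:
    """Map n -> number of raters (key=RATER) or items (key=ITEM) with exactly n ratings."""
    ratings_per_key = Counter(r[key] for r in ratings)
    return dict(sorted(Counter(ratings_per_key.values()).items()))


ratings = [("r1", "i1", 5), ("r1", "i2", 4), ("r2", "i1", 3), ("r3", "i1", 5)]
print(count_histogram(ratings, RATER))   # {1: 2, 2: 1}
print(count_histogram(ratings, ITEM))    # {1: 1, 3: 1}
```

Plotting these histograms on log-log axes is a common way to inspect heavy tails such as a few very prolific reviewers or a few heavily reviewed items.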
What is the Shape of the Distribution of Number of Stars?
[Four panels: counts vs. number of stars (1 to 5)]
Guess?!
Amazon Data Set B: Ratings
• Size of training set (release date: September 30, 2003): 3.5M ratings through August 2003
• Sizes of test sets (release date: January 31, 2004)
0.5M ratings from the same time period as the training set
0.5M ratings after the end of the training set
• Amazon Cup deadline: March 31, 2004
[Figure: distribution of ratings, counts vs. number of stars (1 to 5); will be revealed at the presentation]
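The two kinds of test set described above (a random holdout from the training period and a later period) can be mimicked locally when validating a model. This sketch assumes ratings are (rater ID, item ID, stars, date) tuples and that the holdout is drawn uniformly at random; both are assumptions, not a description of how the official sets were drawn. For example, `holdout_fraction=0.125` would give, in expectation, the 3.5M : 0.5M proportion of a 4M-rating period.

```python
import random
from datetime import date

Rating = tuple[str, str, int, date]   # hypothetical (rater_id, item_id, stars, date) layout


def split_ratings(ratings: list[Rating], cutoff: date, holdout_fraction: float, seed: int = 0):
    """Split into (train, same-period test, later test).

    Ratings after `cutoff` form the later test set; ratings up to the cutoff are
    held out at random with probability `holdout_fraction`.
    """
    rng = random.Random(seed)
    train, test_same, test_later = [], [], []
    for r in ratings:
        if r[3] > cutoff:
            test_later.append(r)
        elif rng.random() < holdout_fraction:
            test_same.append(r)
        else:
            train.append(r)
    return train, test_same, test_later


ratings = [("r1", "i1", 5, date(2003, 8, 31)), ("r2", "i1", 4, date(2003, 8, 1)),
           ("r3", "i2", 2, date(2003, 9, 1))]
train, test_same, test_later = split_ratings(ratings, date(2003, 8, 31), 0.125)
```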
Tasks and Evaluations
For each test point, we provide:
Rater ID (obfuscated, consistent with training set)
Date (of submission of rating)
Item (ASIN, price, product group)
• Task 1
Predict the distribution across number of stars: probabilities of observing 1, 2, 3, 4, and 5 stars
Evaluated by the mean log likelihood of the observed data given the prediction:
1/N Σ log(prob(observed number of stars)), with the sum over the N test points
• Task 2
Predict the number of stars (point prediction, e.g., 3.27)
Evaluated by (1) mean absolute error and (2) mean squared error
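The evaluation criteria for the two tasks can be sketched directly from their definitions. The slide does not state the base of the logarithm; natural log is assumed here. Function names are hypothetical, and a zero predicted probability on the observed outcome is scored as minus infinity rather than raising an error.

```python
import math


def mean_log_likelihood(pred_dists: list[tuple[float, ...]], observed: list[int]) -> float:
    """Task 1: (1/N) * sum of log p_i(observed stars); natural log assumed.

    pred_dists[i] holds the predicted probabilities of 1, 2, 3, 4, 5 stars.
    A zero probability on the observed outcome yields -inf.
    """
    if len(pred_dists) != len(observed):
        raise ValueError("one prediction per test point is required")
    total = 0.0
    for dist, stars in zip(pred_dists, observed):
        p = dist[stars - 1]
        total += math.log(p) if p > 0 else -math.inf
    return total / len(observed)


def mean_absolute_error(preds: list[float], observed: list[int]) -> float:
    """Task 2, criterion (1)."""
    return sum(abs(p - o) for p, o in zip(preds, observed)) / len(observed)


def mean_squared_error(preds: list[float], observed: list[int]) -> float:
    """Task 2, criterion (2)."""
    return sum((p - o) ** 2 for p, o in zip(preds, observed)) / len(observed)


uniform = (0.2, 0.2, 0.2, 0.2, 0.2)
print(mean_log_likelihood([uniform, uniform], [5, 3]))   # = log(0.2)
print(mean_absolute_error([3.27, 4.0], [3, 5]))
print(mean_squared_error([3.27, 4.0], [3, 5]))
```

Because the log-likelihood punishes a near-zero probability on an outcome that does occur very heavily, Task 1 rewards calibrated distributions rather than confident guesses.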
Discovery and Data Mining
• Your idea here _____________________________________
• Some suggestions
Characterize / cluster reviewers, items
Find (opinion) leaders, followers
Predict helpfulness of review
Predict which item / product group a customer is likely to review
Understand the effect of earlier ratings on later ratings
Note: might make the text of reviews available at a later stage
Use Amazon Web Services to access the product table and other information
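A first look at the suggestion "understand the effect of earlier ratings on later ratings" is to pair each rating with the mean of the earlier ratings of the same item and measure their correlation. This sketch assumes (rater ID, item ID, stars, date) tuples; the names are hypothetical, and a correlation shows association only. It does not separate an effect of earlier ratings from the item's underlying quality.

```python
import math
from collections import defaultdict
from datetime import date

Rating = tuple[str, str, int, date]   # hypothetical (rater_id, item_id, stars, date) layout


def prior_mean_pairs(ratings: list[Rating]) -> list[tuple[float, int]]:
    """Pair each rating with the mean of the earlier ratings of the same item.

    The first rating of each item has no earlier ratings and is skipped.
    Same-day ratings keep their input order (sorted() is stable).
    """
    history: dict[str, list[int]] = defaultdict(list)
    pairs = []
    for _rater, item, stars, _day in sorted(ratings, key=lambda r: r[3]):
        earlier = history[item]
        if earlier:
            pairs.append((sum(earlier) / len(earlier), stars))
        earlier.append(stars)
    return pairs


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient (assumes both inputs vary)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


ratings = [("a", "i1", 5, date(2003, 1, 1)), ("b", "i2", 1, date(2003, 1, 1)),
           ("c", "i1", 3, date(2003, 1, 2)), ("d", "i2", 2, date(2003, 1, 2)),
           ("e", "i1", 4, date(2003, 1, 3))]
pairs = prior_mean_pairs(ratings)
print(pairs, pearson([x for x, _ in pairs], [y for _, y in pairs]))
```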
Agenda
• 1. Data sources and characterizations
• 2. Some remarks on personalization
• 3. Contest: Two data sets for the research community
• 4. Some reflections
• 5. Questions
4. Some Reflections
• Data: synthetic, benchmark, real (not cleaned), real-time system
• Stages: measure/collect, describe/characterize, predict (+ evaluate), act/control
• Role of experiments: short-term vs long-term effects
• Goal: computational marketing
Requires a multi-disciplinary effort on behavioral analytics:
Machine learning, data mining, statistics, control theory
Decision analytics (normative and descriptive)
Behavioral economics
Game theory
5. Questions?
• Thank you: Dave Liu (Amazon.com)
Bruce D’Ambrosio (CleverSet, Inc.)
Jimmy Pang (Amazon.com and Stanford)
• Questions?
• Further information, slides, data sets etc.:
Web: www.weigend.com
Mobile phone: +1 (917) 697-3800
This presentation: http://www.weigend.com/WeigendSAS2003.pdf