TEAM 7 Hana Keiningham
Madhuri Pawar
Qianhe Zhao
Bingchen Wang
Identifying Customer Segments
Through K-Means Cluster Analysis
Table of Contents
Executive Summary
1. Introduction
2. Background
2.1 Market Structure
2.2 Industry Trends
2.3 Products & Services
2.4 SWOT analysis
2.5 Google Trend analysis
3. Methodology
3.1 Overview
3.2 Data
3.3 Variables
3.4 K-Means Cluster Analysis
4. Post-Hoc Analysis
4.1 Latest Purchase Analysis
4.2 Purchase Frequency and Volume Analysis
4.3 Payment Method Analysis
4.4 Channel Analysis
4.5 Geographic Analysis
5. Discussion and Recommendation
5.1 Overview
5.2 Recommendations
6. Limitation and Future Research
References
Appendixes
Executive Summary
Once a popular destination for American customers to shop for casual clothing, the Gap
brand has experienced a continuous decrease in sales in recent years. With emerging
fast-fashion brands eating into Gap’s share of the apparel industry, Gap is also perceived as
losing its identity (Monllos, 2015). In this research, we conducted a K-Means Cluster Analysis
to identify meaningful customer segments that shop at Gap, in order to provide insights for
making effective marketing decisions to retain these customers and increase their purchases.
The analysis was conducted on a transactional dataset that includes 100,000 customers and
226,129 transactions. We applied RFM variables in performing the analysis and finally divided
the customers into seven segments, which cover all the meaningful segments.
In the analysis, we identified the “Champion” and “Potential Loyalist” customer segments,
which generate the highest revenues and profits for Gap. These segments include valuable
customers in whom Gap should invest more marketing resources and effort. We have also
identified a “New Customer” segment that represents the customers who shopped at Gap most
recently. Gap also needs to spend time and energy on this segment to develop these customers
into loyal customers. In terms of payment methods, Visa, MasterCard, and American Express are
the top three payment methods used by Gap customers, so Gap should continue its cooperation
with these credit card companies. Customers across all segments tend to make both online and
in-store purchases; however, the most profitable ones prefer to shop more in stores.
1. Introduction
Gap Inc. is a well-known international specialty retailer, offering clothing, accessories,
and other products for all age groups under the Gap, Banana Republic and Old Navy brand names.
Gap Inc. has seen its share of the U.S. apparel market drop from 5.1% to 4.7% over the past
five years, a figure that also reflects the diverse nature of the U.S. apparel industry.
Higher shares are held by multi-brand retail chains such as Macy’s (9%) (Forbes, 2015). The
remaining share belongs to numerous specialty and private label brands, fast-fashion players,
department stores and pure-play online retailers such as Amazon (Forbes, 2015). The share is
illustrated in the graph below:
Figure 1 U.S. Apparel Market Share (Forbes, 2015)
Among fast-fashion retailers, which are Gap’s key competitors, Zara and H&M lead the list with
global sales of $19.7 billion and $20.2 billion respectively, followed by Uniqlo ($16.6 billion)
and Gap ($16.4 billion) (Fast Fashion Retailing, annual report 2016). In the US market,
Gap Inc.’s sales were $11,989 million in 2016. The drop in sales over the last three years is
shown in the table below:
Table 1 Gap Annual Sales in 2014-2016 (Gap Annual Report, 2016)
Gap Annual Sales in the US ($ in Millions)
Year    Overall    Gap       Old Navy    Banana Republic
2016    $11,989    $3,113    $6,051      $2,052
2015    $12,213    $3,303    $5,987      $2,211
2014    $12,672    $3,575    $5,967      $2,405
Additionally, Gap has been losing its footing over the past few years amid growing
competition. It started losing its touch after the recession, when U.S. buyers gradually moved
to fast-fashion players in search of relatively fashion-forward merchandise. The biggest
problems for the brand are: 1) its lack of a clear, unique identity, because of its casual and
basic offering at relatively high prices, and 2) the rise of fast-fashion competitors like H&M,
Zara, and Forever 21. Millennial shoppers are not as brand-loyal as shoppers have been in
previous cycles; they are more fashion-forward and are on a continuous hunt for inexpensive
options (Forbes, 2015).
Hence, the reasons behind Gap Inc.’s falling market share are:
• The growing buyer affinity towards fast-fashion brands in the search for trendy fashion;
• The move towards department stores, as customers hunt for discounts on top brands;
• The ongoing online shift, which drives the growth of the online stores of branded
companies and of companies like Amazon.
Study Objective
However, Gap Inc. remains among the most preferred destinations for casual apparel in the
market, where buyers continue to elude American Eagle, Abercrombie, and Aeropostale. To
arrest the decline in market share, the company needs to focus on finding the best marketing
strategies for attracting and retaining different categories of customers. Therefore, the
objective of the study is to explore the different customer segments that shop at Gap and to
understand the characteristics of the key segments. We do this by performing K-Means Cluster
Analysis on RFM variables, such as the recency and frequency of purchase, and by examining the
channel used and the profit and revenue generated by each segment. Identifying meaningful
customer segments will provide insights for the company to make better business decisions to
regain market share and increase sales.
2. Background
The apparel industry in the U.S. has grown slowly in the past year, and Gap has witnessed
a drop in sales, especially in the past three years. The following section provides an overview
of the industry and Gap’s background by introducing the market structure, current trends in the
apparel industry, Gap’s products and services, the SWOT analysis and the Google Trends analysis.
2.1 Market Structure
Table 2 below shows how the apparel industry in the US is divided into five different sectors
(Stone, 2015). The apparel industry is mainly driven by fast-fashion brands, multi-brand
chains and off-price retailers, because consumers are more fashion-forward and look for
inexpensive options.
Table 2 Five Divisions of Apparel Industries
Sectors                           Examples
Luxury Brands                     Saks Fifth Avenue, Ralph Lauren, Calvin Klein, Anne Klein
Middle-Market (private labels)    Gap, Guess, Fast fashion like Zara, H&M, Uniqlo
                                  * Multi-brand chains which sell private labels such as Macy's, Century 21
Downscale                         J. C. Penney, Kohl's, Sears
Discount Stores                   Kmart, Meijer, Target, Walmart
Off-Price Retailers               Burlington Coat Factory, Marshalls, Ross Dress for Less, and T.J. Maxx are
                                  stores that sell designer goods at lower prices, often on a surplus basis.
2.2 The Apparel Industry Trends
Retail sales in the US went up by 5.1% year-on-year to US$1.4 trillion in the first quarter of
2017. The apparel industry in the US experienced slow growth in 2016; the sales of women’s,
men’s and children’s apparel grew by three percent in the US in 2016, to reach $218.7 billion
(Mergent Industry Report, 2017). The apparel retail landscape is highly developed and
competitive, with numerous designer brands, fast fashion brands, department stores and
multi-brand chains that compete against each other on design, variety and price. Over the last
few years, affordable fast fashion chains such as Zara, H&M and Forever 21 have taken
considerable market share from other fashion retailers (Euromonitor International, 2017).
Additionally, the apparel industry in the U.S. is also seeing a major shift from in-store
purchases to digital purchases. This shift towards digital commerce is mainly supported by the
increase in digitally savvy consumers (Euromonitor International, 2017).
2.3 Products and Services
Gap offers “optimistic,” casual, and American style apparel and accessories to its
customers (Gap, 2017). Gap’s products are available online through the company-owned official
website, and offline from the company-operated franchise stores or third-party retailers. For
customers in the U.S., Gap also offers omnichannel services, such as order-in-store,
reserve-in-store, and ship-from-store, which connect the brand’s digital and physical stores
(Reuters, 2017).
2.4 SWOT Analysis
Table 3 shows the result of the SWOT analysis of the Gap brand in the U.S. market:
Table 3 Gap Brand SWOT Analysis
Strength
1. Strong product portfolio and brand recognition
2. Presence of timeless iconic products
3. Strategic supplier relationship
Weakness
1. Failure to utilize online sales channels efficiently
2. Declining sales and profits
3. Heavily rely on vendors to sell product
Opportunity
1. Increasing efficiency of online sales
2. Celebrity endorsement
3. Growing use of technology
Threat
1. Rapid change in fashion
2. Increased production and operational costs
3. Strong competition in apparel segment
2.5 Google Trends
Figure 2 “Gap” Google Trends Chart
The numbers on this chart show the search popularity of “Gap” from January 2004 to November
2017, where “100” marks the highest popularity the company reached over the whole period.
According to the Google Trends chart, Gap’s popularity peaked in December 2005 and December
2006. After December 2006, Gap’s popularity declined significantly, even at its seasonal highs.
The highest point in the past year was in November 2016, when it hit 63 out of 100, which is
still 37% lower than its all-time peak. The chart mirrors Gap Inc.’s drop in sales, yet the
months with the highest popularity remain the same each year: Gap peaks seasonally around
October, November or December because of the holiday season, while January and February tend
to be its lowest-performing months. Gap hit its lowest point ever in January 2017.
3. Methodology
3.1 Overview
In this research, K-Means Cluster Analysis is the major method used to identify the
meaningful customer segments for Gap. We first aggregated all the transactional data into a
customer-level dataset for use in our analysis. To prepare the seeding data for the K-Means
Cluster Analysis, multiple Hierarchical Cluster Analyses (HCA) were conducted on four
randomly selected subsets, each including 10% of the entire dataset. For each subset, we
conducted HCA with both the Furthest Neighbor Method and Ward’s Method to find the number of
clusters and the cluster centers. We then used the seeding data to perform K-Means Analysis on
the entire sample, assigning each customer to one segment. The results of these analyses
suggest that the most appropriate customer segmentation, covering all meaningful customer
segments, includes seven segments. The complete analysis design is presented in Figure 3.
Figure 3 Overview of the Methodology
3.2 Data
This research uses the transactional data of 100,000 randomly selected Gap customers in the
U.S. The original dataset presents all orders, revenues, items, order lines, and returns for
those customers across all channels (stores, web and mobile app). The 226,129 transactions in
total cover the period from December 16, 2009 to September 17, 2017. The original dataset was
organized by Customer Identification Number, with each item purchased listed under the
corresponding customer ID. To get a clear understanding of the customer transactions, we
aggregated the original transactional dataset into a new customer-level dataset, in which each
customer ID has only one row containing its total transactional information.
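The aggregation step can be illustrated with a minimal Python sketch. It is not the authors' actual procedure: the row layout, the customer and order IDs, and the amounts are hypothetical, and measuring recency in whole months up to the last date in the dataset (September 17, 2017) is an assumption. The output fields are the five customer-level variables the report uses later.

```python
from collections import defaultdict
from datetime import date

# Hypothetical item-level rows:
# (customer_id, order_id, order_date, revenue, profit, quantity)
transactions = [
    ("C1", "O1", date(2016, 3, 10), 40.0, 20.0, 1),
    ("C1", "O1", date(2016, 3, 10), 60.0, 35.0, 2),
    ("C1", "O2", date(2017, 6, 17), 30.0, 12.0, 1),
    ("C2", "O3", date(2012, 7, 17), 25.0, 10.0, 1),
]

# Assumption: recency is measured up to the last transaction date in the data.
END_OF_DATA = date(2017, 9, 17)


def months_between(earlier, later):
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1
    return months


def aggregate(rows, reference_date=END_OF_DATA):
    """Collapse item-level rows into one record per customer."""
    acc = defaultdict(lambda: {"revenue": 0.0, "profit": 0.0, "quantity": 0,
                               "order_ids": set(), "last_purchase": None})
    for cust, order_id, when, revenue, profit, qty in rows:
        rec = acc[cust]
        rec["revenue"] += revenue
        rec["profit"] += profit
        rec["quantity"] += qty
        rec["order_ids"].add(order_id)  # distinct orders, not order lines
        if rec["last_purchase"] is None or when > rec["last_purchase"]:
            rec["last_purchase"] = when
    return {
        cust: {
            "revenue": rec["revenue"],
            "profit": rec["profit"],
            "months": months_between(rec["last_purchase"], reference_date),
            "orders": len(rec["order_ids"]),
            "quantity": rec["quantity"],
        }
        for cust, rec in acc.items()
    }


customers = aggregate(transactions)
print(customers["C1"])
```

Counting distinct order IDs rather than rows keeps an order with several items from being counted as several purchases.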
3.3 Variables
In order to identify the meaningful customer segments for marketing purposes, we chose
to use the RFM model in the analysis. After aggregating the data, we calculated variables that
reflect each customer’s recency of last purchase, frequency of purchase, and monetary value,
and added them to the dataset. From the twenty-three available variables, we chose the five
that are most closely related to the RFM model, which are listed below:
Table 4. Five Chosen Variables
Variables    RFM          Explanation
Revenue      Monetary     The total revenue that the customer generated
Profit       Monetary     The total profit that the customer brought
Months       Recency      The number of months that have elapsed since the customer’s last purchase
Orders       Frequency    The number of the orders the customer made
Quantity     Frequency    The number of the items the customer purchased
We standardized the above five input variables as z-scores to minimize the influence of
different scales on the result.
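As a rough illustration of the standardization step, the sketch below converts each column to z-scores, z = (x - mean) / standard deviation. The input values are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption; the report does not say which form was used.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize one variable: subtract the mean, divide by the sample SD."""
    mu = mean(values)
    sigma = stdev(values)  # assumption: sample standard deviation (n - 1)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - mu) / sigma for v in values]


def standardize(rows):
    """Standardize every column of a list of equal-length numeric rows."""
    columns = list(zip(*rows))
    z_columns = [zscores(col) for col in columns]
    return [list(row) for row in zip(*z_columns)]


# Hypothetical customers: (revenue, orders). Revenue is on a far larger scale.
raw = [[100.0, 1], [200.0, 2], [300.0, 3]]
z_rows = standardize(raw)
print(z_rows)
```

After standardization both columns span the same range, so revenue no longer dominates the distance computations just because it is recorded in dollars.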
3.4 Analysis
3.4.1 Step 1: Hierarchical Cluster Analysis
We ran multiple Hierarchical Cluster Analyses to identify the initial seeds for the
K-Means Cluster Analysis. Since HCA requires extensive computational power, we chose to run it
on randomly selected samples of 10% of the actual data. In total, we created four random
subsets, each containing 10% of the entire dataset, to run the HCA; they are referred to as
Subset 1, Subset 2, Subset 3, and Subset 4.
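Drawing the four subsets could look like the following sketch. The report does not state whether the subsets were drawn independently or kept disjoint; this version assumes independent draws, so subsets may overlap. The seed value and the integer customer IDs are purely illustrative.

```python
import random


def draw_subsets(customer_ids, n_subsets=4, fraction=0.10, seed=7):
    """Draw `n_subsets` independent random samples, each `fraction` of the data.

    Sampling is without replacement inside a subset, but the subsets are
    drawn independently of each other, so they may share customers.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    size = int(len(customer_ids) * fraction)
    return [rng.sample(customer_ids, size) for _ in range(n_subsets)]


all_ids = list(range(1000))  # hypothetical customer IDs
subsets = draw_subsets(all_ids)
print([len(s) for s in subsets])
```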
To see which method would provide the most appropriate result, we used both Ward’s
Method and the Furthest Neighbor Method for each subset, with squared Euclidean distance as
the distance measure. After running HCA on Subset 1, we found that a 6-cluster solution is
more appropriate for Ward’s Method (Table 5), while a 5-cluster solution is more appropriate
for the Furthest Neighbor Method (Table 6).
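To make the Furthest Neighbor Method concrete, here is a minimal, unoptimized Python sketch of agglomerative clustering with complete linkage and squared Euclidean distance. It is an illustration under stated assumptions, not the statistical package's implementation: the points are hypothetical, the number of clusters is passed in rather than read from a dendrogram, and Ward's Method is not shown. The cluster means it returns play the role of the initial seeds.

```python
def sq_euclidean(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def furthest_neighbor(points, n_clusters):
    """Merge clusters until `n_clusters` remain (complete linkage).

    The distance between two clusters is the LARGEST distance between
    any pair of their members; the pair of clusters with the smallest
    such distance is merged at each step.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(sq_euclidean(points[p], points[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def cluster_means(points, clusters):
    """Mean of each cluster, usable as an initial K-Means seed."""
    dims = len(points[0])
    return [[sum(points[i][k] for i in c) / len(c) for k in range(dims)]
            for c in clusters]


# Hypothetical standardized customers in two dimensions.
pts = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
groups = furthest_neighbor(pts, 3)
seeds = cluster_means(pts, groups)
print(groups, seeds)
```

The nested loops recompute every pairwise distance at each merge, which is only practical for small samples; that cost is one reason to run HCA on a 10% subset rather than on all customers.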
Table 5 The 6-Cluster Solution in Ward’s Method (Subset 1)
Subset 1 Ward’s Method
1 2 3 4 5 6
Mean Mean Mean Mean Mean Mean
ZREVENUE 1.65744 0.19961 -0.13703 -0.28055 6.91241 38.72547
ZPROFIT 1.74353 0.22499 -0.14755 -0.28159 6.99852 32.35639
ZMONTHS -0.56402 -0.81831 0.86331 -0.86787 -0.82083 0.04509
ZORDERS 2.31266 0.38460 -0.20031 -0.36093 7.45791 11.64167
ZQUANTITY 2.03333 0.31712 -0.17605 -0.34999 8.07491 9.80997
Table 6 The 5-Cluster Solution in Furthest Method (Subset 1)
Subset 1 Furthest Method
1 2 3 4 5
Mean Mean Mean Mean Mean
ZREVENUE -0.01267 10.67675 12.21464 41.99720 35.45373
ZPROFIT -0.00956 10.32092 12.85875 32.41938 32.29339
ZMONTHS 0.01226 -0.89887 -1.04348 -1.50140 1.59158
ZORDERS -0.00117 20.28354 5.78440 23.64426 -0.36093
ZQUANTITY -0.01350 13.67901 16.57238 14.01545 5.60449
We conducted the same analysis on the other three subsets and obtained three more pairs of
HCA tables, in the same way as for Subset 1 (see Appendix A). After running all the HCA, we
had eight segmentation tables in total from the four randomly selected data subsets.
3.4.2 Step 2: K-Means Cluster Analysis
Based on Table 5 and Table 6, we created two new seeding data files, which include the
number of clusters and the initial cluster centers for the subsequent K-Means Cluster Analyses.
We then performed the K-Means Analyses on the entire customer-level sample (aggregated from
over 200,000 transaction cases) using the two seeding data files from Subset 1, and generated
two K-Means tables (see Table 7 and Table 8). K-Means Analyses were also conducted on 100% of
the data using the initial seeds from the other three subsets, which generated six more
K-Means tables (see Appendix B).
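The seeded K-Means step can be sketched as a plain Lloyd-style iteration that starts from given centers instead of random ones. This is a minimal illustration, not necessarily the procedure of the software the authors used; the points and seeds are hypothetical, and keeping the old center when a cluster becomes empty is a design choice of this sketch.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_from_seeds(points, seeds, max_iter=100):
    """K-Means that starts from the supplied cluster centers.

    Repeats two steps until the assignment stops changing:
    1) assign each point to its nearest center,
    2) move each center to the mean of its assigned points.
    """
    centers = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical standardized customers and two seeds taken from an HCA run.
data = [(0, 0), (0, 2), (10, 10), (10, 12)]
labels, centers = kmeans_from_seeds(data, seeds=[(1, 1), (9, 9)])
print(labels, centers)
```

Because the result depends on the starting centers, running the same data with seeds from different subsets and linkage methods can give different segmentations, which is why the report compares several solutions.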
Table 7 The 6-Cluster K-Means Solution (Initial Seeds from Subset 1)
K-Means Cluster (Subset 1: Ward's)
1 2 3 4 5 6
Mean Mean Mean Mean Mean Mean
REVENUE 1159.33 450.87 121.35 121.61 3717.28 38622.38
PROFIT 616.43 243.31 63.90 67.01 1880.86 20278.65
MONTHS 25 31 62 19 21 37
ORDERS 5 3 1 1 10 4
QUANTITY 12.67 5.34 1.62 1.60 31.64 165.00
COUNT 2277 11250 45947 40218 203 2
Table 8 The 5-Cluster K-Means Solution (Initial Seeds from Subset 1)
K-Means Cluster (Subset 1: Furthest)
1 2 3 4 5
Mean Mean Mean Mean Mean
REVENUE 129.79 626.74 136.07 38622.38 2303.81
PROFIT 68.35 337.02 74.92 20278.65 1189.97
MONTHS 62 29 19 37 22
ORDERS 1 3 1 4 7
QUANTITY 1.71 7.26 1.76 165.00 21.67
COUNT 47513 8360 43313 2 709
In the above tables, the 6-cluster solution (Table 7) appears more meaningful to us
than the other (Table 8), since it identifies a distinct segment of customers (Segment 6) with
high revenue (38622), high profit (20278) and a large quantity of items purchased (165). This
customer segment has the highest profitability, which distinguishes segment 6 from all the
other segments, so Gap should include it in the marketing plan. Thus, comparing the two
K-Means solutions with initial seeds from Subset 1, we chose the 6-cluster solution
(Solution 1).
We then made the same comparison within each pair of K-Means tables that used the initial
seeds from the same data subset, and obtained the final four choices (see Appendix C). We chose
the 5-cluster solution using Ward’s Method for the second subset pair (Solution 2), the
7-cluster solution using the Furthest Method for the third subset pair (Solution 3) and the
7-cluster solution using Ward's Method for the fourth subset pair (Solution 4).
3.4.3 Step 3: Choosing the Final Segmentation
Among the final four K-Means solutions, we first compared Solution 1 (Table 9) with
Solution 2 (Table 10).
Table 9 Solution 1 (Ward’s Method)
Solution 1
1 2 3 4 5 6
Mean Mean Mean Mean Mean Mean
REVENUE 1159.33 450.87 121.35 121.61 3717.28 38622.38
PROFIT 616.43 243.31 63.90 67.01 1880.86 20278.65
MONTHS 25 31 62 19 21 37
ORDERS 5 3 1 1 10 4
QUANTITY 12.67 5.34 1.62 1.60 31.64 165.00
COUNT 2277 11250 45947 40218 203 2
Table 10 Solution 2 (Ward’s Method)
Solution 2
1 2 3 4 5
Mean Mean Mean Mean Mean
REVENUE 535.99 126.17 129.05 1784.40 15364.78
PROFIT 288.89 66.44 71.09 931.61 7752.80
MONTHS 29 62 19 24 20
ORDERS 3 1 1 6 18
QUANTITY 6.33 1.67 1.68 17.54 89.45
COUNT 9907 46872 41866 1241 11
We found that Solution 1 performs better in identifying the meaningful customer
segments, since it identifies the most profitable customer segment (Segment 6), with an average
profit of 20,279, while Solution 2 fails to do so. In addition, Solution 1 divides the
customers with high profits more finely, as seen in segments 1, 5, and 6. In Solution 2, on
the other hand, the highly profitable customers are all gathered in Segment 4 (profit=931,
count=1241) and Segment 5 (profit=7753, count=11). In sum, we think Solution 1 is the more
appropriate segmentation.
We compared Solution 3 and Solution 4 through the same procedure. The two 7-cluster
solutions have similar segmentation results. However, we chose Solution 3, which uses the
Furthest Neighbor Method initial seeds, since we believe the Furthest Neighbor Method better
distinguishes the customer segments.
Finally, we compared Solution 1 (see Table 9) with Solution 3 (Table 11).
Table 11 Solution 3 (Furthest Method)
Solution 3
1 2 3 4 5 6 7
Mean Mean Mean Mean Mean Mean Mean
REVENUE 118.32 400.67 117.09 891.64 2114.33 6937.52 38622.38
PROFIT 62.28 216.55 64.49 477.20 1103.71 3406.11 20278.65
MONTHS 62 32 19 26 22 24 37
ORDERS 1 2 1 4 7 12 4
QUANTITY 1.58 4.79 1.54 10.08 21.41 42.95 165.00
COUNT 45246 11835 39091 3119 562 42 2
The tables show that segments 1, 2, 3, 4, and 6 from Solution 1 (Table 9) correspond,
respectively, to segments 4, 2, 1, 3, and 7 from Solution 3 (Table 11). In addition to these
segments, Solution 3 identifies more highly profitable customers in segment 5 (profit=1103,
months=22, count=562) and segment 6 (profit=3406, months=24, count=42), a total of 604
customers. In Solution 1, by contrast, there are only 203 customers in segment 5 (profit=1880,
months=21). The customer segments with high profit and higher recency are more valuable in
Gap’s marketing plan. Thus, based on the chosen RFM variables, we finally divided the customers
into seven groups using the initial seeds from the Furthest Method on the third subset.
4. Post-Hoc Analysis
Table 12 shows our final segmentation of Gap customers. We conducted five post-hoc
analyses on this 7-cluster solution: the Latest Purchase Analysis, the Purchase Frequency
and Volume Analysis, the Payment Method Analysis, the Channel Analysis, and the Geographic
Analysis.
Table 12 Final Segmentation
Final Solution
1 2 3 4 5 6 7
Mean Mean Mean Mean Mean Mean Mean
REVENUE 118.32 400.67 117.09 891.64 2114.33 6937.52 38622.38
PROFIT 62.28 216.55 64.49 477.20 1103.71 3406.11 20278.65
MONTHS 62 32 19 26 22 24 37
ORDERS 1 2 1 4 7 12 4
QUANTITY 1.58 4.79 1.54 10.08 21.41 42.95 165.00
COUNT 45246 11835 39091 3119 562 42 2
4.1 Latest Purchase Analysis (Recency)
Segment 1 (months=62) and segment 3 (months=19) identify the least recent and most
recent Gap customer groups, respectively (see Table 12). Nearly half of the customers in the
sample fall into segment 1 (count=45246), yet it is the least recent segment and generates
little profit. The customers in this segment have not shopped with Gap for just over five
years. Segment 3 (count=39091), in contrast, also has a large count and low profit, but it
identifies the most recent shoppers, the group of people who shopped at Gap within the last
two years. Segments 4 (months=26), 5 (months=22), and 6 (months=24) contain customers who
shopped at Gap around two years ago, making them the second most recent shoppers. The segments
with high recency are the ones on which the company should focus more marketing effort.
4.2 Purchase Frequency and Volume Analysis (Frequency)
Table 12 also shows the number of orders and the quantity of items purchased by each
segment during the period from December 2009 to September 2017. The table shows that the
customers in segment 6, one of the top two most profitable customer segments, are the most
frequent purchasers (12 orders), buying 43 items in the time frame. They are followed by
segment 5, the third most profitable segment, with 7 orders and 21 items in the time frame. In
addition, the most profitable segment (segment 7) bought the largest number of items (165
items); however, its purchase frequency (4 orders) is lower than that of segments 5 and 6. The
other segments ordered less: segment 1 (1 order and about 2 items), segment 2 (2 orders and
about 5 items), segment 3 (1 order and about 2 items) and segment 4 (4 orders and about 10
items).
4.3 Payment Method Analysis
Among the 14 payment methods used by Gap customers, we kept four major methods
(American Express, Discover, Mastercard and Visa) and recoded the other, less frequently used
methods into a new category named “Others.” Based on the data, nearly half of Gap’s customers
use Visa to make purchases. Mastercard, American Express and Discover are the customers’
second, third and fourth choices of payment method, respectively. The percentages of the
payment methods remain relatively constant across the segments (see Table 13).
Table 13 Payment Method Used by Gap Customers
Payment Method Structure
1 2 3 4 5 6 7
Count % Count % Count % Count % Count % Count % # %
AX 7847 17% 6163 16% 2247 19% 618 19% 146 24% 18 41% 0 0%
DI 2267 5% 2050 5% 674 6% 183 6% 22 4% 2 5% 0 0%
MC 12586 28% 9156 24% 3049 25% 826 25% 147 24% 10 23% 0 0%
VI 21153 47% 17931 46% 5388 45% 1457 44% 261 43% 12 27% 1 50%
OTHERS 1219 3% 3518 9% 696 6% 221 7% 26 4% 2 5% 1 50%
(AX=American Express; DI=Discover; MC=MasterCard; VI=Visa; Others= Amazon Pay Method; Bill Me Later; Diner’s Club;
Money Order; Multi Credits; Open Account; Personal Check; Prepaid Exchange; PayPal; Invalid CC Number)
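The recoding described in this section can be sketched as follows. The codes AX, DI, MC and VI follow the abbreviations under Table 13; the other codes ("PP", "BML") and the list of payments are hypothetical and serve only to show the mechanics.

```python
from collections import Counter

# The four major methods kept as separate categories (abbreviations as in Table 13).
MAJOR_METHODS = {"AX", "DI", "MC", "VI"}


def recode(method):
    """Keep a major payment method as is; fold every other method into OTHERS."""
    return method if method in MAJOR_METHODS else "OTHERS"


def payment_shares(methods):
    """Percentage of purchases per recoded payment method, rounded to integers."""
    counts = Counter(recode(m) for m in methods)
    total = sum(counts.values())
    return {m: round(100 * c / total) for m, c in counts.items()}


# Hypothetical payments for one segment; "PP" and "BML" are made-up codes.
payments = ["VI", "VI", "MC", "AX", "PP", "VI", "DI", "VI", "MC", "VI"]
shares = payment_shares(payments)
print(shares)
```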
4.4 Channel Analysis
Table 14 shows that each cluster has different preferences for purchasing
channels. Among all 7 clusters, the “app channel” generates the least customer traffic, and
its proportion of all purchases remains at a relatively low level. There is an inverse
relationship between the proportions of the “store channel” and the “website channel” across
the clusters. The percentage of store purchases generally increases from segment 1 (35%) to
segment 7 (100%), while the percentage of website purchases generally decreases from segment 1
(62%) to segment 7 (0%).
Table 14 Purchasing Channels of Gap Customers
CHANNEL
1 2 3 4 5 6 7
Count % # % # % # % # % # % # %
APP 1118 2% 1574 4% 584 5% 199 6% 26 4% 1 2% 0 0%
STORE 15834 35% 10818 28% 6496 54% 2227 67% 431 72% 34 77% 2 100%
WEBSITE 28120 62% 26426 68% 4974 41% 879 27% 145 24% 9 20% 0 0%
(# = count, % = percentage)
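The percentages in Table 14 can be reproduced from the counts with a short Python sketch. The counts are copied from the table, column by column; the function name and the integer rounding are choices of this sketch.

```python
# Counts from Table 14, one list entry per table column (1 to 7).
CHANNEL_COUNTS = {
    "APP":     [1118, 1574, 584, 199, 26, 1, 0],
    "STORE":   [15834, 10818, 6496, 2227, 431, 34, 2],
    "WEBSITE": [28120, 26426, 4974, 879, 145, 9, 0],
}


def channel_shares(counts):
    """Share of each channel within each column, as rounded percentages."""
    n_cols = len(next(iter(counts.values())))
    totals = [sum(counts[ch][i] for ch in counts) for i in range(n_cols)]
    return {
        ch: [round(100 * v / t) for v, t in zip(vals, totals)]
        for ch, vals in counts.items()
    }


shares_by_channel = channel_shares(CHANNEL_COUNTS)
for channel, pct in shares_by_channel.items():
    print(channel, pct)
```

Because the app share stays in the low single digits, the store and website shares have to sum to nearly 100% in every column, which is what produces the inverse relationship between the two.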
4.5 Geographic Analysis (Zip Codes)
We first wanted to look at the geographical locations of the most profitable customers.
For Segment 6, we found that the customers are scattered across the entire country (see
Figure 4). For Segment 7, our most profitable segment, the two customers were from
Mt. Vernon, Illinois and Conroe, Texas (see Appendix D).
Figure 4 Map of the Segment 6
Since most of our final segments contain a very large number of zip codes, we decided to use
Zip Code Demographic Data from the U.S. Census to analyze these geographical locations in more
depth. For each zip code, this data provides the population, the average income, the median
income, the average age, and the average female percentage. Using the Match Files syntax
command, the U.S. Census data was appended to the existing SPSS file. The zip codes were
matched using the “Tables” subcommand, which treats the Census data as a lookup table so that
the same zip code can be matched to multiple customers.
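The matching step is a many-to-one lookup: several customers can share one zip code, and each receives the same demographic row. The Python sketch below illustrates the idea only; it is not the SPSS syntax the authors ran, and every zip code, field name and value in it is hypothetical.

```python
def attach_zip_demographics(customers, zip_table):
    """Append zip-level demographic fields to each customer record.

    `zip_table` acts as a lookup table keyed by zip code, so the same
    zip code can be matched to any number of customers. Customers whose
    zip code is missing from the table are kept without the extra fields.
    """
    merged = []
    for cust in customers:
        demographics = zip_table.get(cust["zip"], {})
        merged.append({**cust, **demographics})
    return merged


# Hypothetical customer-level rows and zip-level demographic data.
customer_rows = [
    {"id": "C1", "segment": 5, "zip": "11111"},
    {"id": "C2", "segment": 3, "zip": "11111"},
    {"id": "C3", "segment": 1, "zip": "99999"},
]
zip_demographics = {
    "11111": {"population": 20000, "mean_income": 80000, "median_age": 38.0},
}

enriched = attach_zip_demographics(customer_rows, zip_demographics)
print(enriched)
```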
Through the analysis, we found that the mean income is the most scattered measure (see
Appendix D). Given that the mean household income according to the U.S. Census Bureau is
72,641, all of the segments except Segment 7 were above the mean household income. Segment 7
also happens to be the most profitable segment, which could potentially mean that its
customers are “aspirational buyers,” but since the segment contains only two customers, our
understanding of the group remains unclear. Through our analysis, we were able to see that
every segment except Segment 6 was from an upper-middle class area.
5. Discussions and Recommendations
5.1 Overview
Through cluster analysis, we divided the customers into 7 segments. Segment 1 generated
the lowest profit (profit=62.28), bought one of the smallest quantities of items
(quantity=1.58) and had the lowest recency (months=62). It would cost a great amount of
resources to implement strategies for customers in segment 1, so we named it the “Lost
Customers”. In segment 2, each customer generates about $216.55 in profit and buys 4.79 items
on average, yet the recency of those customers is low (months=32). We named segment 2 the
“Hibernating Customers” since there is a long gap since their last purchase. Segment 3, the
“New Customers”, has the highest recency among all segments. Segment 4 represents the
“Customers that Need Attention”: although it has relatively high profit and quantity, its
recency is relatively low (months=26). Segments 5, 6, and 7 are our highly valuable customers
that bring the most profit to Gap. Although segment 5 has high profit and quantity, they are
not as high as those of segments 6 and 7, so we named it the “Potential Loyalist”. Segments 6
and 7 generate the highest profits and the highest volumes of purchased products. As a result,
we grouped segment 6 and segment 7 together and named them the “Champion Customers”.
5.2 Recommendations
5.2.1 Segment 3 (Revenue=117, Profit=64, Month=19, Order=1, Quantity=2, count=39091)
This is a meaningful segment to focus on, as it represents the largest group of the most
recent purchasers. Segment 3 is the “New Customer” segment. Customers who have purchased
from a company recently are more likely to buy from that company again than customers who
have not shopped there for a while. Gap should build on this recent activity in order to
retain these customers for more purchases. We recommend the following actions to attract and
retain this segment:
1) Collect feedback from these customers on their recent purchase to avoid any post-purchase
dissonance and to understand their intent for future purchases.
2) Send customers special promotion codes by email or mail to encourage them to
shop more in-store or online.
3) Offer special discounts on their second and third purchases from Gap. Customers can
receive a discount on their next purchase if it is made within 6 months of their first purchase.
4) Provide Gap memberships which customers can use to collect points with every purchase
and redeem the points for a monetary discount once they reach a certain value.
5.2.2 Segment 5 (Revenue=2114, Profit=1104, Month=22, Order=7, Quantity=21, count=562)
Segment 5, the “Potential Loyalist”, is also a meaningful segment that is worth investing in.
This segment identifies a relatively recent and highly profitable body of shoppers. On average,
the customers in this segment have shopped with Gap in the past two years, and have placed
around 7 orders for 21 items. We recommend the following actions to increase their loyalty:
1) Provide a “Silver Card” membership program for customers in this segment with special
privileges, such as sale previews, free standard shipping on all orders, and
a birthday discount (extra 10% off) on purchases.
2) Create a Gap community especially for this segment and organize activities for these
customers, inviting them to events such as “Yoga with Gap”, “Gap Street Snap Competition” and
“Gap Runway Show”.
5.2.3 Segment 6 & 7
Segments 6 and 7, named the “Champion Customers”, are the most valuable customers,
because they generate the most profit for Gap. However, they have relatively low recency
compared with other segments. Along with providing benefits similar to those given to the
previous two segments, we recommend the following additional actions to encourage more purchases:
1) Provide a platinum membership to this group, which they can upgrade to a gold card
depending on future purchases. The points collected on the card can be exchanged for
coupons and discounts. Additionally, all platinum members will be sent the latest look
book and have the privilege of buying products before seasonal new product launches.
2) Offer customization options for special use in certain stores. Customers can print their
own logo or slogan on clothes if they buy many products at once.
5.2.4 Payment Method & Shopping Channel
In terms of payment method, Visa is the most popular: nearly
50% of customers use it to shop at Gap. Gap can take advantage of this trend by introducing more
promotional and discount campaigns for Gap Visa card holders and potential credit card openers.
With regard to channel choice, the percentage of app usage remains relatively low,
so Gap should encourage more app usage in the future. Combining the
in-store experience with the app would create a livelier experience for the consumer. For
example, the consumer could scan codes or take photos with the app to get discount codes
or in-store perks.
6. Limitations and Future Research
We recognize several limitations of this research and identify directions for further
research. For this research, we used only Gap’s internal data, namely recency, frequency,
and monetary value, to segment Gap customers. Future research could include more
customer data from social media platforms, syndicated data, and other external sources.
Future researchers could also focus on the psychological factors that affect customers’ purchasing
behavior. Combining the RFM model with psychological data analysis could provide
a more comprehensive understanding of customer needs. With the RFM model, researchers can
rank customer segments from the least profitable to the most profitable, while further
consumer behavior research could help them identify more effective and
practical strategies. Additionally, certain moderating factors, such as store
location, may influence the RFM variables (recency and frequency of purchase). Gap should
study these factors in more detail by conducting a consumer survey, in order to build strategies
around those that prove valuable.
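As an illustration of the RFM variables discussed above, the following minimal Python sketch derives recency, frequency, and monetary value per customer from order records. The record layout (customer ID, order date, revenue), the function name `rfm_table`, and the sample orders are hypothetical and are not taken from Gap’s data; recency is counted in whole calendar months, which is only one possible convention.

```python
from collections import defaultdict
from datetime import date

def rfm_table(orders, as_of):
    """Summarize orders into recency, frequency, and monetary value per customer.

    orders: iterable of (customer_id, order_date, revenue) tuples (hypothetical layout).
    as_of:  reference date from which recency is measured.
    """
    last_order = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, order_date, revenue in orders:
        frequency[customer_id] += 1
        monetary[customer_id] += revenue
        if customer_id not in last_order or order_date > last_order[customer_id]:
            last_order[customer_id] = order_date
    table = {}
    for customer_id, last in last_order.items():
        # Recency: calendar months between the last order and the reference date.
        months = (as_of.year - last.year) * 12 + (as_of.month - last.month)
        table[customer_id] = {
            "recency_months": months,
            "frequency": frequency[customer_id],
            "monetary": round(monetary[customer_id], 2),
        }
    return table

# Made-up orders for two customers, for illustration only.
orders = [
    ("A", date(2016, 1, 15), 120.0),
    ("A", date(2017, 6, 3), 80.5),
    ("B", date(2014, 11, 20), 45.0),
]
rfm = rfm_table(orders, as_of=date(2017, 11, 15))
```

External variables such as store location could be added as extra fields on each record and compared across the resulting RFM groups.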
REFERENCES
Dudovskiy. (2016, October 23). Gap Inc. SWOT Analysis: Declining Sales and Profits Despite
Strong Brand Portfolio. Retrieved November 15, 2017, from
https://research-methodology.net/gap-inc-swot-analysis/
Fast Retailing. (2016, August 31). Fast retailing annual report 2016. Retrieved from
https://www.fastretailing.com/eng/ir/library/pdf/ar2016_en.pdf
GAP. (2016). GAP annual report 2015. Retrieved from
http://www.gapinc.com/content/dam/gapincsite/documents/GPS%202015%20Annual%20Report.pdf
Google Trends. (2017). Explore search interest for Gap Inc. by time, location and popularity on
Google Trends. Retrieved November 15, 2017, from
https://trends.google.com/trends/explore?date=all&geo=US&q=%2Fm%2F01yfp7
Monllos, K. (2015, June 17). The Gap's Biggest Problem Is That It Lost Its Brand Identity.
Retrieved November 15, 2017, from
http://www.adweek.com/brand-marketing/gaps-biggest-problem-it-lost-its-brand-identity-165367/
Putler. (2017, August 10). RFM Analysis For Successful Customer Segmentation. Retrieved
November 15, 2017, from https://www.putler.com/rfm-analysis/
Reuters. (2017). Gap Inc (GPS) Company Profile. Retrieved November 15, 2017, from
http://www.reuters.com/finance/stocks/companyProfile/GPS
Sample Essay on SWOT Analysis of GAP Inc. (n.d.). Retrieved from
http://www.essaysexperts.net/blog/sample-essay-on-swot-analysis-of-gap-inc/#sthash.V8aDy8DP.X7o2dhoh.dpbs
Trefis Team. (2015, July 15). Gap Inc Is Gradually Losing Its Share In The U.S. Apparel Market
To Fast-Fashion Counterparts. Retrieved November 15, 2017, from
https://www.forbes.com/sites/greatspeculations/2015/07/15/gap-inc-is-gradually-losing-its-share-in-the-u-s-apparel-market-to-fast-fashion-counterparts/#7c12f7ddb0e1
The Gap, Inc. - Financial and Strategic Analysis Review. (n.d.). N.p.: Business Insights.
Retrieved from
http://bi.galegroup.com.avoserv2.library.fordham.edu/essentials/showpdf?pdfdocid=303747_GDRT29527FSA
Wahba, P. (n.d.). Gap brand's sales declines keep getting worse. Retrieved from
http://fortune.com/2015/05/11/gap-sales-declines-worse/
Appendix A: HCA Tables
Table 1 Subset 2- Ward’s Method
Subset 2 Ward’s Method
1 2 3 4 5
Mean Mean Mean Mean Mean
ZREVENUE 0.94205 -0.19971 -0.14097 4.42495 30.06372
ZPROFIT 0.97994 -0.21004 -0.13650 4.46652 26.43433
ZMONTHS -0.20332 0.79516 -0.92219 -0.46824 0.38652
ZORDERS 1.41663 -0.24457 -0.14724 3.61405 3.96001
ZQUANTITY 1.46958 -0.23597 -0.20537 4.40964 10.06230
Table 2 Subset 2- Furthest Neighbor Method
Subset 2 Furthest Method
1 2 3 4 5
Mean Mean Mean Mean Mean
ZREVENUE -0.04674 5.89913 7.91898 29.30347 32.34450
ZPROFIT -0.04559 6.08823 7.11438 25.07690 30.50664
ZMONTHS -0.01019 -0.74564 -0.03391 1.02922 -1.54157
ZORDERS -0.03848 7.81862 1.17540 0.91935 13.08198
ZQUANTITY -0.04419 7.78511 1.99338 4.70732 26.12723
Table 3 Subset 3- Ward’s Method
Subset 3 Ward’s Method
1 2 3 4 5 6
Mean Mean Mean Mean Mean Mean
ZREVENUE 1.61787 -0.27738 -0.15669 0.28409 9.68240 127.35973
ZPROFIT 1.67043 -0.29111 -0.15252 0.28549 9.11473 130.10660
ZMONTHS -0.69649 0.92380 -0.88315 0.61350 -0.81577 0.14551
ZORDERS 2.39766 -0.36093 -0.17674 0.38067 4.87055 1.55949
ZQUANTITY 2.24517 -0.33771 -0.20573 0.40711 8.45841 71.20998
Table 4 Subset 3- Furthest Neighbor Method
Subset 3 Furthest Method
1 2 3 4 5 6 7
Mean Mean Mean Mean Mean Mean Mean
ZREVENUE -0.05878 7.40703 2.39440 10.86609 21.18797 35.45373 127.35973
ZPROFIT -0.05976 7.37253 2.50112 10.68498 15.20536 32.29339 130.10660
ZMONTHS 0.01557 -0.80631 -1.01619 -1.47462 -1.13989 1.59158 0.14551
ZORDERS -0.06972 3.39641 5.81495 17.88302 4.92021 -0.36093 1.55949
ZQUANTITY -0.05564 7.65237 3.19972 15.80979 8.12778 5.60449 71.20998
Table 5 Subset 4- Ward’s Method
Subset 4 Ward’s Method
1 2 3 4 5 6 7
Mean Mean Mean Mean Mean Mean Mean
ZREVENUE -0.22654 -0.18702 0.38868 1.03357 3.75694 8.39797 32.34450
ZPROFIT -0.24046 -0.18219 0.38853 1.11095 3.77223 8.29956 30.50664
ZMONTHS 0.89809 -0.86659 0.45040 -0.78704 -0.68087 -0.90557 -1.54157
ZORDERS -0.36093 -0.25660 0.61904 1.97579 4.26777 8.70770 13.08198
ZQUANTITY -0.29427 -0.21981 0.39961 1.60970 3.64912 9.49222 26.12723
Table 6 Subset 4- Furthest Neighbor Method
Subset 4 Furthest Method
1 2 3 4 5 6
Mean Mean Mean Mean Mean Mean
ZREVENUE -0.04825 7.65181 5.41134 9.59474 32.34450 13.97667
ZPROFIT -0.04634 7.62888 5.17289 9.58430 30.50664 12.20633
ZMONTHS 0.01684 -0.78088 -0.11902 -0.99260 -1.54157 -1.42107
ZORDERS -0.02678 7.26072 0.78664 15.32246 13.08198 -0.36093
ZQUANTITY -0.04029 8.92682 1.52620 15.24906 26.12723 -0.45140
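The tables above report cluster means of standardized (Z) variables from hierarchical cluster analysis. As a rough illustration of how such means can be produced, the minimal Python sketch below standardizes each variable and runs a naive furthest neighbor (complete linkage) clustering. It does not implement Ward’s method, it is not the statistical package used in this report, and the five sample rows (revenue, profit, months, orders, quantity) are made up; the use of the sample standard deviation is also an assumption.

```python
import math

def zscore_columns(rows):
    """Standardize each column to mean 0 and sample standard deviation 1."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def furthest_neighbor_clusters(points, k):
    """Naive complete-linkage agglomerative clustering; returns k lists of row indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

def cluster_means(points, clusters):
    """Mean of every variable within each cluster (usable as k-means seeds)."""
    dims = len(points[0])
    return [[sum(points[i][d] for i in c) / len(c) for d in range(dims)]
            for c in clusters]

# Made-up sample: revenue, profit, months, orders, quantity.
sample = [
    [120.0, 60.0, 60, 1, 2],
    [130.0, 70.0, 58, 1, 2],
    [110.0, 55.0, 62, 1, 1],
    [2100.0, 1100.0, 22, 7, 21],
    [2300.0, 1200.0, 20, 8, 23],
]
z = zscore_columns(sample)
clusters = furthest_neighbor_clusters(z, k=2)
seeds = cluster_means(z, clusters)
```

The naive search is cubic in the number of rows, so it is only practical on small subsets of the data.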
Appendix B: K-Means Tables
The K-means tables below use initial seeds from Subset 2, Subset 3, and Subset 4.
Table 1 The 5-Cluster K-Means Solution (Subset 2-Ward’s)
K-means Cluster ( Subset 2: Ward’s)
1 2 3 4 5
Mean Mean Mean Mean Mean
REVENUE 535.99 126.17 129.05 1784.40 15364.78
PROFIT 288.89 66.44 71.09 931.61 7752.80
MONTHS 29 62 19 24 20
ORDERS 3 1 1 6 18
QUANTITY 6.33 1.67 1.68 17.54 89.45
COUNT 9907 46872 41866 1241 11
Table 2 The 5-Cluster K-Means Solution (Subset 2: Furthest)
K-means Cluster ( Subset 2: Furthest)
1 2 3 4 5
Mean Mean Mean Mean Mean
REVENUE 129.79 626.74 136.07 2303.81 38622.38
PROFIT 68.35 337.02 74.92 1189.97 20278.65
MONTHS 62 29 19 22 37
ORDERS 1 3 1 7 4
QUANTITY 1.71 7.26 1.76 21.67 165.00
COUNT 47513 8360 43313 709 2
Table 3 The 6-Cluster K-Means Solution (Subset 3: Ward’s)
K-means Cluster (Subset 3: Ward’s)
1 2 3 4 5 6
Mean Mean Mean Mean Mean Mean
REVENUE 1159.33 121.35 121.61 450.87 3717.28 38622.38
PROFIT 616.43 63.90 67.01 243.31 1880.86 20278.65
MONTHS 25 62 19 31 21 37
ORDERS 5 1 1 3 10 4
QUANTITY 12.67 1.62 1.60 5.34 31.64 165.00
COUNT 2277 45947 40218 11250 203 2
Table 4 The 7-Cluster K-Means Solution (Subset 3: Furthest)
K-means Cluster (Subset 3: Furthest)
1 2 3 4 5 6 7
Mean Mean Mean Mean Mean Mean Mean
REVENUE 118.32 400.67 117.09 891.64 2114.33 6937.52 38622.38
PROFIT 62.28 216.55 64.49 477.20 1103.71 3406.11 20278.65
MONTHS 62 32 19 26 22 24 37
ORDERS 1 2 1 4 7 12 4
QUANTITY 1.58 4.79 1.54 10.08 21.41 42.95 165.00
COUNT 45246 11835 39091 3119 562 42 2
Table 5 The 7-Cluster K-Means Solution (Subset 4: Ward’s)
K-means Cluster (Subset 4: Ward’s)
1 2 3 4 5 6 7
Mean Mean Mean Mean Mean Mean Mean
REVENUE 117.65 115.97 392.91 864.41 2053.97 6807.42 38622.38
PROFIT 61.92 63.90 212.27 463.10 1073.45 3342.76 20278.65
MONTHS 62 19 32 26 22 23 37
ORDERS 1 1 2 4 7 12 4
QUANTITY 1.57 1.53 4.70 9.78 20.98 43.11 165.00
COUNT 45072 38818 12054 3305 602 44 2
Table 6 The 6-Cluster K-Means Solution (Subset 4: Furthest)
K-means Cluster (Subset 4: Furthest)
1 2 3 4 5 6
Mean Mean Mean Mean Mean Mean
REVENUE 122.83 474.71 123.84 4562.60 38622.38 1294.70
PROFIT 64.68 256.23 68.23 2306.44 20278.65 684.10
MONTHS 62 30 19 21 37 24
ORDERS 1 3 1 10 4 5
QUANTITY 1.64 5.61 1.62 35.57 165.00 13.96
COUNT 46255 10878 40715 121 2 1926
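The solutions above come from K-means runs that start from seeds taken from the hierarchical subsets. The minimal Python sketch below shows the general idea of seeded K-means (Lloyd's algorithm with user-supplied initial centers). The function name `kmeans_from_seeds`, the two-variable points, and the seeds are hypothetical, and the sketch does not reproduce the exact procedure or software settings used in this report.

```python
import math

def kmeans_from_seeds(points, seeds, max_iter=100):
    """Lloyd's k-means starting from the given seed centroids.

    Returns (labels, centroids); labels[i] is the cluster index of points[i].
    """
    centroids = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the solution has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Made-up two-variable example with two seeds.
points = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.2, 4.9], [0.1, 0.3]]
seeds = [[0.0, 0.0], [4.0, 4.0]]
labels, centroids = kmeans_from_seeds(points, seeds)
```

Because the result depends on the starting centers, running the same data with different seeds can give different cluster counts and sizes, as the six tables in this appendix show.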
Appendix C: Final Four Solutions
Table 1 K-Means Solution 1
Solution 1
1 2 3 4 5 6
Mean Mean Mean Mean Mean Mean
REVENUE 1159.33 450.87 121.35 121.61 3717.28 38622.38
PROFIT 616.43 243.31 63.90 67.01 1880.86 20278.65
MONTHS 25 31 62 19 21 37
ORDERS 5 3 1 1 10 4
QUANTITY 12.67 5.34 1.62 1.60 31.64 165.00
COUNT 2277 11250 45947 40218 203 2
Table 2 K-Means Solution 2
Solution 2
1 2 3 4 5
Mean Mean Mean Mean Mean
REVENUE 535.99 126.17 129.05 1784.40 15364.78
PROFIT 288.89 66.44 71.09 931.61 7752.80
MONTHS 29 62 19 24 20
ORDERS 3 1 1 6 18
QUANTITY 6.33 1.67 1.68 17.54 89.45
COUNT 9907 46872 41866 1241 11
Table 3 K-Means Solution 3
Solution 3
1 2 3 4 5 6 7
Mean Mean Mean Mean Mean Mean Mean
REVENUE 118.32 400.67 117.09 891.64 2114.33 6937.52 38622.38
PROFIT 62.28 216.55 64.49 477.20 1103.71 3406.11 20278.65
MONTHS 62 32 19 26 22 24 37
ORDERS 1 2 1 4 7 12 4
QUANTITY 1.58 4.79 1.54 10.08 21.41 42.95 165.00
COUNT 45246 11835 39091 3119 562 42 2
Table 4 K-Means Solution 4
Solution 4
1 2 3 4 5 6 7
Mean Mean Mean Mean Mean Mean Mean
REVENUE 117.65 115.97 392.91 864.41 2053.97 6807.42 38622.38
PROFIT 61.92 63.90 212.27 463.10 1073.45 3342.76 20278.65
MONTHS 62 19 32 26 22 23 37
ORDERS 1 1 2 4 7 12 4
QUANTITY 1.57 1.53 4.70 9.78 20.98 43.11 165.00
COUNT 45072 38818 12054 3305 602 44 2
Appendix D: Geographic Analysis
Figure 1. The Map of Segment 7
Table 1 The Demographic Information of the Final K-Means Segmentation
Final Cluster  Population in Zip Code  Mean Income  Median Income  Average Age in Zip Code  Female percent  White percent
1              26,358.66               $92,212.70   $71,484.54     39.423                   51.1%           78.6%
2              25,806.89               $91,291.07   $70,331.39     39.784                   51.1%           78.6%
3              25,956.50               $89,054.38   $69,230.06     39.484                   51.1%           78.4%
4              25,474.86               $92,658.61   $70,819.11     39.727                   51.1%           78.1%
5              24,874.73               $98,961.14   $74,112.55     40.260                   51.2%           77.7%
6              26,831.28               $99,722.23   $74,611.73     39.799                   51.6%           78.6%
7              22,711.00               $66,574.44   $49,861.57     40.105                   51.9%           84.3%
Total          26,099.66               $90,915.90   $70,456.50     39.504                   51.1%           78.5%
[Figure: Summary of the final Gap customer segments. Managerial goal: increase sales.]
Segment  Name                               PROFIT    MONTHS  Quantity of items
1        “Lost Customers”                   62.28     62      1 - 2
2        “Hibernating Customers”            216.55    32      4 - 5
3        “New Customers”                    64.49     19      1 - 2
4        “Customers That Need Attention”    477.2     26      10
5        “Potential Loyalist”               1103.7    22      21 - 22
6        “Champion Customers”               3406.1    24      42 - 43
7        “Champion Customers”               20278.7   37      165
Recommendations shown: ignore; take feedback; send promos; promote app usage; membership with
offers; premium membership & exclusive offers.
Preferred method of purchase: in-store or web. Preferred method of payment: Visa.