Building customer segments using Principal Component Analysis (PCA)

A very common approach to building and understanding customer segments is to combine a dimensionality-reduction technique such as Principal Component Analysis (PCA) with a cluster analysis. PCA condenses correlated variables into a few composite axes, which makes it easier to see whether customers tend to cluster by certain features, or combinations of features. Through such an approach, a marketer can use clusters to define specific segments. For example, an analysis could end up showing two clusters: one with customers who have high values for variables related to “engagement” (e.g., emails, comments, etc.), and another with lower values for engagement variables but mid-sized values for purchase-related variables (e.g., number of purchases, number of products, etc.). In this case, the marketer can conclude that two segments, “Engaged Leads” and “Slightly Engaged Purchasers”, exist within the customer base.

The goal of this tutorial is not to discuss PCA in detail, but rather to show a basic example of testing whether clusters of engaged or purchase-oriented customers exist. Numerous tutorials already exist for those interested in running in-depth clustering algorithms; here, we show how you can perform a simple version of this analysis in the Canopy Labs R Console.

Step #1: define what customers do

Begin your analysis by defining how you’ll track customer actions and activities. The Canopy Labs platform already tracks customer engagements, comments, and purchases, so we will define a set of variables around purchasing and engagement. For this example, we define the following variables:

  • Total Spend: how much a customer has spent.
  • Average Time Between Purchases: average number of days between purchases.
  • Average Unit Price: average amount spent per product.
  • Total Products: number of products bought.
  • Number of Engagements: number of times a customer has engaged with the company.
  • Sentiment Score: a customer’s sentiment score when leaving feedback or sending emails.

In most cases, this part of the analysis should track as many relevant variables as possible, since PCA will later condense correlated variables into a small number of components.
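As a rough illustration of how the six variables above could be derived for a single customer, here is a minimal sketch in plain Python (the post's own sample code targets the R console and is not reproduced here). The `Purchase` record, the `customer_features` helper, and the choice to average sentiment scores are all assumptions made for this example, not part of the Canopy Labs platform.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class Purchase:
    """Hypothetical purchase record; field names are assumptions."""
    day: date        # date of the order
    quantity: int    # number of products in the order
    amount: float    # total amount paid for the order


def customer_features(purchases, sentiment_scores, n_engagements):
    """Return the six Step #1 variables for one customer as a dict."""
    ordered = sorted(purchases, key=lambda p: p.day)
    total_spend = sum(p.amount for p in ordered)
    total_products = sum(p.quantity for p in ordered)
    # Gaps in days between consecutive purchases (empty with < 2 purchases).
    gaps = [(b.day - a.day).days for a, b in zip(ordered, ordered[1:])]
    return {
        "total_spend": total_spend,
        "avg_days_between_purchases": mean(gaps) if gaps else 0.0,
        "avg_unit_price": total_spend / total_products if total_products else 0.0,
        "total_products": total_products,
        "n_engagements": n_engagements,
        # Assumption: one sentiment value per message, averaged per customer.
        "sentiment_score": mean(sentiment_scores) if sentiment_scores else 0.0,
    }


example = customer_features(
    purchases=[
        Purchase(date(2014, 1, 1), 2, 40.0),
        Purchase(date(2014, 1, 11), 1, 20.0),
        Purchase(date(2014, 1, 31), 1, 30.0),
    ],
    sentiment_scores=[0.5, 1.0],
    n_engagements=7,
)
```

Running the helper over every customer gives one row per customer and one column per variable, which is the table shape the next step expects.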

Step #2: run the clustering

Once the variables are defined, use your statistical tool to run PCA on them and see whether certain variables or groups of variables help define clusters. This can be done using the R console, and we provide sample code for the variables above.
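To make the computation concrete, the following is a dependency-free Python sketch of PCA: standardize each variable, build the covariance matrix, and extract the two leading directions by power iteration. It is a teaching sketch under stated assumptions, not the sample code referred to above; all function names are made up for this example. In R itself, the built-in `prcomp` function would normally do this work.

```python
import math


def standardize(rows):
    """Rescale each column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / (len(col) - 1))
        for col, m in zip(columns, means)
    ]
    if any(s == 0 for s in stds):
        raise ValueError("a constant column cannot be standardized")
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def covariance(rows):
    """Covariance matrix of rows whose columns are already centered."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def principal_axes(cov, k=2, iterations=1000):
    """Approximate the k leading eigenvectors of cov by power iteration."""
    d = len(cov)
    axes = []
    for c in range(k):
        vec = [float(i + c + 1) for i in range(d)]  # arbitrary fixed start
        for _ in range(iterations):
            vec = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
            for prev in axes:  # strip out directions already found
                dot = sum(a * b for a, b in zip(vec, prev))
                vec = [a - dot * b for a, b in zip(vec, prev)]
            norm = math.sqrt(sum(v * v for v in vec))
            if norm == 0:
                raise ValueError("data has fewer than k independent directions")
            vec = [v / norm for v in vec]
        axes.append(vec)
    return axes


def pca_scores(rows, k=2):
    """Return (scores, axes): per-customer pc1..pck values and the loadings."""
    z = standardize(rows)
    axes = principal_axes(covariance(z), k)
    scores = [[sum(a * v for a, v in zip(axis, row)) for axis in axes]
              for row in z]
    return scores, axes
```

Each customer's two scores can then be used as x and y coordinates in a scatter plot. Standardizing first matters here because the variables are on very different scales (dollars, days, counts), and an unscaled PCA would be dominated by whichever variable has the largest raw variance.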

The plot below shows how two new composite variables (the principal components, pc1 and pc2) built from the data above can yield loose clusters of customers. The actual clusters are circled in red.

[Figure: segmentation-example]

Before launching a full campaign, you should still test and ensure that the clusters are significant and that the variables actually explain customer behavior.
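One informal way to check whether circled clusters are genuinely separated is the mean silhouette coefficient, a standard clustering diagnostic. The sketch below is an assumption about how such a check might look in plain Python, not the test used for the plot above; `points` would be each customer's (pc1, pc2) pair and `labels` whichever cluster you assigned them to.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points.

    Values near 1 suggest tight, well-separated clusters; values near 0
    or below suggest the clusters overlap or are poorly assigned.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        scale = max(a, b)
        if scale > 0:
            total += (b - a) / scale
    return total / len(points)
```

A score like this only describes geometric separation. It does not show that the segments respond differently to a campaign, which still has to be tested with real customers.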

Step #3: name your segments and reach out

By analyzing the clusters above and the variables behind them, we can learn what distinguishes each cluster. Note that standard PCA and clustering tutorials would focus here on the components themselves, examining which variables contribute most to each one.

By analyzing the clusters above, we see that the cluster on the left represents people who do not make many purchases but are very engaged: the “Engaged” customers. The cluster on the right represents people who spent a relatively large amount and are also somewhat engaged: the “Spenders”. Note that the axes in this chart are composites of the variables we listed in Step #1. In this case, pc1 is an index of engagement based on the engagement variables we analyzed, while pc2 is an index of purchasing. The left cluster sits near 0 on pc2, which is why it represents customers who rarely make purchases.
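Reading an axis as “an index of engagement” comes down to checking which variables carry the largest weights (loadings) on that component. The helper below is a small Python sketch of that check; `describe_component` is a hypothetical name, and the loadings shown are made-up numbers for illustration, not values from the analysis above.

```python
VARIABLES = [
    "total_spend",
    "avg_days_between_purchases",
    "avg_unit_price",
    "total_products",
    "n_engagements",
    "sentiment_score",
]


def describe_component(loadings, names=VARIABLES, top=3):
    """Return the `top` (name, loading) pairs with the largest absolute loading."""
    ranked = sorted(zip(names, loadings), key=lambda pair: abs(pair[1]),
                    reverse=True)
    return ranked[:top]


# Made-up loadings in which the two engagement variables dominate.
illustrative_pc = [0.05, -0.10, 0.02, 0.08, 0.70, 0.69]
strongest = describe_component(illustrative_pc, top=2)
```

With loadings like these, the two strongest variables are the engagement count and the sentiment score, which is the pattern that would justify calling the component an engagement index.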

Already, with a relatively simple analysis and a small amount of scripting, we begin to uncover a market segmentation. By adding more variables and exploring additional clusters, we could generate more segments and ideas around them. If you are interested in using similar strategies around your data, let us know.




Written by Wojciech Gryc

Wojciech Gryc is the CEO of Canopy Labs. Prior to Canopy Labs, Wojciech was a consultant with McKinsey & Co. and a researcher at IBM Research. Wojciech is a Rhodes Scholar and Loran Scholar.
