Predicting Customer Profitability Dynamically over Time: An Experimental Comparative Study

In this paper, a comparative study is presented on the dynamic prediction of customer profitability over time. Customer profitability is measured using the Recency, Frequency, and Monetary (RFM) model. A real transactional data set collected from a UK-based online retailer is examined, and a monthly RFM time series is generated for each customer of the business. At each time point, the customers are segmented into high, medium, or low profitability groups by applying k-means clustering to their RFM values. Twelve different models, including regression, multilayer perceptron, and Naïve Bayesian models in open-loop and closed-loop modes, are used to predict how a customer's membership of a profitability group evolves over time. The experimental results demonstrate good, consistent, and interpretable predictability of the RFM time series of interest.


Introduction
Over the last decades, marketing has shifted significantly from a traditional product- or brand-based approach to a customer-centric, data-driven one built on the intensive use of analytical models and tools. An important application of analytics in marketing is predicting customer profitability over time from a customer's purchase history and a chosen profitability measure, such as customer lifetime value (CLV) or recency, frequency, and monetary (RFM) values. Modelling techniques fall mainly into two categories [1]: probabilistic models and machine learning models. A fundamental question is whether a customer's profitability is predictable at all, and which models are best suited to a given prediction problem [2]. The primary aim of this research is to provide a case study of such predictions.
In this paper, a UK-based online retailer is examined for customer profitability prediction, using a real transactional data set collected from the business. The RFM values of each customer are employed as the profitability measure, and a monthly RFM time series is created for every customer from their historical purchasing records. At any given time point, k-means clustering is used to divide all the customers into three groups based on their RFM values: high, medium, and low profitability. The prediction problem is how a customer's membership of a profitability group evolves over time. For comparison, 12 different models, covering both probabilistic and machine learning approaches, are used for prediction in open-loop and closed-loop modes, including regression, multilayer perceptron (MLP), and Naïve Bayesian models. The RFM-based measure was chosen for its simplicity and ease of interpretation in practice, and the selected models are classic, simple, and widely used in business for marketing purposes.
A comparative analysis with the given data set and models demonstrates good predictability of the chosen profitability measure for the business under consideration. It also shows how the specific context of the business can help to interpret the modelling outcomes.
The remainder of this paper is organized as follows. Section 2 briefly discusses related work. Section 3 describes in detail the methodology adopted in this work, including the creation of the RFM-based time series, customer grouping, and model selection. The experimental settings and results are provided in Section 4, and the outcomes are discussed in Section 5. Finally, Section 6 gives concluding remarks along with suggestions for further work.

Related Work
In recent years, predicting customer profitability over time has been an active yet very challenging research topic. In general, such prediction involves three interrelated aspects: a) the nature of the business under consideration; b) which measure(s) should be used to indicate customer profitability; and c) which models best fit the modelling requirements. The nature of the business is directly linked to which measures can be adopted; for example, suitable measures may differ between an online business in the retail industry and a marketing consultancy in the fashion industry. In turn, depending on the measures adopted, either a static or a dynamic model can be applied.
In [3], an RFM score-based time series was created using k-means clustering analysis and used to measure and describe customer profitability for an online retailer. Furthermore, multilayer feedforward neural network models were trained to identify how customer profitability evolved over time.
Interestingly, in [4], RFM was employed to calculate customer loyalty, and the Apriori algorithm was used to determine association rules for product bundles. In addition, the work in [5] proposed convolutional neural network structures for predicting the CLV of individual video game players, and in [6], recurrent neural networks were proposed for customer behavior prediction based on client loyalty numbers and RFM values.
Other measures and models have also been considered, such as the Pareto/NBD (negative binomial distribution) model [7].
In summary, most work in this area appears to be subject- or domain-specific, and there is no unified approach. CLV and RFM are the most popular measures of customer profitability and loyalty. The most diverse aspect of the relevant research is the modelling approach: a range of models has been proposed, from classic regression models to deep learning paradigms.
This research presents a case study of customer profitability prediction in which multiple models are applied with a simple, easy-to-implement profitability measure.

Methodology
This section describes in detail the main approaches, models, and procedures adopted in this research.

Recency, Frequency, and Monetary Model
The RFM model [8] has received much attention and has been widely used in customer relationship management (CRM) and direct marketing because of its simplicity and effectiveness in evaluating customer profitability. Given a set of transactional records of a business over a certain period of time, Recency indicates how recently a customer made a purchase with the business, Frequency shows how often the customer has purchased, and Monetary indicates the total (or average) amount the customer has spent. Each customer of the business can therefore be characterized by a set of RFM values, and all the customers can be grouped into meaningful segments based on these values, so that different marketing strategies can be applied to different customer groups.
Note that a time series of RFM values can be generated for each customer if they are calculated at consecutive time points, such as at the end of each calendar month.
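As an illustration, a monthly RFM time series can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure used in this study: transactions are given as hypothetical `(customer_id, month_index, amount)` tuples with one tuple per invoice, month 0 is the first month of the data, and the RFM values at the end of month m are computed cumulatively over all invoices up to and including month m (the window is not specified above). Recency is counted in whole months since the last purchase. The function name `rfm_time_series` is illustrative only.

```python
from collections import defaultdict


def rfm_time_series(transactions, n_months):
    """Build a monthly RFM time series per customer.

    transactions: iterable of (customer_id, month_index, amount), one tuple per invoice
    (a hypothetical input format). Returns {customer_id: [point, ...]} with one point per
    month, where point is (recency_in_months, frequency, monetary), or None if the
    customer has not purchased yet.
    """
    invoices = defaultdict(list)
    for customer, month, amount in transactions:
        invoices[customer].append((month, amount))

    series = {}
    for customer, records in invoices.items():
        points = []
        for m in range(n_months):
            # Assumption: cumulative window covering months 0..m
            past = [(month, amount) for month, amount in records if month <= m]
            if not past:
                points.append(None)
                continue
            last_purchase = max(month for month, _ in past)
            recency = m - last_purchase                    # months since the last purchase
            frequency = len(past)                          # number of invoices so far
            monetary = sum(amount for _, amount in past)   # total spend so far
            points.append((recency, frequency, monetary))
        series[customer] = points
    return series


tx = [("A", 0, 120.0), ("A", 2, 80.0), ("B", 1, 40.0)]
print(rfm_time_series(tx, n_months=3)["A"])  # [(0, 1, 120.0), (1, 1, 120.0), (0, 2, 200.0)]
```

Using a cumulative window keeps Frequency and Monetary non-decreasing over time; a per-month window would be an equally valid alternative and would make the series more volatile.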

k-means clustering
k-means clustering is one of the most popular data mining algorithms for grouping samples into a given number of groups (clusters) based on Euclidean distance. Assume $x_1, x_2, \dots, x_N$ are a set of vectors (for instance, each $x_i$ holds one customer's RFM values) that are to be assigned to $k$ clusters $C_1, C_2, \dots, C_k$. The objective function of k-means clustering is then

$$J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^2 \qquad (1)$$

where $c_j$ represents the centroid of $C_j$. The k-means clustering algorithm is shown in Table 1.

Table 1. The k-means clustering algorithm.
Initialisation: choose initial centroids $c_1, c_2, \dots, c_k$.
Step 1: Assign each vector $x_i$ to the cluster whose centroid is nearest in Euclidean distance.
Step 2: Update each centroid as $c_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i$, where $|C_j|$ is the number of vectors in cluster $C_j$.
Step 3: Stop if the centroids $c_1, c_2, \dots, c_k$ remain unchanged; otherwise, go back to Step 1.
In this paper, the customers are segmented into three groups by applying k-means clustering to their RFM values: low, medium, and high profitability groups.
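The steps of Table 1 can be sketched in plain Python as below. Two parts are assumptions not taken from this paper: the deterministic farthest-point initialisation (Table 1 does not fix one, and random initialisation is equally common), and the hypothetical `name_groups` rule, which labels clusters as low, medium, or high by ranking their normalised centroids on F + M − R (a lower Recency being better). The input points are made up for illustration.

```python
import math


def farthest_point_init(vectors, k):
    """Deterministic initialisation (an assumption; Table 1 does not fix one)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Pick the vector whose nearest existing centroid is farthest away
        nxt = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(vectors, k, max_iter=100):
    centroids = farthest_point_init(vectors, k)
    for _ in range(max_iter):
        # Step 1: assign each vector to the nearest centroid (Euclidean distance)
        labels = [min(range(k), key=lambda j: math.dist(v, centroids[j])) for v in vectors]
        # Step 2: move each centroid to the mean of its members
        new_centroids = []
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        # Step 3: stop when the centroids no longer change
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def name_groups(centroids):
    """Hypothetical naming rule: rank clusters by F + M - R of their normalised centroids."""
    score = [f + m - r for r, f, m in centroids]
    order = sorted(range(len(centroids)), key=lambda j: score[j])
    return dict(zip(order, ["low", "medium", "high"]))


# Already range-normalised (R, F, M) values, made up for illustration
points = [(0.9, 0.1, 0.1), (0.95, 0.05, 0.1), (0.5, 0.5, 0.5),
          (0.55, 0.45, 0.5), (0.05, 0.9, 0.95), (0.1, 0.95, 0.9)]
labels, centroids = kmeans(points, k=3)
names = name_groups(centroids)
print([names[label] for label in labels])  # ['low', 'low', 'medium', 'medium', 'high', 'high']
```

Because k-means cluster indices are arbitrary, some explicit naming rule is needed at every time point so that "high" means the same thing from month to month.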

Open-loop Model and Closed-loop Model for Time Series Prediction
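Time series prediction can be carried out with either an open-loop or a closed-loop model. An open-loop model can be expressed as

$$\hat{y}(t) = f\big(y(t-1), y(t-2), \dots, y(t-n)\big) \qquad (2)$$

where $f(\cdot)$ denotes a mapping, and $\hat{y}(t)$ represents the predicted value of the variable $y(t)$ at time $t$, obtained from the previously observed values of the variable at time points $t-1, t-2, \dots, t-n$.
A closed-loop model can be expressed as

$$\hat{y}(t) = f\big(\hat{y}(t-1), \hat{y}(t-2), \dots, \hat{y}(t-n)\big) \qquad (3)$$

which uses the previously predicted values $\hat{y}(t-1), \hat{y}(t-2), \dots, \hat{y}(t-n)$ to predict the value of the variable $y(t)$ at time $t$.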
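The difference between the two modes can be illustrated with a short Python sketch. The mapping `f` here is a hypothetical one-lag stand-in for a trained model, and the window passed to it is ordered from oldest to newest; only the source of the inputs (observed values versus fed-back predictions) distinguishes Eq. (2) from Eq. (3).

```python
def open_loop(f, observed, n):
    """Open-loop (Eq. 2): each prediction uses the n most recent *observed* values.

    The window passed to f is ordered oldest to newest: [y(t-n), ..., y(t-1)].
    """
    return [f(observed[t - n:t]) for t in range(n, len(observed))]


def closed_loop(f, initial, horizon):
    """Closed-loop (Eq. 3): start from n observed values, then feed predictions back."""
    n = len(initial)
    history = list(initial)
    for _ in range(horizon):
        history.append(f(history[-n:]))  # the newest inputs are earlier predictions
    return history[n:]


def f(window):
    """Hypothetical one-lag mapping standing in for a trained model."""
    return 0.5 * window[-1] + 2.0


print(open_loop(f, [2.0, 3.0, 5.0, 4.0], n=1))  # [3.0, 3.5, 4.5]
print(closed_loop(f, [2.0], horizon=3))         # [3.0, 3.5, 3.75]
```

The open-loop predictions track the observed series, whereas the closed-loop predictions depend only on the initial value, which is why errors can accumulate over a longer closed-loop horizon.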

Model Selection
The mapping $f(\cdot)$ in an open-loop or a closed-loop model (Eqs. (2) and (3)) can take different forms. In this paper, three models are considered for comparison: Linear Regression, Multilayer Perceptron (MLP), and Naïve Bayesian. Linear regression is perhaps the simplest model to consider. Using this model for prediction, Eqs. (2) and (3) can be rewritten, respectively, as

$$\hat{y}(t) = a_0 + a_1 y(t-1) + a_2 y(t-2) + \dots + a_n y(t-n)$$

and

$$\hat{y}(t) = a_0 + a_1 \hat{y}(t-1) + a_2 \hat{y}(t-2) + \dots + a_n \hat{y}(t-n)$$

where $\{a_i \mid i = 0, 1, \dots, n\}$ are regression coefficients.
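The coefficients of the open-loop linear model $\hat{y}(t) = a_0 + a_1 y(t-1) + \dots + a_n y(t-n)$ can be estimated by ordinary least squares. The sketch below solves the normal equations in plain Python for a scalar series; the function name `fit_linear_ar` and the toy series are hypothetical. With the one-hot group encoding described in the experimental settings, one such regression per output component (with all lagged components as inputs) would be needed, which is an assumption about how the vector case is handled.

```python
def fit_linear_ar(series, n):
    """Least-squares estimate of a0..an in y(t) = a0 + a1*y(t-1) + ... + an*y(t-n)."""
    # Design matrix rows [1, y(t-1), ..., y(t-n)] and targets y(t)
    X = [[1.0] + [series[t - i] for i in range(1, n + 1)] for t in range(n, len(series))]
    y = [series[t] for t in range(n, len(series))]
    p = n + 1
    # Normal equations (X^T X) a = X^T y
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * target for row, target in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    a = [0.0] * p
    for i in reversed(range(p)):
        a[i] = (b[i] - sum(A[i][j] * a[j] for j in range(i + 1, p))) / A[i][i]
    return a


series = [0.0, 1.0, 1.5, 1.75, 1.875, 1.9375]  # generated by y(t) = 1 + 0.5*y(t-1)
print(fit_linear_ar(series, n=1))  # approximately [1.0, 0.5]
```

The same fitted coefficients serve both modes: the closed-loop form simply substitutes earlier predictions for the lagged observations.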
A multilayer perceptron can be thought of as a regression model on a set of inputs derived through successive layers of non-linear transformations. In this paper, an MLP with a single hidden layer and linear output nodes is used, which can be expressed as

$$h_j(t) = \varphi\Big(w_{j0} + \sum_{i=1}^{n} w_{ji}\, y(t-i)\Big), \qquad \hat{y}_k(t) = v_{k0} + \sum_{j} v_{kj}\, h_j(t)$$

where $w_{ji}$ and $v_{kj}$ are the connection weights from the $i$th input node to the $j$th hidden node and from the $j$th hidden node to the $k$th output node, respectively; $w_{j0}$ and $v_{k0}$ denote the biases of the $j$th hidden node and the $k$th output node, respectively; $\varphi(\cdot)$ is the non-linear activation function of the hidden nodes; and $h_j(t)$ and $\hat{y}_k(t)$ denote the outputs of the $j$th hidden node and the $k$th output node, respectively. For the closed-loop model, the inputs $\{y(t-i)\}$ are replaced by $\{\hat{y}(t-i)\}$.
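A forward pass of such a single-hidden-layer MLP can be sketched in plain Python. This is a minimal sketch: the logistic activation for the hidden nodes and the uniform(−0.5, 0.5) weight initialisation are assumptions (the activation is not named here), the function names are hypothetical, and training by back-propagation is omitted. The 3-10-3 topology matches the experimental settings.

```python
import math
import random


def mlp_forward(inputs, W, w0, V, v0):
    """Single-hidden-layer MLP: logistic hidden units (an assumption), linear outputs.

    W[j][i]: weight from input i to hidden node j; w0[j]: bias of hidden node j
    V[k][j]: weight from hidden node j to output k; v0[k]: bias of output node k
    """
    hidden = [
        1.0 / (1.0 + math.exp(-(w0[j] + sum(w * x for w, x in zip(W[j], inputs)))))
        for j in range(len(W))
    ]
    return [v0[k] + sum(v * h for v, h in zip(V[k], hidden)) for k in range(len(V))]


def random_mlp(n_in=3, n_hidden=10, n_out=3, seed=0):
    """Randomly initialised weights and biases for the 3-10-3 topology."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w0 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    V = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    v0 = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return W, w0, V, v0


W, w0, V, v0 = random_mlp()
y_prev = [0.0, 1.0, 0.0]                  # previous month's group as a one-hot vector
print(mlp_forward(y_prev, W, w0, V, v0))  # three untrained output values
# Closed-loop use: feed this output back in as next month's input
```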

Naïve Bayesian Model
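A Naïve Bayesian model is based on Bayes' theorem. For a class $c$ (here, a profitability group) and inputs $x_1, x_2, \dots, x_n$,

$$P(c \mid x_1, \dots, x_n) = \frac{P(c)\, P(x_1, \dots, x_n \mid c)}{P(x_1, \dots, x_n)} \propto P(c) \prod_{i=1}^{n} P(x_i \mid c)$$

where the final step uses the naïve assumption that the inputs are conditionally independent given the class, and the predicted class is the one with the largest posterior probability.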
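How the inputs of the Naïve Bayesian model are represented is not stated, so the following is only a sketch under one plausible assumption: the inputs are discrete labels (a customer's profitability groups in previous months), and the conditional probabilities are estimated from counts with Laplace (add-alpha) smoothing. The class name `CategoricalNaiveBayes` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over discrete inputs, with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.n_samples = len(y)
        self.value_counts = defaultdict(Counter)  # (input index, class) -> Counter of values
        self.levels = defaultdict(set)            # input index -> set of observed values
        for inputs, c in zip(X, y):
            for i, value in enumerate(inputs):
                self.value_counts[(i, c)][value] += 1
                self.levels[i].add(value)
        return self

    def log_posterior(self, inputs, c):
        """log P(c) + sum_i log P(x_i | c), i.e. the log posterior up to a constant."""
        lp = math.log(self.class_counts[c] / self.n_samples)
        for i, value in enumerate(inputs):
            count = self.value_counts[(i, c)][value]
            denom = self.class_counts[c] + self.alpha * len(self.levels[i])
            lp += math.log((count + self.alpha) / denom)
        return lp

    def predict(self, inputs):
        return max(self.class_counts, key=lambda c: self.log_posterior(inputs, c))


# Inputs: a customer's group last month; target: group this month (toy data)
X = [["high"], ["high"], ["low"], ["low"], ["medium"]]
y = ["high", "high", "low", "low", "medium"]
model = CategoricalNaiveBayes().fit(X, y)
print(model.predict(["medium"]))  # medium
```

Working with log probabilities avoids numerical underflow when many inputs are multiplied together; the denominator $P(x_1, \dots, x_n)$ is the same for every class and can be dropped.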

Data Set and Data Pre-processing
A UK-based online retailer is considered in this study [3] [9]. The data set collected from the retailer contains all the transactions that occurred from December 2010 to November 2011 and has 11 variables, as described in Table 2. The data set is available at: https://archive.ics.uci.edu/ml/datasets/online+retail.
It is worth mentioning that, over the years, the business has operated as both a wholesaler and a retailer and has maintained a stable and healthy number of customers. Appropriate pre-processing was carried out to address data quality issues, including the removal of outliers and extreme values. The resulting target data set contains 751 valid customers from the UK.

Settings for Modelling
To start the analysis, a time series of RFM values for each customer was first calculated at the end of each calendar month successively from December 2010 to November 2011, and therefore each RFM time series consists of 12 data points.
Then, at each time point of the monthly RFM time series, the customers were grouped into three profitability groups using k-means clustering, as shown in Figure 1, where Recency is measured in months and Monetary in pounds sterling, and the symbols '*', '+', and 'o' indicate the high, medium, and low profitability groups, respectively. The subplots are arranged in chronological order. As a result, each customer belongs to a specific profitability group at each time point of the time series. Before clustering, the RFM values were normalised using range normalisation.
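Range (min-max) normalisation rescales each variable to $[0, 1]$ via $x' = (x - \min x)/(\max x - \min x)$, so that Monetary values in pounds do not dominate Recency and Frequency in the Euclidean distance used by k-means. A minimal sketch follows, assuming the minimum and maximum are taken over all customers at the same time point; the function name and the numbers are illustrative.

```python
def range_normalise(rows):
    """Rescale each column to [0, 1] via (x - min) / (max - min)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        # A constant column is mapped to 0.0 to avoid division by zero
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


rfm = [(0, 1, 100.0), (2, 3, 300.0), (4, 5, 200.0)]  # (R in months, F, M in GBP), made up
print(range_normalise(rfm))  # [[0.0, 0.0, 0.0], [0.5, 0.5, 1.0], [1.0, 1.0, 0.5]]
```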
Next, the three types of predictive models discussed in the previous section were applied to predict each customer's profitability group in both open- and closed-loop modes. The three profitability groups were encoded as the three orthogonal unit vectors [1,0,0], [0,1,0], and [0,0,1], and these vectors were used as the desired outputs of all the models during training to represent three mutually exclusive classes. Both the open- and closed-loop linear regression models had two or three terms. The topology of the MLP models was set to 3 input nodes, 10 hidden nodes, and 3 output nodes, and the initial connection weights and biases were generated randomly.
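The encoding of the three groups as orthogonal unit vectors, and one way of turning a model's continuous output vector back into a group, can be sketched as follows. Two details are assumptions: the order of the groups within the vectors, and decoding by taking the component with the largest value (how outputs were mapped to classes is not stated). The names `GROUPS`, `encode`, and `decode` are hypothetical.

```python
GROUPS = ("high", "medium", "low")  # assumed order of the unit vectors


def encode(group):
    """Map a profitability group to its orthogonal unit vector."""
    return [1.0 if g == group else 0.0 for g in GROUPS]


def decode(output):
    """Map a model's (possibly continuous) output vector to the group with the largest value."""
    best = max(range(len(GROUPS)), key=lambda i: output[i])
    return GROUPS[best]


print(encode("medium"))         # [0.0, 1.0, 0.0]
print(decode([0.2, 0.7, 0.1]))  # medium
```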
All the models were trained and tested 10 times, and each time 70% of the samples in the data set were randomly selected for training and the remaining 30% for testing.
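The evaluation protocol (10 repetitions of a random 70%/30% train/test split, with accuracies averaged) can be sketched as follows. What constitutes a "sample" is an assumption here (for example, one input/target pair per customer and month), and `train_and_score` is a hypothetical callback that fits a model on the training part and returns its accuracy on the test part; the majority-class model is only a trivial stand-in.

```python
import random
from collections import Counter
from statistics import mean


def repeated_holdout(samples, train_and_score, repeats=10, train_frac=0.7, seed=0):
    """Average test score over repeated random train/test splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = round(train_frac * len(shuffled))
        scores.append(train_and_score(shuffled[:cut], shuffled[cut:]))
    return mean(scores)


def majority_baseline(train, test):
    """Trivial stand-in model: always predict the most common target seen in training."""
    guess = Counter(label for _, label in train).most_common(1)[0][0]
    return sum(label == guess for _, label in test) / len(test)


data = [("low", "low")] * 60 + [("high", "high")] * 40  # toy (input, target) pairs
print(repeated_holdout(data, majority_baseline))
```

Fixing the seed makes the ten splits reproducible, which allows every model to be compared on identical train/test partitions.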
The data for December 2010 and January 2011 were used as the initial inputs for the closed-loop models. Note that, regardless of which predictive model is used, the training procedures for the open-loop and closed-loop models are the same; however, when a trained closed-loop model is applied, the first n observations are used as its initial inputs, and the predicted values are then fed back sequentially to the model as inputs to generate predictions autonomously.

Experimental Results
Using these settings, experiments were conducted to assess how well a customer's membership of a profitability group can be predicted over time. The average prediction accuracies obtained by the different models are given in Tables 3 and 4.

Discussion
The experimental results show that the RFM time series under consideration had good predictability and that customers' profitability groups were stable. Under all experimental conditions, the prediction models using observations from one previous time point performed well, with performance similar to that of the models using observations from two previous time points. This indicates that the probability of a customer transitioning from one profitability group to another between any two consecutive time points was low.
An examination of the probabilities of customers transitioning from one profitability group to another over time revealed that, on average, the transition probability did not exceed 6%. The average transition probabilities are summarised in Table 5, where the element $p_{ij}$ ($i, j = 1, 2, 3$) of the $3 \times 3$ matrix indicates the average transition probability from the $i$th group to the $j$th group if $i \neq j$, and the average percentage of customers who remained in the $i$th group if $i = j$.
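A transition matrix of this kind can be estimated from the customers' monthly group sequences by counting month-to-month moves. In the sketch below, the counts are pooled over all customers and consecutive month pairs and each row is divided by its total, so the diagonal gives the share of customers who stayed in a group; whether Table 5 averages in exactly this way is an assumption, and the function name and sequences are illustrative.

```python
def transition_matrix(sequences, groups=("high", "medium", "low")):
    """Estimate one-step transition probabilities between profitability groups.

    sequences: one list of monthly group labels per customer. Entry [i][j] is the share
    of month-to-month moves starting in groups[i] that end in groups[j], pooled over all
    customers and consecutive month pairs (the pooling is an assumption).
    """
    index = {g: k for k, g in enumerate(groups)}
    counts = [[0] * len(groups) for _ in groups]
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[index[current]][index[following]] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]


seqs = [["high", "high", "medium"], ["low", "low", "low"]]
print(transition_matrix(seqs))  # [[0.5, 0.5, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
```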
Since the business also operates as a wholesaler, these prediction results are readily interpretable. In effect, the profitability group of a customer in month $t$ depended only on their group in month $t-1$, so it was not necessary to use more past time points for prediction. In addition, the MLP and Naïve Bayesian models were slightly more stable than the regression models.
The open-loop prediction models achieved an accuracy of 84% and are useful for short-term prediction, whereas the closed-loop prediction models achieved an accuracy of 79% and can be applied to long-term prediction.

Conclusions and Future Work
This paper has presented a comparative study on dynamically predicting customer profitability from monthly RFM time series using multiple models. The study shows good predictability of the time series under consideration, and the context of the business of interest helped to interpret the prediction results. Further work includes: a) using real transactional data collected over a longer period, such as two or three years, to examine the predictability of the RFM time series; b) investigating how prediction accuracy is affected by the frequency at which the RFM values are calculated from a given transactional data set; and c) using other profitability measures to conduct comparative research.