Long memory and crude oil’s price predictability

This paper discusses the usefulness of the long term memory property in price prediction. In particular, the Hurst's exponents of a wide set of portfolios generated by three crude oils are estimated by means of the detrended fluctuation analysis. To this aim, daily empirical data on West Texas Intermediate, Brent and Dubai crude oils over a period of more than 10 years are considered. It is shown that specific combinations are associated with persistent or antipersistent long-run behaviors, which highlights the presence of statistical arbitrage opportunities. Such an outcome shows that long term memory can effectively serve as a price predictor.


Introduction
A large strand of the literature on time series deals with the analysis of the so-called persistence, or long term memory, property. This property leads to the study of the long-run behaviour of the process, with a focus on its autocorrelation. The concept of long term memory was formalized by Hurst (1951) for the specific case of hydrological time series. The author analyzed the reservoir control of the Nile flow for the design of a river dam, and identified a parameter H ∈ (0, 1) (the so-called Hurst's exponent) associated with the rate of decay of the autocorrelation as a function of the lag.
If H = 0.5, the current value of the series does not depend on its past values, so that the time series is uncorrelated. When H belongs to the interval (0, 0.5), the series is anti-persistent. Anti-persistent series describe 'mean-reverting' processes: if a value of the time series is high in one time interval, it is likely to decrease in the following one, dropping toward the mean value. The strength of the mean-reverting behavior increases as the Hurst's exponent approaches zero. When H lies between 0.5 and 1, the series is persistent, which means that it is trend reinforcing. The strength of the persistent behavior increases as H approaches 1.
Among others, Corazza and Malliaris (2002), Cajueiro and Tabak (2004), Kyaw et al. (2006), Singh and Prabakaran (2008), Kloeden et al. (2011), Giles (2008) and Potgieter (2009) show that the analysis of the main features of a time series provides key information for the prediction of the related phenomena. Baillie (1996) suggests that a long term memory associated with a slow decay of the autocorrelation function of asset returns indicates the existence of exploitable market inefficiencies. Booth et al. (2018) propose a novel agent-based simulation for exploring algorithmic trading strategies, and use the Hurst's exponent to identify long memory processes. Zhao et al. (2015) focus on multifractal theory for investigating the statistical properties of enormous and irregular datasets. Multifractal structure diagnosis, tendency and singularity analysis are applied to oil price data and spatial physical data, obtaining good performance. Castellano et al. (2018) use the Hurst's exponent to explore the long term memory property of the volatility of a new temperature index that they propose. They find some long-run paths and regularities in the riskiness of the index.
In the wide spectrum of information brought by the assessment of the long term memory property, it is worth mentioning the presence of the so-called statistical arbitrage (StatArb, hereafter). StatArb can be described as the attempt to profit from pricing inefficiencies identified through statistical models. According to Burgess (2000) and Bondarenko (2003), a StatArb is a generalization of the traditional zero-risk, or pure, arbitrage. In the latter case, gains are received with no possibility of losses. Fair-price relationships between asset pairs with identical cash-flows are constructed, and pure arbitrage opportunities are identified when prices deviate from these relationships. For Jarrow et al. (2005), a statistical arbitrage is a long horizon trading opportunity that generates riskless profits. It is a natural extension of the trading strategies used in the existing empirical literature on anomalies. StatArb is defined without reference to any equilibrium model; therefore, its existence is inconsistent with market equilibrium and, by inference, with market efficiency. In other words, StatArb enables the rejection of market efficiency without invoking the joint hypothesis of an equilibrium model, which is replaced by an assumed stochastic process for the trading profit. For the concept of StatArb, see also Burgess (1999), Elliott et al. (2005), Do et al. (2006), Bertram (2010), Avellaneda and Lee (2010). The term StatArb was used for the first time in the 1990s and remained widely used by operators in financial markets until 2002. However, by 2000 dramatic changes in market dynamics led to a weak performance of the existing models and, consequently, StatArbs started to command less attention in the market. According to Pole (2007), interest in them was renewed only in 2006, when more accurate algorithms secured better results.
We refer to StatArb as a zero-cost trading strategy for which the expected payoff is positive, and the conditional expected payoff in each final state of the economy is nonnegative, in a finite-horizon economy.
The persistence properties of a time series and StatArb are linked through the concepts of strong stationarity and cointegration (Engle and Granger 1987). In fact, it is important to stress that the cointegration of the prices leading to a (strongly) stationary process identifies the presence of statistical arbitrage for some portfolios generated by the assets themselves.
This paper deals with this theme. We propose to employ the persistence properties to describe either the strong stationarity of the price of a portfolio, hence leading to StatArb, or its weak stationarity. In the latter case, something can be said about the prediction of the portfolio price. The paradigmatic case of commodities is analyzed, with a specific focus on crude oil markets. In the recent literature, several authors, for example Ortiz-Cruz et al. (2012) and Kristoufek and Vosvrda (2014), investigate the efficiency of crude oil markets, and many others analyze long-run dependence phenomena in crude oil prices. We recall Alvarez-Ramirez et al. (2002) and Serletis and Andreadis (2004), who study the long-run memory mechanism that affects the evolution of the crude oil price, and Tabak and Cajueiro (2007), who show the movement of the crude oil market towards efficiency over time. Alvarez-Ramirez et al. (2008) find empirical evidence of long-run autocorrelations in crude oil markets moving towards efficiency, and also analyze short-term autocorrelations on the basis of the estimated dynamics of the Hurst's exponent. The authors employ the detrended fluctuation analysis as their statistical tool. Wang and Liu (2010) extend the existing literature by testing the efficiency of the WTI crude oil market through the dynamics of local Hurst's exponents. They apply a rolling window method based on multiscale detrended fluctuation analysis, and find that large fluctuations of the WTI crude oil market are highly unstable, both in the short and in the long term, while small fluctuations are persistent. Zhang and Ji (2018) discuss the long term memory property of the oil-gas price relationship in order to clarify whether such a link is permanent or transitory.
This paper adds to this strand of literature. Specifically, we consider one of the typical examples of long-run relations on commodity markets: the relation among three crude oils quoted in different markets, namely WTI, Brent and Dubai. Indeed, it is natural to think that three assets with the same specific features, and with supply and demand of the same characteristics, have prices that are influenced by market rumors with the same magnitude and in the same direction.
To address the problem, we propose a statistical analysis of the empirical portfolios obtained from the available commodity data, and discuss their long-run properties through the estimation of the Hurst's exponent H.
There is a wide set of quantitative tools for estimating the value of H [see e.g. Kirichenko et al. (2011)]. Among them, the most prominent one is the already mentioned Detrended Fluctuation Analysis [DFA hereafter, Peng et al. (1994)]. Indeed, this procedure overcomes some shortcomings of the rescaled range R/S procedure of Hurst (1951). This explains the popularity of the DFA for estimating H and also our choice to employ DFA for developing the analysis in the present paper. For theory and discussion on DFA, we refer the interested reader to Peng et al. (1995), Hardstone et al. (2012), He and Chen (2011).
Our paper is close to Cerqueti et al. (2018), which deals with the long memory of crude oil portfolios, but departs from it and extends it in many respects. Indeed, the quoted paper considers only a scenario analysis on a few portfolios (eleven of them, to be precise) and makes inference on the statistical hypothesis that their prices have Hurst's exponent H = 0 or H = 0.5. Here we implement a global analysis on a very large set of portfolios and derive the Hurst's exponents of their prices. This allows us to obtain simulated paths and distributions of the Hurst's exponents, and thus more insight into the dynamics of the mispricing portfolio prices. Moreover, we are able to explore the relationship between the Hurst's exponents and the shares of capital invested in the considered crude oils, hence answering the key question of how different capital allocation rules affect the long memory property of the corresponding portfolios. Importantly, and differently from Cerqueti et al. (2018), we are also able to discuss some stylized facts in commodity finance on the basis of the overall view of the Hurst's exponents. In more detail, the following results are obtained: first, a large part of the mispricing portfolios exhibits an antipersistent long-run behavior, with Hurst's exponent H < 0.5; second, we show the existence of some portfolios following a geometric Brownian motion, which is strongly connected to the presence of statistical arbitrage opportunities; third, the Hurst's exponents of the portfolios vary with an unexpected regularity as the portfolio shares change; fourth, in no case does one observe noteworthy long-run persistence of the related portfolios, hence confirming the mean-reverting nature of commodity portfolio prices; fifth, the simulated trajectories when H = 0.5 replicate the observed ones with one time lag.
This last finding is of particular interest, since it states that the estimation of the Hurst's exponent might lead to an excellent device for price predictability.
The rest of the paper is organized as follows: Sect. 2 contains the formalization of the model; Sect. 3 is devoted to the description of the data and of the employed methodology; Sect. 4 describes and discusses the outcomes of the analysis; the last section offers some conclusive remarks.

The model
We define a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, P)$ which satisfies the usual conditions over an infinite horizon [0, ∞), and where P is the statistical probability measure.
We consider a commodity market populated by J > 0 commodities. The price at time t > 0 of the j-th commodity is $C^j_t$, for each j = 1, 2, ..., J. We assume that there is a reference commodity in the market, whose price at time t is denoted by $T_t$. An investor in the market considers the reference commodity as a target commodity that can be replicated through a portfolio of the J non-reference commodities. The replicating portfolio has value $Z_t$ at time t. Then the following statistical fair-price relationship holds for a generic t:
$$E[T_t - Z_t \mid \mathcal{F}_t] = 0, \qquad (1)$$
where $E[\cdot \mid \mathcal{F}_t]$ is the expected value under the objective probability measure, conditional on the information $\mathcal{F}_t$ available at time t. Relation (1) represents a long term relationship among the variables, which is broken when a mispricing of the considered commodities causes a deviation $T_t - Z_t$, t ≥ 0. The mispricing is then a long-short portfolio, with a long position on the target commodity and a short position on a synthetic asset (or vice versa).
Let $M_t$ be the price of the long-short portfolio at time t; we have:
$$M_t = T_t - Z_t = T_t - \sum_{j=1}^{J} \beta_j C^j_t, \qquad (2)$$
where $Z_t$ is the price of the synthetic asset at time t and $(\beta_1, \beta_2, \ldots, \beta_J)$ is its replication portfolio.
In particular, an appropriate selection of the parameters $(\beta_1, \beta_2, \ldots, \beta_J)$ might lead to cointegration among the J + 1 commodities of the market. Cointegration represents the financial concept of long-run equilibrium among asset prices, and is strongly related to the long term memory of the resulting cointegrated process. Indeed, if the J + 1 series are cointegrated and each has a unit root, then there exists a linear combination of them which is a stationary process. From the perspective of the Hurst's exponent, stationarity means H = 0.5. In the specific case of the financial series we deal with, stationarity is also viewed as the presence of statistical arbitrage opportunities. We give the details below, when we deal with the analysis.

Data and methodology
We consider a set of three commodities, the crude oils WTI, Brent and Dubai. Each contract is traded until the close of business on the third business day prior to the 25th calendar day of the month preceding the delivery month, and it is assumed that the investor rolls over the front month contracts on the first day of the trading month.
The time interval [0, +∞) is conveniently discretized by introducing an increasing sequence of trading dates $\{t_i\}_{i \in \mathbb{N}}$.
In our case, WTI crude oil is the reference commodity, with price $T_{t_i}$ at time $t_i$, and Brent and Dubai crude oils are held in the replicating portfolio, whose value is $Z_{t_i}$. Hence, J = 2 in Eq. (2).
Then, the price of the mispricing portfolio is:
$$M_{t_i} = T_{t_i} - \beta_1 C^1_{t_i} - \beta_2 C^2_{t_i}, \qquad (3)$$
where $\beta_1$ and $\beta_2$ are the weights generating the portfolio replicating the synthetic asset, while $C^1_{t_i}$ is the price of the Brent oil and $C^2_{t_i}$ is the price of the Dubai oil at time $t_i$.
The Hurst's exponent of the series in definition (3) is estimated through the DFA: the fluctuation function of the detrended walk profile is plotted against the window size on a log-log scale, and H is the slope of the linear plot. If the detrended walk profile is a white noise, then the slope is roughly 0.5 and there are no autocorrelations. If the profile is persistent, then the slope is greater than 0.5 and the autocorrelations are positive; if it is anti-persistent, then the slope is less than 0.5 and the autocorrelations are negative.
The DFA with polynomial fit of order p removes trends of order p − 1.
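The DFA steps recalled above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name `dfa_exponent`, the default window sizes and the use of non-overlapping windows are assumptions made here, and the input is treated as an increment-like series whose cumulative sum gives the walk profile.

```python
import numpy as np


def dfa_exponent(series, scales=None, order=1):
    """Estimate the DFA scaling exponent of a 1-D increment-like series."""
    x = np.asarray(series, dtype=float)
    n = x.size
    # Step 1: walk profile = cumulative sum of the mean-centred series.
    profile = np.cumsum(x - x.mean())
    if scales is None:
        # Assumed default: about 20 log-spaced window sizes from 8 to n/4.
        scales = np.unique(
            np.logspace(np.log10(8), np.log10(n // 4), 20).astype(int)
        )
    fluctuations = []
    for s in scales:
        n_win = n // s
        # Step 2: split the profile into non-overlapping windows of size s.
        windows = profile[: n_win * s].reshape(n_win, s)
        t = np.arange(s)
        mean_sq = []
        for w in windows:
            # Step 3: remove a local polynomial trend of the given order.
            coeffs = np.polyfit(t, w, order)
            residual = w - np.polyval(coeffs, t)
            mean_sq.append(np.mean(residual ** 2))
        # Step 4: fluctuation function F(s) = root mean square residual.
        fluctuations.append(np.sqrt(np.mean(mean_sq)))
    # Step 5: the exponent is the slope of log F(s) against log s.
    slope, _ = np.polyfit(np.log(scales), np.log(fluctuations), 1)
    return slope


# Usage on synthetic data: white noise should give a slope close to 0.5.
rng = np.random.default_rng(0)
print(round(dfa_exponent(rng.standard_normal(4096)), 2))
```

The per-window loop favors readability over speed; for a large number of portfolios a vectorized fit would be preferable.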

Model implementation and results
The procedure of model implementation is divided into two parts: i) model estimation and ii) model simulation. Accordingly, we split the data-set into two time series: the first one spans from 2000 to 2009 and is used for an in-sample analysis in order to estimate the Hurst's exponent, H, of $\{M_t\}_t$, whereas the out-of-sample data of 2010 (52 weekly data) are used to assess the functioning of the model in a forecasting perspective. We start by considering 10,000 pairs $(\beta_1, \beta_2)$ for portfolio (3), such that $\beta_1 + \beta_2 = 1$. Here $\beta_1$ is a random draw from a uniform distribution U(−1.5, 2.5), and $\beta_2$ is consequently calculated as $\beta_2 = 1 - \beta_1$. This means that we consider 10,000 portfolios, each of them representing a choice of investment. For each pair $(\beta_1, \beta_2)$ we obtain, by using in-sample data, the mispricing portfolio time series according to (3), and we then apply the DFA to estimate the Hurst's exponent for each scenario.
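The construction of the 10,000 mispricing series can be sketched as follows. The price series used here are synthetic placeholders (random walks), not the WTI, Brent and Dubai data of the paper, and the helper name `build_mispricing` is hypothetical; only the sampling rule for the weights and the form of definition (3) are taken from the text.

```python
import numpy as np


def build_mispricing(target, c1, c2, beta1):
    """Return M[k, i] = T_i - beta1[k] * C1_i - (1 - beta1[k]) * C2_i."""
    target, c1, c2 = (np.asarray(a, dtype=float) for a in (target, c1, c2))
    beta1 = np.asarray(beta1, dtype=float)
    beta2 = 1.0 - beta1  # the weights sum to one
    # Broadcasting: one row per portfolio, one column per trading date.
    return target[None, :] - beta1[:, None] * c1[None, :] - beta2[:, None] * c2[None, :]


rng = np.random.default_rng(42)

# Weights: beta1 ~ U(-1.5, 2.5), beta2 = 1 - beta1, 10,000 portfolios.
n_portfolios = 10_000
beta1 = rng.uniform(-1.5, 2.5, size=n_portfolios)

# Placeholder prices (assumption): three random walks around 60.
n_obs = 100
wti = 60 + np.cumsum(rng.standard_normal(n_obs))
brent = 60 + np.cumsum(rng.standard_normal(n_obs))
dubai = 60 + np.cumsum(rng.standard_normal(n_obs))

mispricing = build_mispricing(wti, brent, dubai, beta1)
print(mispricing.shape)
```

Each row of `mispricing` can then be passed to a Hurst's exponent estimator to obtain the distribution of H across portfolios.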
In Fig. 1 we represent the frequency distribution of the Hurst's exponents obtained for the considered portfolios. It is interesting to note that the largest share of the mispricing portfolios exhibits an antipersistent long-run behavior, with H < 0.5. This means that the combination of commodities often leads to portfolios with mean-reverting prices, hence implying a tendency to return to the long-run equilibrium. This outcome confirms several studies stating the mean reversion property of commodity portfolios [see Geman (2007) for a survey on this field]. It is also worth pointing out the bimodal behavior of the distribution of the Hurst's exponents, with some of them being above 0.5. This outcome is in line with the evidence that a proper selection of the β's might lead to portfolios whose price has a persistent behavior, with positive autocorrelations in the long run.
It is also important to point out the presence of some portfolios leading to H = 0.5, which means that statistical arbitrage opportunities can take place. It is also interesting to study the relationship between H and the portfolio weights. Figure 2 shows that if we assume a short selling position ($\beta_1 < 0$) on $C^1_{t_i}$, and consequently a long position on $C^2_{t_i}$, the mispricing portfolio has a Hurst's exponent lower than 0.5. This means that mispricing portfolios in which a large amount of capital is invested in the Dubai oil by short selling the Brent are weakly stationary and mean reverting. In particular, we have anti-persistent time series, so that an increase will most likely be followed by a decrease or vice versa (i.e., future values tend to revert to a long term mean). Furthermore, the Hurst's exponent increases with respect to $\beta_1$ when $\beta_1 \in [-1.5, 1)$ and decreases for $\beta_1 \in (1, 2.5]$. Its minimum value is attained at $\beta_1 = -1.5$. Such results confirm that the mean-reversion tendency strengthens as the long position on the Dubai is reinforced. Moreover, it is interesting to highlight the substantial symmetry of the values of the Hurst's exponents with respect to $\beta_1 = 1$, so that the persistence properties of the mispricing portfolios depend on their distance from the case in which the entire capital is invested in the Brent oil. This outcome allows us to identify the total investment in the Brent oil as a benchmark for assessing the long term memory property of the mispricing portfolio price. The mean-reversion tendency is less evident when considering a long position on both commodities, with a Hurst's exponent close to 0.5 (greater than 0.4) when $\beta_1 \in [0, 1]$. The maximum value of the Hurst's exponent is, in this case, 0.5421, and it is attained for $\beta_1$ around 1.
This means that if we do not allow short selling in the replicating portfolio, the mispricing portfolio dynamics tend to become strongly stationary, because H is around 0.5; moreover, the equally weighted portfolio leads to H equal to 0.5. This specific situation gives insights into the profile of the portfolios for which StatArb can be achieved.
We also find that the larger the quantity of Dubai oil we short sell, the lower H becomes. We can conclude that short selling one of the commodities in the replicating portfolio moves the mispricing from approximately strong stationarity to antipersistent behavior.
It is important to observe that in no case is there a markedly trending long-run behavior of the mispricing portfolio dynamics, i.e. a value of H well above 0.5. In fact, H ∈ (0.5, 0.55) only when we invest a share of capital $\beta_1 \in (0.5, 1.5)$ in the commodity with price $C^1_{t_i}$.
We now discuss the stochastic process generating the mispricing portfolio. When H ≠ 0.5, we adopt a fractional Brownian motion with parameter H, so that the dynamics of the mispricing is:
$$dM_t = \alpha M_t\,dt + \sigma M_t\,dB^H_t, \qquad (4)$$
where α and σ are positive constants and $B^H_t$ is a fractional Brownian motion with Hurst parameter H. The Hurst parameter H characterizes a fractional Brownian motion and determines the behaviour of the process $\{M_t\}_t$ in the following way: (i) H = 0.5 means that $\{M_t\}_t$ is a strongly stationary process, and statistical arbitrage opportunities occur. This is the case in which the fractional Brownian motion reduces to the standard one, so that (4) becomes a geometric Brownian motion, and we discuss this specific case below; (ii) H ∈ (0, 0.5) means that $\{M_t\}_t$ is weakly stationary and antipersistent in the long run; (iii) H ∈ (0.5, 1) means that $\{M_t\}_t$ is weakly stationary, persistent in the long run and has the long memory property.
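To illustrate how H governs the sign of the autocorrelation of the increments of a fractional Brownian motion, the following sketch samples fractional Gaussian noise (the unit-lag increments of $B^H_t$) with a Cholesky factorization of its covariance matrix. The Cholesky method is a standard exact sampler chosen here for brevity; it is not stated in the paper, and the function names are hypothetical.

```python
import numpy as np


def fgn_autocov(lags, hurst):
    """Autocovariance of unit-variance fractional Gaussian noise at given lags."""
    k = np.abs(np.asarray(lags, dtype=float))
    h2 = 2.0 * hurst
    return 0.5 * ((k + 1) ** h2 - 2.0 * k ** h2 + np.abs(k - 1) ** h2)


def sample_fgn(n, hurst, rng):
    """Draw n consecutive increments of a fractional Brownian motion."""
    idx = np.arange(n)
    gamma = fgn_autocov(idx, hurst)
    # Toeplitz covariance matrix: cov[i, j] = gamma(|i - j|).
    cov = gamma[np.abs(idx[:, None] - idx[None, :])]
    chol = np.linalg.cholesky(cov)
    return chol @ rng.standard_normal(n)


rng = np.random.default_rng(0)
for h in (0.3, 0.5, 0.7):
    increments = sample_fgn(1000, h, rng)
    lag1 = np.corrcoef(increments[:-1], increments[1:])[0, 1]
    # Theoretical lag-1 autocorrelation is 2**(2H - 1) - 1.
    print(h, round(lag1, 3), round(2 ** (2 * h - 1) - 1, 3))
```

The theoretical lag-one autocorrelation $2^{2H-1} - 1$ is negative for H < 0.5, zero for H = 0.5 and positive for H > 0.5, which matches the antipersistent, uncorrelated and persistent regimes listed above.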
Then, when the Hurst's exponent is equal to 0.5, we are in the case of statistical arbitrage portfolios. This is the specific case of geometric Brownian motion, so that:
$$dM_t = \alpha M_t\,dt + \sigma M_t\,dB_t, \qquad (5)$$
where $\{B_t\}_{t \ge 0}$ is a standard Brownian motion. It is known that (5) admits the solution:
$$M_t = M_0 \exp\left(\left(\alpha - \frac{\sigma^2}{2}\right)t + \sigma B_t\right). \qquad (6)$$
The calibration of the parameters of Eq. (5) is obtained by writing it in discrete-time form, so that the following expression for the return dynamics is obtained:
$$\frac{\Delta M_t}{M_{t-1}} = \alpha\,\Delta t + \sigma\sqrt{\Delta t}\,\varepsilon_t, \qquad (7)$$
where $\Delta M_t = M_t - M_{t-1}$, $\Delta t = 1/52$, $\varepsilon_t$ is a standard normal random variable, α and σ represent respectively the expected return and the return volatility, and $M_0$ is the initial value of the time series. The parameters α and σ are estimated by using in-sample data.
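A minimal sketch of the calibration and simulation step for the geometric Brownian motion case follows. It assumes a strictly positive series (a long-short mispricing can in principle change sign, in which case this form does not apply directly), estimates α and σ by the sample moments of the simple returns, and simulates with the exact lognormal solution. The function names and the sample-moment estimator are assumptions, not the authors' stated procedure.

```python
import numpy as np


def calibrate_gbm(prices, dt):
    """Estimate (alpha, sigma) from the simple returns of a positive series."""
    prices = np.asarray(prices, dtype=float)
    returns = np.diff(prices) / prices[:-1]  # Delta M_t / M_{t-1}
    alpha = returns.mean() / dt
    sigma = returns.std(ddof=1) / np.sqrt(dt)
    return alpha, sigma


def simulate_gbm(m0, alpha, sigma, dt, n_steps, n_paths, rng):
    """Simulate paths with the exact solution of the GBM equation."""
    z = rng.standard_normal((n_paths, n_steps))
    log_increments = (alpha - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z
    log_paths = np.cumsum(log_increments, axis=1)
    # Prepend time 0 so that every path starts at m0.
    log_paths = np.hstack([np.zeros((n_paths, 1)), log_paths])
    return m0 * np.exp(log_paths)


rng = np.random.default_rng(7)
dt = 1 / 52  # weekly step, as in the text

# Placeholder in-sample series (assumption): a simulated positive path.
in_sample = simulate_gbm(10.0, 0.05, 0.2, dt, 520, 1, rng)[0]
alpha_hat, sigma_hat = calibrate_gbm(in_sample, dt)

# 10,000 scenarios over 52 weeks, starting from the last in-sample value.
scenarios = simulate_gbm(in_sample[-1], alpha_hat, sigma_hat, dt, 52, 10_000, rng)
print(scenarios.shape)
```

Using the closed-form solution instead of an Euler step avoids discretization error and keeps the simulated paths positive.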
For H = 0.5, we simulate the dynamics of the mispricing according to (6) over 52 weeks and in 10,000 scenarios of Brownian motion; thus, we can compare the simulated trajectories, $\hat{M}_t$, with the actual trajectory, $M_t$, obtained by building the mispricing portfolio from the out-of-sample data. Figure 3 shows the comparison between $\hat{M}_t$ and $M_t$ in some scenarios, when the selected mispricing portfolio has stationary dynamics (H = 0.5); the two trajectories are drawn in red and in green. As we can see from the figure, the simulated mispricing is an accurate estimate of the actual one. As an indicator of the accuracy of the model, we calculate the root-mean squared error. We obtain a root-mean squared error of 0.32, which indicates a high level of predictive quality. This result is relevant from a financial point of view, because it reflects the forecasting capacity of the model, which can be a significant tool for developing statistical arbitrage strategies in crude oil markets. In particular, one can simulate the mispricing portfolio dynamics at time t and go one period back, to t − 1, to obtain an excellent estimate of the future prices. We can calculate the error series $Er_t = |M_t - \hat{M}_t|$ in order to obtain the standard error of each scenario. In Fig. 4 we display the scatter plot of the standard errors versus the Hurst's exponent in all the scenarios with standard error below 0.3. By a visual inspection of the figure, we can reasonably ... Interesting considerations can be drawn from Fig. 5, which shows the relationship between the coefficient α and the standard error. The standard errors are more scattered as α approaches zero, and this is fully in line with the evidence that the absence of a deterministic trend (or, equivalently, a null expected return) implies dynamics driven by pure randomness.
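The accuracy indicators mentioned above can be computed as in the following sketch. The text does not specify how the error series is aggregated into the standard error of a scenario, so the per-scenario root-mean squared error used here is an assumption, as is the helper name `scenario_errors`.

```python
import numpy as np


def scenario_errors(simulated, actual):
    """Return the absolute error series and the RMSE of each scenario.

    simulated: array of shape (n_scenarios, n_steps)
    actual:    array of shape (n_steps,)
    """
    simulated = np.asarray(simulated, dtype=float)
    actual = np.asarray(actual, dtype=float)
    abs_err = np.abs(actual[None, :] - simulated)     # Er_t for each scenario
    rmse = np.sqrt(np.mean(abs_err ** 2, axis=1))     # one value per scenario
    return abs_err, rmse


# Toy usage with two scenarios over three dates (placeholder numbers).
actual = np.array([1.0, 2.0, 3.0])
simulated = np.array([[1.0, 2.0, 3.0],
                      [2.0, 2.0, 2.0]])
abs_err, rmse = scenario_errors(simulated, actual)
print(rmse)
```

Scenarios can then be filtered by a threshold on `rmse` before plotting them against the Hurst's exponent or against α.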

Conclusive remarks
This paper presents an analysis of the long term memory properties of selected portfolios of commodities. The Hurst's exponents H of the prices of such portfolios are estimated. The empirical data consist of the daily prices of Brent, Dubai and WTI crude oils, the last one being the reference commodity.
The series are found to be cointegrated, and the considered portfolios generally exhibit an antipersistent and mean-reverting behavior. Therefore, there are opportunities to predict future portfolio prices, in that in the long run they tend to revert to their statistical mean.
Moreover, such a behavior is also driven by the investment strategies on the non-reference commodities.
The presence of StatArb has also been observed. In particular, there are some portfolios for which H = 0.5, and the predictability of their prices has been explored accordingly.
This paper differs from and extends the existing literature in three main respects. Firstly, it proposes a statistical analysis of the empirical portfolios obtained from the data on three crude oils, namely WTI, Brent and Dubai. Secondly, the Hurst's exponent is estimated through the detrended fluctuation analysis over a very large set of portfolios, so that the distribution of the Hurst's exponents gives insights into the dynamics of the mispricing portfolio prices. Finally, we are able to discuss some stylized facts in commodity finance on the basis of the overall view of the Hurst's exponents, namely antipersistent long-run price behavior, statistical arbitrage opportunities, regularity in the changes of the portfolio Hurst's exponent, mean-reverting price behaviour, and price predictability.
Our results can be effectively employed by policymakers to make forecasts through the analysis of the long-run equilibrium of the oil price time series. In particular, in the light of the efforts required to diversify crude oil sources at a country level, selecting portfolio shares that lead to a specific long term memory property also allows the prediction of the portfolio price.