Approximating multivariate Markov chains for bootstrapping through contiguous partitions

This paper extends Markov chain bootstrapping to the case of multivariate continuous-valued stochastic processes. To this purpose, we follow the approach of searching for an optimal partition of the state space of an observed (multivariate) time series. The optimization problem is based on a distance indicator calculated on the transition probabilities of the Markov chain. This criterion aims at grouping those states exhibiting similar transition probabilities. A second methodological contribution is the addition of a contiguity constraint, which is introduced to force the states to group only if they have "near" values (in the state space). This requirement serves two important purposes: first, it allows a more intuitive interpretation of the results; second, it helps control the complexity of the problem, which explodes with the cardinality of the states. The computational complexity of the optimization problem is also addressed through the introduction of a novel Tabu Search algorithm, which improves both the quality of the solution found and the computing times with respect to a similar heuristic previously advanced in the literature. The bootstrap method is applied to two empirical cases: the bivariate process of prices and volumes of electricity in the Spanish market, and the trivariate process composed of prices and volumes of a US company stock (McDonald's) and prices of the Dow Jones Industrial Average index. In addition, the method is compared with two other well-established bootstrap methods. The results show the good distributional properties of the present proposal, as well as a clear superiority in reproducing the dependence among the data.


Introduction and literature review
The correct understanding of economic and financial phenomena is crucial for several purposes, from model calibration to forecasting. Besides theoretical analysis, many statistical approaches have been developed. Among these, a powerful method is the bootstrap, originally introduced by Efron (1979). To apply the original method to an observed time series, the series must first be fitted with an accurate model, so that the residuals are strictly stationary and independent. If these conditions are met, the bootstrap can be successfully applied by resampling the residuals (with repetition) to reproduce a large number of bootstrapped series. Needless to say, the more appropriate the choice of model for the observed time series, the more accurate and informative the bootstrap. Financial applications of this methodology are the focus of a wide strand of literature; here we mention only two prominent studies on financial markets due to Brock et al. (1992) and Sullivan et al. (1999).
However, the strict stationarity and independence conditions imposed in the original method by Efron turn out to be too restrictive in many practical applications, especially where data tend to show nonlinear dependence. To overcome this drawback, bootstrap theory has turned to more data-driven approaches. Block, sieve, and local bootstraps are examples of these approaches (see Bühlmann 2002, for a comparison), which require weaker stationarity conditions and, even more importantly, are less demanding with respect to the independence of the data. Indeed, these methods aim at capturing such dependence directly from the data, and therefore reduce the risk of misspecifying the model applied to prepare the data. In the case of Markov chain bootstrapping, simply removing the deterministic components (such as trend and seasonality) from the observed time series suffices to proceed safely with the bootstrap. This class of bootstrap methods does not require specific assumptions on the probability distributions of the random variables, which is why they are also referred to as nonparametric and, in a certain sense, model free. Therefore, they can be fruitfully applied in situations where analytic (reduced form) models and econometric models show serious drawbacks. In many economic areas where time series analysis has flourished, such as finance, there are many examples (e.g., stock returns, exchange rates, interest rates) where analytic (reduced form) models successfully fit observed time series. A discussion on the parametric case with financial applications has been carried out in Guastaroba et al. (2009). However, turning to less liquid markets, such as commodities or energy, shocks on prices arise much more irregularly, and several analytic models therefore show serious difficulties. Econometric models are in general much more flexible than analytic ones. They include a large variety of ways to link a dependent variable to its past values, its past residuals, as well as other variables and their residuals. A natural limit of these models is, of course, represented by the calibration of their parameters, which often becomes unmanageable in the presence of nonlinear dependence among the data. In general, this is the case where data-driven approaches, such as the Markov chain bootstrapping proposed here, are successfully adopted.
In this paper, we focus on the family of bootstrap methods based on Markov chain theory. More specifically, we apply Markov chain theory to bootstrap observed multivariate time series taking continuous values.
Markov chain bootstrapping is based on the assumption that an observed time series may be viewed as a realization of a Markov chain.
The case of univariate Markov chains with finite states is undoubtedly one of the most explored. Some classical references are Kulperger and Prakasa Rao (1989), Athreya and Fuh (1992), and Datta and McCormick (1992). The quoted papers explicitly address the problem of estimating the transition probability matrix of a stationary Markov chain with the objective of saving most information on data dependence. Some papers refer to Markovian processes and provide an estimation of their transition density function by adopting a kernel-based procedure (see, e.g., Rajarshi 1990; Horowitz 2003; Paparoditis and Politis 2001). These works assume that similar trajectories (with respect to a distance measured on the state space) will tend to show similar transition probabilities. Moreover, no particular attention is paid to the order of the chain (i.e., the memory of the Markov chain).
In general, modeling an observed time series as a discrete Markov chain starts from the assumption that the observed values are its states. Of course, this assumption does not hold when dealing with continuous-valued time series. To apply Markov chain theory to the continuous case, a discretization (i.e., a partition) of the range is required, where each interval is taken as a state. An early analysis of the advantages of adopting Markov chain bootstrapping for continuous-valued stochastic processes was carried out by Anatolyev and Vasnev (2002). They construct a bootstrap method on the basis of a partition of the state space, such that each state comprises approximately a constant number of observations, and conclude that such a bootstrap method shows interesting estimation properties. Of course, such a simple approach does not really tell us much about the data-generating process. In particular, the number of states used to partition the range can be either larger or smaller than is required for description purposes, and the cutting points do not necessarily reflect any effective change of regime.
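The equal-frequency partition attributed above to Anatolyev and Vasnev (2002) can be sketched as follows. This is a minimal illustration, not the paper's method: the simulated series, the number of states, and the quantile cut points are all assumptions made for the example.

```python
import numpy as np

# Stand-in for an observed continuous-valued series (assumption: Gaussian noise).
rng = np.random.default_rng(0)
prices = rng.normal(50.0, 10.0, size=500)

# Equal-frequency binning: each state holds roughly the same number of
# observations, obtained by cutting the range at empirical quantiles.
n_states = 5
cut_points = np.quantile(prices, np.linspace(0, 1, n_states + 1)[1:-1])

# State index of each observation, in 0 .. n_states - 1.
states = np.digitize(prices, cut_points)

assert states.min() >= 0 and states.max() <= n_states - 1
```

As the text notes, nothing in this construction guarantees that the quantile cut points correspond to meaningful regime changes of the underlying process.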
Indeed, optimally partitioning the range of a continuous-valued time series, for the purpose of applying Markov chain bootstrapping, is a challenging problem for several reasons.
The natural objective of describing satisfactorily the dependence structure of the observed time series would push in the direction of searching for partitions rich in states, implying the estimation of larger and larger transition probability matrices. Such a tendency contrasts, however, with the number of available observations (i.e., the length of the original sample), which rapidly becomes "short" as soon as the dimension of the transition probability matrix increases. Moreover, the mathematical tractability of a Markov chain also declines as the dimension of its transition probability matrix increases.
There is another subtle difficulty linked to increasing the number of states. With continuous ranges, it is often possible to identify a partition such that every state occurs only once in the observed time series. In such a case, the transition probability matrix would degenerate to a "0-1" matrix¹, and the series resulting from bootstrapping it would just be exact replications of the observed time series. This is, of course, an unwanted result, showing that the number of states required to bootstrap a continuous-valued stochastic process through Markov chains must balance the search for descriptive quality against the need for diversification.
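The degeneracy just described can be seen in a toy example. Here a five-observation series is discretized so finely that every state occurs exactly once; the estimated transition matrix is then a one-hot ("0-1") matrix, and the resulting chain can only replay the observed sequence (the state labels are illustrative assumptions):

```python
import numpy as np

# Five states, each visited exactly once: the worst case discussed above.
series = [0, 1, 2, 3, 4]
n = 5

# Count observed one-step transitions.
counts = np.zeros((n, n))
for s, t in zip(series[:-1], series[1:]):
    counts[s, t] += 1

# Row-normalize (rows with no observed transitions stay all-zero).
row_sums = counts.sum(axis=1, keepdims=True)
P = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Every visited row contains a single 1, so the chain is deterministic:
# following the most likely transition just replays the observed series.
path = [series[0]]
for _ in range(len(series) - 1):
    path.append(int(np.argmax(P[path[-1]])))
assert path == series  # the "bootstrap" is an exact replication
```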
Finally, a relevant difficulty in determining the states of a Markov chain for a continuous-valued stochastic process is the identification of the correct "cutting values" of its range, which is at least as important as deciding on the optimal number of states.
A general approach to solving this class of problems has been advanced in Cerqueti et al. (2010). They start from a given partition of the range, which includes a number of states larger than needed for a Markov chain to describe successfully the data-generating process. They then look for an information-efficient assessment of the transition probability matrix by implementing a suitable clustering of the states. In this way, they reduce the number of states, approximate the discriminant cutting points of the range, and assess the relevance of the time lags within the order of the chain. Some contributions are relevant, in our opinion, to the analysis of the order of a Markov chain. Bühlmann and Wyner (1999) propose a sophisticated procedure, the so-called variable length Markov chain. Basically, starting from the initial data of a path, a recursive routine is established to differentiate states only when they contribute to differentiating future evolution. The last step of the recursion identifies a Markov process whose memory depends on the realized paths. The authors achieve a satisfactory computational efficiency. Some other contributions are worth mentioning. Merhav et al. (1989), Kieffer (1993), Csiszár and Shields (2000), and Morvai and Weiss (2005) deal with the estimation of the order of a Markov chain in a framework of equal relevance of the states at each time lag.
Following Cerqueti et al. (2010), we start from a discretized version of a continuous-valued stochastic process, and so our work falls within the research area of reducing a discrete Markov chain by clustering its state space. In general, the reduction of the state space of a Markov chain by means of some clustering procedure returns more tractable processes, but at the cost of some information loss, as is well known in clustering (or lumpability) theory. Among the drawbacks, there is the risk that a number of mathematical properties of the original Markov chain get lost.
The reference literature in this area is in general concerned with the problem of information loss and lumpability theory. Some classical references on lumpability are Burke and Rosenblatt (1958) and Kemeny and Snell (1976). An encyclopedic survey is provided by Thomas (2010). Relevant applications of this theory include the compression of data represented through a finite alphabet. Some important examples of data compression criteria are the AIC (Akaike Information Criterion, Akaike 1970), the BIC (Bayesian Information Criterion, Schwarz 1978), and the MDL principle (Minimum Description Length principle, Rissanen 1978). In each criterion, the compression is implemented to obtain the maximum level of simplification (i.e., minimization of an entropy-based functional) while also pursuing the goal of minimizing the information loss (i.e., minimization of a penalty term).
In the present approach, we extend the theory of lumpability and jointly model the clustering of the state space and the assessment of the order of a Markov chain. To this aim, we construct a unique optimization problem to identify both relevant states and time lags. Such an optimization problem is linked to information theory: specifically, two definitions of distance measure are introduced, with the aim of pursuing simultaneously the maximum level of simplification and the minimum level of information loss.
Another relevant comparison is that between our approach and Markov switching models (on these models see, for example, the recent review in Franke 2012). Despite their name, these models are analytic: a mathematical expression defines the dynamics of the process. The distinctive feature of these models is the particular link that one or more parameters establish with the current value of the process, or with one of its attributes (e.g., its local volatility). In particular, this link is modeled by classifying the value of the process (or of its attribute) according to a few "regimes". Its parameters take different values conditional on the current state of the process and on a transition probability matrix. In this way, the process changes "regimes" through time, respecting Markov chain probabilities. The difference from our Markov chain bootstrapping is substantial. The lack of any analytic expression defining the dynamics of the process, and the hidden number of states (as well as of the cutting values on the range used to define them), are key distinguishing features.
Our extension to the multivariate case increases the computational complexity of the problem. To deal with multivariate Markov chains of order k ≥ 1, it is necessary to pass from defining a state as the combination of values observed for a single variable over an interval of k consecutive times to defining a state as the combination of values of a vector (over the same period). Some attempts have been proposed in the literature to manage the complexity of the clustering problem. In Cerqueti et al. (2012), the optimization problem has been conveniently rewritten as a mixed integer linear programming problem and its exact solution has been found. The authors obtain satisfactory results in terms of computing times to find the optimal solution, even though their approach can only solve problems of small to medium size (i.e., 12-15 states observed over 2-3 time lags). Cerqueti et al. (2013) overcome the complexity problem and solve real-life instances with several observed time series and a large number of relevant time lags by implementing a heuristic procedure. Specifically, the authors adopt a Tabu Search algorithm and find heuristic solutions of the optimization problem in a reasonable computing time. Moreover, in both the cited works, extensive numerical experiments are provided to show that the resulting bootstrapped series respect in full the statistical properties of the observed time series.
Given the relevance of computational complexity, in this paper we also rely on a Tabu Search approach. Besides, we introduce a contiguity constraint for the clusters of the partitions, with the twin purposes of reducing the complexity of the problem and of improving the interpretation of the relevant states. It is worth observing that, in our approach, the distance between any two states is calculated considering only the values of their transition probabilities. Such an approach is, therefore, free to group states which can be arbitrarily distant on the range of the process. The contiguity constraint introduced here mitigates that extreme freedom. It also meets the evidence that, in several cases, the rationale of a group of states (such as a regime) can be better described by combining the closeness of the transition probabilities with the closeness of their values. In this respect, the specific context of the regimes of electricity prices is paradigmatic. Indeed, electricity prices exhibit a rather stationary behavior with some sudden peaks, and the identification of the regimes is based on grouping together states belonging to the same space-intervals (see Huisman and Mahieu 2003; Bunn 2004; Weron et al. 2004; Weron 2006).
To the best of our knowledge, this paper is the first to explore multivariate Markov chain bootstrapping. The theoretical advantages of univariate Markov chain bootstrapping are even stronger in the multivariate setting. Transition probability matrices are suitable to model arbitrary types of dependence among the components of a multivariate stochastic process, as well as those among their values at different time lags. This aspect is quite relevant in real-life applications, mainly for what concerns the analysis of contagion effects and causality relations. Our paper also contributes to the operational research literature. We design a novel Tabu Search heuristic to solve the optimization problem. The flexibility of the heuristic enables it to solve problems in univariate or multivariate settings, as well as with or without a contiguity constraint. The only solution strategy available in the literature is the heuristic algorithm introduced in the aforementioned work of Cerqueti et al. (2013), which was developed to solve the optimization problem in a univariate setting and without a contiguity constraint. We show that the new Tabu Search algorithm outperforms the latter heuristic, in terms of both solution quality and computing times.
The paper is organized as follows. Section 2 introduces the model of the optimal clustering of multivariate states of a Markov chain of order k ≥ 1. Section 3 describes the Tabu Search algorithm advanced to cope with the computational complexity of the solution search. Section 4 focuses on the application of the bootstrap method to two empirical cases. The first is concerned with prices and volumes of electricity in the Spanish market. The second considers a trivariate stochastic process: prices and volumes of a US company stock (McDonald's) and prices of the Dow Jones Industrial Average (DJIA) index. Section 5 comments on the results and Sect. 6 concludes. Finally, Online Resource 1 details Tabu Search pseudo-codes, removal of trend and seasonality, initial states, Tabu Search settings and computational results, and information loss and distance measures.

Notation and preliminary definitions

Let Y = {Y_m}_{m∈N+} be an N-variate discrete-time continuous-valued stochastic process, and let the observed multivariate time series of length M ≥ 2 be

O = (y_1, . . ., y_M), with y_m ∈ R^N. (1)

We will call a vector of k consecutive realizations of the process a k-trajectory. The k-trajectory starting at time s is the vector y_{s,k} = (y_s, . . ., y_{s+k−1}), with k = 1, . . ., M − 1 and s ∈ {1, . . ., M − k + 1}. Define the set O_k collecting all the k-trajectories that can be extracted from O as:

O_k = {y_{s,k} : s = 1, . . ., M − k + 1}. (2)

Let us now define the discretized version of Y. Fix n = 1, . . ., N and assume that the support of the nth component of Y is given by an interval I_n ⊆ R. Consider a partition of I_n into L_n (non-overlapping) intervals, or states,

A_n = {a^(n)_1, . . ., a^(n)_{L_n}}. (3)

We will refer to A_n as the initial partition of I_n.
Remark 1 Without loss of generality, we assume that the intervals of A_n are indexed in an increasing order, that is:

sup a^(n)_l ≤ inf a^(n)_{l+1}, for all l = 1, . . ., L_n − 1. (4)

The Cartesian product of the A_n is

A = A_1 × · · · × A_N. (5)

The zth element of A is the vector of N components a_z = (a^(1), . . ., a^(n), . . ., a^(N)), where a^(n) is a generic interval, or state, of A_n, and z = 1, . . ., ℓ(A). The number ℓ(A) = ∏_{n=1}^N L_n is the cardinality of A. We can also define A by listing its elements:

A = {a_1, . . ., a_{ℓ(A)}}. (6)

The elements of A are called N-intervals, or N-states. Set A is the support of a discretized version of Y.
Since the elements of A_n are non-overlapping intervals for each n = 1, . . ., N, for every m there exists a unique a_z, with z = 1, . . ., ℓ(A), such that y_m ∈ a_z. Therefore, letting z(m) denote the index such that y_m ∈ a_{z(m)}, vector O in Eq. (1) can be rewritten as a vector of the same length, whose elements are N-states extracted from A, as:

(a_{z(1)}, . . ., a_{z(M)}).

In a similar way, let us consider the vector a_{h,k} = (a_{h_k}, . . ., a_{h_1}) ∈ A^k, where h_w = 1, . . ., ℓ(A), w = 1, . . ., k. The row vector a_{h,k} is the ordered set of k N-states a_{h_w} ∈ A, listed, in a natural way, from the furthest to the closest realization of the chain. The row vector a_{h,k} will be called a k N-interval, or k N-state. Now, the k-trajectories collected in set O_k of Eq. (2) can be rewritten by means of the corresponding k N-states of A^k, giving:

{(a_{z(s)}, . . ., a_{z(s+k−1)}) : s = 1, . . ., M − k + 1}.

Problem overview
It is now worth recalling the basic idea of our problem. Given a realization of a multivariate discrete time continuous-valued stochastic process, i.e., an observed multivariate time series, we aim at finding a suitable approximation of the process by means of a Markov chain of order k ≥ 1 with a finite number of multivariate states. We start with O, i.e., a realization of the process. O consists of N observed time series of length M, linked through an unknown dependence structure. Given O, we want to bootstrap the process which generated it. The approach that we adopt in this work is to resort to the theory of Markov chains of order k ≥ 1. To this purpose, instead of focusing on the observed values collected in O, we will focus on their discretized representation collected in A. Specifically, A may be viewed as a vector collecting M realizations of a Markov chain X = {X_m}_{m∈N+} of order k ≥ 1 and with discrete support A. Hence, we will refer to A as a discretized version of O, and X may be intuitively thought of as a discretized version of Y.
The critical point for the Markov chain X to represent the best approximation of the stochastic process Y is to obtain the best estimation of its transition probability matrix. The probabilities of this matrix describe the dynamics of X. However, as pointed out in Cerqueti et al. (2010), two contrasting objectives emerge here. On the one hand, we would like to estimate the matrix keeping all the N-states of A distinguished, so as to keep all the available information contained in the original sample A. On the other hand, aggregating some N-states can be a desirable result, since merging them could yield several benefits, such as a simpler representation of X, a better understanding of the relevant regimes of the unknown stochastic process Y, a clearer view of its significant thresholds, and a better diversification of the bootstrapped series.
At the same time, for cardinalities of A larger than a few hundred, finding the optimal aggregation of the N-states of A to balance these two competing requirements is a computationally complex problem. To handle such complexity, the approach we take is to combine two devices: a Tabu Search heuristic to scan efficiently the space of admissible aggregations of N-states, and a contiguity constraint to remove from the analysis aggregations of N-states where the N-states grouped in the same class are distant (in a sense which will be made clear in the following).
Section 2.6 provides an outline of the method we propose to tackle the problem.

Estimation of the transition probability matrices
Fix M ≥ 2, k = 1, . . ., M − 1, z = 1, . . ., ℓ(A), and h = 1, . . ., ℓ(A^k). Observe that the cardinality of A^k is ℓ(A^k) = ℓ(A)^k. The transition probability of the Markov chain X from the k N-state a_{h,k} to the N-state a_z is defined as:

P(a_z | a_{h,k}) = P(X_{m+1} ∈ a_z | X_m ∈ a_{h_1}, . . ., X_{m−k+1} ∈ a_{h_k}). (7)

These probabilities are estimated on the basis of the original sample O. More precisely, the transition probability P(a_z | a_{h,k}) in Eq. (7) is estimated through the empirical frequency f(a_z | a_{h,k}) of k N-state a_{h,k} evolving to N-state a_z. This frequency is the cardinality of the following set:

F(a_z | a_{h,k}) = {s = 1, . . ., M − k : y_{s,k} ∈ a_{h,k} and y_{s+k} ∈ a_z}, (8)

where the inclusions have to be understood componentwise. Based on Ching et al. (2008) and on Eq. (8), we can provide an estimate of P(a_z | a_{h,k}) as:

P̂(a_z | a_{h,k}) = f(a_z | a_{h,k}) / Σ_{z'=1}^{ℓ(A)} f(a_{z'} | a_{h,k}), (9)

whenever the denominator is positive, and 0 otherwise.
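The frequency-based estimation of order-k transition probabilities can be sketched as follows. This is a minimal illustration under the assumption that the series has already been discretized into integer state labels; the function name and the toy series are not from the paper.

```python
from collections import Counter, defaultdict

def estimate_transitions(states, k):
    """Empirical P(next state | last k states) for an order-k chain.

    states: sequence of (hashable) state labels of the discretized series.
    Returns a dict mapping each observed k-history (a tuple) to a dict of
    next-state relative frequencies.
    """
    counts = defaultdict(Counter)
    for s in range(len(states) - k):
        history = tuple(states[s:s + k])   # plays the role of a k N-state
        counts[history][states[s + k]] += 1
    probs = {}
    for history, ctr in counts.items():
        total = sum(ctr.values())          # denominator of the estimator
        probs[history] = {nxt: c / total for nxt, c in ctr.items()}
    return probs

# Toy series: after state 0 we observe 1 three times and 0 once.
P = estimate_transitions([0, 1, 0, 1, 0, 0, 1], k=1)
# P[(0,)] == {1: 0.75, 0: 0.25}; P[(1,)] == {0: 1.0}
```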

Contiguous partitions
We focus our attention only on contiguous partitions of A^k (see the application of our model in Sect. 4). Let us define them.
Definition 1 Fix n = 1, . . ., N and consider the following partition of A_n:

λ^(n)_v = {θ^(n)_{v,1}, . . ., θ^(n)_{v,u}, . . ., θ^(n)_{v,U_n}},

where U_n = 1, . . ., L_n and the subscript v is placed for notation convenience, as we will see below in Eqs. (10) and (11). λ^(n)_v is said to be a contiguous partition of A_n when:

a^(n)_{l'} ∈ θ^(n)_{v,u} and a^(n)_{l''} ∈ θ^(n)_{v,u}, with l' < l < l'', imply a^(n)_l ∈ θ^(n)_{v,u}.

For a contiguous partition, therefore, it is not possible to group states a^(n)_{l'} and a^(n)_{l''} in the same class without also grouping every intermediate state a^(n)_l in that class. We will refer to the characterization of contiguous partitions in Definition 1 as the contiguity constraint.
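The contiguity condition of Definition 1 admits a simple computational check: a class of a univariate partition is contiguous exactly when it contains every index between its smallest and largest member. The following sketch (names and examples are illustrative) verifies this for a candidate partition of integer-indexed states:

```python
def is_contiguous_partition(classes):
    """classes: list of sets of state indices forming a partition of A_n.

    A class is contiguous iff it has no "holes": the count of its members
    equals the span between its smallest and largest index plus one.
    """
    return all(max(c) - min(c) + 1 == len(c) for c in classes)

# Grouping states 0 and 1, state 2 alone, and states 3-5: contiguous.
assert is_contiguous_partition([{0, 1}, {2}, {3, 4, 5}])

# Grouping 0 and 2 without 1 violates the contiguity constraint.
assert not is_contiguous_partition([{0, 2}, {1}, {3, 4, 5}])
```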
Remark 2 We assume, without loss of generality, that the intervals a^(n)_l of A_n belonging to a class θ^(n)_{v,u} of contiguous partition λ^(n)_v are indexed in an increasing order, that is, they satisfy Eq. (4).
Definition 1 can be easily extended to the multivariate case by introducing the set of contiguous multivariate partitions of A, which we will call N -partitions.
Definition 2 A (contiguous multivariate) N-partition of A is denoted as λ_v and is defined as:

λ_v = {θ_{v,u} : θ_{v,u} = θ^(1)_{v_1,u_1} × · · · × θ^(N)_{v_N,u_N}}, (10)

where θ^(n)_{v_n,u_n} is a class of contiguous partition λ^(n)_{v_n} of A_n, for n = 1, . . ., N. To simplify notation, we put:

v = (v_1, . . ., v_N) and u = (u_1, . . ., u_N). (11)

Remark 3 An N-partition of A can also be (more easily) represented by the n-tuple of contiguous partitions λ^(n)_{v_n} which the classes θ^(n)_{v_n,u_n} belong to. So, N-partition λ_v can also be written as:

λ_v = (λ^(1)_{v_1}, . . ., λ^(N)_{v_N}).

Analogously to what we did for A in Eq. (6) and to simplify the notation, it is convenient to list the elements of λ_v and write:

λ_v = {θ_{v,1}, . . ., θ_{v,ℓ(λ_v)}}.

Definition 2 can be easily extended to the multidimensional case by introducing the set Λ_k of contiguous multivariate and multidimensional partitions of A^k, which we will call k N-partitions.
Definition 3 A (contiguous multivariate and multidimensional) k N-partition of A^k is denoted as λ_v and is defined as:

λ_v = {θ_{v,u} : θ_{v,u} = θ_{v_k,u_k} × · · · × θ_{v_1,u_1}},

where the double subscript v_w, u_w identifies the generic class θ_{v_w,u_w} of N-partition λ_{v_w} for time lag w = 1, . . ., k. To simplify notation, we put:

v = (v_k, . . ., v_1) and u = (u_k, . . ., u_1).

Remark 4 A k N-partition of A^k can also be (more easily) represented by the k-tuple of N-partitions λ_{v_w}, w = 1, . . ., k, which the classes θ_{v_w,u_w} belong to. So, k N-partition λ_v can also be written as:

λ_v = (λ_{v_k}, . . ., λ_{v_1}).

The set Λ_k of k N-partitions of A^k can be represented by listing its elements as follows:

Λ_k = {λ_1, . . ., λ_{ℓ(Λ_k)}}.

Remark 5 Consider a k N-partition of A^k. If we choose n ∈ {1, . . ., N} and select w ∈ {1, . . ., k}, the k N-partition of A^k reduces to a contiguous partition λ^(n)_{v_w} of A_n, which will also be referred to as a Univariate and Unidimensional Partition (UUP).
It is now convenient to define the transition probability between class θ_{v,u} of k N-partition λ_v and N-state a_z of A. By referring to the probability law P introduced in Eq. (7), we have:

P(a_z | θ_{v,u}) = P(X_{m+1} ∈ a_z | (X_{m−k+1}, . . ., X_m) ∈ θ_{v,u}), (12)

where the inclusions in the right-hand side have to be understood componentwise. Analogously to Eq. (9), we can estimate probability P(a_z | θ_{v,u}) in Eq. (12) through the empirical frequencies:

P̂(a_z | θ_{v,u}) = Σ_{a_{h,k} ∈ θ_{v,u}} f(a_z | a_{h,k}) / Σ_{z'=1}^{ℓ(A)} Σ_{a_{h,k} ∈ θ_{v,u}} f(a_{z'} | a_{h,k}). (13)

Probabilities P(a_z | θ_{v,u}) generate a new transition matrix, where the k N-states are replaced by the classes of the k N-partitions. To simplify the notation, we use hereafter the symbol P to refer to both the transition probabilities P and their estimates P̂.
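The class-level estimation amounts to pooling the transition counts of the states grouped in a class and renormalizing. The following sketch illustrates this under simplifying assumptions (integer state labels standing in for k N-states; names are illustrative):

```python
from collections import Counter

def class_transition_probs(counts_by_state, classes):
    """Pooled transition distributions, one per class.

    counts_by_state: {state: Counter(next_state -> count)} of empirical
    transition counts; classes: list of sets of states (a partition).
    """
    probs = []
    for cls in classes:
        pooled = Counter()
        for state in cls:
            pooled.update(counts_by_state.get(state, {}))  # sum frequencies
        total = sum(pooled.values())
        probs.append({nxt: c / total for nxt, c in pooled.items()} if total else {})
    return probs

# States 0 and 1 are grouped: their 8 pooled transitions give the class row.
counts = {0: Counter({0: 2, 1: 2}), 1: Counter({0: 3, 1: 1}), 2: Counter({2: 4})}
p = class_transition_probs(counts, [{0, 1}, {2}])
# p[0] == {0: 0.625, 1: 0.375} and p[1] == {2: 1.0}
```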
The partitions of A^k and an optimal selection procedure among them are the tools used in the present paper to reduce the cardinality of the state space A of X, so as to best approximate Y with a Markov chain of order k ≥ 1.

Optimization problem
The optimization problem takes into account two competing objectives: on the one side, we want bootstrapped series as "similar" as possible to the original sample O; on the other side, we want to avoid that they are "too similar" to (possibly even coinciding with) O.
The optimization problem is constructed on the basis of the distance indicator d_{λ_v} and the multiplicity measure m_{λ_v} [see Eqs. (14) and (15), respectively, below]. For a theoretical foundation, in information theory, of distance measures such as the distance indicator and the multiplicity measure, refer to Online Resource 1, Appendix F.
There, a link between information loss (see Kolmogorov 1965) and distance measures is presented, which convincingly supports the choice of d_{λ_v} and m_{λ_v}. The distance indicator proposed here penalizes the classes grouping one (or more) pairs of k N-states, say a_{h',k} and a_{h'',k}, whose transition probabilities to the same N-state are very different. To keep this indicator low, a partition should group in any class k N-states having similar transition probabilities.
First of all, we need to introduce the distance d_{h',h''} between two k N-states, a_{h',k} and a_{h'',k}:

d_{h',h''} = Σ_{z=1}^{ℓ(A)} |P(a_z | a_{h',k}) − P(a_z | a_{h'',k})|.

Next, the distance within a class θ_{v,u} of k N-partition λ_v can be defined as:

d_{θ_{v,u}} = max { d_{h',h''} : a_{h',k}, a_{h'',k} ∈ θ_{v,u} }.

Finally, the distance d_{λ_v} of the k N-partition λ_v is given by the average of the distances of its classes:

d_{λ_v} = (1/C) Σ_{u=1}^{C} d_{θ_{v,u}}, (14)

where C = ℓ(λ_v) is the number of classes of λ_v. The multiplicity measure m_{λ_v} is the share of pairs of k N-states grouped in the same class:

m_{λ_v} = (1/B) Σ_{u=1}^{C} ℓ(θ_{v,u})(ℓ(θ_{v,u}) − 1)/2, (15)

where B = ℓ(A^k)(ℓ(A^k) − 1)/2. Observe that a high multiplicity measure is typically associated with partitions consisting of few classes, where one or a few of them have a larger number of k N-states than the others. Given the information loss theory of Kolmogorov, large classes tend to increase the information loss. The information loss of a class is maximized when its row (in the transition probability matrix) contains all equal values. Indeed, this amounts to voiding that class of any conditioning power. So, a high multiplicity measure allows partitions with large classes to be chosen. This, in turn, implies that the bootstrap method will frequently sample the next value using low-informative classes, reducing the risk of a mechanical replication of part of (if not even all of) the observed time series.
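The two indicators can be computed on toy data as follows. This sketch follows the reconstructed forms of Eqs. (14)-(15) above, but the exact normalizations and the within-class aggregation by maximum are assumptions of the example, as are all names:

```python
from itertools import combinations
from math import comb

def pair_distance(p_row, q_row, n_targets):
    # L1 distance between two transition-probability rows (dicts).
    return sum(abs(p_row.get(z, 0.0) - q_row.get(z, 0.0)) for z in range(n_targets))

def distance_indicator(rows, classes, n_targets):
    # Average over classes of the largest within-class pair distance.
    per_class = []
    for cls in classes:
        pairs = combinations(sorted(cls), 2)
        per_class.append(max((pair_distance(rows[a], rows[b], n_targets)
                              for a, b in pairs), default=0.0))
    return sum(per_class) / len(per_class)

def multiplicity_measure(classes, n_states):
    # Share of all state pairs that end up grouped in the same class.
    return sum(comb(len(c), 2) for c in classes) / comb(n_states, 2)

# States 0 and 1 have identical rows, so grouping them costs no distance.
rows = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}, 2: {1: 1.0}}
classes = [{0, 1}, {2}]
d = distance_indicator(rows, classes, n_targets=2)   # 0.0
m = multiplicity_measure(classes, n_states=3)        # 1 of 3 pairs grouped
```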
It can be shown that the singleton partition (the one where every class contains exactly one k N-state) exhibits the minimum value of both d_{λ_v} and m_{λ_v}, while the all-comprehensive partition (the one with cardinality 1) exhibits the maximum value of d_{λ_v} (not necessarily 2) and of m_{λ_v}.
Given the previous properties of the distance indicator and of the multiplicity measure, the choice of partitions with a low distance indicator results in bootstrapped series "similar" to the original sample O, while the choice of partitions with a high multiplicity measure avoids bootstrapped series "too similar" to O. The trade-off between the distance indicator and the multiplicity measure motivates the choice of partitions through the optimization problem that we introduce in the following.
Definition 4 Given a multiplicity constraint threshold γ ∈ [0, 1], the optimal order k*(γ) and k N-partition λ*(γ) solve the minimization problem:

min_{k, λ_v ∈ Λ_k} d_{λ_v} (16)

subject to m_{λ_v} ≥ γ. (17)

Given the multiplicity constraint threshold γ ∈ [0, 1] and the solution (k*(γ), λ*(γ)) of the minimization problem in Definition 4, we will replicate the original sample O through a Markov chain of order k*(γ), described by a transition probability matrix of order k*(γ) estimated by means of Eq. (13). The matrix is based on the k*(γ)N-partition λ*(γ).
As it will turn out to be useful for interpreting the results of the optimization problem, we introduce the efficient frontier associated with the optimal k N-partitions λ*(γ) as γ varies in [0, 1], with k fixed to a given value.
Definition 5 Set k = 1, . . ., M − 1 to a chosen value. The efficient frontier F_k related to the minimization problem (16)-(17) is:

F_k = {(m_{λ*(γ)}, d_{λ*(γ)}) : γ ∈ [0, 1]},

where λ*(γ) is the solution of the problem:

min_{λ_v ∈ Λ_k} d_{λ_v} (18)

subject to m_{λ_v} ≥ γ. (19)

Outline of the method

In the following, we provide an intuitive and short description of the method we propose, which is outlined in Algorithm 1.
Algorithm 1 General Scheme of the Method.
Input: an observed (multivariate) time series.
Output: a given number of bootstrapped (multivariate) series.
1: Use regression analysis to remove possible deterministic components, such as trend and seasonality.
2: Segment the support of each component of the stochastic process into initial intervals, or states.
3: Build the multidimensional discrete support, A, of the observed time series using the previous states.
4: Use Eq. (9) to estimate the transition probability matrix of the multivariate Markov chain of order k ≥ 1 with support A.
5: Build an approximation of the efficient frontier by means of Algorithm 2.
6: Choose a point on the approximation of the efficient frontier.
7: Estimate the transition probability matrix corresponding to the chosen point according to Eq. (13).
8: Use the bootstrap method in Sect. 3.2 of Cerqueti et al. (2013).
Starting from a realization of a multivariate discrete time continuous-valued stochastic process, i.e., an observed multivariate time series, a regression analysis is used to remove possible deterministic elements, such as trend and seasonality (Step 1 in Algorithm 1).The removal of these deterministic elements makes the series weakly stationary.Other unknown deterministic components possibly remaining in the residuals will be captured, along with other dependencies among the data, by the transition probability matrix.The support of each component of the stochastic process is initially partitioned into intervals, where each interval corresponds to a state (Step 2).Multivariate states are obtained by a Cartesian product of the latter states (Step 3).Consider multivariate and multidimensional states, i.e., sequences of k ≥ 1 multivariate states.Based on the frequencies of transitions from multivariate and multidimensional states to multivariate states, the transition probabilities of the multivariate Markov chain of order k are estimated (Step 4).The approximation of the stochastic process is found by aggregating the multivariate and multidimensional states based on the distance between their transition probabilities.Specifically, we formulate the problem of finding the approximation as an optimization problem, where a distance indicator is minimized subject to a constraint on a multiplicity measure.Each feasible solution of the optimization problem is required to fulfill a contiguity constraint.By varying the right-hand side of the multiplicity constraint, the corresponding optimal solutions define an efficient frontier.We build an approximation of the efficient frontier embedding a Tabu Search heuristic into a classical -constraint method (Step 5).A point on the approximation of the efficient frontier is chosen (Step 6).Based on the aggregation of the multivariate and multidimensional states corresponding to the selected point, the corresponding transition 
probability matrix is estimated (Step 7). This matrix defines a Markov chain of order k, which approximates the stochastic process. The performance of this approximation is validated via bootstrapping multivariate series (Step 8). To generate a bootstrapped multivariate series, an initial sequence of k multivariate states is selected from any of those defining the rows of the transition probability matrix. The probability distribution in the corresponding row is then used to extract randomly the next multivariate state of the Markov chain. After updating the sequence of k multivariate states (discarding the oldest one and including the new one), the generation can be iterated. Finally, some statistics are calculated on the bootstrapped multivariate series and are compared with the corresponding statistics computed on the observed time series.
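The generation step just described can be sketched as follows. The dictionary encoding of the transition matrix and the function name are our own illustrative choices, not the paper's notation: each row of the matrix is keyed by a tuple of k states and maps next states to probabilities.

```python
import random

def bootstrap_series(transition, k, length, seed=0):
    # `transition` maps a tuple of k states (a row of the transition
    # matrix) to a dict {next_state: probability} -- a hypothetical
    # encoding chosen for this sketch.
    rng = random.Random(seed)
    window = list(rng.choice(list(transition)))  # initial sequence of k states
    series = list(window)
    while len(series) < length:
        row = transition[tuple(window)]
        states, probs = zip(*row.items())
        nxt = rng.choices(states, weights=probs)[0]  # sample the next state
        series.append(nxt)
        window = window[1:] + [nxt]  # discard the oldest state, include the new one
    return series

# Toy order-2 chain on two states, closed under all reachable pairs.
P = {('a', 'a'): {'b': 1.0},
     ('a', 'b'): {'a': 0.5, 'b': 0.5},
     ('b', 'a'): {'a': 1.0},
     ('b', 'b'): {'a': 1.0}}
series = bootstrap_series(P, k=2, length=12)
```

Iterating this generation 5000 times, with statistics computed on each output, reproduces the validation scheme of Step 8.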

Solution strategy
Due to the computational difficulty of determining the optimal solution of problem (18)-(19), we develop a heuristic algorithm to tackle the problem. The heuristic algorithm is based on the Tabu Search framework. We assume that the reader is familiar with the Tabu Search framework and refer to Glover and Laguna (1997) for a detailed description of the metaheuristic.
The Tabu Search algorithm is composed of two main phases and is therefore referred to as the Two-Phase Tabu Search. Phase 1 (TSP1, hereafter) is designed to explore the set Λ_k of kN-partitions, i.e., the set collecting all the contiguous multivariate and multidimensional partitions, with the goal of (heuristically) solving problem (18)-(19).
Nevertheless, as the problem proposed in this paper is new in the literature, and given the difficulty of solving it to optimality, we design a second phase in which the partitions of A_k that violate the contiguity constraint are evaluated as well. In particular, Phase 2 (TSP2, hereafter) is designed to be performed after TSP1 and considers both the kN-partitions in Λ_k and the partitions of A_k that violate the contiguity constraint (hereafter the procedure is referred to as TSP1+TSP2). This design of the algorithm allows a researcher interested in exploring only the kN-partitions in Λ_k, which is the scope of this paper, to perform only TSP1. At the same time, it allows us to assess the effectiveness of the Two-Phase Tabu Search by comparing its performance with the results reported in Cerqueti et al. (2013), where the contiguity constraint is not considered. Besides, this design allows us to evaluate the impact of the contiguity constraint on the distance indicator d_λ and the multiplicity measure m_λ by comparing the output of procedure TSP1 (partitions with the contiguity constraint) with the output of procedure TSP1+TSP2 (partitions where the contiguity constraint is relaxed).

Building an approximation of the efficient frontier
In Definition 5, the efficient frontier F_k is defined as a set of solutions of optimization problem (18)-(19) as γ varies in [0, 1] and k is fixed to a given value. As we solve problem (18)-(19) by means of a heuristic algorithm, the outcome is an approximation of the efficient frontier F_k. (Although the extension to the present multivariate setting would be straightforward, its representation would require involved notation without adding any new insight.) Therefore, we introduce the following definition, which holds for γ ∈ [0, 1] and for a given value of k.
Definition 6 Set k = 1, . . ., M − 1. An approximation of the efficient frontier F_k is the set AF_k = {λ^H(γ) : γ ∈ [0, 1]}, where λ^H(γ) is a heuristic solution of problem (18)-(19), for a generic heuristic algorithm. Furthermore, set AF_k is composed of mutually non-dominated points, i.e., there does not exist any pair of heuristic solutions such that one is at least as good as the other in both the distance indicator and the multiplicity measure and at least one inequality is strict. To build the approximated efficient frontier AF_k, we embed the Two-Phase Tabu Search into an ε-constraint method. The ε-constraint method is one of the best-known approaches for the solution of multi-objective optimization problems (see, e.g., Chankong and Haimes 1983; Miettinen 1999). The method generates a sequence of single-objective optimization problems, referred to as ε-constraint problems, by transforming all the objective functions but one into constraints.
Let us fix i ∈ N and consider the following ε-constraint formulation of problem (18)-(19): minimize d_λ over λ ∈ Λ_k (20) subject to m_λ ≥ γ_i (21), where γ_i ∈ [0, 1]. The idea is to construct a sequence of ε-constraint problems (20)-(21) where the right-hand side of constraint (21) is iteratively updated as follows: once problem (20)-(21) is solved for a given value of γ_i, we set γ_i equal to m_{λ^H_i}, and then we set γ_{i+1} := γ_i + ε, where ε is a given parameter. Each ε-constraint problem (20)-(21) is then solved by means of the Two-Phase Tabu Search described below. We denote as λ^H_i the best-known solution found by our heuristic algorithm for the ith ε-constraint problem (20)-(21).
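The iterative update of the right-hand side can be sketched as follows, with `solve` standing in (as an assumption) for the Two-Phase Tabu Search applied to a single ε-constraint problem.

```python
def approximate_frontier(solve, eps, gamma_max=1.0):
    # `solve(gamma)` is a hypothetical callback: it returns the
    # best-known pair (d, m) with multiplicity m >= gamma, or None
    # when no feasible solution is found.
    frontier, gamma = [], 0.0
    while gamma <= gamma_max:
        sol = solve(gamma)
        if sol is None:
            break
        d, m = sol
        frontier.append((d, m))
        gamma = m + eps  # next right-hand side: gamma_{i+1} = m + eps
    return frontier

# Toy solver: the attainable multiplicity equals the bound and the
# distance grows with it.
points = approximate_frontier(lambda g: (2 * g, g), eps=0.25)
```

Each returned pair corresponds to one point of the approximated frontier AF_k before the post-processing step.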
The general scheme of the algorithm proposed to construct the approximated efficient frontier AF k is sketched in Algorithm 2.
Handling the dominated points. It is worth highlighting that some dominated points might be generated when solving the sequence of ε-constraint problems (20)-(21) with any heuristic algorithm. Indeed, let us assume that the best-known solution found by a given heuristic for the ith ε-constraint problem is λ^H_i. It might happen that the best-known solution λ^H_{i+1} to the (i + 1)th ε-constraint problem dominates λ^H_i. Therefore, we include in our algorithm a post-processing procedure that scans all the points included in set AF_k and removes the dominated ones (Step 10 in Algorithm 2).
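A minimal sketch of such a post-processing filter, assuming the frontier points are stored as (distance, multiplicity) pairs with the distance minimized and the multiplicity maximized:

```python
def remove_dominated(points):
    # A point (d, m) is dominated if another point has distance <= d and
    # multiplicity >= m, with at least one strict inequality.
    kept = []
    for (d, m) in points:
        dominated = any(
            d2 <= d and m2 >= m and (d2 < d or m2 > m)
            for (d2, m2) in points
        )
        if not dominated:
            kept.append((d, m))
    return kept

pts = [(1.0, 0.2), (0.9, 0.3), (1.5, 0.25), (2.0, 0.6)]
cleaned = remove_dominated(pts)  # (1.0, 0.2) and (1.5, 0.25) are dominated
```

The surviving points form a set of mutually non-dominated points, as required by Definition 6.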
Algorithm 2 General Scheme to Build AF_k.

TSP1: Exploring contiguous partitions
The description of TSP1 is organized as follows. First, we define the neighborhood structure, as it represents the core of the algorithm. Subsequently, each component of TSP1 is detailed.
Neighborhood structure The introduction of the multivariate assumption for the stochastic process and of the contiguity constraint for the states imposes the following redefinition of the neighborhood structure introduced in Cerqueti et al. (2013).
Consider an incumbent solution λ ∈ Λ_k of problem (20)-(21). At each iteration of procedure TSP1, one time lag w ∈ {1, . . ., k} is first selected, as detailed below, and then the analysis moves from the incumbent solution λ to a neighboring solution λ′. Given the incumbent solution λ ∈ Λ_k, consider its N-partition λ_w = (λ_w^(1), . . ., λ_w^(N)). A neighboring solution is obtained through one of the following moves:
- (q, n)-relocation: q contiguous states {a_l, . . ., a_{l+q−1}} are removed from their origin class and assigned to an existing and non-empty adjacent class;
- (q, n)-creation: q contiguous states {a_l, . . ., a_{l+q−1}} are removed from their origin class and assigned to a new adjacent class containing them as the only elements.
To guarantee that the contiguity constraint is fulfilled after performing the aforementioned moves, the q contiguous states {a_l, . . ., a_{l+q−1}} are jointly taken from the origin class and then jointly assigned to an adjacent class. In addition, if the number of states belonging to a class is smaller than q, a (q, n)-relocation is performed by moving all the available states in the class. Finally, if the number of states belonging to a class is equal to q, no (q, n)-creation is performed, as it would not change the current UUP.
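A minimal sketch of how a (q, n)-relocation can preserve contiguity, under our own encoding of a UUP as an ordered list of classes of consecutive states (all names are illustrative):

```python
def q_relocation(uup, i, q, direction):
    # Move min(q, |class i|) boundary states from class i into the
    # adjacent class, so that every class remains contiguous.
    new = [list(c) for c in uup]
    q = min(q, len(new[i]))  # move all available states if the class is small
    if direction == +1:      # relocate toward the class on the right
        moved, new[i] = new[i][-q:], new[i][:-q]
        new[i + 1] = moved + new[i + 1]
    else:                    # relocate toward the class on the left
        moved, new[i] = new[i][:q], new[i][q:]
        new[i - 1] = new[i - 1] + moved
    return [c for c in new if c]  # drop the origin class if it was emptied

neighbor = q_relocation([[1, 2, 3], [4, 5]], i=0, q=1, direction=+1)
```

Because states are moved only across class boundaries, both the origin and the destination class stay intervals of consecutive states.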
The rationale for introducing moves 1-4 is that restricting the search to the (q, n)-relocations and (q, n)-creations has proved to be inefficient in some cases. On the one hand, a small neighborhood is easier to explore thoroughly; on the other hand, it might worsen the quality of the solution found when the N-partitions consist of many components. In this case, the interaction between changes in the N-partitions of two or more components is neglected. Observe that further moves involving more than two components could be introduced. Nevertheless, this would expand the neighborhood excessively, making its exploration particularly slow.
Initial feasible solution Observe that the best-known solution found for ε-constraint problem i is not feasible for ε-constraint problem i + 1. Indeed, let m_{λ^H_i} be the value of the multiplicity measure for ε-constraint problem i. The multiplicity constraint for problem i + 1 is m_λ ≥ m_{λ^H_i} + ε (see Steps 7 and 8 in Algorithm 2), making the best-known solution λ^H_i infeasible for ε-constraint problem i + 1. The initial solution procedure takes as input the best-known solution of the previous ε-constraint problem found by TSP1 and performs a sequence of (1, n)-relocations until the feasibility of the current solution is restored. Indeed, performing a (1, n)-relocation tends to increase the value of m_λ (and of d_λ as well). At each iteration, the (1, n)-relocation determining the largest increase of m_λ is selected. Once the first feasible solution is found, the procedure to compute the initial solution stops.
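The greedy restoration of feasibility can be sketched as follows; `neighbors` and `multiplicity` are hypothetical callbacks standing in for the (1, n)-relocation neighborhood and the measure m_λ.

```python
def restore_feasibility(solution, neighbors, multiplicity, bound):
    # Starting from the previous best-known solution, repeatedly apply
    # the neighbor with the largest multiplicity until m >= bound.
    current = solution
    while multiplicity(current) < bound:
        candidates = neighbors(current)
        if not candidates:
            return None  # no relocation available
        current = max(candidates, key=multiplicity)  # greedy choice
    return current

# Toy instance: solutions are integers and each "relocation" adds 1 or 2
# to the multiplicity (here the identity function).
result = restore_feasibility(0, lambda s: [s + 1, s + 2], lambda s: s, bound=5)
```

The loop mirrors the procedure in the text: it stops at the first solution satisfying the tightened multiplicity constraint.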

Random selection of time lags
As mentioned above, at each iteration of TSP1 one time lag is randomly selected. To this aim, we use the heuristic procedure proposed in Cerqueti et al. (2013). In short, the time lag is selected according to a discrete probability distribution that gives higher priority to those time lags that bring key information to drive the evolution of the process. The interested reader is referred to the quoted paper for further details.
Tabu lists We use k × N tabu lists TL_w^(n), w = 1, . . ., k and n = 1, . . ., N, i.e., one tabu list for each time lag and component. When UUP λ_w^(n) is selected to form the new incumbent solution, the selection of UUP λ_w^(n) at time lag w becomes forbidden (tabu) for τ iterations (see Online Resource 1-Appendix D for further details). Note that, when one of moves 1-4 is performed, two UUPs become temporarily tabu.
Aspiration criterion To avoid discarding feasible solutions of excellent quality because of the tabu status of some of their UUPs, we introduce the following standard aspiration criterion. A UUP belonging to a tabu list can be selected (i.e., its tabu status is revoked) if its selection leads to a feasible solution better than the best-known solution λ^H encountered so far.
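A minimal tenure-based tabu list with the aspiration criterion can be sketched as follows (the class and its methods are our own illustrative construction, not the paper's pseudo-code):

```python
from collections import deque

class TabuList:
    # One such list would be kept per time lag w and component n; an
    # entry expires after `tenure` iterations.
    def __init__(self, tenure):
        self.tenure = tenure
        self.entries = deque()  # pairs (uup_key, iteration_added)

    def add(self, key, it):
        self.entries.append((key, it))

    def is_tabu(self, key, it, improves_best=False):
        # Discard expired entries first.
        while self.entries and it - self.entries[0][1] >= self.tenure:
            self.entries.popleft()
        if improves_best:  # aspiration criterion: revoke the tabu status
            return False
        return any(k == key for k, _ in self.entries)

tl = TabuList(tenure=3)
tl.add('u1', it=0)
```

With a tenure of 3, the UUP 'u1' stays forbidden for three iterations unless a move using it improves on the best-known solution.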
Diversification strategy Using tabu lists prevents short-term cycling, i.e., visiting the same feasible solution in two successive or very close iterations, but it is not adequate to avoid long-term cycling, i.e., being trapped in local optima. Hence, a wider exploration of the solution space has to be encouraged. To this aim, we design a diversification strategy that moves the search to a different portion of the solution space.
First, it is worth noticing that, given UUP λ_w^(n), the UUPs obtained by performing (q, n)-relocations and (q, n)-creations tend to be "more different" with respect to λ_w^(n) as q increases. Consider the following example.
Example 2 Let λ_w^(n) = {{1, 2, 3}, {4, 5}} be a UUP of N-partition λ_w at time lag w in the incumbent solution λ. For q = 1, the available (1, n)-relocations lead to λ′_w^(n) = {{1, 2}, {3, 4, 5}} and λ′_w^(n) = {{1, 2, 3, 4}, {5}}, whereas, for q = 2, the (2, n)-relocations lead to the markedly more different UUPs λ′_w^(n) = {{1}, {2, 3, 4, 5}} and λ′_w^(n) = {{1, 2, 3, 4, 5}}. Similar examples can be found for the (q, n)-creation and for moves 1-4. Therefore, in a first attempt, the value of q is increased by 1 every time a maximum number of iterations without improvement (parameter iterNoImprP1_max) has been consecutively performed.
Second, to move the search significantly toward different regions of the solution space, a jump is performed once a maximum value of q is exceeded (parameter q_max). The procedure designed to perform jumps in TSP1 is referred to as JumpingP1 and is sketched in Algorithm 4. Procedure JumpingP1 is an adaptation of the corresponding procedure introduced in Cerqueti et al. (2013) to the multivariate assumption for the stochastic process and to the contiguity constraint for the states.
Stopping criterion Algorithm TSP1 stops when iterP1_max iterations have been performed.
Tabu Search algorithm In Online Resource 1-Appendix A, a pseudo-code for procedure TSP1 is provided in Algorithm 2, along with a description. We highlight that the best-known solution λ^H found at the end of one execution of TSP1 is provided as input to the procedure TSP1 used to solve the next ε-constraint problem in the sequence.

TSP2: Extending the search also to non-contiguous partitions
This subsection is devoted to the description of TSP2. For the sake of brevity, we highlight here only the differences with respect to algorithm TSP1 described above. Algorithm TSP2 takes as input the best-known solution λ^H ∈ Λ_k found by performing TSP1. The neighborhood structure is different from that described above for TSP1 as, in TSP2, non-contiguous partitions of A_k are explored as well. At each iteration of TSP2, given an incumbent solution λ of A_k, which does not necessarily belong to the set Λ_k of contiguous multivariate and multidimensional partitions, partition λ_w, i.e., the time lag w component of λ, with w ∈ {1, . . ., k}, is selected, and a neighboring partition λ′_w is defined by one of the following moves:
- (q, n)_P2-relocation: q states (not necessarily contiguous) are removed from their origin class and assigned to a non-empty (and not necessarily adjacent) class;
- (q, n)_P2-creation: q states (not necessarily contiguous) are removed from their origin class and assigned to a new (not necessarily adjacent) class containing them as the only elements;
- (q, n)-swap: q states (not necessarily contiguous) are swapped with q states (not necessarily contiguous) belonging to another class.
Algorithm TSP2 stops when iterP2_max iterations have been performed. In Online Resource 1-Appendix A, a pseudo-code of TSP2 is sketched in Algorithm 3.

Data description
Our bootstrap method is applied to two observed multidimensional time series. In particular, we consider here a trivariate series composed of the prices and volumes of McDonald's and the prices of the Dow Jones Industrial Average, and a bivariate series composed of the "Mibel Spanish Electric System Arithmetic Average Price" and the corresponding total volume. The trivariate series covers the trading days on the New York Stock Exchange from January 2nd, 2004 to December 31st, 2012, and the prices of McDonald's are expressed in US dollars. The bivariate series spans the trading days from January 1st, 2001 to May 6th, 2010, and the prices are expressed in euros per MWh. The stock-index case consists of T = 2265 observations, while for the electricity series we have T = 2439.
Figures 1 and 2 show the two observed multidimensional time series. We can make some comments.
The original sample of electricity prices and volumes is characterized by the following features:
- a weekly and annual seasonality (the annual seasonality is more evident for volumes);
- a slightly positive trend of prices and a significant positive trend of volumes;
- stochastic volatility;
- spikes.
Because of these features of prices and volumes, the literature on time series of electricity prices and volumes usually considers them as "hard to model" cases. For a review of the difficulties in modeling electricity prices and volumes and of the methods developed to solve them, see, for example, Bunn (2004), Huisman and Mahieu (2003), Weron et al. (2004), and Weron (2006).
About stock prices and volumes and index prices, the following features are commonly observed:
- stock volumes show mean reversion, volatility clustering, and spikes;
- stock and index prices appear more in line with the theory of market efficiency and seem to represent realizations of a geometric Brownian motion, even though the normality of stock and index return distributions has been regularly rejected in the empirical literature;
- the joint series of volumes and prices shows unclear causal relationships, so its bootstrapping can be considered a "hard test" for bootstrapping purposes.
In the electricity case, an (exponential) weekly seasonal component and an (exponential) trend have been removed from the raw prices and volumes. Removing the weekly seasonality lets us reduce the order of the Markov chain below 7, which corresponds to a great reduction in the complexity of our problem. The removal of the trend component makes the series (quite) stationary. Of course, both components have been added back to the bootstrapped series.
In the stock-index case, no seasonality has been removed, because this feature is less evident than in the electricity case; therefore, only a trend has been estimated, with the goal of making the series stationary. To gain further stationarity, an additional logarithmic transformation of the detrended series has been performed in the case of stock and index prices. Also in this case, the trend component has been added back (for the stock and index prices, after exponentiation of the bootstrapped series).
The estimation of exponential (rather than linear) components prevents the removal/reintroduction process from generating occasional negative prices (see Online Resource 1-Appendix B for further details).

Preliminary segmentation of the support
As explained in the previous subsection, the data treatment applied to the bivariate and trivariate series to gain stationarity can be synthesized as follows:
- series of electricity prices: detrended and weekly deseasonalized;
- series of electricity volumes: detrended and weekly deseasonalized;
- series of stock volumes: detrended;
- series of stock prices: detrended and then transformed into series of logarithmic returns;
- series of index prices: detrended and then transformed into series of logarithmic returns.
After the data treatment, a preliminary segmentation of the supports of the continuous-valued stochastic processes generating the bivariate and trivariate series is performed (see Sect. 2.1). We make the following choice: 8 initial intervals for the stock volumes (sv), 6 for the stock returns (sr), and 6 for the index returns (ir) in the trivariate case, and 13 initial intervals for both the electricity prices (ep) and the electricity volumes (ev) in the bivariate case. Thus, according to Eq. (5), the cardinality of A is 8 × 6 × 6 = 288 3-states in the trivariate case and 13 × 13 = 169 2-states in the electricity case. Such numbers of initial intervals are an order of magnitude larger than the number of states commonly considered in the literature for the regimes of the stock-index and electricity cases (usually 2 or 3 in both cases; see, e.g., Bühlmann 1998 for the stock-index case and Huisman and Mahieu 2003 for the electricity case).
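The construction of the multivariate support as a Cartesian product of the initial intervals (Step 3 of Algorithm 1) can be illustrated with the cardinalities above; the variable names are our own.

```python
from itertools import product

# Initial intervals per component, indexed 0, 1, 2, ...
# Trivariate case: stock volumes (sv), stock returns (sr), index returns (ir).
# Bivariate case: electricity prices (ep) and volumes (ev).
sv, sr, ir = range(8), range(6), range(6)
ep, ev = range(13), range(13)

# The multivariate support A is the Cartesian product of the states.
A_trivariate = list(product(sv, sr, ir))  # 8 x 6 x 6 multivariate 3-states
A_bivariate = list(product(ep, ev))       # 13 x 13 multivariate 2-states
```

The cardinalities match the counts of 288 and 169 multivariate states reported above.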
The preliminary segmentation is performed through the minimum variance clustering procedure of Ward Jr. (1963). For details on this preliminary segmentation, see Online Resource 1-Appendix C.

Transition probability matrices
The solution of optimization problem (16)-(17) for a given γ ∈ [0, 1] consists of the γ-optimal pair (k*(γ), λ*(γ)), i.e., the optimal order k*(γ) ∈ {1, . . ., M − 1} and the optimal k*(γ)N-partition λ*(γ) ∈ Λ_{k*}. However, for the sake of reducing the complexity of the problem, we restrict the search for the solution to the space Λ_k with k fixed. Observe that the solutions found under this restriction still tell us much about the "relevance" of the different time lags up to k.
Observe that we can fix different values of k for each component of our multivariate Markov chain, although the (overall) order of the Markov chain will be the highest such value. In particular, the order is 2 for the components represented by electricity prices, stock returns, and index returns, while it is set equal to 1 for the electricity and stock volumes. As a consequence, the order of the multivariate Markov chain will be 2 for both the bivariate and the trivariate cases.
In the present application, besides reducing the complexity of the problem, a further reason to fix k is that the data treatment applied to the observed multivariate time series (trend and weekly seasonality removal and logarithmic return calculation) should reduce the need to look back far in time.
Equation (9) has been used to estimate M_T and M_B, i.e., the transition probability matrices of order 2 for the trivariate (T) case and for the bivariate (B) case, respectively. The matrices are available from the authors upon request.
The sets of 2-trajectories O_2 [see Eq. (2)] comprise 2437 elements for the electricity instance and 2262 elements for the stock-index one. Each set O_2 has been rewritten by means of the corresponding 2N-states of A_2, giving the set of observed 2N-states Â_2 (N = 2 in the electricity case, while N = 3 in the stock-index case). About 7.5% of the 2N-states of Â_2 in the electricity case have been observed to evolve always to the same 2-state. We call these 2N-states deterministic. In the stock-index case, the percentage of deterministic 2N-states is about 20%. The remaining observed 2N-states are called probabilistic (2255 for the bivariate case and 1834 for the trivariate case).
M_T consists of 725 rows, 304 referring to probabilistic 2N-states and 421 to deterministic ones, while M_B consists of 472 rows, 305 referring to probabilistic 2N-states and 167 to deterministic ones.
The 2N-states of A_2 not observed in Â_2 are neglected, as their estimated transition probabilities are null. For bootstrapping purposes, deterministic 2N-states need not be aggregated with other 2N-states; therefore, the optimization problem (16)-(17) is applied only to the set of probabilistic 2N-states. As a result of these settings, all the admissible kN-partitions λ are characterized by three kinds of classes:
- the unique class collecting the unobserved 2N-states,
- as many classes as the different observed deterministic 2N-states,
- the classes aggregating the observed probabilistic 2N-states.
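The distinction between deterministic and probabilistic 2N-states can be sketched as follows, under a simplified encoding of our own where observed transitions are listed as (2N-state, next state) pairs:

```python
from collections import defaultdict

def classify_2N_states(transitions):
    # A 2N-state is deterministic when it always evolves to the same
    # state, probabilistic otherwise; unobserved states never appear.
    next_states = defaultdict(set)
    for state2, nxt in transitions:
        next_states[state2].add(nxt)
    deterministic = {s for s, nxts in next_states.items() if len(nxts) == 1}
    probabilistic = set(next_states) - deterministic
    return deterministic, probabilistic

# Toy observations: ('a', 'b') always evolves to 'c'; ('b', 'c') does not.
obs = [(('a', 'b'), 'c'), (('a', 'b'), 'c'), (('b', 'c'), 'a'), (('b', 'c'), 'b')]
det, prob = classify_2N_states(obs)
```

Only the probabilistic set would then be passed to the optimization problem, as described above.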

Analysis of the distributional properties
We evaluate the performance of our bootstrap method by choosing two heuristic solutions, λ^H_B and λ^H_T, for the bivariate and the trivariate case, respectively. Each solution has been chosen as follows: select the pair of adjacent points of the approximation of the efficient frontier with the largest slope, and then select the point of that pair with the smaller d_λ and m_λ. Such a point indicates that an additional improvement with respect to the multiplicity measure corresponds to a large loss in terms of the distance indicator. For a visual understanding of how a solution is selected, consider, as an example, the red circles on the efficient frontiers in Fig. 3. The partitions so selected have multiplicity measure and distance indicator equal to 0.366 and 1.507 for the electricity case, and to 0.298 and 1.763 for the stock-index case (see Table 14 in Online Resource 1-Appendix E). These partitions are expected to generate diversified bootstrapped series, which could represent a varied enough set of scenarios to test the effectiveness of the procedure.
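The largest-slope selection rule can be sketched as follows, assuming the frontier is stored as (m, d) pairs sorted by increasing multiplicity (a toy frontier, not the empirical one, is used below):

```python
def select_point(frontier):
    # Find the adjacent pair with the largest slope and return the point
    # of that pair with the smaller d and m.
    best_slope, chosen = float('-inf'), None
    for (m0, d0), (m1, d1) in zip(frontier, frontier[1:]):
        slope = (d1 - d0) / (m1 - m0)
        if slope > best_slope:
            best_slope, chosen = slope, (m0, d0)
    return chosen

# Toy frontier: the last step is much steeper than the previous ones,
# so the point preceding it is selected.
F = [(0.1, 1.0), (0.2, 1.1), (0.3, 1.3), (0.35, 2.2)]
picked = select_point(F)
```

The selected point is the one beyond which a small gain in multiplicity would cost a large increase in the distance indicator.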
In particular, we generate 2 sets of 5000 bootstrapped series, one set for each case. Each bootstrapped series includes 2439 trading days in the bivariate case and 2265 trading days in the trivariate one. The bootstrap method used to generate the 5000 bivariate and trivariate series is described in Sect. 3.2 of Cerqueti et al. (2013), to which the interested reader is referred. The lengths of the bootstrapped series are equal to the lengths of the corresponding observed time series (net of the values required to initialize the procedure) to allow for a fairer comparison (see Sect. 3.2 of Cerqueti et al. 2013 for the initialization steps of the bootstrap method).
To assess the quality of the method, we analyze the statistical properties of the bootstrapped series and compare them with those of the observed time series. To this goal, we calculate the following statistics on each bootstrapped series:
1. average, standard deviation, skewness, kurtosis, minimum, maximum;
2. autocorrelation at lag k, k = 1, 2, 3;
3. linear regression slope, b, from the regression x̂_t = a + b t + ε_t, t = 1, . . ., T, where x̂_t stands for the tth value of a bootstrapped series of prices or volumes.
Moreover, to analyze the power of the method to capture the cross-dependencies among the components of the multivariate bootstrapped series, we consider two simple vector autoregression models of order 1: one for the stock-index case [Eq. (22)], where ŝv_t, ŝp_t, and îp_t represent the tth value of a bootstrapped series of stock volumes, stock prices, and index prices, respectively, and one for the electricity case [Eq. (23)], where êp_t and êv_t represent the tth value of a bootstrapped series of electricity prices and electricity volumes, respectively. We therefore also calculate:
4. the autoregression coefficients of the two previous models.
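As an example of the statistics in item 2, the lag-k sample autocorrelation can be computed as follows (a plain-Python sketch on a toy series, not the empirical data):

```python
def autocorr(x, lag):
    # Sample autocorrelation at the given lag: covariance between the
    # series and its lagged copy, normalized by the sample variance.
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return cov / var

# A smooth toy series should show positive autocorrelation at lag 1.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 2.0]
r1 = autocorr(series, 1)
```

Computing this statistic on each of the 5000 bootstrapped series yields the bootstrap distribution against which the observed value is ranked.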
The statistics in 1. concern the distribution of prices and volumes, while the statistics from 2. to 4. concern more the dynamic structure of the series. The autocorrelation at lag 3 is observed to check whether the similarity between the observed time series and the bootstrapped series is kept beyond the order k = 2.
Tables 1, 2, and 3 (for the trivariate case) and Tables 4 and 5 (for the bivariate case) report the 5th and 95th percentiles of the distributions of the mentioned statistics, together with the actual values for the observed time series. We also report the percentile rank, i.e., the percentage of cases for which the statistic value is smaller than or equal to the value of the observed time series.
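The percentile rank defined above can be sketched in a few lines (toy values, not the empirical distributions):

```python
def percentile_rank(bootstrap_values, observed):
    # Percentage of bootstrapped series whose statistic is smaller than
    # or equal to the value computed on the observed time series.
    hits = sum(1 for v in bootstrap_values if v <= observed)
    return 100.0 * hits / len(bootstrap_values)

rank = percentile_rank([0.1, 0.2, 0.3, 0.4, 0.5], observed=0.3)
```

A rank close to 50 means the observed statistic falls near the center of the bootstrap distribution; ranks near 0 or 100 flag a discrepancy.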

Stock-index case
Tables 1, 2, and 3 report the statistics for the stock-index case. In particular, they show the 5th and the 95th percentiles of the bootstrap distributions of some statistics calculated on the 5000 series generated from heuristic solution λ^H_T. The tables also show the percentile ranks of the observed time series.
From Tables 1, 2, and 3, we can make the following remarks:
- all the statistics computed for the observed time series take values between the 5th and the 95th percentiles, except for the minimum of McDonald's volumes, ranked in the 4th percentile of the corresponding distribution;
- the previous observation applies in particular to the autocorrelation coefficients at lag 3, which capture an auto-dependence beyond the order 2 of the Markov chain;
- one could expect the statistics computed for the observed time series to fall close to the 50th percentile of the corresponding bootstrap distribution. However, such an intuitive stance has to be restrained if the observed time series is characterized by rare events, such as jumps or spikes. To be more precise, consider the maximum of the observed volumes of McDonald's: its percentile rank is 71. This score could be acceptable, as the observed volumes show a strong spike at the beginning of 2007 [see Panel (b) of Fig. 2]. Indeed, the 5000 bootstrapped series need not reproduce regularly such a high maximum (such regularity would contrast with the definition of rare event). Similarly, the linear regression slope of McDonald's volumes is low (belonging to the 23rd percentile), since the majority of bootstrapped series, without such a spike, take a definitely more positive slope;
- the autocorrelation coefficients at lags 1, 2, and 3 of stock and index prices show very different behaviors, those of the observed McDonald's prices being ranked very high (82-83) with respect to their bootstrap distributions and those of the observed DJIA prices being ranked very low (31-33). This fact could be explained by observing that the original sample of stock prices seems to show fewer jumps than the corresponding observed time series of index prices; therefore, we could expect scenarios where the bootstrapped stock prices would include more jumps than the observed ones and the index prices fewer jumps than their observed counterpart;
- the coefficients of the vector autoregression model of order 1 in Eq. (22) indicate that each component of the trivariate bootstrapped series is significantly explained by its value at time lag 1 (coefficients a_4, b_4, and c_4). Other significant explanatory power is given by coefficients a_2 and c_2, indicating some explanatory power of stock prices on stock volumes and index prices, respectively. Observe that autoregression coefficients b_2, b_3, and c_3 are not reported since they are not significantly different from 0. Although not meant as the true model for the cross-dependencies among the three components of the trivariate series, this simple VAR(1) model shows a distribution of the coefficients such that the original sample cannot be distinguished from the others.

Electricity case
Tables 4 and 5 report the statistics for the electricity case. In particular, they show the 5th and the 95th percentiles of the bootstrap distributions of some statistics calculated on the 5000 series generated from heuristic solution λ^H_B. The tables also show the percentile ranks of the original sample.
From Tables 4 and 5, we can make the following remarks:
- all the statistics computed for the observed time series take values between the 5th and the 95th percentiles, except for the minimum and maximum of the electricity volumes, ranked in the 4th and 96th percentiles of the corresponding distributions;
- as in the stock-index case, the previous observation applies, in particular, to the autocorrelation coefficients at lag 3, which capture an auto-dependence beyond the order 2 of the Markov chain;
- as in the stock-index case, the coefficients of the vector autoregression model of order 1 in Eq. (23) indicate that each component of the bivariate bootstrapped series is significantly explained by its value at time lag 1 (coefficients d_3 and f_3). Other significant explanatory power is given by coefficient f_2, indicating some explanatory power of electricity prices on electricity volumes. Observe that autoregression coefficient d_2 is not reported since it is not significantly different from 0. Although not meant as the true model for the cross-dependencies among the two components of the bivariate series, this simple VAR(1) model shows a distribution of the coefficients such that the observed time series cannot be distinguished from the others.

Rare events
To show that the presence of rare events is correctly handled by our Markov chain bootstrapping, we want to show how the percentile rank of the original sample changes and becomes more centered when calculated on a specific subset of the 5000 bootstrapped series. In particular, given that the observed electricity prices show a significant positive spike at the beginning of 2002 [see Panel (a) of Fig. 1], we separate the scenarios that reproduce such a spike from those not showing it. The filter is represented by the maximum electricity price of the bootstrapped series. To this end, we set a threshold of 70 euros per MWh: the scenarios where the electricity price has crossed this value are considered "spiked". This set contains 4751 elements, while the remaining 249 "non-spiked" scenarios fall into the other subset.
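The spike filter can be sketched as follows (toy series, not the 5000 empirical scenarios; the threshold of 70 euros/MWh is from the text):

```python
def split_by_spike(price_series, threshold=70.0):
    # A bootstrapped series is "spiked" when its maximum price crosses
    # the threshold.
    spiked = [s for s in price_series if max(s) > threshold]
    non_spiked = [s for s in price_series if max(s) <= threshold]
    return spiked, non_spiked

# One toy series with a spike above 70 euros/MWh and one without.
spiked, non_spiked = split_by_spike([[30.0, 85.0, 40.0], [25.0, 45.0, 50.0]])
```

Percentile ranks recomputed on each subset separately show how the presence or absence of the rare event shifts the bootstrap distributions.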
From Table 6, we can see that the overall effect of this separation is that the observed time series now "falls" much better in the middle of the 5th-95th percentile range. The absence of a spike in a subsample of the entire bootstrap set is, however, welcome. Indeed, since spikes are rare events, we should not expect them to appear "regularly" in all the bootstrapped series.
The statistical soundness of the previous results, in terms of the percentile ranks of the statistics, is an important validation of the proposed bootstrap method. In particular, it is significant in the case of the autocorrelations and the autoregression coefficients, which capture the auto- and the cross-dependence among the data. Given these results, it can be said that the observed time series (both the trivariate and the bivariate one) can be considered trajectories sampled from the same processes generating the bootstrapped series.

A comparison with other bootstrap methods
Given that the proposed method belongs to the class of nonparametric approaches, to highlight its performance we develop here a comparison with two other well-established bootstrap methods of the same kind, namely the Variable Length Markov Chain (VLMC) bootstrap (Bühlmann 1998) and the Moving Blocks (MB) bootstrap (Künsch 1989). The observed time series of prices and volumes of the Spanish electricity market have been used again in this comparison, thanks to their challenging features. Each method has been applied to generate 5000 bootstrapped series.
The VLMC bootstrap belongs to the class of Markov chain methods. However, to the best of our knowledge, this method addresses only the case of univariate processes. For this reason, we restricted our method to the univariate case of the electricity prices. We also limited our method to orders up to k = 7, whereas the VLMC bootstrap self-calibrates the order of the Markov chain.
The MB bootstrap is not a Markov chain method, but it is multivariate, so we have been able to compare it fully with our method on the bivariate series of electricity prices and volumes. To facilitate the comparison between the two methods, we set the length of the blocks equal to 2, i.e., the order of our Markov chain in the bivariate case.
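A minimal sketch of the moving-block resampling performed by this class of methods (overlapping blocks drawn with replacement and concatenated; the function and its names are ours, not the R package's):

```python
import numpy as np

def moving_block_bootstrap(series, block_len=2, rng=None):
    """One moving-block bootstrap replicate of a (possibly multivariate)
    series of shape (T, d): overlapping blocks of length `block_len` are
    drawn with replacement and concatenated until length T is reached."""
    rng = np.random.default_rng(rng)
    series = np.asarray(series)
    T = series.shape[0]
    n_blocks = -(-T // block_len)  # ceiling division
    starts = rng.integers(0, T - block_len + 1, size=n_blocks)
    blocks = [series[s:s + block_len] for s in starts]
    return np.concatenate(blocks, axis=0)[:T]

# Bivariate toy series (prices, volumes); block length 2 as in the text.
toy = np.column_stack([np.arange(10.0), np.arange(10.0) * 100])
replicate = moving_block_bootstrap(toy, block_len=2, rng=1)
```

Within each block the original short-range dependence is preserved exactly, but dependence across block boundaries is broken, which is consistent with the underestimated autocorrelations reported below.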
We analyzed the same statistics considered in the previous subsections (without distinguishing for rare events). Tables 7, 8, and 9 compare the 5th and the 95th percentiles, as well as the percentile ranks that the original sample takes with respect to the bootstrap distributions generated by the methods compared. The R packages "VLMC" and "boot" (available at http://cran.r-project.org/) were used to generate the bootstrapped series for the VLMC and the MB methods, respectively. In particular, the MB bootstrap was carried out using the function "tsboot" of the package "boot" (further details in Davison and Hinkley 1997). Overall, our method outperforms the MB bootstrap (see Tables 7 and 8). The MB bootstrap actually shows better performances with respect to the average and standard deviation of both prices and volumes. However, it clearly performs poorly with respect to almost all the other statistics, in particular those focusing on the auto- and cross-dependence among the data. The autocorrelations at all lags are severely underestimated compared to the corresponding values of the original sample, with percentile ranks of 99 %. The autoregression coefficients are also badly misestimated, with percentile ranks either of 0 % or 99 %.
Our method also outperforms the VLMC bootstrap overall (see Table 9). The VLMC does generate narrower ranges between the 5th and the 95th percentiles than our method. Unfortunately, those ranges are only seldom consistent with the original sample. In particular, the autocorrelations at all lags are severely underestimated in the VLMC bootstrapped series. This outcome is similar to the case of the MB bootstrap but, given the Markov chain nature of the VLMC bootstrap, it is more surprising.
The general conclusion of the present comparison is that the superior performance of the method proposed here depends crucially, from a mathematical point of view, on a transition probability matrix that keeps much of the information contained in the observed time series. This explains why each value of the bootstrapped series preserves, much more precisely than in the methods compared here, the original dependence on its previous values, as well as on those of the other components (bivariate case).

Performance evaluation of the Two-Phase Tabu Search
The goal of this subsection is twofold. First, we provide evidence on the effectiveness of the Two-Phase Tabu Search described in Sect. 3 by comparing its performance with that of the Tabu Search algorithm introduced in Cerqueti et al. (2013) for univariate instances without the contiguity constraint. Second, we give some insights on the computational results for the multivariate instances. The Two-Phase Tabu Search has been coded in Java. The computational experiments have been conducted on a PC Intel Xeon with a 3.33 GHz processor and 12.00 GB of RAM. After extensive preliminary experiments, the parameters of the heuristic have been set to the values detailed in Online Resource 1-Appendix D.
It is worth highlighting that in Cerqueti et al. (2013) the approximation of the efficient frontier is built with a slightly different approach from the one described in Sect. 3.1 of the present work (see Sect. 3.1 of Cerqueti et al. 2013, for further details). Hence, to make a fair comparison, we adapt the Two-Phase Tabu Search and compare the approximated efficient frontiers. The comparison is based on the Spanish and German electricity prices analyzed in Cerqueti et al. (2013). The experiments are performed on the hardware used in Cerqueti et al. (2013) to allow a direct comparison of computing times. As far as the quality of the approximations of the efficient frontier is concerned, Fig. 3 shows three approximations of the efficient frontiers for the Spanish and German instances, respectively. The three approximations correspond to the heuristic designed in Cerqueti et al. (2013) (denoted as Tabu Search), algorithm TSP1, and procedure TSP1+TSP2 with the parameter settings described in Online Resource 1-Appendix D. The coordinates (m_{λ^H}, d_{λ^H}) of the points composing the approximations of the efficient frontiers for the Spanish and German instances are reported in Tables 12 and 13 in Online Resource 1-Appendix E, respectively. In those tables, for algorithms TSP1 and TSP1+TSP2, we also report the computing time (in seconds) needed to solve each problem, while for the Tabu Search we report only the total and average computing times retrieved from Cerqueti et al. (2013).
As can be noticed from Fig. 3, the approximations provided by both TSP1 and TSP1+TSP2 outperform the approximation found with the Tabu Search. This is evident for most of the points lying in the left half of the chart: for most of these points, given a value of the multiplicity measure m_λ, both TSP1 and TSP1+TSP2 provide a solution with a smaller value of the distance indicator d_λ. Regarding the right half of the chart, the three heuristics found roughly the same set of points. As far as computing times are concerned, both TSP1 and TSP1+TSP2 are faster than the Tabu Search, as can be seen in Table 10.
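The dominance screening behind these frontier comparisons can be made explicit with a short sketch, assuming (for illustration only) that both the multiplicity measure and the distance indicator are to be minimized; the function name is ours:

```python
def pareto_front(points):
    """Return the non-dominated (m, d) points: a point is dominated when
    another point is at least as good in both coordinates and strictly
    better in at least one (both coordinates minimized here)."""
    front = []
    for m, d in points:
        dominated = any(
            m2 <= m and d2 <= d and (m2 < m or d2 < d)
            for m2, d2 in points
        )
        if not dominated:
            front.append((m, d))
    return front
```

Under this convention, an approximation A outperforms an approximation B wherever, at the same m_λ, A reaches a smaller d_λ, which is exactly the reading of the left half of Fig. 3.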
Regarding the impact of the introduction of the contiguity constraint, we can draw some conclusions. On the one hand, imposing the contiguity constraint seems to have negligible effects on the solution quality: only in a few cases has algorithm TSP1+TSP2 improved upon the solution found by algorithm TSP1 (note that, as we are dealing with heuristic algorithms, this outcome might also be related to an ineffective performance of procedure TSP2). On the other hand, the introduction of the contiguity constraint significantly restricts the solution space, making its exploration much quicker and more effective.
As far as computing the approximations of the efficient frontiers for the electricity and the stock-index cases is concerned, algorithm TSP1 took approximately 5066 s to compute the approximation for the former case, whereas it required less than 4160 s for the latter. We remark that no dominated point was found for the two instances. The details of the heuristic solutions λ^H_B and λ^H_T, i.e., the k N-partitions used as inputs for the bootstrap method in the electricity and the stock-index cases, are reported in Tables 11 and 12, respectively. Finally, Table 14 in Online Resource 1-Appendix E details the coordinates of the points found by algorithm TSP1 (the two points corresponding to λ^H_B and λ^H_T are highlighted in bold).

Conclusions
This paper proposes a new methodological procedure for the application of Markov chain bootstrapping to N-dimensional discrete-time continuous-valued stochastic processes, which are of remarkable importance in many economic and financial applications. Indeed, the world of economics and finance is populated by strongly interconnected phenomena, and in many cases the dependence among variables is difficult to know or to model satisfactorily. In all such cases, bootstrapping can show severe drawbacks if based on analytic or econometric models. On the contrary, the multivariate Markov chain bootstrapping proposed here can capture conditional dependencies of arbitrary complexity among variables. We proceed by formalizing a constrained optimization problem grounded on the basic requirements of bootstrap theory: the bootstrapped series should maintain the dependence structure of the observed N-dimensional discrete-time continuous-valued stochastic process Y = {Y_m}_{m∈N+} ⊆ R^N. For each n = 1, ..., N, the nth component of Y is denoted by {Y_m^(n)}_{m∈N+}; the realization of Y_m (i.e., the value assumed by Y at time m under the occurrence of a given state of the world) is denoted by y_m, and that of its nth component by y_m^(n). The vector O = (y_1, ..., y_m, ..., y_M) collects the values of a trajectory of length M of the continuous-valued stochastic process Y.
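As a stylized illustration of the core idea (not the full multivariate, order-k procedure of the paper), an order-1 univariate Markov chain bootstrap over a fixed partition of the state space can be sketched as follows; the bin edges, names, and within-state resampling details are assumptions of ours:

```python
import numpy as np

def markov_bootstrap(obs, bin_edges, length, rng=None):
    """Order-1 univariate Markov chain bootstrap: observations are mapped to
    states via `bin_edges`, transition probabilities are estimated by
    relative frequencies, and each bootstrapped value is resampled among the
    observations falling in the visited state."""
    rng = np.random.default_rng(rng)
    obs = np.asarray(obs, dtype=float)
    # relabel so that every state index 0..k-1 actually occurs in the sample
    _, states = np.unique(np.digitize(obs, bin_edges), return_inverse=True)
    k = states.max() + 1
    counts = np.zeros((k, k))
    for a, b in zip(states[:-1], states[1:]):  # empirical transition counts
        counts[a, b] += 1.0
    P = np.zeros_like(counts)
    for i in range(k):
        row = counts[i].sum()
        P[i] = counts[i] / row if row > 0 else np.eye(k)[i]  # absorb if unseen
    path, s = [], states[0]
    for _ in range(length):
        path.append(rng.choice(obs[states == s]))  # resample within the state
        s = rng.choice(k, p=P[s])
    return np.array(path)
```

The transition matrix P is what carries the dependence information of the observed series into the bootstrapped one; the optimal-partition problem of the paper is precisely about choosing the state classes so that P retains as much of that information as possible.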
at time lag w. We define two types of neighborhood. The first type, denoted as N_w^{P1,n}, is a set containing all the UUPs [see Remark (5)] that can be obtained by performing local changes only on UUP λ_w^(n). For example, starting from λ_w^(n) = {{1, 2}, {3}}, a (1, n)-creation removes state 1 (or, alternatively, state 2) from its origin class and assigns it to a new adjacent class containing it as the only element, yielding {{1}, {2}, {3}}. Observe that the (2, n)-creation consisting in removing states 1 and 2 from their origin class and assigning them to a new adjacent class is not considered, as it does not alter UUP λ_w^(n). Besides the previous moves, which modify only the single component λ_w^(n) of N-partition λ_w at time lag w in the incumbent solution λ, algorithm TSP1 also considers moves that change simultaneously two components of λ_w, say λ_w^(n) and λ_w^(n̄), with n, n̄ = 1, ..., N and n ≠ n̄. These pairs of moves define the second type of neighborhood, denoted as N_w^{P1,n,n̄}, which contains all pairs of UUPs obtainable in this way.
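Under the contiguity constraint, the (1, n)-creation moves just described can be enumerated as in the following sketch, where a UUP is represented (a convention of ours) as an ordered list of classes of consecutive states:

```python
def creation_moves(partition):
    """All distinct (1, n)-creation neighbors of a contiguous partition:
    a boundary state of a class with at least two states is removed from
    its origin class and assigned to a new adjacent singleton class."""
    neighbors = []
    for i, cls in enumerate(partition):
        if len(cls) < 2:
            continue  # moving the only state out would not alter the UUP
        for candidate in (
            partition[:i] + [[cls[0]], cls[1:]] + partition[i + 1:],  # left
            partition[:i] + [cls[:-1], [cls[-1]]] + partition[i + 1:],  # right
        ):
            if candidate not in neighbors:
                neighbors.append(candidate)
    return neighbors

# The example from the text: {{1, 2}, {3}} has the single neighbor
# {{1}, {2}, {3}}, whether state 1 or state 2 is moved out.
moves = creation_moves([[1, 2], [3]])
```

Restricting moves to boundary states is what keeps every neighbor contiguous, which is why the constrained solution space is so much smaller and quicker to explore.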

Fig. 3
Comparison of three approximations of the efficient frontiers for the Spanish and the German instances. The points in the red circles show possible preferred choices, according to the rule described at the beginning of Sect. 5.1

Table 1
Percentiles and percentile ranks of the observed daily McDonald's volumes with respect to the bootstrap distributions of some statistics calculated on 5000 scenarios

Table 2
Percentiles and percentile ranks of the observed daily McDonald's prices with respect to the bootstrap distributions of some statistics calculated on 5000 scenarios

Table 3
Percentiles and percentile ranks of the observed daily DJIA prices with respect to the bootstrap distributions of some statistics calculated on 5000 scenarios

Table 4
Percentiles and percentile ranks of the observed daily Spanish electricity prices with respect to the bootstrap distributions of some statistics calculated on 5000 scenarios

Table 5
Percentiles and percentile ranks of the observed daily Spanish electricity volumes with respect to the bootstrap distributions of some statistics calculated on 5000 scenarios

Table 6
Percentile ranks of the observed daily Spanish electricity prices with respect to the bootstrap distributions of some statistics calculated on 4751 spiked scenarios and 249 non-spiked scenarios

Table 7
Percentiles and percentile ranks of the bootstrap distributions of some statistics calculated on 5000 scenarios generated with the Multivariate Block bootstrap (MB boot.) and the bootstrap method proposed in this paper (Spanish electricity prices)
a Value is the actual value of the statistic observed in the original sample
b Pctl stands for percentile
c Pctl rank stands for percentile rank of the original sample value

Table 8
Percentiles and percentile ranks of the bootstrap distributions of some statistics calculated on 5000 scenarios generated with the Multivariate Block bootstrap (MB boot.) and the bootstrap method proposed in this paper (Spanish electricity volumes)
b Pctl stands for percentile
c Pctl rank stands for percentile rank of the original sample value

Table 9
Percentiles and percentile ranks of the bootstrap distributions of some statistics calculated on 5000 scenarios generated with the VLMC bootstrap and the bootstrap method proposed in this paper (Spanish electricity prices)
a Value is the actual value of the statistic observed in the original sample
b Pctl stands for percentile
c Pctl rank stands for percentile rank of the original sample value

Table 10
Spanish and German instances introduced in Cerqueti et al. (2013): A summary of computing times (in seconds) for different approximations of the efficient frontiers

Table 11
Heuristic solution λ^H_B of the electricity case

Table 12
Heuristic solution λ^H_T of the stock-index case