Problem Statement

Following Brodsky et al. (2009), the problem of structural break detection can be formulated through nonparametric estimation of copulas applied to time series. In contrast to the original paper, the approach is stated here for structural break detection in time series, where the multidimensional observation vectors cannot be assumed to be independent.

Consider a sample of time series observations $x_1, \dots, x_N$. Assume the series has the form AR(m) with a nonlinear dependence on previous observations. Then at any time moment $t = m + 1, \dots, N$ the dependence on the lagged values can be assumed to be defined through some continuous $(m + 1)$-dimensional copula $C_t(x_t, x_{t-1}, \dots, x_{t-m})$. The structural break problem is to test the hypothesis that the dependence is permanent,

$$H_0:\ C_{m+1} = \dots = C_N,$$

against the alternative

$$H_1:\ \begin{cases} C_{m+1} = \dots = C_k = C^{*}, \\ C_{k+1} = \dots = C_N = C^{**}, \end{cases} \qquad \text{where } C^{*} \neq C^{**}.$$

If the null hypothesis is rejected, a consistent estimate $\hat{K}$ of the structural break moment $k$ is required.
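The following minimal Python sketch makes the objects in the hypotheses concrete. It stacks the $(m + 1)$-dimensional observations whose dependence each $C_t$ describes, and it simulates a series with a break. The function names `lagged_vectors` and `simulate_break` are hypothetical illustrations, not part of the method. So is the nonlinear AR(1) model $x_t = a \tanh(x_{t-1}) + e_t$, whose coefficient switches after the moment $k$. Columns run from the oldest lag to the current value, which matches the index ranges used in (20).

```python
import numpy as np


def lagged_vectors(x, m):
    """Stack the (m + 1)-dimensional observations of current and lagged values.

    Row r (0-based) is (x_{r+1}, ..., x_{r+m+1}) in the paper's 1-based indexing,
    i.e. (x_{t-m}, ..., x_{t-1}, x_t) for the time moment t = r + m + 1.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n <= m:
        raise ValueError("need more than m observations")
    # column s holds x_{s+1}, ..., x_{N-m+s}, the index range used in (20)
    return np.column_stack([x[s:n - m + s] for s in range(m + 1)])


def simulate_break(n, k, a_before=0.2, a_after=0.9, seed=0):
    """Hypothetical nonlinear AR(1): x_t = a * tanh(x_{t-1}) + e_t, e_t ~ N(0, 1).

    The coefficient is a_before for 1-based times t <= k and a_after for t > k,
    so the lag-1 dependence changes after the moment k.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):  # 0-based index t is the 1-based time t + 1
        a = a_before if t < k else a_after
        x[t] = a * np.tanh(x[t - 1]) + rng.standard_normal()
    return x
```

For example, `lagged_vectors(simulate_break(200, 120), m=1)` returns a $199 \times 2$ array of pairs $(x_{t-1}, x_t)$.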

The proposed technique estimates, at every time moment, the dependence before and after the suspected break moment. If the difference between the two estimated dependences is large enough, we identify a change of the dependence copula at this moment. Estimation is nonparametric. Nonparametric methods are based either on the empirical copula or on kernel estimators (Penikas 2010). To construct the empirical copula, we first estimate the marginal distribution functions of $x_t, L(x_t), \dots, L^m(x_t)$, where $L$ is the lag operator, i.e. of $\{L^s(x_t)\}_{s=0}^{m}$:

$$F_s^{emp}(x) = \frac{1}{N-m} \sum_{i=s+1}^{N-m+s} I(x_i \le x), \qquad s = 0, 1, \dots, m,$$

where $I(A)$ is the indicator of event $A$. We then obtain the estimates of the pseudo-observations:

$$\hat{x}^s_i = F_s^{emp}(x_i), \qquad i = s+1, s+2, \dots, N-m+s; \quad s = 0, 1, \dots, m. \qquad (20)$$
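Below is a minimal sketch of the marginal empirical distribution functions and the pseudo-observations (20). It assumes NumPy and 0-based array indexing, so Python index `i - 1` holds $x_i$. The helper names `marginal_ecdf` and `pseudo_observations_20` are hypothetical.

```python
import numpy as np


def marginal_ecdf(x, m, s):
    """Empirical distribution function F_s^emp of coordinate s (s = 0, ..., m).

    F_s^emp(v) = (1 / (N - m)) * #{i = s+1, ..., N-m+s : x_i <= v}  (1-based i).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    sample = np.sort(x[s:n - m + s])  # x_{s+1}, ..., x_{N-m+s}

    def F(v):
        # side="right" counts the sample values that are <= v
        return np.searchsorted(sample, v, side="right") / sample.size

    return F


def pseudo_observations_20(x, m):
    """Pseudo-observations F_s^emp(x_i), i = s+1, ..., N-m+s, one column per s."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return np.column_stack(
        [marginal_ecdf(x, m, s)(x[s:n - m + s]) for s in range(m + 1)]
    )
```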

Omelka et al. (2009) used Monte Carlo simulations to show that it is better to use the asymptotically equivalent estimates

$$\tilde{x}^s_i = \frac{N-m}{N-m+1} F_s^{emp}(x_i), \qquad i = s+1, s+2, \dots, N-m+s; \quad s = 0, 1, \dots, m. \qquad (21)$$

Here a small shift moves the pseudo-observations toward zero, away from the boundary value 1, and these estimates work better in finite samples. This yields $N - m$ observations of dimension $m + 1$, each consisting of the current value and $m$ lagged values. Their dependence is assumed to be defined by the corresponding $(m + 1)$-dimensional copula.
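The rescaled pseudo-observations (21) can be sketched as follows. Multiplying the ECDF value by $(N-m)/(N-m+1)$ turns it into $\text{count}/(N-m+1)$, where the count is the number of sample values $\le x_i$, so no coordinate equals 1. The function name `pseudo_observations` is hypothetical.

```python
import numpy as np


def pseudo_observations(x, m, rescale=True):
    """(N - m) x (m + 1) matrix of pseudo-observations; column s is coordinate s.

    Column s holds F_s^emp(x_i) for i = s+1, ..., N-m+s (1-based), as in (20).
    With rescale=True each value is multiplied by (N - m) / (N - m + 1), as in (21).
    """
    x = np.asarray(x, dtype=float)
    n_obs = x.size - m  # N - m observations per coordinate
    cols = []
    for s in range(m + 1):
        sample = x[s:s + n_obs]
        # ECDF of the coordinate at its own points: count of sample values <= x_i
        counts = np.searchsorted(np.sort(sample), sample, side="right")
        u = counts / n_obs
        if rescale:
            u = u * n_obs / (n_obs + 1)  # equals counts / (N - m + 1)
        cols.append(u)
    return np.column_stack(cols)
```

With `rescale=False` the largest value in each column is exactly 1. With the default rescaling, every pseudo-observation lies strictly inside $(0, 1)$.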

It is worth noting that at this stage, once the empirical marginal distribution functions $F_s^{emp}(x)$ are obtained, the stationarity condition can be checked. This is done by comparing any two of these functions with a two-sample test (Kolmogorov-Smirnov, Cramér-von Mises, Anderson-Darling, chi-squared).
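Here is a sketch of the stationarity check with SciPy's `scipy.stats.ks_2samp`. It compares the samples behind two marginal functions $F_{s_1}^{emp}$ and $F_{s_2}^{emp}$, by default the first and the last. The helper name `marginal_stationarity_test` is hypothetical. The two samples overlap, while two-sample tests assume independent samples, so the p-value should be read as indicative only. SciPy also provides `cramervonmises_2samp` and `anderson_ksamp` for two of the other tests named above.

```python
import numpy as np
from scipy.stats import ks_2samp


def marginal_stationarity_test(x, m, s1=0, s2=None):
    """Two-sample Kolmogorov-Smirnov test between the samples behind F_{s1}^emp and F_{s2}^emp.

    The sample for coordinate s is x_{s+1}, ..., x_{N-m+s} (1-based), as in (20).
    Returns (statistic, p-value).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if s2 is None:
        s2 = m  # compare the first coordinate with the last one by default
    a = x[s1:n - m + s1]
    b = x[s2:n - m + s2]
    res = ks_2samp(a, b)
    return res.statistic, res.pvalue
```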

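Further, for every time moment $L = m + 1, \dots, N$, empirical copulas are estimated before and after the anticipated break moment:

$$C^{before}_L(u_0, \dots, u_m) = \frac{1}{L-m} \sum_{\substack{i_0 = 1, \dots, L-m \\ i_s = i_0 + s}} I(u_0 \ge \tilde{x}^0_{i_0}) \cdot I(u_1 \ge \tilde{x}^1_{i_1}) \cdots I(u_m \ge \tilde{x}^m_{i_m}), \qquad (22)$$

$$C^{after}_L(u_0, \dots, u_m) = \frac{1}{N-L} \sum_{\substack{i_0 = L-m+1, \dots, N-m \\ i_s = i_0 + s}} I(u_0 \ge \tilde{x}^0_{i_0}) \cdot I(u_1 \ge \tilde{x}^1_{i_1}) \cdots I(u_m \ge \tilde{x}^m_{i_m}), \qquad (23)$$

where the indices move together, $i_s = i_0 + s$. In (23), for example, $i_0$ runs from $L-m+1$ to $N-m$ while $i_m$ runs from $L+1$ to $N$, so each term corresponds to one $(m + 1)$-dimensional observation.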
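The empirical copulas (22) and (23) at a single point $u$ can be sketched as below. The sketch assumes the pseudo-observations are stored row-wise as an $(N-m) \times (m+1)$ array, with row $r$ corresponding to the time moment $t = r + m + 1$. Because the indices $i_0, \dots, i_m$ move together, each row contributes one product of indicators. The function name `empirical_copulas` is hypothetical.

```python
import numpy as np


def empirical_copulas(U, L, m, u):
    """Evaluate C_L^before(u) and C_L^after(u), as in (22) and (23).

    U : (N - m) x (m + 1) pseudo-observations, row r <-> time moment t = r + m + 1.
    L : candidate break moment (1-based), m + 1 <= L <= N - 1.
    u : point of [0, 1]^(m+1).
    """
    U = np.asarray(U, dtype=float)
    u = np.asarray(u, dtype=float)
    n_obs = U.shape[0]  # N - m
    split = L - m       # rows 0, ..., L-m-1 have t <= L ("before")
    if not 1 <= split < n_obs:
        raise ValueError("L must satisfy m + 1 <= L <= N - 1")
    # product of indicators I(u_s >= x~^s_{i_s}) for every row at once
    inside = np.all(U <= u, axis=1)
    c_before = inside[:split].mean()  # average over L - m rows
    c_after = inside[split:].mean()   # average over N - L rows
    return c_before, c_after
```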

Kernel copula estimators may also be used. In that case $C^{before}_L$ and $C^{after}_L$ are smooth multidimensional functions that converge weakly to the true distributions.

The difference between the copulas $C^{before}_L$ and $C^{after}_L$ can be measured with a modified Kolmogorov-Smirnov statistic, as suggested by Brodsky et al. (2009) and used by Penikas (2012). At each time moment $L = m + 1, \dots, N$ the following function is constructed:

$$\Phi_L(u_0, \dots, u_m) = \left| C^{before}_L(u_0, \dots, u_m) - C^{after}_L(u_0, \dots, u_m) \right| \cdot W(L), \qquad (24)$$

where $W(L)$ is a special correction factor that depends on how close the moment $L$ is to the middle of the sample.
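The sketch below evaluates (24) at many points of the unit cube at once. The caller passes in the value $W(L)$, and the function name `phi_L` is hypothetical.

```python
import numpy as np


def phi_L(U, L, m, points, w_L):
    """Phi_L(u) = |C_L^before(u) - C_L^after(u)| * W(L) at each row u of `points`, as in (24).

    U      : (N - m) x (m + 1) pseudo-observations, row r <-> time moment t = r + m + 1.
    points : (P, m + 1) array of points of [0, 1]^(m+1).
    w_L    : value of the correction factor W(L), supplied by the caller.
    """
    U = np.asarray(U, dtype=float)
    points = np.atleast_2d(np.asarray(points, dtype=float))
    split = L - m  # number of rows with t <= L
    # inside[r, p] is True when every coordinate of U[r] is <= points[p]
    inside = np.all(U[:, None, :] <= points[None, :, :], axis=2)
    c_before = inside[:split].mean(axis=0)
    c_after = inside[split:].mean(axis=0)
    return np.abs(c_before - c_after) * w_L
```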

The statistic is then defined as

$$T_{KS} = \max_{L \in B(N, \beta)} \ \sup_{(u_0, \dots, u_m) \in [0, 1]^{m+1}} \Phi_L(u_0, \dots, u_m), \qquad (25)$$

and the estimate of the break moment as

$$\hat{K}_{KS} = \arg\max_{L \in B(N, \beta)} \ \sup_{(u_0, \dots, u_m) \in [0, 1]^{m+1}} \Phi_L(u_0, \dots, u_m), \qquad (26)$$

where $B(N, \beta)$ is the set of time moments $m + 1, \dots, N$ without the first share $\beta$ and the last share $\beta$ of the values. In other words, it keeps the moments between the $\beta$ and $(1 - \beta)$ fractions of the sample:

$$B(N, \beta) = \{\, m + [\beta(N-m-1)] + 1,\ m + [\beta(N-m-1)] + 2,\ \dots,\ N - [\beta(N-m-1)] - 2,\ N - [\beta(N-m-1)] - 1 \,\},$$

where $[\cdot]$ denotes the integer part. The ends of the sample are excluded for a practical reason. Close to the ends, one of the empirical copulas is estimated from only a few observations and may produce implausible values of the statistic. In this approach, the difference between the multidimensional functions is therefore the maximal difference between the copulas over all points of the $(m + 1)$-dimensional unit cube. The maximum is also taken over all time moments except for the two shares at the ends of the sample.
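This sketch puts (24)-(26) together to compute $T_{KS}$ and $\hat{K}_{KS}$. The supremum is approximated on the grid of nodes $(i_0/Q, \dots, i_m/Q)$ that the text uses for numerical computation. Several details are assumptions: the names `break_set` and `ks_break_statistic`, the defaults $\beta = 0.1$ and $Q = 10$, and the cumulative-sum shortcut. By default $W(L)$ is the squared coefficient $(L-m)(N-L)/(N-m)^2$ that the text recommends. The indicator matrix is computed once, and cumulative sums over its rows give both empirical copulas for every $L$.

```python
import itertools

import numpy as np


def break_set(N, m, beta):
    """1-based moments B(N, beta) = {m + h + 1, ..., N - h - 1} with h = [beta (N - m - 1)]."""
    h = int(np.floor(beta * (N - m - 1)))
    return list(range(m + h + 1, N - h))


def ks_break_statistic(U, m, beta=0.1, Q=10, weight=None):
    """Grid approximation of T_KS (25) and K_KS (26) from pseudo-observations U.

    U      : (N - m) x (m + 1) array, row r <-> time moment t = r + m + 1.
    weight : callable W(L); defaults to (L - m)(N - L) / (N - m)^2.
    Returns (T_KS, K_KS).
    """
    U = np.asarray(U, dtype=float)
    N = U.shape[0] + m
    if weight is None:
        def weight(L):
            return (L - m) * (N - L) / (N - m) ** 2
    moments = break_set(N, m, beta)
    if not moments:
        raise ValueError("B(N, beta) is empty; decrease beta")
    nodes = np.arange(Q + 1) / Q
    grid = np.array(list(itertools.product(nodes, repeat=m + 1)))  # (Q+1)^(m+1) nodes
    # inside[r, p] = 1 if every coordinate of row r is <= grid node p
    inside = np.all(U[:, None, :] <= grid[None, :, :], axis=2).astype(float)
    csum = np.cumsum(inside, axis=0)  # csum[k - 1] sums the first k rows
    total = csum[-1]
    best_T, best_L = -np.inf, None
    for L in moments:
        k = L - m  # rows with t <= L ("before" sample)
        c_before = csum[k - 1] / k
        c_after = (total - csum[k - 1]) / (N - L)
        stat = np.max(np.abs(c_before - c_after)) * weight(L)
        if stat > best_T:
            best_T, best_L = stat, L
    return best_T, best_L
```

The grid has $(Q+1)^{m+1}$ nodes, so memory grows quickly with $m$. As a toy input with $m = 0$, take values of 0.2 for the first 10 moments and 0.8 for the last 10. For this input the sketch returns $\hat{K}_{KS} = 10$.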

Brodsky et al. (2009) suggested $W(L) = \sqrt{(L-m)(N-L)}/(N-m)$. However, the results of the next chapter show that using the square of this coefficient gives more accurate results: $W(L) = (L-m)(N-L)/(N-m)^2$. For convenience, the first power of $(N - m)$ can be used in the denominator instead. This multiplies $\Phi_L$ by the same constant $N - m$ for every $L$, so it does not affect the estimate of the break moment. It only scales, by that same factor, the values of the statistic at every moment $L$ and the critical value of the statistic, which is discussed later.
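The three correction factors mentioned above can be compared in a short sketch; the function names are hypothetical. The factor with the first power of $(N-m)$ in the denominator equals the squared coefficient times the constant $N-m$. Multiplying the same differences by either factor therefore gives the same maximizing $L$, which is the argument made in the text.

```python
import numpy as np


def w_brodsky(L, N, m):
    """Correction factor of Brodsky et al. (2009): sqrt((L - m)(N - L)) / (N - m)."""
    return np.sqrt((L - m) * (N - L)) / (N - m)


def w_squared(L, N, m):
    """Squared coefficient: (L - m)(N - L) / (N - m)^2."""
    return (L - m) * (N - L) / (N - m) ** 2


def w_squared_first_power(L, N, m):
    """Same numerator with the first power of (N - m) in the denominator."""
    return (L - m) * (N - L) / (N - m)
```

All three factors peak at $L = (N+m)/2$, the middle of the range of moments. This is how $W(L)$ down-weights moments near the ends of the sample.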

We can also use the difference between the copulas integrated over the unit cube. The resulting modified statistic (a modified Cramér-von Mises statistic) is

$$T_{CM} = \max_{L \in B(N, \beta)} \int_{[0,1]^{m+1}} \Phi_L(u_0, \dots, u_m)\, du_0 \cdots du_m, \qquad (27)$$

$$\hat{K}_{CM} = \arg\max_{L \in B(N, \beta)} \int_{[0,1]^{m+1}} \Phi_L(u_0, \dots, u_m)\, du_0 \cdots du_m. \qquad (28)$$

For every $L$, the supremum of $\Phi_L(u_0, \dots, u_m)$ and the integral are computed numerically on a uniform grid with $Q$ intervals along each axis. The grid nodes have the form $(i_0/Q, \dots, i_m/Q)$, $i_0, \dots, i_m = 0, 1, \dots, Q$.
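This sketch computes the modified Cramér-von Mises statistic (27) and the estimate (28). It integrates $\Phi_L$ over the grid described above with the product trapezoidal rule. That quadrature is one reasonable choice, since the text does not specify one. The name `cm_break_statistic` and the defaults $\beta = 0.1$ and $Q = 10$ are hypothetical. $W(L)$ is the squared coefficient $(L-m)(N-L)/(N-m)^2$.

```python
import itertools

import numpy as np


def cm_break_statistic(U, m, beta=0.1, Q=10):
    """Grid approximation of T_CM (27) and K_CM (28) from pseudo-observations U.

    U : (N - m) x (m + 1) array, row r <-> time moment t = r + m + 1.
    The integral over [0, 1]^(m+1) uses the product trapezoidal rule on nodes i/Q,
    and W(L) = (L - m)(N - L) / (N - m)^2.  Returns (T_CM, K_CM).
    """
    U = np.asarray(U, dtype=float)
    N = U.shape[0] + m
    idx = np.array(list(itertools.product(range(Q + 1), repeat=m + 1)))
    grid = idx / Q                     # nodes (i_0/Q, ..., i_m/Q)
    tw = np.full(Q + 1, 1.0 / Q)
    tw[[0, -1]] = 0.5 / Q              # 1-D trapezoid weights
    quad_w = np.prod(tw[idx], axis=1)  # product weights, summing to 1
    inside = np.all(U[:, None, :] <= grid[None, :, :], axis=2).astype(float)
    csum = np.cumsum(inside, axis=0)   # csum[k - 1] sums the first k rows
    total = csum[-1]
    h = int(np.floor(beta * (N - m - 1)))  # B(N, beta) = m+h+1, ..., N-h-1
    best_T, best_L = -np.inf, None
    for L in range(m + h + 1, N - h):
        k = L - m
        phi = np.abs(csum[k - 1] / k - (total - csum[k - 1]) / (N - L))
        phi *= (L - m) * (N - L) / (N - m) ** 2
        stat = float(np.dot(quad_w, phi))
        if stat > best_T:
            best_T, best_L = stat, L
    return best_T, best_L
```

As a toy input with $m = 0$, take values of 0.2 for the first 10 moments and 0.8 for the last 10. For this input the sketch returns $\hat{K}_{CM} = 10$.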