# Correlation Dimension Approach to Research

Unfortunately, no statistical test takes chaos as its null hypothesis, and no single characteristic property separates chaos from a stochastic process. The basic method of identification is the algorithm of Grassberger and Procaccia (1983), which relies on a characteristic property of a wide class of purely stochastic processes. The algorithm is based on the concept of a correlation dimension for the observed $m$-dimensional trajectory. The main idea is the following: given an observed trajectory $x_1, x_2, \dots, x_N$, we reconstruct a series of $m$-dimensional vectors

$$y_k = \left(x_k, x_{k-p}, \dots, x_{k-(m-1)p}\right).$$

The embedding dimension $m$ and the lag $p$ are treated as parameters of the method that are given a priori. Writing $M$ for the number of reconstructed vectors, we then estimate the so-called correlation integral of the system:

$$C_m(\varepsilon) = \frac{\text{number of pairs } (y_i, y_j) \text{ with } \|y_i - y_j\| < \varepsilon}{\text{total number of pairs } (y_i, y_j)} = \frac{1}{M(M-1)} \sum_{\substack{i,j=1 \\ i \neq j}}^{M} \theta\left(\varepsilon - \|y_i - y_j\|\right),$$

where $\theta(x)$ is the Heaviside step function. For small $\varepsilon$ the correlation integral grows according to a power law with exponent $D(m)$:

$$C_m(\varepsilon) \sim \varepsilon^{D(m)}.$$

Fig. 2 Correlation dimension for Lukoil stock spread series (left) and relative spread series (right)
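The reconstruction and the correlation integral described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the Euclidean norm is assumed because the text does not fix a norm, the exponent is taken as a crude two-point slope rather than a fitted one, and the input is synthetic uniform noise rather than the Lukoil data.

```python
import math
import random
from itertools import combinations


def delay_embed(x, m, p):
    """Build vectors y_k = (x_k, x_{k-p}, ..., x_{k-(m-1)p})."""
    first = (m - 1) * p  # earliest index k that has a full history
    return [tuple(x[k - j * p] for j in range(m)) for k in range(first, len(x))]


def correlation_integral(x, m, p, eps):
    """Share of distinct pairs (y_i, y_j) with Euclidean distance below eps."""
    y = delay_embed(x, m, p)
    total_pairs = len(y) * (len(y) - 1) // 2
    close_pairs = sum(1 for a, b in combinations(y, 2) if math.dist(a, b) < eps)
    return close_pairs / total_pairs


def correlation_exponent(x, m, p, eps_lo, eps_hi):
    """Two-point slope of log C_m(eps) against log eps (crude estimate of D(m)).

    Fails with a math domain error if no pair is closer than eps_lo.
    """
    c_lo = correlation_integral(x, m, p, eps_lo)
    c_hi = correlation_integral(x, m, p, eps_hi)
    return (math.log(c_hi) - math.log(c_lo)) / (math.log(eps_hi) - math.log(eps_lo))


random.seed(0)
noise = [random.random() for _ in range(400)]  # synthetic data, for illustration only
print(correlation_exponent(noise, m=2, p=1, eps_lo=0.05, eps_hi=0.2))
```

Counting unordered pairs gives the same ratio as the double sum over $i \neq j$, because the distance is symmetric. The pair loop is quadratic in the number of vectors, which is acceptable for a few hundred points but slow for long series.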

For stochastic white noise $D(m)$ is proportional to $m$, but for a large class of deterministic systems the correlation exponent $D(m)$ saturates at a level $D'$, which can be used as a characteristic of non-stochastic behavior of the variable. Figure 2 shows the correlation exponent $D(m)$ for the Lukoil stock spread and relative spread. Saturation of the correlation exponent can be seen for returns, price changes and the relative spread, indicating complex nonlinear but deterministic behavior, whereas the price and the spread show purely stochastic properties. Another advantage of the Grassberger and Procaccia algorithm is that the correlation dimension gives an upper bound on the dimension of the generating system: Takens' embedding theorem implies that the phase-space dimension of the system cannot be higher than $2D' + 1$, where $D'$ is the saturation level.
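The contrast between growth with $m$ and saturation can be illustrated on synthetic data. The sketch below is an assumption-laden toy, not the procedure behind Figure 2: it compares uniform white noise with the logistic map $x_{t+1} = 4x_t(1 - x_t)$ (chosen here only as a convenient deterministic example), fixes the lag at $p = 1$, uses the Euclidean norm, and estimates the exponent from just two thresholds. The helper names are hypothetical.

```python
import math
import random
from itertools import combinations


def exponent(x, m, eps_lo=0.05, eps_hi=0.2):
    """Two-point log-log slope of the correlation integral for lag p = 1."""
    y = [tuple(x[k - j] for j in range(m)) for k in range(m - 1, len(x))]
    dists = [math.dist(a, b) for a, b in combinations(y, 2)]
    c_lo = sum(d < eps_lo for d in dists) / len(dists)
    c_hi = sum(d < eps_hi for d in dists) / len(dists)
    return math.log(c_hi / c_lo) / math.log(eps_hi / eps_lo)


def logistic_series(n, x0=0.3, burn_in=100):
    """Deterministic test series x_{t+1} = 4 x_t (1 - x_t), transient discarded."""
    x, out = x0, []
    for t in range(n + burn_in):
        x = 4.0 * x * (1.0 - x)
        if t >= burn_in:
            out.append(x)
    return out


random.seed(1)
noise = [random.random() for _ in range(500)]
chaos = logistic_series(500)

d_noise = {m: exponent(noise, m) for m in (1, 2, 3)}
d_chaos = {m: exponent(chaos, m) for m in (1, 2, 3)}
for m in (1, 2, 3):
    print(m, round(d_noise[m], 2), round(d_chaos[m], 2))

d_sat = d_chaos[3]  # crude stand-in for the saturation level D'
print("bound 2D' + 1:", 2 * d_sat + 1)
```

For the noise series the estimated exponent should keep rising with $m$, while for the logistic map it should stay roughly flat. With only 500 points and two thresholds the numbers are rough, so the sketch is useful for the qualitative picture only.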

Unfortunately, the Grassberger and Procaccia method is quite difficult to apply in practice. One shortcoming is that the lag parameter $p$ must be chosen a priori. The classical solution is to estimate the autocorrelation function of the series and take the first lag at which the autocorrelation reaches zero. The main problem, however, is the insufficient amount of data for estimating the correlation integral. While in the natural sciences the amount of data used for one test approaches 20,000-30,000 observations, a financial series usually contains only several thousand (for example, daily index data or aggregated intraday data). This makes estimates of the correlation dimension unreliable for $m$ higher than 10-15. Moreover, for small values of the threshold $\varepsilon$ too few pairs contribute to the estimate, so that the estimated integral is zero for rather small thresholds. Figure 3 shows the actual form of the correlation integral for different values of the threshold on a logarithmic scale. For long input series the dependence must be close to linear, but in practice this property holds only for a certain range of threshold values, which must be chosen very carefully.
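The classical lag rule mentioned above is simple to state in code. The following is a minimal sketch with hypothetical function names. Since a sample autocorrelation rarely equals zero exactly, it assumes that "reaches zero" means the first lag at which the estimate is zero or negative. The sine wave is synthetic test data.

```python
import math


def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var


def first_zero_lag(x, max_lag):
    """First lag at which the autocorrelation is zero or negative, else None."""
    for lag in range(1, max_lag + 1):
        if autocorrelation(x, lag) <= 0.0:
            return lag
    return None


# Synthetic example: a sine wave with a period of 18 samples.
wave = [math.sin(2 * math.pi * t / 18) for t in range(180)]
print(first_zero_lag(wave, max_lag=20))
```

For a periodic signal the rule picks a lag near a quarter of the period. Returning `None` when no crossing occurs within `max_lag` leaves the choice to the caller, which seems safer than silently falling back to an arbitrary lag.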

Another approach to identifying nonlinear behavior in data was introduced by Brock et al. (1986). The authors presented a statistical test (the BDS test) whose null hypothesis is that the series is i.i.d. The method is typically used by fitting some a priori linear model to the data and testing the residuals for the i.i.d. property. The test statistic uses the correlation integral estimate, which raises all the problems mentioned above, such as the need for a large amount of input data. Liu et al. (1992) examined the capabilities of the BDS test and found that its power varies across models; e.g., its power is lower for nonlinear moving average models. It is also necessary to emphasize that rejection of the null cannot be interpreted as evidence of a chaotic model. It only implies some (possibly stochastic) nonlinearity.
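The workflow of fitting a linear model and examining the residuals can be sketched as follows. This is deliberately not the full BDS test: it computes only the raw quantity $C_m(\varepsilon) - C_1(\varepsilon)^m$, which is close to zero for i.i.d. data, and omits the variance normalization needed for a proper test statistic. The AR(1) model, the max norm, the threshold of half a standard deviation, the synthetic series and all function names are assumptions made for the illustration.

```python
import random
import statistics
from itertools import combinations


def ar1_residuals(x):
    """Residuals of a least-squares AR(1) fit to the demeaned series."""
    mean = statistics.fmean(x)
    z = [v - mean for v in x]
    phi = sum(a * b for a, b in zip(z[1:], z[:-1])) / sum(b * b for b in z[:-1])
    return [a - phi * b for a, b in zip(z[1:], z[:-1])]


def bds_discrepancy(e, m, eps):
    """C_m(eps) - C_1(eps)**m with the max norm (unnormalized, not a test statistic)."""
    n = len(e) - m + 1  # number of m-histories (e_i, ..., e_{i+m-1})
    close_1 = close_m = 0
    for i, j in combinations(range(n), 2):
        close_1 += abs(e[i] - e[j]) < eps  # True counts as 1
        close_m += all(abs(e[i + k] - e[j + k]) < eps for k in range(m))
    pairs = n * (n - 1) // 2
    return close_m / pairs - (close_1 / pairs) ** m


random.seed(2)
noise = [random.gauss(0.0, 1.0) for _ in range(400)]

x, chaos = 0.3, []
for _ in range(400):
    x = 4.0 * x * (1.0 - x)  # logistic map: deterministic and nonlinear
    chaos.append(x)

disc = {}
for name, series in (("noise", noise), ("logistic", chaos)):
    resid = ar1_residuals(series)
    eps = 0.5 * statistics.pstdev(resid)  # threshold choice is an assumption
    disc[name] = bds_discrepancy(resid, m=2, eps=eps)
    print(name, disc[name])
```

The logistic map is nearly uncorrelated at lag one, so the linear fit removes almost nothing and the nonlinear dependence stays in the residuals, where the discrepancy picks it up. A large value signals a departure from i.i.d. behavior only, which is consistent with the caveat that rejection does not establish chaos.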