# Estimation and testing for cointegration in a single equation framework

Based upon the VECM representation, Engle and Granger (1987) suggest a two-step estimation procedure for single equation dynamic modeling which has become very popular in applied research. Assuming that $y_t \sim I(1)$, the procedure goes as follows:

1. First, in order to test whether the series are cointegrated, the cointegration regression

$$
y_{1t} = \alpha + \beta y_{2t} + z_t \tag{30.5}
$$

is estimated by ordinary least squares (OLS) and it is tested whether the cointegrating residuals $\hat z_t = y_{1t} - \hat\alpha - \hat\beta y_{2t}$ are $I(1)$. To do this, for example, we can perform a Dickey-Fuller test on the residual sequence $\{\hat z_t\}$ to determine whether it has a unit root. For this, consider the autoregression of the residuals

$$
\Delta \hat z_t = \rho_1 \hat z_{t-1} + \varepsilon_t, \tag{30.6}
$$

where no intercept term has been included since the $\{\hat z_t\}$, being residuals from a regression equation with a constant term, have zero mean. If we can reject the null hypothesis that $\rho_1 = 0$ against the alternative $\rho_1 < 0$ at a given significance level, we can conclude that the residual sequence is $I(0)$ and, therefore, that $y_{1t}$ and $y_{2t}$ are $CI(1, 1)$. Note that the Dickey-Fuller tables themselves cannot be used for this test, since $\{\hat z_t\}$ is a generated series of residuals from fitting regression (30.5). The problem is that the OLS estimates of $\alpha$ and $\beta$ minimize the residual variance in (30.5) and thus prejudice the testing procedure toward finding stationarity. Hence, critical values larger (in absolute value) than the standard Dickey-Fuller ones are needed. In this respect, MacKinnon (1991) provides appropriate tables to test the null hypothesis $\rho_1 = 0$ for any sample size and also when the number of regressors in (30.5) is expanded from one to several variables. In general, if the $\{\varepsilon_t\}$ sequence exhibits serial correlation, then an augmented Dickey-Fuller (ADF) test should be used, based this time on the extended autoregression

$$
\Delta \hat z_t = \rho_1 \hat z_{t-1} + \sum_{i=1}^{p} \zeta_i \Delta \hat z_{t-i} + \varepsilon_t, \tag{30.6′}
$$

where again, if the null $\rho_1 = 0$ is rejected in favor of $\rho_1 < 0$, we can conclude that $y_{1t}$ and $y_{2t}$ are $CI(1, 1)$. Alternative versions of the test on $\{\hat z_t\}$ being $I(1)$ versus $I(0)$ can be found in Phillips and Ouliaris (1990). Banerjee et al. (1998), in turn, suggest another class of tests based on the direct significance of the loading parameters in (30.4) and (30.4′), where the $\beta$ coefficient is estimated alongside the remaining parameters in a single step using nonlinear least squares (NLS).

If we reject that the $\hat z_t$ are $I(1)$, Stock (1987) has shown that the OLS estimate of $\beta$ in equation (30.5) is super-consistent, in the sense that the OLS estimator $\hat\beta$ converges in probability to its true value $\beta$ at a rate proportional to the inverse of the sample size, $T^{-1}$, rather than at $T^{-1/2}$ as is the standard result in the ordinary case where $y_{1t}$ and $y_{2t}$ are $I(0)$. Thus, when $T$ grows, convergence is much quicker in the $CI(1, 1)$ case. The intuition behind this remarkable result can be seen by analyzing the behavior of $\hat\beta$ in (30.5) (where the constant is omitted for simplicity) in the particular case where $z_t \sim \text{iid}(0, \sigma_z^2)$, and where $\theta_{20} = \theta_{21} = 0$ and $\rho_3 = \rho_4 = 0$, so that $y_{2t}$ is assumed to follow a simple random walk

$$
\Delta y_{2t} = \varepsilon_{2t}, \tag{30.7}
$$

or, integrating (30.7) backwards with $y_{20} = 0$,

$$
y_{2t} = \sum_{i=1}^{t} \varepsilon_{2i}, \tag{30.7′}
$$

with $\varepsilon_{2t}$ possibly correlated with $z_t$. In this case, we get $\text{var}(y_{2t}) = t\,\text{var}(\varepsilon_{21}) = t\sigma_2^2$, exploding as $T \uparrow \infty$. Nevertheless, it is not difficult to show that $T^{-2}\sum_{t=1}^{T} y_{2t}^2$ converges to a random variable. Similarly, the cross-product $T^{-1/2}\sum_{t=1}^{T} y_{2t}z_t$ will explode, in contrast to the stationary case where a simple application of the central limit theorem implies that it is asymptotically normally distributed. In the $I(1)$ case, $T^{-1}\sum_{t=1}^{T} y_{2t}z_t$ also converges to a random variable. Both random variables are functionals of Brownian motions, which will be denoted henceforth, in general, as $f(B)$. A Brownian motion is a zero-mean normally distributed continuous (a.s.) process with independent increments, i.e. loosely speaking, the continuous version of the discrete random walk; see Phillips (1987), and Chapter 29 by Bierens in this volume for further details.

Now, from the expression for the OLS estimator of $\beta$, we obtain

$$
\hat\beta - \beta = \frac{\sum_{t=1}^{T} y_{2t} z_t}{\sum_{t=1}^{T} y_{2t}^2}, \tag{30.8}
$$

and, from the previous discussion, it follows that

$$
T(\hat\beta - \beta) = \frac{T^{-1}\sum_{t=1}^{T} y_{2t} z_t}{T^{-2}\sum_{t=1}^{T} y_{2t}^2} \tag{30.9}
$$

is asymptotically (as $T \uparrow \infty$) the ratio of two non-degenerate random variables that, in general, is not normally distributed. Thus, in spite of the super-consistency, standard inference cannot be applied to $\hat\beta$ except in some restrictive cases which are discussed below.

2. After rejecting the null hypothesis that the cointegrating residuals in equation (30.5) are $I(1)$, the $\hat z_{t-1}$ term is included in the VECM system and the remaining parameters are estimated by OLS. Indeed, given the super-consistency of $\hat\beta$, Engle and Granger (1987) show that their asymptotic distributions will be identical to those obtained using the true value of $\beta$. Now, all the variables in (30.4) and (30.4′) are $I(0)$ and conventional modeling strategies (e.g. testing the maximum lag length, residual autocorrelation or whether either $\theta_{11}$ or $\theta_{21}$ is zero, etc.) can be applied to assess model adequacy; see Chapter 32 by Lütkepohl in this volume for further details.
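The two steps above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' implementation: the data are simulated with hypothetical values $\alpha = 1$, $\beta = 2$ and iid errors, the residual test is the plain Dickey-Fuller regression (30.6) without augmentation lags, the critical value of about $-3.34$ is an approximate asymptotic 5% value for two variables with a constant that should be checked against MacKinnon's tables, and the second step keeps only the loading on $\hat z_{t-1}$ (no intercept, no lagged differences). All function names are made up for this example.

```python
import math
import random


def simulate(T, alpha=1.0, beta=2.0, seed=0):
    """Hypothetical DGP: y2 is a random walk, y1 = alpha + beta*y2 + iid noise."""
    rng = random.Random(seed)
    y2, level = [], 0.0
    for _ in range(T):
        level += rng.gauss(0.0, 1.0)
        y2.append(level)
    y1 = [alpha + beta * v + rng.gauss(0.0, 1.0) for v in y2]
    return y1, y2


def ols_simple(x, y, intercept=True):
    """OLS of y on a single regressor x; returns (a, b), with a = 0.0 if no intercept."""
    n = len(x)
    mx = sum(x) / n if intercept else 0.0
    my = sum(y) / n if intercept else 0.0
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def df_tstat(resid):
    """t-ratio of rho_1 in the regression (30.6): dz_t = rho_1 * z_{t-1} + e_t."""
    lagged = resid[:-1]
    diff = [resid[t] - resid[t - 1] for t in range(1, len(resid))]
    _, rho = ols_simple(lagged, diff, intercept=False)
    ssr = sum((d - rho * l) ** 2 for d, l in zip(diff, lagged))
    s2 = ssr / (len(diff) - 1)
    se = math.sqrt(s2 / sum(l * l for l in lagged))
    return rho / se


def engle_granger(y1, y2, crit=-3.34):
    """Step 1: cointegrating regression and residual unit-root test.
    Step 2 (only if the unit root is rejected): simplified ECM loading.
    crit is an assumed approximate 5% value; verify it against published tables."""
    a_hat, b_hat = ols_simple(y2, y1)
    z_hat = [u - a_hat - b_hat * v for u, v in zip(y1, y2)]
    t_rho = df_tstat(z_hat)
    result = {"beta_hat": b_hat, "t_rho": t_rho, "cointegrated": t_rho < crit}
    if result["cointegrated"]:
        dy1 = [y1[t] - y1[t - 1] for t in range(1, len(y1))]
        # Regress the change in y1 on the lagged residual (error-correction term).
        _, loading = ols_simple(z_hat[:-1], dy1, intercept=False)
        result["loading"] = loading
    return result


if __name__ == "__main__":
    y1, y2 = simulate(500, seed=1)
    print(engle_granger(y1, y2))
```

In this simulated design the residuals are close to white noise, so the Dickey-Fuller ratio should be strongly negative and the estimated loading should be near $-1$; with real data one would normally add lagged differences as in (30.6′) and in the full VECM.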

In spite of the beauty and simplicity of the previous procedure, however, several problems remain. In particular, although $\hat\beta$ is super-consistent, this is an asymptotic result and thus biases could be important in finite samples. For instance, assume that the rates of convergence of two estimators are $T^{-1/2}$ and $10^{10}T^{-1}$. Then the second estimator dominates the first only when $10^{10}T^{-1} < T^{-1/2}$, that is, for $T > 10^{20}$, a huge sample size. In this sense, Monte Carlo experiments by Banerjee et al. (1993) showed that the biases could be important, particularly when $z_t$ and $\Delta y_{2t}$ are highly serially correlated and not independent of each other. Phillips (1991), in turn, has shown analytically that in the case where $\{y_{2t}\}$ and $\{z_t\}$ are independent at all leads and lags, the distribution in (30.9) as $T$ grows behaves like a Gaussian distribution (technically, it is a mixture of normals) and, hence, the distribution of the t-statistic of $\beta$ is also asymptotically normal. For this reason, Phillips and Hansen (1990) have developed an estimation procedure which corrects for the previous bias while achieving asymptotic normality. The procedure, denoted as a fully modified ordinary least squares estimator (FM-OLS), is based upon a correction to the OLS estimator given in (30.8) by which the error term $z_t$ is conditioned on the whole process $\{\Delta y_{2t}, t = 0, \pm 1, \ldots\}$ and, hence, orthogonality between regressors and disturbance is achieved by construction. For example, if $z_t$ and $\varepsilon_{2t}$ in (30.5) and (30.7) are correlated white noises with $\gamma = E(z_t\varepsilon_{2t})/\text{var}(\varepsilon_{2t})$, the FM-OLS estimator of $\beta$, denoted $\hat\beta_{FM}$, is given by

$$
\hat\beta_{FM} = \frac{\sum_{t=1}^{T} y_{2t}\,(y_{1t} - \hat\gamma\,\Delta y_{2t})}{\sum_{t=1}^{T} y_{2t}^2}, \tag{30.10}
$$

where $\hat\gamma$ is the empirical counterpart of $\gamma$ obtained from regressing the OLS residuals $\hat z_t$ on $\Delta y_{2t}$. When $z_t$ and $\Delta y_{2t}$ follow more general processes, the FM-OLS estimator of $\beta$ is similar to (30.10) except that further corrections are needed in its numerator. Alternatively, Saikkonen (1991) and Stock and Watson (1993) have shown that, since $E(z_t \mid \{\Delta y_{2t}\}) = h(L)\Delta y_{2t}$, where $h(L)$ is a two-sided filter in the lag operator $L$, regression of $y_{1t}$ on $y_{2t}$ and leads and lags of $\Delta y_{2t}$ (suitably truncated), using either OLS or GLS, will yield an estimator of $\beta$ which is asymptotically equivalent to the FM-OLS estimator. The resulting estimation approach is known as dynamic OLS (respectively GLS) or DOLS (respectively, DGLS).
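To make the correction in (30.10) concrete, the following small Monte Carlo sketch compares the scaled errors $T(\hat\beta - \beta)$ of plain OLS and of the simple white-noise FM-OLS formula. It covers only the special case described above (correlated white noises, no constant, $y_{20} = 0$), not the general Phillips-Hansen estimator with its additional numerator corrections. The parameter values ($\beta = 1$, $\gamma = 0.8$, unit variances), the Gaussian errors and all function names are assumptions chosen for illustration.

```python
import random


def draw_sample(T, beta, gamma, rng):
    """y2 is a random walk driven by e2; z = gamma*e2 + independent noise,
    so that E(z*e2)/var(e2) = gamma and var(z) = 1. No constant, as in (30.8)."""
    scale = (1.0 - gamma ** 2) ** 0.5
    y1, y2, level = [], [], 0.0
    for _ in range(T):
        e2 = rng.gauss(0.0, 1.0)
        z = gamma * e2 + scale * rng.gauss(0.0, 1.0)
        level += e2
        y2.append(level)
        y1.append(beta * level + z)
    return y1, y2


def ols_and_fm(y1, y2):
    """Return (OLS estimate, simple FM-OLS estimate from (30.10))."""
    s22 = sum(v * v for v in y2)
    b_ols = sum(u * v for u, v in zip(y1, y2)) / s22
    z_hat = [u - b_ols * v for u, v in zip(y1, y2)]
    # First differences of y2, using the starting value y2_0 = 0.
    dy2 = [y2[0]] + [y2[t] - y2[t - 1] for t in range(1, len(y2))]
    # gamma_hat: slope from regressing the OLS residuals on the differences.
    g_hat = sum(z * d for z, d in zip(z_hat, dy2)) / sum(d * d for d in dy2)
    num = sum(v * (u - g_hat * d) for u, v, d in zip(y1, y2, dy2))
    return b_ols, num / s22


def mean_scaled_bias(T=100, reps=300, beta=1.0, gamma=0.8, seed=0):
    """Average of T*(estimate - beta) over replications, for OLS and FM-OLS."""
    rng = random.Random(seed)
    tot_ols = tot_fm = 0.0
    for _ in range(reps):
        y1, y2 = draw_sample(T, beta, gamma, rng)
        b_ols, b_fm = ols_and_fm(y1, y2)
        tot_ols += T * (b_ols - beta)
        tot_fm += T * (b_fm - beta)
    return tot_ols / reps, tot_fm / reps


if __name__ == "__main__":
    bias_ols, bias_fm = mean_scaled_bias()
    print("mean T*(b_ols - beta):", bias_ols)
    print("mean T*(b_fm  - beta):", bias_fm)
```

With positively correlated $z_t$ and $\varepsilon_{2t}$ the OLS average should come out clearly positive, while the corrected estimator should be centered much closer to zero. A DOLS variant would instead add leads and lags of $\Delta y_{2t}$ as extra regressors, which requires a multiple-regression routine and is not shown here.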