# Preliminaries: Unit Roots and Cointegration

2.1 Some basic concepts

A well-known result in time series analysis is Wold's (1938) decomposition theorem, which states that a stationary time series process, after removal of any deterministic components, has an infinite moving average (MA) representation which, under some technical conditions (absolute summability of the MA coefficients), can be represented by a finite autoregressive moving average (ARMA) process.

However, as mentioned in the introduction, many time series need to be appropriately differenced in order to achieve stationarity. From this comes the definition of integration: a time series is said to be integrated of order d, in short I(d), if it has a stationary, invertible, non-deterministic ARMA representation after differencing d times. A white noise series and a stable first-order autoregressive AR(1) process are well-known examples of I(0) series; a random walk process is an example of an I(1) series; and accumulating a random walk gives rise to an I(2) series, etc.
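The integration orders just described can be illustrated with a minimal simulation sketch. The sample size, the AR(1) coefficient of 0.5, and all variable names are assumptions chosen for illustration; they do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)   # assumed seed, for reproducibility only
T = 500                          # assumed sample size

eps = rng.standard_normal(T)     # white noise: an I(0) series

# Stable AR(1), x_t = 0.5 * x_{t-1} + eps_t: also I(0)
ar1 = np.zeros(T)
for t in range(1, T):
    ar1[t] = 0.5 * ar1[t - 1] + eps[t]

random_walk = np.cumsum(eps)          # accumulated white noise: I(1)
i2_series = np.cumsum(random_walk)    # accumulated random walk: I(2)

# Differencing undoes accumulation: once for I(1), twice for I(2)
d1 = np.diff(random_walk)             # recovers eps[1:]
d2 = np.diff(i2_series, n=2)          # recovers eps[2:]
```

Differencing the random walk once, or the accumulated random walk twice, returns the underlying white noise, which is what "integrated of order d" means in this simple case.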

Consider now two time series y1t and y2t which are both I(d) (i.e. they have compatible long-run properties). In general, any linear combination of y1t and y2t will also be I(d). However, if there exists a vector (1, −β)′ such that the linear combination

$$z_t = y_{1t} - \alpha - \beta y_{2t} \tag{30.1}$$

is I(d − b), d ≥ b > 0, then, following Engle and Granger (1987), y1t and y2t are defined as cointegrated of order (d, b), denoted yt = (y1t, y2t)′ ~ CI(d, b), with (1, −β)′ called the cointegrating vector.

Several features in (30.1) are noteworthy. First, as defined above, cointegration refers to a linear combination of nonstationary variables. Although it is theoretically possible that nonlinear relationships exist among a set of integrated variables, econometric practice for this more general type of cointegration is less developed (see more on this in Section 4). Second, note that the cointegrating vector is not uniquely defined, since for any nonzero value of λ, (λ, −λβ)′ is also a cointegrating vector. Thus, a normalization rule needs to be used; for example, λ = 1 has been chosen in (30.1). Third, all variables must be integrated of the same order to be candidates to form a cointegrating relationship. Notwithstanding, there are extensions of the concept of cointegration, called multicointegration, which apply when the number of variables considered is larger than two and which can address variables with different orders of integration (see, e.g., Granger and Lee, 1989). For example, in a trivariate system, we may have that y1t and y2t are I(2) and y3t is I(1); if y1t and y2t are CI(2, 1), it is possible that the combination of y1t and y2t which achieves that property is itself cointegrated with y3t, giving rise to an I(0) linear combination among the three variables. Fourth, and most important, most of the cointegration literature focuses on the case where variables contain a single unit root, since few economic variables prove in practice to be integrated of higher order. If variables have a strong seasonal component, however, there may be unit roots at the seasonal frequencies, a case that we will briefly consider in Section 4; see Chapter 30 by Ghysels, Osborn, and Rodrigues in this volume for further details. Hence, the remainder of this chapter will mainly focus on the case of CI(1, 1) variables, so that zt in (30.1) is I(0) and the concept of cointegration mimics the existence of a long-run equilibrium to which the system converges over time. If, e.g., economic theory suggests the following long-run relationship between y1t and y2t,

$$y_{1t} = \alpha + \beta y_{2t}, \tag{30.2}$$

then zt can be interpreted as the equilibrium error (i.e. the distance that the system is away from the equilibrium at any point in time). Note that a constant term has been included in (30.1) in order to allow for the possibility that zt may have nonzero mean. For example, a standard theory of spatial competition argues that arbitrage will prevent prices of similar products in different locations from moving too far apart even if the prices are nonstationary. However, if there are fixed transportation costs from one location to another, a constant term needs to be included in (30.1).

At this stage, it is important to point out that a useful way to understand cointegrating relationships is through the observation that CI(1, 1) variables must share a set of stochastic trends. Using the example in (30.1), since y1t and y2t are I(1) variables, they can be decomposed into an I(1) component (say, a random walk) plus an irregular I(0) component (not necessarily white noise). Denoting the first components by μit and the second components by uit, i = 1, 2, we can write

$$y_{1t} = \mu_{1t} + u_{1t}, \tag{30.3}$$

$$y_{2t} = \mu_{2t} + u_{2t}. \tag{30.3′}$$

Since the sum of an I(1) process and an I(0) process is always I(1), the previous representation must characterize the individual stochastic properties of y1t and y2t. However, if y1t − βy2t is I(0), it must be that μ1t = βμ2t, annihilating the I(1) component in the cointegrating relationship. In other words, if y1t and y2t are CI(1, 1) variables, they must share (up to a scalar) the same stochastic trend, say μt, denoted as the common trend, so that μ1t = βμt and μ2t = μt. As before, notice that if μt is a common trend for y1t and y2t, λμt will also be a common trend, implying that a normalization rule is needed for identification. Generalizing the previous argument to vectors of cointegrating relationships and common trends, it can be proved that if there are n − r common trends among the n variables, there must be r cointegrating relationships. Note that 0 < r < n, since r = 0 implies that each series in the system is governed by a different stochastic trend and r = n implies that the series are I(0) instead of I(1). These properties constitute the core of two important dual approaches toward testing for cointegration, namely, one that tests directly for the number of cointegrating vectors (r) and another which tests for the number of common trends (n − r). However, before explaining those approaches in more detail (see Section 3), we now turn to another useful representation of CI(1, 1) systems which has proved very popular in practice.
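The common-trend argument can be made concrete with a small simulation sketch. It assumes a random-walk common trend, white-noise irregular components, and an illustrative value β = 2, with the trend entering y1t scaled by β so that it cancels in y1t − βy2t. None of these choices come from the text.

```python
import numpy as np

rng = np.random.default_rng(1)              # assumed seed
T = 1000                                    # assumed sample size
beta = 2.0                                  # assumed cointegrating coefficient

mu = np.cumsum(rng.standard_normal(T))      # common stochastic trend, I(1)
u1 = rng.standard_normal(T)                 # irregular I(0) component of y1
u2 = rng.standard_normal(T)                 # irregular I(0) component of y2

y1 = beta * mu + u1                         # trend in y1 is beta * mu_t
y2 = mu + u2                                # trend in y2 is mu_t

# The cointegrating combination removes the trend: z_t = u1_t - beta * u2_t
z = y1 - beta * y2
```

Both simulated series inherit the random-walk trend, yet z contains only the stationary components, so the pair is CI(1, 1) with cointegrating vector (1, −β)′ by construction.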

Engle and Granger (1987) have shown that if y1t and y2t are cointegrated CI(1, 1), then there must exist a so-called vector error correction model (VECM) representation of the dynamic system governing the joint behavior of y1t and y2t over time, of the following form

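$$\Delta y_{1t} = \theta_{10} + \theta_{11} z_{t-1} + \sum_{i=1}^{p_1} \theta_{12,i}\,\Delta y_{1,t-i} + \sum_{i=1}^{p_2} \theta_{13,i}\,\Delta y_{2,t-i} + \varepsilon_{1t}, \tag{30.4}$$

$$\Delta y_{2t} = \theta_{20} + \theta_{21} z_{t-1} + \sum_{i=1}^{p_3} \theta_{22,i}\,\Delta y_{1,t-i} + \sum_{i=1}^{p_4} \theta_{23,i}\,\Delta y_{2,t-i} + \varepsilon_{2t}, \tag{30.4′}$$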

where Δ denotes the first-order time difference (i.e. Δyt = yt − yt-1) and where the lag lengths pi, i = 1, …, 4, are such that the innovations εt = (ε1t, ε2t)′ are iid (0, Σ). Furthermore, they proved the converse result that a VECM generates cointegrated CI(1, 1) series as long as the coefficients on zt-1 (the so-called loading or speed of adjustment parameters) are not simultaneously equal to zero.
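The role of the loadings can be seen in a short worked sketch for the simplest special case. It assumes no lagged differences (all lag lengths equal to zero) and writes θ10, θ20 for the intercepts and θ11, θ21 for the coefficients on zt-1 in (30.4) and (30.4′). Since zt = y1t − α − βy2t, subtracting β times (30.4′) from (30.4) gives

$$
\Delta z_t = \Delta y_{1t} - \beta\,\Delta y_{2t}
= (\theta_{10} - \beta\theta_{20}) + (\theta_{11} - \beta\theta_{21})\, z_{t-1} + (\varepsilon_{1t} - \beta\varepsilon_{2t}),
$$

$$
z_t = (\theta_{10} - \beta\theta_{20}) + \rho\, z_{t-1} + (\varepsilon_{1t} - \beta\varepsilon_{2t}),
\qquad \rho = 1 + \theta_{11} - \beta\theta_{21}.
$$

In this restricted case zt is a stationary AR(1) process when |ρ| < 1, that is, when −2 < θ11 − βθ21 < 0. If both loadings are zero, then ρ = 1 and zt is a random walk, so the levels are not cointegrated.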

Note that the term zt-1 in equations (30.4) and (30.4′) represents the extent of the disequilibrium between the levels of y1 and y2 in the previous period. Thus, the VECM representation states that changes in one variable depend not only on changes in the other variable and on its own past changes, but also on the extent of the disequilibrium between the levels of y1 and y2. For example, if β = 1 in (30.1), as many theories predict when y1t and y2t are taken in logarithmic form, and y1 was larger than y2 in the previous period (zt-1 > 0), then θ11 < 0 and θ21 > 0 imply that, everything else equal, y1 would fall and y2 would rise in the current period, so that both series adjust toward their long-run equilibrium. Notice that θ11 and θ21 cannot both be equal to zero. However, if θ11 < 0 and θ21 = 0, then all of the adjustment falls on y1, or vice versa if θ11 = 0 and θ21 > 0. Note also that the larger the speed of adjustment parameters are (with the right signs), the faster is the convergence toward equilibrium. Because at least one of those terms must be nonzero, Granger causality exists in cointegrated systems in at least one direction; see Chapter 32 by Lütkepohl in this volume for the formal definition of causality. Hence, the appeal of the VECM formulation is that it combines flexibility in dynamic specification with desirable long-run properties: it can be seen as capturing the transitional dynamics of the system to the long-run equilibrium suggested by economic theory (see, e.g., Hendry and Richard, 1983). Further, if cointegration exists, the VECM representation will generate better forecasts than the corresponding representation in first-differenced form (i.e. with θ11 = θ21 = 0), particularly over medium- and long-run horizons, since under cointegration zt will have a finite forecast error variance whereas any other linear combination of the forecasts of the individual series in yt will have infinite variance; see Engle and Yoo (1987) for further details.
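The adjustment mechanism in the β = 1 example can be sketched with a small simulation. It assumes no intercepts, no lagged differences, α = 0, and illustrative loadings of −0.2 and 0.1. The function name `simulate_vecm` and all parameter values are hypothetical and are not taken from the text.

```python
import numpy as np

def simulate_vecm(T, beta, theta11, theta21, seed=0):
    """Simulate a bivariate VECM with no intercepts or lagged differences (assumed)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((T, 2))            # iid innovations with identity covariance
    y1 = np.zeros(T)
    y2 = np.zeros(T)
    for t in range(1, T):
        z_prev = y1[t - 1] - beta * y2[t - 1]    # last period's equilibrium error (alpha = 0)
        y1[t] = y1[t - 1] + theta11 * z_prev + eps[t, 0]
        y2[t] = y2[t - 1] + theta21 * z_prev + eps[t, 1]
    return y1, y2, eps

beta, theta11, theta21 = 1.0, -0.2, 0.1          # illustrative values with the "right" signs
y1, y2, eps = simulate_vecm(2000, beta, theta11, theta21)

z = y1 - beta * y2                               # equilibrium error
rho = 1.0 + theta11 - beta * theta21             # implied AR(1) coefficient of z (0.7 here)
```

With these signs, a positive zt-1 pushes y1 down and y2 up, and the equilibrium error follows an AR(1) process with coefficient ρ = 1 + θ11 − βθ21. Setting both loadings to zero in this sketch gives ρ = 1, so the error no longer reverts to its mean.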