# MA(q): the probabilistic reduction perspective

At this point it is important to emphasize that the above discussion relating to the convergence of certain partial sums of the MA(∞) coefficients is not helpful from the empirical modeling viewpoint, because the restrictions cannot be assessed a priori. Alternatively, one can consider restrictions on the temporal covariances of the observable process {y_t, t ∈ T}, which we can assess a priori:

[1] constant mean: E(y_t) := μ, t ∈ T,

[2] constant variance: var(y_t) := a_0, t ∈ T,

[3] q-autocorrelation: cov(y_t, y_{t−τ}) := a_τ for τ = 1, 2, …, q, and := 0 for τ > q,

[4] normality: y_t ∼ N(·, ·), t ∈ T. (28.22)

where the first two moments, in terms of the statistical parameterization φ := (a_0, a_1, …, a_q, σ²), take the form given in (28.22) above.
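For an MA(q) of the form y_t = μ + ε_t + a_1·ε_{t−1} + … + a_q·ε_{t−q} with var(ε_t) = σ², the implied autocovariances are a_τ = σ²·Σ_{k=0}^{q−τ} a_k·a_{k+τ} (with a_0 := 1), which are zero beyond lag q. A minimal sketch of this computation; the function name and the MA(2) numbers are illustrative, not from the text:

```python
import numpy as np

def ma_q_autocovariances(a, sigma2):
    """Autocovariances of y_t = mu + e_t + a_1 e_{t-1} + ... + a_q e_{t-q}.

    Returns [gamma_0, gamma_1, ..., gamma_q]; autocovariances vanish
    for lags greater than q, reflecting the banded covariance structure."""
    c = np.r_[1.0, np.asarray(a, float)]       # (1, a_1, ..., a_q)
    q = len(c) - 1
    return [sigma2 * float(np.dot(c[:q + 1 - tau], c[tau:]))
            for tau in range(q + 1)]

# Illustrative MA(2): y_t = mu + e_t + 0.5 e_{t-1} + 0.25 e_{t-2}, sigma2 = 1
gamma = ma_q_autocovariances([0.5, 0.25], 1.0)
```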

In view of the fact that the likelihood function is defined in terms of the joint distribution of the observable process {y_t, t ∈ T}, it is apparent that:

L(φ) ∝ (2π)^{−T/2}·(det Ω(φ))^{−1/2}·exp{−½(y − μ·1_T)ᵀ·Ω(φ)^{−1}·(y − μ·1_T)}, (1_T denotes a T × 1 vector of ones)

where the T × T temporal variance-covariance matrix Ω(φ) takes a banded Toeplitz form: the elements along the main diagonal and along each off-diagonal up to the qth coincide and are non-zero, while beyond the qth off-diagonal all covariances are zero. This gives rise to a log-likelihood function whose first-order conditions with respect to φ are non-linear, and the estimation requires numerical optimization; see Anderson (1971).
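A minimal sketch of how the banded Toeplitz covariance and the resulting Gaussian log-likelihood could be evaluated numerically; the function and variable names are illustrative, and a numerical optimizer would then be applied to this function to estimate φ:

```python
import numpy as np

def ma_q_loglik(y, mu, a, sigma2):
    """Gaussian log-likelihood of a sample y under an MA(q) with
    coefficients a and innovation variance sigma2, via the banded
    Toeplitz covariance; direct O(T^3) evaluation, fine for moderate T."""
    y = np.asarray(y, float)
    T = len(y)
    c = np.r_[1.0, np.asarray(a, float)]       # (1, a_1, ..., a_q)
    q = len(c) - 1
    gamma = np.zeros(T)                        # autocovariances; zero beyond lag q
    for tau in range(min(q, T - 1) + 1):
        gamma[tau] = sigma2 * np.dot(c[:q + 1 - tau], c[tau:])
    # Banded Toeplitz matrix: entry (i, j) is gamma_{|i-j|}
    Omega = gamma[np.abs(np.subtract.outer(np.arange(T), np.arange(T)))]
    resid = y - mu
    _, logdet = np.linalg.slogdet(Omega)
    quad = resid @ np.linalg.solve(Omega, resid)
    return -0.5 * (T * np.log(2.0 * np.pi) + logdet + quad)
```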

Returning to the Wold decomposition theorem, we note that the probabilistic structure of the observable process {y_t, t ∈ T} involves only normality and stationarity, which imply that the variance-covariance matrix is Toeplitz. When this is compared with Ω(φ), the result becomes apparent: the banded Toeplitz covariance matrix Ω(φ) as T → ∞ gives rise to an MA(q) formulation, while the unrestricted Toeplitz covariance matrix as T → ∞ gives rise to an MA(∞) formulation. Does this mean that to get an operational model we need to truncate the temporal covariance matrix, i.e. assume that a_τ = 0 for all τ > q, for some q ≥ 1? This assumption gives rise to the MA(q) model, but there are more general models we can contemplate that do not impose such a strong restriction. Instead, we need restrictions which ensure that a_τ → 0 as τ → ∞ at a "reasonable" rate, such as:

|a_τ| < c·λ^τ, c > 0, 0 < λ < 1, τ = 1, 2, 3, … (28.24)

This enables us to approximate the non-operational MA(∞) representation with operational models from the broader ARMA(p, q) family. This should be contrasted with stochastic processes with long memory (see Granger, 1980), where:

|a_τ| < c·τ^{2d−1}, c > 0, 0 < d < 0.5, τ = 1, 2, 3, … (28.25)
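The practical difference between the two bounds is summability: under the geometric bound (28.24) the sum Σ_τ |a_τ| is finite, while under the hyperbolic long-memory bound (28.25) it diverges. A quick numerical illustration; the constants c, λ, d are arbitrary choices:

```python
import numpy as np

tau = np.arange(1, 100_001, dtype=float)
c, lam, d = 1.0, 0.9, 0.4                      # illustrative constants

short_memory = c * lam ** tau                  # bound (28.24): geometric decay
long_memory = c * tau ** (2 * d - 1)           # bound (28.25): hyperbolic decay

# The geometric bound sums (in the limit) to c*lam/(1 - lam) = 9, while the
# partial sums of the hyperbolic bound keep growing as the horizon increases.
```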

In cases where, in addition to normality and stationarity, we assume that the process {y_t, t ∈ T} satisfies the dependence restriction (28.24), we can proceed to approximate the infinite polynomial in the lag operator L, a_∞(L) = 1 + a_1L + … + a_kL^k + …, of the MA(∞) representation:

y_t = μ + Σ_{k=1}^∞ a_k·ε_{t−k} + ε_t = μ + a_∞(L)·ε_t, t ∈ T, (28.26)

by a ratio of two finite-order polynomials:

a_∞(L) ≅ γ_q(L)/a_p(L) := (1 + γ_1L + γ_2L² + … + γ_qL^q)/(1 + a_1L + a_2L² + … + a_pL^p), p ≥ q ≥ 0;

(see Dhrymes, 1971). After re-arranging the two polynomials:

y_t = μ + (γ_q(L)/a_p(L))·ε_t ⟹ a_p(L)·y_t = δ + γ_q(L)·ε_t, t ∈ T, with δ := a_p(1)·μ,

yields the autoregressive-moving average model ARMA(p, q) popularized by Box and Jenkins (1970):

y_t + Σ_{k=1}^p a_k·y_{t−k} = δ + Σ_{k=1}^q γ_k·ε_{t−k} + ε_t, t ∈ T.
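The sense in which the rational polynomial γ_q(L)/a_p(L) reproduces an MA(∞) with geometrically declining coefficients can be checked by long division of the two polynomials. A minimal sketch; the function name and the ARMA(1, 1) numbers are illustrative:

```python
import numpy as np

def arma_to_ma(ar, ma, n):
    """First n coefficients psi_0, psi_1, ... of gamma_q(L)/a_p(L), with
    a_p(L) = 1 + a_1 L + ... + a_p L^p and gamma_q(L) = 1 + g_1 L + ... + g_q L^q
    (so psi_0 = 1). Equates coefficients in gamma_q(L) = a_p(L) * psi(L)
    and solves for psi_k recursively."""
    psi = np.zeros(n)
    psi[0] = 1.0
    for k in range(1, n):
        acc = ma[k - 1] if k - 1 < len(ma) else 0.0
        for j in range(1, min(k, len(ar)) + 1):
            acc -= ar[j - 1] * psi[k - j]
        psi[k] = acc
    return psi

# Illustrative ARMA(1,1) with a_1 = -0.7, gamma_1 = 0.4:
# (1 - 0.7 L) y_t = (1 + 0.4 L) e_t
psi = arma_to_ma([-0.7], [0.4], 12)
# Beyond the first lag the psi_k decline geometrically at rate |a_1| = 0.7,
# consistent with a bound of the form (28.24).
```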

Such models proved very efficient in capturing the temporal dependence in time series data in a parsimonious way, but failed to capture the imagination of economic modelers because it is very difficult to relate such models to economic theory.

The question that arises in justifying this representation is: why define the statistical GM in terms of the errors? The only effective justification is when the modeler has a priori evidence that the dependence exhibited by the time series data is of the q-autocorrelation form and q is reasonably small. On the other hand, if the dependence is better described by (28.24), the AR(p) representation provides a much more effective description. The relationship between the MA(q) representation (28.18) and the autoregressive AR(∞) representation takes the form:

y_t = Σ_{k=1}^∞ b_k·y_{t−k} + ε_t, t ∈ T,

where the coefficients are related (by equating coefficients in (1 − Σ_{k=1}^∞ b_kL^k)·(1 + Σ_{k=1}^q a_kL^k) = 1) via:

b_1 = a_1, b_2 = a_2 − a_1b_1, b_3 = a_3 − a_1b_2 − a_2b_1, …,

b_q = a_q − a_1b_{q−1} − a_2b_{q−2} − … − a_{q−1}b_1, b_τ = −Σ_{k=1}^q a_k·b_{τ−k}, τ > q.

Given that b_τ → 0 as τ → ∞, the modeler can assume that the latter representation can be approximated by a finite AR(p) model, which is often preferred for forecasting.
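The coefficient recursion is easy to implement and to check numerically, assuming an invertible MA(q) (the roots of 1 + a_1z + … + a_qz^q lying outside the unit circle, so that the b_τ decline). A minimal sketch with illustrative names and numbers:

```python
import numpy as np

def ma_to_ar(a, n):
    """First n AR(inf) coefficients b_1, ..., b_n implied by the MA(q)
    y_t = e_t + a_1 e_{t-1} + ... + a_q e_{t-q}, obtained by equating
    coefficients in (1 - sum b_k L^k)(1 + sum a_k L^k) = 1, i.e.
    b_tau = a_tau - sum_k a_k b_{tau-k} (with a_tau = 0 for tau > q)."""
    q = len(a)
    b = np.zeros(n + 1)                        # b[0] is unused padding
    for tau in range(1, n + 1):
        acc = a[tau - 1] if tau <= q else 0.0
        for k in range(1, min(tau - 1, q) + 1):
            acc -= a[k - 1] * b[tau - k]
        b[tau] = acc
    return b[1:]

# Invertible MA(1) with a_1 = 0.5: the b_k decline geometrically,
# so truncating at a modest p yields a usable AR(p) approximation.
b = ma_to_ar([0.5], 8)
```

As a sanity check, convolving (1 − Σ b_kL^k) with (1 + 0.5L) should recover 1 up to the truncation point.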
