Generalized Method of Moments
Generalized method of moments (GMM) was first introduced into the econometrics literature by Lars Hansen in 1982. Since then, GMM has had considerable impact on the theory and practice of econometrics. For theoreticians, the main advantage is that GMM provides a very general framework for considering issues of statistical inference because it encompasses many estimators of interest in econometrics. For applied researchers, it provides a computationally convenient method of estimating nonlinear dynamic models without complete knowledge of the probability distribution of the data. Applications have come from very diverse areas, spanning macroeconomics, finance, agricultural economics, environmental economics, and labour economics. Depending on the context, GMM has been applied to time series, cross-sectional, and panel data. In this chapter we provide a survey of the GMM estimation framework and its properties in correctly specified models.¹ Inevitably, GMM builds on earlier work, and its most obvious statistical antecedents are the method of moments (Pearson, 1893, 1894, 1895) and instrumental variables estimation (Wright, 1925; Reiersøl, 1941; Geary, 1942; Sargan, 1958).
To introduce the basic idea behind the GMM framework, it is useful to consider briefly the structure of method of moments (MM) estimation. Suppose that an economic and/or statistical model implies that a vector of observed variables, $v_t$, and a $p \times 1$ vector of unknown parameters, $\theta_0$, satisfy a $p \times 1$ vector of population moment conditions,
$$E[f(v_t, \theta_0)] = 0. \qquad (11.1)$$
The MM estimator of $\theta_0$ is found by solving the analogous sample moment condition. So if the MM estimator is denoted by $\hat\theta_T$, then it is defined by
$$g_T(\hat\theta_T) = T^{-1}\sum_{t=1}^{T} f(v_t, \hat\theta_T) = 0, \qquad (11.2)$$
where $T$ is the sample size. Notice that (11.2) represents a set of $p$ equations in $p$ unknowns and so, under certain conditions, has a unique solution. This approach has a natural appeal, and intuition suggests, correctly, that the solution to (11.2), $\hat\theta_T$, converges in probability to the solution to (11.1), $\theta_0$, subject to appropriate regularity conditions. Now suppose that $f(\cdot)$ is a $q \times 1$ vector and that $q > p$. In this case (11.2) represents a set of $q$ equations in $p < q$ unknowns. Such a system typically does not possess a solution, and so MM estimation is rendered infeasible. Generalized method of moments circumvents this problem by taking as the estimator of $\theta_0$ the value of $\theta$ which comes closest to satisfying (11.2). To make the approach operational, it is necessary to define a measure of how far $g_T(\theta)$ is from zero. In GMM, the measure of distance is
$$Q_T(\theta) = g_T(\theta)'\, W_T\, g_T(\theta), \qquad (11.3)$$
where $W_T$ is a $q \times q$ weighting matrix which must satisfy certain conditions that need not concern us for the moment. So the GMM estimator is defined to be
$$\hat\theta_T = \arg\min_{\theta \in \Theta} Q_T(\theta), \qquad (11.4)$$
where $\Theta$ denotes the parameter space.
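A minimal sketch of (11.3) and (11.4) in Python with NumPy may help fix ideas. It assumes a hypothetical overidentified model that does not come from this chapter: $v_t$ is i.i.d. Poisson with mean $\theta_0$, which implies the $q = 2$ moment conditions $E[v_t - \theta_0] = 0$ and $E[v_t^2 - \theta_0 - \theta_0^2] = 0$ for a single ($p = 1$) parameter. The identity weighting matrix and the grid search standing in for $\Theta$ are illustrative choices only; in practice a numerical optimizer would usually be used.

```python
import numpy as np


def sample_moments(theta, v):
    """g_T(theta): column means of f(v_t, theta), with q = 2 moments and p = 1 parameter.

    Hypothetical model: v_t ~ Poisson(theta_0), so E[v_t - theta_0] = 0 and
    E[v_t**2 - theta_0 - theta_0**2] = 0 (the variance equals the mean).
    """
    f = np.column_stack([v - theta, v**2 - theta - theta**2])  # T x q matrix
    return f.mean(axis=0)


def gmm_objective(theta, v, W):
    """Q_T(theta) = g_T(theta)' W_T g_T(theta), as in (11.3)."""
    g = sample_moments(theta, v)
    return float(g @ W @ g)


def gmm_estimate_grid(v, W, grid):
    """Approximate (11.4) by minimizing Q_T over a finite grid standing in for Theta."""
    values = np.array([gmm_objective(th, v, W) for th in grid])
    return float(grid[np.argmin(values)])


rng = np.random.default_rng(0)
v = rng.poisson(lam=3.0, size=5000).astype(float)  # simulated data with theta_0 = 3
W = np.eye(2)                                       # illustrative weighting matrix
grid = np.linspace(0.5, 6.0, 5501)                  # grid step of 0.001
theta_hat = gmm_estimate_grid(v, W, grid)
```

With these simulated data the minimizer should lie close to the true value 3, although the exact figure depends on the random draw.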
If it were always the case that $q = p$ in econometric applications, then there would be no need for a separate GMM theory because GMM would reduce to MM. However, $q$ is greater than $p$ in many situations of interest, and it is this possibility which gives the GMM framework its distinctive features. In this chapter we concentrate on issues pertaining to estimation. This means we will largely ignore the considerable literature on hypothesis testing based on GMM estimators. However, the interested reader can find discussion of various aspects of hypothesis testing elsewhere in this volume.²
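The reduction to MM when $q = p$ can be checked numerically in the linear case. The sketch below assumes hypothetical moment conditions $E[z_t(y_t - x_t'\theta_0)] = 0$, for which minimizing (11.3) with a symmetric $W_T$ has the closed form $\hat\theta_T = (X'ZW_TZ'X)^{-1}X'ZW_TZ'y$. The function name `linear_gmm` and the simulated data are illustrative assumptions. With $q = p$ the estimate is the same for any weighting matrix and equals the MM (instrumental variables) solution, whereas with $q > p$ it changes with $W_T$.

```python
import numpy as np


def linear_gmm(y, X, Z, W):
    """Minimizer of Q_T for linear moments E[z_t (y_t - x_t' theta)] = 0.

    The 1/T factors in g_T are dropped because they do not change the argmin.
    W is assumed symmetric, so the first-order condition is X'Z W Z'(y - X theta) = 0.
    """
    ZX = Z.T @ X
    Zy = Z.T @ y
    return np.linalg.solve(ZX.T @ W @ ZX, ZX.T @ W @ Zy)


rng = np.random.default_rng(1)
T = 500
Z = rng.normal(size=(T, 3))                                          # q = 3 candidate instruments
X = Z @ np.array([[1.0], [0.5], [0.2]]) + rng.normal(size=(T, 1))    # p = 1 regressor
y = 2.0 * X[:, 0] + rng.normal(size=T)

# Exactly identified (q = p = 1): the weighting matrix is irrelevant.
Z1 = Z[:, :1]
est_a = linear_gmm(y, X, Z1, np.eye(1))
est_b = linear_gmm(y, X, Z1, 7.0 * np.eye(1))

# Overidentified (q = 3 > p = 1): different W_T give different estimates.
est_c = linear_gmm(y, X, Z, np.eye(3))
est_d = linear_gmm(y, X, Z, np.diag([1.0, 10.0, 0.1]))
```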
Throughout this chapter, the analysis is conducted in terms of the general form of population moment condition given in (11.1). However, before we begin, it is useful to present two examples which help to illustrate both how population moment conditions arise and the forms they can take. The first example is taken from a study in the education literature based on cross-sectional data. The second example is a study from the empirical finance and macroeconomics literatures based on time series data.
1. Education example: Angrist and Krueger (1992) investigate the impact of age at school entry on educational attainment using the model,
$$y_{i,n} = \alpha + \beta a_{i,n} + u_{i,n}$$
where $y_{i,n}$ is the average number of years of education completed by students born in quarter $i$ of year $n$, and $a_{i,n}$ is the average age of school entry for members of that cohort. Within this model, the marginal response of attainment to age of entry is captured by $\beta$, and so this is the parameter of interest. Estimation of this parameter is complicated by a correlation between the explanatory variable and the error, which arises because many children who start school at a younger age do so because they show above-average learning potential. This correlation means that the ordinary least squares estimator is inconsistent. However, the error is anticipated to be uncorrelated with the quarter of birth. This logic leads to the population moment condition $E[z_{i,n}(y_{i,n} - \alpha - \beta a_{i,n})] = 0$, where $z_{i,n}' = [Q_{1i}, Q_{2i}, Q_{3i}, Q_{4i}]$ and $Q_{ji} = 1$ if $i = j$ and 0 otherwise. Here $q = 4$ and $p = 2$, so the parameters are overidentified.
2. Empirical finance example: Hansen and Singleton (1982) estimate a model which seeks to explain the relationship between asset prices and their returns via the decisions of a representative consumer.³ Within this framework, a representative consumer makes consumption and investment decisions to maximize his or her expected discounted lifetime utility. If it is assumed that the agent possesses a constant relative risk aversion utility function and invests at time $t$ in an asset which matures at time $t + 1$, then the asset return satisfies the equation
$$E\left[\delta \left(\frac{r_{t+1}}{p_t}\right)\left(\frac{c_{t+1}}{c_t}\right)^{\gamma-1} - 1 \,\Big|\, \Omega_t\right] = 0 \qquad (11.5)$$
where $r_{t+1}$ is the return on the asset in period $t + 1$, $p_t$ is the price of the asset in period $t$, $c_t$ is consumption in period $t$, $\Omega_t$ is the information set available to the agent in period $t$, $\gamma$ is the agent's coefficient of relative risk aversion and $\delta$ is his or her discount factor. To use this model for asset pricing, it is necessary to estimate $\gamma$ and $\delta$. Unfortunately, the joint distribution of consumption growth and asset returns is unknown, and this makes maximum likelihood infeasible. However, (11.5) and an iterated expectations argument imply the population moment condition,
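$$E\left[z_t\left(\delta \left(\frac{r_{t+1}}{p_t}\right)\left(\frac{c_{t+1}}{c_t}\right)^{\gamma-1} - 1\right)\right] = 0$$
where $z_t$ is a vector of variables contained in $\Omega_t$.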
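The two population moment conditions above can be coded as functions returning the $T \times q$ matrix of $f(v_t, \theta)$ values, whose column means give $g_T(\theta)$. This is a sketch under assumptions: the function names, the coding of quarter of birth as the integers 1 to 4, and the argument layout are hypothetical, and each row of the education data is taken to be one cohort $(i, n)$. With four quarter dummies and $\theta = (\alpha, \beta)$ the education example has $q = 4 > p = 2$; in the finance example $q$ is the number of elements of $z_t$. The check at the end uses artificial data constructed so that both sets of moments are exactly zero at the assumed parameter values.

```python
import numpy as np


def education_moments(theta, y, a, quarter):
    """T x 4 matrix of z (y - alpha - beta * a) for the education example.

    theta = (alpha, beta); quarter is assumed coded 1..4, and z holds the four
    quarter-of-birth dummies Q_1i, ..., Q_4i.
    """
    alpha, beta = theta
    z = (quarter[:, None] == np.arange(1, 5)[None, :]).astype(float)
    resid = y - alpha - beta * a
    return z * resid[:, None]


def euler_moments(theta, r_next, p, c, c_next, z):
    """T x k matrix of z_t (delta (r_{t+1}/p_t)(c_{t+1}/c_t)**(gamma - 1) - 1)."""
    delta, gamma = theta
    u = delta * (r_next / p) * (c_next / c) ** (gamma - 1.0) - 1.0
    return z * u[:, None]


# Hypothetical check: data built so that both moment conditions hold exactly.
rng = np.random.default_rng(2)
T = 40
quarter = np.tile(np.arange(1, 5), T // 4)
a = rng.uniform(5.5, 6.5, size=T)
y = 1.0 + 0.5 * a                                   # no error term, so residuals are zero
F_edu = education_moments((1.0, 0.5), y, a, quarter)
g_edu = F_edu.mean(axis=0)                          # g_T(theta) with q = 4

c = rng.uniform(1.0, 2.0, size=T)
c_next = c * rng.uniform(0.98, 1.05, size=T)
p = np.ones(T)
delta, gamma = 0.95, 0.5
r_next = p / (delta * (c_next / c) ** (gamma - 1.0))  # returns satisfying the Euler equation
z = np.column_stack([np.ones(T), c])                # instruments dated t, hence in Omega_t
F_fin = euler_moments((delta, gamma), r_next, p, c, c_next, z)
g_fin = F_fin.mean(axis=0)                          # g_T(theta) with q = 2
```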
An overview of the chapter is as follows. To begin, we return to the basic definition of the GMM estimation principle and consider formally certain issues which were swept aside in the heuristic discussion above. Most notably, the discussion was predicated on the assumption that the population moment condition provides sufficient information to uniquely determine $\theta_0$. This need not be the case, and Section 2 introduces the concepts of global and local identification of the parameter vector. One important ramification of $q > p$ is that estimation effects a decomposition of the population moment condition into so-called identifying and overidentifying restrictions. Section 3 describes this decomposition and shows how these components are linked to the parameter estimator and the estimated sample moment, $g_T(\hat\theta_T)$. Section 4 considers the asymptotic properties of the estimator and the estimated sample moment. For the estimator, this discussion focuses on consistency and the asymptotic distribution; the latter can be used to construct large sample confidence intervals for the elements of $\theta_0$. In practice, these intervals depend on the long-run variance of the sample moment, and so we briefly consider how this variance can be estimated. For the estimated sample moment, the discussion concentrates on its asymptotic distribution. Up to this point the analysis only restricts the weighting matrix to be a member of a certain class. However, it will emerge that the choice of $W_T$ affects the estimator through its asymptotic variance. Section 5 characterizes the optimal choice of $W_T$ and discusses certain issues involved in the calculation of the associated "optimal" GMM estimator. Although we restrict attention to correctly specified models, a researcher can never be sure in practice that this is the case. Section 6 describes how the estimated sample moment can be used to construct the "overidentifying restrictions test" for the adequacy of the model specification.
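The long-run variance estimation, optimal weighting and overidentifying restrictions test described above can be previewed with a sketch for linear moment conditions $E[z_t(y_t - x_t'\theta_0)] = 0$. It assumes a standard two-step scheme: a first-step estimate with $W_T = (Z'Z/T)^{-1}$, a Newey-West (Bartlett kernel) estimate $\hat S$ of the long-run variance of $\sqrt{T}\,g_T(\theta_0)$, and a second step with $W_T = \hat S^{-1}$. The statistic $J = T\,g_T(\hat\theta_T)'\hat S^{-1}g_T(\hat\theta_T)$ is Hansen's overidentifying restrictions statistic, asymptotically $\chi^2_{q-p}$ in correctly specified models. The function names, the lag truncation and the simulated data are hypothetical, and the chapter's own treatment may differ in details such as whether the moment contributions are demeaned.

```python
import numpy as np


def linear_gmm(y, X, Z, W):
    """Closed-form minimizer of (11.3) for linear moments, assuming W is symmetric."""
    ZX, Zy = Z.T @ X, Z.T @ y
    return np.linalg.solve(ZX.T @ W @ ZX, ZX.T @ W @ Zy)


def newey_west(F, lags):
    """Bartlett-kernel (Newey-West) long-run variance estimate from a T x q matrix F."""
    T = F.shape[0]
    Fc = F - F.mean(axis=0)                 # demeaned contributions (a common, optional choice)
    S = Fc.T @ Fc / T                       # lag-0 term
    for j in range(1, lags + 1):
        weight = 1.0 - j / (lags + 1.0)     # Bartlett kernel weight
        Gamma_j = Fc[j:].T @ Fc[:-j] / T    # sum over t of f_t f_{t-j}' / T
        S = S + weight * (Gamma_j + Gamma_j.T)
    return S


def two_step_gmm(y, X, Z, lags=0):
    """Two-step GMM with W_T = S_hat^{-1}; returns (theta_hat, J, degrees of freedom)."""
    T, q = Z.shape
    p = X.shape[1]
    # Step 1: consistent but generally inefficient estimate.
    theta_1 = linear_gmm(y, X, Z, np.linalg.inv(Z.T @ Z / T))
    F = Z * (y - X @ theta_1)[:, None]      # f(v_t, theta_1), T x q
    W_opt = np.linalg.inv(newey_west(F, lags))
    # Step 2: re-estimate with the estimated optimal weighting matrix.
    theta_2 = linear_gmm(y, X, Z, W_opt)
    g = Z.T @ (y - X @ theta_2) / T         # g_T(theta_2)
    J = float(T * g @ W_opt @ g)            # overidentifying restrictions statistic
    return theta_2, J, q - p


# Simulated, hypothetical data: x is endogenous; Z holds a constant and 3 instruments.
rng = np.random.default_rng(3)
T = 2000
Z = np.column_stack([np.ones(T), rng.normal(size=(T, 3))])
v = rng.normal(size=T)
x = Z[:, 1] + 0.5 * Z[:, 2] + 0.3 * Z[:, 3] + v
u = 0.5 * v + rng.normal(size=T)            # correlated with x through v
X = np.column_stack([np.ones(T), x])
y = X @ np.array([1.0, 2.0]) + u
theta_hat, J, df = two_step_gmm(y, X, Z, lags=2)
```

Setting `lags=0` gives a heteroskedasticity-robust estimate suitable for serially uncorrelated moment contributions; with time series data a positive lag truncation is usually needed, and choosing it is a practical issue in its own right.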
Throughout the first six sections, the population moment condition is taken as given. The next two sections explore issues related to the choice of $f(\cdot)$. Section 7 shows how various other econometric estimators can be regarded as special cases of GMM. Section 8 considers two extremes: the optimal choice of $f(\cdot)$, and what happens if the population moment condition provides no, or virtually no, information about $\theta_0$. Finally, Section 9 provides a brief review of the available evidence on the finite sample behavior of GMM.
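As a small illustration of how familiar estimators arise as special cases of GMM, the linear GMM closed form reproduces two of them in the sketch below, which uses simulated, hypothetical data. Ordinary least squares is GMM with $z_t = x_t$ (so $q = p$ and $W_T$ is irrelevant), and two-stage least squares is GMM with instruments $z_t$ and $W_T = (Z'Z/T)^{-1}$. The helper `linear_gmm` is an illustrative name, not a library function.

```python
import numpy as np


def linear_gmm(y, X, Z, W):
    """Closed-form minimizer of (11.3) for linear moments, assuming W is symmetric."""
    ZX, Zy = Z.T @ X, Z.T @ y
    return np.linalg.solve(ZX.T @ W @ ZX, ZX.T @ W @ Zy)


rng = np.random.default_rng(4)
T = 300
Z = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
X = np.column_stack([np.ones(T), Z[:, 1] - Z[:, 2] + rng.normal(size=T)])
y = X @ np.array([0.5, 1.5]) + rng.normal(size=T)

# OLS: moment condition E[x_t (y_t - x_t' theta)] = 0, i.e. instruments Z = X (q = p).
theta_gmm_ols = linear_gmm(y, X, X, np.eye(2))
theta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# 2SLS: instruments Z with weighting matrix W_T = (Z'Z / T)^{-1}.
theta_gmm_2sls = linear_gmm(y, X, Z, np.linalg.inv(Z.T @ Z / T))
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]     # first-stage fitted values
theta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]
```

The 2SLS equivalence holds because regressing $y$ on the fitted values $P_Z X$ solves $X'P_Z X\theta = X'P_Z y$, which is exactly the GMM first-order condition when $W_T \propto (Z'Z)^{-1}$.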
Due to space constraints, we present only heuristic arguments for the main results and provide references to appropriate sources for more formal analyses. A rigorous treatment of the material in the chapter – and many other aspects of the GMM framework – can also be found in Hall (2000b).