Basic Elements of Asymptotic Theory
Consider the estimation problem where we would like to estimate a parameter vector $\theta$ from a sample $Y_1, \ldots, Y_n$. Let $\hat{\theta}_n$ be an estimator for $\theta$, i.e., let $\hat{\theta}_n = h(Y_1, \ldots, Y_n)$ be a function of the sample. In the important special case where $\hat{\theta}_n$ is a linear function of $Y_1, \ldots, Y_n$, i.e., $\hat{\theta}_n = Ay$, where $A$ is a nonrandom matrix and $y = (Y_1, \ldots, Y_n)'$, we can easily express the expected value and the variance-covariance matrix of $\hat{\theta}_n$ in terms of the first and second moments of $y$ (provided those moments exist). Also, if the sample is normally distributed, so is $\hat{\theta}_n$. Well-known examples of linear estimators are the OLS and GLS estimators of the linear regression model.
Frequently, however, the estimator of interest will be a nonlinear function of the sample. In principle, the distribution of $\hat{\theta}_n$ can then be found from the distribution of the sample, if the model relating the parameter $\theta$ to the observables $Y_1, \ldots, Y_n$ fully specifies the distribution of the sample. For example, in a linear regression model with independently and identically distributed errors this would require assuming a specific distribution for the errors. However, even if the researcher is willing to make such a specific assumption, it will still often be impossible, for all practical purposes, to obtain an exact expression for the distribution of $\hat{\theta}_n$ because of the complexity of the necessary calculations. (Even if $\hat{\theta}_n$ is linear, but the distribution of $y$ is nonnormal, it will typically be difficult to obtain the exact distribution of $\hat{\theta}_n$.) Similarly, obtaining expressions for, say, the first and second moments of $\hat{\theta}_n$ will, for practical purposes, typically be infeasible for nonlinear estimators; and even if it is feasible, the resulting expressions will usually depend on the entire distribution of the sample, and not only on the first and second moments as in the case of a linear estimator.
A further complication arises if the model relating $\theta$ to the observables $Y_1, \ldots, Y_n$ does not fully specify the distribution of $Y_1, \ldots, Y_n$. For example, in a linear regression model the errors may only be assumed to be identically and independently distributed with zero mean and finite variance, without any further restrictions on the distribution function of the disturbances. In this case the exact distribution of $\hat{\theta}_n$ cannot be determined (even if $\hat{\theta}_n$ is linear), since this distribution will depend on the unknown distribution of the errors.
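The linear case described above can be made concrete with a small numerical sketch. The example below is illustrative only: the design matrix `X`, the parameter `beta`, and the error variance `sigma2` are assumed values that do not come from the text. It writes the OLS estimator as $Ay$ with $A = (X'X)^{-1}X'$ and obtains its mean and variance-covariance matrix from the first two moments of $y$ alone.

```python
import numpy as np

# Hypothetical design (not from the text): n = 50 observations,
# an intercept and one trend regressor.
n = 50
X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])
beta = np.array([1.0, -2.0])   # assumed true parameter vector
sigma2 = 0.5                   # assumed error variance

# OLS is linear in y: theta_hat = A y with the nonrandom matrix
# A = (X'X)^{-1} X'.
A = np.linalg.solve(X.T @ X, X.T)

# First and second moments of y under the assumed model y = X beta + u,
# with i.i.d. zero-mean errors u of variance sigma2.
mean_y = X @ beta
cov_y = sigma2 * np.eye(n)

# The moments of the linear estimator follow from those of y alone.
mean_theta_hat = A @ mean_y        # E[theta_hat] = A E[y]
cov_theta_hat = A @ cov_y @ A.T    # Var[theta_hat] = A Var[y] A'

print(mean_theta_hat)
print(cov_theta_hat)
```

Under these assumptions the computed mean coincides with `beta` and the covariance with `sigma2 * inv(X'X)`, without any reference to the shape of the error distribution beyond its first two moments.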
In view of the difficulties discussed above in obtaining exact expressions for characteristics of estimators, such as their moments or distribution functions, we will often have to resort to approximations for these exact expressions. Ideally, these approximations should be easier to obtain than the exact expressions and they should be of a simpler form. Asymptotic theory is one way of obtaining such approximations by essentially asking what happens to the exact expressions as the sample size tends to infinity. For example, if we are interested in the expected value of $\hat{\theta}_n$ and an exact expression for it is unavailable or unwieldy, we could ask if the expected value of $\hat{\theta}_n$ converges to $\theta$ as the sample size increases (i.e., if $\hat{\theta}_n$ is "asymptotically unbiased"). One could try to verify this by first showing that the estimator $\hat{\theta}_n$ itself "converges" to $\theta$ in an appropriate sense, and then by attempting to obtain the convergence of the expected value of $\hat{\theta}_n$ to $\theta$ from the "convergence" of the estimator. In order to properly pose and answer such questions we need to study various notions of convergence of random vectors.
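The idea of asymptotic unbiasedness can be illustrated by simulation. The sketch below is not taken from the text: it assumes an exponential sample with rate `rate` (playing the role of $\theta$) and the nonlinear estimator $1/\bar{Y}_n$, and the helper `simulated_bias` is a hypothetical name. For this particular model the exact bias happens to be known, $\theta/(n-1)$ for $n > 1$, so the simulated values can be compared against it.

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 2.0    # assumed true parameter theta (rate of the exponential)
reps = 5000   # number of Monte Carlo replications

def simulated_bias(n):
    """Monte Carlo estimate of E[1 / sample mean] - rate for sample size n."""
    # Each row is one sample Y_1, ..., Y_n from the assumed model.
    samples = rng.exponential(scale=1.0 / rate, size=(reps, n))
    # Nonlinear estimator: reciprocal of the sample mean.
    estimates = 1.0 / samples.mean(axis=1)
    return estimates.mean() - rate

bias_small = simulated_bias(5)     # exact bias is rate / 4 = 0.5
bias_large = simulated_bias(200)   # exact bias is rate / 199

print(bias_small, bias_large)
```

The estimator is biased at every finite sample size, but the bias shrinks as $n$ grows, which is the behavior the notion of asymptotic unbiasedness is meant to capture.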
The article is organized as follows: in Section 2 we define various modes of convergence of random vectors, and discuss the properties of and the relationships between these modes of convergence. Sections 3 and 4 provide results that allow us to deduce the convergence of certain important classes of random vectors from basic assumptions. In particular, in Section 3 we discuss laws of large numbers, including uniform laws of large numbers. A discussion of central limit theorems is given in Section 4. In Section 5 we suggest additional literature for further reading.
We emphasize that the article only covers material that lays the foundation for asymptotic theory. It does not provide results on the asymptotic properties of estimators for particular models; for references see Section 5. All of the material presented here is essentially textbook material. We provide proofs for selected results, both for practice and because some of the proofs offer interesting insights. For most results given without proof we provide references to widely available textbooks. Proofs for some of the central limit theorems presented in Section 4 are given in a longer mimeographed version of this article, which is available from the authors upon request.
We adopt the following notation and conventions: throughout this article $Z_1, Z_2, \ldots$, and $Z$ denote random vectors that take their values in a Euclidean space $\mathbb{R}^k$, $k \geq 1$. Furthermore, all random vectors involved in a particular statement are assumed to be defined on a common probability space $(\Omega, \mathcal{F}, P)$, except when noted otherwise. With $|\cdot|$ we denote the absolute value and with $\|\cdot\|$ the Euclidean norm. All matrices considered are real matrices. If $A$ is a matrix, then $A'$ denotes its transpose; if $A$ is a square matrix, then $A^{-1}$ denotes the inverse of $A$. The norm of a matrix $A$ is denoted by $\|A\|$ and is taken to be $\|\mathrm{vec}(A)\|$, where $\mathrm{vec}(A)$ stands for the columnwise vectorization of $A$. If $C_n$ is a sequence of sets, then $C_n \uparrow C$ stands for $C_n \subseteq C_{n+1}$ for all $n \in \mathbb{N}$ and $C = \bigcup_{n=1}^{\infty} C_n$. Similarly, $C_n \downarrow C$ stands for $C_n \supseteq C_{n+1}$ for all $n \in \mathbb{N}$ and $C = \bigcap_{n=1}^{\infty} C_n$. Furthermore, if $B$ is a set, then $1(B)$ denotes the indicator function of $B$.
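The matrix norm convention above can be checked numerically. The following is a minimal sketch with an arbitrary example matrix; the helper `indicator` and the interval it uses as the set $B$ are hypothetical and serve only as illustration. Taking the Euclidean norm of the columnwise vectorization gives the same value as what NumPy calls the Frobenius norm.

```python
import numpy as np

# Arbitrary example matrix (an assumption for illustration).
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Columnwise vectorization: stack the columns of A on top of each other.
# In NumPy this corresponds to column-major ("F") order.
vec_A = A.flatten(order="F")

# Matrix norm as defined in the text: Euclidean norm of vec(A).
norm_A = np.linalg.norm(vec_A)

def indicator(x, lower, upper):
    """Indicator function 1(B) of the hypothetical set B = [lower, upper]."""
    return 1.0 if lower <= x <= upper else 0.0

print(vec_A, norm_A)
print(indicator(0.5, 0.0, 1.0), indicator(2.0, 0.0, 1.0))
```

Because the norm depends only on the entries of the matrix and not on their arrangement, row-wise vectorization would give the same value; the columnwise convention matters only when $\mathrm{vec}(A)$ itself is used.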