Introduction to the Mathematical and Statistical Foundations of Econometrics

Applications to Statistical Inference under Normality

5.6.1. Estimation

Statistical inference is concerned with parameter estimation and parameter inference. The former is discussed first, in this subsection.

In a broad sense, an estimator of a parameter is a function of the data that serves as an approximation of the parameter involved. For example, if $X_1, X_2, \ldots, X_n$ is a random sample from the $N(\mu, \sigma^2)$-distribution, then the sample mean $\bar{X} = (1/n)\sum_{j=1}^{n} X_j$ may serve as an estimator of the unknown parameter $\mu$ (the population mean). More formally, given a data set $\{X_1, X_2, \ldots, X_n\}$ for which the joint distribution function depends on an unknown parameter (vector) $\theta$, an estimator of $\theta$ is a Borel-measurable function $\hat{\theta} = g_n(X_1, \ldots, X_n)$ of the data that serves as an approximation of $\theta$ ...
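As an illustration of the definition above, the following sketch (with hypothetical values for $\mu$, $\sigma$, and $n$, chosen here purely for demonstration) draws a random sample from the $N(\mu, \sigma^2)$-distribution and computes the sample mean as an estimator of $\mu$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters (assumptions for illustration).
mu, sigma, n = 2.0, 1.5, 10_000

# Random sample X_1, ..., X_n from the N(mu, sigma^2)-distribution.
x = rng.normal(mu, sigma, size=n)

# The sample mean is a Borel-measurable function of the data that
# serves as an approximation of the unknown population mean mu.
x_bar = x.mean()
print(x_bar)  # close to mu for large n
```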


Asymptotic Efficiency of the ML Estimator

The ML estimation approach is a special case of the M-estimation approach discussed in Chapter 6. However, the position of the ML estimator among the M-estimators is a special one: under some regularity conditions, the ML estimator is asymptotically efficient.

To explain and prove asymptotic efficiency, let


$$\hat{\theta} = \operatorname*{argmax}_{\theta \in \Theta} \, (1/n) \sum_{j=1}^{n} g(Z_j, \theta) \tag{8.43}$$

be an M-estimator of

$$\theta_0 = \operatorname*{argmax}_{\theta \in \Theta} \, E[g(Z_1, \theta)], \tag{8.44}$$

where again $Z_1, \ldots, Z_n$ is a random sample from a $k$-variate, absolutely continuous distribution with density $f(z|\theta_0)$, and $\Theta \subset \mathbb{R}^m$ is the parameter space. In Chapter 6, I have set forth conditions such that

$$\sqrt{n}\,(\hat{\theta} - \theta_0) \to_d N_m\!\left[0, A^{-1} B A^{-1}\right], \tag{8.45}$$

where

$$A = E\!\left[\frac{\partial^2 g(Z_1, \theta_0)}{\partial \theta_0 \, \partial \theta_0^{\mathrm{T}}}\right] \tag{8.46}$$

and

$$B = E\!\left[\left(\partial g(Z_1, \theta_0)/\partial \theta_0^{\mathrm{T}}\right)\left(\partial g(Z_1, \theta_0)/\partial \theta_0\right)\right] = \int \left(\partial g(z, \theta_0)/\partial \theta_0^{\mathrm{T}}\right)\left(\partial g(z, \theta_0)/\partial \theta_0\right) f(z|\theta_0)\,dz. \tag{8.47}$$
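A sketch of the sandwich variance $A^{-1} B A^{-1}$ can be given for the simplest special case, the one-parameter normal location model with known variance (an assumption chosen here for illustration, not an example from the text). With $g(z, \theta) = -(z - \theta)^2/(2\sigma^2)$ (the log-density up to a constant), $A = -1/\sigma^2$ and $B = 1/\sigma^2$, so the sandwich reduces to $\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model values (assumptions for illustration):
# normal location model with known sigma.
theta0, sigma, n = 1.0, 2.0, 50_000
z = rng.normal(theta0, sigma, size=n)

theta_hat = z.mean()                 # the M-estimator of theta0

# Empirical analogues of A and B, evaluated at theta_hat.
score = (z - theta_hat) / sigma**2   # dg/dtheta for each observation
A_hat = -1.0 / sigma**2              # d^2 g / dtheta^2 is constant here
B_hat = np.mean(score**2)

sandwich = B_hat / A_hat**2          # A^{-1} B A^{-1} in one dimension
print(sandwich)                      # ~ sigma^2
```

In this model the M-estimator is also the ML estimator, so the sandwich collapses to the usual inverse-information variance, consistent with the asymptotic-efficiency claim above.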



Conditional Probability Measures and Conditional Independence

The notion of a probability measure relative to a sub-$\sigma$-algebra can be defined as in Definition 3.1 using the conditional expectation of an indicator function:

Definition 3.2: Let $\{\Omega, \mathcal{F}, P\}$ be a probability space, and let $\mathcal{F}_0 \subset \mathcal{F}$ be a $\sigma$-algebra. Then for any set $A$ in $\mathcal{F}$, $P(A|\mathcal{F}_0) = E[I_A|\mathcal{F}_0]$, where $I_A(\omega) = I(\omega \in A)$.

In the sequel I will use the shorthand notation $P(Y \in B|X)$ to indicate the conditional probability $P(\{\omega \in \Omega : Y(\omega) \in B\}|\mathcal{F}_X)$, where $B$ is a Borel set and $\mathcal{F}_X$ is the $\sigma$-algebra generated by $X$, and $P(Y \in B|\mathcal{F}_0)$ to indicate $P(\{\omega \in \Omega : Y(\omega) \in B\}|\mathcal{F}_0)$ for any sub-$\sigma$-algebra $\mathcal{F}_0$ of $\mathcal{F}$. The event $Y \in B$ involved may be replaced by any equivalent expression.
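Definition 3.2 identifies a conditional probability with the conditional expectation of an indicator, which can be illustrated by Monte Carlo under a hypothetical joint model (chosen here purely for illustration): $X$ standard normal and $Y = X + \varepsilon$ with $\varepsilon$ standard normal and independent of $X$, so that $P(Y > 0 | X = x) = \Phi(x)$, the standard normal cdf:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
n = 500_000

# Hypothetical joint model (an assumption for illustration):
# X ~ N(0,1) and Y = X + eps, eps ~ N(0,1) independent of X,
# so that P(Y > 0 | X = x) = Phi(x).
x = rng.standard_normal(n)
y = x + rng.standard_normal(n)

# P(Y in B | X) as the conditional expectation of an indicator:
# average I(Y > 0) over the event that X falls near x0.
x0 = 0.5
mask = np.abs(x - x0) < 0.05
cond_prob = np.mean(y[mask] > 0.0)

phi = 0.5 * (1.0 + erf(x0 / sqrt(2.0)))  # Phi(x0) in closed form
print(cond_prob, phi)                    # the two should be close
```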

Recalling the notion of independence of sets and random variables, vectors, or both (see Chapter 1), we can now define conditional...


Convergence in Distribution

Let $X_n$ be a sequence of random variables (or vectors) with distribution functions $F_n(x)$, and let $X$ be a random variable (or conformable random vector) with distribution function $F(x)$.

Definition 6.6: We say that $X_n$ converges to $X$ in distribution (denoted by $X_n \to_d X$) if $\lim_{n\to\infty} F_n(x) = F(x)$ pointwise in $x$, possibly except at the discontinuity points of $F(x)$.

Alternative notation: If $X$ has a particular distribution, for example $N(0, 1)$, then $X_n \to_d X$ is also denoted by $X_n \to_d N(0, 1)$.

The reason for excluding the discontinuity points of $F(x)$ in the definition of convergence in distribution is that $\lim_{n\to\infty} F_n(x)$ may not be right-continuous at these discontinuity points. For example, let $X_n = X + 1/n$. Then $F_n(x) = F(x - 1/n)$...
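The example $X_n = X + 1/n$ can be made concrete by taking $X$ equal to $0$ with probability $1$ (a choice made here for illustration), so that $F$ has a jump at $x = 0$; the sketch below evaluates $F_n$ and $F$ at continuity points and at the jump:

```python
# X = 0 with probability 1, so F(x) = 1 for x >= 0 and 0 otherwise,
# with a discontinuity at x = 0.
def F(x):
    return 1.0 if x >= 0.0 else 0.0

def F_n(x, n):
    # X_n = X + 1/n = 1/n deterministically, so F_n(x) = F(x - 1/n).
    return F(x - 1.0 / n)

# At every continuity point x != 0, F_n(x) -> F(x) ...
print(F_n(0.3, 10**6), F(0.3))    # 1.0 1.0
print(F_n(-0.3, 10**6), F(-0.3))  # 0.0 0.0
# ... but at the discontinuity point x = 0, F_n(0) = 0 for every n
# while F(0) = 1; Definition 6.6 excludes this point, so X_n ->_d X.
print(F_n(0.0, 10**6), F(0.0))    # 0.0 1.0
```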


Projections, Projection Matrices, and Idempotent Matrices

Consider the following problem: Which point on the line through the origin and point $a$ in Figure I.3 is the closest to point $b$? The answer is point $p$ in Figure I.4. The line through $b$ and $p$ is perpendicular to the subspace spanned by $a$, and therefore the distance between $b$ and any other point in this subspace is larger than the distance between $b$ and $p$. Point $p$ is called the projection of $b$ on the subspace spanned by $a$. To find $p$, let $p = c \cdot a$, where $c$ is a scalar. The distance between $b$ and $p$ is now $\|b - c \cdot a\|$; consequently, the problem is to find the scalar $c$ that minimizes this distance. Because $\|b - c \cdot a\|$ is minimal if and only if

$$\|b - c \cdot a\|^2 = (b - c \cdot a)^{\mathrm{T}}(b - c \cdot a) = b^{\mathrm{T}}b - 2c \cdot a^{\mathrm{T}}b + c^2 a^{\mathrm{T}}a$$

is minimal, the answer is $c = a^{\mathrm{T}}b/a^{\mathrm{T}}a$; hence, $p = (a^{\mathrm{T}}b/a^{\mathrm{T}}a) \cdot a$.
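The formula $p = (a^{\mathrm{T}}b/a^{\mathrm{T}}a) \cdot a$ is easy to verify numerically; the sketch below (with arbitrary example vectors, not taken from the figures) computes the projection and checks that the residual is orthogonal to $a$:

```python
import numpy as np

# Arbitrary example vectors (hypothetical, for illustration only).
a = np.array([2.0, 1.0])
b = np.array([1.0, 3.0])

c = (a @ b) / (a @ a)  # the minimizing scalar c = a'b / a'a
p = c * a              # projection of b on the span of a

# The residual b - p is perpendicular to a, confirming that p is
# the closest point to b in the subspace spanned by a.
print(p)
print((b - p) @ a)  # 0 up to rounding error
```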

Similarly, we...


Distributions Related to the Standard Normal Distribution

The standard normal distribution generates, via various transformations, a few other distributions such as the chi-square, t, Cauchy, and F distributions. These distributions are fundamental in testing statistical hypotheses, as we will see in Chapters 5, 6, and 8.

4.6.1. The Chi-Square Distribution

Let $X_1, \ldots, X_n$ be independent $N(0, 1)$-distributed random variables, and let

$$Y_n = \sum_{j=1}^{n} X_j^2. \tag{4.30}$$


The distribution of $Y_n$ is called the chi-square distribution with $n$ degrees of freedom and is denoted by $\chi_n^2$ or $\chi^2(n)$. Its distribution and density functions can be derived recursively, starting from the case $n = 1$:

$$G_1(y) = P[Y_1 \le y] = P\left[X_1^2 \le y\right] = P\left[-\sqrt{y} \le X_1 \le \sqrt{y}\right] = \int_{-\sqrt{y}}^{\sqrt{y}} f(x)\,dx = 2\int_{0}^{\sqrt{y}} f(x)\,dx \quad \text{for } y > 0,$$

$$G_1(y) = 0 \quad \text{for } y \le 0,$$

where $f(x)$ is defined by (4.28); hence,

$$g_1(y) = G_1'(y) = f\!\left(\sqrt{y}\right)/\ldots$$
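Definition (4.30) can be checked by simulation: the $\chi^2(n)$ distribution has mean $n$ and variance $2n$, and a Monte Carlo sketch (with an arbitrary choice of $n$ and of the replication count, made here for illustration) reproduces both:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 5, 200_000  # degrees of freedom and Monte Carlo replications

# Y_n = sum of squares of n independent N(0,1) variables, as in (4.30).
x = rng.standard_normal((reps, n))
y = (x ** 2).sum(axis=1)

# The chi-square(n) distribution has mean n and variance 2n.
print(y.mean())  # close to n
print(y.var())   # close to 2n
```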
