# Introduction to the Mathematical and Statistical Foundations of Econometrics

## Independence of Linear and Quadratic Transformations of Multivariate Normal Random Variables

Let $X$ be distributed $N_n(0, I_n)$ — that is, $X$ is $n$-variate, standard, normally distributed. Consider the linear transformations $Y = BX$, where $B$ is a $k \times n$ matrix of constants, and $Z = CX$, where $C$ is an $m \times n$ matrix of constants. It follows from Theorem 5.4 that

$$\begin{pmatrix} Y \\ Z \end{pmatrix} \sim N_{k+m}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} BB^{\mathrm T} & BC^{\mathrm T} \\ CB^{\mathrm T} & CC^{\mathrm T} \end{pmatrix}\right).$$

Then $Y$ and $Z$ are uncorrelated, and therefore independent, if and only if $CB^{\mathrm T} = O$. More generally we have

Theorem 5.6: Let $X$ be distributed $N_n(0, I_n)$, and consider the linear transformations $Y = b + BX$, where $b$ is a $k \times 1$ vector and $B$ a $k \times n$ matrix of constants, and $Z = c + CX$, where $c$ is an $m \times 1$ vector and $C$ an $m \times n$ matrix of constants. Then $Y$ and $Z$ are independent if and only if $BC^{\mathrm T} = O$.
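As an illustration (my own sketch, not from the text), the independence condition of Theorem 5.6 can be checked by simulation: the matrices below are hypothetical choices with $BC^{\mathrm T} = O$ ($B$ picks the first two coordinates of $X$, $C$ the third), so the sample cross-covariance between $Y$ and $Z$ should be near zero.

```python
import random

random.seed(0)

# Hypothetical example: n = 3, B = [[1,0,0],[0,1,0]] (k = 2), C = [[0,0,1]] (m = 1),
# so that B @ C.T = O.  Y = BX picks (X1, X2); Z = CX picks X3.
N = 200_000
sums = [0.0, 0.0]                      # running sums of Y1*Z and Y2*Z
for _ in range(N):
    x1, x2, x3 = (random.gauss(0.0, 1.0) for _ in range(3))
    y = (x1, x2)                       # Y = BX
    z = x3                             # Z = CX
    sums[0] += y[0] * z
    sums[1] += y[1] * z

cross_cov = [s / N for s in sums]      # sample analogue of E[Y Z^T] = B C^T = O
```

Both entries of `cross_cov` should be within sampling error of zero, consistent with $BC^{\mathrm T} = O$.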

This result can be used to set forth conditions for independence of linear and quadratic transformations of standard normal random vectors:

## First- and Second-Order Conditions

The following conditions guarantee that the first- and second-order conditions for a maximum hold.

Assumption 8.1: The parameter space $\Theta$ is convex and $\theta_0$ is an interior point of $\Theta$. The likelihood function $\hat L_n(\theta)$ is, with probability 1, twice continuously differentiable in an open neighborhood $\Theta_0$ of $\theta_0$, and, for $i_1, i_2 = 1, 2, 3, \ldots, m$,

$$\mathrm{E}\left[\sup_{\theta\in\Theta_0}\left|\frac{\partial^2 \hat L_n(\theta)}{\partial\theta_{i_1}\,\partial\theta_{i_2}}\right|\right] < \infty$$

and

$$\mathrm{E}\left[\sup_{\theta\in\Theta_0}\left|\frac{\partial^2 \ln(\hat L_n(\theta))}{\partial\theta_{i_1}\,\partial\theta_{i_2}}\right|\right] < \infty.$$

Theorem 8.2: Under Assumption 8.1,

$$\mathrm{E}\left[\left.\frac{\partial \ln(\hat L_n(\theta))}{\partial \theta^{\mathrm T}}\right|_{\theta=\theta_0}\right] = 0 \quad\text{and}\quad \mathrm{E}\left[\left.\frac{\partial^2 \ln(\hat L_n(\theta))}{\partial \theta\,\partial \theta^{\mathrm T}}\right|_{\theta=\theta_0}\right] = -\mathrm{Var}\left[\left.\frac{\partial \ln(\hat L_n(\theta))}{\partial \theta^{\mathrm T}}\right|_{\theta=\theta_0}\right].$$
Proof: For notational convenience I will prove this theorem for the univariate parameter case $m = 1$ only. Moreover, I will focus on the case that $Z = (Z_1^{\mathrm T}, \ldots, Z_n^{\mathrm T})^{\mathrm T}$ is a random sample from an absolutely continuous distribution with density $f(z|\theta_0)$.

Observe that

$$\mathrm{E}[\ln(\hat L_n(\theta))/n] = \frac{1}{n}\sum_{j=1}^{n} \mathrm{E}\left[\ln(f(Z_j|\theta))\right] = \int \ln(f(z|\theta))\, f(z|\theta_0)\,dz. \tag{8.23}$$

It fol...
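The equality in Theorem 8.2 can be verified exactly in a toy case. The sketch below (my own illustration, not from the text) takes a single Bernoulli($\theta_0$) observation, for which $\ln f(z|\theta) = z\ln\theta + (1-z)\ln(1-\theta)$, and computes the mean of the score, the expected second derivative, and the score variance by direct enumeration over $z \in \{0, 1\}$; the value of $\theta_0$ is an arbitrary choice.

```python
# Exact check of Theorem 8.2 for one Bernoulli(theta0) observation.
theta0 = 0.3   # hypothetical true parameter value

def score(z, th):
    # first derivative of ln f(z|theta) with respect to theta
    return z / th - (1 - z) / (1 - th)

def hessian(z, th):
    # second derivative of ln f(z|theta) with respect to theta
    return -z / th**2 - (1 - z) / (1 - th)**2

def prob(z):
    # f(z|theta0)
    return theta0 if z == 1 else 1 - theta0

E_score = sum(prob(z) * score(z, theta0) for z in (0, 1))       # should be 0
E_hess = sum(prob(z) * hessian(z, theta0) for z in (0, 1))
Var_score = sum(prob(z) * score(z, theta0) ** 2 for z in (0, 1)) - E_score ** 2
# E_hess should equal -Var_score (here both equal 100/21 in absolute value)
```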

## Series Expansion of the Complex Logarithm

For the case $x \in \mathbb{R}$, $|x| < 1$, it follows from Taylor's theorem that $\ln(1 + x)$ has the series representation

$$\ln(1 + x) = \sum_{k=1}^{\infty} (-1)^{k-1} x^k / k. \tag{III.18}$$

I will now address the issue of whether this series representation carries over if we replace $x$ by $i \cdot x$, because this will yield a useful approximation of $\exp(i \cdot x)$, which plays a key role in proving central limit theorems for dependent random variables (see Chapter 7).

If (III.18) carries over we can write, for arbitrary integers m,

$$
\begin{aligned}
\log(1 + i \cdot x) &= \sum_{k=1}^{\infty}(-1)^{k-1} i^k x^k / k + i \cdot m\pi \\
&= \sum_{k=1}^{\infty}(-1)^{2k-1} i^{2k} x^{2k}/(2k) + \sum_{k=1}^{\infty}(-1)^{2k-1-1} i^{2k-1} x^{2k-1}/(2k-1) + i \cdot m\pi \\
&= \sum_{k=1}^{\infty}(-1)^{k-1} x^{2k}/(2k) + i \sum_{k=1}^{\infty}(-1)^{k-1} x^{2k-1}/(2k-1) + i \cdot m\pi.
\end{aligned} \tag{III.19}
$$
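Assuming (III.19) holds on the principal branch ($m = 0$), its two series can be checked numerically against the exact complex logarithm; the sketch below (my own illustration) uses Python's `cmath` with an arbitrary $x$ satisfying $|x| < 1$.

```python
import cmath
import math

x = 0.5    # requires |x| < 1 for the series to converge
K = 200    # truncation point of the partial sums

# Real and imaginary series from (III.19)
re_part = sum((-1) ** (k - 1) * x ** (2 * k) / (2 * k) for k in range(1, K + 1))
im_part = sum((-1) ** (k - 1) * x ** (2 * k - 1) / (2 * k - 1) for k in range(1, K + 1))

approx = re_part + 1j * im_part   # (III.19) with m = 0
exact = cmath.log(1 + 1j * x)     # principal value of log(1 + i*x)
```

For this branch the real series sums to $\ln\sqrt{1+x^2}$ and the imaginary series to $\arctan(x)$, so `approx` agrees with `exact` to machine precision.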

On the other hand, it follows from (III.17) that

$\log(1 + i \cdot x) = \ln(1 + x^2)\ldots$

## Conditional Expectations

3.1. Introduction

Roll a die, and let the outcome be Y. Define the random variable X = 1 if Y is even, and X = 0 if Y is odd. The expected value of Y is E[Y] = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5. But what would the expected value of Y be if it is revealed that the outcome is even: X = 1? The latter information implies that Y is 2, 4, or 6 with equal probabilities 1/3; hence, the expected value of Y, conditional on the event X = 1, is E[Y|X = 1] = (2 + 4 + 6)/3 = 4. Similarly, if it is revealed that X = 0, then Y is 1, 3, or 5 with equal probabilities 1/3; hence, the expected value of Y, conditional on the event X = 0, is E[Y|X = 0] = (1 + 3 + 5)/3 = 3. Both results can be captured in a single statement:

$$\mathrm{E}[Y|X] = 3 + X. \tag{3.1}$$

In this example the conditional probability of Y = y, given X...

## Applications of the Uniform Weak Law of Large Numbers

6.4.2.1. Consistency of M-Estimators

Chapter 5 introduced the concept of a parameter estimator and listed two desirable properties of estimators: unbiasedness and efficiency (see Appendix II). Another obviously desirable property is that the estimator gets closer to the parameter to be estimated if we use more data information. This is the consistency property:

Definition 6.5: An estimator $\hat\theta$ of a parameter $\theta$, based on a sample of size $n$, is called consistent if $\mathrm{plim}_{n\to\infty}\hat\theta = \theta$.

Theorem 6.10 is an important tool in proving consistency of parameter estimators. A large class of estimators is obtained by maximizing or minimizing an objective function of the form $(1/n)\sum_{j=1}^{n} g(X_j, \theta)$, where $g$, $X_j$, and $\theta$ are the same as in Theorem 6.10...
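As a sketch of Definition 6.5 (my own illustration, not from the text), take $g(x, \theta) = (x - \theta)^2$: the resulting M-estimator minimizing $(1/n)\sum_{j=1}^n (X_j - \theta)^2$ is the sample mean, and its error should shrink as $n$ grows. The true parameter value and the normal sampling design below are hypothetical choices.

```python
import random

random.seed(0)

theta0 = 2.0   # hypothetical true parameter value

def m_estimate(n):
    # The minimizer of (1/n) * sum((X_j - theta)^2) over theta is the
    # sample mean of X_1, ..., X_n; draw the sample and return it.
    xs = [random.gauss(theta0, 1.0) for _ in range(n)]
    return sum(xs) / n

errors = {n: abs(m_estimate(n) - theta0) for n in (100, 10_000, 1_000_000)}
```

With a standard error of roughly $1/\sqrt{n}$, the error at $n = 1{,}000{,}000$ is on the order of $10^{-3}$, illustrating convergence in probability.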

## Gaussian Elimination of a Nonsquare Matrix

The Gaussian elimination of a nonsquare matrix is similar to the square case except that in the final result the upper-triangular matrix now becomes an echelon matrix:

Definition I.10: An $m \times n$ matrix $U$ is an echelon matrix if, for $i = 2, \ldots, m$, the first nonzero element of row $i$ is farther to the right than the first nonzero element of the previous row $i - 1$.

For example, the matrix

$$\begin{pmatrix} 2 & 0 & 1 \\ 0 & 0 & 3 \\ 0 & 0 & 0 \end{pmatrix}$$

is an echelon matrix, and so is

$$U = \begin{pmatrix} 2 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
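Definition I.10 translates directly into a small check. The sketch below is an illustrative helper of my own; it treats a zero row as having no leading element, so zero rows may only appear at the bottom:

```python
def is_echelon(U):
    # U is a matrix given as a list of rows.  Each row's first nonzero
    # entry must lie strictly to the right of the previous row's first
    # nonzero entry (Definition I.10).
    prev = -1
    for row in U:
        lead = next((j for j, u in enumerate(row) if u != 0), None)
        if lead is None:
            prev = len(row)   # a zero row forces all later rows to be zero
        else:
            if lead <= prev:
                return False
            prev = lead
    return True
```

Both example matrices above pass this check, while e.g. `[[2, 0, 1], [1, 0, 0]]` fails because the second row's leading entry is not farther to the right.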

Theorem I.8 can now be generalized to

Theorem I.11: For each matrix A there exists a permutation matrix P, possibly equal to the unit matrix I, a lower-triangular matrix L with diagonal elements all equal to 1, and an echelon matrix U such that PA = LU...
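The factorization in Theorem I.11 can be sketched in code. The following is my own minimal implementation, not the book's algorithm verbatim (partial pivoting is added for numerical stability); it returns $P$, $L$, $U$ with $PA = LU$, $L$ unit lower triangular, and $U$ an echelon matrix:

```python
def plu(A):
    """Gaussian elimination with row interchanges on an m x n matrix A
    (a list of rows).  Returns (P, L, U) with P @ A = L @ U, where L is
    unit lower triangular and U is an echelon matrix.  Illustrative sketch."""
    m, n = len(A), len(A[0])
    U = [row[:] for row in A]
    L = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    P = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    r = 0                                    # next pivot row
    for c in range(n):
        if r >= m:
            break
        # choose the largest pivot in column c at or below row r
        p = max(range(r, m), key=lambda i: abs(U[i][c]))
        if abs(U[p][c]) < 1e-12:
            continue                         # no pivot in this column
        if p != r:                           # swap rows of U, P and L's filled part
            U[r], U[p] = U[p], U[r]
            P[r], P[p] = P[p], P[r]
            for j in range(r):
                L[r][j], L[p][j] = L[p][j], L[r][j]
        for i in range(r + 1, m):            # eliminate entries below the pivot
            L[i][r] = U[i][c] / U[r][c]
            U[i] = [U[i][j] - L[i][r] * U[r][j] for j in range(n)]
        r += 1
    return P, L, U
```

When a column contains no usable pivot, elimination simply moves on to the next column, which is exactly what produces an echelon rather than upper-triangular $U$ in the nonsquare case.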