# Introduction to the Mathematical and Statistical Foundations of Econometrics

## Elementary Matrices and Permutation Matrices

Let A be the $m \times n$ matrix in (I.14). An elementary $m \times m$ matrix E is a matrix such that the effect of EA is the addition of a multiple of one row of A to another row of A. For example, let $E_{i,j}(c)$ be an elementary matrix such that the effect of $E_{i,j}(c)A$ is that $c$ times row $j$ is added to row $i < j$:

$$
E_{i,j}(c)A =
\begin{pmatrix}
a_{1,1} & \cdots & a_{1,n} \\
\vdots & & \vdots \\
a_{i-1,1} & \cdots & a_{i-1,n} \\
a_{i,1} + c\,a_{j,1} & \cdots & a_{i,n} + c\,a_{j,n} \\
a_{i+1,1} & \cdots & a_{i+1,n} \\
\vdots & & \vdots \\
a_{j,1} & \cdots & a_{j,n} \\
\vdots & & \vdots \\
a_{m,1} & \cdots & a_{m,n}
\end{pmatrix}
\tag{I.19}
$$

Then $E_{i,j}(c)$ is equal to the unit matrix $I_m$ (compare (I.18)) except that the zero in the $(i, j)$ position is replaced by a nonzero constant $c$. In particular, if $i = 1$ and $j = 2$ in (I...
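The row operation can be sketched numerically in NumPy (the function name `elementary_matrix` and the 3 × 2 example matrix are illustrative choices, not from the text):

```python
import numpy as np

def elementary_matrix(m, i, j, c):
    """I_m with the zero in the (i, j) position replaced by c (0-based indices)."""
    E = np.eye(m)
    E[i, j] = c
    return E

# E_{0,2}(5) applied to a 3 x 2 matrix A adds 5 times row 2 of A to row 0.
A = np.arange(6.0).reshape(3, 2)     # rows: [0, 1], [2, 3], [4, 5]
E = elementary_matrix(3, 0, 2, 5.0)
print(E @ A)                          # row 0 becomes [0 + 5*4, 1 + 5*5] = [20, 26]
```

All other rows of A are left unchanged, matching the description of $E_{i,j}(c)A$ above.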

## Transformations of Discrete Random Variables and Vectors

In the discrete case, the question "Given a random variable or vector X and a Borel-measurable function or mapping g(x), how is the distribution of Y = g(X) related to the distribution of X?" is easy to answer. If $P[X \in \{x_1, x_2, \ldots\}] = 1$ and $g(x_1), g(x_2), \ldots$ are all different, the answer is trivial: $P(Y = g(x_j)) = P(X = x_j)$. If some of the values $g(x_1), g(x_2), \ldots$ are the same, let $\{y_1, y_2, \ldots\}$ be the set of distinct values of $g(x_1), g(x_2), \ldots$ Then

$$
P(Y = y_j) = \sum_{i=1}^{\infty} I\big(y_j = g(x_i)\big)\, P(X = x_i). \tag{4.13}
$$

It is easy to see that (4.13) carries over to the multivariate discrete case.

For example, if X is Poisson($\lambda$)-distributed and $g(x) = \sin^2(\pi x/2) = (\sin(\pi x/2))^2$ – and thus, for $m = 0, 1, 2, 3, \ldots$, $g(2m) = \sin^2(\pi m) = 0$ and $g(2m + 1) = \sin^2(\pi m + \pi/2) = 1$ – then $P(Y = 0) = e^{-\lambda} \sum_{j=0}^{\infty} \lambda^{2j}/(2j)!$ and $P(Y = 1) = e^{-\lambda} \sum_{j=0}^{\infty} \lambda^{2j+1}/(2j+1)!$.
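A small Python check of formula (4.13) for this example (the intensity $\lambda = 1.5$ is an arbitrary illustrative value):

```python
import math

lam = 1.5  # illustrative Poisson intensity

def pmf(k):
    """Poisson(lam) probability mass function."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def g(k):
    """sin^2(pi*k/2): 0 for even k, 1 for odd k."""
    return k % 2

# Formula (4.13): group the x_i by their image under g and add up probabilities.
p0 = sum(pmf(k) for k in range(100) if g(k) == 0)
p1 = sum(pmf(k) for k in range(100) if g(k) == 1)

# The two sums are the even and odd halves of the exponential series:
# P(Y = 0) = e^(-lam) * cosh(lam), P(Y = 1) = e^(-lam) * sinh(lam).
assert abs(p0 - math.exp(-lam) * math.cosh(lam)) < 1e-12
assert abs(p1 - math.exp(-lam) * math.sinh(lam)) < 1e-12
```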

## Dependent Laws of Large Numbers and Central Limit Theorems

In Chapter 6 I focused on the convergence of sums of i.i.d. random variables – in particular the law of large numbers and the central limit theorem. However, macroeconomic and financial data are time series data for which the independence assumption does not apply. Therefore, in this chapter I will generalize the weak law of large numbers and the central limit theorem to certain classes of time series.

### 7.1. Stationarity and the Wold Decomposition

Chapter 3 introduced the concept of strict stationarity, which for convenience will be restated here:

Definition 7.1: A time series process $X_t$ is said to be strictly stationary if, for arbitrary integers $m_1 < m_2 < \cdots < m_n$, the joint distribution of $X_{t-m_1}, \ldots, X_{t-m_n}$ does not depend on the time index $t$.
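For example, a Gaussian AR(1) process $X_t = \phi X_{t-1} + \varepsilon_t$ with $|\phi| < 1$ is strictly stationary when started from its stationary distribution: every $X_t$ then has the same $N(0, 1/(1-\phi^2))$ marginal law. A minimal simulation sketch (the coefficient and sample sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
phi, n_paths, T = 0.8, 100_000, 50   # illustrative AR(1) coefficient and sizes

# Draw X_0 from the stationary distribution N(0, 1/(1 - phi^2)), then iterate
# X_t = phi * X_{t-1} + eps_t with eps_t ~ N(0, 1) across many independent paths.
x = rng.normal(0.0, 1.0 / np.sqrt(1.0 - phi**2), size=n_paths)
variances = []
for t in range(T):
    x = phi * x + rng.normal(size=n_paths)
    variances.append(x.var())

# The cross-sectional variance stays near 1/(1 - phi^2) = 2.77... for every t,
# consistent with a marginal distribution that does not depend on t.
print(min(variances), max(variances))
```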

A weaker version of stationarity is cova...

## Eigenvalues and Eigenvectors of Symmetric Matrices

On the basis of (I.60) it is easy to show that, in the case of a symmetric matrix A, $\beta = 0$ and $b = 0$:

Theorem I.34: The eigenvalues of a symmetric $n \times n$ matrix A are all real valued, and the corresponding eigenvectors are contained in $\mathbb{R}^n$.

Proof: First, note that (I.60) implies that, for arbitrary $\xi \in \mathbb{R}$,

$$
0 = \begin{pmatrix} b^{\mathrm{T}} & \xi a^{\mathrm{T}} \end{pmatrix}
\begin{pmatrix} A - \alpha I_n & \beta I_n \\ -\beta I_n & A - \alpha I_n \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}
= \xi a^{\mathrm{T}} A b + b^{\mathrm{T}} A a - \alpha b^{\mathrm{T}} a - \xi \alpha a^{\mathrm{T}} b + \beta b^{\mathrm{T}} b - \xi \beta a^{\mathrm{T}} a.
$$

Next observe that $b^{\mathrm{T}} a = a^{\mathrm{T}} b$ and, by symmetry, $b^{\mathrm{T}} A a = (b^{\mathrm{T}} A a)^{\mathrm{T}} = a^{\mathrm{T}} A^{\mathrm{T}} b = a^{\mathrm{T}} A b$, where the first equality follows because $b^{\mathrm{T}} A a$ is a scalar (or $1 \times 1$ matrix). Then we have, for arbitrary $\xi \in \mathbb{R}$,

$$
(\xi + 1)a^{\mathrm{T}} A b - \alpha(\xi + 1)a^{\mathrm{T}} b + \beta\big(b^{\mathrm{T}} b - \xi a^{\mathrm{T}} a\big) = 0. \tag{I.61}
$$

If we choose $\xi = -1$ in (I.61), then $\beta(b^{\mathrm{T}} b + a^{\mathrm{T}} a) = \beta \cdot \|x\|^2 = 0$; consequently, $\beta = 0$ and thus $\lambda = \alpha \in \mathbb{R}$...
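The theorem is easy to verify numerically; a small NumPy sketch (the 5 × 5 matrix is an arbitrary illustrative example):

```python
import numpy as np

rng = np.random.default_rng(42)
B = rng.normal(size=(5, 5))
A = (B + B.T) / 2                    # symmetrize: A is a real symmetric matrix

# np.linalg.eig makes no symmetry assumption, yet for symmetric A the computed
# eigenvalues have (numerically) zero imaginary part, as Theorem I.34 asserts.
eigvals = np.linalg.eig(A)[0]
print(np.max(np.abs(np.asarray(eigvals).imag)))   # 0 up to rounding

# np.linalg.eigh exploits the symmetry and returns real eigenvalues directly,
# with eigenvectors contained in R^n.
w, V = np.linalg.eigh(A)
assert np.isrealobj(w) and np.isrealobj(V)
```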

## Follows now from Theorem 5.2. Q.E.D.

Note that this result holds regardless of whether the matrix $B \Sigma B^{\mathrm{T}}$ is nonsingular or not. In the latter case the normal distribution involved is called "singular":

Definition 5.2: An $n \times 1$ random vector Y has a singular $N_n(\mu, \Sigma)$ distribution if its characteristic function is of the form $\varphi_Y(t) = \exp\!\big(\mathrm{i}\cdot t^{\mathrm{T}}\mu - \tfrac{1}{2} t^{\mathrm{T}} \Sigma t\big)$ with $\Sigma$ a singular, positive semidefinite matrix.

Because of the latter condition the distribution of the random vector Y involved is no longer absolutely continuous, but the form of the characteristic function is the same as in the nonsingular case – and that is all that matters. For example, let n = 2 and

$$
\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} 1 & 0 \\ 0 & \sigma^2 \end{pmatrix},
$$

where $\sigma^2 > 0$ but small. The density of the corresponding $N_2(\mu, \Sigma)$ distribution of $Y = (Y_1, Y_2)^{\mathrm{T}}$ is

$$
f(y_1, y_2) = \frac{\exp(-y_1^2/2)}{\sqrt{2\pi}} \times \frac{\exp\!\big(-y_2^2/(2\sigma^2)\big)}{\sigma\sqrt{2\pi}}.
$$

Then $\lim_{\sigma \downarrow 0}$...
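A singular normal vector can still be simulated by writing $\Sigma = BB^{\mathrm{T}}$ and setting $Y = \mu + BZ$ with Z standard normal; a sketch for the limiting case $\Sigma = \mathrm{diag}(1, 0)$ of this example (the sample size is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.zeros(2)
B = np.array([[1.0], [0.0]])        # Sigma = B @ B.T = [[1, 0], [0, 0]] is singular
Sigma = B @ B.T

# Y = mu + B Z is N_2(mu, Sigma) even though Sigma is singular; Y then has no
# density, and the distribution is concentrated on the Y_1 axis: Y_2 = 0 a.s.
Z = rng.normal(size=(1, 10_000))
Y = mu[:, None] + B @ Z
print(Y[1].min(), Y[1].max())       # both 0.0
assert np.linalg.matrix_rank(Sigma) == 1
```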

## The Tobit Model

Let $Z_j = (Y_j, X_j^{\mathrm{T}})^{\mathrm{T}}$, $j = 1, \ldots, n$, be independent random vectors such that

$$
Y_j = \max(Y_j^*, 0), \quad \text{where} \quad Y_j^* = \alpha_0 + \beta_0^{\mathrm{T}} X_j + U_j \quad \text{with} \quad U_j \mid X_j \sim N(0, \sigma_0^2). \tag{8.16}
$$

The latent variables $Y_j^*$ are observed only if they are positive. Note that

$$
\begin{aligned}
P[Y_j = 0 \mid X_j] &= P\big[\alpha_0 + \beta_0^{\mathrm{T}} X_j + U_j \le 0 \mid X_j\big] \\
&= P\big[U_j \le -\alpha_0 - \beta_0^{\mathrm{T}} X_j \mid X_j\big] = 1 - \Phi\big((\alpha_0 + \beta_0^{\mathrm{T}} X_j)/\sigma_0\big),
\end{aligned}
$$

where $\Phi(x) = \int_{-\infty}^{x} \exp(-u^2/2)/\sqrt{2\pi}\, \mathrm{d}u$.

This is a Probit model. Because model (8.16) was proposed by Tobin (1958) and involves a Probit model for the case $Y_j = 0$, it is called the Tobit model. For example, let the sample be a survey of households, where $Y_j$ is the amount of money household $j$ spends on tobacco products and $X_j$ is a vector of household characteristics. But there are households in which nobody smokes, and thus for these households $Y_j = 0$.

In this case the setup of ...
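A quick simulation check of the censoring probability $P[Y_j = 0 \mid X_j]$ derived above (the parameter values and the regressor value are illustrative):

```python
import math
import numpy as np

rng = np.random.default_rng(7)
alpha0, beta0, sigma0 = -0.5, 1.0, 1.0     # illustrative "true" parameters

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Simulate the Tobit model (8.16) at a fixed scalar regressor value x.
x, n = 0.2, 200_000
u = rng.normal(0.0, sigma0, size=n)
y_star = alpha0 + beta0 * x + u            # latent variable Y_j*
y = np.maximum(y_star, 0.0)                # observed variable, censored at zero

p_zero_model = 1.0 - Phi((alpha0 + beta0 * x) / sigma0)
p_zero_empirical = float(np.mean(y == 0.0))
print(p_zero_model, p_zero_empirical)      # both near 0.618
```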