# INTRODUCTION TO STATISTICS AND ECONOMETRICS

## WHAT IS AN ESTIMATOR?

In Chapter 1 we stated that statistics is the science of estimating the probability distribution of a random variable on the basis of repeated observations drawn from the same random variable. If we denote the random variable in question by $X$, the $n$ repeated observations in mathematical terms mean a sequence of $n$ mutually independent random variables $X_1, X_2, \ldots, X_n$, each of which has the same distribution as $X$. (We say that $\{X_i\}$ are i.i.d.)

For example, suppose we want to estimate the probability ($p$) of heads for a given coin. We can define $X = 1$ if a head appears and $X = 0$ if a tail appears. Then $X_i$ represents the outcome of the $i$th toss of the same coin. If $X$ is the height of a male Stanford student, $X_i$ is the height of the $i$th student randomly chosen.

We call the basic random var...

## DEFINITION OF BASIC TERMS

Matrix. A matrix, here denoted by a boldface capital letter, is a rectangular array of real numbers arranged as follows:

$$(11.1.1)\qquad \mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{bmatrix}$$

A matrix such as $\mathbf{A}$ in (11.1.1), which has $n$ rows and $m$ columns, is called an $n \times m$ (read "$n$ by $m$") matrix. Matrix $\mathbf{A}$ may also be denoted by the symbol $\{a_{ij}\}$, indicating that its $i$,$j$th element (the element in the $i$th row and $j$th column) is $a_{ij}$.

Transpose. Let $\mathbf{A}$ be as in (11.1.1). Then the transpose of $\mathbf{A}$, denoted by $\mathbf{A}'$, is defined as an $m \times n$ matrix whose $i$,$j$th element is equal to $a_{ji}$. For example,

$$\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, \qquad \mathbf{A}' = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}.$$

Note that the transpose of a matrix is obtained by rewriting its columns as rows.
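As a quick numerical illustration (using NumPy, which the text does not assume), the transpose of a $2 \times 3$ matrix is the $3 \times 2$ matrix obtained by rewriting its columns as rows:

```python
# Transpose sketch: A.T rewrites the columns of A as rows.
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # a 2 x 3 matrix
At = A.T                    # its 3 x 2 transpose

# The (i, j)th element of A' equals the (j, i)th element of A.
print(At)
```

Note that transposing twice recovers the original matrix, i.e. $(\mathbf{A}')' = \mathbf{A}$.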

Square matrix. A matrix which has the same number of rows and columns is called a square matrix. Thus, $\mathbf{A}$ in (11.1.1) is a square matrix if $n = m$.

Symmetric matrix...

## Cramér-Rao Lower Bound

We shall derive a lower bound to the variance of an unbiased estimator and show that in certain cases the variance of the maximum likelihood estimator attains the lower bound.

THEOREM 7.4.1 (Cramér-Rao) Let $L(X_1, X_2, \ldots, X_n \mid \theta)$ be the likelihood function and let $\hat{\theta}(X_1, X_2, \ldots, X_n)$ be an unbiased estimator of $\theta$. Then, under general conditions, we have

$$(7.4.1)\qquad V(\hat{\theta}) \ge \frac{1}{-E\left(\dfrac{\partial^2 \log L}{\partial \theta^2}\right)}$$

The right-hand side is known as the Cramér-Rao lower bound (CRLB).

(In Section 7.3 the likelihood function was always evaluated at the observed values of the sample, because there we were only concerned with the definition and computation of the maximum likelihood estimator...
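As a concrete check (a sketch with illustrative values, not taken from the text): for $n$ i.i.d. Bernoulli($\theta$) observations, $-E\,\partial^2 \log L / \partial \theta^2 = n/[\theta(1-\theta)]$, so the CRLB is $\theta(1-\theta)/n$, and the sample mean, which is the maximum likelihood estimator here, attains it. A small simulation confirms this:

```python
# CRLB sketch for Bernoulli(theta): the bound is theta*(1-theta)/n, and the
# variance of the MLE (the sample mean) should match it.  theta, n, and the
# replication count are illustrative choices.
import random

random.seed(1)
theta, n, reps = 0.3, 50, 20_000

crlb = theta * (1 - theta) / n          # Cramer-Rao lower bound

estimates = []
for _ in range(reps):
    sample = [1 if random.random() < theta else 0 for _ in range(n)]
    estimates.append(sum(sample) / n)   # MLE of theta for this sample

mean = sum(estimates) / reps
var = sum((e - mean) ** 2 for e in estimates) / reps
print(round(crlb, 5), round(var, 5))   # the two should nearly coincide
```

The simulated variance of the MLE sits essentially on the bound, illustrating the attainment claim in this case.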

## Heteroscedasticity

In the classical regression model it is assumed that the variance of the error term is constant (homoscedastic). Here we relax this assumption and specify more generally that

$$(13.1.12)\qquad V(u_t) = \sigma_t^2, \qquad t = 1, 2, \ldots, T.$$

This assumption of nonconstant variances is called heteroscedasticity. The other assumptions remain the same. If the variances are known, this model is a special case of the model discussed in Section 13.1.1. In the present case, the covariance matrix of the error term is a diagonal matrix whose $t$th diagonal element is equal to $\sigma_t^2$. The GLS estimator in this case is given a special name, the weighted least squares estimator.
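A minimal weighted least squares sketch under known variances (the data-generating values below are illustrative assumptions, not from the text): dividing each observation of $y$ and of the regressors by $\sigma_t$ makes the transformed errors homoscedastic, so ordinary least squares on the transformed data is the WLS estimator.

```python
# WLS sketch: with known V(u_t) = sigma_t^2, weight row t by 1/sigma_t and
# run OLS on the transformed data.  True intercept 2 and slope 3 are
# illustrative.
import numpy as np

rng = np.random.default_rng(0)
T = 500
x = rng.uniform(1.0, 5.0, T)
sigma = 0.5 * x                                # known, nonconstant error sd
y = 2.0 + 3.0 * x + rng.normal(0.0, sigma)     # heteroscedastic errors

X = np.column_stack([np.ones(T), x])

# OLS ignores the heteroscedasticity (still consistent, but inefficient).
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# WLS: divide each row of y and X by sigma_t, then run OLS.
Xw = X / sigma[:, None]
yw = y / sigma
beta_wls, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

print(beta_ols, beta_wls)
```

Both estimators are close to the true coefficients, but the weighted version has the smaller variance, which is the point of GLS in this special case.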

If the variances are unknown, we must specify them as depending on a finite number of parameters. There are two main methods of parameterization.

In the first method, the variances are assum...

## BIVARIATE REGRESSION MODEL

10.1 INTRODUCTION

In Chapters 1 through 9 we studied statistical inference about the distribution of a single random variable on the basis of independent observations on the variable. Let $\{X_t\}$, $t = 1, 2, \ldots, T$, be a sequence of independent random variables with the same distribution $F$. Thus far we have considered statistical inference about $F$ based on the observed values $\{x_t\}$ of $\{X_t\}$.

In Chapters 10, 12, and 13 we shall study statistical inference about the relationship among two or more random variables. In the present chapter we shall consider the relationship between two random variables, x and y...

## APPENDIX: DISTRIBUTION THEORY

DEFINITION 1 (Chi-square Distribution) Let $\{Z_i\}$, $i = 1, 2, \ldots, n$, be i.i.d. as $N(0, 1)$. Then the distribution of $\sum_{i=1}^{n} Z_i^2$ is called the chi-square distribution with $n$ degrees of freedom and denoted by $\chi_n^2$.

THEOREM 1 If $X \sim \chi_n^2$ and $Y \sim \chi_m^2$ and if $X$ and $Y$ are independent, then $X + Y \sim \chi_{n+m}^2$.

THEOREM 2 If $X \sim \chi_n^2$, then $EX = n$ and $VX = 2n$.

THEOREM 3 Let $\{X_i\}$ be i.i.d. as $N(\mu, \sigma^2)$, $i = 1, 2, \ldots, n$. Define $\bar{X}_n = n^{-1} \sum_{i=1}^{n} X_i$. Then

$$\frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2 \sim \chi_{n-1}^2.$$
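Theorem 2 is easy to check by simulation (a sketch with illustrative $n$ and replication count, not from the text): sums of $n$ squared independent $N(0, 1)$ draws should have mean $n$ and variance $2n$.

```python
# Simulation sketch of Theorem 2: if X ~ chi-square with n degrees of
# freedom, then EX = n and VX = 2n.  n = 5 and 100,000 replications are
# illustrative choices.
import random

random.seed(2)
n, reps = 5, 100_000

draws = []
for _ in range(reps):
    x = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n))  # one chi2_n draw
    draws.append(x)

mean = sum(draws) / reps
var = sum((d - mean) ** 2 for d in draws) / reps
print(round(mean, 2), round(var, 2))   # close to n and 2n
```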

Proof. Define $Z_i = (X_i - \mu)/\sigma$. Then $Z_i \sim N(0, 1)$ and

But since $(Z_1 - Z_2)/\sqrt{2} \sim N(0, 1)$, the right-hand side of (2) is $\chi_1^2$ by Definition 1. Therefore, the theorem is true for $n = 2$. Second, assume it is true for $n$ and consider $n + 1$. We have

$$(3)\qquad \sum_{i=1}^{n+1} (Z_i - \bar{Z}_{n+1})^2 = \sum_{i=1}^{n} (Z_i - \bar{Z}_n\ldots$$