# INTRODUCTION TO STATISTICS AND ECONOMETRICS

Although there are many textbooks on statistics, they usually contain only a cursory discussion of regression analysis and seldom cover various generalizations of the classical regression model important in econometrics and other social science applications. Moreover, in most of these textbooks the selection of topics is far from ideal from an econometrician’s point of view. At the same time, there are many textbooks on econometrics, but either they do not include statistics proper, or they give it a superficial treatment. The present book is aimed at filling that gap.

Chapters 1 through 9 cover probability and statistics and can be taught in a semester course for advanced undergraduates or first-year graduate students. My own course on this material has been taken by both undergraduate and graduate students in economics, statistics, and other social science disciplines. The prerequisites are one year of calculus and an ability to think mathematically.

In these chapters I emphasize certain topics that are important in econometrics but are often overlooked by statistics textbooks at this level. Examples are best prediction and best linear prediction, conditional density of the form f(x | x < y), the joint distribution of a continuous and a discrete random variable, large sample theory, and the properties of the maximum likelihood estimator. I discuss these topics without undue use of mathematics and with many illustrative examples and diagrams. In addition, many exercises are given at the end of each chapter (except Chapters 1 and 13). I devote a lot of space to these and other fundamental concepts because I believe that it is far better for a student to have a solid knowledge of the basic facts about random variables than to have a superficial knowledge of the latest techniques.

I also believe that students should be trained to question the validity and reasonableness of conventional statistical techniques. Therefore, I give a thorough analysis of the problem of choosing estimators, including a comparison of various criteria for ranking estimators. I also present a critical evaluation of the classical method of hypothesis testing, especially in the realistic case of testing a composite null against a composite alternative. In discussing these issues as well as other problematic areas of classical statistics, I frequently have recourse to Bayesian statistics. I do so not because I believe it is superior (in fact, this book is written mainly from the classical point of view) but because it provides a pedagogically useful framework for consideration of many fundamental issues in statistical inference.

Chapter 10 presents the bivariate classical regression model in the conventional summation notation. Chapter 11 is a brief introduction to matrix analysis. By studying it in earnest, the reader should be able to understand Chapters 12 and 13 as well as the brief sections in Chapters 5 and 9 that use matrix notation. Chapter 12 gives the multiple classical regression model in matrix notation. In Chapters 10 and 12 the concepts and the methods studied in Chapters 1 through 9 in the framework of the i.i.d. (independent and identically distributed) sample are extended to the regression model. Finally, in Chapter 13, I discuss various generalizations of the classical regression model (Sections 13.1 through 13.4) and certain other statistical models extensively used in econometrics and other social science applications (Sections 13.5 through 13.7). The first part of the chapter is a quick overview of the topics. The second part, which discusses qualitative response models, censored and truncated regression models, and duration models, is a more extensive introduction to these important subjects.

Chapters 10 through 13 can be taught in the semester after the semester that covers Chapters 1 through 9. Under this plan, the material in Sections 13.1 through 13.4 needs to be supplemented by additional readings. Alternatively, for students with less background, Chapters 1 through 12 may be taught in a year, and Chapter 13 studied independently. At Stanford about half of the students who finish a year-long course in statistics and econometrics go on to take a year’s course in advanced econometrics, for which I use my Advanced Econometrics (Harvard University Press, 1985).

It is expected that those who complete the present textbook will be able to understand my advanced textbook.

I am grateful to Gene Savin, Peter Robinson, and James Powell, who read all or part of the manuscript and gave me valuable comments. I am also indebted to my students Fumihiro Goto and Dongseok Kim for carefully checking the entire manuscript for typographical and more substantial errors. I alone, however, take responsibility for the remaining errors. Dongseok Kim also prepared all the figures in the book. I also thank Michael Aronson, general editor at Harvard University Press, for constant encouragement and guidance, and Elizabeth Gretz and Vivian Wheeler for carefully checking the manuscript and suggesting numerous stylistic changes that considerably enhanced its readability.

I dedicate this book to my wife, Yoshiko, who for over twenty years has made a steadfast effort to bridge the gap between two cultures. Her work, though perhaps not conspicuous in the short run, will, I am sure, have a long-lasting effect.