# Masters of ’Metrics: The Remarkable Wrights

The IV method was invented by economist Philip G. Wright, assisted by his son, Sewall, a geneticist. Philip wrote frequently about agricultural markets. In 1928, he published The Tariff on Animal and Vegetable Oils.[24] Most of this book is concerned with the question of whether the steep tariffs on farm products imposed in the early 1920s benefited domestic producers. A 1929 reviewer noted that “Whatever the practical value of the intricate computation of elasticity of demand and supply as applied particularly to butter in this chapter, the discussion has high theoretical value.”[25]

In competitive markets, shifting supply and demand curves simultaneously generate equilibrium prices and quantities. The path from these observed equilibrium prices and quantities to the underlying supply and demand curves that generate them is unclear. The challenge of how to derive supply and demand elasticities from the observed relationship between prices and quantities is called an identification problem. At the time Philip was writing, econometric identification was poorly understood. Economists knew for sure only that the observed relationship between price and quantity fails to capture either supply or demand, and is somehow determined by both.

Appendix B of The Tariff on Animal and Vegetable Oils begins with an elegant statement of the identification problem in simultaneous equations models. The appendix then goes on to explain how variables present in one equation but excluded from another solve the identification problem. Philip referred to such excluded variables as “external factors,” because, by shifting the equation in which they appear, they trace out the equation from which they’re omitted (that is, to which they are external). Today we call such shifters instruments. Philip derived and then used IV to estimate supply and demand curves in markets for butter and flaxseed (flaxseed is used to make linseed oil, an ingredient in paint). Philip’s analysis of the flaxseed market uses prices of substitutes as demand shifters, while farm yields per acre, mostly driven by weather conditions, shift supply.
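A small simulation shows Wright’s logic at work. The sketch below is not Wright’s butter or flaxseed data. It assumes a hypothetical linear market in which the demand and supply slopes, the weather effect, and the normal shocks are all illustrative choices. Weather enters the supply equation only, so it acts as an external factor for demand. The ratio of its covariance with quantity to its covariance with price should then recover the demand slope. A regression of quantity on price, by contrast, mixes the two curves.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def simulate_market(n=20000, seed=1):
    """Hypothetical market; every parameter below is an illustrative assumption.

    demand: q = 10 - 1.0 * p + u_d
    supply: q =  2 + 0.5 * p + 1.5 * w + u_s   (w = weather/yield shifter)
    """
    rng = random.Random(seed)
    b_d, b_s, c = 1.0, 0.5, 1.5
    prices, quantities, weather = [], [], []
    for _ in range(n):
        w = rng.gauss(0, 1)    # external factor: shifts supply only
        u_d = rng.gauss(0, 1)  # unobserved demand shock
        u_s = rng.gauss(0, 1)  # unobserved supply shock
        # Equilibrium: set demand equal to supply and solve for the price.
        p = (10 - 2 + u_d - u_s - c * w) / (b_d + b_s)
        q = 10 - b_d * p + u_d
        prices.append(p)
        quantities.append(q)
        weather.append(w)
    return prices, quantities, weather


p, q, w = simulate_market()
ols_slope = cov(q, p) / cov(p, p)  # blends supply and demand
iv_slope = cov(q, w) / cov(p, w)   # weather traces out the demand curve
print(f"OLS slope: {ols_slope:.2f}, IV demand slope: {iv_slope:.2f} (true: -1.00)")
```

The IV slope lands near the assumed demand slope of -1. The OLS slope sits somewhere between the two curves and matches neither.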

Appendix B was a major breakthrough in ’metrics thought, remarkable and unexpected, so much so that some have wondered whether Philip really wrote it. Perhaps Appendix B was written by Sewall, a distinguished scholar in his own right. Like ’metrics masters Galton and Fisher, profiled at the end of Chapters 1 and 2, Sewall was a geneticist and statistician. Well before the appearance of Appendix B, Sewall had developed a statistical method called “path analysis” that was meant to solve problems related to omitted variables bias. Today we recognize path analysis as an application of the multivariate regression methods discussed in Chapter 2: it doesn’t solve the identification problem raised by simultaneous equations models. Some of Appendix B references Sewall’s idea of “path coefficients,” but Philip’s method of external factors was entirely new.

Masters James Stock and Francesco Trebbi investigated the case for Sewall’s authorship using stylometrics.[26] Stylometrics identifies authors by the statistical regularities in their word usage and sentence structure, and Stock and Trebbi’s stylometric analysis confirms Philip’s authorship of Appendix B. Recently, however, Stock and his student Kerry Clark uncovered letters between father and son that show the ideas in Appendix B developing jointly in a self-effacing give-and-take. In this exchange, Philip describes the power and simplicity of IV. But he wasn’t naive about the ease with which the method could be applied. In a March 1926 letter to Sewall, writing on the prospect of finding external factors, Philip commented: “Such factors, I fear, especially in the case of demand conditions, are not easy to find.”[27] The search for identification has not gotten easier in the intervening decades.

Philip’s journey was personal as well as intellectual. He worked for many years as a teacher at obscure Lombard College in Galesburg, Illinois. Lombard College failed to survive the Great Depression, but Philip’s time there bore impressive fruit. At Lombard, he mentored young Carl Sandburg, whose loosely structured and evocative poetry later made him an American icon. Here’s Sandburg’s description of the path blazed by experience:[28]

THIS morning I looked at the map of the day

And said to myself, “This is the way! This is the way I will go;

Thus shall I range on the roads of achievement,

The way is so clear—it shall all be a joy on the lines marked out.”

And then as I went came a place that was strange,—

’Twas a place not down on the map!

And I stumbled and fell and lay in the weeds,

And looked on the day with rue.

I am learning a little—never to be sure—

To be positive only with what is past,

And to peer sometimes at the things to come

As a wanderer treading the night

When the mazy stars neither point nor beckon,

I see those men with maps and talk

Who tell how to go and where and why;

I hear with my ears the words of their mouths,

As they finger with ease the marks on the maps;

And only as one looks robust, lonely, and querulous,

As if he had gone to a country far

And made for himself a map,

Do I cry to him, “I would see your map!

I would heed that map you have!”

# Appendix: IV Theory

## IV, LATE, and 2SLS

We first refresh notation for an IV setup with one instrument and no covariates. The first stage links instrument and treatment:

$$D_i = \alpha_1 + \phi Z_i + e_{1i}$$

The reduced form links instrument and outcomes:

$$Y_i = \alpha_0 + \rho Z_i + e_{0i}$$

The 2SLS second stage is the regression of outcomes on first-stage fitted values:

$$Y_i = \alpha_2 + \lambda_{2SLS} \hat{D}_i + e_{2i}$$

Note that the LATE formula (3.2) can be written in terms of first-stage and reduced-form regression coefficients as

$$\lambda = \frac{\rho}{\phi} = \frac{C(Y_i, Z_i)/V(Z_i)}{C(D_i, Z_i)/V(Z_i)} = \frac{C(Y_i, Z_i)}{C(D_i, Z_i)}. \qquad (3.12)$$

Here, we’ve used the fact that the differences in means on the top and bottom of equation (3.2) are the same as the regression coefficients, $\rho$ and $\phi$. Written this way, that is, as a ratio of covariances, $\lambda$ is called the IV formula. Its sample analogue is the IV estimator.
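With a dummy instrument, the difference-in-means (Wald) form of the LATE formula and the ratio of covariances agree exactly in any sample. Each difference in means equals the corresponding regression slope on $Z_i$. The sketch below checks this on simulated lottery-style data. The compliance shares, the effect size of 0.4, and the helper names are assumptions made for illustration.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def diff_in_means(v, z):
    """Mean of v when z == 1 minus mean of v when z == 0."""
    on = [vi for vi, zi in zip(v, z) if zi == 1]
    off = [vi for vi, zi in zip(v, z) if zi == 0]
    return mean(on) - mean(off)


rng = random.Random(0)
z, d, y = [], [], []
for _ in range(20000):
    zi = 1 if rng.random() < 0.5 else 0  # randomized offer (the instrument)
    u = rng.random()
    # Assumed compliance shares: 20% always-takers, 30% never-takers, 50% compliers.
    di = 1 if u < 0.2 else (0 if u < 0.5 else zi)
    yi = 1.0 + 0.4 * di + rng.gauss(0, 1)  # assumed treatment effect of 0.4
    z.append(zi)
    d.append(di)
    y.append(yi)

rho = diff_in_means(y, z)   # reduced form
phi = diff_in_means(d, z)   # first stage
wald = rho / phi            # LATE formula, difference-in-means version
iv = cov(y, z) / cov(d, z)  # IV formula, ratio of covariances
print(f"first stage {phi:.3f}, reduced form {rho:.3f}, Wald {wald:.4f}, IV {iv:.4f}")
```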

In this simple setup, the regression of $Y_i$ on $\hat{D}_i$ (the 2SLS second step) yields the same coefficient as equation (3.12). This is apparent once we write out the 2SLS second stage:

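$$\lambda_{2SLS} = \frac{C(Y_i, \hat{D}_i)}{V(\hat{D}_i)} = \frac{C(Y_i, \alpha_1 + \phi Z_i)}{V(\alpha_1 + \phi Z_i)} = \frac{\phi\, C(Y_i, Z_i)}{\phi^2 V(Z_i)} = \frac{\rho}{\phi} = \lambda.$$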
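This derivation can be checked numerically. The data-generating process below is an assumption: an unobserved variable `u` drives both treatment and outcome, which biases OLS, and the true effect is set to 2. The sketch runs the first stage, forms the fitted values, and regresses $Y_i$ on them. It then compares the slope with $\rho/\phi$ and with the IV formula. All three should agree up to floating-point error.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def ols(y, x):
    """Bivariate regression of y on x with an intercept: returns (intercept, slope)."""
    slope = cov(y, x) / cov(x, x)
    return mean(y) - slope * mean(x), slope


rng = random.Random(2)
n = 10000
z = [rng.gauss(0, 1) for _ in range(n)]  # instrument
u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
d = [0.5 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + 2.0 * di - 1.5 * ui + rng.gauss(0, 1) for di, ui in zip(d, u)]

alpha1, phi = ols(d, z)                  # first stage
alpha0, rho = ols(y, z)                  # reduced form
d_hat = [alpha1 + phi * zi for zi in z]  # first-stage fitted values
alpha2, lambda_2sls = ols(y, d_hat)      # 2SLS second stage
iv_formula = cov(y, z) / cov(d, z)

print(f"2SLS {lambda_2sls:.4f}, rho/phi {rho / phi:.4f}, IV formula {iv_formula:.4f}")
print(f"OLS of Y on D (biased by u): {ols(y, d)[1]:.4f}")
```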

In deriving this, we’ve used the rules for variances and covariances detailed in the appendix to Chapter 2.

With covariates included in the first and second stages (say, the variable $A_i$, as in our investigation of the population bomb), the 2SLS second stage is equation (3.9). Here, too, 2SLS and the IV formula are equivalent, with the latter again given by the ratio of reduced-form to first-stage coefficients. In this case, these coefficients are estimated with $A_i$ included, as in equations (3.7) and (3.8):

$$\lambda_{2SLS} = \frac{\rho}{\phi} = \frac{C(Y_i, \tilde{Z}_i)/V(\tilde{Z}_i)}{C(D_i, \tilde{Z}_i)/V(\tilde{Z}_i)},$$

where $\tilde{Z}_i$ is the residual from a regression of $Z_i$ on $A_i$ (this we know from regression anatomy). The details behind the second equals sign are left for you to fill in.
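The second equals sign can also be checked numerically. The sketch below assumes a hypothetical setup with a single covariate $A_i$ that is correlated with the instrument. It runs 2SLS with $A_i$ in both stages, solving each two-regressor regression through its normal equations. It then compares the coefficient on $\hat{D}_i$ with the ratio $C(Y_i, \tilde{Z}_i)/C(D_i, \tilde{Z}_i)$ built from the residualized instrument. The coefficients and the helper `ols2` are illustrative.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def ols2(y, x1, x2):
    """Regression of y on an intercept, x1, and x2 via the 2x2 normal equations."""
    v11, v22, v12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    c1, c2 = cov(x1, y), cov(x2, y)
    det = v11 * v22 - v12 ** 2
    b1 = (v22 * c1 - v12 * c2) / det
    b2 = (v11 * c2 - v12 * c1) / det
    return mean(y) - b1 * mean(x1) - b2 * mean(x2), b1, b2


rng = random.Random(3)
n = 10000
a = [rng.gauss(0, 1) for _ in range(n)]       # covariate A_i
z = [0.6 * ai + rng.gauss(0, 1) for ai in a]  # instrument, correlated with A_i
u = [rng.gauss(0, 1) for _ in range(n)]       # unobserved confounder
d = [0.5 * zi + 0.8 * ai + ui + rng.gauss(0, 1) for zi, ai, ui in zip(z, a, u)]
y = [1.0 + 2.0 * di + ai - 1.5 * ui + rng.gauss(0, 1) for di, ai, ui in zip(d, a, u)]

# 2SLS with A_i included in both stages.
alpha1, phi, gamma1 = ols2(d, z, a)
d_hat = [alpha1 + phi * zi + gamma1 * ai for zi, ai in zip(z, a)]
alpha2, lambda_2sls, gamma2 = ols2(y, d_hat, a)

# Regression anatomy: residualize Z_i on A_i, then take the ratio of covariances.
slope_za = cov(z, a) / cov(a, a)
mz, ma = mean(z), mean(a)
z_tilde = [zi - mz - slope_za * (ai - ma) for zi, ai in zip(z, a)]
ratio = cov(y, z_tilde) / cov(d, z_tilde)
print(f"2SLS with A_i: {lambda_2sls:.6f}, C(Y, Z~)/C(D, Z~): {ratio:.6f}")
```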

## 2SLS Standard Errors

Just as with sample means and regression estimates, we expect IV and 2SLS estimates to vary from one sample to another. We must gauge the extent of sampling variability in any particular set of estimates as we decide whether they’re meaningful. The sampling variance of 2SLS estimates is quantified by the appropriate standard errors.

2SLS standard errors for a model that uses $Z_i$ to instrument $D_i$, while controlling for $A_i$, are computed as follows. First the 2SLS residual is constructed using

$$\eta_i = Y_i - \alpha_2 - \lambda_{2SLS} D_i - \gamma_2 A_i.$$

The standard error for $\lambda_{2SLS}$ is then given by

$$SE(\lambda_{2SLS}) = \frac{\sigma_\eta}{\sqrt{n}} \times \frac{1}{\sigma_{\hat{D}}}, \qquad (3.13)$$

where $\sigma_\eta$ is the standard deviation of $\eta_i$, $n$ is the sample size, and $\sigma_{\hat{D}}$ is the standard deviation of the first-stage fitted values, $\hat{D}_i = \alpha_1 + \phi Z_i + \gamma_1 A_i$.

It’s important to note that $\eta_i$ is not the residual generated by manual estimation of the 2SLS second stage, equation (3.9). This incorrect residual is

$$e_{2i} = Y_i - \alpha_2 - \lambda_{2SLS} \hat{D}_i - \gamma_2 A_i.$$

The variance of $e_{2i}$ plays no role in equation (3.13), so a manual 2SLS second stage generates incorrect standard errors. The moral is clear: explore freely in the privacy of your own computer, but when it comes to the estimates and standard errors you plan to report in public, let professional software do the work.
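The sketch below shows how much the wrong residual can matter. It computes the standard error in equation (3.13) both ways for a simulated sample with no covariates, so $\hat{D}_i = \alpha_1 + \phi Z_i$. The data-generating process is an assumption chosen to make treatment endogenous. The sketch uses only the conventional formula from the text and makes no robustness or degrees-of-freedom adjustment.

```python
import math
import random


def mean(xs):
    return sum(xs) / len(xs)


def sd(xs):
    """Standard deviation with an n denominator."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def ols(y, x):
    """Bivariate regression of y on x with an intercept: returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return my - (sxy / sxx) * mx, sxy / sxx


rng = random.Random(4)
n = 10000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
d = [0.5 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + 2.0 * di - 1.5 * ui + rng.gauss(0, 1) for di, ui in zip(d, u)]

alpha1, phi = ols(d, z)
d_hat = [alpha1 + phi * zi for zi in z]
alpha2, lam = ols(y, d_hat)

# Correct 2SLS residual: plugs in the actual treatment D_i.
eta = [yi - alpha2 - lam * di for yi, di in zip(y, d)]
# Residual from a manual second stage: plugs in the fitted values instead.
e2 = [yi - alpha2 - lam * dh for yi, dh in zip(y, d_hat)]

se_correct = sd(eta) / (math.sqrt(n) * sd(d_hat))  # equation (3.13)
se_manual = sd(e2) / (math.sqrt(n) * sd(d_hat))    # what a manual second stage implies
print(f"2SLS estimate {lam:.3f}: correct SE {se_correct:.4f}, manual SE {se_manual:.4f}")
```

Both calculations share the same point estimate. Only the residual differs, and $e_{2i} = \eta_i + \lambda_{2SLS}(D_i - \hat{D}_i)$ picks up the first-stage error.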

## 2SLS Bias

IV is a powerful and flexible tool, but masters use their most powerful tools wisely. As we’ve seen, 2SLS combines multiple instruments in an effort to generate precise estimates of a single causal effect. Typically, a researcher blessed with many instruments knows that some produce a stronger first stage than others. The temptation is to use them all anyway (econometrics software doesn’t charge more for this). The risk here is that 2SLS estimates with many weak instruments can be misleading. A weak instrument is one that isn’t highly correlated with the regressor being instrumented, so the first-stage coefficient associated with this instrument is small or imprecisely estimated. 2SLS estimates with many such instruments tend to be similar to OLS estimates of the same model. When 2SLS is close to OLS, it’s natural to conclude you needn’t worry about selection bias in the latter, but this conclusion may be unwarranted. Because of finite sample bias, 2SLS estimates in a many-weak IV scenario tell you little about the causal relationship of interest.
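The pull toward OLS can be reproduced in a simulation. The sketch below assumes a hypothetical sample split into 1,000 cells that have nothing to do with treatment, and 2SLS uses the 999 cell dummies as instruments. With a full set of cell dummies, the first-stage fitted values are simply the cell means of $D_i$. A single genuine dummy instrument is included for comparison. All parameter values are illustrative only.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def fitted_cell_dummies(v, cells, n_cells):
    """Fitted values from regressing v on a full set of cell dummies: the cell means."""
    sums, counts = [0.0] * n_cells, [0] * n_cells
    for vi, c in zip(v, cells):
        sums[c] += vi
        counts[c] += 1
    return [sums[c] / counts[c] for c in cells]


rng = random.Random(5)
n, n_cells = 20000, 1000
cells = [i % n_cells for i in range(n)]                 # hypothetical cells, unrelated to D_i
z = [1 if rng.random() < 0.5 else 0 for _ in range(n)]  # one genuine dummy instrument
u = [rng.gauss(0, 1) for _ in range(n)]                 # unobserved confounder
d = [0.5 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + 2.0 * di - 1.5 * ui + rng.gauss(0, 1) for di, ui in zip(d, u)]

ols = cov(y, d) / cov(d, d)
iv_strong = cov(y, z) / cov(d, z)              # just-identified IV, one strong instrument
d_hat = fitted_cell_dummies(d, cells, n_cells)
tsls_weak = cov(y, d_hat) / cov(d_hat, d_hat)  # 2SLS with 999 irrelevant cell dummies
print(f"OLS {ols:.2f} | IV, one strong instrument {iv_strong:.2f} | "
      f"2SLS, many weak instruments {tsls_weak:.2f} | true effect 2.00")
```

In this setup, the many-instrument 2SLS estimate should land close to the biased OLS estimate. The single strong instrument should land near the assumed effect of 2.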

When is finite sample bias worth worrying about? Masters often focus on the first-stage F-statistic testing the joint hypothesis that all first-stage coefficients in a many-instrument setup are zero (an F-statistic extends the t-statistic to tests of multiple hypotheses at once). A popular rule of thumb requires an F value of at least 10 to put many-weak fears to rest. An alternative to 2SLS, called the limited information maximum likelihood estimator (LIML for short), is less affected by finite sample bias. You’d like LIML estimates and 2SLS estimates to be close to one another, since the former are unlikely to be biased even with many weak instruments (though LIML estimates typically have larger standard errors than do the corresponding 2SLS estimates).
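The first-stage F-statistic can be computed from the restricted and unrestricted residual sums of squares. The sketch below uses the conventional (homoskedastic) F formula on a hypothetical data-generating process. It compares one strong dummy instrument with 999 dummies for cells that are unrelated to treatment. The helper names are illustrative, and the sketch does not implement LIML.

```python
import random


def first_stage_f(d, fitted, n_instruments):
    """Conventional F-statistic for H0: all first-stage coefficients are zero.

    `fitted` holds first-stage fitted values from a regression of d on an
    intercept plus `n_instruments` instruments (no other covariates).
    """
    n = len(d)
    d_bar = sum(d) / n
    rss_restricted = sum((di - d_bar) ** 2 for di in d)  # intercept only
    rss_unrestricted = sum((di - fi) ** 2 for di, fi in zip(d, fitted))
    numerator = (rss_restricted - rss_unrestricted) / n_instruments
    denominator = rss_unrestricted / (n - n_instruments - 1)
    return numerator / denominator


def fitted_one_instrument(d, z):
    """Fitted values from regressing d on an intercept and a single instrument z."""
    n = len(d)
    md, mz = sum(d) / n, sum(z) / n
    slope = (sum((zi - mz) * (di - md) for zi, di in zip(z, d))
             / sum((zi - mz) ** 2 for zi in z))
    return [md + slope * (zi - mz) for zi in z]


def fitted_cell_dummies(d, cells, n_cells):
    """Fitted values from regressing d on a full set of cell dummies: the cell means."""
    sums, counts = [0.0] * n_cells, [0] * n_cells
    for di, c in zip(d, cells):
        sums[c] += di
        counts[c] += 1
    return [sums[c] / counts[c] for c in cells]


rng = random.Random(7)
n, n_cells = 20000, 1000
cells = [i % n_cells for i in range(n)]                 # hypothetical cells, unrelated to D_i
z = [1 if rng.random() < 0.5 else 0 for _ in range(n)]  # one strong dummy instrument
d = [0.5 * zi + rng.gauss(0, 1) + rng.gauss(0, 1) for zi in z]

f_strong = first_stage_f(d, fitted_one_instrument(d, z), 1)
f_weak = first_stage_f(d, fitted_cell_dummies(d, cells, n_cells), n_cells - 1)
print(f"first-stage F, one strong instrument: {f_strong:.1f}")
print(f"first-stage F, {n_cells - 1} weak instruments: {f_weak:.2f}")
```

The strong instrument clears the rule-of-thumb threshold of 10 by a wide margin. The irrelevant cell dummies produce an F near 1.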

The many-weak instruments problem loses its sting when you use a single instrument to estimate a single causal effect. Estimates of the quantity-quality trade-off using either a single dummy for multiple births or a single dummy for same-sex sibships as an instrument for family size are therefore unlikely to be plagued by finite sample bias. Such estimates appear in columns (2) and (3) of Table 3.5. Finally, reduced-form estimates are always worth a careful look, since these are OLS estimates, unaffected by finite sample bias. Reduced-form estimates that are small and not significantly different from zero provide a strong and unbiased hint that the causal relationship of interest is weak or nonexistent as well, at least in the data at hand (multiple reduced-form coefficients are also tested together using an F-test). We always tell our students: If you can’t see it in the reduced form, it ain’t there.

1. Jay Mathews’ book, Work Hard. Be Nice, Algonquin Books, 2009, details the history of KIPP. In 2012, Teach for America was the largest single employer of graduating seniors on 55 American college campuses, ranging from Arizona State to Yale.

2. Martin Carnoy, Rebecca Jacobsen, Lawrence Mishel, and Richard Rothstein, The Charter School Dust-Up: Examining Evidence on Student Achievement, Economic Policy Institute Press, 2005, p. 58.

3. Joshua D. Angrist et al., “Inputs and Impacts in Charter Schools: KIPP Lynn,” American Economic Review Papers and Proceedings, vol. 100, no. 2, May 2010, pages 239-243, and Joshua D. Angrist et al., “Who Benefits from KIPP?” Journal of Policy Analysis and Management, vol. 31, no. 4, Fall 2012, pages 837-860.

4. As noted in Chapter 1, attrition (missing data) is a concern even in randomized trials. The key to the integrity of a randomized design with missing data is an equal probability that data are missing in treatment and control groups. In the KIPP sample used to construct Table 3.1, winners and losers are indeed about equally likely to have complete data.

5. Section 3.3 details the role of covariates in IV estimation.

6. This theorem comes from Guido W. Imbens and Joshua D. Angrist, “Identification and Estimation of Local Average Treatment Effects,” Econometrica, vol. 62, no. 2, March 1994, pages 467-475. The distinction between compliers, always-takers, and never-takers is detailed in Joshua D. Angrist, Guido W. Imbens, and Donald B. Rubin, “Identification of Causal Effects Using Instrumental Variables,” Journal of the American Statistical Association, vol. 91, no. 434, June 1996, pages 444-455.

7. Simpson was acquitted of murder in a criminal trial but was held responsible for the deaths in a civil trial. He later authored a book titled If I Did It: Confessions of the Killer, Beaufort Books, 2007. Our account of repeated police visits to Simpson’s home is based on Sara Rimer, “The Simpson Case: The Marriage; Handling of 1989 Wife-Beating Case Was a ‘Terrible Joke,’ Prosecutor Says,” The New York Times, June 18, 1994.

8. The original analysis of the MDVE appears in Lawrence W. Sherman and Richard A. Berk, “The Specific Deterrent Effects of Arrest for Domestic Assault,” American Sociological Review, vol. 49, no. 2, April 1984, pages 261-272.

9. Our IV analysis of the MDVE is based on Joshua D. Angrist, “Instrumental Variables Methods in Experimental Criminological Research: What, Why and How,” Journal of Experimental Criminology, vol. 2, no. 1, April 2006, pages 23-44.

10. This theoretical result originates with Howard S. Bloom, “Accounting for No-Shows in Experimental Evaluation Designs,” Evaluation Review, vol. 8, no. 2, April 1984, pages 225-246. The LATE interpretation of the Bloom result appears in Imbens and Angrist, “Identification and Estimation,” Econometrica, 1994. See also Section 4.4.3 in Joshua D. Angrist and Jörn-Steffen Pischke, Mostly Harmless Econometrics: An Empiricist’s Companion, Princeton University Press, 2009. An example from our field of labor economics is the Job Training Partnership Act (JTPA). The JTPA experiment randomly assigned the opportunity to participate in a federally funded job-training program. About 60% of those offered training received JTPA services, but no controls got JTPA training. An IV analysis of the JTPA using treatment assigned as an instrument for treatment delivered captures the effect of training on trainees. For details, see Larry L. Orr et al., Does Training for the Disadvantaged Work? Evidence from the National JTPA Study, Urban Institute Press, 1996.

11. See David Lam, “How the World Survived the Population Bomb: Lessons from 50 Years of Extraordinary Demographic History,” Demography, vol. 48, no. 4, November 2011, pages 1231-1262, and Wolfgang Lutz, Warren Sanderson, and Sergei Scherbov, “The End of World Population Growth,” Nature, vol. 412, no. 6846, August 2, 2001, pages 543-545.

12. Just how much Indian living standards have risen is debated. Still, scholars generally agree that conditions have improved dramatically since 1970 (see, for example, Angus Deaton, The Great Escape: Health, Wealth, and the Origins of Inequality, Princeton University Press, 2013).

13. Gary S. Becker and H. Gregg Lewis, “On the Interaction between the Quantity and Quality of Children,” Journal of Political Economy, vol. 81, no. 2, part 2, March/April 1973, pages S279-S288, and Gary S. Becker and Nigel Tomes, “Child Endowments and the Quantity and Quality of Children,” Journal of Political Economy, vol. 84, no. 4, part 2, August 1976, pages S143-S162.

14. John Bongaarts, “The Impact of Population Policies: Comment,” Population and Development Review, vol. 20, no. 3, September 1994, pages 616-620.

15. You might think this is true only of societies with access to modern contraceptive methods, such as the pill or the penny (held between the knees as needed). But demographers have shown that even without access to modern contraceptives, potential parents exert a remarkable degree of fertility control. For example, in an extensive body of work, Ansley Coale documented the dramatic decline in marital fertility in nineteenth- and twentieth-century Europe (see http://opr.princeton.edu/archive/pefp/). This pattern, since repeated in most of the world, is called the demographic transition.

16. Mark R. Rosenzweig and Kenneth I. Wolpin, “Testing the Quantity-Quality Fertility Model: The Use of Twins as a Natural Experiment,” Econometrica, vol. 48, no. 1, January 1980, pages 227-240.

17. Joshua D. Angrist, Victor Lavy, and Analia Schlosser, “Multiple Experiments for the Causal Link between the Quantity and Quality of Children,” Journal of Labor Economics, vol. 28, no. 4, October 2010, pages 773-824.

18. In more recent samples, twins instruments are also compromised by the proliferation of in vitro fertilization, a treatment for infertility. Mothers who turn to in vitro fertilization, which increases twin birth rates sharply, tend to be older and more educated than other mothers.

19. Joshua D. Angrist and William Evans, “Children and Their Parents’ Labor Supply: Evidence from Exogenous Variation in Family Size,” American Economic Review, vol. 88, no. 3, June 1998, pages 450-477.

20. We’ve seen a version of IV with covariates already. The KIPP offer effects reported in column (3) of Table 3.1 come from regression models for the first stage and reduced form that include covariates in the form of dummies for application risk sets.

21. Alert readers will have noticed that the treatment variable here, family size, is not a dummy variable like KIPP enrollment, but rather an ordered treatment that counts children. You might wonder whether it’s OK to describe 2SLS estimates of the effects of variables like family size as LATE. Although the details differ, 2SLS estimates can still be said to capture average causal effects on compliers in this context. The extension of LATE to ordered treatments is developed in Joshua D. Angrist and Guido W. Imbens, “Two Stage Least Squares Estimation of Average Causal Effects in Models with Variable Treatment Intensity,” Journal of the American Statistical Association, vol. 90, no. 430, June 1995, pages 431-442. Along the same lines, 2SLS easily accommodates instruments that aren’t dummies. We’ll see an example of this in Chapter 6.

22. In addition to the male dummy, other covariates include indicators for census year, parents’ ethnicity, age, missing month of birth, mother’s age, mother’s age at first birth, and mother’s age at immigration (where relevant). See the Empirical Notes section for details.

23. Specifically, the regression estimate of -.145 lies outside the multi-instrument 2SLS confidence interval of .237 ± (2 × .128) = [-.02, .49]. You can, in some cases, have too many instruments, especially if they have little explanatory power in the first stage. The chapter appendix elaborates on this point.

24. Philip G. Wright, The Tariff on Animal and Vegetable Oils, Macmillan Company, 1928.

25. G. O. Virtue, “The Tariff on Animal and Vegetable Oils by Philip G. Wright,” American Economic Review, vol. 19, no. 1, March 1929, pages 152-156. The quote is from page 155.

26. James H. Stock and Francesco Trebbi, “Who Invented Instrumental Variables Regression?” Journal of Economic Perspectives, vol. 17, no. 3, Summer 2003, pages 177-194.

27. This quote and the one in the sketch are from unpublished letters, uncovered by James H. Stock and Kerry Clark. See “Philip Wright, the Identification Problem in Econometrics, and Its Solution,” presented at the Tufts University Department of Economics Special Event in honor of Philip Green Wright, October 2011 (http://ase.tufts.edu/econ/news/documents/wrightPhilipAndSewall.pdf), and Kerry Clark’s 2012 Harvard senior thesis, “The Invention and Reinvention of Instrumental Variables Regression.”

28. “Experience.” From In Reckless Ecstasy, Asgard Press, 1904, edited and with a foreword by Philip Green Wright.

Chapter 4