How Much Do We Trust Staggered Difference-in-Differences Estimates?

David F. Larcker is the James Irvin Miller Professor of Accounting at Stanford Graduate School of Business; Charles C.Y. Wang is the Glenn and Mary Jane Creamer Associate Professor of Business Administration at Harvard Business School; and Andrew Baker is a J.D. candidate at Stanford Law School. This post is based on their recent paper. Related research from the Program on Corporate Governance includes Short-Termism and Capital Flows by Jesse Fried and Charles C. Y. Wang (discussed on the Forum here).

Difference-in-differences (DiD) has been the workhorse statistical methodology for analyzing regulatory or policy effects in applied finance, law, and accounting research. A generalized version of this estimation approach that relies on the staggered adoption of regulations or policies (e.g., across states or across countries) has become especially popular over the last two decades. For example, from 2000 to 2019, there were 751 papers published in (or accepted for publication by) top-tier finance or accounting journals that use DiD designs. Among them, 366 (or 49%) employ a staggered DiD design. Many of the staggered DiD papers address significant questions in corporate governance and financial regulation.

The prevalent use of staggered DiD reflects a common belief among researchers that such designs are more robust and mitigate concerns that contemporaneous trends could confound the treatment effect of interest. However, recent advances in econometric theory suggest that staggered DiD designs often do not provide valid estimates of average treatment effects.

In a paper recently posted on SSRN, we find that staggered DiD designs can produce, and often have produced, misleading inferences in the literature. We also show that applying robust DiD alternatives can significantly alter inferences in important papers in corporate governance and financial regulation.

We begin by providing an overview of the recent work in econometrics that explains why treatment effects estimated from a staggered DiD design are not easily interpretable. In general, such designs produce estimates of variance-weighted averages of many different treatment effects. Importantly, staggered DiD estimates can take the opposite sign of the true average treatment effect, even when the researcher is able to randomize treatment assignment. The intuition is that in the standard staggered DiD approach, already-treated units can act as effective controls, and changes in their outcomes over time are subtracted from the changes of later-treated units (the treated).
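
The intuition can be written out in a short worked form. The notation below is an assumption introduced for illustration and is not taken from the paper: D_{it} is a treatment indicator, e is an earlier-treated group, and l is a later-treated group.

```latex
% Standard staggered DiD regression with unit and time fixed effects
\[
  y_{it} = \alpha_i + \lambda_t + \beta^{DD} D_{it} + \varepsilon_{it}
\]

% One of the two-group comparisons that is averaged into \hat{\beta}^{DD}:
% the later-treated group l is the "treated" group, and the earlier-treated
% group e, which is already treated throughout the window, is the "control".
\[
  \hat{\beta}^{2\times 2}_{l,e}
    = \left( \bar{y}_{l}^{\,\mathrm{post}} - \bar{y}_{l}^{\,\mathrm{pre}} \right)
    - \left( \bar{y}_{e}^{\,\mathrm{post}} - \bar{y}_{e}^{\,\mathrm{pre}} \right)
\]
```

If the effect on group e is constant over the window, the second parenthesis contains only the common trend and the comparison is clean. If the effect on group e is still growing, that growth sits inside the second parenthesis and is subtracted from the estimate, which is how a positive true effect can produce a small or even negative coefficient.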

To demonstrate the situations under which these problems can arise, we design a series of simulation analyses that produce three main insights. First, DiD estimates are unbiased in settings where there is a single treatment period, even when there are dynamic treatment effects. Second, staggered DiD estimates are also unbiased in settings with staggered timing of treatment assignment and no treatment effect heterogeneity across firms or over time. Finally, when research settings combine staggered timing of treatment assignment and treatment effect heterogeneity across firms or over time, staggered DiD estimates are likely to be biased, and can be of the opposite sign compared to the true average treatment effects.
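
The three insights can be reproduced in a toy setting. The following is a minimal sketch, not the simulation design used in the paper: it assumes a noise-free balanced panel with one unit per group, ten periods, and hypothetical helper names (`simulate`, `twfe`), and it estimates the standard staggered DiD coefficient through the two-way within transformation.

```python
import numpy as np

def simulate(adopt, n_periods, effect):
    """Noise-free panel in which the outcome equals the treatment effect.

    adopt[i] is unit i's first treated period (np.inf = never treated);
    effect(k) maps periods since adoption, k = 1, 2, ..., to an effect size.
    """
    t = np.arange(n_periods)[None, :]             # shape (1, T)
    g = np.asarray(adopt, dtype=float)[:, None]   # shape (N, 1)
    d = (t >= g).astype(float)                    # treatment indicator
    k = np.where(d == 1, t - g + 1, 0.0)          # periods since adoption
    return d * effect(k), d

def twfe(y, d):
    """Coefficient on d in y_it = a_i + l_t + beta * d_it (balanced panel)."""
    def within(x):
        # Remove unit means and period means, add back the grand mean.
        return (x - x.mean(axis=1, keepdims=True)
                  - x.mean(axis=0, keepdims=True) + x.mean())
    d_w, y_w = within(d), within(y)
    return (d_w * y_w).sum() / (d_w ** 2).sum()

T = 10
grows = lambda k: 1.0 * k         # effect rises by 1 each treated period
flat = lambda k: 2.0 + 0.0 * k    # constant effect of 2

cases = {
    "single date, dynamic effect": ([2, np.inf], grows),
    "staggered, constant effect": ([2, 6], flat),
    "staggered, dynamic effect": ([2, 6], grows),
}

results = {}
for name, (adopt, effect) in cases.items():
    y, d = simulate(adopt, T, effect)
    true_att = y[d == 1].mean()   # average effect over treated observations
    results[name] = (twfe(y, d), true_att)
    print(f"{name}: TWFE = {results[name][0]:.3f}, true ATT = {true_att:.3f}")
```

In this toy panel the first two cases return the true average effect exactly (4.5 and 2.0), while the staggered case with a growing effect returns a coefficient of about -0.17 against a true average effect of about 3.83. The numbers are specific to these assumed adoption dates and effect paths; only the qualitative pattern is meant to carry over.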

We describe three alternative estimators for modifying the standard staggered DiD designs. While the econometrics literature has not settled on a standard alternative approach, the proposed solutions all deal with the bias issues inherent in these designs by estimating event-study DiD specifications and modifying the set of units that effectively act as controls in the treatment effect estimation process. In each case, the alternative estimation strategy ensures that firms receiving treatment are not compared to firms that already received treatment in the recent past. However, the methods differ in terms of which observations are used as effective controls and how covariates are incorporated in the analysis. Using our simulated dataset, we show that each of these alternative estimators can help to recover the true treatment effect.
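
The idea these estimators share, restricting the comparison group to units that have not yet been treated, can be sketched as follows. This is an illustration under stated assumptions and not an implementation of any of the three estimators discussed in the paper: the panel is noise-free, there are no covariates, and the function name `clean_control_att` and all parameter values are hypothetical.

```python
import numpy as np

T = 10
adopt = np.array([2.0, 6.0, np.inf])      # early, late, never treated
t = np.arange(T)[None, :]
g = adopt[:, None]
d = t >= g                                 # treatment indicator
tau = np.where(d, t - g + 1, 0.0)          # effect grows by 1 per treated period

# Outcome = unit fixed effect + common time trend + treatment effect.
unit_fe = np.array([5.0, -3.0, 1.0])[:, None]
time_fe = 0.5 * np.arange(T)[None, :]
y = unit_fe + time_fe + tau

def twfe(y, d):
    """Standard staggered DiD coefficient via the two-way within transformation."""
    def within(x):
        return (x - x.mean(axis=1, keepdims=True)
                  - x.mean(axis=0, keepdims=True) + x.mean())
    d_w, y_w = within(d.astype(float)), within(y)
    return (d_w * y_w).sum() / (d_w ** 2).sum()

def clean_control_att(y, adopt):
    """Average of cohort-by-period DiD contrasts that use only units
    still untreated in period t as controls."""
    n_periods = y.shape[1]
    contrasts, weights = [], []
    for cohort in np.unique(adopt[np.isfinite(adopt)]).astype(int):
        treated = adopt == cohort
        for period in range(cohort, n_periods):
            controls = adopt > period      # not yet treated, or never treated
            if cohort == 0 or not controls.any():
                continue                   # no pre-period or no clean control
            base = cohort - 1              # last untreated period for the cohort
            did = ((y[treated, period] - y[treated, base]).mean()
                   - (y[controls, period] - y[controls, base]).mean())
            contrasts.append(did)
            weights.append(treated.sum())
    return np.average(contrasts, weights=weights)

true_att = tau[d].mean()
beta_twfe = twfe(y, d)
att_clean = clean_control_att(y, adopt)
print(f"true ATT = {true_att:.3f}, TWFE = {beta_twfe:.3f}, clean controls = {att_clean:.3f}")
```

In this assumed panel the standard coefficient is 2.0 while the clean-control average matches the true average effect of about 3.83, because no contrast uses an already-treated unit as a control. The published estimators differ from this sketch in how they choose controls, aggregate the contrasts, and handle covariates and inference.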

Finally, we assess the extent to which these problems matter in applied research by applying the alternative DiD estimators to prior published results. We replicate and analyze the findings of three important papers in financial regulation and corporate governance. In each paper, we find that the published DiD estimates are susceptible to the biases created by treatment effect heterogeneity. After correcting for the use of prior treated units as effective controls, the evidence often no longer supports the original paper’s findings.

We detail here the analyses of one of these papers, Wang, Yin, and Yu (2021) (“WYY”), which provides empirical evidence on a contemporary corporate governance and policy debate that has stimulated many posts on this forum: the effect of repurchases on corporate investment. WYY’s research design relies on the staggered legalization of stock repurchases across countries to study the effects of such repurchases on firm outcomes. Perhaps the most central result in the paper is the finding that stock repurchases led to significant declines in firm investments, in terms of both capital expenditures and research and development. Unfortunately, these results are sensitive to the flaws of staggered DiD designs.

We show that, after applying various alternative DiD estimators that correct for the use of prior treated firms as comparison units in staggered DiD designs, the empirical evidence does not support the conclusion that the legalization of open market repurchases significantly reduced repurchasing firms’ investment. Instead, our analysis of the data suggests that repurchasing firms did not exhibit any differences in their investment behavior. As this example shows, addressing the methodological concerns about staggered DiD designs can yield dramatically different answers and, potentially, policy conclusions.

To summarize, our analyses suggest that corporate governance researchers should interpret the treatment effects from staggered DiD studies with caution, particularly in research contexts where the treatment effect could be expected to evolve over time. We conclude the paper by discussing features of the data structure used in empirical finance and accounting studies that make the use of staggered DiD designs particularly problematic, and propose a framework for conducting generalized DiD studies in a robust and structured manner.

The complete paper is available here.
