A New Dataset of Historical States of Incorporation of U.S. Stocks 1994-2019

Holger Spamann is the Lawrence R. Grove Professor of Law at Harvard Law School and Colby Wilkinson is an Empirical Research Associate at Harvard Law School. Related research from the Program on Corporate Governance includes Firms’ Decisions Where to Incorporate by Lucian Bebchuk and Alma Cohen.

To learn about the effects of (state) corporate law, researchers often compare the performance of firms incorporated in different states. An obvious requirement for such comparisons is to know where firms are incorporated, or more to the point, where they were incorporated at the moment of the comparison (“historical state of incorporation”). Unfortunately, identifying the historical state of incorporation is not reliably possible with standard data sources. In particular, Compustat/CRSP contains the state of incorporation only as a so-called header variable, meaning the variable is part of the stock’s identifying information that constantly updates and hence only reflects the most recent state of incorporation. The problem is that firms may change their state of incorporation. A researcher downloading the data in 2019 could not be sure that a firm listed as incorporated in state X was also incorporated in state X in 1998 or even in 2018. The WRDS SEC Analytics Suite extracts historical state of incorporation data from the SEC’s Edgar system, but it is such a premium product that not even Harvard presently subscribes to it.

To make historical state of incorporation data widely and freely available, we have written our own script to scrape the SEC’s Edgar system, and make this script and its output freely available at https://doi.org/10.7910/DVN/KBPZ5V. Our data can be linked to standard Compustat/CRSP data using the CIK (we provide a CIK-CUSIP crosswalk for stocks lacking CIK information in Compustat/CRSP) and the filing period, which we also extract. Researchers could easily adapt our script to extract other standardized fields in 10-Ks or other SEC filings. Here we give a high-level overview of how we constructed the data.
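The linking step can be illustrated with a minimal Python sketch. It is not part of the published script: the records, the `roa` column, and the helper names `normalize_cik` and `link_states` are hypothetical, and it assumes that the firm-year data carry a CIK and a period-end date that matches the extracted filing period exactly.

```python
from datetime import date

# Hypothetical scraped records: (CIK, filing period end, state of incorporation).
incorporation = [
    ("0000001234", date(1997, 12, 31), "CA"),
    ("0000001234", date(1998, 12, 31), "DE"),
    ("0000005678", date(1998, 6, 30), "NV"),
]

# Hypothetical firm-year records from a Compustat/CRSP-style extract.
firm_years = [
    {"cik": "1234", "datadate": date(1997, 12, 31), "roa": 0.04},
    {"cik": "1234", "datadate": date(1998, 12, 31), "roa": 0.06},
    {"cik": "5678", "datadate": date(1998, 6, 30), "roa": 0.02},
    {"cik": "9999", "datadate": date(1998, 12, 31), "roa": 0.01},
]


def normalize_cik(cik):
    """Zero-pad a CIK to 10 digits so both sources use the same key."""
    return str(cik).strip().zfill(10)


def link_states(firm_years, incorporation):
    """Attach the historical state to each firm-year by (CIK, period end)."""
    lookup = {
        (normalize_cik(cik), period): state
        for cik, period, state in incorporation
    }
    linked = []
    for row in firm_years:
        key = (normalize_cik(row["cik"]), row["datadate"])
        # Unmatched firm-years get None rather than a guessed state.
        linked.append({**row, "state_inc": lookup.get(key)})
    return linked


linked = link_states(firm_years, incorporation)
```

In this toy example the first firm is matched to California for 1997 and to Delaware for 1998, which is exactly the within-firm variation that a header variable would hide. Firm-years without a match are left empty so that they can be inspected rather than silently filled.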

As is well known, the SEC’s Edgar system makes all required SEC filings since 1994-1996 available online. Among these filings are firms’ annual reports on Form 10-K, which contains as one of its mandatory header fields “State or other jurisdiction of incorporation or organization.” Building on the R “edgar” package, our script extracts this header field, the CIK, and the filing period from all 10-Ks in the Edgar system. For most files, identification of the fields is straightforward, but we also include several backup steps to catch data that was entered in a non-standard format. We retain only firms incorporated in the fifty U.S. states, the District of Columbia, or Puerto Rico, as well as federally chartered banks.
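To give a flavor of this kind of field extraction, here is a minimal Python sketch. It is not the R script described above and does not reproduce its backup steps. It assumes that the filing text contains labeled header lines such as `STATE OF INCORPORATION:`; the sample text, the label layout, and the names `parse_header` and `is_retained` are assumptions made for illustration.

```python
import re

# Postal codes for the fifty states, the District of Columbia, and Puerto Rico.
US_CODES = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS MO "
    "MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV WI WY "
    "DC PR".split()
)

# Hypothetical header excerpt; real filings may format these lines differently.
SAMPLE_HEADER = """\
CONFORMED SUBMISSION TYPE:   10-K
CONFORMED PERIOD OF REPORT:  19981231
COMPANY DATA:
    COMPANY CONFORMED NAME:  EXAMPLE CORP
    CENTRAL INDEX KEY:       0000001234
    STATE OF INCORPORATION:  DE
"""

# One pattern per field; each captures the value that follows the label.
FIELD_PATTERNS = {
    "cik": re.compile(r"^\s*CENTRAL INDEX KEY:\s*(\d+)\s*$", re.MULTILINE),
    "period": re.compile(
        r"^\s*CONFORMED PERIOD OF REPORT:\s*(\d{8})\s*$", re.MULTILINE
    ),
    "state": re.compile(
        r"^\s*STATE OF INCORPORATION:\s*([A-Z0-9]{2})\s*$", re.MULTILINE
    ),
}


def parse_header(text):
    """Return the CIK, filing period, and state code, or None where absent."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[name] = match.group(1) if match else None
    return record


def is_retained(record):
    """Keep only filings with a U.S. jurisdiction code.

    Federally chartered banks are not handled in this sketch.
    """
    return record["state"] in US_CODES


record = parse_header(SAMPLE_HEADER)
keep = is_retained(record)
```

A missing field comes back as `None` instead of raising an error, which makes it easy to route such filings to a fallback step or to manual review.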

We spot-checked the data in various ways and believe it is now highly accurate and complete. In particular, we manually checked a random subset of 100 incorporation changes returned by our script and found only one error (which we corrected). We also manually checked all incorporation changes returned by our script that suspiciously seemed to last for only one year; while some of these were real, we corrected the many that were recorded or reported in error. Many of these changes were related to parent-subsidiary filings in 1995, which followed a different, error-prone format and for which we created a warning flag. All manual changes are documented in our code. We welcome bug reports of any kind.
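One simple way to surface suspicious one-year changes for manual review is sketched below in Python. This is an illustrative assumption about how such a screen could work, not the check implemented in the published code: the function name `one_year_reversals` and the sample history are hypothetical, and the rule flags only a year whose state differs from an identical state in the years immediately before and after.

```python
def one_year_reversals(history):
    """Return years whose state differs from identical neighbors on both sides.

    `history` is a list of (year, state) pairs for a single firm.
    """
    ordered = sorted(history)
    flagged = []
    # Slide a three-year window over the firm's history.
    for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
        consecutive = cur[0] - prev[0] == 1 and nxt[0] - cur[0] == 1
        if consecutive and prev[1] == nxt[1] and cur[1] != prev[1]:
            flagged.append(cur[0])
    return flagged


# Hypothetical firm: a one-year blip in 1995, then a lasting move in 1998.
history = [
    (1994, "CA"), (1995, "DE"), (1996, "CA"),
    (1997, "CA"), (1998, "DE"), (1999, "DE"),
]
suspicious = one_year_reversals(history)
```

Here only 1995 is flagged; the lasting change in 1998 is not. A flag is a prompt for manual inspection rather than proof of an error, since some one-year changes are real.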

The data set and the code to generate it are available for free at https://doi.org/10.7910/DVN/KBPZ5V.

One Comment

  1. Chong Chen
    Posted Saturday, March 7, 2020 at 10:49 pm

    Thanks for providing this dataset.
    Basically, I want to know the historical CIK of each gvkey (in each fiscal year). I wonder whether you could provide this link (gvkey, historical CIK, filing period), so that we do not have to resort to the WRDS SEC Analytics Suite.
    Many thanks!