Cleaning Corporate Governance: A New Open-Access Dataset on Firm- and State-Level Corporate Governance

Eric Talley is the Isidor & Seville Sulzbacher Professor of Law at Columbia Law School. This post is based on a recent paper forthcoming in the University of Pennsylvania Law Review, authored by Prof. Talley; Jens Frankenreiter, Postdoctoral Fellow in Empirical Law and Economics at the Ira M. Millstein Center for Global Markets and Corporate Ownership at Columbia Law School; Cathy Hwang, Professor of Law at the University of Virginia School of Law; and Yaron Nili, Assistant Professor of Law at the University of Wisconsin-Madison Law School. Related research from the Program on Corporate Governance includes Learning and the Disappearing Association between Governance and Returns, by Lucian Bebchuk, Alma Cohen, and Charles C.Y. Wang (discussed on the Forum here); and What Matters in Corporate Governance? by Lucian Bebchuk, Alma Cohen, and Allen Ferrell.

In the iconic 1994 Tarantino film Pulp Fiction, Harvey Keitel makes a brief yet memorable appearance as Winston Wolfe (a.k.a., the “Cleaner”). His forte? Tidying up the inconvenient (and usually gruesome) messes perpetrated by others. Wolfe’s modus operandi was never pretty and rarely polite; but it was invariably effective.

Empirical corporate governance needs its own Winston Wolfe. Over the last thirty years, the field has risen in prominence by quantifying what was traditionally thought unquantifiable (text from state laws, federal regulations, and firm-level governance documents) to measure the quality of governance. Canonical studies have shown that countries with strong investor protections are more likely to have higher firm valuations and that more shareholder-friendly firms outperformed more management-friendly ones, and they have documented numerous other significant real-world effects of governance on firm performance.

But beneath the field’s orderly veneer lurks an unsettling vulnerability: three decades of finance, economics, and legal studies in corporate governance have been built substantially on data sets with nearly unknown provenance.

In our new paper, Cleaning Corporate Governance, forthcoming in the University of Pennsylvania Law Review, we aim to shine a brighter light on the field and to kick-start the cleaning process. We debut a brand new resource—the CCG database—that allows researchers to investigate, for the first time, the fidelity of foundational corporate governance findings. The database is anchored by a first-of-its-kind, open-source textual corpus representing nearly thirty years of historical charters for companies listed in the S&P 1500—a total of approximately 3,000 companies over time. We hand-label a significant subset of this firm-level data and augment it further with labeled state-level panel data that tracks sixteen statutory governance rules across 50 states (and the District of Columbia).

Although the CCG database has many valuable prospective applications (more on that below), it also is capable of unsettling some of the most beatified results in empirical corporate governance. A core example (which we pursue in the paper) is the much-cited “G-index” first explored in the now-classic paper by Gompers, Ishii & Metrick (GIM). Using data from a third-party provider, the G-index aggregates 24 binary corporate governance variables into a single additive index, using it to classify firms along a spectrum from more “dictatorial” (or management-centered) to “democratic” (or shareholder-centered). Deploying that index (for years in the late 1990s), GIM demonstrated that a strategy of systematically investing in democratic issuers (while shorting the dictatorial ones) would have delivered an astounding 9% excess return on a risk-adjusted basis. Our CCG database, however, reveals that the data underlying this finding contain significant inaccuracies: We found, for example, that the G-index is inaccurate over 82% of the time, and that the rate of inaccuracy grows worse in the 2000s, even as that database (and results from it) gained increasing attention among academics, regulators, and practitioners. We use the CCG to implement a conservative correction to the underlying G-index, and we show that the relationship between democratic governance and arbitrage returns diminishes significantly with corrected data.
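For readers less familiar with additive indexes, the construction described above can be sketched in a few lines of Python. This is a minimal illustration only, not the code used in the paper or by GIM: the provision names are hypothetical placeholders, and the cutoffs (5 or fewer for "democratic," 14 or more for "dictatorial") are assumed defaults that should be checked against the GIM paper itself.

```python
from typing import Mapping


def additive_index(provisions: Mapping[str, bool]) -> int:
    """Return the number of provisions coded as present (one point each)."""
    return sum(1 for present in provisions.values() if present)


def classify(score: int, low: int = 5, high: int = 14) -> str:
    """Bucket a score; `low` and `high` are assumed cutoffs, not from this post."""
    if score <= low:
        return "democratic"
    if score >= high:
        return "dictatorial"
    return "middle"


# Hypothetical coding of one firm-year: what a data provider recorded
# versus what a hand-check of the charter shows.
recorded = {"classified_board": True, "poison_pill": True, "supermajority_vote": False}
corrected = {"classified_board": True, "poison_pill": False, "supermajority_vote": False}

recorded_score = additive_index(recorded)
corrected_score = additive_index(corrected)

# Count how many individual provisions were miscoded.
miscoded = sum(1 for name in recorded if recorded[name] != corrected[name])

print(recorded_score, corrected_score, miscoded)
print(classify(recorded_score), classify(corrected_score))
```

Because the index is a simple sum, each miscoded provision moves a firm's score by exactly one point, so errors in the underlying binary variables carry straight through to the index and, near the cutoffs, to the firm's classification.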

Because this paper is meant principally to introduce the CCG database as a new resource, we do not (at present) extend our replication exercise to other well-known results or governance indexes in the literature. In some cases, doing so will require marshaling additional information drawn from corporate bylaws (as opposed to charters), which we are currently adding to the CCG database. To take one notable example, we have yet to assess the persistence of a similar arbitrage result using Bebchuk, Cohen & Ferrell’s similarly influential “E-index” (a subset of the G-index), because its components are much more heavily reliant on bylaw-specific inputs.

All that said, an equally exciting use case for the CCG database is prospective in nature. In addition to allowing researchers to reassess some of the foundational insights of law and finance, CCG also lays the foundation for what we consider to be the next chapter of corporate governance research. Its underlying textual corpus, in particular, is fertile ground for machine learning and computational text analysis research. To illustrate the myriad of new avenues now open for future exploration, we deploy some of those burgeoning methods in our paper and show, among other results, that non-Delaware charters have become longer and less readable over time and that the similarity of charters for firms in certain industries has increased over time.
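As a rough sense of what such text analysis involves, the sketch below computes two simple quantities for charter text: a bag-of-words cosine similarity between two documents and an average sentence length as a crude readability proxy. These are assumptions chosen for illustration, not the measures used in the paper, and the sample strings are invented rather than drawn from the CCG corpus.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of raw word-count vectors (0.0 if either is empty)."""
    counts_a, counts_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def avg_sentence_length(text: str) -> float:
    """Words per sentence, splitting on '.', '!' and '?' (a crude proxy)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return len(tokenize(text)) / len(sentences) if sentences else 0.0


# Invented snippets standing in for two charters.
charter_a = "The board of directors shall be divided into three classes."
charter_b = "The board of directors shall consist of a single class."

print(round(cosine_similarity(charter_a, charter_b), 3))
print(avg_sentence_length(charter_a))
```

Tracking quantities like these by year, state of incorporation, or industry is one simple way to look for trends in charter length, readability, and similarity; real applications would likely use weighted term vectors and established readability scores instead.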

The CCG database is also useful for researchers who investigate deeper governance questions, such as whether state law matters, how governance evolves during periods of upheaval (such as the Financial Crisis), and whether common ownership of firms by large passive investors leads to anti-competitive behavior. Our database is also unique in that it allows scholars to use the underlying data to build new measures of stakeholder governance, which sets it apart from pre-existing shareholder-focused databases.

Perhaps the most important contribution of the CCG data is that we will make the underlying corpus and all of our labeled data free and open-access as the paper nears publication. By doing so, we hope to right two important wrongs.

  • First, we help solve the problem of access. While the data we collected are theoretically available from state secretaries of state and the Securities and Exchange Commission, gathering data from either source is no walk in the park. We estimate that harvesting the Delaware firms in our sample (constituting about 58% of the total dataset) from the Delaware Secretary of State’s office would cost half a million dollars in fees alone. Searching through the SEC’s online EDGAR database is theoretically free, but frustrating: it is impossible to search for only charters and bylaws, so the process of finding these documents is an exercise in excavation. Commercial databases like Westlaw and Bloomberg are slightly better (even though they also reflect EDGAR’s disorganization), but harvesting data from those sources comes with its own obstacles.
  • Second, we surmise that a key reason for two decades’ worth of error propagation in existing data is that lawyers have taken a back seat in assembling and utilizing quantitative data, fearing that we are unqualified for empirical work. In our absence, non-lawyers did the best they could to dispense judgments that required distilling into binary variables overlapping state law, stock exchange listing rules, federal securities laws, and firm-level governance documents. These complicated legal questions require legal training; in fact, clients pay big bucks for lawyers to do just that. Through a careful process (and with many legally trained research assistants), we have been able to create a database that, we believe, is the cleanest and most accurate available.

One early reviewer of this paper described the existing governance data as “mystery meat,” contrasting it with our CCG data, which the reviewer called “the organic, meet-the-grower stuff from the farmer’s market.” We believe that’s true, and in the coming years, we invite others to help us cultivate this brood of governance data in the open range it was meant for.

The complete paper is available for download here.
