Rewriting History II: The (Un)Predictable Past of ESG Ratings

Florian Berg is a Postdoctoral Associate in Economics, Finance and Accounting at MIT Sloan School of Management; Kornelia Fabisik is Assistant Professor of Finance at Frankfurt School of Finance & Management, and Zacharias Sautner is Professor of Finance at Frankfurt School of Finance & Management. This post is based on their recent paper. Related research from the Program on Corporate Governance includes The Illusory Promise of Stakeholder Governance by Lucian A. Bebchuk and Roberto Tallarita (discussed on the Forum here); Reconciling Fiduciary Duty and Social Conscience: The Law and Economics of ESG Investing by a Trustee by Max M. Schanzenbach and Robert H. Sitkoff (discussed on the Forum here); and Companies Should Maximize Shareholder Welfare Not Market Value by Oliver Hart and Luigi Zingales (discussed on the Forum here).

Importance of ESG Ratings

Research on environmental, social, and corporate governance (ESG) topics has exploded in recent years. The surge in academic work mirrors the massive rise in the importance of ESG principles in the investment management industry. For example, funds that invest according to ESG principles attracted net inflows of $71.1bn globally between April and June 2020, despite the Covid-19 crisis, pushing assets under management in these funds to an all-time high of over $1tn.

A key challenge for researchers and investment professionals lies in the measurement of a firm’s “ESG quality,” that is, in quantifying how well a firm performs with respect to ESG criteria. To address this challenge, most empirical ESG analyses have resorted to ESG scores (or ratings) constructed by professional data providers. The growing usage of these vendors’ ESG scores has raised questions among policymakers, investors, researchers, and firms about their reliability, consistency, and overall quality.

Refinitiv ESG Downloads

In a new paper we document widespread changes to the historical ESG scores of Thomson Reuters Refinitiv ESG (“Refinitiv ESG” henceforth). We further show that the rewriting of these scores has important implications for analyses linking ESG scores to outcome variables such as firm performance or stock returns. The ESG scores constructed by Refinitiv ESG, formerly known as ASSET4, are influential. Refinitiv ESG is a key ESG rating provider and ESG scores by Refinitiv ESG have been used (or referenced) in more than 1,000 academic articles over the past 15 years (see Figure 1). Moreover, Refinitiv ESG data are used by many major asset managers to manage ESG investment risks.

Figure 1

To document the rewriting of the ESG scores, we downloaded two versions of the same Refinitiv ESG data, covering the same set of firm-years, at two different points in time. We downloaded the first (“initial”) version of the data in September 2018, and the second (“rewritten”) version two years later, in September 2020. The scores that we downloaded include an overall ESG score, as well as E, S, and G subscores. The sample contains 29,828 firm-year observations between 2011 and 2017 from 72 countries.

Divergences between Data Downloads

After inspecting the two downloads, we observed that the ESG scores for identical firm-years differed between the two data versions, in some cases dramatically. In fact, not a single ESG score was the same across the two versions. Thirteen percent of the sample observations were subject to a score “upgrade,” that is, the rewritten ESG score was higher than the initial ESG score. Even more remarkably, 87% of the observations were subject to a score “downgrade.” The rewriting is also economically large.
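A comparison of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the paper: the function name, the dictionary layout keyed by (firm, year), and the toy scores are all hypothetical and invented for the example.

```python
from typing import Dict, Tuple

Key = Tuple[str, int]  # (firm identifier, fiscal year); hypothetical layout


def compare_vintages(initial: Dict[Key, float],
                     rewritten: Dict[Key, float]) -> Dict[str, float]:
    """Share of firm-years upgraded, downgraded, or unchanged between two downloads."""
    # Only firm-years present in both downloads can be compared.
    common = sorted(set(initial) & set(rewritten))
    if not common:
        raise ValueError("the two downloads share no firm-years")
    n = len(common)
    upgraded = sum(1 for key in common if rewritten[key] > initial[key])
    downgraded = sum(1 for key in common if rewritten[key] < initial[key])
    return {
        "n": n,
        "upgraded": upgraded / n,
        "downgraded": downgraded / n,
        "unchanged": (n - upgraded - downgraded) / n,
    }


# Toy numbers invented for illustration; they are NOT Refinitiv ESG data.
initial_scores = {("A", 2016): 70.0, ("A", 2017): 72.0,
                  ("B", 2016): 55.0, ("B", 2017): 50.0}
rewritten_scores = {("A", 2016): 61.0, ("A", 2017): 65.0,
                    ("B", 2016): 58.0, ("B", 2017): 44.0}

summary = compare_vintages(initial_scores, rewritten_scores)
print(summary)
```

Matching on the firm-year key, rather than on row order, keeps the comparison valid even if the two downloads list observations in a different order.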

The differences between the two data versions raise the question of why and how Refinitiv ESG changed the scores. According to Refinitiv ESG, the score deviations originate from adjustments to its scoring methodology. This adjustment came into effect on April 6, 2020, that is, between our two data downloads. Importantly, Refinitiv ESG applied the methodology change not just to newly created scores; it also retrospectively modified the historical scores in its database.

As we do not have access to Refinitiv ESG’s methodology to understand and verify these changes, we use statistical methods to infer the role of different economic variables in explaining the score deviations. We demonstrate that the ex-post score changes are systematic and partially driven by reassessments of industry- and country-level drivers of ESG performance (or risks). Substantial parts of the score rewriting also play out at the individual firm level. These firm-level effects can partially be explained by time-varying firm characteristics. Overall, we show that large parts of the score deviations originate from ex-post reassessments of the ESG performance of specific firms in specific years.
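The post does not spell out which statistical methods are used. One common way to gauge how much of a score change is attributable to a grouping such as industry or country is the R-squared from regressing the change on group indicators, which equals the between-group sum of squares divided by the total sum of squares. The sketch below assumes that approach; the function name, group labels, and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Hashable, Sequence


def share_explained_by_group(changes: Sequence[float],
                             groups: Sequence[Hashable]) -> float:
    """R-squared from regressing score changes on group dummies.

    With only group indicators as regressors, the fitted value for each
    observation is its group mean, so R-squared is between-group SS / total SS.
    """
    if len(changes) != len(groups):
        raise ValueError("changes and groups must have the same length")
    overall = mean(changes)
    total_ss = sum((c - overall) ** 2 for c in changes)
    if total_ss == 0:
        raise ValueError("score changes have no variation to explain")
    by_group = defaultdict(list)
    for change, group in zip(changes, groups):
        by_group[group].append(change)
    between_ss = sum(len(vals) * (mean(vals) - overall) ** 2
                     for vals in by_group.values())
    return between_ss / total_ss


# Toy score changes (rewritten minus initial), invented for illustration.
score_changes = [-10.0, -8.0, -2.0, 0.0]
industries = ["energy", "energy", "tech", "tech"]

r_squared = share_explained_by_group(score_changes, industries)
print(round(r_squared, 3))
```

Whatever variation is left unexplained by such groupings plays out at the level of individual firms and years, which is the part the paragraph above highlights.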

Implications for the Relation between ESG Ratings and Performance

We then turn to the question of whether the deviations in ESG scores have implications for the estimation and interpretation of the relationship between ESG scores and outcome variables. We focus this analysis on S&P 1500 firms. We first demonstrate that the ESG score deviations strongly affect ESG-based rankings of S&P 1500 firms. This in turn affects the classification of firms into different ESG quantiles. For the overall ESG score, only 68.5% of firm-year observations are classified into the top decile (top 10%) in both the initial and the rewritten data versions; numbers are similar for the bottom decile. The overlap is only slightly larger if we look at extreme quartiles or terciles. We find similar patterns for the classification of firms based on their E, S, and G subscores. Hence, the retrospective score rewriting leads to large changes in what are deemed to be high- or low-ESG firms. This insight is important because the classification of firms into quantiles based on ESG scores (or their subscores) is widely used in ESG research and in the investment industry.
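The overlap measure can be illustrated with a short Python sketch. It assumes a single cross-section of firms and defines the overlap as the share of firms in the initial top group that are also in the rewritten top group; the exact definition used in the paper may differ. Function names and scores are hypothetical.

```python
import math
from typing import Dict, Set


def top_group(scores: Dict[str, float], fraction: float) -> Set[str]:
    """Firms in the top `fraction` of the ranking (ties broken by firm name)."""
    n_top = max(1, math.floor(len(scores) * fraction))
    ranked = sorted(scores, key=lambda firm: (-scores[firm], firm))
    return set(ranked[:n_top])


def overlap_share(initial: Dict[str, float],
                  rewritten: Dict[str, float],
                  fraction: float = 0.10) -> float:
    """Share of the initial top group that stays in the top group after rewriting."""
    common = set(initial) & set(rewritten)
    if not common:
        raise ValueError("the two data versions share no firms")
    top_initial = top_group({f: initial[f] for f in common}, fraction)
    top_rewritten = top_group({f: rewritten[f] for f in common}, fraction)
    return len(top_initial & top_rewritten) / len(top_initial)


# Toy scores for ten hypothetical firms, invented for illustration.
initial_scores = {"F01": 90, "F02": 85, "F03": 80, "F04": 70, "F05": 65,
                  "F06": 60, "F07": 50, "F08": 40, "F09": 30, "F10": 20}
rewritten_scores = {"F01": 82, "F02": 70, "F03": 78, "F04": 66, "F05": 58,
                    "F06": 55, "F07": 41, "F08": 35, "F09": 22, "F10": 15}

# Top quintile here only because ten firms are too few for a meaningful decile.
overlap = overlap_share(initial_scores, rewritten_scores, fraction=0.20)
print(overlap)
```

In the toy data every score falls, yet the ranking still changes: firm F02 drops out of the top group and F03 enters it, so only half of the initial top group survives.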

We use the recent Covid-19 crisis as a setting to explore the effects of these classification changes. We thereby build on prior work showing that firms with higher E&S ratings prior to the crisis exhibited better stock market performance during the pandemic. We classify firms as “high-E&S firms” if they are ranked in the top quartile of the S&P 1500 sample based on the average value of their E&S scores. Our tests then compare daily abnormal returns of high- and low-E&S firms before versus after a Covid-19 event date (February 24, 2020).

Our results are remarkable. When classifying firms based on the initial E&S scores, we find no evidence that high-E&S firms performed better during the Covid-19 pandemic. The picture looks entirely different if we run regressions using a classification of firms based on the rewritten data. We now find strong evidence that high-E&S firms exhibited better performance during the pandemic. Not only is the statistical significance in these regressions much higher, but the coefficient estimates also scale up by a factor of between three and ten, depending on the specification.
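The logic of this comparison can be sketched as a simple difference-in-differences of mean abnormal returns. This is a deliberately stripped-down stand-in for the regressions in the paper, which are not reproduced here. The function name is hypothetical, treating the event date itself as part of the "after" period is an assumption, and the returns are made-up numbers chosen only to show how the same returns can yield different estimates under two classifications.

```python
from datetime import date
from statistics import mean
from typing import Iterable, Set, Tuple

EVENT_DATE = date(2020, 2, 24)  # Covid-19 event date named in the text


def diff_in_diff(returns: Iterable[Tuple[str, date, float]],
                 high_es: Set[str],
                 event_date: date = EVENT_DATE) -> float:
    """(post - pre) mean abnormal return of high-E&S firms minus that of other firms.

    Assumption: observations on or after `event_date` count as "post".
    """
    cells = {(is_high, is_post): []
             for is_high in (True, False) for is_post in (True, False)}
    for firm, day, abnormal_return in returns:
        cells[(firm in high_es, day >= event_date)].append(abnormal_return)
    if any(not values for values in cells.values()):
        raise ValueError("each group needs observations before and after the event")
    high_change = mean(cells[(True, True)]) - mean(cells[(True, False)])
    low_change = mean(cells[(False, True)]) - mean(cells[(False, False)])
    return high_change - low_change


# Made-up daily abnormal returns in percent; NOT results from the paper.
pre, post = date(2020, 2, 20), date(2020, 2, 25)
toy_returns = [("A", pre, 0.0), ("A", post, -1.0),
               ("B", pre, 0.0), ("B", post, -3.0),
               ("C", pre, 0.0), ("C", post, -1.0),
               ("D", pre, 0.0), ("D", post, -3.0)]

effect_initial = diff_in_diff(toy_returns, high_es={"A", "B"})    # initial classification
effect_rewritten = diff_in_diff(toy_returns, high_es={"A", "C"})  # rewritten classification
print(effect_initial, effect_rewritten)
```

The returns are identical in both calls; only the set of firms labeled high-E&S changes, which is exactly the channel through which rewritten scores can alter an estimated effect.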

Implications for Researchers and Investors

The large differences in results that we document have economic implications. Retrospectively, one would attribute a positive performance effect to high-E&S firms if one were to classify firms based on the rewritten data. However, this performance would not have been achievable with the data available to investors at the onset of (or before) the pandemic. At that point in time, investors would have classified firms differently into high- and low-E&S groups, and the performance differences between these two sets of firms would not have been economically or statistically significant. Hence, an analysis based on the rewritten data exaggerates the benefits of being a high-E&S firm during the crisis.

The implications of this observation extend beyond our setting. They apply more broadly to the backtesting of ESG strategies, because for such tests, too, it is critical to verify that the original, not the rewritten, scores are being used. Of course, our insights are also critical for future ESG research using Refinitiv ESG data. A recommendation that follows from our analysis is that researchers using these data should verify whether the initial, originally available data are needed to test their hypotheses. This consideration is important in light of the expected (continued) growth in ESG research.

The complete paper is available for download here.
