Structures Behind Numbers: Critically Examining the “Credibility Revolution” and “Evidence-based Policy”

Vinayak Krishnan (vinayak1994@gmail.com) is a PhD research scholar at the University of Sussex.
20 October 2023

Statistical analysis has become a mainstay of contemporary social science research. This is particularly salient in the discipline of economics. While the mid-20th century saw theoretical work as the most fruitful form of research, shifts from the 1980s onward saw far greater attention being paid to empirical techniques and to applying the results of economics research to real-world policy problems. Most economists have welcomed this shift as a “Credibility Revolution” that has facilitated a more scientific and objective analysis of socio-economic phenomena. Moreover, much of this research is increasingly being used to analyse and effect policy change as well, giving rise to the much-vaunted practice of evidence-based policy. This article examines the historical antecedents of both these phenomena and the empirical foundations on which they rest. Further, it critically examines the premise of “scientific objectivity” that these methods of research promise. It argues that while statistical knowledge provides a range of insights to researchers and policymakers, the numbers and indicators that form the basis of this analysis are socially and politically constructed.

The 2019 Nobel Prize in Economics was awarded to Abhijit Banerjee, Esther Duflo, and Michael Kremer “for their experimental approach to alleviating global poverty” (Royal Swedish Academy of Sciences 2019). Following in quick succession, the 2021 prize was also conferred on scholars who made methodological breakthroughs in the field. One half was awarded to David Card for “empirical contributions to labour economics” and the other to Joshua Angrist and Guido Imbens for “methodological contributions to the analysis of causal relationships” (Royal Swedish Academy of Sciences 2021). Both of these instances represent an interesting moment in understanding the forms of knowledge that are receiving international recognition. While empiricists have been honoured previously (among others, Daniel McFadden and James Heckman received the prize in 2000 for their contributions to econometrics), the economics Nobel Prize has been dominated by scholars who have furthered theoretical understanding in various economic sub-disciplines. In this context, the near-consecutive Nobel Prizes granted for methodological innovation mark a notable historical shift and merit further examination.

The Credibility Revolution and the Subsequent Rise of Evidence-based Policy

The phenomenon of awarding Nobel Prizes for methodological innovation in economics research is closely associated with two important processes that have completely altered the landscape of economics and allied social sciences. The first of these is what has widely come to be known as the “credibility revolution” within the discipline of economics. Coined by economists Joshua Angrist (one of the 2021 Nobel Prize winners) and Jörn-Steffen Pischke in a paper published in the Journal of Economic Perspectives in 2010, the term has come to define the raison d'être of much of economics research today.

Angrist and Pischke (2010) open their paper with a critique made by various well-known economists during the 1980s about the lack of empirical rigour within the field. These scholars had lamented the fact that economists of the time did not pay close attention to the quality of data and econometric methods while conducting research. The authors then go on to argue that contemporary economics research has effectively remedied this problem, and that researchers today pay far greater attention to empirical methods than in earlier decades. This change in approach, with an emphasis on strong research design and the use of scientific techniques, is what is termed the “credibility revolution” in economics.

A crucial methodological innovation that has facilitated this revolution, according to Angrist and Pischke (2010: 4), is the use of research designs that involve “random assignments.” The foundational idea here is that the economic impact of a particular policy intervention or politico-economic event, known as the “treatment” in the economics literature, cannot be analysed through a simple comparison of those who received it and those who did not. Rather, a causal effect can be identified only when the treatment is assigned randomly across groups of people. It is instructive to understand this concept with an example.

Assume that researchers want to find out whether a cash transfer programme implemented in a particular country has led to improvements in health outcomes. A plain comparison of health indicators between the groups of people who did and did not receive the cash transfer is not sufficient to determine its causal impact. This is primarily because of what economists refer to as “selection bias.” Cash transfer schemes are normally availed by people with lower incomes (oftentimes there is a threshold income above which individuals are not eligible for the scheme). Since poorer people are more likely to participate in the scheme, it is likely that they also have worse health outcomes to begin with compared to those who do not avail of it. Even if their health improves dramatically after receiving income support, the average health indicators of this group (termed the “treatment group”) may still be lower than those of the group that did not receive assistance from the programme (termed the “control group”). This is because the control group consists of people with higher incomes who may already have better health outcomes, as they are able to access better-quality resources. Hence, we are in a situation where people from lower income groups “select into the treatment” and thereby bias the estimates from a simple comparison.

Therefore, researchers need to remove selection bias to arrive at an accurate assessment of the improvement in health outcomes due to the programme. In order to do this, it is necessary to compare the health of people who did receive the cash transfer with what their health would have been had they not received it. This situation, of what the outcome for treated individuals would have been in the absence of the treatment, is known as the “counterfactual.” The obvious problem in undertaking such an evaluation is that the counterfactual, by definition, cannot be observed.

To solve this problem, economists employ randomisation. Continuing with the above example, individuals are now randomly assigned to the two groups, and only one group receives the cash transfer. Unlike the prior research design, which was handicapped by the fact that people with lower incomes were more likely to avail of the programme (or had to because of a threshold income), here the policy treatment is handed out in a random fashion, independent of any underlying characteristics of the individuals involved. In other words, all individuals, regardless of their existing socio-economic position, are equally likely to receive the treatment. This ensures that the baseline comparison is made between equivalent groups. Further, the control group in this case acts as a counterfactual; it represents what happens when an equivalent group of people (not a richer set of people, as in the original example) does not receive the cash transfer. Hence, a comparison of the health outcomes of these two groups yields an accurate estimate of the causal impact of the cash transfer programme on health. Randomisation thus removes the problem of selection bias.
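The contrast between the two designs can be made concrete with a short simulation. The sketch below (in Python, using entirely stylised numbers invented for illustration, not data from any actual programme) generates a population in which health rises with income, lets a transfer raise health by a known amount, and then compares the self-selected comparison with a randomised one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Stylised population: income, and a baseline health score that rises with income.
income = rng.normal(50, 15, n)
baseline_health = 0.5 * income + rng.normal(0, 10, n)
TRUE_EFFECT = 5.0  # by construction, the transfer improves health by 5 points

# Design 1: self-selection -- only lower-income people take up the transfer.
takes_up = income < 40
health = baseline_health + TRUE_EFFECT * takes_up
naive = health[takes_up].mean() - health[~takes_up].mean()

# Design 2: random assignment -- receipt is independent of income.
assigned = rng.random(n) < 0.5
health_rct = baseline_health + TRUE_EFFECT * assigned
rct = health_rct[assigned].mean() - health_rct[~assigned].mean()

print(f"true effect:        {TRUE_EFFECT:+.2f}")
print(f"naive comparison:   {naive:+.2f}")  # negative: the transfer 'looks' harmful
print(f"randomised design:  {rct:+.2f}")    # close to +5
```

In this stylised setting, the naive comparison comes out negative even though the true effect is positive: the treated group started from worse health, which is exactly the selection bias described above. The randomised comparison recovers the effect we built in.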

This principle of random assignment draws heavily on medical research, where randomised clinical trials are conducted to estimate the effectiveness of drugs (Deaton and Cartwright 2018). In economics research, randomisation is achieved in two ways. As described in the above example, it could involve an actual experiment in which the treatment, usually some form of policy intervention, is randomly allocated to different groups and the effects are then studied. This is what is known as a randomised controlled trial (RCT). The winners of the 2019 Nobel Prize were pioneers of this method and have applied RCTs to a wide variety of economic and political questions.

However, conducting a randomised experiment is an expensive and time-consuming endeavour, which is not feasible for all research questions. The second method therefore involves exploiting randomness created by a pre-existing policy change or a sudden socio-political event, such as a rise in minimum wages or an unexpected influx of migrants, to create situations where treatment groups and their equivalent counterfactuals can be compared. This allows researchers to estimate the causal impacts of such events on various economic parameters of interest. Such designs are known as “quasi-experiments”: they are similar to randomised experiments but, unlike RCTs, do not involve experiments actually conducted in the field. Instead, they rely on real-life occurrences, called “natural experiments,” to create random assignment. There are different forms of quasi-experimental techniques, the most widely used in the economics literature being difference-in-differences, instrumental variables, and regression discontinuity design. Each of these methods relies on different empirical innovations to ensure that the control group represents an actual counterfactual for the treatment group. The three winners of the 2021 Nobel Prize were at the forefront of developing and applying such quasi-experimental research designs to important questions in areas such as labour economics, health, and education.
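To give a flavour of the quasi-experimental logic, the sketch below works through the simplest of these techniques, difference-in-differences, with hypothetical numbers (the setting loosely echoes minimum-wage studies of the kind Card is known for, but the figures are invented purely for illustration).

```python
# Minimal difference-in-differences sketch with made-up numbers.
# Setting: one state raises its minimum wage ("treated"), a neighbouring
# state does not ("control"); we observe average employment before and after.

employment = {
    # (group, period): average employment in the stylised example
    ("treated", "before"): 20.0,
    ("treated", "after"): 21.0,
    ("control", "before"): 23.0,
    ("control", "after"): 23.5,
}

# Change over time within each group.
change_treated = employment[("treated", "after")] - employment[("treated", "before")]
change_control = employment[("control", "after")] - employment[("control", "before")]

# The control group's change stands in for what would have happened to the
# treated group without the policy -- the "parallel trends" assumption.
did_estimate = change_treated - change_control
print(f"difference-in-differences estimate: {did_estimate:+.2f}")  # +0.50
```

The control state's trend serves as the treated state's counterfactual. This is credible only if the two states would otherwise have moved in parallel, an assumption that cannot itself be directly tested.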

The use of randomisation, whether in the form of RCTs or quasi-experimental techniques, has contributed hugely to the “credibility revolution” in economics. While RCTs entered economics only in the late 1990s and early 2000s, quasi-experimental methods have been utilised extensively in economics research since the 1980s. Moreover, the credibility revolution has now moved beyond economics and sees widespread application in other social science disciplines. Both political science and public policy research, particularly within American academic institutions, are heavily quantitative and frequently employ either RCTs or quasi-experimental methods.

This research has also transitioned from being purely academic to influencing policy design. By applying randomisation techniques, researchers claim that they can rigorously “evaluate” the impact of various policy proposals and help decision-makers choose the most appropriate intervention for a particular problem. This has given rise to the second major phenomenon that has deeply influenced social science and development research over the last few decades: that of “evidence-based policy.” According to this framework, only those policies are to be implemented and scaled for which a statistical impact on a particular socio-economic indicator (or group of indicators) can be clearly demonstrated. The most rigorous way to show this is to conduct a field experiment in the form of an RCT and then implement the results of such trials through the state’s administrative machinery. A large number of international development agencies (such as the World Bank) and policy consultancies are primarily engaged in this work of evidence-based policy research today. Their clients largely tend to be national or sub-national governments in the developing countries of South Asia and Africa. Various governments, including the central and state governments in India, have become increasingly receptive to these forms of research and include statistical inputs from such organisations in their policy process. J-PAL South Asia, one of the best-known development research organisations in the world, has, for instance, been engaged by the Government of Tamil Nadu “to institutionalise the use of evidence in its policy decisions.”

A Deeper Look at “Evidence” and the Statistical Analysis of Society  

The credibility revolution, as discussed by Angrist and Pischke (2010), has had an enormous impact on economics and allied disciplines. Researchers now pay immense attention to data quality and the econometric methods used to analyse such data. In addition, there is a strong focus on applying economics research to real-world policy problems. As economic historians Roger Backhouse and Beatrice Cherrier (2017: 7) have argued, the contours of the economics discipline have changed from one where “being a theorist was the most prestigious activity for an economist to engage in” to one “in which economists take pride in being applied, whether applied theorists or empirical economists who tackle problems of policy.” This profound shift has been most pronounced in the field of development economics and has subsequently spilled over to related social sciences such as political science and public policy.

There can be no argument that this shift has led to an explosion in new forms of knowledge. The myriad papers and research reports that use causal inference techniques have generated a great number of novel insights for both social scientists and policymakers. Yet, this development needs to be critically analysed. The credibility revolution and evidence-based policy have led to a situation where statistical data is automatically viewed as an unbiased and objective portrayal of socio-economic reality. This is a problematic and oftentimes incorrect point of view. Anthropologist Sally Merry (2011), in a provocative article titled “Measuring the World,” argues that while statistical indicators have a sense of scientific objectivity attached to them, they “typically conceal their political and theoretical origins and underlying theories of social change and activism.” Statistical analysis requires converting social, economic, and political phenomena into numbers. Merry’s argument essentially highlights that this conversion process involves assumptions that are inherently political and ideological. This applies to field experiments (RCTs) as well as to studies that use quasi-experimental methods, and hence the results of such studies must be scrutinised more closely.

Let us begin with RCTs, which have become the gold standard of empirical research in social science. Studies using RCTs have been critiqued on various grounds, often by fellow economists and researchers themselves. As Jean Drèze (2019) notes, RCTs take a very technocratic, scientific approach to policy formulation: like a lab experiment, an RCT is designed to find a specific policy “fix” for socio-economic problems. However, policy decisions often involve questions of redistribution and power that are intrinsically political and do not lend themselves to solely technocratic solutions. RCTs have been criticised for ignoring these realities. Drèze (2019), for instance, argues that policy decisions ultimately involve value judgements “that no RCT, or for that matter no evidence, can settle on its own.” In a similar vein, Angus Deaton and Nancy Cartwright (2018: 10) state that “the widespread and largely uncritical belief that RCTs give the right answer permits them to be used as dispute-reconciliation mechanisms for resolving political conflicts.”

In addition to RCTs, it is also necessary to analyse quasi-experimental methods, which rely on natural randomisation caused by external events, with a critical lens. There is an assumption that quasi-experimental methods, because they are less interventionist and involve large numbers of data points, have greater applicability and accuracy than RCT studies. However, this need not always be the case. A case in point is the influential study on labour regulation by Timothy Besley and Robin Burgess in 2004. Besley and Burgess, both faculty members at the Department of Economics of the London School of Economics and Political Science, published a paper that sought to understand the impact of labour regulation on output, employment, investment, and productivity. Using data from the National Sample Survey and the Annual Survey of Industries, along with an instrumental variable methodology (one of the three frequently used quasi-experimental methods mentioned above), they find that “pro-worker labour regulation resulted in lower output, employment, investment and productivity in the formal manufacturing sector” (Besley and Burgess 2004: 92). The paper essentially argues that regulations that seek to protect the welfare of workers raise the cost of doing business for firms and employers, which then leads to poor economic outcomes in terms of employment, output, and investment. This is evident from the concluding paragraphs of the paper, where the authors state the following: “Our finding that regulating in a pro-worker direction was associated with increases in urban poverty are particularly striking as they suggest that attempts to redress the balance of power between capital and labour can end up hurting the poor.”

It is instructive to carry out a deeper analysis of Besley and Burgess’s econometric methodology. As mentioned earlier, statistical analysis requires the translation of abstract concepts into some kind of numerical metric. In the case of Besley and Burgess (2004), the theoretical variable of “labour regulation” had to be converted into a measurable indicator in order to statistically analyse how it affects the outcomes mentioned above. To do this, they create an index based on amendments to the Industrial Disputes (ID) Act, the key legislation that governs industrial relations in India. Labour laws can be amended by both Parliament and state assemblies, as they belong to the Concurrent List of the Constitution. To build their numerical index of labour regulation, the two authors analyse individual state-level amendments made to the ID Act and “code” them as “neutral, pro-worker or pro-employer” (Besley and Burgess 2004: 98): +1 for a pro-worker amendment, 0 for a neutral amendment, and −1 for a pro-employer amendment. These scores are then aggregated over years “to give a quantitative picture of how the regulatory environment evolved over time” (Besley and Burgess 2004: 98). Once this numerical index of labour regulation is created, outcomes such as employment and output are regressed on it. A negative causal relationship is reported, which means that higher labour regulation (measured in the form of this index) is found to have caused lower levels of employment and manufacturing output.
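Schematically, the construction works as follows. The sketch below (in Python) reproduces the coding-and-cumulation logic as described above; the amendment history shown is invented purely for illustration and is not Besley and Burgess’s actual data.

```python
# Schematic reconstruction of the coding-and-cumulation step described above.
# The amendments below are invented for illustration; they are NOT the actual
# codings used by Besley and Burgess (2004).

# Each state-level amendment to the ID Act is reduced to a single score:
# +1 pro-worker, 0 neutral, -1 pro-employer.
CODES = {"pro-worker": +1, "neutral": 0, "pro-employer": -1}

# Hypothetical amendment history for one state: (year, researcher's judgement).
amendments = [
    (1958, "pro-worker"),
    (1970, "pro-worker"),
    (1984, "pro-employer"),
]

def regulation_index(amendments, years):
    """Cumulate amendment scores over time, giving one index value per year."""
    changes = {year: CODES[label] for year, label in amendments}
    index, running = {}, 0
    for year in years:
        running += changes.get(year, 0)
        index[year] = running
    return index

index = regulation_index(amendments, range(1958, 1993))
print(index[1958], index[1970], index[1984])  # 1 2 1

# This yearly index then enters a panel regression as the explanatory variable,
# with outcomes such as employment or output on the left-hand side.
```

Writing the procedure out makes the critique easier to see: every entry in the index rests on a researcher’s judgement call about a complex legal text, and the cumulation step treats those judgement calls as quantities that can be meaningfully added.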

The paper by Besley and Burgess has subsequently been heavily critiqued on methodological grounds. Scholars have pointed out that their regulatory index is constructed using only the Industrial Disputes Act, ignoring the whole set of other labour laws that comprise the regulatory landscape of labour in India (Storm 2019). Moreover, as Aditya Bhattacharjea’s (2006) comprehensive rebuttal to Besley and Burgess shows, their numerical index of labour regulation rests on various flawed assumptions. Having reviewed each of the legal amendments that Besley and Burgess analyse and code into their index, Bhattacharjea (2006: 15) finds various instances of “inappropriate classification of individual amendments, summary coding of incommensurable changes as +1 or −1, and misleading cumulation over time.” The critique about “incommensurability” is an important one. Statistical analysis requires the construction of numerical indicators whose values can be placed on an ordered scale (such as an index measuring “higher and lower” labour regulation). Bhattacharjea argues that no such indicator can be created in this case, because the various amendments to the ID Act are fundamentally different from one another and thus cannot be compared numerically on a common scale. Given all these problems with the index, it seems quite clear that the empirical foundation for Besley and Burgess’s core claim, that labour regulation causes poor economic outcomes, is built on incredibly shaky ground.

Further, the paper and its subsequent critiques reveal something more striking about the process of statistically analysing society. Merry’s argument, mentioned previously, is that statistical work on society presents a veneer of objectivity and scientific inquiry, despite being based on underlying ideological predilections about society itself. This is completely evident in the case of Besley and Burgess (2004). Their index of labour regulation, which is core to their statistical framework, is entirely based on value judgements and subjective opinions. They arbitrarily convert complex legal changes, all of which were carried out in specific social and political contexts over decades, into simple numerical forms (+1, 0, and −1) that entirely hide these nuances. Moreover, many of these value judgements are clearly linked to a neoclassical ideological framework in which the market mechanism for allocating returns to labour and capital is considered the most efficient, and any intervention by the state must be minimal. Yet, despite these strong judgements and ideological opinions, the statistical results of the paper are interpreted as objective and as representing a true economic reality. The political values that remain foundational to the statistical analysis are forgotten. These arbitrary decisions, however, need to be taken into account to arrive at a more critical understanding of socio-economic statistics and the inferences that can be drawn from their analysis.

Economists might respond that Besley and Burgess (2004) represents an outlier and that subsequent research has comprehensively debunked its results on labour regulation. The paper has indeed been followed by a whole range of studies that clearly show, through various statistical techniques, that greater labour regulation does not cause reduced employment and output (Bhattacharjea 2006; Karak and Basu 2019; Sood and Nath 2020). This is, however, not an acceptable defence. Despite its fundamental flaws, the results of the paper continue to be cited in influential policy publications. As recently as 2019, 15 years after Besley and Burgess’s questionable findings, the Government of India stated in its Economic Survey that restrictive labour laws caused firms to remain small and hire fewer workers, furthering the level of unemployment (Ministry of Finance 2019). Moreover, based on this flawed analysis, the central government and various state governments have gone on to make legislative amendments to labour laws, including diluting a significant number of safeguards for workers (Ministry of Finance 2021). This policy of deregulation will have enormous material consequences for millions of India’s workers (Storm 2019). The policy life of Besley and Burgess (2004) is thus disconnected from its academic standing. Although it was thoroughly critiqued, the paper precipitated a range of policies that can potentially hurt India’s working class. The real-world impacts of Besley and Burgess’s research thus go well beyond the confines of academia and cannot be dismissed as simply a case of poorly conducted research.

This entire saga also illustrates the politics of “evidence-based policy.” Even though the results of Besley and Burgess (2004) were strongly contested, decision makers went ahead and formulated policy on the basis of this highly ambiguous “evidence.” Notwithstanding their claim to objectivity, policymakers and the research organisations that work with them (mostly staffed by individuals with PhDs in economics, who most certainly had the technical expertise to understand the pitfalls in Besley and Burgess’s research) chose the evidence that was convenient for them at a given point in time. How else does one explain the continued policy relevance of a flawed paper such as Besley and Burgess (2004), even though there exists equally (if not more) empirically rigorous evidence from heterodox economists showing that labour regulation is not associated with significantly lower employment and output?

Conclusions

There cannot be any doubt that the credibility revolution in economics and the broader social sciences has provided useful knowledge on a variety of socio-economic questions. The focus on high-quality statistical data and methodological rigour has had major implications for social scientific analysis. However, an uncritical faith in the objectivity of statistical information about socio-economic phenomena needs to be questioned. Statistical tools are necessary to understand society, but they are not the only legitimate forms of social knowledge. Non-statistical information, in the form of qualitative interviews or ethnographic evidence, can generate as much insight about social, political, and economic realities as numerical indicators can. Numbers about society are ultimately connected to the structures of political economy and social stratification, as well as to the ideological frameworks, that underlie social life itself. These structures behind numbers must be taken into account when undertaking and interpreting statistical analyses of socio-economic events.

Given these realities, the “evidence” that goes into “evidence-based policy” needs to be viewed with more caution than is currently the case. As the Besley and Burgess example highlights, state officials and the organisations working with them (international development agencies and private policy consultancies) all operate within a hierarchy of power relations. In such a situation, forms of evidence that are critical of these hierarchies often fall by the wayside. Moreover, “evidence” cannot be restricted entirely to statistical data and analyses, particularly those that conform to a neoclassical understanding of political economy. Other forms of knowledge, based on differing ideological frameworks, need to enter public policy discourse. Only then can there be wider dialogue, and policy be truly formulated on the basis of evidence that represents actual economic realities.

 
