Rigorous Evidence Isn’t

written by Lant Pritchett

Currently, there are many statements floating around in development about the use of “rigorous evidence” in formulating policies and programs. Nearly all of these claims are fatuous. The problem is, rigorous evidence isn’t.

That is, suppose one generates some evidence about the impact of some programmatic or policy intervention in one particular context that is agreed by all to be “rigorous” because it meets methodological criteria for internal validity of its causal claims. But the instant this evidence is used in formulating policy it isn’t rigorous evidence any more. Evidence would be “rigorous” about predicting the future impact of the adoption of a policy only if the conditions under which the policy was to be implemented were exactly the same in every relevant dimension as those under which the “rigorous” evidence was generated. But that can never be so, because neither economics nor any other social science has theoretically sound and empirically validated invariance laws that specify what “exactly the same” conditions would be.

So most uses of rigorous evidence aren’t. Take, for instance, the justly famous 2007 JPE paper by Ben Olken on the impact of certain types of monitoring on certain types of corruption. According to Google Scholar as of today, this paper has been cited 637 times. The question is, for how many of the uses of this “rigorous evidence” is it really “rigorous evidence”? We (well, my assistant) sampled 50 of the citing papers, which contained 57 unique mentions of Olken (2007). Only 8 of those papers were about Indonesia. (Of course, even those 8 are only arguably “rigorous” applications, as they might be about different programs, different mechanisms, or different contexts.) 47 of the 57 mentions (82%) were neither about Indonesia nor even about an East Asia or Pacific country; they might be a review of the literature about corruption in general, about another country, or methodological. We also tracked whether the words “context” or “external validity” appeared within +/- two paragraphs of the mention. In 34 of the 57 mentions (60%), the citing paper was not about Indonesia and did not note that the results, while “rigorous” for the time, place and programmatic/policy context in which they were generated, have no claim to be rigorous about any other time, place, or programmatic/policy context.
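(To make the arithmetic behind these shares explicit, here is a minimal tally. The counts are taken from the hand-coded sample described above; the variable names and labels are mine, not the actual coding scheme used.)

```python
# Minimal sketch of the citation tally reported above. Counts come from
# the hand-coded sample described in the text; labels are illustrative.
MENTIONS = 57                # unique mentions of Olken (2007) in 50 sampled papers
PAPERS_ABOUT_INDONESIA = 8   # of the 50 citing papers (papers, not mentions)
NOT_INDONESIA_OR_EAP = 47    # mentions neither about Indonesia nor East Asia/Pacific
NO_CONTEXT_CAVEAT = 34       # mentions not about Indonesia, with no nearby
                             # "context" / "external validity" discussion

for label, count in [
    ("neither Indonesia nor EAP", NOT_INDONESIA_OR_EAP),
    ("no Indonesia, no context caveat", NO_CONTEXT_CAVEAT),
]:
    print(f"{label}: {count}/{MENTIONS} = {count / MENTIONS:.0%}")
# neither Indonesia nor EAP: 47/57 = 82%
# no Indonesia, no context caveat: 34/57 = 60%
```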

Another justly famous paper, Angrist and Lavy (1999) in the QJE, uses regression discontinuity to identify the impact of class size on student achievement in Israel. This paper has been cited 1,244 times. I looked through the first 150 citations to this paper (which Google Scholar sorts by the number of times the citing paper has itself been cited) and, other than other papers by the authors, not one mentioned Israel in the title or abstract (not that surprising, as Israel is a tiny country), while China, India, Bangladesh, Cambodia, Bolivia, the UK, Wales, the USA (various states and cities), Kenya and South Africa all figured. Angrist and Lavy do not, and do not claim to, provide “rigorous” evidence about any of those contexts.

If one is formulating policies or programs for attacking corruption in highway procurement in Peru or reducing class size in secondary school in Thailand, it is impossible to base those policies on “rigorous evidence” as evidence that is rigorous for Indonesia or Israel isn’t rigorous for these other countries.

Now, some might argue that the formulation of policies or programs in context X should rely exclusively/primarily/preferentially on evidence that is “rigorous” in context Z because at least we know that in context Z, where it was generated, the evidence is internally valid. This is both fatuous and false as a general proposition.

Fatuous in that no one understands the phrase “policy based on rigorous evidence” to mean “policy based on evidence that isn’t rigorous with respect to the actual policy context to which it is being applied (because there are no rigorous claims to external validity) but rather based on evidence that is rigorous in some other context.”  No one understands it that way because that isn’t rigorous evidence.

It is also false as a general proposition. It is easy to construct plausible empirical examples in which the bias from imperfect internal validity is much (much) smaller than the bias from assuming external validity, because the contextual variation in “true” impact is much larger than the methodological bias of simple methods that lack “clean” causal identification. In these instances, better policy is made using “bad” (i.e. not internally valid) evidence from the same context than “rigorous” evidence from another context (e.g. Pritchett and Sandefur 2013).
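To see how this can happen, consider a stylized simulation. This is purely illustrative and not the Pritchett and Sandefur (2013) calculation: a policymaker must predict the impact in her own context using either a biased local estimate or an unbiased estimate from elsewhere, when true impacts vary across contexts. All parameter values below are assumptions chosen for illustration.

```python
# Stylized illustration (not the Pritchett and Sandefur 2013 calculation):
# compare the typical error of a biased estimate from the SAME context with
# that of an unbiased ("rigorous") estimate from a DIFFERENT context, when
# the true impact varies across contexts. All parameter values are assumptions.
import random

random.seed(0)

CONTEXT_SD = 1.0      # assumed spread of "true" impacts across contexts
INTERNAL_BIAS = 0.3   # assumed bias of the non-experimental local estimate
NOISE_SD = 0.1        # assumed sampling noise in either estimate
TRIALS = 100_000

sq_err_local = sq_err_foreign = 0.0
for _ in range(TRIALS):
    true_here = random.gauss(0.0, CONTEXT_SD)    # impact in the policy context
    true_there = random.gauss(0.0, CONTEXT_SD)   # impact where the "rigorous" study ran

    local_biased = true_here + INTERNAL_BIAS + random.gauss(0.0, NOISE_SD)
    foreign_rigorous = true_there + random.gauss(0.0, NOISE_SD)

    sq_err_local += (local_biased - true_here) ** 2
    sq_err_foreign += (foreign_rigorous - true_here) ** 2

print(f"RMSE, biased evidence from the same context:  {(sq_err_local / TRIALS) ** 0.5:.2f}")
print(f"RMSE, rigorous evidence from another context: {(sq_err_foreign / TRIALS) ** 0.5:.2f}")
# With these assumptions: roughly 0.32 for the biased local estimate versus
# roughly 1.4 for the "rigorous" foreign one; contextual variation swamps
# the internal-validity bias.
```

Under these assumed magnitudes the “bad” local evidence is closer to the truth on average; reverse the magnitudes and the ranking reverses, which is exactly why the comparison requires judgment rather than a slogan.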

Sadly perhaps, there is no shortcut around using judgment and wisdom in assessing all of the available evidence in formulating policies and programs.  Slogans like “rigorous evidence” are an abuse, not a use, of social science.

BSC Video 13: Scaling through the Diffusion of Practice

Innovations and adaptations that occur in one place often need to be scaled in order to lead to system-level change. However, in development, the road from small to big is challenging and rife with pitfalls. In this last video of the first BSC series, Michael Woolcock discusses our approach to building state capability, which centers on communities of practice around the world. These are initially small teams within organizations that have the authority, latitude, opportunity and resources to tackle problems they encounter by learning, iterating and adapting. You can watch the video below or on YouTube.

Stay tuned for more BSC videos coming soon.

BSC Video 12: Maintaining your Authorizing Environment

Maintaining your support through a change process is often a challenging task which requires time and effort. In this video, Matt Andrews explains how one must not only maintain the initial authorization but also expand the number of actors who provide authorization, thus increasing the legitimacy of the project or reform. You can watch the video below or on YouTube.

If you are interested in learning more, read Escaping Capability Traps through Problem Driven Iterative Adaptation (PDIA) and Limits of Institutional Reform.

BSC Video 11: Learn Iterate Adapt

Organizations have multiple objectives. In public organizations, the search for legitimacy often clashes with the search for functionality, mainly because rewards are geared around form and not function. In this video, Matt Andrews talks about how you can get both legitimacy and functionality at the same time. You can watch the video below or on YouTube.

If you are interested in learning more, read It’s All About MeE: Using Structured Experiential Learning (‘e’) to Crawl the Design Space, Looking Like a State: Techniques of Persistent Failure in State Capability for Implementation and Escaping Capability Traps through Problem Driven Iterative Adaptation (PDIA).

BSC Video 10: Specifying the Design Space

The design space of actual development projects is complex, granular, and nuanced. In this video, Lant Pritchett uses a simple example of a design space for teacher training to illustrate this point. You can watch the video below or on YouTube.

If you are interested in learning more, read It’s All About MeE: Using Structured Experiential Learning (‘e’) to Crawl the Design Space.

PDIA and Obamacare

written by Matt Andrews

Governments often face unenviable tasks that border on the impossible, given particularly thorny political and administrative complexities. Commentators typically deride governments when they fail in their initial attempts to address such tasks. They pan new laws that are less than many had hoped for, and call public agencies inefficient (or worse) when new roll-out mechanisms go slowly or fall apart altogether. Recent experience with the Affordable Care Act (ACA) in the United States is an obvious case in point. No one seems to have an appetite for the struggles government is enduring as it tries to implement this law. We want new websites that perform miracles the first time around, and insurance reforms that solve coverage problems without rocking too many boats. The more we see government muddling in ACA execution, the more we criticize it and question the President’s leadership.

This criticism reflects a view on how governments should work that is common in the world of international development, where I do most of my work. Such a view reflects a belief in what I call solution and leader driven change (sldc), which holds that policy and reform solutions will work ‘if they are well planned and implemented with strong leadership from the top’. When development initiatives run into trouble, in places as diverse as Argentina, Pakistan and Zimbabwe, sldc believers typically bemoan the lack of leadership and the uselessness of government. They seem to feel that a leader should be able to do all things when armed with a good solution. Any sign of muddling in the process of making or executing change is a sign of a bad solution, weak leadership, or a flailing administration. Success, on this view, comes from having the right solution at the start and just executing it properly.

This view is extremely problematic. I say this so emphatically because I find exactly the opposite storyline in most of my research examining successful government policies and reforms. The experiences I look at are diverse, ranging from civil rights reform in the United States to growth policies in South Korea and decentralization in Rwanda. Even though these experiences vary a lot, they all involved policy changes that most commentators would call successful—manifesting in more equitable service access, improved economic performance, better public sector performance, and more. I find more commonalities across the cases as well, related to the way they emerged.

Primarily, evidence suggests that these successes seldom (if ever) came about through a clean process where a leader introduced a solution and just forced implementation by edict. Rather, change was spurred by the recognition (by a group of agents) that a problem existed that warranted change; but no one knew exactly what to do. Solutions emerged over time, through many iterative experiments that provided lessons about what could be done and allowed reformers to build support and capability to do more. I call such experimentation ‘purposive muddling’ and see it fitting into an overall process of problem driven iterative adaptation (pdia) that seems more likely to characterize successful change than solution and leader driven change.

I even see purposive muddling and pdia in the story of NASA’s successful lunar missions, which some media outlets portray as a solution and leader driven change initiative (where technical experts simply did what President Kennedy told them to). In fact, the mission involved many agents (and two presidents) and emerged over a number of years, through experimentation that often looked like it was delivering more failure than success. The experimentation looked like purposive muddling: it often required more budget than had been provided and demanded creative administrative solutions that would probably be questioned today. It involved tragic deaths on the launch pad and the messy dismissal of a legendary administrator, but it also ultimately led to a number of humans doing the impossible and stepping on the lunar surface.

I believe that governments are still capable of doing great (and impossible) things, and of finding solutions to our most complex problems and challenges, like those evident in the health care domain. But they will never do this through the clean, solution driven process that many commentators seem to believe in. Complex policy changes and reforms like those associated with the Affordable Care Act demand messy processes of purposive muddling. These processes can deliver great results if there is space to learn and iterate (which I wonder about with health reform in the USA). We should be grateful whenever political and administrative leaders in government recognize this, and continue to muddle despite the derision their muddling attracts. The governments we should really deride are those that don’t muddle, because they are probably sidestepping the complex and demanding problems their citizens face.

This blog post also appeared in the Washington Monthly.

BSC Video 9: Constructing Problems to Drive Change

Problems are key to driving change. In this video, Matt Andrews uses two examples about HIV in Pakistan to illustrate how constructing local problems using data can mobilize stakeholders to search for solutions that ultimately drive change. You can watch the video below or on YouTube.

If you are interested in learning more, read Looking Like a State: Techniques of Persistent Failure in State Capability for Implementation and Escaping Capability Traps through Problem Driven Iterative Adaptation (PDIA).